SPARQL is a powerful query language for linked data. As one might expect, this makes it quite easy to write expensive queries: those that consume significant time, computation or memory to execute. When one knows the people doing the querying this isn’t much of an issue, but once an endpoint is opened to the public web things are rather different. One would certainly think twice about allowing arbitrary SQL queries through a public interface.
The CLAROS SPARQL endpoint uses rate-limiting and query time-outs to protect it from abuse, unintentional or otherwise.
The CLAROS data site is built on top of Fuseki (a web-based interface to a Jena triplestore) and humfrey (a RESTful web framework for displaying data from SPARQL endpoints). humfrey allows us to mediate requests to the underlying Fuseki instance.
humfrey uses Redis to maintain a lock for each IP address, which prevents a single user from issuing multiple queries concurrently. Each IP address also has a score. Whenever a query is performed, the score increases by the number of seconds the query took to run, and it decays at a constant rate of 0.05 per second. If a query arrives while the score exceeds 10, we delay it by (score - 10) seconds; if the score exceeds 20, we reject it. As an example, if I run a query that takes 7 seconds (score 7), wait a minute (reducing the score by 60 × 0.05 = 3, to 4) and then execute another 7-second query (score 11), a query run immediately afterwards would be delayed by about one second. The code that implements this can be found on GitHub.
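The policy above can be sketched in Python. This is a minimal, hypothetical reconstruction, not humfrey’s actual code: it keeps the per-IP scores and locks in a dictionary rather than in Redis so that it runs standalone, it assumes a second query from the same IP simply waits for the lock, and names such as `RateLimiter`, `record` and `delay_for` are illustrative.

```python
import threading
import time
from typing import Callable, Dict, Tuple

# Parameters from the policy described above.
DECAY_PER_SECOND = 0.05
DELAY_THRESHOLD = 10.0
REJECT_THRESHOLD = 20.0


class QueryRejected(Exception):
    """Raised when a client's score is above the rejection threshold."""


class RateLimiter:
    """Hypothetical in-memory stand-in for the per-IP state kept in Redis."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._scores: Dict[str, Tuple[float, float]] = {}  # ip -> (score, last update)
        self._locks: Dict[str, threading.Lock] = {}  # ip -> per-IP query lock
        self._guard = threading.Lock()  # protects the two dictionaries

    def score(self, ip: str) -> float:
        """Current score for ip after linear decay, floored at zero."""
        value, updated = self._scores.get(ip, (0.0, self._clock()))
        return max(0.0, value - DECAY_PER_SECOND * (self._clock() - updated))

    def delay_for(self, ip: str) -> float:
        """Seconds to wait before running a query; raises QueryRejected over the limit."""
        current = self.score(ip)
        if current > REJECT_THRESHOLD:
            raise QueryRejected(f"score {current:.2f} exceeds {REJECT_THRESHOLD}")
        return max(0.0, current - DELAY_THRESHOLD)

    def record(self, ip: str, duration: float) -> None:
        """Add a finished query's running time (seconds) to the decayed score."""
        with self._guard:
            self._scores[ip] = (self.score(ip) + duration, self._clock())

    def run(self, ip: str, query: Callable[[], object],
            sleep: Callable[[float], None] = time.sleep) -> object:
        """Run query for ip: one at a time per IP, delayed or rejected by score."""
        with self._guard:
            lock = self._locks.setdefault(ip, threading.Lock())
        with lock:  # a concurrent query from the same IP waits here
            sleep(self.delay_for(ip))
            start = self._clock()
            result = query()
            self.record(ip, self._clock() - start)
            return result


# Worked example from the text, using a fake clock so no real time passes.
class FakeClock:
    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
limiter = RateLimiter(clock=clock)
limiter.record("192.0.2.1", 7.0)  # first 7-second query: score 7
clock.now += 60                    # wait a minute: score decays by 3, to 4
limiter.record("192.0.2.1", 7.0)  # second 7-second query: score 11
print(limiter.delay_for("192.0.2.1"))  # → 1.0
```

The worked example records each query at the instant it finishes, so, like the arithmetic in the text, it ignores the small amount of decay that accrues while the queries themselves run; `run` records after the query completes and therefore does include it.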
This policy gives users a buffer zone before they hit any limits, and we don’t expect most users to notice it. However, it should hold back anyone trying to spider the data at the expense of other users. If you need to do this, there are easier ways; contact us!
Jena’s ARQ query engine recently gained the ability to time out queries. This hasn’t yet been exposed through Fuseki, so we’re currently running a forked version of Fuseki with the time-out hard-coded at eight seconds. We hope to abandon our fork as soon as this functionality appears in Fuseki.
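For contrast, a time-out applied only in a mediating layer can be sketched in Python as below. This is a hypothetical illustration, not part of humfrey or Fuseki: the helper `run_with_timeout` stops waiting after eight seconds, but the abandoned work carries on in its worker thread, much as a query already handed to the endpoint may keep running on the server. That is presumably why a time-out inside ARQ itself, which can stop query evaluation, is the more useful protection.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")

QUERY_TIMEOUT = 8.0  # seconds, matching the eight-second limit described above

# Shared worker pool for hypothetical query jobs.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def run_with_timeout(query: Callable[[], T], timeout: float = QUERY_TIMEOUT) -> T:
    """Return query()'s result, or raise concurrent.futures.TimeoutError.

    On time-out we merely stop waiting: the worker thread keeps executing
    query() to completion, because a running thread cannot be cancelled.
    """
    future = _pool.submit(query)
    return future.result(timeout=timeout)


if __name__ == "__main__":
    print(run_with_timeout(lambda: 42))  # fast job: prints 42
    try:
        run_with_timeout(lambda: time.sleep(1.0), timeout=0.1)
    except concurrent.futures.TimeoutError:
        print("gave up waiting; the sleeping job is still running")
```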
- Question on answers.semanticweb.com about mitigating abuse of public SPARQL endpoints
- Main JIRA issue for query cancelling and time-outs in ARQ