Wednesday, February 6, 2008

Throttling service requests: a puzzle

So I have this proxy for the Project Gutenberg. It routs most of the pages through it from the original P. G. site, but to the pages that allow downloading the texts it adds the SONY Reader option. When user clicks on it, it fetches the text file from P. G. site, transcodes it, and returns it to the user.

One important thing that I had to solve was throttling the request stream to the original site so that I don't trigger their abuse logic.

In a more general terms, it can be formulated thus: A service can process N requests per second. Design a system that ensures that there's never more than N requests sent in any given one-second interval.

Note that the imprecise solution is beyond straightforward: have a counter and clear it every second. However, if N-1 requests are sent at time 0.75 and another N-1 requests at time 1.25, this simplified system would violate the requirement, because in one-second interval starting [0.5, 1.5) there will be 2N-2 requests.

Have a solution? Post it here. Interested in a solution? Check the comments; I will post mine in a couple of days here.

3 comments:

dimrub said...

Have the requests added to a list. Upon a new request, check whether there are any at the tail of the list that are older than a second, and remove them, simultaneously substracting their number from a running counter.

Alex Efros said...

Code untested. :) If you've simpler reliable solution which doesn't need cyclic array to hold amount of requests sent on every queue check - it will be interesting to see.

#!/usr/bin/perl
use Time::HiRes qw( sleep );
use List::Util qw( sum );

# max requests per second
my $RATE = 10;
# check queue 5 times per second
my @Request = (0) x 5;

for (my $i = 0; 1; $i = ($i+1) % @Request) {
  $Request[$i] = 0;
  my $can_request = $RATE - sum(@Request);
  # do_request() should return false if it
  # has nothing to do (queue is empty)
  $Request[$i]++ while $can_request-- && do_request();
  sleep 1/@Request;
}

P.S. sum() call can be optimized away and replaced by additional variable, but I leave it for clarity.

P.P.S. It also a good idea to add some protection agains queue overflow (if your proxy receive much more simultaneous requests than it able to handle in sensible period of time it better to notify user about queue overflow than silently accept request which will be executed month later).

Anonymous said...

Just host it on Vista. You probably will never hit your limit :-) ( and if you will Bill&Steve may send you that special prize - well worth it !)