Rate limiting protects your API against DDoS attacks, accidental infinite loops, and a single user hogging shared resources; it is also how paid plans enforce their quotas. This article walks through the main algorithms (fixed window, sliding window log, sliding window counter, token bucket and leaky bucket) and a distributed implementation with Redis.
Fixed Window Counter
The simplest approach: keep a per-user counter for the current minute and reject when the limit is hit. Easy to build, but suffers from a boundary problem — a user bursting at the end of one minute and the start of the next can push through two minutes' worth of limits.
async function fixedWindow(userId, maxPerMinute = 60) {
  // One counter per user per calendar minute
  const key = `rl:fixed:${userId}:${Math.floor(Date.now() / 60000)}`;
  const count = await redis.incr(key);
  // Set the TTL only when the key is first created
  if (count === 1) await redis.expire(key, 60);
  return count <= maxPerMinute;
}
// Issue: 60 requests at second 59, 60 more at second 00
// → 120 requests slip through in 2 seconds
Sliding Window Log
Keep every request's timestamp in a sorted set; to check the limit, count entries in the last 60 seconds. Perfectly accurate but expensive on memory (one entry per request).
async function slidingLog(userId, maxPerMinute = 60) {
  const key = `rl:log:${userId}`;
  const now = Date.now();
  const windowStart = now - 60000;
  const pipe = redis.pipeline();
  pipe.zremrangebyscore(key, 0, windowStart);     // drop entries older than 60s
  pipe.zcard(key);                                // count before this request
  pipe.zadd(key, now, `${now}-${Math.random()}`); // record this request
  pipe.expire(key, 60);
  const [, [, count]] = await pipe.exec();        // result of the zcard
  // Note: the entry is added before the decision, so rejected requests
  // also consume the window — a deliberate trade-off for a single round trip
  return count < maxPerMinute;
}
Sliding Window Counter
Fixed window with interpolation. A weighted average of the previous and current minute. Memory efficient, no boundary problem — the algorithm most production systems use.
async function slidingCounter(userId, maxPerMinute = 60) {
  const now = Date.now();
  const currentMin = Math.floor(now / 60000);
  const prevMin = currentMin - 1;
  const percentInCurrentMin = (now % 60000) / 60000;
  const [curCount, prevCount] = await redis.mget(
    `rl:sw:${userId}:${currentMin}`,
    `rl:sw:${userId}:${prevMin}`
  );
  // Weight the previous minute by how much of it still overlaps the window
  const weighted = (parseInt(prevCount, 10) || 0) * (1 - percentInCurrentMin)
                 + (parseInt(curCount, 10) || 0);
  if (weighted >= maxPerMinute) return false;
  // TTL of 120s so this counter can still serve as "previous" next minute
  await redis.multi()
    .incr(`rl:sw:${userId}:${currentMin}`)
    .expire(`rl:sw:${userId}:${currentMin}`, 120)
    .exec();
  return true;
}
Token Bucket
Every user has a "bucket" of capacity N. R tokens are added per second. Every request spends 1 token. If the bucket is empty, the request is rejected. Bursts are allowed but the sustained rate is capped — AWS, Stripe and most large APIs use this model.
-- Atomic token bucket (Redis Lua script)
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])      -- tokens per second
local now = tonumber(ARGV[3])       -- milliseconds
local cost = tonumber(ARGV[4]) or 1

local bucket = redis.call('HMGET', key, 'tokens', 'last')
local tokens = tonumber(bucket[1]) or capacity
local last = tonumber(bucket[2]) or now

-- Add tokens based on elapsed time, capped at capacity
local elapsed = math.max(0, now - last)
tokens = math.min(capacity, tokens + (elapsed * rate / 1000))

if tokens < cost then
  redis.call('HSET', key, 'tokens', tokens, 'last', now)
  redis.call('EXPIRE', key, 300)
  return 0
end

tokens = tokens - cost
redis.call('HSET', key, 'tokens', tokens, 'last', now)
redis.call('EXPIRE', key, 300)
return 1
const LUA_SCRIPT = `...`; // the script above
const sha = await redis.script('load', LUA_SCRIPT);

async function tokenBucket(userId, capacity = 10, rate = 1) {
  const result = await redis.evalsha(sha, 1,
    `rl:tb:${userId}`, capacity, rate, Date.now());
  return result === 1;
}
Leaky Bucket
A cousin of the token bucket — requests go into a queue and leave at a fixed rate. Because requests are buffered, it smooths out bursts. Usually used for traffic shaping.
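A minimal in-memory sketch of the idea (the class and its API are illustrative, not from any library): requests wait in a bounded queue and a timer drains them at a fixed rate.

```javascript
// Leaky bucket sketch: a bounded queue drained at a constant rate.
class LeakyBucket {
  constructor(size, leakPerSecond, process) {
    this.size = size;       // max queued requests
    this.queue = [];
    this.process = process; // handler called for each drained request
    // Drain one request every 1000 / leakPerSecond milliseconds
    this.timer = setInterval(() => {
      const req = this.queue.shift();
      if (req !== undefined) this.process(req);
    }, 1000 / leakPerSecond);
  }
  // Returns false (reject) when the buffer is full
  offer(req) {
    if (this.queue.length >= this.size) return false;
    this.queue.push(req);
    return true;
  }
  stop() { clearInterval(this.timer); }
}
```

In a distributed setup the queue would live in Redis or a message broker rather than in process memory.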
Which Algorithm Should You Pick?
A quick summary of the trade-offs covered above:
Fixed window: simplest to build; acceptable when the boundary burst doesn't matter.
Sliding window log: exact, but one Redis entry per request; only for low-volume, strict limits.
Sliding window counter: accurate enough and memory-efficient; a good default for most production APIs.
Token bucket: allows short bursts while capping the sustained rate; the model AWS and Stripe use.
Leaky bucket: smooths bursts into a constant outflow; best for traffic shaping.
Layered Rate Limiting
// Express middleware — multi-layered
const rateLimit = require('express-rate-limit');

// 1) Global — per IP
app.use(rateLimit({ windowMs: 60000, max: 300 }));

// 2) Auth — brute-force protection
app.use('/login', rateLimit({ windowMs: 900000, max: 5 }));

// 3) Authenticated user — plan-based
app.use('/api', async (req, res, next) => {
  if (!req.user) return next();
  const limits = {
    free: { rpm: 60, burst: 10 },
    pro: { rpm: 300, burst: 50 },
    ent: { rpm: 3000, burst: 500 }
  };
  // Fall back to the free tier for unknown plans
  const limit = limits[req.user.plan] || limits.free;
  const ok = await tokenBucket(req.user.id, limit.burst, limit.rpm / 60);
  if (!ok) return res.status(429).json({ error: 'Rate limit exceeded' });
  next();
});
Response Headers
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1713398400
Retry-After: 30
# In the 429 response:
HTTP/1.1 429 Too Many Requests
Retry-After: 15
{
"error": "rate_limit_exceeded",
"retryAfter": 15
}
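Emitting these headers can be sketched as a small middleware. Here `checkLimit` is a hypothetical injected helper (not shown above) that runs one of the algorithms and resolves to { allowed, limit, remaining, resetAt }:

```javascript
// Sketch: attach X-RateLimit-* headers to every response and emit a
// well-formed 429. `checkLimit(key)` is an assumed async helper that
// returns { allowed, limit, remaining, resetAt } (resetAt in Unix seconds).
function rateLimitHeaders(checkLimit) {
  return async (req, res, next) => {
    const key = req.user ? req.user.id : req.ip;
    const { allowed, limit, remaining, resetAt } = await checkLimit(key);
    res.set('X-RateLimit-Limit', String(limit));
    res.set('X-RateLimit-Remaining', String(Math.max(0, remaining)));
    res.set('X-RateLimit-Reset', String(resetAt));
    if (!allowed) {
      // Seconds until the window resets, never less than 1
      const retryAfter = Math.max(1, resetAt - Math.floor(Date.now() / 1000));
      res.set('Retry-After', String(retryAfter));
      return res.status(429).json({ error: 'rate_limit_exceeded', retryAfter });
    }
    next();
  };
}
```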
Enterprise: Plans + Quotas
Rate limits (per-second) plus quotas (per-month) are the standard model in SaaS products. Rate limit lives in Redis, quota in PostgreSQL (as a monthly cumulative counter).
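The quota half can be sketched as follows, assuming a node-postgres-style client and a hypothetical quotas table with (user_id, month, used, cap) columns; a single atomic UPDATE both increments the counter and enforces the cap:

```javascript
// Monthly quota check in PostgreSQL (sketch; the schema is an assumption):
//   CREATE TABLE quotas (
//     user_id text, month text, used bigint NOT NULL DEFAULT 0,
//     cap bigint NOT NULL, PRIMARY KEY (user_id, month)
//   );
// Assumes a row is created per user when the billing month starts.
async function consumeQuota(pg, userId, cost = 1) {
  const month = new Date().toISOString().slice(0, 7); // "YYYY-MM"
  // Atomic: the row is updated only if the cap is not exceeded
  const { rowCount } = await pg.query(
    `UPDATE quotas
        SET used = used + $3
      WHERE user_id = $1 AND month = $2 AND used + $3 <= cap`,
    [userId, month, cost]
  );
  return rowCount === 1; // false → quota exhausted or no row for this month
}
```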
Conclusion
A production API without rate limiting can be taken out overnight by a DDoS or an infinite-loop bug. A sliding counter is enough to start; at scale prefer the token bucket. Redis + a Lua script gives you distributed rate limiting in 50 lines of code.
Reach out to KEYDAL for distributed rate limiting, quota management and DDoS protection.