Rate limiting protects your API against DDoS attacks, accidental infinite loops, and a single user hogging shared resources. It also enforces the quotas behind paid tiers. This article explains the main algorithms and a distributed implementation with Redis.

Fixed Window Counter

The simplest approach: keep a per-user counter for the current minute and reject when the limit is hit. Easy to build, but suffers from a boundary problem — a user bursting at the end of one minute and the start of the next can push through two minutes' worth of limits.

async function fixedWindow(userId, maxPerMinute = 60) {
    const key = `rl:fixed:${userId}:${Math.floor(Date.now() / 60000)}`;
    const count = await redis.incr(key);
    if (count === 1) await redis.expire(key, 60);
    return count <= maxPerMinute;
}

// Issue: 60 requests at second 59, 60 more at second 00
// → 120 requests slip through in 2 seconds

Sliding Window Log

Keep every request's timestamp in a sorted set; to check the limit, count entries in the last 60 seconds. Perfectly accurate but expensive on memory (one entry per request).

async function slidingLog(userId, maxPerMinute = 60) {
    const key = `rl:log:${userId}`;
    const now = Date.now();
    const windowStart = now - 60000;

    const pipe = redis.pipeline();
    pipe.zremrangebyscore(key, 0, windowStart);   // drop entries older than 60s
    pipe.zcard(key);                               // count before this request
    pipe.zadd(key, now, `${now}-${Math.random()}`);// record this request
    pipe.expire(key, 60);
    // ioredis returns one [err, result] pair per command; the second
    // command's result is the count. Note that the timestamp is recorded
    // even when the request ends up rejected.
    const [, [, count]] = await pipe.exec();
    return count < maxPerMinute;
}

Sliding Window Counter

A fixed window refined by interpolation: the effective count is the current minute's counter plus the previous minute's, weighted by how much of the previous minute still falls inside the sliding window. Memory-efficient and free of the boundary problem — the algorithm most production systems use.

async function slidingCounter(userId, maxPerMinute = 60) {
    const now = Date.now();
    const currentMin = Math.floor(now / 60000);
    const prevMin = currentMin - 1;
    const percentInCurrentMin = (now % 60000) / 60000;

    const [curCount, prevCount] = await redis.mget(
        `rl:sw:${userId}:${currentMin}`,
        `rl:sw:${userId}:${prevMin}`
    );

    // Weight the previous minute by the fraction of the sliding window
    // that still overlaps it.
    const weighted = (parseInt(prevCount, 10) || 0) * (1 - percentInCurrentMin)
                   + (parseInt(curCount, 10) || 0);
    if (weighted >= maxPerMinute) return false;

    // Check-then-increment is not atomic; under heavy concurrency a few
    // extra requests can slip through. Acceptable for most workloads.
    await redis.multi()
        .incr(`rl:sw:${userId}:${currentMin}`)
        .expire(`rl:sw:${userId}:${currentMin}`, 120)
        .exec();
    return true;
}

Token Bucket

Every user has a "bucket" of capacity N. R tokens are added per second. Every request spends 1 token. If the bucket is empty, the request is rejected. Bursts are allowed but the sustained rate is capped — AWS, Stripe and most large APIs use this model.

-- Atomic token bucket (Redis Lua script)
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])  -- tokens per second
local now = tonumber(ARGV[3])
local cost = tonumber(ARGV[4]) or 1

local bucket = redis.call('HMGET', key, 'tokens', 'last')
local tokens = tonumber(bucket[1]) or capacity
local last = tonumber(bucket[2]) or now

-- Add tokens based on elapsed time
local elapsed = math.max(0, now - last)
tokens = math.min(capacity, tokens + (elapsed * rate / 1000))

if tokens < cost then
    redis.call('HSET', key, 'tokens', tokens, 'last', now)
    redis.call('EXPIRE', key, 300)
    return 0
end

tokens = tokens - cost
redis.call('HSET', key, 'tokens', tokens, 'last', now)
redis.call('EXPIRE', key, 300)
return 1

const LUA_SCRIPT = `...`; // the Lua script above
const sha = await redis.script('load', LUA_SCRIPT);

async function tokenBucket(userId, capacity = 10, rate = 1) {
    const result = await redis.evalsha(sha, 1,
        `rl:tb:${userId}`, capacity, rate, Date.now());
    return result === 1;
}

Leaky Bucket

A cousin of the token bucket — requests go into a queue and leave at a fixed rate. Because requests are buffered, it smooths out bursts. Usually used for traffic shaping.
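As a minimal sketch of the idea (single-process only, not distributed; the class and parameter names are illustrative, not from any library): requests are queued up to a fixed capacity and drained by a timer at the fixed rate.

```javascript
// Illustrative in-process leaky bucket: buffers up to `capacity` requests
// and hands one to `handler` every 1000/ratePerSec milliseconds.
class LeakyBucket {
    constructor(capacity, ratePerSec, handler) {
        this.queue = [];
        this.capacity = capacity;
        this.handler = handler;
        // Drain at a fixed rate, regardless of how bursty arrivals are.
        this.timer = setInterval(() => {
            const req = this.queue.shift();
            if (req !== undefined) this.handler(req);
        }, 1000 / ratePerSec);
    }
    offer(req) {
        if (this.queue.length >= this.capacity) return false; // overflow: reject
        this.queue.push(req);
        return true;
    }
    stop() { clearInterval(this.timer); }
}
```

Unlike the token bucket, a burst is not served immediately — it is spread out over time, which is why this shape suits traffic shaping more than request admission.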

Which Algorithm Should You Pick?

Fixed window is fine for rough internal limits where the boundary burst is tolerable. The sliding log pays memory for exact accuracy. The sliding counter is the pragmatic default for most production APIs. The token bucket fits public APIs that should absorb short bursts while capping the sustained rate, and the leaky bucket fits pipelines that need a steady output stream rather than rejections.

Layered Rate Limiting

In practice you stack several limiters: a coarse per-IP guard, strict limits on authentication endpoints, and plan-based limits for logged-in users.

// Express middleware — multi-layered
const rateLimit = require('express-rate-limit');

// 1) Global — per IP
app.use(rateLimit({ windowMs: 60000, max: 300 }));

// 2) Auth — brute-force protection
app.use('/login', rateLimit({ windowMs: 900000, max: 5 }));

// 3) Authenticated user — plan-based
app.use('/api', async (req, res, next) => {
    if (!req.user) return next();
    const limits = {
        free:  { rpm: 60,  burst: 10 },
        pro:   { rpm: 300, burst: 50 },
        ent:   { rpm: 3000, burst: 500 }
    };
    const limit = limits[req.user.plan] || limits.free; // unknown plan → free tier
    const ok = await tokenBucket(req.user.id, limit.burst, limit.rpm / 60);
    if (!ok) return res.status(429).json({ error: 'Rate limit exceeded' });
    next();
});

Response Headers

Tell clients where they stand, so well-behaved ones can back off before hitting 429s:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1713398400
Retry-After: 30

# In the 429 response:
HTTP/1.1 429 Too Many Requests
Retry-After: 15

{
  "error": "rate_limit_exceeded",
  "retryAfter": 15
}
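As a sketch of how those headers might be emitted from an Express-style handler — `limit`, `remaining`, and `resetAt` (ms epoch) are assumed to come from your limiter's own bookkeeping; none of these names come from a library:

```javascript
// Attach rate-limit headers to a response; send the 429 body when the
// limit is exhausted. Returns false if the request was rejected.
function sendRateLimitHeaders(res, { limit, remaining, resetAt }) {
    res.set('X-RateLimit-Limit', String(limit));
    res.set('X-RateLimit-Remaining', String(Math.max(0, remaining)));
    res.set('X-RateLimit-Reset', String(Math.floor(resetAt / 1000)));
    if (remaining <= 0) {
        const retryAfter = Math.max(1, Math.ceil((resetAt - Date.now()) / 1000));
        res.set('Retry-After', String(retryAfter));
        res.status(429).json({ error: 'rate_limit_exceeded', retryAfter });
        return false;
    }
    return true;
}
```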

Enterprise: Plans + Quotas

Per-second rate limits plus per-month quotas are the standard model in SaaS products: the rate limit lives in Redis, the quota in PostgreSQL as a monthly cumulative counter.
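A sketch of the PostgreSQL side, assuming a `pg` pool and an `api_usage(user_id, month, used)` table — both the table and function names are illustrative, not from the article. A single upsert increments the counter only while it stays under the limit, so the check is atomic in the database:

```javascript
// Consume `cost` units of a user's monthly quota. Returns true if the
// quota allowed it, false if the quota is exhausted.
// Assumed table: api_usage(user_id text, month text, used bigint,
//                          primary key (user_id, month))
async function consumeQuota(pool, userId, monthlyLimit, cost = 1) {
    const month = new Date().toISOString().slice(0, 7); // e.g. "2024-06"
    const { rows } = await pool.query(
        `INSERT INTO api_usage (user_id, month, used)
         VALUES ($1, $2, $3)
         ON CONFLICT (user_id, month)
         DO UPDATE SET used = api_usage.used + $3
           WHERE api_usage.used + $3 <= $4
         RETURNING used`,
        [userId, month, cost, monthlyLimit]
    );
    // No row returned → the conditional update was skipped → over quota.
    return rows.length > 0;
}
```

In a layered setup this runs after the Redis rate limiter: Redis answers "too fast?", PostgreSQL answers "too much this month?".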

Conclusion

A production API without rate limiting can be taken down overnight by a DDoS or an infinite-loop bug. A sliding counter is enough to start; at scale, prefer the token bucket. Redis plus a short Lua script gives you distributed rate limiting in about 50 lines of code.
