Rate limiting protects your API against DDoS attacks, accidental infinite loops, and a single user hogging shared resources. It also enforces the quotas behind paid tiers. This article explains the main algorithms and a distributed implementation with Redis.

Fixed Window Counter

The simplest approach: keep a per-user counter for the current minute and reject when the limit is hit. Easy to build, but suffers from a boundary problem — a user bursting at the end of one minute and the start of the next can push through two minutes' worth of limits.

async function fixedWindow(userId, maxPerMinute = 60) {
    const key = `rl:fixed:${userId}:${Math.floor(Date.now() / 60000)}`;
    const count = await redis.incr(key);
    if (count === 1) await redis.expire(key, 60);
    return count <= maxPerMinute;
}

// Issue: 60 requests at second 59, 60 more at second 00
// → 120 requests slip through in 2 seconds

Sliding Window Log

Keep every request's timestamp in a sorted set; to check the limit, count entries in the last 60 seconds. Perfectly accurate but expensive on memory (one entry per request).

async function slidingLog(userId, maxPerMinute = 60) {
    const key = `rl:log:${userId}`;
    const now = Date.now();
    const windowStart = now - 60000;

    const pipe = redis.pipeline();
    pipe.zremrangebyscore(key, 0, windowStart);   // drop entries older than 60s
    pipe.zcard(key);                               // count before this request
    pipe.zadd(key, now, `${now}-${Math.random()}`);// record this request
    pipe.expire(key, 60);
    // ioredis returns one [err, result] pair per command; the second
    // command's result is the count. Note that the timestamp is recorded
    // even when the request ends up rejected.
    const [, [, count]] = await pipe.exec();
    return count < maxPerMinute;
}

Sliding Window Counter

A fixed window refined by interpolation: the effective count is the current minute's counter plus the previous minute's, weighted by how much of the previous minute still falls inside the sliding window. Memory-efficient and free of the boundary problem — the algorithm most production systems use.

async function slidingCounter(userId, maxPerMinute = 60) {
    const now = Date.now();
    const currentMin = Math.floor(now / 60000);
    const prevMin = currentMin - 1;
    const percentInCurrentMin = (now % 60000) / 60000;

    const [curCount, prevCount] = await redis.mget(
        `rl:sw:${userId}:${currentMin}`,
        `rl:sw:${userId}:${prevMin}`
    );

    // Weight the previous minute by the fraction of the sliding window
    // that still overlaps it.
    const weighted = (parseInt(prevCount, 10) || 0) * (1 - percentInCurrentMin)
                   + (parseInt(curCount, 10) || 0);
    if (weighted >= maxPerMinute) return false;

    // Check-then-increment is not atomic; under heavy concurrency a few
    // extra requests can slip through. Acceptable for most workloads.
    await redis.multi()
        .incr(`rl:sw:${userId}:${currentMin}`)
        .expire(`rl:sw:${userId}:${currentMin}`, 120)
        .exec();
    return true;
}

Token Bucket

Every user has a "bucket" of capacity N. R tokens are added per second. Every request spends 1 token. If the bucket is empty, the request is rejected. Bursts are allowed but the sustained rate is capped — AWS, Stripe and most large APIs use this model.

-- Atomic token bucket (Redis Lua script)
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])  -- tokens per second
local now = tonumber(ARGV[3])
local cost = tonumber(ARGV[4]) or 1

local bucket = redis.call('HMGET', key, 'tokens', 'last')
local tokens = tonumber(bucket[1]) or capacity
local last = tonumber(bucket[2]) or now

-- Add tokens based on elapsed time
local elapsed = math.max(0, now - last)
tokens = math.min(capacity, tokens + (elapsed * rate / 1000))

if tokens < cost then
    redis.call('HSET', key, 'tokens', tokens, 'last', now)
    redis.call('EXPIRE', key, 300)
    return 0
end

tokens = tokens - cost
redis.call('HSET', key, 'tokens', tokens, 'last', now)
redis.call('EXPIRE', key, 300)
return 1

const LUA_SCRIPT = `...`; // the Lua script above
const sha = await redis.script('load', LUA_SCRIPT);

async function tokenBucket(userId, capacity = 10, rate = 1) {
    const result = await redis.evalsha(sha, 1,
        `rl:tb:${userId}`, capacity, rate, Date.now());
    return result === 1;
}

Leaky Bucket

A cousin of the token bucket — requests go into a queue and leave at a fixed rate. Because requests are buffered, it smooths out bursts. Usually used for traffic shaping.
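As a minimal sketch of the idea (single-process only, not distributed; the class and parameter names are illustrative, not from any library): requests are queued up to a fixed capacity and drained by a timer at the fixed rate.

```javascript
// Illustrative in-process leaky bucket: buffers up to `capacity` requests
// and hands one to `handler` every 1000/ratePerSec milliseconds.
class LeakyBucket {
    constructor(capacity, ratePerSec, handler) {
        this.queue = [];
        this.capacity = capacity;
        this.handler = handler;
        // Drain at a fixed rate, regardless of how bursty arrivals are.
        this.timer = setInterval(() => {
            const req = this.queue.shift();
            if (req !== undefined) this.handler(req);
        }, 1000 / ratePerSec);
    }
    offer(req) {
        if (this.queue.length >= this.capacity) return false; // overflow: reject
        this.queue.push(req);
        return true;
    }
    stop() { clearInterval(this.timer); }
}
```

Unlike the token bucket, a burst is not served immediately — it is spread out over time, which is why this shape suits traffic shaping more than request admission.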

Which Algorithm Should You Pick?

Fixed window is fine for rough internal limits where the boundary burst is tolerable. The sliding log pays memory for exact accuracy. The sliding counter is the pragmatic default for most production APIs. The token bucket fits public APIs that should absorb short bursts while capping the sustained rate, and the leaky bucket fits pipelines that need a steady output stream rather than rejections.

Layered Rate Limiting

In practice you stack several limiters: a coarse per-IP guard, strict limits on authentication endpoints, and plan-based limits for logged-in users.

// Express middleware — multi-layered
const rateLimit = require('express-rate-limit');

// 1) Global — per IP
app.use(rateLimit({ windowMs: 60000, max: 300 }));

// 2) Auth — brute-force protection
app.use('/login', rateLimit({ windowMs: 900000, max: 5 }));

// 3) Authenticated user — plan-based
app.use('/api', async (req, res, next) => {
    if (!req.user) return next();
    const limits = {
        free:  { rpm: 60,  burst: 10 },
        pro:   { rpm: 300, burst: 50 },
        ent:   { rpm: 3000, burst: 500 }
    };
    const limit = limits[req.user.plan] || limits.free; // unknown plan → free tier
    const ok = await tokenBucket(req.user.id, limit.burst, limit.rpm / 60);
    if (!ok) return res.status(429).json({ error: 'Rate limit exceeded' });
    next();
});

Response Headers

Tell clients where they stand, so well-behaved ones can back off before hitting 429s:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1713398400
Retry-After: 30

# In the 429 response:
HTTP/1.1 429 Too Many Requests
Retry-After: 15

{
  "error": "rate_limit_exceeded",
  "retryAfter": 15
}
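As a sketch of how those headers might be emitted from an Express-style handler — `limit`, `remaining`, and `resetAt` (ms epoch) are assumed to come from your limiter's own bookkeeping; none of these names come from a library:

```javascript
// Attach rate-limit headers to a response; send the 429 body when the
// limit is exhausted. Returns false if the request was rejected.
function sendRateLimitHeaders(res, { limit, remaining, resetAt }) {
    res.set('X-RateLimit-Limit', String(limit));
    res.set('X-RateLimit-Remaining', String(Math.max(0, remaining)));
    res.set('X-RateLimit-Reset', String(Math.floor(resetAt / 1000)));
    if (remaining <= 0) {
        const retryAfter = Math.max(1, Math.ceil((resetAt - Date.now()) / 1000));
        res.set('Retry-After', String(retryAfter));
        res.status(429).json({ error: 'rate_limit_exceeded', retryAfter });
        return false;
    }
    return true;
}
```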

Enterprise: Plans + Quotas

Per-second rate limits plus per-month quotas are the standard model in SaaS products: the rate limit lives in Redis, the quota in PostgreSQL as a monthly cumulative counter.
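A sketch of the PostgreSQL side, assuming a `pg` pool and an `api_usage(user_id, month, used)` table — both the table and function names are illustrative, not from the article. A single upsert increments the counter only while it stays under the limit, so the check is atomic in the database:

```javascript
// Consume `cost` units of a user's monthly quota. Returns true if the
// quota allowed it, false if the quota is exhausted.
// Assumed table: api_usage(user_id text, month text, used bigint,
//                          primary key (user_id, month))
async function consumeQuota(pool, userId, monthlyLimit, cost = 1) {
    const month = new Date().toISOString().slice(0, 7); // e.g. "2024-06"
    const { rows } = await pool.query(
        `INSERT INTO api_usage (user_id, month, used)
         VALUES ($1, $2, $3)
         ON CONFLICT (user_id, month)
         DO UPDATE SET used = api_usage.used + $3
           WHERE api_usage.used + $3 <= $4
         RETURNING used`,
        [userId, month, cost, monthlyLimit]
    );
    // No row returned → the conditional update was skipped → over quota.
    return rows.length > 0;
}
```

In a layered setup this runs after the Redis rate limiter: Redis answers "too fast?", PostgreSQL answers "too much this month?".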

Conclusion

A production API without rate limiting can be taken down overnight by a DDoS or an infinite-loop bug. A sliding counter is enough to start; at scale, prefer the token bucket. Redis plus a short Lua script gives you distributed rate limiting in about 50 lines of code.
