A high-performance, distributed API Gateway with resilient rate limiting, built with Java 21 and Netty. This project is designed to protect internal services from abuse and traffic spikes, ensuring P99 latency ≤ 5ms and 99.99% availability through a "Fail-Closed" architecture.
The project implements a high-throughput, non-blocking gateway using Netty, with a hybrid rate-limiting strategy backed by Redis and local caching.
- Application: A high-performance HTTP server built with Java 21 and Netty.
- Rate Limiting: Token Bucket algorithm implemented via Lua scripts in Redis for atomicity.
- State Management: Hybrid approach using Redis (Global State) and Local Cache (Hot Path).
- Containerization: Multi-stage Docker builds for lean, production-ready images.
- Orchestration: k3d (Kubernetes) for local cluster orchestration.
- Ingress: NGINX Ingress Controller manages external access.
- Observability: Micrometer exposes Prometheus metrics for latency, decisions, and errors.
What: When the system cannot safely determine if a request should be allowed (e.g., Redis timeout or error), the request is denied.
Why:
- The gateway is a protection mechanism, not just a delivery mechanism.
- Allowing abusive traffic during a failure could cascade to internal services.
- Blocking legitimate traffic is a local, recoverable issue; crashing downstream services is a systemic failure.
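The fail-closed rule can be sketched as a small decision helper. This is an illustrative sketch, not the project's actual API: the class and method names are hypothetical, but the behavior matches the rule above — any Redis timeout or error resolves to a denial.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of the fail-closed decision: any Redis error or
// timeout maps to DENY rather than letting traffic through unchecked.
class FailClosedDecider {
    enum Decision { ALLOW, DENY }

    /** Resolve the remote token check, denying on timeout or failure. */
    static Decision decide(CompletableFuture<Boolean> redisCheck, long timeoutMs) {
        try {
            return redisCheck.get(timeoutMs, TimeUnit.MILLISECONDS)
                    ? Decision.ALLOW : Decision.DENY;
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            // Fail-closed: when the limiter's state is unknown, protect downstream.
            return Decision.DENY;
        }
    }
}
```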
What: We use a two-tier state strategy to minimize latency.
How:
- Hot Path (Local Cache): The gateway first checks a local ConcurrentHashMap. If tokens are available locally, they are consumed immediately. This path involves zero network calls, ensuring sub-millisecond latency.
- Cold Path (Redis Sync): If the local bucket is empty or missing, the gateway calls Redis. A Lua script atomically recalculates tokens and syncs the state back to the local instance.
Why:
- Performance: Redis cannot be in the critical path of every request if we want P99 ≤ 5ms.
- Scalability: Reduces load on Redis by handling the majority of traffic locally.
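A minimal sketch of the two-tier lookup described above, assuming a hypothetical `redisSync` callback that stands in for the Lua-script call and returns the number of tokens granted to this instance (all names here are illustrative, not the project's real classes):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical two-tier lookup: consume from the local bucket when possible,
// fall back to a (stubbed) Redis sync only when local state is exhausted.
class HybridLimiter {
    private final Map<String, LocalBucket> localBuckets = new ConcurrentHashMap<>();
    private final Function<String, Long> redisSync; // stands in for the atomic Lua script

    HybridLimiter(Function<String, Long> redisSync) {
        this.redisSync = redisSync;
    }

    boolean tryAcquire(String clientId) {
        LocalBucket bucket = localBuckets.computeIfAbsent(clientId, id -> new LocalBucket());
        if (bucket.tryConsume()) {
            return true;                          // hot path: zero network calls
        }
        long granted = redisSync.apply(clientId); // cold path: sync with global state
        if (granted <= 0) {
            return false;
        }
        bucket.refill(granted - 1);               // cache the remainder locally, consume one now
        return true;
    }

    static final class LocalBucket {
        private final AtomicLong tokens = new AtomicLong(0);

        boolean tryConsume() {
            long t;
            do {
                t = tokens.get();
                if (t <= 0) return false;
            } while (!tokens.compareAndSet(t, t - 1));
            return true;
        }

        void refill(long n) {
            tokens.addAndGet(n);
        }
    }
}
```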
What:
- Local: A fine-grained StampedLock per bucket handles concurrent threads within the same instance.
- Distributed: Redis Lua scripts ensure atomicity across multiple gateway instances.
Why:
- Prevents race conditions where multiple requests could consume the same token.
- Ensures the global rate limit is respected (within a 1-2% acceptable drift margin).
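The local side of this locking scheme might look like the following token bucket guarded by a `StampedLock`. This is a simplified sketch; the project's actual `LocalTokenBucket` may differ, and the constructor/method names are illustrative.

```java
import java.util.concurrent.locks.StampedLock;

// Hypothetical per-bucket refill/consume guarded by a StampedLock, so that
// concurrent threads in one instance never consume the same token twice.
class StampedTokenBucket {
    private final StampedLock lock = new StampedLock();
    private final double capacity;
    private final double refillPerNano;
    private double tokens;
    private long lastRefillNanos;

    StampedTokenBucket(double capacity, double refillPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerNano = refillPerSecond / 1_000_000_000.0;
        this.tokens = capacity;                 // buckets start full (burst capacity)
        this.lastRefillNanos = nowNanos;
    }

    /** Refill based on elapsed time, then consume one token if available. */
    boolean tryConsume(long nowNanos) {
        long stamp = lock.writeLock();
        try {
            tokens = Math.min(capacity, tokens + (nowNanos - lastRefillNanos) * refillPerNano);
            lastRefillNanos = nowNanos;
            if (tokens < 1.0) {
                return false;                   // empty until time refills it
            }
            tokens -= 1.0;
            return true;
        } finally {
            lock.unlockWrite(stamp);
        }
    }
}
```

A write lock is taken on every operation here for simplicity; `StampedLock` also supports optimistic reads, which a production hot path could use to peek at the token count without contention.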
- Consistency vs. Latency: We accept a small margin of error (1-2% overshoot) in exchange for extreme speed. The local cache might slightly lag behind the global state.
- Fail-Closed Impact: In the event of a total Redis failure, traffic will be blocked once local tokens run out. This is a conscious decision to prioritize system stability over availability during outages.
- Java GC: Using Java implies managing Garbage Collection. We mitigate this by minimizing allocations in the hot path, but GC pauses are still a factor.
- Adaptive Rate Limiting: Adjust limits dynamically based on downstream health.
- Sharding: Shard Redis to handle even higher throughput.
- Billing Integration: Connect usage metrics to a billing system.
```text
.
├── src/
│   ├── main/
│   │   ├── java/com/sentinel/
│   │   │   ├── server/          # Netty Server & Handlers (HTTP, RateLimit, Metrics)
│   │   │   ├── ratelimit/       # Core Logic (Service, LocalTokenBucket)
│   │   │   └── infrastructure/  # Redis Manager & Configuration
│   │   └── resources/
│   │       └── scripts/         # Lua scripts for Redis
├── k6/                          # Load testing scripts
│   └── k6-load-test.js
├── k8s/                         # Kubernetes manifests
│   ├── deployment.yaml
│   ├── service.yaml
│   └── ingress.yaml
├── ci.Dockerfile                # Build environment
├── Dockerfile                   # Runtime image
├── build.gradle                 # Project dependencies
├── k3d-config.yaml              # Local cluster config
└── Makefile                     # Automation scripts
```
Ensure you have the following tools installed on your system:
The application can be configured via environment variables.
| Variable | Description | Default |
|---|---|---|
| `REDIS_URI` | Connection string for Redis. | `redis://localhost:6379` |
| `PORT` | HTTP server port. | `8081` |
| `REDIS_TIMEOUT_MS` | Timeout (ms) for Redis operations. Set to `50` for local/Docker environments, `2` for production. | `2` |
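These variables could be read with simple fallbacks to the documented defaults, as in this sketch (the `GatewayConfig` name and constructor are hypothetical, not the project's actual configuration class):

```java
import java.util.Map;

// Hypothetical config loader mirroring the table above: each setting falls
// back to its documented default when the environment variable is unset.
class GatewayConfig {
    final String redisUri;
    final int port;
    final int redisTimeoutMs;

    // Takes the environment as a Map (e.g. System.getenv()) for testability.
    GatewayConfig(Map<String, String> env) {
        this.redisUri = env.getOrDefault("REDIS_URI", "redis://localhost:6379");
        this.port = Integer.parseInt(env.getOrDefault("PORT", "8081"));
        this.redisTimeoutMs = Integer.parseInt(env.getOrDefault("REDIS_TIMEOUT_MS", "2"));
    }
}
```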
A Makefile is provided to automate the entire lifecycle.
This command creates the cluster, builds the image, installs ingress, and deploys the app.
```bash
make run
```

The gateway will be accessible at http://localhost:8085.
We use k6 to simulate traffic for Free, Pro, and Enterprise plans.
```bash
# Run the load test container
docker run --rm -i --add-host=host.docker.internal:host-gateway grafana/k6 run - < k6/k6-load-test.js
```

Watch the rate limiting metrics update in real time:
```bash
watch -n 1 "curl -s http://localhost:8085/metrics | grep gateway_"
```

To delete the cluster and remove resources:
```bash
make clean
```

The main endpoint protected by the rate limiter.
- Endpoint: `GET /api/resource`
- Headers: `X-API-Key` (Optional)
  - No key: Free Plan (10 req/s)
  - `pro_...`: Pro Plan (50 req/s)
  - `enterprise_...`: Enterprise Plan (500 req/s)
- Response:
  - `200 OK`: Request allowed.
  - `429 Too Many Requests`: Rate limit exceeded.
Prometheus metrics endpoint.
- Endpoint: `GET /metrics`
- Key Metrics:
  - `gateway_decision_local_total`: Decisions made by the local cache.
  - `gateway_decision_redis_total`: Decisions requiring a Redis sync.
  - `gateway_decision_denied_total`: Blocked requests.
  - `gateway_request_latency_seconds`: Latency distribution.
| Plan | Burst Capacity | Refill Rate | Identifier |
|---|---|---|---|
| Free | 20 tokens | 10 req/s | IP Address |
| Pro | 100 tokens | 50 req/s | API Key |
| Enterprise | 1000 tokens | 500 req/s | API Key (enterprise_*) |
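The plan table above maps to API keys roughly as follows. This is a hypothetical sketch: the `Plan` enum and `fromApiKey` method are illustrative, with the prefix matching inferred from the header conventions listed earlier.

```java
// Hypothetical mapping of the X-API-Key header to a plan, matching the
// burst capacities and refill rates in the pricing table.
enum Plan {
    FREE(20, 10), PRO(100, 50), ENTERPRISE(1000, 500);

    final int burstCapacity;
    final int refillPerSecond;

    Plan(int burstCapacity, int refillPerSecond) {
        this.burstCapacity = burstCapacity;
        this.refillPerSecond = refillPerSecond;
    }

    static Plan fromApiKey(String apiKey) {
        if (apiKey == null || apiKey.isBlank()) return FREE; // no key: limited by IP
        if (apiKey.startsWith("enterprise_")) return ENTERPRISE;
        if (apiKey.startsWith("pro_")) return PRO;
        return FREE; // unrecognized keys fall back to the Free plan
    }
}
```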