diogomassis/sentinel-gateway

Sentinel Gateway

A high-performance, distributed API Gateway with resilient rate limiting, built with Java 21 and Netty. This project is designed to protect internal services from abuse and traffic spikes, ensuring P99 latency ≤ 5ms and 99.99% availability through a "Fail-Closed" architecture.

Architecture Overview

The project implements a high-throughput, non-blocking gateway using Netty, with a hybrid rate-limiting strategy backed by Redis and local caching.

  • Application: A high-performance HTTP server built with Java 21 and Netty.
  • Rate Limiting: Token Bucket algorithm implemented via Lua scripts in Redis for atomicity.
  • State Management: Hybrid approach using Redis (Global State) and Local Cache (Hot Path).
  • Containerization: Multi-stage Docker builds for lean, production-ready images.
  • Orchestration: k3d (Kubernetes) for local cluster orchestration.
  • Ingress: NGINX Ingress Controller manages external access.
  • Observability: Micrometer exposes Prometheus metrics for latency, decisions, and errors.

Architectural Decisions

1. Fail-Closed Posture

What: When the system cannot safely determine if a request should be allowed (e.g., Redis timeout or error), the request is denied.

Why:

  • The gateway is a protection mechanism, not just a delivery mechanism.
  • Allowing abusive traffic during a failure could cascade to internal services.
  • Blocking legitimate traffic is a local, recoverable issue; crashing downstream services is a systemic failure.
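The posture above can be sketched in a few lines. This is an illustrative example, not the project's actual classes: `RedisClient` and `Decision` are hypothetical stand-ins showing how any Redis error or timeout maps to a denial.

```java
// Hypothetical sketch of the fail-closed decision path.
// RedisClient and Decision are illustrative names, not the project's API.
public class FailClosedLimiter {
    public enum Decision { ALLOW, DENY }

    /** Stand-in for the Redis-backed check; may time out or fail. */
    interface RedisClient {
        boolean tryConsume(String key) throws Exception;
    }

    private final RedisClient redis;

    public FailClosedLimiter(RedisClient redis) {
        this.redis = redis;
    }

    public Decision check(String key) {
        try {
            return redis.tryConsume(key) ? Decision.ALLOW : Decision.DENY;
        } catch (Exception e) {
            // Fail-closed: when we cannot verify the request, deny it
            // rather than let unverified traffic through to downstreams.
            return Decision.DENY;
        }
    }
}
```

The key property is that the `catch` branch returns `DENY`: an outage degrades into blocked traffic at the edge instead of a cascade behind it.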

2. Hybrid Token Bucket (Hot vs. Cold Path)

What: We use a two-tier state strategy to minimize latency.

How:

  • Hot Path (Local Cache): The gateway first checks a local ConcurrentHashMap. If tokens are available locally, they are consumed immediately. This path involves zero network calls, ensuring sub-millisecond latency.
  • Cold Path (Redis Sync): If the local bucket is empty or missing, the gateway calls Redis. A Lua script atomically recalculates tokens and syncs the state back to the local instance.

Why:

  • Performance: Redis cannot be in the critical path of every request if we want P99 ≤ 5ms.
  • Scalability: Reduces load on Redis by handling the majority of traffic locally.
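The two-tier flow can be sketched as follows. This is a simplified illustration, assuming a `redisSync` placeholder where the real gateway would run its Lua script; the class and method names are not the project's actual API.

```java
// Sketch of the hot/cold path split described above.
// LocalBucket and redisSync are illustrative stand-ins.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class HybridLimiter {
    static final class LocalBucket {
        final AtomicLong tokens;
        LocalBucket(long initial) { tokens = new AtomicLong(initial); }
        boolean tryConsume() {
            long t = tokens.get();
            while (t > 0) {
                if (tokens.compareAndSet(t, t - 1)) return true;
                t = tokens.get();
            }
            return false;
        }
    }

    private final Map<String, LocalBucket> cache = new ConcurrentHashMap<>();

    public boolean allow(String key) {
        LocalBucket bucket = cache.get(key);
        if (bucket != null && bucket.tryConsume()) {
            return true;               // hot path: zero network calls
        }
        long granted = redisSync(key); // cold path: atomic recalculation in Redis
        if (granted > 0) {
            cache.put(key, new LocalBucket(granted - 1)); // one token used now
            return true;
        }
        return false;
    }

    /** Placeholder for the Lua call; returns tokens granted to this instance. */
    long redisSync(String key) { return 5; }
}
```

In the real system the cold path's Lua script would refill, deduct, and return a batch of tokens atomically; the stub here always grants five so the flow is easy to trace.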

3. Deterministic Concurrency Control

What:

  • Local: Fine-grained StampedLock per bucket to handle concurrent threads within the same instance.
  • Distributed: Redis Lua scripts ensure atomicity across multiple gateway instances.

Why:

  • Prevents race conditions where multiple requests could consume the same token.
  • Ensures the global rate limit is respected (within a 1-2% acceptable drift margin).
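A minimal sketch of the local half of this scheme, a per-bucket StampedLock guarding refill and consume together so no two threads can spend the same token. Field and class names are illustrative, not the project's LocalTokenBucket.

```java
// Illustrative per-bucket lock: refill and consume happen under one
// write lock, so concurrent threads cannot double-spend a token.
import java.util.concurrent.locks.StampedLock;

public class LockedTokenBucket {
    private final StampedLock lock = new StampedLock();
    private final long capacity;
    private final double refillPerNano;
    private double tokens;
    private long lastRefillNanos;

    public LockedTokenBucket(long capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = refillPerSecond / 1_000_000_000.0;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    public boolean tryConsume() {
        long stamp = lock.writeLock();
        try {
            long now = System.nanoTime();
            tokens = Math.min(capacity,
                    tokens + (now - lastRefillNanos) * refillPerNano);
            lastRefillNanos = now;
            if (tokens >= 1.0) {
                tokens -= 1.0;
                return true;
            }
            return false;
        } finally {
            lock.unlockWrite(stamp);
        }
    }
}
```

The distributed half plays the same role across instances: because Redis executes a Lua script as a single atomic unit, the read-recalculate-write of the global bucket cannot interleave with another gateway's call.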

Known Issues, Trade-offs, and Future Improvements

Trade-offs

  • Consistency vs. Latency: We accept a small margin of error (1-2% overshoot) in exchange for extreme speed. The local cache might slightly lag behind the global state.
  • Fail-Closed Impact: In the event of a total Redis failure, traffic will be blocked once local tokens run out. This is a conscious decision to prioritize system stability over availability during outages.
  • Java GC: Using Java means contending with garbage collection. We mitigate this by minimizing allocations in the hot path, but GC pauses remain a factor in tail latency.

Future Improvements

  • Adaptive Rate Limiting: Adjust limits dynamically based on downstream health.
  • Sharding: Shard Redis to handle even higher throughput.
  • Billing Integration: Connect usage metrics to a billing system.

Project Structure

.
├── src/
│   ├── main/
│   │   ├── java/com/sentinel/
│   │   │   ├── server/          # Netty Server & Handlers (HTTP, RateLimit, Metrics)
│   │   │   ├── ratelimit/       # Core Logic (Service, LocalTokenBucket)
│   │   │   └── infrastructure/  # Redis Manager & Configuration
│   │   └── resources/
│   │       └── scripts/         # Lua scripts for Redis
├── k6/                          # Load testing scripts
│   └── k6-load-test.js
├── k8s/                         # Kubernetes manifests
│   ├── deployment.yaml
│   ├── service.yaml
│   └── ingress.yaml
├── ci.Dockerfile                # Build environment
├── Dockerfile                   # Runtime image
├── build.gradle                 # Project dependencies
├── k3d-config.yaml              # Local cluster config
└── Makefile                     # Automation scripts

Prerequisites

Ensure you have the following tools installed on your system:

  • Docker (image builds and the k6 load-test container)
  • k3d (local Kubernetes cluster)
  • kubectl (interacting with the cluster)
  • make (running the automation targets)

Configuration

The application can be configured via environment variables.

Variable          Description                                                                                  Default
REDIS_URI         Connection string for Redis.                                                                 redis://localhost:6379
PORT              HTTP server port.                                                                            8081
REDIS_TIMEOUT_MS  Timeout (ms) for Redis operations. Set to 50 for local/Docker environments, 2 for production. 2

Getting Started

A Makefile is provided to automate the entire lifecycle.

1. Full Setup (Recommended)

This command creates the cluster, builds the image, installs ingress, and deploys the app.

make run

The gateway will be accessible at http://localhost:8085.

2. Run Load Tests

We use k6 to simulate traffic for Free, Pro, and Enterprise plans.

# Run the load test container
docker run --rm -i --add-host=host.docker.internal:host-gateway grafana/k6 run - < k6/k6-load-test.js

3. View Metrics

Watch the rate limiting metrics update in real-time:

watch -n 1 "curl -s http://localhost:8085/metrics | grep gateway_"

Cleaning Up

To delete the cluster and remove resources:

make clean

API Endpoints

Protected Resource

The main endpoint protected by the rate limiter.

  • Endpoint: GET /api/resource
  • Headers:
    • X-API-Key: (Optional)
      • No key: Free Plan (10 req/s)
      • pro_...: Pro Plan (50 req/s)
      • enterprise_...: Enterprise Plan (500 req/s)
  • Response:
    • 200 OK: Request allowed.
    • 429 Too Many Requests: Rate limit exceeded.

Metrics

Prometheus metrics endpoint.

  • Endpoint: GET /metrics
  • Key Metrics:
    • gateway_decision_local_total: Decisions made by local cache.
    • gateway_decision_redis_total: Decisions requiring Redis sync.
    • gateway_decision_denied_total: Blocked requests.
    • gateway_request_latency_seconds: Latency distribution.

Rate Limiting Plans

Plan        Burst Capacity  Refill Rate  Identifier
Free        20 tokens       10 req/s     IP Address
Pro         100 tokens      50 req/s     API Key
Enterprise  1000 tokens     500 req/s    API Key (enterprise_*)
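As a worked example of the table above, token-bucket recovery is just capped linear refill. The helper below is illustrative, using the Free plan's numbers (burst 20, refill 10 req/s):

```java
// Worked refill arithmetic for the plans table (Free plan shown).
public class RefillMath {
    /** Tokens available after `elapsedSeconds`, capped at burst capacity. */
    static double tokensAfter(double current, double elapsedSeconds,
                              double refillPerSecond, double capacity) {
        return Math.min(capacity, current + elapsedSeconds * refillPerSecond);
    }

    public static void main(String[] args) {
        // An empty Free-plan bucket recovers 10 tokens after 1 second...
        System.out.println(tokensAfter(0, 1.0, 10, 20));  // 10.0
        // ...and caps at its burst capacity of 20 from 2 seconds onward.
        System.out.println(tokensAfter(0, 5.0, 10, 20));  // 20.0
    }
}
```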