Distributed WebSocket server built on GoAkt with clustering, pubsub fanout, and optional durability via Google Spanner.
This service accepts WebSocket connections behind a load balancer:
- Each WebSocket connection is represented by a `Member` actor.
- Members join/leave rooms and forward inbound messages to a `Router` singleton.
- The Router routes to `Room` grains, which broadcast to members via GoAkt pubsub.
- With the Spanner store enabled, the system adds:
  - session resume (token + room list)
  - basic replay (recent messages)
  - distributed rate limiting
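The hand-off between these actors can be pictured with the sketch below. All types and names here are hypothetical stand-ins for the real implementations in `actor/`, which use GoAkt messaging and pubsub rather than plain channels:

```go
// Hypothetical sketch of the Member -> Router -> Room flow; the real
// code lives in actor/member.go, actor/router.go, and actor/room.go.
package main

import "fmt"

// Inbound is a simplified stand-in for the Envelope's message case.
type Inbound struct {
	Target  string // room ID
	Content string
}

// Room fans a message out to every joined member; the real grain
// publishes through a GoAkt TopicActor instead of channels.
type Room struct {
	Members map[string]chan string // member ID -> outbound queue
}

func (r *Room) Broadcast(msg Inbound) {
	for _, out := range r.Members {
		out <- msg.Content
	}
}

// Router maps room IDs to Room grains; a cluster singleton in the real code.
type Router struct {
	Rooms map[string]*Room
}

func (rt *Router) Route(msg Inbound) {
	if room, ok := rt.Rooms[msg.Target]; ok {
		room.Broadcast(msg)
	}
}

func main() {
	out := make(chan string, 1)
	rt := &Router{Rooms: map[string]*Room{
		"room-1": {Members: map[string]chan string{"member-1": out}},
	}}
	rt.Route(Inbound{Target: "room-1", Content: "hello"})
	fmt.Println(<-out) // "hello"
}
```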
- Server (`server/server.go`): HTTP/WebSocket gateway, cluster boot, health endpoints, metrics, tracing, and auth/rate limiting.
- Member actor (`actor/member.go`): One per WebSocket connection; handles join/leave, inbound/outbound messaging, session resume, and replay.
- Room grain (`actor/room.go`): Tracks membership and publishes messages via `TopicActor`.
- Router singleton (`actor/router.go`): Routes room messages to the right grain.
- Store abstraction (`store/store.go`): Pluggable durability layer with a Spanner implementation (`store/spanner_store.go`).
- Tracing helpers (`actor/trace.go`): W3C trace context propagation on message envelopes.
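As a rough illustration of what the tracing helpers do, W3C trace context can be injected into and extracted from envelope fields with OpenTelemetry's standard `propagation.TraceContext`. The function names below are assumptions, not the repo's actual helpers:

```go
package trace

import (
	"context"

	"go.opentelemetry.io/otel/propagation"
)

var propagator = propagation.TraceContext{}

// Inject copies the current span context into traceparent/tracestate
// strings, suitable for stamping onto an outgoing message envelope.
func Inject(ctx context.Context) (traceparent, tracestate string) {
	carrier := propagation.MapCarrier{}
	propagator.Inject(ctx, carrier)
	return carrier["traceparent"], carrier["tracestate"]
}

// Extract rebuilds a context from envelope fields so the receiving
// actor's span is linked to the sender's trace.
func Extract(ctx context.Context, traceparent, tracestate string) context.Context {
	carrier := propagation.MapCarrier{
		"traceparent": traceparent,
		"tracestate":  tracestate,
	}
	return propagator.Extract(ctx, carrier)
}
```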
All payloads are JSON-encoded `Envelope` messages (protojson).
{"join_room":{"room_id":"room-1"}}
{"leave_room":{"room_id":"room-1"}}
{"message":{"target":"room-1","content":"hello","message_id":"optional","sent_at":"optional"}}{"message":{"target":"room-1","content":"hello","message_id":"...","sent_at":"...","traceparent":"...","tracestate":"..."}}
{"session":{"token":"...","member_id":"...","rooms":["room-1","room-2"]}}message_idandsent_atare optional on inbound messages; the server fills them if missing.- Trace context (
traceparent,tracestate) is propagated onMessage. - The server sends a
sessionenvelope on connect to enable resume.
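For reference, here is a minimal client sketch using `gorilla/websocket`. The `ws://localhost:8080/ws` endpoint is an assumption; check the server config for the actual address and path:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"

	"github.com/gorilla/websocket"
)

func main() {
	// The /ws path is an assumption; adjust to the server's actual endpoint.
	conn, _, err := websocket.DefaultDialer.Dial("ws://localhost:8080/ws", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// The server sends a session envelope on connect; keep the token for resume.
	_, raw, err := conn.ReadMessage()
	if err != nil {
		log.Fatal(err)
	}
	var session struct {
		Session struct {
			Token string   `json:"token"`
			Rooms []string `json:"rooms"`
		} `json:"session"`
	}
	if err := json.Unmarshal(raw, &session); err != nil {
		log.Fatal(err)
	}
	fmt.Println("resume token:", session.Session.Token)

	// Join a room, then send a message to it; message_id and sent_at are
	// omitted, so the server fills them in.
	join := []byte(`{"join_room":{"room_id":"room-1"}}`)
	msg := []byte(`{"message":{"target":"room-1","content":"hello"}}`)
	for _, payload := range [][]byte{join, msg} {
		if err := conn.WriteMessage(websocket.TextMessage, payload); err != nil {
			log.Fatal(err)
		}
	}
}
```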
Apply the schema in `store/spanner_schema.sql` before enabling the Spanner provider.
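One way to apply it, assuming a `gcloud`-based workflow (the instance and database names are placeholders):

```sh
gcloud spanner databases ddl update goakt-ws \
  --instance=my-instance \
  --ddl-file=store/spanner_schema.sql
```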
The store is used for:
- Room membership persistence
- Session resume (token + room list)
- Recent message replay (`SESSION_REPLAY_LIMIT`)
- Distributed rate limiting (`RATE_LIMIT_DISTRIBUTED`)
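The actual interface is defined in `store/store.go`. Purely as an illustration of the responsibilities listed above, a store covering them might look like this (all method names are hypothetical):

```go
package store

import (
	"context"
	"time"
)

// Store is a hypothetical sketch of the durability layer's surface;
// the real interface in store/store.go will differ.
type Store interface {
	// Room membership persistence.
	AddMember(ctx context.Context, roomID, memberID string) error
	RemoveMember(ctx context.Context, roomID, memberID string) error

	// Session resume: token -> member ID plus joined rooms.
	SaveSession(ctx context.Context, token, memberID string, rooms []string) error
	LoadSession(ctx context.Context, token string) (memberID string, rooms []string, err error)

	// Recent message replay, capped by SESSION_REPLAY_LIMIT.
	AppendMessage(ctx context.Context, roomID string, payload []byte, sentAt time.Time) error
	RecentMessages(ctx context.Context, roomID string, limit int) ([][]byte, error)

	// Distributed rate limiting (RATE_LIMIT_DISTRIBUTED).
	Allow(ctx context.Context, key string, limit int, window time.Duration) (bool, error)
}
```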
- Metrics: Prometheus exporter at `METRICS_PATH`.
- Tracing: OTLP gRPC exporter to the Collector; HTTP and actor spans are linked via W3C trace context.
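A sketch of the wiring, using the standard OpenTelemetry and Prometheus client APIs (the Collector endpoint, listen address, and env var handling are assumptions):

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"

	"github.com/prometheus/client_golang/prometheus/promhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// OTLP gRPC exporter pointed at the Collector (address is an assumption).
	exp, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("otel-collector:4317"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatal(err)
	}
	// Real code should also call tp.Shutdown on exit to flush spans.
	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
	otel.SetTracerProvider(tp)

	// Prometheus metrics served at METRICS_PATH.
	path := os.Getenv("METRICS_PATH")
	if path == "" {
		path = "/metrics"
	}
	http.Handle(path, promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```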
The Helm chart is in `helm/goakt-ws`.
Example:
```sh
helm upgrade --install goakt-ws helm/goakt-ws \
  --set image.repository=ghcr.io/tochemey/goakt-ws \
  --set config.cluster.enabled=true
```
Regenerate protobufs:
```sh
earthly --no-cache +protogen
```
Run tests:
```sh
go test ./...
```
Planned improvements:

- Make it runnable in Kubernetes (Helm values, manifests, docs).
- Comprehensive tests: unit, integration, e2e, and load tests.
- Protocol versioning for backward compatibility (clients now receive `Envelope` payloads).
- Replay semantics with offsets/acks and idempotency enforcement.
- Rate limiting upgrades (GCRA/leaky bucket; see the GCRA sketch after this list), per-room/user quotas, and abuse protection.
- Store hygiene: Spanner TTL/GC policies, index tuning, and migration workflow.
- Failure handling: circuit breakers, retry policy, and failover for store outages.
- Security: JWT validation, token rotation, room ACLs, TLS termination guidance.
- Operational: PDBs, graceful drain, load tests, chaos tests, and alerting SLOs.
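For the rate-limiting item above, a single-node GCRA sketch is shown below; a distributed version would keep the theoretical-arrival-time state in the store. All names here are hypothetical:

```go
package main

import (
	"fmt"
	"time"
)

// GCRA (generic cell rate algorithm) limiter: allows `burst` requests
// instantly, then sustains one request per `interval`.
type GCRA struct {
	interval time.Duration // emission interval (1/rate)
	burst    int
	tat      time.Time // theoretical arrival time (stored state)
}

func (g *GCRA) Allow(now time.Time) bool {
	tat := g.tat
	if tat.Before(now) {
		tat = now
	}
	// Burst tolerance: how far ahead of real time the TAT may run.
	tolerance := time.Duration(g.burst-1) * g.interval
	if tat.Sub(now) > tolerance {
		return false // would exceed the burst budget
	}
	g.tat = tat.Add(g.interval)
	return true
}

func main() {
	lim := &GCRA{interval: 100 * time.Millisecond, burst: 3}
	now := time.Now()
	for i := 0; i < 5; i++ {
		fmt.Println(i, lim.Allow(now)) // first 3 allowed, then rejected
	}
}
```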