- Introduction
- Quick Start
- Architecture
- Configuration
- Message Formats
- Language Support
- Running the Service
- Simulator
- Benchmarking
- Development
- Troubleshooting
- License
SCRC (System for Compiling and Running Code) is a production-ready worker service designed for secure, isolated code execution. It consumes program submissions from Kafka, enforces resource limits, executes programs inside Docker containers, and reports results back to Kafka. The service is built with extensibility in mind, allowing new programming languages to be added over time while maintaining strict isolation and security boundaries.
SCRC follows a hexagonal architecture pattern, separating domain logic from infrastructure concerns, making it easy to test, maintain, and extend. The system is designed to handle high-throughput workloads with configurable parallelism and horizontal scaling capabilities.
- Online Judge Systems: Power competitive programming platforms that need to evaluate code submissions safely and efficiently
- Code Evaluation Services: Provide secure code execution for educational platforms, coding interview platforms, or automated code review systems
- Sandboxed Execution: Run untrusted code in isolated environments with strict resource limits
- Multi-language Support: Support multiple programming languages with a unified execution interface
- High-throughput Processing: Handle large volumes of code submissions with horizontal scaling
- Multi-language Support: Currently supports Python, Go, C, C++, and Java with an extensible architecture for adding more languages
- Container Isolation: All code execution happens in ephemeral Docker containers for complete isolation
- Resource Limits: Configurable time and memory limits per execution
- Test Case Execution: Support for multiple test cases per submission with detailed per-test results
- Horizontal Scaling: Kafka-based architecture enables horizontal scaling across multiple runner instances
- Build/Run Separation: Compiled languages use separate build and run images for optimal performance
- Comprehensive Benchmarking: Built-in benchmarking suite for performance analysis
- Production Ready: Includes integration tests, error handling, and graceful shutdown
- Go 1.25.3 or later: Required for building and running the service
- Docker: Required for containerized execution (Docker daemon must be accessible)
- Kafka: Required for message queue (can use Docker Compose setup for local development)
- Python 3.x: Optional, required only for running the simulator and benchmark suite
For local development:
- Docker Desktop or Docker Engine with socket access
- Kafka cluster (or use the provided Docker Compose setup)
Clone the repository:

```bash
git clone <repository-url>
cd scrc
```

Run the Go program directly (requires Kafka to be running):

```bash
go run ./cmd/scrc
```

The service connects to Kafka at `kafka:9092` by default. Ensure Kafka is accessible or set the `KAFKA_BROKERS` environment variable.

Build the image and run the container:

```bash
docker build -t scrc .
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
  -e KAFKA_BROKERS=localhost:9092 \
  scrc
```

The container must have access to the Docker socket to create execution containers.

Start a complete demo environment with Kafka and a single runner:

```bash
docker compose --profile single-runner --profile load-generator up --build
```

This starts Kafka, ZooKeeper, the runner service, and a simulator that generates test submissions.
SCRC follows a hexagonal (ports and adapters) architecture pattern, separating the core domain logic from infrastructure concerns. The system consists of:
- Domain Layer (`internal/domain/execution`): Core business entities (Script, Result, Status, Limits, TestCase)
- Application Layer (`internal/app/executor`): Orchestration logic that coordinates script execution
- Infrastructure Layer (`internal/infra/kafka`): Kafka adapters for consuming scripts and publishing results
- Runtime Layer (`internal/runtime`): Language-agnostic execution engine with Docker implementation
- Ports (`internal/ports`): Interfaces that define contracts between layers
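To give a feel for the ports layer, here is a small sketch of what such contracts can look like; the names and shapes below are illustrative, not the actual definitions in `internal/ports`:

```go
// Hypothetical sketch of hexagonal "ports"; the real interfaces live in
// internal/ports and internal/domain/execution, and differ in detail.
package ports

import "context"

// Script and RunReport stand in for the domain types in internal/domain/execution.
type Script struct {
	ID       string
	Language string
	Source   string
}

type RunReport struct {
	ID     string
	Status string
}

// ScriptSource abstracts where submissions come from (the Kafka consumer adapter in production).
type ScriptSource interface {
	Next(ctx context.Context) (Script, error)
}

// ResultSink abstracts where run reports go (the Kafka publisher adapter in production).
type ResultSink interface {
	Publish(ctx context.Context, report RunReport) error
}
```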
┌──────────────────────────────────────────────────────────────┐
│ Kafka Topics │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ scripts │ │script-results│ │
│ └──────┬───────┘ └────────▲─────┘ │
└─────────┼─────────────────────────────────────┼──────────────┘
│ │
│ │
┌─────────▼─────────────────────────────────────┴──────────────┐
│ SCRC Runner Service │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Kafka Consumer (Adapter) │ │
│ └──────────────────────┬────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────▼────────────────────────────────┐ │
│ │ Executor Service (Application) │ │
│ │ - Manages concurrency (semaphore-based) │ │
│ │ - Coordinates script execution │ │
│ │ - Aggregates test results │ │
│ └──────────────────────┬────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────▼────────────────────────────────┐ │
│ │ Runtime Engine (Domain Interface) │ │
│ │ - Language-agnostic execution interface │ │
│ │ - Registry for language modules │ │
│ └──────────────────────┬────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────▼────────────────────────────────┐ │
│ │ Docker Runtime (Infrastructure) │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ Python │ │ Go │ │ C │ ... │ │
│ │ │ Module │ │ Module │ │ Module │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ │ │
│ └──────────────────────┬────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────▼────────────────────────────────┐ │
│ │ Docker Engine │ │
│ │ - Container creation and management │ │
│ │ - Resource limit enforcement │ │
│ │ - Ephemeral container lifecycle │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Kafka Publisher (Adapter) │ │
│ └──────────────────────┬────────────────────────────────┘ │
└─────────────────────────┼────────────────────────────────────┘
│
▼
Results Published
1. Script Submission: An external producer publishes a script message to the `scripts` Kafka topic
2. Consumption: The Kafka consumer reads the message and deserializes it into a `Script` domain object
3. Preparation: The executor service calls the runtime engine's `Prepare` method:
   - The runtime registry selects the appropriate language module
   - For compiled languages: the module creates a build container, compiles the source, and extracts the binary
   - For interpreted languages: the module prepares the source code for direct execution
4. Execution: For each test case (or a single execution if no tests are provided):
   - A run container is created with the prepared artifact
   - Resource limits (time, memory) are enforced via Docker
   - The program executes with test input provided via stdin
   - Output is captured (stdout, stderr, exit code, duration)
   - The container is destroyed
5. Result Aggregation: Test results are aggregated into a `RunReport`
6. Publishing: The run report is serialized and published to the `script-results` Kafka topic
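As an illustration of how a per-test time limit can be enforced in step 4, here is a minimal Go sketch; `runInContainer` is a hypothetical stand-in for the Docker runtime call, not an actual SCRC function:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runInContainer is a hypothetical stand-in for the Docker runtime call that
// starts a run container, feeds stdin, and waits for the process to exit.
func runInContainer(ctx context.Context, stdin string) (stdout string, exitCode int, err error) {
	select {
	case <-time.After(200 * time.Millisecond): // pretend the program ran for 200ms
		return "5\n", 0, nil
	case <-ctx.Done():
		return "", -1, ctx.Err()
	}
}

// runTest wraps one test-case execution in a deadline when a time limit is set.
func runTest(parent context.Context, stdin string, timeLimit time.Duration) (string, error) {
	ctx := parent
	if timeLimit > 0 {
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(parent, timeLimit)
		defer cancel()
	}
	out, _, err := runInContainer(ctx, stdin)
	if errors.Is(err, context.DeadlineExceeded) {
		return "", fmt.Errorf("time limit exceeded")
	}
	return out, err
}

func main() {
	out, err := runTest(context.Background(), "2 3\n", 5*time.Second)
	fmt.Println(out, err)
}
```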
The executor service uses a semaphore-based concurrency control mechanism:
- Bounded Parallelism: The `RUNNER_MAX_PARALLEL` environment variable controls the maximum number of scripts that can be processed concurrently
- Per-script Concurrency: Each script can have multiple test cases, which are executed sequentially within that script
- Horizontal Scaling: Multiple runner instances can share the same Kafka consumer group, with Kafka distributing partitions across instances
- Graceful Shutdown: The service waits for in-flight executions to complete before shutting down
The concurrency model ensures:
- Resource limits are respected (CPU, memory, Docker containers)
- No single script can monopolize resources
- System can scale horizontally by adding more runner instances
- Kafka partition distribution ensures load balancing
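A minimal sketch of the semaphore idea using a buffered channel (the real executor in `internal/app/executor` may differ in detail):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// process stands in for preparing and running one script.
func process(id int) {
	time.Sleep(100 * time.Millisecond)
	fmt.Println("finished script", id)
}

func main() {
	maxParallel := 4 // in SCRC this would come from RUNNER_MAX_PARALLEL
	sem := make(chan struct{}, maxParallel)
	var wg sync.WaitGroup

	for id := 0; id < 10; id++ {
		sem <- struct{}{} // blocks while maxParallel scripts are already in flight
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			process(id)
		}(id)
	}

	// Graceful shutdown: wait for all in-flight executions to finish.
	wg.Wait()
}
```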
```
cmd/
  scrc/            # Application entry point and configuration wiring
internal/
  app/
    executor/      # Orchestration service pulling scripts and running suites
  domain/
    execution/     # Core domain types (scripts, limits, results, statuses)
  infra/
    kafka/         # Kafka adapters for script consumption and result publishing
  runtime/
    docker/        # Docker-based language modules and container orchestration
    interfaces.go  # Runtime engine interfaces and registry
  ports/           # Hexagonal interfaces exposed to other layers
integration/       # End-to-end tests (requires Docker & Kafka via Testcontainers)
simulator/         # Python-based testing and benchmarking tools
```
- `internal/runtime` hosts the runtime engine abstraction plus the Docker implementation. Language modules live under `docker/lang_*` and are responsible for preparing and running programs in their language.
- `internal/app/executor` coordinates pulling submissions, managing concurrency, and producing aggregated run reports using the runtime engine.
- `internal/infra/kafka` defines the Kafka consumer/publisher and shared message envelopes used to communicate with external systems.
- `cmd/scrc` wires configuration from environment variables, constructs the runtime registry, and starts the executor loop.
SCRC is configured entirely through environment variables. All variables have sensible defaults for local development.
| Variable | Description | Default | Example |
|---|---|---|---|
| `KAFKA_BROKERS` | Comma-separated list of Kafka broker addresses | `kafka:9092` | `localhost:9092,localhost:9093` |
| `KAFKA_TOPIC` | Kafka topic name for consuming scripts | `scripts` | `code-submissions` |
| `KAFKA_GROUP_ID` | Kafka consumer group ID | `scrc-runner` | `runner-group-1` |
| `KAFKA_RESULTS_TOPIC` | Kafka topic name for publishing results | `script-results` | `execution-results` |
| Variable | Description | Default | Example |
|---|---|---|---|
| `RUNNER_MAX_PARALLEL` | Maximum number of scripts to process concurrently | `1` | `8` |
| `RUNNER_TIME_LIMIT` | Default time limit for script execution (duration string) | `0` (no limit) | `5s`, `2m30s` |
| `RUNNER_MEMORY_LIMIT` | Default memory limit in bytes | `0` (no limit) | `134217728` (128MB) |
| `SCRIPT_EXPECTED` | Maximum number of scripts to process before exiting (0 = unlimited) | `0` | `100` |
Each language can be configured with custom Docker images and working directories:
| Variable | Description | Default |
|---|---|---|
| `PYTHON_IMAGE` | Python Docker image for execution | `python:3.12-alpine` |
| `PYTHON_WORKDIR` | Working directory in Python containers | `/tmp` |
| `GO_IMAGE` | Go Docker image for building | `golang:1.22-alpine` |
| `GO_RUN_IMAGE` | Go Docker image for running compiled binaries | `alpine:3.20` |
| `GO_WORKDIR` | Working directory in Go containers | `/tmp` |
| `C_IMAGE` | C Docker image for building | `gcc:14` |
| `C_RUN_IMAGE` | C Docker image for running compiled binaries | `alpine:3.20` |
| `C_WORKDIR` | Working directory in C containers | `/tmp` |
| `CPP_IMAGE` | C++ Docker image for building | `gcc:14` |
| `CPP_RUN_IMAGE` | C++ Docker image for running compiled binaries | `alpine:3.20` |
| `CPP_WORKDIR` | Working directory in C++ containers | `/tmp` |
| `JAVA_IMAGE` | Java Docker image for building | `eclipse-temurin:21-jdk-alpine` |
| `JAVA_RUN_IMAGE` | Java Docker image for running compiled classes | `eclipse-temurin:21-jre-alpine` |
| `JAVA_WORKDIR` | Working directory in Java containers | `/tmp` |
| Variable | Description | Default |
|---|---|---|
| `GO_VERSION` | Go version for building the SCRC image (build arg) | `1.25.3` |
When environment variables are not set, SCRC uses the following defaults:
- Kafka: Connects to `kafka:9092`, consumes from the `scripts` topic, publishes to the `script-results` topic
- Concurrency: Processes 1 script at a time (set `RUNNER_MAX_PARALLEL` for parallelism)
- Resource Limits: No default limits (scripts can specify limits in their message)
- Language Images: Uses standard official images (see table above)
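As a rough sketch of how these variables can be read at startup (illustrative only; the actual wiring lives in `cmd/scrc/config.go`):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// envOrDefault mirrors the helper referenced later in this README.
func envOrDefault(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

func main() {
	maxParallel, _ := strconv.Atoi(envOrDefault("RUNNER_MAX_PARALLEL", "1"))
	timeLimit, _ := time.ParseDuration(envOrDefault("RUNNER_TIME_LIMIT", "0"))
	memLimit, _ := strconv.ParseInt(envOrDefault("RUNNER_MEMORY_LIMIT", "0"), 10, 64)

	fmt.Println("parallel:", maxParallel, "time limit:", timeLimit, "memory limit:", memLimit)
}
```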
Example configurations:

```bash
# High-throughput runners
export RUNNER_MAX_PARALLEL=12
export KAFKA_BROKERS=kafka1:9092,kafka2:9092,kafka3:9092
export KAFKA_GROUP_ID=high-throughput-runners
```

```bash
# Default resource limits
export RUNNER_MAX_PARALLEL=4
export RUNNER_TIME_LIMIT=10s
export RUNNER_MEMORY_LIMIT=268435456  # 256MB
```

```bash
# Custom language images
export PYTHON_IMAGE=python:3.11-slim
export GO_IMAGE=golang:1.21-alpine
export GO_RUN_IMAGE=debian:bookworm-slim
```

```bash
# Local development
export KAFKA_BROKERS=localhost:9092
export RUNNER_MAX_PARALLEL=2
export SCRIPT_EXPECTED=10  # Process 10 scripts then exit
```

Scripts are published to Kafka as JSON messages. The message envelope structure is:
```json
{
  "type": "script",
  "id": "unique-submission-id",
  "language": "python",
  "source": "print('Hello, World!')",
  "limits": {
    "time_limit_ms": 5000,
    "memory_limit_bytes": 134217728
  },
  "tests": [
    {
      "number": 1,
      "input": "test input",
      "expected_output": "expected output"
    }
  ]
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | No | Message type, defaults to `"script"`. Use `"done"` to signal end of stream. |
| `id` | string | No | Unique identifier for the submission. If omitted, uses the Kafka message key or offset. |
| `language` | string | Yes | Programming language: `python`, `go`, `c`, `cpp`, or `java` |
| `source` | string | Yes | Source code of the program to execute |
| `limits` | object | No | Resource limits for execution |
| `limits.time_limit_ms` | integer | No | Maximum execution time in milliseconds (0 = no limit) |
| `limits.memory_limit_bytes` | integer | No | Maximum memory usage in bytes (0 = no limit) |
| `tests` | array | No | Array of test cases to execute |
| `tests[].number` | integer | No | Test case number (auto-assigned if omitted) |
| `tests[].input` | string | No | Input to provide via stdin |
| `tests[].expected_output` | string | No | Expected output for validation |
- `source` must be non-empty
- `language` must be one of the supported languages
- `time_limit_ms` must be non-negative
- `memory_limit_bytes` must be non-negative
- Test case numbers are auto-assigned starting from 1 if not provided
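For illustration, a producer could assemble a valid script message as follows; the struct definitions are a sketch of the JSON envelope above, not the actual types in `internal/infra/kafka`:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Sketch of the script message envelope described above.
type Limits struct {
	TimeLimitMS      int64 `json:"time_limit_ms,omitempty"`
	MemoryLimitBytes int64 `json:"memory_limit_bytes,omitempty"`
}

type TestCase struct {
	Number         int    `json:"number,omitempty"`
	Input          string `json:"input,omitempty"`
	ExpectedOutput string `json:"expected_output,omitempty"`
}

type ScriptMessage struct {
	Type     string     `json:"type,omitempty"`
	ID       string     `json:"id,omitempty"`
	Language string     `json:"language"`
	Source   string     `json:"source"`
	Limits   *Limits    `json:"limits,omitempty"`
	Tests    []TestCase `json:"tests,omitempty"`
}

func main() {
	msg := ScriptMessage{
		ID:       "submission-123",
		Language: "python",
		Source:   "print('Hello, World!')",
		Limits:   &Limits{TimeLimitMS: 5000, MemoryLimitBytes: 134217728},
	}
	payload, _ := json.Marshal(msg)
	fmt.Println(string(payload)) // publish this payload to the scripts topic
}
```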
Results are published to Kafka as JSON messages with the following structure:
```json
{
  "id": "unique-submission-id",
  "status": "OK",
  "exit_code": 0,
  "stdout": "Hello, World!\n",
  "stderr": "",
  "duration_ms": 1234,
  "error": "",
  "tests": [
    {
      "number": 1,
      "status": "OK",
      "exit_code": 0,
      "duration_ms": 1234,
      "stdout": "Hello, World!\n",
      "stderr": "",
      "input": "test input",
      "expected_output": "expected output",
      "error": ""
    }
  ],
  "timestamp": "2025-01-15T10:30:00Z"
}
```

| Field | Type | Description |
|---|---|---|
| `id` | string | Submission ID (matches the script message ID) |
| `status` | string | Overall status: `OK`, `WA`, `TL`, `ML`, `BF`, or `-` |
| `exit_code` | integer (nullable) | Process exit code |
| `stdout` | string | Standard output (aggregated if multiple tests) |
| `stderr` | string | Standard error (aggregated if multiple tests) |
| `duration_ms` | integer (nullable) | Total execution duration in milliseconds |
| `error` | string | Error message if execution failed |
| `tests` | array | Per-test results (present if the script had test cases) |
| `tests[].number` | integer | Test case number |
| `tests[].status` | string | Test case status |
| `tests[].exit_code` | integer (nullable) | Test case exit code |
| `tests[].duration_ms` | integer (nullable) | Test case duration in milliseconds |
| `tests[].stdout` | string | Test case standard output |
| `tests[].stderr` | string | Test case standard error |
| `tests[].input` | string | Test case input |
| `tests[].expected_output` | string | Test case expected output |
| `tests[].error` | string | Test case error message |
| `timestamp` | string (ISO 8601) | Result publication timestamp |
- `OK`: Script executed successfully and produced the expected output (if tests were provided)
- `WA` (Wrong Answer): Script executed but its output didn't match the expected output
- `TL` (Time Limit): Script exceeded the time limit
- `ML` (Memory Limit): Script exceeded the memory limit
- `BF` (Build Failure): Script failed to compile/build
- `-` (Not Run): Test case was not executed (e.g., due to a previous failure)
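A downstream consumer of the results topic might decode and react to these statuses roughly as follows (a sketch; field names follow the JSON above):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Sketch of the result payload; only the fields used below are declared.
type TestResult struct {
	Number int    `json:"number"`
	Status string `json:"status"`
}

type RunReport struct {
	ID     string       `json:"id"`
	Status string       `json:"status"`
	Error  string       `json:"error"`
	Tests  []TestResult `json:"tests"`
}

func handleResult(payload []byte) error {
	var report RunReport
	if err := json.Unmarshal(payload, &report); err != nil {
		return err
	}
	switch report.Status {
	case "OK":
		fmt.Printf("%s accepted\n", report.ID)
	case "WA", "TL", "ML":
		for _, t := range report.Tests {
			if t.Status != "OK" {
				fmt.Printf("%s failed test %d with %s\n", report.ID, t.Number, t.Status)
				break
			}
		}
	case "BF":
		fmt.Printf("%s failed to build: %s\n", report.ID, report.Error)
	}
	return nil
}

func main() {
	_ = handleResult([]byte(`{"id":"submission-456","status":"OK","tests":[{"number":1,"status":"OK"}]}`))
}
```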
A minimal submission:

```json
{
  "language": "python",
  "source": "print('Hello, World!')"
}
```

A submission with resource limits:

```json
{
  "id": "submission-123",
  "language": "go",
  "source": "package main\n\nimport \"fmt\"\n\nfunc main() {\n fmt.Println(\"Hello, World!\")\n}",
  "limits": {
    "time_limit_ms": 2000,
    "memory_limit_bytes": 67108864
  }
}
```

A submission with test cases:

```json
{
  "id": "submission-456",
  "language": "c",
  "source": "#include <stdio.h>\nint main() { int a, b; scanf(\"%d %d\", &a, &b); printf(\"%d\\n\", a + b); return 0; }",
  "tests": [
    {
      "number": 1,
      "input": "2 3\n",
      "expected_output": "5\n"
    },
    {
      "number": 2,
      "input": "10 20\n",
      "expected_output": "30\n"
    }
  ]
}
```

The corresponding result:

```json
{
  "id": "submission-456",
  "status": "OK",
  "exit_code": 0,
  "stdout": "",
  "stderr": "",
  "duration_ms": 145,
  "error": "",
  "tests": [
    {
      "number": 1,
      "status": "OK",
      "exit_code": 0,
      "duration_ms": 45,
      "stdout": "5\n",
      "stderr": "",
      "input": "2 3\n",
      "expected_output": "5\n",
      "error": ""
    },
    {
      "number": 2,
      "status": "OK",
      "exit_code": 0,
      "duration_ms": 50,
      "stdout": "30\n",
      "stderr": "",
      "input": "10 20\n",
      "expected_output": "30\n",
      "error": ""
    }
  ],
  "timestamp": "2025-01-15T10:30:00.123Z"
}
```

SCRC currently supports the following programming languages:
| Language | Type | Build Image | Run Image | Notes |
|---|---|---|---|---|
| Python | Interpreted | N/A | `python:3.12-alpine` | Direct execution, no build step |
| Go | Compiled | `golang:1.22-alpine` | `alpine:3.20` | Static binary, CGO disabled |
| C | Compiled | `gcc:14` | `alpine:3.20` | Statically linked binary |
| C++ | Compiled | `gcc:14` | `alpine:3.20` | Statically linked binary |
| Java | Compiled | `eclipse-temurin:21-jdk-alpine` | `eclipse-temurin:21-jre-alpine` | JVM-based execution |
- Python: Source code is written to a file and executed directly with the Python interpreter
- Go: Source code is compiled in a build container, binary is extracted and executed in a minimal Alpine container
- C/C++: Source code is compiled with static linking (the `-static` flag) to enable execution on minimal Alpine images
- Java: Source code is compiled to `.class` files in a JDK container, then executed in a JRE container
To add support for a new programming language:
1. Create a Language Module: Implement the `runtime.Module` interface in `internal/runtime/docker/lang_<language>.go`

2. Implement the Strategy Pattern: Create a strategy struct that implements the `prepareStrategy` interface:

   ```go
   type prepareStrategy interface {
       Prepare(ctx context.Context, lang *languageRuntime, script execution.Script) (runtimex.PreparedScript, *execution.Result, error)
       Close() error
   }
   ```

3. Implement PreparedScript: Create a struct that implements `runtime.PreparedScript`:

   ```go
   type PreparedScript interface {
       Run(ctx context.Context, stdin string) (*execution.Result, error)
       Close() error
   }
   ```

4. Register the Module: Add the language module to the registry in `internal/runtime/docker/engine.go`:

   ```go
   modules = append(modules, newLanguageModule(cfg, execution.LanguageYourLang))
   ```

5. Add Configuration: Add language configuration to `cmd/scrc/config.go`:

   ```go
   execution.LanguageYourLang: {
       Image:    envOrDefault("YOURLANG_IMAGE", defaultImage),
       RunImage: envOrDefault("YOURLANG_RUN_IMAGE", defaultRunImage),
       Workdir:  envOrDefault("YOURLANG_WORKDIR", containerWorkdir),
   },
   ```

6. Add Language Constant: Add the language constant to `internal/domain/execution/script.go`:

   ```go
   LanguageYourLang Language = "yourlang"
   ```

7. Update Message Validation: Ensure the language string is accepted in `internal/infra/kafka/messages.go`
Here's a simplified example structure for adding Rust:

```go
// internal/runtime/docker/lang_rust.go
package docker

type rustStrategy struct{}

func (r *rustStrategy) Prepare(ctx context.Context, lang *languageRuntime, script execution.Script) (runtimex.PreparedScript, *execution.Result, error) {
	// Build the Rust program in a build container,
	// extract the binary, and return the prepared script.
	return &rustPreparedScript{runtime: lang}, nil, nil
}

func (r *rustStrategy) Close() error { return nil }

type rustPreparedScript struct {
	runtime *languageRuntime
	binary  []byte
}

func (r *rustPreparedScript) Run(ctx context.Context, stdin string) (*execution.Result, error) {
	// Execute the binary in a run container and capture stdout/stderr/exit code.
	return nil, nil
}

func (r *rustPreparedScript) Close() error { return nil }
```

Each language can be customized via environment variables:
```bash
# Python
export PYTHON_IMAGE=python:3.11-slim
export PYTHON_WORKDIR=/app
```

```bash
# Go
export GO_IMAGE=golang:1.21-alpine
export GO_RUN_IMAGE=alpine:3.19
export GO_WORKDIR=/tmp
```

```bash
# C / C++
export C_IMAGE=gcc:13
export C_RUN_IMAGE=debian:bookworm-slim
export CPP_IMAGE=gcc:13
export CPP_RUN_IMAGE=debian:bookworm-slim
```

```bash
# Java
export JAVA_IMAGE=eclipse-temurin:20-jdk-alpine
export JAVA_RUN_IMAGE=eclipse-temurin:20-jre-alpine
export JAVA_WORKDIR=/app
```

Note: For compiled languages (C, C++, Go), binaries are statically linked or built without CGO to enable execution on minimal run images. This significantly reduces container startup time and resource usage.
Run the service directly with Go:
```bash
go run ./cmd/scrc
```

Ensure Kafka is accessible at the configured broker address (default: `kafka:9092`). For local development, you can use Docker Compose to start Kafka:

```bash
docker compose --profile single-runner up kafka zookeeper
```

Then in another terminal:

```bash
export KAFKA_BROKERS=localhost:9092
go run ./cmd/scrc
```

Build and run the Docker image:

```bash
# Build the image
docker build -t scrc .

# Run with Docker socket access
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e KAFKA_BROKERS=your-kafka:9092 \
  -e RUNNER_MAX_PARALLEL=8 \
  scrc
```

Important: The container must have access to the Docker socket (`/var/run/docker.sock`) to create execution containers.

Override the Go version used during build:

```bash
docker build --build-arg GO_VERSION=1.26.0 -t scrc .
```

The project includes several Docker Compose profiles for different scenarios:
Starts Kafka, ZooKeeper, a single runner, and a load generator:
```bash
docker compose --profile single-runner --profile load-generator up --build
```

This setup is ideal for:
- Local development and testing
- Understanding the system workflow
- Testing with a controlled load
The runner processes scripts until stopped. Limit execution with:
```bash
SCRIPT_EXPECTED=100 docker compose --profile single-runner --profile load-generator up --build
```

Scale to multiple runner instances:
```bash
docker compose --profile multi-runner --profile load-generator up --build --scale scrc=3
```

This demonstrates:
- Horizontal scaling across multiple instances
- Kafka partition distribution
- Load balancing via consumer groups
Each runner shares the same consumer group (`scrc-runner`), so Kafka distributes partitions across instances and each submission is processed by a single runner.
Tip: Increase the partition count of the scripts topic for higher parallelism:
```bash
kafka-topics --alter --topic scripts --partitions 6 --bootstrap-server localhost:9092
```

Run multiple runners without the load generator:

```bash
docker compose --profile multi-runner up --build --scale scrc=3
```

Point your own producer at Kafka to drive workloads.
- Kafka cluster with appropriate replication and partition configuration
- Docker daemon accessible to runner instances
- Monitoring and logging infrastructure
- Resource limits configured appropriately
1. Kafka Configuration:
   - Configure appropriate replication factors for production
   - Set partition counts based on expected parallelism
   - Configure retention policies for the scripts and results topics
   - Enable compression for message efficiency

2. Scaling Strategy:
   - Start with `RUNNER_MAX_PARALLEL` set to 2-4x CPU cores per instance
   - Scale horizontally by adding more runner instances
   - Monitor CPU, memory, and Docker container usage
   - Adjust parallelism based on resource utilization

3. Resource Limits:
   - Set default `RUNNER_TIME_LIMIT` and `RUNNER_MEMORY_LIMIT` to prevent resource exhaustion
   - Consider per-language limits if needed
   - Monitor for scripts that consistently hit limits

4. Monitoring:
   - Monitor Kafka consumer lag
   - Track execution success/failure rates
   - Monitor Docker container creation/destruction rates
   - Alert on resource exhaustion or error spikes

5. Security:
   - Run runners with non-root users where possible
   - Use Docker security options (read-only filesystems, no-new-privileges)
   - Isolate runner network access
   - Implement rate limiting at the Kafka level

6. High Availability:
   - Deploy multiple runner instances across availability zones
   - Use Kafka consumer groups for automatic failover
   - Implement health checks and automatic restart
```bash
# High-throughput production setup
export KAFKA_BROKERS=kafka1:9092,kafka2:9092,kafka3:9092
export KAFKA_GROUP_ID=production-runners
export RUNNER_MAX_PARALLEL=8
export RUNNER_TIME_LIMIT=30s
export RUNNER_MEMORY_LIMIT=536870912  # 512MB
```

- CPU: 2-4 cores per runner instance (depending on parallelism)
- Memory: 512MB-2GB per runner instance (plus Docker overhead)
- Disk: Minimal (ephemeral containers), but ensure Docker has sufficient space
- Network: Low latency connection to Kafka cluster
The simulator is a Python-based tool that generates test submissions and validates results. It's useful for:
- Testing the runner service
- Generating load for performance testing
- Validating end-to-end workflows
- Development and debugging
The simulator consists of:
- Producer: Publishes script submissions to Kafka
- Consumer: Consumes results from Kafka and validates them
- Script Library: Pre-written scripts in `simulator/scripts/<language>/<outcome>` that produce specific outcomes (OK, WA, TL, ML, BF)
Run the simulator standalone:
```bash
python -m simulator.main
```

Or use Docker Compose:

```bash
docker compose --profile single-runner --profile load-generator up --build
```

The simulator will:
- Load scripts from `simulator/scripts/<language>/<outcome>`
- Publish each language/outcome combination for the configured number of iterations
- Consume results from the results topic
- Compare produced vs consumed counts
- Exit with non-zero status if discrepancies are detected
| Variable | Description | Default |
|---|---|---|
| `SCRIPT_ITERATIONS` | Number of iterations to emit (≤0 for infinite) | `1` |
| `SCRIPT_INTERVAL_SECONDS` | Delay between iteration batches | `1.0` |
| `RESULT_WAIT_TIMEOUT_SECONDS` | Timeout for waiting for results | `60` |
| `RESULT_CONSUMER_LOG_PATH` | Path to consumer log file | `consumer.log` |
| `KAFKA_BROKER` | Kafka broker address | `localhost:9092` |
| `KAFKA_TOPIC` | Scripts topic name | `scripts` |
| `KAFKA_RESULTS_TOPIC` | Results topic name | `script-results` |
| `KAFKA_CONSUMER_GROUP` | Consumer group ID | `scrc-simulator` |
| `CONSUMER_POLL_TIMEOUT_MS` | Kafka poll timeout in milliseconds | `1000` |
```bash
export SCRIPT_ITERATIONS=10
export SCRIPT_INTERVAL_SECONDS=0.5
export RESULT_WAIT_TIMEOUT_SECONDS=120
python -m simulator.main
```

The benchmark suite provides comprehensive performance analysis of the SCRC system. It measures:
- Throughput (submissions per minute)
- Latency (execution time by language and outcome)
- Scaling characteristics (horizontal and vertical)
- Impact of different workload profiles
The benchmark suite employs a rigorous methodology designed to minimize noise and produce statistically reliable results.
Each benchmark case executes 3 times by default (num_runs=3):
- Results from all runs are aggregated to reduce variance
- Throughput metrics are averaged across runs
- Individual run data is preserved in `{case}__run-{N}.csv` files
- Aggregated results are stored in `{case}__aggregated.csv` for analysis
A 60-second warmup period precedes each measurement window:
- Allows the system to reach steady state:
- Docker image layers are cached
- JVM warmup completes (for Java)
- Container pool stabilizes
- System resources reach equilibrium
- Data collected during warmup is excluded from results
Active measurement occurs for 240-300 seconds (depending on benchmark type):
- Language latency benchmarks use 300s for larger sample sizes
- Throughput benchmarks use 240s for balance between accuracy and time
- Only submissions that start during the measurement window are included
- Submissions that complete during cooldown are still counted if they started during measurement
A 60-second cooldown period follows each measurement window:
- Allows in-flight submissions to complete naturally
- Prevents premature termination from skewing results
- Data from submissions that start during cooldown is excluded
The collector tracks precise timestamps for the measurement window:
- Results are filtered to exclude:
  - Submissions sent during warmup (`sent_ts < benchmark_start_ts`)
  - Submissions sent during cooldown (`sent_ts >= benchmark_end_ts`)
- This ensures only steady-state measurements are included in the analysis
- Warmup: 60 seconds (allows system stabilization)
- Duration: 240-300 seconds (provides sufficient sample size)
- Cooldown: 60 seconds (allows in-flight work to complete)
- Runs per case: 3 (reduces variance through aggregation)
- Total time per case: ~6.5-7 minutes (including overhead)
- Full suite: ~7-7.5 hours (23 cases × 3 runs)
- Multiple runs reduce the impact of transient system conditions
- Longer durations provide larger sample sizes for more accurate statistics
- Filtering eliminates startup/shutdown artifacts
- Aggregated results represent true steady-state performance
- Individual run CSVs: `{matrix}__{case}__run-{N}.csv` — raw data per run
- Aggregated CSV: `{matrix}__{case}__aggregated.csv` — combined data from all runs
- Charts: Generated from aggregated data for visualization
- Manifest: `benchmark_manifest.json` — summary of all generated artifacts
This strategy ensures benchmarks produce consistent, reliable results that accurately reflect system performance under steady-state conditions.
1. Build the runner image (needed so the harness can launch on-demand runners):

   ```bash
   docker compose build scrc
   ```

2. Launch the benchmark profile (this starts Kafka/ZooKeeper plus the harness container; it terminates automatically once all matrices finish):

   ```bash
   docker compose --profile benchmark up --build scrc-benchmark
   ```

3. Inspect the outputs in `logs/benchmarks/`:
   - `single_runner_throughput.png` — throughput vs `RUNNER_MAX_PARALLEL`
   - `scaling_heatmap.png` — throughput heatmap across runner replicas × parallelism
   - `language_latency_heatmap.png` — median latency summary by language & outcome
   - `language_latency_boxplot.png` — detailed latency distributions per language/outcome
   - `timeout_impact.png` — effect of timeout-heavy submissions on throughput
   - `language_throughput.png` — share of throughput per language/outcome
   - `benchmark_manifest.json` — manifest summarizing generated artifacts
   - CSV snapshots for each benchmark case
Use the environment variables documented in `simulator/benchmarks/main.py` to customize broker endpoints, output paths, or override the benchmark plan.
The benchmark suite generates several charts that provide insights into system performance, scalability, and language-specific characteristics:
Single Runner Throughput (single_runner_throughput.png)
A line chart showing throughput (submissions per minute) as RUNNER_MAX_PARALLEL increases from 1 to 16 for a single runner instance under medium load. Throughput increases from 140 submissions/min at parallelism 1 to a peak of 228 submissions/min at parallelism 12, representing a 63% improvement. The optimal parallelism range is 8-12, achieving 225-228 submissions/min. Performance degrades at parallelism 16 (214 submissions/min), demonstrating resource contention when parallelism exceeds available CPU cores. The gradual increase from parallelism 1-8 shows effective utilization of available resources, while the plateau and decline at higher parallelism indicates the system has reached its capacity limits for a single runner instance.
Scaling Heatmap (scaling_heatmap.png)
A heatmap visualizing throughput across combinations of runner replicas (1-3) and parallelism values (4, 6, 8, 12). Darker colors indicate higher throughput. The best performance is achieved with 3 replicas × 6 parallelism at 277.5 submissions/min. Increasing to 3 replicas × 8 parallelism yields 264.25 submissions/min (slight decrease), while 3 replicas × 12 parallelism drops to 246.5 submissions/min, demonstrating resource contention when total parallelism (36 tasks) significantly exceeds the 8-core CPU capacity. Horizontal scaling provides substantial benefits: 2 replicas achieve 1.3-1.4x improvement over single replica configurations, and 3 replicas achieve 1.4-1.5x improvement. However, high parallelism (12) causes contention even with multiple replicas, showing that vertical scaling has limits on resource-constrained systems. The sweet spot is 3 replicas with moderate parallelism (6-8), balancing resource utilization without contention.
Language Latency Heatmap (language_latency_heatmap.png)
A color-coded grid showing median latency (in seconds) for each language-outcome combination, with darker reds indicating higher latencies. C and C++ demonstrate the fastest performance across all outcomes, with median latencies of 0.8-1.5 seconds due to compiled native code execution. Go shows moderate latency of 1.5-2.5 seconds, including compilation overhead from the build step. Python exhibits higher latency of 2.5-3.5 seconds due to interpretation overhead. Java shows the highest latency at 2.5-4.5 seconds, primarily due to JVM startup and warmup overhead. Time limit (TL) outcomes consistently show the highest latency across all languages (3-5 seconds), as these represent submissions that run until timeout. Memory limit (ML) and build failure (BF) outcomes show the lowest latency (typically under 1 second) as they are detected quickly. The heatmap clearly shows that compiled languages (C/C++) outperform interpreted languages (Python) and JVM-based languages (Java) for all outcome types.
Language Latency Distribution (language_latency_boxplot.png)
Box plots showing the full latency distribution for each language-outcome combination. Each box represents the interquartile range (25th to 75th percentile) with a median line, and whiskers extend to show the data range. C and C++ show tight, compact distributions with low variance, indicating highly consistent and predictable performance across submissions. Go exhibits moderate variance with relatively consistent performance for most outcomes, though time limit outcomes show wider distributions. Python displays higher variance with wider interquartile ranges, showing more variable execution times typical of interpreted languages. Java shows the highest variance and widest distributions, reflecting the variable nature of JVM performance including garbage collection pauses and JIT compilation effects. Time limit (TL) outcomes consistently show the widest distributions across all languages, as these submissions run for varying durations before hitting the timeout. Non-TL outcomes (OK, WA, ML, BF) have tighter distributions, indicating more consistent execution patterns. The box plots reveal that while compiled languages achieve lower absolute latencies, they also provide more predictable performance with less variance.
Timeout Impact (timeout_impact.png)
A bar chart comparing throughput between a baseline workload (balanced outcome distribution) and a timeout-heavy workload (TL submissions weighted 2.5x higher). The baseline workload achieves 220.75 submissions/min, while the timeout-heavy workload achieves 173.0 submissions/min, representing a 22% reduction in throughput. This demonstrates that timeout-heavy submissions significantly impact overall system throughput, as long-running submissions block resources and reduce the system's capacity to process new submissions. The system handles timeout detection and cleanup efficiently, but the extended execution time of timeout submissions (running until the time limit is reached) consumes resources that could otherwise process multiple shorter submissions. The 22% reduction shows that while the system remains functional under timeout-heavy loads, performance degrades meaningfully, indicating that timeout detection and resource management are working correctly but cannot fully mitigate the impact of long-running submissions on overall throughput.
Language Throughput Share (language_throughput.png)
A stacked bar chart showing the percentage distribution of submissions across languages and outcomes under heavy load, with each bar totaling 100% of submissions. The total system throughput is approximately 347 submissions/min. The distribution reflects the load profile weights: Python (1.2x weight) and Java (1.1x weight) dominate the workload, while Go (1.0x) and C/C++ (0.9x) have lower representation. Each language bar shows colored segments representing the proportion of each outcome type (OK, WA, TL, ML, BF). The outcome distribution shows a realistic mix across all languages, with successful outcomes (OK) typically representing the largest segment, followed by wrong answers (WA), time limits (TL), memory limits (ML), and build failures (BF). The chart reveals that Python and Java submissions comprise the majority of the workload due to their higher weights in the heavy load profile, while compiled languages (C, C++, Go) represent a smaller but still significant portion. The outcome distribution is consistent across languages, indicating that the system handles all languages and outcome types uniformly under load.
1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd scrc
   ```

2. Install dependencies:

   ```bash
   go mod download
   ```

3. Start local infrastructure (Kafka, ZooKeeper):

   ```bash
   docker compose --profile single-runner up kafka zookeeper
   ```

4. Run the service:

   ```bash
   export KAFKA_BROKERS=localhost:9092
   go run ./cmd/scrc
   ```
The codebase follows Go best practices and hexagonal architecture:
- `cmd/scrc`: Application entry point, configuration loading, dependency injection
- `internal/domain/execution`: Core domain models (Script, Result, Status, Limits, TestCase)
- `internal/app/executor`: Application orchestration logic
- `internal/infra/kafka`: Kafka infrastructure adapters
- `internal/runtime`: Runtime engine abstraction and Docker implementation
- `internal/ports`: Interface definitions (ports in hexagonal architecture)
- `integration`: End-to-end integration tests
- Hexagonal Architecture: Domain logic is isolated from infrastructure
- Strategy Pattern: Language modules implement a common interface
- Registry Pattern: Language modules are registered in a central registry
- Adapter Pattern: Kafka adapters bridge external systems to domain interfaces
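A loose sketch of the registry pattern as applied to language modules (the type names here are hypothetical, not the actual SCRC definitions):

```go
package main

import "fmt"

// Module is a hypothetical stand-in for the per-language runtime module interface.
type Module interface {
	Language() string
}

// Registry maps a language name to its module, mirroring how the Docker
// runtime selects a module for each incoming script.
type Registry struct {
	modules map[string]Module
}

func NewRegistry(mods ...Module) *Registry {
	r := &Registry{modules: make(map[string]Module)}
	for _, m := range mods {
		r.modules[m.Language()] = m
	}
	return r
}

func (r *Registry) Lookup(language string) (Module, error) {
	m, ok := r.modules[language]
	if !ok {
		return nil, fmt.Errorf("unsupported language: %s", language)
	}
	return m, nil
}

type pythonModule struct{}

func (pythonModule) Language() string { return "python" }

func main() {
	reg := NewRegistry(pythonModule{})
	m, err := reg.Lookup("python")
	fmt.Println(m, err)
}
```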
Run the fast unit test suite:
```bash
go test ./...
```

Run tests for a specific package:

```bash
go test ./internal/runtime/docker
```

Integration tests exercise the full stack with Docker and Kafka:

```bash
go test -tags=integration ./...
```

Integration tests use Testcontainers to spin up ephemeral Kafka and Docker environments. They require:
- Docker daemon accessible
- Network access for pulling container images
Generate coverage report:
```bash
go test -cover ./...
```

Generate an HTML coverage report:

```bash
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out
```
1. Fork the repository and create a feature branch

2. Write tests for new functionality

3. Follow Go conventions: Use `gofmt`, follow the standard project structure

4. Update documentation for any new features or configuration options

5. Run all tests before submitting:

   ```bash
   go test ./...
   go test -tags=integration ./...
   ```

6. Submit a pull request with a clear description of changes
- Use `gofmt` for formatting
- Follow Go naming conventions
- Add comments for exported functions and types
- Keep functions focused and small
- Prefer composition over inheritance
When adding new features:
- Update relevant documentation sections
- Add examples if applicable
- Update configuration reference if new environment variables are added
- Add integration tests for critical paths
Symptoms: Service fails to start with connection errors
Solutions:
- Verify Kafka is running and accessible: `docker ps | grep kafka`
- Check that the `KAFKA_BROKERS` environment variable is correct
- Ensure network connectivity to the Kafka brokers
- Check Kafka logs for errors: `docker logs kafka`
Symptoms: `permission denied while trying to connect to the Docker daemon socket`
Solutions:
- Ensure your user has access to the Docker socket: `sudo usermod -aG docker $USER` (then log out/in)
- When running in Docker, ensure the socket is mounted: `-v /var/run/docker.sock:/var/run/docker.sock`
- Check that the Docker daemon is running: `docker ps`
Symptoms: Scripts published to Kafka but no results appear
Solutions:
- Check runner logs for errors
- Verify consumer group is correct
- Check Kafka consumer lag: `kafka-consumer-groups --bootstrap-server localhost:9092 --group scrc-runner --describe`
- Ensure the runner has sufficient resources (CPU, memory)
- Verify `RUNNER_MAX_PARALLEL` is set appropriately
Symptoms: System running out of memory
Solutions:
- Reduce `RUNNER_MAX_PARALLEL` to limit concurrent executions
- Set `RUNNER_MEMORY_LIMIT` to cap per-script memory
- Monitor the Docker container count: `docker ps -a | wc -l`
- Consider horizontal scaling instead of increasing parallelism
Symptoms: Scripts taking longer than expected
Solutions:
- Check Docker image pull times (first run pulls images)
- Verify system resources (CPU, memory, disk I/O)
- Check for Docker daemon issues: `docker info`
- Review language-specific optimizations (e.g., JVM warmup for Java)
The service uses Go's standard log package. To enable more verbose logging, you may need to modify the code or use a logging framework.
Inspect messages in Kafka topics:
```bash
# Consume from scripts topic
kafka-console-consumer --bootstrap-server localhost:9092 --topic scripts --from-beginning

# Consume from results topic
kafka-console-consumer --bootstrap-server localhost:9092 --topic script-results --from-beginning
```

Watch container creation and destruction:

```bash
# Watch container events
docker events

# List running containers
docker ps

# List all containers (including stopped)
docker ps -a

# Inspect a specific container
docker inspect <container-id>
```

Monitor system resources:

```bash
# CPU and memory usage
top
# or
htop

# Docker stats
docker stats

# Disk usage
df -h
docker system df
```

- Start with `RUNNER_MAX_PARALLEL` = 2 × CPU cores
- Increase gradually and measure throughput
- Watch for diminishing returns or resource contention
- Increase partition count for higher parallelism
- Tune Kafka consumer settings (batch size, fetch size)
- Enable compression for message efficiency
- Configure appropriate retention policies
- Use minimal base images (Alpine Linux)
- Pre-pull frequently used images
- Enable Docker build cache
- Monitor and clean up unused images: `docker image prune`
- Python: Use slim images, consider PyPy for performance-critical workloads
- Go: Pre-compile common dependencies if possible
- Java: Tune JVM options for faster startup (consider GraalVM for native images)
- C/C++: Use optimized compiler flags in production
- Add more runner instances rather than increasing parallelism per instance
- Ensure Kafka has sufficient partitions (at least one per runner instance)
- Use load balancers or service discovery for Kafka broker addresses
- Monitor consumer group rebalancing
This project is licensed under the MIT License. See the LICENSE file for details.