Bridge Four is a simple, functional, effectful, single-leader, multi-worker distributed compute system optimized for embarrassingly parallel workloads.
The name is inspired by Brandon Sanderson's "The Stormlight Archive".
It is explained in detail on my blog, where I'm building out this system as a learning exercise instead of "grinding Leetcode":
This is a work in progress, missing key features. It is, however, surprisingly [resilient](#chaos-monkey).
- Single leader, multi-worker
- Eventually consistent
- Redundant workers w/ rebalancing
- Leaders are redundant and run a Raft election; leaders do not replicate state yet but forward requests to the active leader
- "Self"-healing (workers can be killed and re-started during execution, the leader will recover)
- A job reads files, performs computations, and writes output files
- Each job executes N tasks (N > 0)
- Each task executes on a worker (aka a `spren`)
- Each worker has a set number of slots, each of which can execute exactly 1 task in parallel (see the sketch below)
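As a rough sketch of that model (the names and fields below are illustrative, not the actual codebase; the slot shape loosely mirrors the `/cluster` sample output further down):

```scala
// Illustrative sketch only: hypothetical types for the job/task/slot model above.
final case class JobId(id: Int)
final case class TaskId(id: Int, jobId: Int)
final case class SlotId(id: Int, workerId: Int)

// A job reads input files, fans out into N tasks, and writes output files.
final case class Job(id: JobId, inputFiles: List[String], outputDir: String, tasks: List[TaskId])

// Each worker (spren) exposes a fixed number of slots;
// a slot runs at most one task at a time.
final case class SlotState(id: SlotId, available: Boolean, taskId: Option[TaskId])
```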
General purpose diagram for "starting a job":
- `POST` to `/start` to submit a job
- Leader accepts or rejects the job, adds it to the cache
- If free slots are available on the cluster, the leader assigns the job; if the cluster's capacity is less than the processing capacity the job requires, it does so piecemeal
- The user can `GET /status/$jobId` or `GET /data/$jobId`, which will either return an `ExecutionStatus` or the result of the job
The user-facing interface is therefore eventually consistent.
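Because of that, a client is expected to poll. A minimal sketch of such a loop, assuming a leader at `http://localhost:6550` (as in the docker-compose example below) and the `Done` status marker shown in the sample output; the object and method names here are made up:

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Hypothetical client-side polling loop; only the endpoint paths come from the description above.
object PollJob:
  private val leader = "http://localhost:6550"
  private val client = HttpClient.newHttpClient()

  private def get(path: String): String =
    val req = HttpRequest.newBuilder(URI.create(s"$leader$path")).GET().build()
    client.send(req, HttpResponse.BodyHandlers.ofString()).body()

  def pollUntilDone(jobId: Int): String =
    // Ask for the ExecutionStatus until the leader reports the job as done...
    while !get(s"/status/$jobId").contains("Done") do Thread.sleep(5000)
    // ...then fetch the job's result.
    get(s"/data/$jobId")
```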
(From the blog)
Instead of the leader assigning lines of input to mappers that emit individual key-value pairs that get moved deterministically (shuffled) and then reduced to an output, our system is more naive:
It reads pre-split files (assume e.g. a Spark output with properly sized files), which can be read in parallel and written to independent outputs, with the leader doing the final aggregation at the end, akin to e.g. a Spark collect() call that sends data to the Driver; the idea being that each worker's output is a fraction of the original input file. Please see the "Breaking it" section below for a caveat on that.
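For a word count job like the one used below, that final aggregation is just merging each worker's partial counts; a rough sketch (the function name is made up):

```scala
// Illustrative leader-side "collect": each worker writes an independent partial
// word count, and the leader folds them into a single result.
def mergeCounts(partials: List[Map[String, Long]]): Map[String, Long] =
  partials.foldLeft(Map.empty[String, Long]) { (acc, partial) =>
    partial.foldLeft(acc) { case (merged, (word, count)) =>
      merged.updated(word, merged.getOrElse(word, 0L) + count)
    }
  }

// mergeCounts(List(Map("the" -> 2L), Map("the" -> 3L, "and" -> 1L)))
//   == Map("the" -> 5L, "and" -> 1L)
```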
That, of course, is not realistic - outside of embarrassingly parallel problems, most systems would run into a massive bottleneck at the leader's end. [1]
However, consider a more computationally intense problem, such as distributed.net - brute forcing ciphers. With minimal adjustments - namely, a mechanism for a worker to report definitive success, which would cause the leader to ignore other workers' output - our system here could feasibly model this, assuming we can pre-generate a set of seed files:
For a naive brute force, task A tries all combinations starting with the letter "a", task B tries all starting with "b" and so on - the input file can be a single letter in a txt file. "Naive" is definitely the word of the hour here.
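A sketch of what such a task body might look like, assuming the one-letter-per-seed-file convention from above (all names here are hypothetical):

```scala
import scala.io.Source

// Hypothetical brute-force task: the seed file contains a single starting letter;
// the task enumerates every candidate up to maxLen beginning with it.
def bruteForce(seedFile: String, maxLen: Int, isMatch: String => Boolean): Option[String] =
  val src = Source.fromFile(seedFile)
  val prefix = try src.mkString.trim finally src.close()
  val alphabet = 'a' to 'z'

  def candidates(current: String): Iterator[String] =
    if current.length >= maxLen then Iterator(current)
    else Iterator(current) ++ alphabet.iterator.flatMap(c => candidates(current + c))

  // A worker that finds a match would report definitive success to the leader,
  // which could then discard the other workers' output.
  candidates(prefix).find(isMatch)
```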
Lastly, please check the last section for a retrofit of Partitioning into Bridge Four, making this look more similar to classic Map/Reduce.
- Partitioning: File assignment is greedy and not optimal
- Worker Stages: Jobs start and complete; the leader only checks their state, not intermediate stages (i.e., we can't build a shuffle stage like Map/Reduce right now)
- Global leader locks: The `BackgroundWorker` is concurrency-safe, but you can start two jobs that work on the same data, causing races; the leader's job controller uses a simple `Mutex[F]` to compensate (see the sketch after this list)
- Atomic operations / 2 Phase Commits
- I've been very heavy-handed with `Sync[F].blocking`, which often isn't the correct effect
- File System Abstraction: This assumes a UNIX-like file system plus something like NFS to be available, which isn't ideal and has its own locking problems
- Raft Log Replication: Currently, this system implements Raft only to elect a leader, not to replicate state
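To illustrate the global leader lock mentioned above, here is a hedged sketch of serializing job submissions behind a `cats.effect.std.Mutex`; `JobController` and `submit` are made-up names, not the actual leader code:

```scala
import cats.effect.{IO, IOApp}
import cats.effect.std.Mutex

// Sketch: serialize job submissions so two jobs can't race on the same data.
final class JobController(mutex: Mutex[IO]):
  def submit(jobId: Int): IO[Unit] =
    // Critical section: only one submission is processed at a time.
    mutex.lock.surround(IO.println(s"assigning tasks for job $jobId"))

object Demo extends IOApp.Simple:
  def run: IO[Unit] =
    Mutex[IO].flatMap { m =>
      val ctrl = JobController(m)
      // Two concurrent submissions, serialized by the mutex.
      ctrl.submit(1).both(ctrl.submit(2)).void
    }
```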
```bash
sbt docker:publishLocal
docker-compose up
```

And run `sbin/wordcount_docker.sh` to run a sample job.
In separate terminals (or computers, provided you have an NFS share mounted - see above), run:
```bash
BRIDGEFOUR_PORT=5553 WORKER_ID=0 sbt worker/run
BRIDGEFOUR_PORT=5554 WORKER_ID=1 sbt worker/run
WORKER1_PORT=5554 WORKER2_PORT=5553 sbt leader/run
```

You'll need a job. I suggest using wordcount.
```bash
git clone https://github.com/chollinger93/bridgefour-example-jobs.git
cd bridgefour-example-jobs
sbt assembly
export JAR_PATH=`pwd`/target/scala-3.6.4/wordcount-assembly-0.1.0-SNAPSHOT.jar
```

`sbin/wordcount_docker.sh` will showcase an execution with more tasks than workers:
- Download "War and Peace" by Leo Tolstoy
- Run a word count job on it (roughly sketched below), with an artificial delay of 30s per task and 5 tasks on 4 workers
- Print the output
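The actual word count job lives in the example-jobs repo linked above; purely to illustrate the per-task side, here is a rough sketch of counting words in one pre-split input and writing an independent partial output (names and output format are made up):

```scala
import scala.io.Source
import java.io.PrintWriter

// Hypothetical per-task word count over a single split; the leader would later
// merge these partial outputs (see mergeCounts above).
def countWords(inputFile: String, outputFile: String): Unit =
  val src = Source.fromFile(inputFile)
  val counts =
    try src.mkString.toLowerCase.split("\\W+").groupMapReduce(identity)(_ => 1L)(_ + _)
    finally src.close()

  val out = new PrintWriter(outputFile)
  try counts.foreach { case (word, n) => out.println(s"$word\t$n") }
  finally out.close()
```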
Sample output:

```
# ...
{"id":1,"slots":[{"id":{"id":0,"workerId":1},"available":true,"status":{"type":"Done"},"taskId":{"id":1200393588,"jobId":-1368283400}},{"id":{"id":1,"workerId":1},"available":true,"status":{"type":"Done"},"taskId":{"id":1049728891,"jobId":-1368283400}}],"allSlots":[0,1],"availableSlots":[0,1],"runningTasks":[]}
Sleeping for 10 seconds
{"type":"Done"}
Job done
Checking results
{
"the": 31714,
"and": 20560,
"": 16663,
"to": 16324,sbin/wordcount_chaos_monkey.sh will gradually murder your cluster and then recover by restarting a single worker, finishing the job in a degraded cluster state.
Run `curl --location 'http://localhost:6550/cluster'` to see the cluster state.
The docker-compose file starts with 3 workers (spren), but only 2 are active. This is to demonstrate dynamic scaling.
You can scale this up and down by starting more Docker containers (or bare metal instances) and calling:
```bash
curl --location 'http://localhost:6550/cluster/addWorker' \
--header 'Content-Type: application/json' \
--data '{
  "id": 3,
  "schema": "http",
  "host": "spren-03",
  "port": 6553
}'
```