A small distributed system in Elixir that assigns unique numbers to each resource being managed. These ids are 64-bit, globally unique numbers given out at most once.
The service is composed of a set of nodes, each running one process serving ids.
A caller connects to one of the nodes (typically via a load balancer like HAProxy) and asks it for a globally unique id.
There is a fixed number of nodes in the system, up to 1024. Each node has a numeric id, 0 <= id <= 1023, which is stored in a dets file locally on the host to simulate multiple nodes.
Each node knows its id at startup, and that id never changes unless it is explicitly removed by deleting the priv/persist directory.
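As a rough illustration, the node id could be persisted and read back with dets along the lines of the sketch below; the module, table, and file names here are assumptions, not the project's actual code.

```elixir
defmodule NodeIdSketch do
  # Hypothetical sketch: persist the node id in a dets file under priv/persist
  # so the same id is read back on every startup. Names are illustrative only.
  @persist_dir "priv/persist"

  def fetch_or_assign(proposed_id) when proposed_id in 0..1023 do
    File.mkdir_p!(@persist_dir)
    file = @persist_dir |> Path.join("node_id.dets") |> String.to_charlist()
    {:ok, table} = :dets.open_file(:node_id_table, file: file)

    id =
      case :dets.lookup(table, :node_id) do
        [{:node_id, existing}] ->
          existing                                        # id already persisted
        [] ->
          :ok = :dets.insert(table, {:node_id, proposed_id})
          proposed_id                                     # first boot: persist it
      end

    :ok = :dets.close(table)
    id
  end
end
```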
We assume that any node will not receive more than 100,000 requests per second.
NOTE: The highest requests/sec I was able to see using wrk and wrk2, with Logger essentially turned off and max_keepalive connections set to 5M, was around ~26k req/sec or ~800k req/30 secs. I have added explicit coordination in the form of a peer lookup, triggered only via the /dunique path, both for load-distribution purposes and to illustrate node and cluster resiliency. The unique id generation itself does not rely on any distributed coordination other than a node_id read upon startup :)
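For context, the /dunique peer lookup could be approximated along these lines; the module and function names below (including Id.generate_id/0) are placeholders and not the project's actual API.

```elixir
defmodule PeerLookupSketch do
  # Hypothetical sketch of the /dunique behaviour: pick a random connected
  # peer and ask it for an id over :rpc, falling back to local generation
  # when there are no peers or the peer is unreachable.
  def random_lookup(local_generate) when is_function(local_generate, 0) do
    case Node.list() do
      [] ->
        local_generate.()

      peers ->
        peer = Enum.random(peers)

        # Id.generate_id/0 is an assumed remote entry point; the real Id
        # module in this project may expose a different function.
        case :rpc.call(peer, Id, :generate_id, [], 1_000) do
          {:badrpc, _reason} -> local_generate.()  # peer down or slow
          id -> id
        end
    end
  end
end
```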
asdf install erlang 22.0.7
asdf global erlang 22.0.7
asdf install elixir 1.9.1
asdf global elixir 1.9.1

Here's an example run:
- Local node lookups only
$ wrk -t6 -c30 -d30s --latency http://127.0.0.1:5454/lunique
Running 30s test @ http://127.0.0.1:5454/lunique
6 threads and 30 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.22ms 1.01ms 41.49ms 96.55%
Req/Sec 4.40k 1.42k 12.06k 73.68%
Latency Distribution
50% 1.12ms
75% 1.42ms
90% 1.72ms
99% 5.07ms
788618 requests in 30.10s, 153.43MB read
Requests/sec: 26200.53
Transfer/sec: 5.10MB
- Local and peer node lookups
$ wrk -t6 -c30 -d30s --latency http://127.0.0.1:5454/dunique?target=random
Running 30s test @ http://127.0.0.1:5454/dunique?target=random
6 threads and 30 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 3.29ms 2.71ms 40.79ms 81.27%
Req/Sec 1.72k 526.28 3.07k 71.11%
Latency Distribution
50% 2.60ms
75% 4.35ms
90% 6.63ms
99% 13.15ms
307555 requests in 30.04s, 59.83MB read
Requests/sec: 10236.57
Transfer/sec: 1.99MB
- Using a lua script which utilizes each of the worker nodes on a single machine
$ wrk -t6 -c30 -d30s -s run.lua --latency http://127.0.0.1:5454
Running 30s test @ http://127.0.0.1:5454
6 threads and 30 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 2.80ms 4.64ms 102.79ms 90.16%
Req/Sec 3.39k 1.31k 9.42k 77.35%
Latency Distribution
50% 0.99ms
75% 3.01ms
90% 7.35ms
99% 21.42ms
607150 requests in 30.08s, 118.12MB read
Requests/sec: 20187.59
Transfer/sec: 3.93MB
NOTE: For the wrk tests, open 5 terminals: 1 for the wrk command and one for each of the 4 nodes
$ PORT=5454 elixir --name lookup_node1@127.0.0.1 -S mix run --no-halt
$ PORT=5455 elixir --name lookup_node2@127.0.0.1 -S mix run --no-halt
$ PORT=5456 elixir --name lookup_node3@127.0.0.1 -S mix run --no-halt
$ PORT=5457 elixir --name lookup_node4@127.0.0.1 -S mix run --no-halt

A) Global uniqueness
The ids are guaranteed globally unique assuming the source of timestamps has millisecond or sub-millisecond precision, since we prevent id collisions across the cluster by appending node_ids (or worker_ids). For id generation on the same node, we monotonically increment an atomic counter. Should nodes be deployed to multiple data centers, they should either have different node ids or we should add some data-center bits to the id generation. This is discussed more in the Id module. (This all operates within the custom epoch time.)
Property testing can also be used to further verify uniqueness
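For illustration, a Snowflake-style composition of the 64-bit id might look like the sketch below. The bit widths, the custom epoch value, and the names are assumptions; the authoritative layout lives in the Id module.

```elixir
defmodule IdSketch do
  # Assumed layout: 41 bits of milliseconds since a custom epoch,
  # 10 bits of node_id (0..1023), 12 bits of per-millisecond sequence.
  import Bitwise

  @custom_epoch 1_546_300_800_000  # assumed custom epoch: 2019-01-01T00:00:00Z in ms
  @node_bits 10
  @seq_bits 12

  # `seq` would come from the per-node atomic counter in the real system.
  def generate(node_id, seq) when node_id in 0..1023 do
    millis = System.system_time(:millisecond) - @custom_epoch

    time_part = millis <<< (@node_bits + @seq_bits)
    node_part = node_id <<< @seq_bits
    seq_part = band(seq, (1 <<< @seq_bits) - 1)

    time_part |> bor(node_part) |> bor(seq_part)
  end
end
```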
B) Performance - 100,000 req/sec per node?
One node is able to hit a max of around 25k requests/sec through various wrk tests. I haven't tried the Tsung Erlang load-testing tool.
This was all done on a single quad-core Linux box rather than a cluster of cloud instances:
$ cat /proc/cpuinfo
model name : Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
cpu MHz : 983.461
cache size : 6144 KB
siblings : 8
cpu cores : 4

C) Failure cases
Uniqueness is still preserved across system failures, restarts, and node crashes since id generation is centered around timestamps. Software defects are handled through exception handling, supervisors, and "let it crash" :)
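A minimal sketch of that "let it crash" setup, assuming a hypothetical GenServer that owns the per-node sequence counter; the real application and supervision tree in this project may be structured differently.

```elixir
defmodule CounterServerSketch do
  # Hypothetical GenServer owning the per-node sequence counter; if it
  # crashes, the supervisor below simply restarts it.
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  def next, do: GenServer.call(__MODULE__, :next)

  @impl true
  def init(:ok), do: {:ok, :counters.new(1, [:atomics])}

  @impl true
  def handle_call(:next, _from, ref) do
    :counters.add(ref, 1, 1)
    {:reply, :counters.get(ref, 1), ref}
  end
end

defmodule LookupSketch.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [CounterServerSketch]
    # one_for_one: a crash in the id-serving process restarts only that process
    Supervisor.start_link(children, strategy: :one_for_one, name: LookupSketch.Supervisor)
  end
end
```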
Lastly, to run the tests you may want to run the custom mix seed task and the Erlang port mapper daemon:
$ MIX_ENV=test mix seed
$ epmd -daemon

Feel free to blow away the priv/persist dir and rerun the mix seed custom task.
Feel free to read the source code and comments.
Multiple nodes generating new ids
Happy coding, thanks!
Bibek