Trinamo is a globally consistent data store implemented on top of Amazon DynamoDB.
Trinamo is ideal for use cases where you need consistency across multiple AWS regions. Trinamo always uses 3 regions, no more, no less.
Then thou must count to three. Three shall be the number of the counting and the number of the counting shall be three. Four shalt thou not count, neither shalt thou count two, excepting that thou then proceedeth to three. Five is right out.
Other approaches each have drawbacks:
- DynamoDB global tables (replication is expensive, and not strongly consistent).
- Spanner (expensive).
- Single region (region outages).
- Global 2-phase commit (slow).
- Aurora Global (expensive, where is it mastered?)
Consensus protocols such as Paxos and Raft are based on the idea of voting. This is why Trinamo always uses three nodes (regions). With three nodes, any two that agree form a majority, so consensus can still be reached when one region is unavailable.
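The voting argument can be checked with a small sketch. The region names below are illustrative placeholders, and `majority` is a hypothetical helper, not part of Trinamo. The point is that any two majorities of three regions share at least one region, so two conflicting values can never both be committed for the same serial.

```python
from itertools import combinations

# Illustrative region names; any three regions would do.
REGIONS = ("ohio", "dublin", "tokyo")


def majority(n: int) -> int:
    """Smallest number of votes that is more than half of n."""
    return n // 2 + 1


# Every possible majority quorum of the three regions.
quorums = [set(q) for q in combinations(REGIONS, majority(len(REGIONS)))]

# Any two quorums intersect, so they cannot commit different values.
overlaps = all(a & b for a, b in combinations(quorums, 2))
```

With three regions the majority is two, there are three possible quorums, and `overlaps` evaluates to `True`.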
Trinamo defines a table where the primary key is the combination of the partition key (PK) and the sort key (SK).
Here is an example table, which we can call Locations. It has three columns.
The primary key of this table is the tuple (PK, SK).
| User (PK) | Serial (SK) | City |
|---|---|---|
| alice | 1 | Tokyo |
| alice | 2 | London |
| alice | 3 | Tokyo |
| bob | 1 | Sydney |
In this example, Alice started in Tokyo, went to London, and then went back to Tokyo. Bob is in Sydney. If we query the table for the latest item for each user, we would get:
| User | City |
|---|---|
| alice | Tokyo |
| bob | Sydney |
The SQL equivalent of this query, run once for each user (here alice), would be:
SELECT User, City FROM Locations WHERE User = 'alice' ORDER BY Serial DESC LIMIT 1
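In DynamoDB terms, "latest item for a user" maps naturally onto a Query on the partition key, read in descending sort-key order with a limit of one. The sketch below only builds the request parameters for the low-level Query API. The function name `latest_item_query` is hypothetical, and the attribute names `User` and `Serial` are assumed from the example table.

```python
def latest_item_query(table_name: str, user: str) -> dict:
    """Build keyword arguments for a low-level DynamoDB Query call
    that returns the item with the highest Serial for one user."""
    return {
        "TableName": table_name,
        # "#pk" is a placeholder so the attribute name cannot clash
        # with a DynamoDB reserved word.
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "User"},
        "ExpressionAttributeValues": {":pk": {"S": user}},
        "ScanIndexForward": False,  # descending sort-key (Serial) order
        "Limit": 1,                 # only the latest item
    }


params = latest_item_query("Locations", "alice")
```

Assuming boto3 is installed and credentials are configured, the parameters could be passed to a regional client as `boto3.client("dynamodb", region_name=...).query(**params)`.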
To modify the state of an item, Trinamo writes a new item with the next serial, so that the serials are always consecutive integers for each PK.
Trinamo uses the DynamoDB attribute_not_exists condition to ensure that
existing items are not overwritten.
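A conditional write of this kind might look like the following sketch, which builds the parameters for a low-level PutItem call. The function name `conditional_put` is hypothetical, and the attribute names are assumed from the Locations example. Checking `attribute_not_exists` on the partition key attribute is the usual way to refuse a write when an item with the same primary key (PK, SK) already exists.

```python
def conditional_put(table_name: str, user: str, serial: int, city: str) -> dict:
    """Build keyword arguments for a PutItem call that fails instead of
    overwriting an existing item with the same (User, Serial) key."""
    return {
        "TableName": table_name,
        "Item": {
            "User": {"S": user},
            "Serial": {"N": str(serial)},  # DynamoDB numbers travel as strings
            "City": {"S": city},
        },
        # True only if no item with this primary key exists yet.
        "ConditionExpression": "attribute_not_exists(#pk)",
        "ExpressionAttributeNames": {"#pk": "User"},
    }


put_params = conditional_put("Locations", "alice", 4, "Paris")
```

If the item already exists, DynamoDB rejects the request with a ConditionalCheckFailedException, which boto3 surfaces as a `ClientError`.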
When writing an item, Trinamo sends the write (PutItem) to all three regions. Once the item has been written successfully to two regions, the write is considered successful. If a write fails for any reason other than a ConditionalCheckFailedException, Trinamo retries according to the default boto3 retry logic. If one or more regions return a ConditionalCheckFailedException, Trinamo moves on to the conflict resolution logic.
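The write path can be sketched with an in-memory model. This is a simplification under stated assumptions, not Trinamo's implementation: `Region`, `put_if_absent`, `quorum_write`, and the returned status strings are all hypothetical names, the three writes run sequentially rather than in parallel, and retries for other errors are omitted.

```python
class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


class Region:
    """In-memory stand-in for the table in one region."""

    def __init__(self, name: str):
        self.name = name
        self.items = {}

    def put_if_absent(self, key, value):
        # Mirrors a PutItem guarded by attribute_not_exists.
        if key in self.items:
            raise ConditionalCheckFailed(key)
        self.items[key] = value


def quorum_write(regions, key, value, quorum=2):
    """Try the write in every region and report the outcome."""
    successes = 0
    for region in regions:
        try:
            region.put_if_absent(key, value)
            successes += 1
        except ConditionalCheckFailed:
            pass  # another value already holds this serial here
    # Two successful writes out of three commit the value.
    return "committed" if successes >= quorum else "needs_resolution"


regions = [Region(n) for n in ("ohio", "dublin", "tokyo")]
first = quorum_write(regions, ("alice", 1), "Tokyo")
second = quorum_write(regions, ("alice", 1), "Paris")
```

Here `first` is `"committed"` because all three regions accept the item, while `second` is `"needs_resolution"` because serial 1 is already taken in every region.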
The priorities of the conflict resolution logic are, in order:
- Guarantee strong consistency.
- Tolerate a regional outage.
- Simplicity.
- Speed (in the common case where there are no conflicts).
- Resolve conflicts in a reasonable timeframe.
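If the client's value for the serial has been written to at least two regions, the Trinamo Client will return a success to the caller.
If a different value for the same serial has been written to at least two regions, the Trinamo Client will return a failure to the caller.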
In the rare case that three different values are written for the same serial, all clients attempting to write the next value will first write a VOID item. A VOID item indicates that the value of the previous serial was never committed.
Notably, since all three clients are attempting to write the same VOID item, it doesn't matter which clients succeed.
| User (PK) | Serial (SK) | City (Ohio) | City (Dublin) | City (Tokyo) |
|---|---|---|---|---|
| alice | 1 | Sydney | Sydney | Sydney |
| alice | 2 | London | Paris | Helsinki |
| alice | 3 | VOID | VOID | VOID |
| alice | 4 | London | Paris | Paris |
In the example above, Serial 2 has three different values for City, written in the Ohio, Dublin, and Tokyo regions. The VOID at Serial 3 allows any client to know that the latest committed value is Serial 1 (Sydney), until Serial 4 (Paris) is committed.
The minority value of London at Serial 4 in Ohio may get cleaned up later, but clients can determine the committed state by reading from all three regions.
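One way a reader could derive the committed state from all three regions is sketched below, using the values from the example table. The function `committed_value` and its input shape are assumptions made for illustration. The sketch treats a serial as committed when at least two regions hold the same non-VOID value, and it does not model writes that are still in flight.

```python
from collections import Counter

VOID = "VOID"


def committed_value(rows_by_serial, quorum=2):
    """Return (serial, value) for the latest committed item, or None.

    rows_by_serial maps each serial to the values read from the three
    regions (None where a region has no item for that serial).
    """
    latest = None
    for serial in sorted(rows_by_serial):
        counts = Counter(v for v in rows_by_serial[serial] if v is not None)
        if not counts:
            continue
        value, count = counts.most_common(1)[0]
        # A VOID marker, or a serial without a majority, commits nothing.
        if count >= quorum and value != VOID:
            latest = (serial, value)
    return latest


# Values per serial as read from Ohio, Dublin, and Tokyo.
alice = {
    1: ["Sydney", "Sydney", "Sydney"],
    2: ["London", "Paris", "Helsinki"],
    3: [VOID, VOID, VOID],
    4: ["London", "Paris", "Paris"],
}
result = committed_value(alice)
```

For the table above, `result` is `(4, "Paris")`. With Serial 4 removed, the same function returns `(1, "Sydney")`, matching the description of the VOID item.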