
SYSTEM DESIGN CONCEPTS

1. Network Protocols
A network protocol is a set of rules and messages that form an Internet standard.
1.1 OSI MODEL:

7) Application
6) Presentation
5) Session
4) Transport = Ports, TCP, UDP
3) Network = IP Address, Routers, Hosts
2) Data Link = MAC Address, NICs, Switches
1) Physical = Ethernet Cables, Wi-Fi, HUB

1.2 PROTOCOLS:
• ARP = Address Resolution Protocol
• FTP = File Transfer Protocol
• SMTP = Simple Mail Transfer Protocol
• HTTP = Hypertext Transfer Protocol
• SSL = Secure Sockets Layer
• TLS = Transport Layer Security
• HTTPS = HTTP secured with SSL/TLS
• DNS = Domain Name System
• DHCP = Dynamic Host Configuration Protocol

1.3 DATA TRANSFER:


Every host needs 4 items for Internet Connectivity
1. IP Address = Host’s Identity on the Internet
2. Subnet Mask = Size of Host’s network
3. Default Gateway = Router’s IP Address
4. DNS Server IP(s) = Translate domain name to IPs
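
To make item 4 concrete, here is a minimal sketch using Python's standard socket module to resolve a domain name to its IP addresses; the domain is just an example.

```python
# DNS in action: translate a domain name to IP address(es)
# using Python's standard library (item 4 above).
import socket

print(socket.gethostbyname("www.google.com"))       # one IPv4 address

# All addresses the resolver returns for this name and port 443:
for info in socket.getaddrinfo("www.google.com", 443):
    print(info[4][0])                               # sockaddr -> IP string
```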
2. Latency, Throughput, and Availability
2.1 LATENCY
Latency is the amount of time in milliseconds (ms) it takes a single message to be delivered. The
concept can be applied to any aspect of a system where data is being requested and transferred.
Latency can be improved by:
• avoiding network calls whenever possible
• replicating data across data centers for disaster recovery as well as performance
• using CDNs to reduce latency
• keeping frequently used data in memory if possible

2.2 THROUGHPUT
Throughput is the amount of data that is successfully transmitted through a system in a certain
amount of time, measured in bits per second (bps). Throughput is a measurement of how much is
actually transmitted, not the theoretical capacity (bandwidth) of the system. Throughput can
be improved by increasing bandwidth, improving latency, and choosing an efficient protocol.

2.3 AVAILABILITY
• Availability is the amount of time a system is operational during a period of time.
• Availability = (available time / total time) x 100
• Availability = (23 hours uptime / 24 hours) x 100
• Availability = 95.83 %

Availability can be improved by:


• Failover systems: duplicates of any part of the system that are switched to in the case of failure
• Clustering: running multiple instances of a part of the system
• Backups: data backups and replication
• Redundancy: physically locating systems in different parts of the world

2.4 RELIABILITY
Reliability is the probability that a system performs without failure during a period of time. Software
reliability is slightly harder to define than hardware reliability.

• MTBF = (Total elapsed time – Total downtime) / number of failures
• MTBF = (24 hours elapsed – 4 hours downtime) / 4 failures
• MTBF = 5 hours
2.5 CAPACITY ESTIMATES
2.5.1 Data conversions

• 1,000 bytes = 1 Kilobyte (KB)
• 1,000 KB = 1 Megabyte (MB)
• 1,000 MB = 1 Gigabyte (GB)
• 1,000 GB = 1 Terabyte (TB)
• 1,000 TB = 1 Petabyte (PB)

2.5.2 Common data types

• 1 byte = 8 bits
• Character = 1 byte
• Integer = 4 bytes (32-bit integer)
• UNIX timestamp = 4 bytes

2.5.3 Time conversions

• 60 seconds x 60 minutes = 3,600 seconds per hour
• 3,600 seconds x 24 hours = 86,400 seconds per day
• 86,400 seconds x 30 days = 2,592,000 (≈2.5 million) seconds per month
2.5.4 Traffic estimates
• Instagram App

Estimate the total number of requests the app will receive:
(Average Daily Active Users) * (Average reads/writes per user) … DAU = Daily Active Users

o 10 million DAU * 30 photos viewed = 300 million photo reads
o 10 million DAU * 1 photo uploaded = 10 million photo writes
o ===================================================
o 300 million reads / 86,400 seconds = 3,472 reads per second
o 10 million writes / 86,400 seconds = 115 writes per second

Estimate the total cache memory the app will need:
(Read Requests per Day) * (Average request size) * 0.2

o 300 million reads * 500 bytes = 150 Gigabytes
o 150 Gigabytes * 0.2 (80/20 rule: cache the 20% of data that serves 80% of requests) = 30 Gigabytes
o 30 Gigabytes * 3 (replication) = 90 Gigabytes

Estimate the total bandwidth the app will need:
(Requests per Day) * (Request size)

o 300 million requests * 1.5 Megabytes = 450,000 Gigabytes
o 450,000 Gigabytes / 86,400 seconds = 5.2 Gigabytes per second

Estimate the total storage the app will need:
(Writes per Day) * (Size of writes) * (Time to store data)

o 10 million writes * 1.5 Megabytes = 15 Terabytes per day
o 15 Terabytes * 365 days * 10 years = 55 Petabytes
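
The arithmetic above is easy to script. Below is a minimal back-of-envelope calculator in Python that reproduces the Instagram numbers; the object sizes (500 bytes per cached read, 1.5 MB per photo) are the assumptions from the estimates above, not measured values.

```python
# Back-of-envelope capacity estimate for a photo app
# (numbers from the Instagram example above).

DAU = 10_000_000                 # daily active users
READS_PER_USER = 30              # photos viewed per user per day
WRITES_PER_USER = 1              # photos uploaded per user per day
SECONDS_PER_DAY = 86_400

reads_per_day = DAU * READS_PER_USER      # 300 million
writes_per_day = DAU * WRITES_PER_USER    # 10 million
print(f"{reads_per_day / SECONDS_PER_DAY:,.0f} reads/s")    # ~3,472
print(f"{writes_per_day / SECONDS_PER_DAY:,.0f} writes/s")  # ~116 (the notes round down to 115)

# Cache: 80/20 rule -- keep the hottest 20% of a day's reads in memory.
REQUEST_SIZE = 500                         # assumed bytes per cached object
cache_bytes = reads_per_day * REQUEST_SIZE * 0.2
print(f"cache: {cache_bytes * 3 / 1e9:,.0f} GB with 3x replication")  # ~90 GB

# Bandwidth and storage, assuming 1.5 MB per photo.
PHOTO_SIZE = 1.5e6                         # assumed bytes per photo
egress = reads_per_day * PHOTO_SIZE / SECONDS_PER_DAY
print(f"bandwidth: {egress / 1e9:.1f} GB/s")                # ~5.2 GB/s
storage = writes_per_day * PHOTO_SIZE * 365 * 10
print(f"storage: {storage / 1e15:.0f} PB over 10 years")    # ~55 PB
```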
3. LOAD BALANCER (*)
3.1 OVERVIEW
Balance incoming traffic to multiple servers to improve the scalability, availability, and flexibility of an
application with the following features:

• Distributes incoming traffic efficiently across multiple servers in the network
• Maintains reliability and high availability by redirecting requests only to the available servers
• Ease of adding and removing servers in the network as demand changes

3.2 TYPES OF LOAD BALANCERS


1. SW-based LB: The most well-known and respected software load balancers are Nginx, Avi Vantage,
and HAProxy. Software load balancers like these run on standard servers and are less hardware
optimized, but cheaper to set up and run. SW LB may include a reverse proxy

2. Cloud-based LB: If you are designing a high-availability application for better performance &
security, a cloud LB will help you. Each has some advantages or additional features
over the others, so choose what works for you: AWS ELB, GCP LB, MSFT Azure LB, etc.

3. HW-based LB: Hardware load balancers are specialized appliances with circuitry designed to
perform load balancing tasks. They are generally very performant and very expensive. Hardware
LBs are generally L4 LBs … F5, A10 Thunder ADC, or Citrix ADC

4. DNS LB: integrates with the Domain Name System (DNS) infrastructure so that a client’s name
lookup for a service (e.g., for “www.google.com”) returns a different IP address to each requester,
corresponding to one of a pool of back-end servers in their geographic location.

3.3 METHODS OF LOAD BALANCING


Load balancing algorithms are generally divided into two groups: static and dynamic. Static algorithms
function the same regardless of the state of the back end serving the requests, whereas dynamic
algorithms take into account the state of the backend and consider system load when routing requests.
The most commonly used classes of algorithms are Round Robin, Least Connections, Least Load, and IP
Hashing.

[Diagrams: Round Robin, Least Connections, Least Load]


1. Static … algorithms are generally simpler and more efficient to implement but can lead to uneven
distribution of requests
• (Weighted) Round Robin: most common and simple
• Randomized: simple and stateless
• Hashing: caching, stickiness

2. Dynamic … algorithms are more complex and entail more overhead, as they require
communication between the LB and the back-end servers, but can more efficiently distribute
requests (see the sketch after this list)
• Least Load: optimal use of resources
• Power-of-d-choices: efficient when “d” is low, works with multiple LBs
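
To make the two groups concrete, here is a minimal Python sketch of one static policy (round robin) and one dynamic policy (power-of-two-choices, i.e., d = 2). The server addresses are hypothetical, and a real LB would track connection counts from live traffic rather than a local dict.

```python
import itertools
import random

class RoundRobin:
    """Static: cycle through servers regardless of their current load."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class PowerOfTwoChoices:
    """Dynamic: sample d=2 servers and route to the one with fewer
    active connections (the 'power-of-d-choices' policy above)."""
    def __init__(self, servers):
        self.connections = {s: 0 for s in servers}

    def pick(self):
        a, b = random.sample(list(self.connections), 2)
        chosen = a if self.connections[a] <= self.connections[b] else b
        self.connections[chosen] += 1   # caller decrements when the request ends
        return chosen

lb = RoundRobin(["10.0.0.1", "10.0.0.2", "10.0.0.3"])   # hypothetical backends
print([lb.pick() for _ in range(4)])   # cycles: .1, .2, .3, .1
```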

3.4 SECURITY OF LOAD BALANCING


Load balancers are also a natural point of protection against DDoS attacks. Since they generally
prevent server overload by distributing requests well, in the case of a DDoS attack the LB makes it
harder to overload the whole system. They also remove a single point of failure, and therefore make
the system harder to attack.

• Layer 4:
o Only has access to TCP & UDP data
o Faster, but the lack of information can lead to uneven traffic

• Layer 7:
o Full access to the HTTP protocol and data & SSL termination
o Can check authentication & offers smarter routing options
4. CACHING (*)
4.1 OVERVIEW
Caching can exist at any level of a system, from a single CPU to a distributed cluster, to improve
performance and save cost in the long run (e.g., AWS ElastiCache).
• Caching is putting the most frequently requested data in memory instead of on storage media
• Caching happens at multiple layers: DNS, CDN (Content Delivery Network), Web App, Databases
• Cache hit: requested data is found in memory
• Cache miss: requested data is NOT found in memory & must be retrieved from the datastore

4.2 WRITING POLICY


A cache is made of copies of data, and is thus transient storage, so when writing we need to decide
when to write to the cache and when to write to the primary datastore.

• Write-through cache: solves the data inconsistency problem by forcing all writes to update both the
cache and the datastore at the same time. If the cache layer fails, the update isn’t lost because it’s
been persisted. In exchange, the write takes longer to succeed because it must also update the
slower datastore.

• Write-back cache: writes first to the cache and marks the cache block with a dirty bit; the write to
the primary datastore is deferred. The new value will NOT be stored to the datastore until the cache
block is replaced (flushed).

• Write-around cache: writes directly to the primary datastore, and the cache checks with the
datastore to keep the cached data valid. If the application is accessing the newest data, the cache might
be behind, but the write doesn’t have to wait on two systems being updated and the primary
datastore is always up to date.
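
Here is a minimal sketch of two of these policies, assuming the primary datastore can be treated as a dict-like object; in practice the cache and datastore are separate networked systems.

```python
class WriteThroughCache:
    """Write-through: every write updates the cache and the datastore
    together, so a cache failure never loses a persisted write."""
    def __init__(self, datastore):
        self.datastore = datastore       # any dict-like primary store (assumption)
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value          # update the cache ...
        self.datastore[key] = value      # ... and the datastore in the same call

    def read(self, key):
        if key in self.cache:            # cache hit
            return self.cache[key]
        value = self.datastore[key]      # cache miss: fetch from the datastore
        self.cache[key] = value
        return value

class WriteAroundCache(WriteThroughCache):
    """Write-around: writes skip the cache entirely; the cache is
    populated only on reads, so fresh writes may briefly be 'behind'."""
    def write(self, key, value):
        self.datastore[key] = value      # datastore only
        self.cache.pop(key, None)        # invalidate any stale cached copy

db = {}
c = WriteThroughCache(db)
c.write("user:1", {"name": "Ada"})
print(c.read("user:1"), db["user:1"])    # both see the same value
```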
4.3 EVICTION POLICY
The cache replacement policy is an essential part of the success of a caching layer. The replacement
policy (also called the eviction policy) decides what memory to free when the cache is full.

• LRU – Least Recently Used … the least recently used entry is freed … the most popular policy
• LFU – Least Frequently Used … the least frequently used entry is freed
• LFRU – Mixed … combines LFU and LRU to decide which entry is freed
• TTL – Time To Live … an amount of time after which a resource is freed if it hasn’t been used
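
LRU is simple to sketch in Python with an OrderedDict; this is an illustrative in-process cache, not a distributed one.

```python
from collections import OrderedDict

class LRUCache:
    """LRU eviction: when the cache is full, free the least recently
    used entry (the front of the ordered dict)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                      # cache miss
        self.data.move_to_end(key)           # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)    # evict the LRU entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")                # "a" is now the most recently used
cache.put("c", 3)             # evicts "b", the least recently used
print(cache.get("b"))         # None
```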

4.4 DISTRIBUTED CACHE


In a distributed system a caching layer can reside on each machine in the application service cluster, or
it can be implemented in a cluster isolated from the application service.

• Private cache: is only used by one client, only for the IP it was created for. Generally this applies
only to a cache maintained by that client itself, though if you had a proxy that was only being used
by one client it would be possible to configure it to act as a private cache. Typical cases:
1. Data that can only be used by one user/client, such as personal information on a web site
2. Resources such as documents available only to one particular user or authorized users
3. Resources served via the HTTPS protocol
4. Responses with cookies

• Public cache: or “shared” cache is used by more than one client. As such, it gives a greater
performance gain and a much greater scalability gain, as a user may receive cached copies of
representations without ever having obtained a copy directly from the origin server. Typical cases:
1. Data that changes infrequently
2. Data in popular demand (requested frequently)

4.5 CONTENT DELIVERY NETWORK

• Content Delivery Networks cache content on edge servers geographically close to users to reduce
latency … e.g., Akamai
5. DATABASES (*)
5.1 OVERVIEW
A database is an organized collection of structured information, or data, typically stored electronically
in a computer system. The core functions of a database are to store data, update and delete data, and
return data according to a query.

The CAP Theorem says that any distributed database can only satisfy two of the three guarantees:
1. Consistency: every node responds with the most recent version of the data
2. Availability: any node can send a response
3. Partition Tolerance: the system continues working if communication between the nodes is broken

A system is called a CP database if it provides Consistency and Partition Tolerance, and an AP database
if it provides Availability and Partition Tolerance. A transaction is a series of database operations that
are considered a "single unit of work". The operations in a transaction either all succeed or all fail, in
compliance with the ACID properties. SQL databases follow this property, but NoSQL databases
generally do NOT.

5.2 DBASE SCHEMAS


A SQL database schema defines the shape of a data structure & specifies tables, indexes, and data constraints.

• Tuples (ROWs): data sets that apply to one item
• Attributes (COLUMNs): describe a characteristic of each tuple
• Index: a lookup structure built over a column of interest that speeds up queries against the original table
• Data Types: integer, floating point, character, string, Boolean
• Data Constraints:
1. NOT NULL - each value in a column must not be NULL
2. UNIQUE - value(s) in specified column(s) must be unique for each row in a table
3. CHECK - an expression is specified, which must evaluate to true for the constraint to be satisfied
4. PRIMARY KEY: uniquely identifies every row (a record) in the table
5. FOREIGN KEY: provides a link between data in two tables

[Diagram: a table with its tuples (rows), an index, and constraints]
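
A small illustration of these schema elements, using Python's built-in sqlite3 module; the users/photos tables, column names, and sample rows are hypothetical.

```python
import sqlite3

# Two-table schema showing the constraints listed above:
# PRIMARY KEY, FOREIGN KEY, NOT NULL, UNIQUE, CHECK, plus an index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,        -- uniquely identifies each row
        email TEXT    NOT NULL UNIQUE,    -- NOT NULL + UNIQUE constraints
        age   INTEGER CHECK (age >= 0)    -- CHECK constraint
    );
    CREATE TABLE photos (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),  -- FOREIGN KEY link
        url     TEXT    NOT NULL
    );
    -- Index over the column of interest to speed lookups on photos.
    CREATE INDEX idx_photos_user ON photos(user_id);
""")
conn.execute("INSERT INTO users (email, age) VALUES (?, ?)", ("ada@example.com", 36))
conn.execute("INSERT INTO photos (user_id, url) VALUES (?, ?)", (1, "s3://bucket/p1.jpg"))
# A relational query linking the two tables:
print(conn.execute(
    "SELECT email FROM users JOIN photos ON users.id = photos.user_id"
).fetchone())
```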
5.3 DBASE SCALING
Different databases are better and worse at scaling because of the features they provide. There are
two kinds of scaling:
• Vertical Scaling: is fairly straightforward (add more powerful hardware) but has much lower overall
capacity, and there is a point of diminishing returns on investment
1. Adding compute resources: CPU
2. Adding memory resources: RAM, Disk, SSD, NICs

• Horizontal Scaling: on the other hand, has much higher overall compute and storage capacity, and
can be sized dynamically without downtime, by adding an LB & more nodes
1. Replication: when the database has too much load -> 1 master node & N slave nodes
2. Sharding: when the database has too much data -> split the database into N shards

[Diagram: Vertical Scaling vs. Horizontal Scaling]

5.4 SQL DBASE … MySQL, PostgreSQL, MSFT SQL, AWS Aurora

A relational database typically stores information in tables containing specific pieces and types of data.
This form of data storage is often called structured data. A relational database organizes data into
tables which can be linked (or related) based on data common to each. This capability enables you to
retrieve an entirely new table from data in one or more tables with a single query.

Typical fits: small projects at low scale with unknown access patterns, and large projects at high scale
with relational queries served by read replicas.

When to use SQL:

• When access patterns aren’t defined
• When you need to perform flexible & relational queries
• When you need to enforce field constraints
5.5 NOSQL DBASE … MongoDB, ElasticSearch, DynamoDB, HBase, Cassandra
The major purpose of using a NoSQL database is for distributed data stores with humongous data
storage needs. NoSQL is used for Big Data and real-time web apps such as Twitter, Facebook, and
Google. The obvious advantage of a non-relational database is the ability to store and process large
amounts of unstructured data. As a result, it can process ANY type of data without needing to modify
the architecture. So, creating and maintaining a NoSQL database is faster and cheaper.

Typical fits: when you need high performance & low latency, and medium/large projects at high scale
with high performance requirements.

When to use NoSQL:

• When access patterns are defined
• When the primary key is known
• When the data model fits (e.g., graphs)

5.6 DBASE TYPES

Types of NoSQL databases:

• Document Databases: these DBs usually pair each key with a complex data structure
called a document. Documents can contain key-array pairs or key-value pairs or even nested
documents. Examples: MongoDB, RavenDB, Couchbase, ArangoDB, CouchDB

• Key-value stores: every single item is stored as a key-value pair. Key-value stores are the
simplest databases among all NoSQL DBs (see the sketch after this list). Examples: Redis,
Memcached, Apache Ignite, Riak

• Wide-column stores: these types of DBs are optimized for queries over large datasets, and
instead of rows, they store columns of data together. Examples: Cassandra, HBase, Scylla

• Graph stores: these store information about graphs and networks, such as social connections, road
maps, and transport links. Examples: Neo4j, AllegroGraph
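
As a quick illustration of the key-value model, here is a hedged sketch using the redis-py client; it assumes a Redis server is listening on localhost:6379, and the key name and TTL are arbitrary.

```python
# Minimal key-value interaction with Redis via the redis-py client.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
r.set("session:42", "user=ada", ex=3600)   # store a value with a 1-hour TTL
print(r.get("session:42"))                 # -> "user=ada"
```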
6. SHARDING (*)
6.1 OVERVIEW
Sharding is essentially the horizontal scaling of a database system that is accomplished by breaking the
database up into smaller “shards”, which are separate database servers that all contain a subset of the
overall dataset. Database sharding splits up data in a particular way so that data access patterns can
remain as efficient as possible. A shard is a horizontal partition, meaning the database table is split up
by drawing a horizontal line between rows.
Replication: when the database has too much load -> replicate the database from a master node to slave nodes
Sharding: when the database has too much data -> split the database into multiple shards

Optimization Techniques:
1. Scaling up HW:
• CPU
• Memory
• Disk

2. Adding Replicas (Read): writes go to the master (M) node and are copied to replica (R) nodes,
while reads are served by the replicas. This gives “eventual consistency”: reads can return stale
data because of the delay propagating writes from the master to the replicas while read requests
come in. Not acceptable for a financial app …!

3. Sharding the Database … splitting the master database into shards S1, S2, S3 … Sn, with a routing
layer directing requests to the correct shard.
• Each row is assigned a shard key that maps it to the logical shard it can be found on. More than
one logical shard can be located on the same physical shard, but a logical shard can’t be split
between physical shards.
• Relational data models are optimized for data with many cross-table relationships. If shards aren’t
isolated, the cluster will spend a lot of time on multi-shard queries and upholding multi-shard
consistency. The process of making sure there aren’t relational constraints between different
shards is called denormalization. Denormalization is achieved by duplicating data (which adds
write complexity) and grouping data so that it can be accessed in a single row.

6.2 SHARDING ADVANTAGES

PROs:
• Scalability
• Availability
• Fault Tolerance

CONs:
• Complexity:
1. Partition Mapping
2. Routing
3. Non-Uniformity (requiring re-sharding)
• Analytic Queries

6.3 SHARD KEY
The different approaches to sharded architectures are based on how the shard key is assigned.
Regardless of what they're derived from, shard keys need to be unique across shards, so their values
need to be coordinated. This leads to a tradeoff between a centralized "name server" that can
dynamically optimize logical shards for performance, and a predetermined distributed algorithm that
is faster to compute. Here are the three most common approaches to sharding:
• Range: data can be assigned to a shard based on what "range" it falls into. For example, a
database with sequential time-based data like log history could shard based on month ranges.

• Hash: to address the issue of imbalanced shards, data can be distributed based on a hash of
part of the data. An effective hash function will randomize and distribute incoming data to
prevent any access patterns that could overwhelm a node.

• Name Node: the last approach is any implementation that uses a central "name node" to
coordinate the mapping of data to shard keys. What's nice about this approach is it makes the
business logic very clear and easy to update based on usage patterns.
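
A minimal sketch of hash sharding in Python; the shard count and user ids are assumptions. Note that with simple modulo hashing, changing NUM_SHARDS remaps almost every key, which is the re-sharding cost noted above (consistent hashing is the usual mitigation).

```python
import hashlib

NUM_SHARDS = 4   # assumed shard count for illustration

def shard_for(key: str) -> int:
    """Hash sharding: hash part of the data (here, a user id) and take
    it modulo the shard count to pick the shard the row lives on."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

for user in ["alice", "bob", "carol", "dave"]:
    print(user, "-> shard", shard_for(user))
```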
7. Polling, SSE, and WebSockets
7.1 PURPOSES
The HTTP standard is widely supported and very functional. But when your application needs to
transmit continuous information streams or real-time updates to clients, like a collaborative document
editor that shows changes in real time, having to repeatedly make regular HTTP requests will slow
things down. Polling, WebSockets, and server-sent events are all techniques for streaming high
volumes of data to or from a server.

7.2 POLLING
• Short Polling: is the original polling protocol for clients to get regular information updates from a
server. The steps of short polling are:
1. Client sends Server an HTTP request for new information.
2. Server responds with new information, or no information.
3. Client repeats the request at a set interval (e.g. 2s).

• Long Polling: is a more efficient version of short polling (see the sketch below). The steps of long
polling are:
1. Client sends Server an HTTP request for new information.
2. Server waits until there’s new information to respond (a “hanging” response).
3. Client repeats the request as soon as it gets the previous response back.
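
A minimal long-polling client loop using the requests library; the endpoint URL, the ~30-second server hold time, and the handle() callback are assumptions for illustration.

```python
import requests

def handle(update):
    print("got update:", update)

def long_poll(url="https://api.example.com/updates"):   # hypothetical endpoint
    while True:
        try:
            # The server "hangs" until new data exists or ~30s pass,
            # so the client timeout is set a little longer.
            resp = requests.get(url, timeout=35)
            if resp.status_code == 200 and resp.content:
                handle(resp.json())          # new information arrived
            # On an empty response, just loop: re-request immediately.
        except requests.exceptions.Timeout:
            pass                             # no news; re-issue the request

# long_poll()  # runs forever; uncomment to start polling
```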

7.3 SERVER SENT EVENTS


• Server-Sent Events: provide a one-way connection for a server to push new data to a client,
without reestablishing a connection every time. For example, a social media app could use SSE to
push new posts to a user feed as soon as they’re available. SSE connections follow the EventSource
interface, which uses HTTP for the underlying communications. At a high level, the steps of
SSE are:

1. Client creates a new EventSource object targeting the server


2. Server registers SSE connection
3. Server sends new data to the client
4. Client receives messages with EventSource handlers
5. Either side closes the connection
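
Here is a minimal SSE endpoint sketch using Flask; the /stream route and the event payload are assumptions. A browser client would consume it with new EventSource("/stream").

```python
import time
from flask import Flask, Response

app = Flask(__name__)

def event_stream():
    n = 0
    while True:
        n += 1
        # SSE wire format: each message is "data: <payload>\n\n".
        yield f"data: post #{n} is available\n\n"
        time.sleep(2)

@app.route("/stream")
def stream():
    # Streaming response keeps the connection open; the server
    # pushes new events without the client re-requesting.
    return Response(event_stream(), mimetype="text/event-stream")

if __name__ == "__main__":
    app.run(port=8000)
```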

7.4 WEBSOCKETS
• WebSockets: a two-way message-passing protocol based on TCP. WebSockets are faster for data
transmission than HTTP because the protocol has less overhead and operates at a lower level in the
network stack. At a high level, the steps of a WebSockets connection are:

1. Client and Server establish a connection over HTTP, which is then upgraded via the WebSockets handshake.
2. Messages are transmitted bidirectionally over port 443 (or port 80 if not TLS encrypted).
3. Either side closes the connection.
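
A sketch of a two-way exchange using the websockets asyncio library; the echo server URL is an assumption (any WebSocket echo endpoint would do).

```python
import asyncio
import websockets

async def chat():
    # The library performs the HTTP -> WebSocket upgrade handshake.
    async with websockets.connect("wss://echo.websocket.org") as ws:  # assumed test endpoint
        await ws.send("hello")     # client -> server
        reply = await ws.recv()    # server -> client, same socket
        print("received:", reply)

asyncio.run(chat())
```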
8. Queues & pub-sub
8.1 DEFINITION
Queues and pub-sub are both mechanisms that allow a system to process messages asynchronously,
which allows the sender and receiver of a message to work independently by providing a
“middleman”. This can eliminate bottlenecks and help the system operate more efficiently.

8.2 MESSAGING-ORIENTED MIDDLEWARE


Any complex system will have different components, possibly running entirely different hardware and
software, which need to be able to communicate with each other. Messaging-oriented middleware
(MOM) enables this communication, much like the post office enables people to send each other
letters. Producers hand off packets of data called messages to the MOM which makes sure it’s
delivered to the correct consumers.

Message passing allows components to communicate asynchronously. In other words, a producer can
send messages independently of the state of the consumer. If the consumer is too busy or offline, a
MOM will make sure the messages are delivered once the consumer becomes available again.

Asynchronicity enables system components to be decoupled from each other. This adds resilience
because, when one component fails, the others can continue functioning normally. It also adds data
integrity because successful message passing isn’t dependent on the producer and consumer being
responsive at the same time.

The software that implements a MOM can be called a message broker. Message brokers may
implement just one, or several different kinds of message passing including both queues and pub-sub.
Let’s take a look at how queues and pub-sub work.

8.3 MESSAGES QUEUES


Message queues are a kind of messaging-oriented middleware where producers push new messages to
a named First-In, First-Out (FIFO) queue, which consumers can then pull from.

Message queues are also called point-to-point messaging because there is a one-to-one relationship
between a message’s producer and consumer. There can be many producers and consumers using the
same queue, but any particular message will only have one producer and one consumer.

Different queue implementations will vary in how much space the queue has, whether or not messages
are batched, and how long a message is kept for if it isn’t consumed.
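
A point-to-point queue can be sketched in-process with Python's thread-safe queue.Queue; real message brokers do the same thing across machines. The sentinel message used to stop the consumer is an assumption of this sketch.

```python
import queue
import threading

q = queue.Queue(maxsize=100)       # bounded queue space

def producer():
    for i in range(5):
        q.put(f"message {i}")      # push to the FIFO queue
    q.put(None)                    # sentinel: no more messages

def consumer():
    # Each message is pulled exactly once: point-to-point delivery.
    while (msg := q.get()) is not None:
        print("consumed:", msg)
        q.task_done()

threading.Thread(target=producer).start()
threading.Thread(target=consumer).start()
```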

8.4 PUB-SUB
The publish-subscribe pattern, also called pub-sub, is a kind of messaging-oriented middleware that
pushes a producer’s newly “published” messages based on a “subscription” of the consumer’s
preferences.

There is a one-to-many relationship between publishers and subscribers, meaning any number of
subscribers can get a copy of a message, but there’s only one publisher of that message. Pub-sub
doesn’t guarantee message order, just that consumers will only see messages that they’ve subscribed
to.
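
A minimal in-memory sketch of the pattern; real pub-sub systems deliver over the network and persist messages, but the one-to-many fan-out is the same. Topic names and payloads here are hypothetical.

```python
from collections import defaultdict

class PubSub:
    """Minimal broker: one published message is pushed to every
    subscriber of its topic (one-to-many delivery)."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)      # each subscriber gets a copy

broker = PubSub()
broker.subscribe("posts", lambda m: print("feed A got:", m))
broker.subscribe("posts", lambda m: print("feed B got:", m))
broker.publish("posts", "new photo from ada")   # delivered to both
broker.publish("ads", "unseen")                 # no subscribers: dropped
```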

9. Leader Election
9.1 LEADER ELECTION BASICS
Sometimes horizontally scaling a system is as simple as spinning up a cluster of nodes and letting each
node respond to whatever subset of the incoming requests they receive. At other times, the task at
hand requires more precise coordination between the nodes and it’s helpful to have a leader node
directing what the follower nodes work on.

9.2 LEADER ELECTION ALGORITHMS


• Bully Algorithm: a simple synchronous leader election algorithm. This algorithm requires that each
node has a unique numeric id, and that nodes know the ids of all other nodes in the cluster. The
election process starts when a node starts up or when the current leader fails a healthcheck. There
are two cases (see the sketch below):
1. If the node has the highest id, it declares itself the winner and sends this message to the rest of the
nodes.
2. If the node has a lower id, it messages all nodes with higher ids and, if it doesn't get a response, it
assumes all of them have failed or are unavailable, and declares itself the winner.
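
A simplified, single-process sketch of the bully election; node ids and the live-node set are assumptions, and the network message passing is collapsed into direct method calls.

```python
class Node:
    """Bully election sketch. Real implementations exchange election
    messages over the network; here, higher-id nodes 'responding' is
    modeled as them simply being alive."""
    def __init__(self, node_id):
        self.id = node_id
        self.cluster = []        # all nodes, including self
        self.alive = True

    def start_election(self):
        higher = [n for n in self.cluster if n.id > self.id and n.alive]
        if not higher:
            return self.id       # case 1: highest live id declares itself winner
        # case 2: a higher id responded, so defer; the highest live id wins
        return max(n.id for n in higher)

nodes = [Node(i) for i in (1, 2, 3)]
for n in nodes:
    n.cluster = nodes
nodes[2].alive = False           # the current leader (id 3) fails
print(nodes[0].start_election()) # node 1 defers: node 2 becomes leader
```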

• Paxos: uses state machine replication to model the distributed system, and then chooses a leader by
having some nodes propose a leader and some nodes accept proposals. When a quorum of (enough
of) the accepting nodes choose the same proposed leader, that proposed leader becomes the actual
leader.

• Raft: each node keeps track of the current "election term". When leader election starts, each node
increments its copy of the term number and listens for messages from other nodes. After a random
interval, if the node doesn't hear anything, it will become a candidate leader and ask other nodes for
votes.

• Apache ZooKeeper (ZAB, ZooKeeper Atomic Broadcast): the protocol used by Apache ZooKeeper to
handle leader election, replication order guarantees, and node recovery. It is named for the way the
leader “broadcasts” state changes to followers to make sure writes are consistent and propagated to
all nodes. ZAB is an asynchronous algorithm.

9.3 ALTERNATIVES TO LEADER ELECTION


Alternatives to leader election are based on the premise that coordination is possible without a
dedicated leader node, thus achieving the primary function of leader election with lower
implementation complexity. Here’s a brief overview of three of the most notable alternatives:

• Locking
• Idempotent APIs
• Workflow Engines
