
SYSTEM DESIGN CONCEPTS

1. Network Protocols
A network protocol is a set of rules and messages that form an Internet standard.
1.1 OSI MODEL:

7) Application
6) Presentation
5) Session
4) Transport = Ports, TCP, UDP
3) Network = IP Address, Routers, Hosts
2) Data Link = MAC Address, NICs, Switches
1) Physical = Ethernet Cables, Wi-Fi, HUB

1.2 PROTOCOLS:
• ARP = Address Resolution Protocol
• FTP = File Transfer Protocol
• SMTP = Simple Mail Transfer Protocol
• HTTP = Hypertext Transfer Protocol
• SSL = Secure Sockets Layer
• TLS = Transport Layer Security
• HTTPS = HTTP secured with SSL/TLS
• DNS = Domain Name System
• DHCP = Dynamic Host Configuration Protocol

1.3 DATA TRANSFER:


Every host needs 4 items for Internet Connectivity
1. IP Address = Host’s Identity on the Internet
2. Subnet Mask = Size of Host’s network
3. Default Gateway = Router’s IP Address
4. DNS Server IP(s) = Translate domain name to IPs
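
To make item 4 concrete, here is a minimal sketch using Python's standard socket module to resolve a domain name to its IP addresses; the domain is just an example.

```python
# DNS in action: translate a domain name to IP address(es)
# using Python's standard library (item 4 above).
import socket

print(socket.gethostbyname("www.google.com"))       # one IPv4 address

# All addresses the resolver returns for this name and port 443:
for info in socket.getaddrinfo("www.google.com", 443):
    print(info[4][0])                               # sockaddr -> IP string
```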
2. Latency, Throughput, and Availability
2.1 LATENCY
Latency is the amount of time in milliseconds (ms) it takes a single message to be delivered. The
concept can be applied to any aspect of a system where data is being requested and transferred.
Latency can be improved by:
• avoiding network calls whenever possible
• replicating data across data centers for disaster recovery as well as performance
• using CDNs to reduce latency
• keeping frequently used data in memory if possible

2.2 THROUGHPUT
Throughput is the amount of data that is successfully transmitted through a system in a certain
amount of time, measured in bits per second (bps). Throughput is a measurement of how much is
actually transmitted, not the theoretical capacity (bandwidth) of the system. Throughput can
be improved by increasing bandwidth, improving latency, and choosing an efficient protocol.

2.3 AVAILABILITY
• Availability is the amount of time a system is operational during a period of time.
• Availability = (available time / total time) x 100
• Availability = (23 hours uptime / 24 hours) x 100
• Availability = 95.83 %

Availability can be improved by:


• Failover systems: duplicates of any part of the system that are switched to in the case of failure
• Clustering: running multiple instances of a part of the system
• Backups: data backups and replication
• Redundancy: physically locating systems in different parts of the world

2.4 RELIABILITY
Reliability is the probability that a system performs without failure during a period of time. Software
reliability is slightly harder to define than hardware reliability.

• MTBF = (Total elapsed time – Total downtime) / number of failures
• MTBF = (24 hours elapsed – 4 hours downtime) / 4 failures
• MTBF = 5 hours
2.5 CAPACITY ESTIMATES
2.5.1 Data conversions

• 1,000 bytes = 1 Kilobyte (KB)
• 1,000 KB = 1 Megabyte (MB)
• 1,000 MB = 1 Gigabyte (GB)
• 1,000 GB = 1 Terabyte (TB)
• 1,000 TB = 1 Petabyte (PB)

2.5.2 Common data types

• 1 byte = 8 bits
• Character = 1 byte
• Integer = 4 bytes (32-bit integer)
• UNIX timestamp = 4 bytes

2.5.3 Time conversions

• 60 seconds x 60 minutes = 3,600 seconds per hour
• 3,600 seconds x 24 hours = 86,400 seconds per day
• 86,400 seconds x 30 days = 2,592,000 (≈2.5 million) seconds per month
2.5.4 Traffic estimates
• Instagram App

Estimate the total number of requests the app will receive:
(Average Daily Active Users) * (Average reads/writes per user) … DAU = Daily Active Users

o 10 million DAU * 30 photos viewed = 300 million photo reads
o 10 million DAU * 1 photo uploaded = 10 million photo writes
o ===================================================
o 300 million reads / 86,400 seconds = 3,472 reads per second
o 10 million writes / 86,400 seconds = 115 writes per second

Estimate the total cache memory the app will need:
(Read Requests per Day) * (Average request size) * 0.2

o 300 million reads * 500 bytes = 150 Gigabytes
o 150 Gigabytes * 0.2 (80/20 rule: cache the 20% of data that serves 80% of requests) = 30 Gigabytes
o 30 Gigabytes * 3 (replication) = 90 Gigabytes

Estimate the total bandwidth the app will need:
(Requests per Day) * (Request size)

o 300 million requests * 1.5 Megabytes = 450,000 Gigabytes
o 450,000 Gigabytes / 86,400 seconds = 5.2 Gigabytes per second

Estimate the total storage the app will need:
(Writes per Day) * (Size of writes) * (Time to store data)

o 10 million writes * 1.5 Megabytes = 15 Terabytes per day
o 15 Terabytes * 365 days * 10 years = 55 Petabytes
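
The arithmetic above is easy to script. Below is a minimal back-of-envelope calculator in Python that reproduces the Instagram numbers; the object sizes (500 bytes per cached read, 1.5 MB per photo) are the assumptions from the estimates above, not measured values.

```python
# Back-of-envelope capacity estimate for a photo app
# (numbers from the Instagram example above).

DAU = 10_000_000                 # daily active users
READS_PER_USER = 30              # photos viewed per user per day
WRITES_PER_USER = 1              # photos uploaded per user per day
SECONDS_PER_DAY = 86_400

reads_per_day = DAU * READS_PER_USER      # 300 million
writes_per_day = DAU * WRITES_PER_USER    # 10 million
print(f"{reads_per_day / SECONDS_PER_DAY:,.0f} reads/s")    # ~3,472
print(f"{writes_per_day / SECONDS_PER_DAY:,.0f} writes/s")  # ~116 (the notes round down to 115)

# Cache: 80/20 rule -- keep the hottest 20% of a day's reads in memory.
REQUEST_SIZE = 500                         # assumed bytes per cached object
cache_bytes = reads_per_day * REQUEST_SIZE * 0.2
print(f"cache: {cache_bytes * 3 / 1e9:,.0f} GB with 3x replication")  # ~90 GB

# Bandwidth and storage, assuming 1.5 MB per photo.
PHOTO_SIZE = 1.5e6                         # assumed bytes per photo
egress = reads_per_day * PHOTO_SIZE / SECONDS_PER_DAY
print(f"bandwidth: {egress / 1e9:.1f} GB/s")                # ~5.2 GB/s
storage = writes_per_day * PHOTO_SIZE * 365 * 10
print(f"storage: {storage / 1e15:.0f} PB over 10 years")    # ~55 PB
```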
3. LOAD BALANCER (*)
3.1 OVERVIEW
Balance incoming traffic to multiple servers to improve the scalability, availability, and flexibility of an
application with the following features:

• Distributes incoming traffic efficiently across multiple servers in the network
• Maintains reliability and high availability by redirecting requests only to the available servers
• Ease of adding and removing servers in the network as demand changes

3.2 TYPES OF LOAD BALANCERS


1. SW-based LB: The most well-known and respected software load balancers are Nginx, Avi Vantage,
and HAProxy. Software load balancers like these run on standard servers and are less hardware
optimized, but cheaper to set up and run. SW LB may include a reverse proxy

2. Cloud-based LB: If you are designing a high-availability application for better performance &
security, a cloud LB will help you. Each has some advantages or additional features
over the others, so choose what works for you: AWS ELB, GCP LB, MSFT Azure LB, etc.

3. HW-based LB: Hardware load balancers are specialized appliances with circuitry designed to
perform load balancing tasks. They are generally very performant and very expensive. Hardware
LBs are generally L4 LBs … F5, A10 Thunder ADC, or Citrix ADC

4. DNS LB: integrates with the Domain Name System (DNS) infrastructure so that a client’s name
lookup for a service (e.g., for “www.google.com”) returns a different IP address to each requester,
corresponding to one of a pool of back-end servers in their geographic location.

3.3 METHODS OF LOAD BALANCING


Load balancing algorithms are generally divided into two groups: static and dynamic. Static algorithms
function the same regardless of the state of the back end serving the requests, whereas dynamic
algorithms take into account the state of the backend and consider system load when routing requests.
The most commonly used classes of algorithms are Round Robin, Least Connections, Least Load, and IP
Hashing.

[Diagrams: Round Robin, Least Connections, Least Load]


1. Static … algorithms are generally simpler and more efficient to implement but can lead to uneven
distribution of requests
• (Weighted) Round Robin: most common and simple
• Randomized: simple and stateless
• Hashing: caching, stickiness

2. Dynamic … algorithms are more complex and entail more overhead, as they require
communication between the LB and the back-end servers, but can more efficiently distribute
requests (see the sketch after this list)
• Least Load: optimal use of resources
• Power-of-d-choices: efficient when “d” is low, works with multiple LBs
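
To make the two groups concrete, here is a minimal Python sketch of one static policy (round robin) and one dynamic policy (power-of-two-choices, i.e., d = 2). The server addresses are hypothetical, and a real LB would track connection counts from live traffic rather than a local dict.

```python
import itertools
import random

class RoundRobin:
    """Static: cycle through servers regardless of their current load."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class PowerOfTwoChoices:
    """Dynamic: sample d=2 servers and route to the one with fewer
    active connections (the 'power-of-d-choices' policy above)."""
    def __init__(self, servers):
        self.connections = {s: 0 for s in servers}

    def pick(self):
        a, b = random.sample(list(self.connections), 2)
        chosen = a if self.connections[a] <= self.connections[b] else b
        self.connections[chosen] += 1   # caller decrements when the request ends
        return chosen

lb = RoundRobin(["10.0.0.1", "10.0.0.2", "10.0.0.3"])   # hypothetical backends
print([lb.pick() for _ in range(4)])   # cycles: .1, .2, .3, .1
```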

3.4 SECURITY OF LOAD BALANCING


Load balancers are also a natural point of protection against DDoS attacks. Since they generally
prevent server overload by distributing requests well, in the case of a DDoS attack the LB makes it
harder to overload the whole system. They also remove a single point of failure, and therefore make
the system harder to attack.

• Layer 4:
o Only has access to TCP & UDP data
o Faster, but the lack of information can lead to uneven traffic

• Layer 7:
o Full access to the HTTP protocol and data & SSL termination
o Can check authentication & offers smarter routing options
4. CACHING (*)
4.1 OVERVIEW
Caching can exist at any level of a system, from a single CPU to a distributed cluster, to improve
performance and save cost in the long run (e.g., AWS ElastiCache).
• Caching is putting the most frequently requested data in memory instead of on storage media
• Caching happens at multiple layers: DNS, CDN (Content Delivery Network), Web App, Databases
• Cache hit: requested data is found in memory
• Cache miss: requested data is NOT found in memory & must be retrieved from the datastore

4.2 WRITING POLICY


A cache is made of copies of data, and is thus transient storage, so when writing we need to decide
when to write to the cache and when to write to the primary datastore.

• Write-through cache: solves the data inconsistency problem by forcing all writes to update both the
cache and the datastore at the same time. If the cache layer fails, the update isn’t lost because it’s
been persisted. In exchange, the write takes longer to succeed because it must also update the
slower datastore.

• Write-back cache: writes first to the cache and marks the cache block with a dirty bit; the write to
the primary datastore is deferred. The new value will NOT be stored to the datastore until the cache
block is replaced (flushed).

• Write-around cache: writes directly to the primary datastore, and the cache checks with the
datastore to keep the cached data valid. If the application is accessing the newest data, the cache might
be behind, but the write doesn’t have to wait on two systems being updated and the primary
datastore is always up to date.
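
Here is a minimal sketch of two of these policies, assuming the primary datastore can be treated as a dict-like object; in practice the cache and datastore are separate networked systems.

```python
class WriteThroughCache:
    """Write-through: every write updates the cache and the datastore
    together, so a cache failure never loses a persisted write."""
    def __init__(self, datastore):
        self.datastore = datastore       # any dict-like primary store (assumption)
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value          # update the cache ...
        self.datastore[key] = value      # ... and the datastore in the same call

    def read(self, key):
        if key in self.cache:            # cache hit
            return self.cache[key]
        value = self.datastore[key]      # cache miss: fetch from the datastore
        self.cache[key] = value
        return value

class WriteAroundCache(WriteThroughCache):
    """Write-around: writes skip the cache entirely; the cache is
    populated only on reads, so fresh writes may briefly be 'behind'."""
    def write(self, key, value):
        self.datastore[key] = value      # datastore only
        self.cache.pop(key, None)        # invalidate any stale cached copy

db = {}
c = WriteThroughCache(db)
c.write("user:1", {"name": "Ada"})
print(c.read("user:1"), db["user:1"])    # both see the same value
```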
4.3 EVICTION POLICY
The cache replacement policy is an essential part of the success of a caching layer. The replacement
policy (also called the eviction policy) decides what memory to free when the cache is full.

• LRU – Least Recently Used … the least recently used entry is freed … the most popular policy
• LFU – Least Frequently Used … the least frequently used entry is freed
• LFRU – Mixed … combines LFU and LRU to decide which entry is freed
• TTL – Time To Live … an amount of time after which a resource is freed if it hasn’t been used
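
LRU is simple to sketch in Python with an OrderedDict; this is an illustrative in-process cache, not a distributed one.

```python
from collections import OrderedDict

class LRUCache:
    """LRU eviction: when the cache is full, free the least recently
    used entry (the front of the ordered dict)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                      # cache miss
        self.data.move_to_end(key)           # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)    # evict the LRU entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")                # "a" is now the most recently used
cache.put("c", 3)             # evicts "b", the least recently used
print(cache.get("b"))         # None
```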

4.4 DISTRIBUTED CACHE


In a distributed system a caching layer can reside on each machine in the application service cluster, or
it can be implemented in a cluster isolated from the application service.

• Private cache: is only used by one client, only for the IP it was created for. Generally this applies
only to a cache maintained by that client itself, though if you had a proxy that was only being used
by one client it would be possible to configure it to act as a private cache. Typical cases:
1. Data that can only be used by one user/client, such as personal information on a web site
2. Resources such as documents available only to one particular user or authorized users
3. Resources served via the HTTPS protocol
4. Responses with cookies

• Public cache: or “shared” cache is used by more than one client. As such, it gives a greater
performance gain and a much greater scalability gain, as a user may receive cached copies of
representations without ever having obtained a copy directly from the origin server. Typical cases:
1. Data that changes infrequently
2. Data in popular demand (requested frequently)

4.5 CONTENT DELIVERY NETWORK

• Content Delivery Networks cache content on edge servers geographically close to users to reduce
latency … e.g., Akamai
5. DATABASES (*)
5.1 OVERVIEW
A database is an organized collection of structured information, or data, typically stored electronically
in a computer system. The core functions of a database are to store data, update and delete data, and
return data according to a query.

The CAP Theorem says that any distributed database can only satisfy two of the three guarantees:
1. Consistency: every node responds with the most recent version of the data
2. Availability: any node can send a response
3. Partition Tolerance: the system continues working if communication between the nodes is broken

A system is called a CP database if it provides Consistency and Partition Tolerance, and an AP database
if it provides Availability and Partition Tolerance. A transaction is a series of database operations that
are considered a "single unit of work". The operations in a transaction either all succeed or all fail, in
compliance with the ACID properties. SQL databases follow this property, but NoSQL databases
generally do NOT.

5.2 DBASE SCHEMAS


A SQL database schema defines the shape of a data structure & specifies tables, indexes, and data constraints.

• Tuples (ROWs): data sets that apply to one item
• Attributes (COLUMNs): describe a characteristic of each tuple
• Index: a lookup structure built over a column of interest that speeds up queries against the original table
• Data Types: integer, floating point, character, string, Boolean
• Data Constraints:
1. NOT NULL - each value in a column must not be NULL
2. UNIQUE - value(s) in specified column(s) must be unique for each row in a table
3. CHECK - an expression is specified, which must evaluate to true for the constraint to be satisfied
4. PRIMARY KEY: uniquely identifies every row (a record) in the table
5. FOREIGN KEY: provides a link between data in two tables

[Diagram: a table with its tuples (rows), an index, and constraints]
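
A small illustration of these schema elements, using Python's built-in sqlite3 module; the users/photos tables, column names, and sample rows are hypothetical.

```python
import sqlite3

# Two-table schema showing the constraints listed above:
# PRIMARY KEY, FOREIGN KEY, NOT NULL, UNIQUE, CHECK, plus an index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,        -- uniquely identifies each row
        email TEXT    NOT NULL UNIQUE,    -- NOT NULL + UNIQUE constraints
        age   INTEGER CHECK (age >= 0)    -- CHECK constraint
    );
    CREATE TABLE photos (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),  -- FOREIGN KEY link
        url     TEXT    NOT NULL
    );
    -- Index over the column of interest to speed lookups on photos.
    CREATE INDEX idx_photos_user ON photos(user_id);
""")
conn.execute("INSERT INTO users (email, age) VALUES (?, ?)", ("ada@example.com", 36))
conn.execute("INSERT INTO photos (user_id, url) VALUES (?, ?)", (1, "s3://bucket/p1.jpg"))
# A relational query linking the two tables:
print(conn.execute(
    "SELECT email FROM users JOIN photos ON users.id = photos.user_id"
).fetchone())
```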
5.3 DBASE SCALING
Different databases are better and worse at scaling because of the features they provide. There are
two kinds of scaling:
• Vertical Scaling: is fairly straightforward (add more powerful hardware) but has much lower overall
capacity, and there is a point of diminishing returns on investment
1. Adding compute resources: CPU
2. Adding memory resources: RAM, Disk, SSD, NICs

• Horizontal Scaling: on the other hand, has much higher overall compute and storage capacity, and
can be sized dynamically without downtime, by adding an LB & more nodes
1. Replication: when the database has too much load -> 1 master node & N slave nodes
2. Sharding: when the database has too much data -> split the database into N shards

[Diagram: Vertical Scaling vs. Horizontal Scaling]

5.4 SQL DBASE … MySQL, PostgreSQL, MSFT SQL, AWS Aurora

A relational database typically stores information in tables containing specific pieces and types of data.
This form of data storage is often called structured data. A relational database organizes data into
tables which can be linked (or related) based on data common to each. This capability enables you to
retrieve an entirely new table from data in one or more tables with a single query.

Typical fits: small projects at low scale with unknown access patterns, and large projects at high scale
with relational queries served by read replicas.

When to use SQL:

• When access patterns aren’t defined
• When you need to perform flexible & relational queries
• When you need to enforce field constraints
5.5 NOSQL DBASE … MongoDB, ElasticSearch, DynamoDB, HBase, Cassandra
The major purpose of using a NoSQL database is for distributed data stores with humongous data
storage needs. NoSQL is used for Big Data and real-time web apps such as Twitter, Facebook, and
Google. The obvious advantage of a non-relational database is the ability to store and process large
amounts of unstructured data. As a result, it can process ANY type of data without needing to modify
the architecture. So, creating and maintaining a NoSQL database is faster and cheaper.

Typical fits: when you need high performance & low latency, and medium/large projects at high scale
with high performance requirements.

When to use NoSQL:

• When access patterns are defined
• When the primary key is known
• When the data model fits (e.g., graphs)

5.6 DBASE TYPES

Types of NoSQL databases:

• Document Databases: these DBs usually pair each key with a complex data structure
called a document. Documents can contain key-array pairs or key-value pairs or even nested
documents. Examples: MongoDB, RavenDB, Couchbase, ArangoDB, CouchDB

• Key-value stores: every single item is stored as a key-value pair. Key-value stores are the
simplest databases among all NoSQL DBs (see the sketch after this list). Examples: Redis,
Memcached, Apache Ignite, Riak

• Wide-column stores: these types of DBs are optimized for queries over large datasets, and
instead of rows, they store columns of data together. Examples: Cassandra, HBase, Scylla

• Graph stores: these store information about graphs and networks, such as social connections, road
maps, and transport links. Examples: Neo4j, AllegroGraph
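
As a quick illustration of the key-value model, here is a hedged sketch using the redis-py client; it assumes a Redis server is listening on localhost:6379, and the key name and TTL are arbitrary.

```python
# Minimal key-value interaction with Redis via the redis-py client.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
r.set("session:42", "user=ada", ex=3600)   # store a value with a 1-hour TTL
print(r.get("session:42"))                 # -> "user=ada"
```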
6. SHARDING (*)
6.1 OVERVIEW
Sharding is essentially the horizontal scaling of a database system that is accomplished by breaking the
database up into smaller “shards”, which are separate database servers that all contain a subset of the
overall dataset. Database sharding splits up data in a particular way so that data access patterns can
remain as efficient as possible. A shard is a horizontal partition, meaning the database table is split up
by drawing a horizontal line between rows.
Replication: when the database has too much load -> replicate the database from a master node to slave nodes
Sharding: when the database has too much data -> split the database into multiple shards

Optimization Techniques:
1. Scaling up HW:
• CPU
• Memory
• Disk

2. Adding Replicas (Read): writes go to the master (M) node and are copied to replica (R) nodes,
while reads are served by the replicas. This gives “eventual consistency”: reads can return stale
data because of the delay propagating writes from the master to the replicas while read requests
come in. Not acceptable for a financial app …!

3. Sharding the Database … splitting the master database into shards S1, S2, S3 … Sn, with a routing
layer directing requests to the correct shard.
• Each row is assigned a shard key that maps it to the logical shard it can be found on. More than
one logical shard can be located on the same physical shard, but a logical shard can’t be split
between physical shards.
• Relational data models are optimized for data with many cross-table relationships. If shards aren’t
isolated, the cluster will spend a lot of time on multi-shard queries and upholding multi-shard
consistency. The process of making sure there aren’t relational constraints between different
shards is called denormalization. Denormalization is achieved by duplicating data (which adds
write complexity) and grouping data so that it can be accessed in a single row.

6.2 SHARDING ADVANTAGES

PROs:
• Scalability
• Availability
• Fault Tolerance

CONs:
• Complexity:
1. Partition Mapping
2. Routing
3. Non-Uniformity (requiring re-sharding)
• Analytic Queries

6.3 SHARD KEY
The different approaches to sharded architectures are based on how the shard key is assigned.
Regardless of what they're derived from, shard keys need to be unique across shards, so their values
need to be coordinated. This leads to a tradeoff between a centralized "name server" that can
dynamically optimize logical shards for performance, and a predetermined distributed algorithm that
is faster to compute. Here are the three most common approaches to sharding:
• Range: data can be assigned to a shard based on what "range" it falls into. For example, a
database with sequential time-based data like log history could shard based on month ranges.

• Hash: to address the issue of imbalanced shards, data can be distributed based on a hash of
part of the data. An effective hash function will randomize and distribute incoming data to
prevent any access patterns that could overwhelm a node.

• Name Node: the last approach is any implementation that uses a central "name node" to
coordinate the mapping of data to shard keys. What's nice about this approach is it makes the
business logic very clear and easy to update based on usage patterns.
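
A minimal sketch of hash sharding in Python; the shard count and user ids are assumptions. Note that with simple modulo hashing, changing NUM_SHARDS remaps almost every key, which is the re-sharding cost noted above (consistent hashing is the usual mitigation).

```python
import hashlib

NUM_SHARDS = 4   # assumed shard count for illustration

def shard_for(key: str) -> int:
    """Hash sharding: hash part of the data (here, a user id) and take
    it modulo the shard count to pick the shard the row lives on."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

for user in ["alice", "bob", "carol", "dave"]:
    print(user, "-> shard", shard_for(user))
```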
7. Polling, SSE, and WebSockets
7.1 PURPOSES
The HTTP standard is widely supported and very functional. But when your application needs to
transmit continuous information streams or real-time updates to clients, like a collaborative document
editor that shows changes in real time, having to repeatedly make regular HTTP requests will slow
things down. Polling, WebSockets, and server-sent events are all techniques for streaming high
volumes of data to or from a server.

7.2 POLLING
• Short Polling: is the original polling protocol for clients to get regular information updates from a
server. The steps of short polling are:
1. Client sends Server an HTTP request for new information.
2. Server responds with new information, or no information.
3. Client repeats the request at a set interval (e.g. 2s).

• Long Polling: is a more efficient version of short polling (see the sketch below). The steps of long
polling are:
1. Client sends Server an HTTP request for new information.
2. Server waits until there’s new information to respond (a “hanging” response).
3. Client repeats the request as soon as it gets the previous response back.
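
A minimal long-polling client loop using the requests library; the endpoint URL, the ~30-second server hold time, and the handle() callback are assumptions for illustration.

```python
import requests

def handle(update):
    print("got update:", update)

def long_poll(url="https://api.example.com/updates"):   # hypothetical endpoint
    while True:
        try:
            # The server "hangs" until new data exists or ~30s pass,
            # so the client timeout is set a little longer.
            resp = requests.get(url, timeout=35)
            if resp.status_code == 200 and resp.content:
                handle(resp.json())          # new information arrived
            # On an empty response, just loop: re-request immediately.
        except requests.exceptions.Timeout:
            pass                             # no news; re-issue the request

# long_poll()  # runs forever; uncomment to start polling
```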

7.3 SERVER SENT EVENTS


• Server-Sent Events: provide a one-way connection for a server to push new data to a client,
without reestablishing a connection every time. For example, a social media app could use SSE to
push new posts to a user feed as soon as they’re available. SSE connections follow the EventSource
interface, which uses HTTP for the underlying communications. At a high level, the steps of
SSE are:

1. Client creates a new EventSource object targeting the server


2. Server registers SSE connection
3. Server sends new data to the client
4. Client receives messages with EventSource handlers
5. Either side closes the connection
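
Here is a minimal SSE endpoint sketch using Flask; the /stream route and the event payload are assumptions. A browser client would consume it with new EventSource("/stream").

```python
import time
from flask import Flask, Response

app = Flask(__name__)

def event_stream():
    n = 0
    while True:
        n += 1
        # SSE wire format: each message is "data: <payload>\n\n".
        yield f"data: post #{n} is available\n\n"
        time.sleep(2)

@app.route("/stream")
def stream():
    # Streaming response keeps the connection open; the server
    # pushes new events without the client re-requesting.
    return Response(event_stream(), mimetype="text/event-stream")

if __name__ == "__main__":
    app.run(port=8000)
```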

7.4 WEBSOCKETS
• WebSockets: a two-way message-passing protocol based on TCP. WebSockets are faster for data
transmission than HTTP because the protocol has less overhead and operates at a lower level in the
network stack. At a high level, the steps of a WebSockets connection are:

1. Client and Server establish a connection over HTTP, which is then upgraded via the WebSockets handshake.
2. Messages are transmitted bidirectionally over port 443 (or port 80 if not TLS encrypted).
3. Either side closes the connection.
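
A sketch of a two-way exchange using the websockets asyncio library; the echo server URL is an assumption (any WebSocket echo endpoint would do).

```python
import asyncio
import websockets

async def chat():
    # The library performs the HTTP -> WebSocket upgrade handshake.
    async with websockets.connect("wss://echo.websocket.org") as ws:  # assumed test endpoint
        await ws.send("hello")     # client -> server
        reply = await ws.recv()    # server -> client, same socket
        print("received:", reply)

asyncio.run(chat())
```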
8. Queues & pub-sub
8.1 DEFINITION
Queues and pub-sub are both mechanisms that allow a system to process messages asynchronously,
which allows the sender and receiver of a message to work independently by providing a
“middleman”. This can eliminate bottlenecks and help the system operate more efficiently.

8.2 MESSAGING-ORIENTED MIDDLEWARE


Any complex system will have different components, possibly running entirely different hardware and
software, which need to be able to communicate with each other. Messaging-oriented middleware
(MOM) enables this communication, much like the post office enables people to send each other
letters. Producers hand off packets of data called messages to the MOM which makes sure it’s
delivered to the correct consumers.

Message passing allows components to communicate asynchronously. In other words, a producer can
send messages independently of the state of the consumer. If the consumer is too busy or offline, a
MOM will make sure the messages are delivered once the consumer becomes available again.

Asynchronicity enables system components to be decoupled from each other. This adds resilience
because, when one component fails, the others can continue functioning normally. It also adds data
integrity because successful message passing isn’t dependent on the producer and consumer being
responsive at the same time.

The software that implements a MOM can be called a message broker. Message brokers may
implement just one, or several different kinds of message passing including both queues and pub-sub.
Let’s take a look at how queues and pub-sub work.

8.3 MESSAGES QUEUES


Message queues are a kind of messaging-oriented middleware where producers push new messages to
a named First-In, First-Out (FIFO) queue, which consumers can then pull from.

Message queues are also called point-to-point messaging because there is a one-to-one relationship
between a message’s producer and consumer. There can be many producers and consumers using the
same queue, but any particular message will only have one producer and one consumer.

Different queue implementations will vary in how much space the queue has, whether or not messages
are batched, and how long a message is kept for if it isn’t consumed.
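
A point-to-point queue can be sketched in-process with Python's thread-safe queue.Queue; real message brokers do the same thing across machines. The sentinel message used to stop the consumer is an assumption of this sketch.

```python
import queue
import threading

q = queue.Queue(maxsize=100)       # bounded queue space

def producer():
    for i in range(5):
        q.put(f"message {i}")      # push to the FIFO queue
    q.put(None)                    # sentinel: no more messages

def consumer():
    # Each message is pulled exactly once: point-to-point delivery.
    while (msg := q.get()) is not None:
        print("consumed:", msg)
        q.task_done()

threading.Thread(target=producer).start()
threading.Thread(target=consumer).start()
```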

8.4 PUB-SUB
The publish-subscribe pattern, also called pub-sub, is a kind of messaging-oriented middleware that
pushes a producer’s newly “published” messages based on a “subscription” of the consumer’s
preferences.

There is a one-to-many relationship between publishers and subscribers, meaning any number of
subscribers can get a copy of a message, but there’s only one publisher of that message. Pub-sub
doesn’t guarantee message order, just that consumers will only see messages that they’ve subscribed
to.
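
A minimal in-memory sketch of the pattern; real pub-sub systems deliver over the network and persist messages, but the one-to-many fan-out is the same. Topic names and payloads here are hypothetical.

```python
from collections import defaultdict

class PubSub:
    """Minimal broker: one published message is pushed to every
    subscriber of its topic (one-to-many delivery)."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)      # each subscriber gets a copy

broker = PubSub()
broker.subscribe("posts", lambda m: print("feed A got:", m))
broker.subscribe("posts", lambda m: print("feed B got:", m))
broker.publish("posts", "new photo from ada")   # delivered to both
broker.publish("ads", "unseen")                 # no subscribers: dropped
```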

9. Leader Election
9.1 LEADER ELECTION BASICS
Sometimes horizontally scaling a system is as simple as spinning up a cluster of nodes and letting each
node respond to whatever subset of the incoming requests they receive. At other times, the task at
hand requires more precise coordination between the nodes and it’s helpful to have a leader node
directing what the follower nodes work on.

9.2 LEADER ELECTION ALGORITHMS


• Bully Algorithm: a simple synchronous leader election algorithm. This algorithm requires that each
node has a unique numeric id, and that nodes know the ids of all other nodes in the cluster. The
election process starts when a node starts up or when the current leader fails a healthcheck. There
are two cases (see the sketch below):
1. If the node has the highest id, it declares itself the winner and sends this message to the rest of the
nodes.
2. If the node has a lower id, it messages all nodes with higher ids and, if it doesn't get a response, it
assumes all of them have failed or are unavailable, and declares itself the winner.
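
A simplified, single-process sketch of the bully election; node ids and the live-node set are assumptions, and the network message passing is collapsed into direct method calls.

```python
class Node:
    """Bully election sketch. Real implementations exchange election
    messages over the network; here, higher-id nodes 'responding' is
    modeled as them simply being alive."""
    def __init__(self, node_id):
        self.id = node_id
        self.cluster = []        # all nodes, including self
        self.alive = True

    def start_election(self):
        higher = [n for n in self.cluster if n.id > self.id and n.alive]
        if not higher:
            return self.id       # case 1: highest live id declares itself winner
        # case 2: a higher id responded, so defer; the highest live id wins
        return max(n.id for n in higher)

nodes = [Node(i) for i in (1, 2, 3)]
for n in nodes:
    n.cluster = nodes
nodes[2].alive = False           # the current leader (id 3) fails
print(nodes[0].start_election()) # node 1 defers: node 2 becomes leader
```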

• Paxos: uses state machine replication to model the distributed system, and then chooses a leader by
having some nodes propose a leader and some nodes accept proposals. When a quorum of (enough
of) the accepting nodes choose the same proposed leader, that proposed leader becomes the actual
leader.

• Raft: each node keeps track of the current "election term". When leader election starts, each node
increments its copy of the term number and listens for messages from other nodes. After a random
interval, if the node doesn't hear anything, it will become a candidate leader and ask other nodes for
votes.

• Apache ZooKeeper (ZAB, ZooKeeper Atomic Broadcast): the protocol used by Apache ZooKeeper to
handle leader election, replication order guarantees, and node recovery. It is named for the way the
leader “broadcasts” state changes to followers to make sure writes are consistent and propagated to
all nodes. ZAB is an asynchronous algorithm.

9.3 ALTERNATIVES TO LEADER ELECTION


Alternatives to leader election are based on the premise that coordination is possible without a
dedicated leader node, thus achieving the primary function of leader election with lower
implementation complexity. Here’s a brief overview of three of the most notable alternatives:

• Locking
• Idempotent APIs
• Workflow Engines
