Data Engineering System Design
Core Concepts

Shwetank Singh
GritSetGrow - GSGLearn.com
DATA ENGINEERING - SYSTEM DESIGN

WHAT IS SYSTEM DESIGN?

System design is the process of defining the architecture, components, modules, interfaces, and data for a system to satisfy specified requirements. It involves both high-level architecture and detailed design.

Example: designing a news feed system for a social media application.

WHAT IS A SCALABLE SYSTEM?

A scalable system is one that can handle increased load by adding resources, without compromising performance.

Example: horizontal scaling, which adds more servers to handle increased traffic.

HOW DO YOU APPROACH A SYSTEM DESIGN INTERVIEW?

Understand the requirements, define the scope, outline a high-level architecture, dive into the detailed design of key components, and address potential bottlenecks and trade-offs.

Example: designing a URL shortener. Start with the core functionality, then address storage, scalability, and fault tolerance.

WHAT IS THE DIFFERENCE BETWEEN HORIZONTAL AND VERTICAL SCALING?

Horizontal scaling (scaling out) involves adding more machines to a system, while vertical scaling (scaling up) involves adding more power (CPU, RAM) to an existing machine.

Example: adding more servers to a web application (horizontal) vs. upgrading the server's hardware (vertical).

WHAT ARE THE ADVANTAGES OF HORIZONTAL SCALING?

It improves fault tolerance, makes load easier to distribute, and avoids the hardware limits of a single machine.

Example: distributing web traffic across multiple servers using a load balancer.

WHAT IS A LOAD BALANCER?

A load balancer distributes incoming network traffic across multiple servers so that no single server becomes overwhelmed.

Example: using an AWS Elastic Load Balancer to manage web traffic.

HOW DOES A LOAD BALANCER IMPROVE SYSTEM RELIABILITY?

Because traffic is spread across several servers, the load balancer can redirect traffic to the remaining healthy servers when one fails. This keeps the service available. Failed servers are typically detected through periodic health checks.

Example: in a web application with three servers, if one fails, the load balancer redirects traffic to the remaining two.
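
The failover behaviour can be sketched as a round-robin balancer that skips unhealthy servers. This is a minimal illustration, not how any particular product works. The class name `RoundRobinBalancer` and the server names are hypothetical, and marking a server down stands in for a failed health check.

```python
import itertools

class RoundRobinBalancer:
    """Hypothetical round-robin balancer that skips servers marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        # In practice this would be triggered by failed health checks.
        self.healthy.discard(server)

    def mark_up(self, server):
        self.healthy.add(server)

    def next_server(self):
        # Visit each server at most once per call so we never loop forever.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
lb.mark_down("app-2")
print([lb.next_server() for _ in range(4)])  # ['app-1', 'app-3', 'app-1', 'app-3']
```

Traffic keeps flowing to the two remaining servers. A real load balancer would also bring `app-2` back through `mark_up` once its health checks pass again.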

WHAT IS CACHING?

Caching is a technique that stores frequently accessed data in fast, temporary storage (often memory), so that later reads do not have to go back to the slower original source.

Example: using Redis to cache database query results.
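
A common way to use such a cache is the cache-aside pattern: check the cache, and on a miss read from the database and populate the cache. The sketch below makes some assumptions. An in-memory dictionary with a time-to-live stands in for Redis, and `get_user` and `fake_db` are hypothetical names.

```python
import time

class TTLCache:
    """In-memory stand-in for a cache such as Redis, with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: evict lazily on read
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)


def get_user(user_id, cache, db_query):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    row = db_query(user_id)  # slow path, e.g. a SQL query
    cache.set(key, row)      # populate the cache for later reads
    return row


calls = []
def fake_db(uid):
    calls.append(uid)
    return {"id": uid, "name": "Ada"}

cache = TTLCache(ttl_seconds=60)
get_user(1, cache, fake_db)
get_user(1, cache, fake_db)
print(len(calls))  # 1: the second read is served from the cache
```

The TTL bounds how long a stale entry can be served after the underlying row changes.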

WHAT ARE THE TYPES OF CACHES?

There are client-side caches, server-side caches, and distributed caches.

Examples: a browser cache (client-side), Memcached (server-side), and Redis (distributed).

WHAT IS A CDN (CONTENT DELIVERY NETWORK)?

A CDN is a geographically distributed network of servers that delivers static content to users from the nearest server location, reducing latency.

Example: using Cloudflare CDN to serve static assets such as images and CSS files.

HOW DOES A CDN IMPROVE PERFORMANCE?

It reduces the distance between users and the content, which decreases load times. It also serves cached copies, which reduces the traffic and bandwidth that reach the origin server.

Example: serving a video from a server located closer to the user, which results in faster loading times.

WHAT IS DATABASE REPLICATION?

Database replication is the process of copying data from one database server (the primary, or master) to one or more others (replicas, or slaves). It provides high availability and fault tolerance.

Example: using MySQL replication to maintain a backup database server.

WHAT ARE THE BENEFITS OF DATABASE REPLICATION?

It improves read performance, increases availability, and provides data redundancy.

Example: a primary-replica (master-slave) setup in which the primary handles writes and the replicas handle reads.
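
The write/read split can be sketched as a small router that sends writes to the primary and rotates reads across replicas. This is a minimal sketch. `ReplicatedRouter` and the host names are hypothetical, and treating every `SELECT` as a read is a simplifying assumption.

```python
import itertools

class ReplicatedRouter:
    """Hypothetical router: writes go to the primary, reads rotate across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        # Naive classification: only statements starting with SELECT count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReplicatedRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.route("INSERT INTO users VALUES (1, 'Ada')"))  # db-primary
print(router.route("SELECT * FROM users"))                   # db-replica-1
print(router.route("SELECT * FROM users"))                   # db-replica-2
```

With asynchronous replication, a read sent to a replica may not yet see a write that was just committed on the primary. Reads that must see the latest data can be routed to the primary instead.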

WHAT IS SHARDING?

Sharding is a database partitioning technique that divides a large database into smaller, more manageable pieces, called shards, which can be distributed across multiple servers.

Example: splitting user data across different databases based on user ID ranges.
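
Range-based sharding by user ID comes down to looking up which range contains the ID. A minimal sketch follows. The range boundaries and shard names are hypothetical, and the lookup uses Python's `bisect` module over sorted upper bounds.

```python
import bisect

# Hypothetical shard map: each entry is the exclusive upper bound of a user-ID range.
SHARD_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]
SHARD_NAMES = ["users_shard_0", "users_shard_1", "users_shard_2"]

def shard_for_user(user_id: int) -> str:
    """Return the shard that holds user_id under range-based sharding."""
    idx = bisect.bisect_right(SHARD_UPPER_BOUNDS, user_id)
    if idx >= len(SHARD_NAMES):
        raise ValueError(f"user_id {user_id} is outside all configured ranges")
    return SHARD_NAMES[idx]


print(shard_for_user(42))         # users_shard_0
print(shard_for_user(1_500_000))  # users_shard_1
```

Range sharding keeps neighbouring IDs together, which helps range scans. However, if new IDs are assigned sequentially, all new writes land on the last shard and create a hotspot.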

WHAT ARE THE CHALLENGES OF SHARDING?

Challenges include maintaining data consistency across shards, running complex queries that span several shards, and re-sharding data when a shard grows too large.

Mitigation example: using a consistent hashing algorithm to distribute data evenly across shards and to limit how much data moves when shards are added or removed.
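
Consistent hashing places both shards and keys on a hash ring. Each key belongs to the first shard found clockwise from the key's position. The sketch below is a minimal illustration under stated assumptions:

- `ConsistentHashRing` and the shard names are hypothetical.
- Virtual nodes are used to smooth the distribution.
- MD5 serves only as a uniform hash, not for security.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (a sketch, not production code)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []   # sorted hash positions
        self._owner = {}  # hash position -> physical node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add_node(self, node):
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            bisect.insort(self._ring, pos)
            self._owner[pos] = node

    def remove_node(self, node):
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            self._ring.remove(pos)
            del self._owner[pos]

    def node_for(self, key: str):
        # Walk clockwise to the first virtual node at or after the key's hash.
        pos = self._hash(key)
        idx = bisect.bisect_left(self._ring, pos) % len(self._ring)
        return self._owner[self._ring[idx]]


ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("shard-d")
moved = sum(before[k] != ring.node_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")
```

Adding a fourth shard moves roughly a quarter of the keys, and every moved key goes to the new shard. Compare plain `hash % N` placement: going from 3 to 4 shards would move about three quarters of the keys.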

WHAT IS A MESSAGE QUEUE?

A message queue is a component that lets processes communicate by sending and receiving messages asynchronously. Producers add messages to the queue, and consumers process them later at their own pace.

Example: using RabbitMQ to manage background tasks in a web application.

WHAT ARE THE BENEFITS OF USING A MESSAGE QUEUE?

It enables asynchronous processing, improves system resilience, and decouples system components.

Example: processing user sign-up emails in the background while the main application continues to handle user requests.
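
The sign-up email example can be sketched within one process using Python's standard `queue.Queue`. Here the queue stands in for a broker such as RabbitMQ, which would add durability and delivery across processes and machines. The function names and the `sent` list are hypothetical stand-ins for a request handler and an email service.

```python
import queue
import threading

email_queue = queue.Queue()
sent = []  # stands in for a real email service

def email_worker():
    """Consumer: takes sign-up events off the queue and 'sends' emails."""
    while True:
        address = email_queue.get()
        if address is None:  # sentinel value tells the worker to stop
            email_queue.task_done()
            break
        sent.append(f"Welcome email to {address}")
        email_queue.task_done()

def handle_signup(address):
    """Producer: the request handler enqueues the work and returns immediately."""
    email_queue.put(address)
    return "201 Created"


worker = threading.Thread(target=email_worker, daemon=True)
worker.start()
for user in ["ada@example.com", "alan@example.com"]:
    handle_signup(user)
email_queue.put(None)
worker.join()
print(sent)
```

The handler returns without waiting for the email to be sent. This decoupling lets the web tier stay responsive even when the email service is slow.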

WHAT IS A MICROSERVICES ARCHITECTURE?

A microservices architecture is an architectural style that structures an application as a collection of loosely coupled services, each with its own functionality and data storage.

Example: breaking down a monolithic e-commerce application into individual services for inventory, payment, and user management.

WHAT ARE THE BENEFITS OF MICROSERVICES?

Benefits include improved modularity, easier scaling, and better fault isolation.

Example: scaling the payment service independently of the inventory service in an e-commerce application.

WHAT IS A MONOLITHIC ARCHITECTURE?

A monolithic architecture builds and deploys an application as a single unit in which all components are interconnected and interdependent.

Example: a traditional web application whose frontend, backend, and data access code are tightly integrated and deployed together.

WHAT ARE THE DRAWBACKS OF A MONOLITHIC ARCHITECTURE?

Drawbacks include difficulty scaling individual components, tight coupling, and challenges in maintaining and deploying the application.

Example: updating one part of the application requires redeploying the entire system.

WHAT IS EVENTUAL CONSISTENCY?

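Eventual consistency is a consistency model used in distributed systems in which updates propagate to all replicas over time rather than immediately. If no new updates are made, all replicas eventually converge to the same value. In the meantime, reads may return stale data.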

HOW DO YOU ENSURE DATA CONSISTENCY IN A DISTRIBUTED SYSTEM?

Techniques include consensus algorithms (e.g., Paxos, Raft), distributed transactions, and idempotent operations.

Example: using the Raft algorithm to manage leader election and data replication.

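WHAT IS THE CAP THEOREM?

The CAP theorem states that a distributed data store cannot simultaneously guarantee all three of the following: Consistency, Availability, and Partition tolerance. Network partitions cannot be ruled out in practice, so the real choice arises when a partition happens: the system must then give up either consistency or availability.

Example: choosing between strong consistency and availability in a distributed database during network partitions.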

WHAT IS A KEY-VALUE STORE?

A key-value store is a type of NoSQL database that stores data as a collection of key-value pairs.

Example: using Redis or Amazon DynamoDB to store session data.

WHAT ARE THE ADVANTAGES OF KEY-VALUE STORES?

They offer fast reads and writes by key, are easy to scale, and provide flexible data models.

Example: storing user preferences in Redis allows quick retrieval.

WHAT IS A DOCUMENT STORE?

A document store is a type of NoSQL database that stores data as documents, typically in JSON or BSON format.

Example: using MongoDB to store user profiles with varying attributes.

WHAT IS A GRAPH DATABASE?

A graph database is designed to store and query data modeled as a graph of nodes, edges, and properties.

Example: using Neo4j to manage social network data with complex relationships.

WHAT ARE THE BENEFITS OF GRAPH DATABASES?

They excel at handling complex relationships, enable efficient traversal queries, and provide a flexible schema.

Example: finding shortest paths and recommendations in a social network using Neo4j.

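WHAT IS A COLUMN-FAMILY STORE?

A column-family (wide-column) store is a type of NoSQL database that groups data into column families. Each row is identified by a key and can hold a large, flexible set of columns. These stores are optimized for high-volume writes and for reads by key.

Example: using Apache Cassandra for time-series data storage.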

WHAT ARE THE BENEFITS OF COLUMN-FAMILY STORES?

They offer high write throughput and horizontal scalability, and they are well suited to time-series data and real-time analytics.

Example: storing and retrieving logs in Apache Cassandra.

WHAT IS A DISTRIBUTED FILE SYSTEM?

A distributed file system (DFS) stores files across multiple networked machines and makes them accessible from multiple hosts. It typically replicates data to provide redundancy and fault tolerance.

Example: using the Hadoop Distributed File System (HDFS) for big data storage.

WHAT IS A DATA WAREHOUSE?

A data warehouse is a centralized repository for storing large volumes of structured and semi-structured data, optimized for querying and analysis.

Example: using Amazon Redshift or Google BigQuery for business analytics.

WHAT IS ETL (EXTRACT, TRANSFORM, LOAD)?

ETL is a data warehousing process that extracts data from various sources, transforms it to fit analytical or operational needs, and loads it into a target data store.

Example: using Apache NiFi to extract data from databases, transform it, and load it into HDFS.
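
The three ETL steps can be sketched with the Python standard library. This is a simplified stand-in for a pipeline tool such as NiFi:

- An inline CSV string plays the role of the source system.
- An in-memory SQLite database plays the role of the target store.
- The cleaning rules (drop rows with no amount, upper-case currency codes) are hypothetical.

```python
import csv
import io
import sqlite3

RAW_CSV = """order_id,amount,currency
1,19.99,usd
2,5.00,USD
3,,usd
"""

def extract(text):
    """Extract: read rows from the source (a string here, a file or API in practice)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop incomplete rows, cast types, and normalize currency codes."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue
        clean.append((int(row["order_id"]), float(row["amount"]), row["currency"].upper()))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM orders").fetchall())
# [(1, 19.99, 'USD'), (2, 5.0, 'USD')]
```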

WHAT IS A DATA LAKE?

A data lake is a storage repository that holds vast amounts of raw data in its native format until it is needed.

Example: using Amazon S3 as a data lake for storing structured and unstructured data.

WHAT ARE THE BENEFITS OF A DATA LAKE?

Benefits include scalability, flexibility, and the ability to store diverse data types.

Example: storing log files, images, and structured data together in Amazon S3.

WHAT IS STREAM PROCESSING?

Stream processing processes data continuously, in real time, as it flows from sources to destinations. This allows immediate insights and actions.

Example: using Apache Kafka and Apache Flink to process real-time clickstream data.

WHAT ARE THE ADVANTAGES OF STREAM PROCESSING?

It enables real-time analytics, reduces data latency, and supports event-driven architectures.

Example: real-time fraud detection in financial transactions using Apache Kafka and Apache Storm.

WHAT IS BATCH PROCESSING?

Batch processing involves processing data in large blocks, or batches, at scheduled intervals.

Example: using Apache Hadoop to run nightly ETL jobs.

WHAT IS THE DIFFERENCE BETWEEN BATCH PROCESSING AND STREAM PROCESSING?

Batch processing handles large volumes of accumulated data at intervals. Stream processing handles data continuously, in real time, as it arrives.

Example: using Apache Hadoop for batch processing, and Apache Kafka with a stream processor such as Kafka Streams or Apache Flink for stream processing.
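
The contrast can be shown by computing the same page-view counts both ways. The batch version processes the accumulated dataset once. The streaming version updates its result after every event. The event list and function names are hypothetical, and a real system would use a framework rather than plain generators.

```python
from collections import Counter
from typing import Iterable, Iterator

events = ["home", "cart", "home", "checkout", "home"]  # hypothetical page-view events

def batch_page_counts(batch: Iterable[str]) -> Counter:
    """Batch: process the whole accumulated dataset at once (e.g. a nightly job)."""
    return Counter(batch)

def stream_page_counts(stream: Iterable[str]) -> Iterator[Counter]:
    """Stream: update the result incrementally and emit it after every event."""
    counts = Counter()
    for page in stream:
        counts[page] += 1
        yield Counter(counts)  # an up-to-date snapshot is available immediately


print(batch_page_counts(events))
for snapshot in stream_page_counts(events):
    print(dict(snapshot))
```

Once every event has been seen, the final streaming snapshot equals the batch result. The difference is how soon an answer is available, not the answer itself.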

WHAT IS A LAMBDA ARCHITECTURE?

A lambda architecture is a data processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream processing methods.

Example: combining Apache Hadoop for batch processing and Apache Storm for real-time processing in a lambda architecture.

WHAT ARE THE COMPONENTS OF A LAMBDA ARCHITECTURE?

There are three layers:
- a batch layer that periodically processes the complete dataset;
- a speed layer that processes recent data in real time;
- a serving layer that merges the results of both layers to answer queries.

Example: using Hadoop for the batch layer, Kafka to feed real-time data into the speed layer, and HBase as the serving layer.
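
The serving layer's merge step can be sketched for an additive metric such as page-view counts. The batch view covers everything up to the last batch run, and the real-time view covers only events that arrived afterwards. Both views and their contents are hypothetical, and plain addition works only because the counts are additive.

```python
# Hypothetical precomputed views keyed by page.
batch_view = {"home": 10_000, "cart": 2_500}   # from the last batch-layer run
realtime_view = {"home": 42, "checkout": 7}    # from the speed layer since that run

def serve_page_views(page: str) -> int:
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch_view.get(page, 0) + realtime_view.get(page, 0)


print(serve_page_views("home"))      # 10042
print(serve_page_views("checkout"))  # 7
```

When the next batch run completes, its view absorbs the recent events, so the corresponding part of the real-time view can be discarded.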

WHAT IS THE KAPPA ARCHITECTURE?

The kappa architecture simplifies the lambda architecture by using only stream processing for both real-time and historical data. Historical results are recomputed by replaying the retained event log through the same streaming pipeline.

Example: using Apache Kafka and Apache Flink to process all data as streams.

WHAT IS A NOSQL DATABASE?

A NoSQL database is a non-relational database designed to handle large volumes of unstructured or semi-structured data with flexible schemas.

Example: using MongoDB or Cassandra to store large-scale data with varying structures.

WHAT ARE THE TYPES OF NOSQL DATABASES?

Types include key-value stores, document stores, column-family stores, and graph databases.

Examples: Redis (key-value), MongoDB (document), Cassandra (column-family), and Neo4j (graph).

WHAT IS A RELATIONAL DATABASE?

A relational database organizes data into tables of rows and columns and uses SQL for data management.

Example: using MySQL or PostgreSQL for structured data storage.

WHAT ARE THE BENEFITS OF RELATIONAL DATABASES?

They provide strong consistency, ACID transactions, and support for complex queries and relationships.

Example: using SQL joins to query related data across multiple tables in PostgreSQL.

WHAT IS ACID COMPLIANCE?

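ACID stands for Atomicity, Consistency, Isolation, and Durability, the properties that ensure reliable database transactions:
- Atomicity: a transaction completes fully or not at all.
- Consistency: it leaves the database in a valid state.
- Isolation: it does not see the intermediate state of concurrent transactions.
- Durability: its changes persist once it commits.

Example: a banking system using ACID transactions to ensure accurate fund transfers.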

WHAT IS BASE CONSISTENCY?

BASE stands for Basically Available, Soft state, and Eventual consistency. These are properties of some NoSQL databases that prioritize availability over strict consistency.

Example: using Cassandra for high availability and eventual consistency in a distributed database.

WHAT IS A DISTRIBUTED SYSTEM?

A distributed system is a system whose components are located on different networked computers, which communicate and coordinate to achieve a common goal.

Example: a microservices architecture with services running on multiple servers.

WHAT ARE THE CHALLENGES OF DISTRIBUTED SYSTEMS?

Challenges include network latency, fault tolerance, data consistency, and synchronization.

Example: implementing consensus algorithms such as Raft to manage distributed state.

WHAT IS FAULT TOLERANCE?

Fault tolerance is the ability of a system to continue operating properly when some of its components fail.

Example: using redundant servers and data replication to ensure high availability.

HOW DO YOU ACHIEVE HIGH AVAILABILITY?

Techniques include load balancers, redundant systems, failover mechanisms, and data replication.

Example: implementing database replication and load balancing for a web application.

WHAT IS A CONSENSUS ALGORITHM?

A consensus algorithm is a protocol that lets distributed processes or systems agree on a single data value, even when some of them fail.

Example: using the Raft algorithm to elect a leader in a distributed system.

WHAT IS THE RAFT ALGORITHM?

Raft is a consensus algorithm designed to be understandable and implementable. It is used to manage a replicated log in distributed systems.

Example: implementing Raft for leader election and log replication in a distributed database.

WHAT IS THE PAXOS ALGORITHM?

Paxos is a family of protocols for solving consensus in a network of unreliable or faulty processors.

Example: using Paxos to achieve consensus in a distributed system with unreliable nodes.

WHAT IS LEADER ELECTION?

Leader election is the process of designating a single node as the organizer of a task distributed among several nodes.

Example: using ZooKeeper for leader election in a distributed system.

WHAT IS A QUORUM?

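A quorum is the minimum number of nodes that must agree to, or acknowledge, an operation (such as a vote, a read, or a write) before it counts as successful. In commit protocols, it is the number of votes needed for a distributed transaction to be committed.

Example: using a majority quorum of floor(n/2) + 1 out of n nodes in a distributed database to ensure data consistency.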
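
The majority formula, and the overlap rule behind quorum reads and writes, can be checked with a few lines of arithmetic. N is the number of replicas, W the number that must acknowledge a write, and R the number contacted on a read. If R + W > N, every read set shares at least one replica with every write set. The function names are hypothetical.

```python
def majority_quorum(n: int) -> int:
    """Smallest group that is more than half of n nodes: floor(n/2) + 1."""
    return n // 2 + 1

def quorums_overlap(n: int, r: int, w: int) -> bool:
    """With n replicas, a read quorum r and a write quorum w always share
    at least one replica exactly when r + w > n."""
    return r + w > n


for n in (3, 4, 5):
    print(n, majority_quorum(n))  # 3 -> 2, 4 -> 3, 5 -> 3

print(quorums_overlap(3, 2, 2))  # True: a read always reaches an up-to-date replica
print(quorums_overlap(3, 1, 1))  # False: a read may miss the latest write
```

A majority quorum of n nodes keeps working with up to n - (floor(n/2) + 1) failed nodes. This is why clusters usually have an odd size: 4 nodes tolerate only one failure, the same as 3.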

WHAT IS A WRITE-AHEAD LOG (WAL)?

A write-ahead log is a technique used in databases to ensure data integrity. Every change is first recorded in a durable log and only then applied to the database, so the log can be replayed to recover after a crash.

Example: using the WAL in PostgreSQL to ensure that committed data is not lost during a crash.
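
The core rule, "log first, then apply", can be sketched with a toy key-value store. `TinyKV`, its JSON-lines log format, and the keys are hypothetical. This is not how PostgreSQL implements its WAL, and the sketch does not handle a partially written final record.

```python
import json
import os
import tempfile

class TinyKV:
    """Toy key-value store that appends each change to a log before applying it."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.data = {}
        self._replay()  # rebuild in-memory state from the log on startup

    def _replay(self):
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. Append the change to the log and force it to disk ...
        with open(self.wal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. ... and only then apply it to the in-memory state.
        self.data[key] = value


path = os.path.join(tempfile.mkdtemp(), "kv.wal")
db = TinyKV(path)
db.put("balance:alice", 100)
db.put("balance:alice", 80)
recovered = TinyKV(path)  # simulates a restart after a crash
print(recovered.data)     # {'balance:alice': 80}
```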

WHAT IS STRONG CONSISTENCY?

Strong consistency guarantees that once a write completes, every subsequent read on any node returns that write's value. All nodes therefore present the same data.

Example: using a relational database such as PostgreSQL to ensure strong consistency.

WHAT IS DATA PARTITIONING?

Data partitioning involves dividing a database into smaller, more manageable pieces, which can be stored across multiple servers.

Example: sharding a user database by user ID to distribute data across different servers.

WHAT IS A DISTRIBUTED HASH TABLE (DHT)?

A DHT is a decentralized distributed system that provides a lookup service similar to a hash table, with the key-value pairs spread across multiple nodes.

Example: using a DHT to locate peers for peer-to-peer file sharing in BitTorrent.

WHAT IS A GOSSIP PROTOCOL?

A gossip protocol is a method of peer-to-peer communication in which nodes periodically exchange state information with randomly chosen peers. Information spreads through the cluster and eventually reaches every node.

Example: Cassandra uses a gossip protocol to exchange node state information.

WHAT IS A LEADER-FOLLOWER PATTERN?

The leader-follower pattern is a replication strategy in which one node (the leader) handles all writes and the other nodes (followers) replicate the leader's data.

Example: using leader-follower replication in Kafka for high availability.

WHAT IS A DISTRIBUTED LOCK?

A distributed lock is a mechanism that synchronizes access to a shared resource across multiple nodes in a distributed system, so that only one node uses the resource at a time.

Example: using ZooKeeper to implement distributed locks for coordination.

WHAT IS A HEARTBEAT IN DISTRIBUTED SYSTEMS?

A heartbeat is a periodic signal sent by a node to indicate its presence and operational status to other nodes in a distributed system. If no heartbeat arrives from a node within a timeout, that node is suspected to have failed.

Example: using heartbeats in Kubernetes to monitor node health.
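
A timeout-based failure detector built on heartbeats can be sketched as follows. `HeartbeatMonitor` and the node names are hypothetical, and an injectable fake clock replaces real time so the example runs instantly.

```python
import time

class HeartbeatMonitor:
    """Timeout-based failure detector: a node is suspected dead if it has not
    sent a heartbeat within `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = {}

    def heartbeat(self, node):
        self.last_seen[node] = self.clock()

    def dead_nodes(self):
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


now = [0.0]
monitor = HeartbeatMonitor(timeout=10, clock=lambda: now[0])
monitor.heartbeat("node-a")
monitor.heartbeat("node-b")
now[0] = 8.0
monitor.heartbeat("node-a")  # node-b stays silent
now[0] = 15.0
print(monitor.dead_nodes())  # ['node-b']
```

Choosing the timeout is a trade-off. A short timeout falsely suspects nodes that are merely slow or behind a congested network. A long one delays the detection of real failures.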

WHAT IS PARTITION TOLERANCE?

Partition tolerance is the ability of a distributed system to keep functioning even when network partitions occur and isolate parts of the system from each other.

Example: using eventual consistency to handle network partitions in a distributed database.

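WHAT IS THE DIFFERENCE BETWEEN SYNCHRONOUS AND ASYNCHRONOUS REPLICATION?

Synchronous replication requires that data be written to all replicas before the write is acknowledged. Asynchronous replication acknowledges the write before all replicas are updated.

The trade-off follows from this. Synchronous replication does not lose acknowledged writes if the primary fails, but every write waits for the replicas. Asynchronous replication is faster, but it can lose recent writes or serve stale reads.

Example: synchronous replication in a financial transaction system for data integrity vs. asynchronous replication in a content delivery network for performance.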

WHAT IS A READ REPLICA?

A read replica is a copy of a database used to offload read traffic from the primary database, improving performance and availability.

Example: using read replicas in Amazon RDS to handle increased read traffic.

WHAT IS DATA CONSISTENCY?

Data consistency ensures that data is the same across all nodes in a distributed system after a write operation.

Example: using strong consistency in a distributed database to ensure accurate data retrieval.

WHAT IS A ROLLBACK?

A rollback is the process of undoing the changes made to a database during a transaction when an error occurs, ensuring data integrity.

Example: using rollback in SQL transactions to revert changes if an error occurs during a multi-step process.
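
A fund transfer is a typical multi-step process, and Python's built-in `sqlite3` module is enough to show a rollback. A `CHECK` constraint forbids negative balances, so an overdraft makes the second `UPDATE` fail. The rollback then undoes the first `UPDATE` as well. The table, account names, and `transfer` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money in one transaction; any constraint failure rolls back both updates."""
    try:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # also undoes the credit already applied to dst
        return False


print(transfer(conn, "alice", "bob", 30))   # True
print(transfer(conn, "alice", "bob", 500))  # False: would overdraw alice
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# [('alice', 70), ('bob', 80)]
```

Without the rollback, the failed transfer could leave bob credited with 500 that never left alice's account.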

WHAT IS A DISTRIBUTED CACHE?

A distributed cache is a caching mechanism in which data is stored across multiple nodes to improve scalability and performance.

Example: using Redis Cluster to distribute cache data across multiple nodes.

WHAT IS A CIRCUIT BREAKER PATTERN?

The circuit breaker pattern is a design pattern used to detect failures and prevent cascading failures in distributed systems. After repeated failures, the breaker "opens" and stops the flow of requests to the failing service. After a cooldown period it lets a trial request through to check whether the service has recovered.

Example: implementing a circuit breaker in a microservices architecture to handle service failures.
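
The pattern can be sketched as a small state machine with three states:
- Closed: calls pass through, and consecutive failures are counted.
- Open: once the count reaches a limit, calls are rejected immediately.
- Half-open: after a cooldown, one trial call decides whether the breaker closes again or re-opens.

`CircuitBreaker`, `CircuitOpenError`, and the thresholds are hypothetical. The sketch is single-threaded and is not any particular library's API.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are rejected without being attempted."""

class CircuitBreaker:
    """Minimal circuit breaker: opens after `max_failures` consecutive failures;
    after `reset_timeout` seconds a trial call is allowed (half-open), which
    closes the breaker on success or re-opens it on failure."""

    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("service unavailable; failing fast")
            # Cooldown elapsed: let this call through as the half-open trial.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None  # success closes the breaker
        return result


now = [0.0]
breaker = CircuitBreaker(max_failures=2, reset_timeout=10, clock=lambda: now[0])

def flaky():
    raise ConnectionError("payment service down")

for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass
try:
    breaker.call(flaky)
except CircuitOpenError as exc:
    print("rejected:", exc)  # the failing service is not called at all
```

Failing fast keeps callers from tying up threads and connections waiting on a service that is already down. This is what stops one failure from cascading into others.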

WHAT IS A BULKHEAD PATTERN?

The bulkhead pattern isolates components of a system into separate resource pools, so that a failure in one component does not affect the others.

Example: using separate thread pools for different services in a microservices architecture to isolate failures.

WHAT IS A FALLBACK MECHANISM?

A fallback mechanism provides an alternative action or data source when a service call fails, which makes the system more resilient.

Example: returning cached data or a default response when a call to the primary service fails.

WHAT IS IDEMPOTENCY?

Idempotency ensures that performing an operation multiple times has the same effect as performing it once. This is crucial for reliable systems, because clients retry requests after timeouts and failures.

Example: using unique transaction IDs to ensure that duplicate payment requests do not result in multiple charges.
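
The transaction-ID approach can be sketched as a service that remembers the result of each ID it has processed. A retried request returns the stored result instead of charging again. `PaymentService` and the IDs are hypothetical. A production version would keep the processed IDs in durable storage, for example behind a unique constraint, rather than in memory.

```python
import threading

class PaymentService:
    """Sketch of idempotent charging keyed by a client-supplied transaction ID."""

    def __init__(self):
        self._results = {}           # transaction_id -> result of the first attempt
        self._lock = threading.Lock()
        self.charges = []            # stands in for calls to a payment processor

    def charge(self, transaction_id, account, amount):
        with self._lock:
            if transaction_id in self._results:
                # Duplicate (e.g. a client retry): return the original result, no new charge.
                return self._results[transaction_id]
            self.charges.append((account, amount))
            result = {"transaction_id": transaction_id, "status": "charged", "amount": amount}
            self._results[transaction_id] = result
            return result


svc = PaymentService()
svc.charge("txn-123", "acct-1", 25)
svc.charge("txn-123", "acct-1", 25)  # retried request with the same ID
print(len(svc.charges))              # 1
```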

WHAT IS SERVICE DISCOVERY?

Service discovery is the process of automatically detecting services and their endpoints in a distributed system, which makes it easier for components to communicate.

Example: using Consul or Eureka for service discovery in a microservices architecture.

WHAT IS A PROXY SERVER?

A proxy server acts as an intermediary for requests from clients seeking resources from other servers. It can provide benefits such as load balancing, caching, and security.

Example: using Nginx as a reverse proxy to distribute traffic to backend servers.

WHAT IS A REVERSE PROXY?

A reverse proxy is a type of proxy server that sits in front of one or more backend servers and retrieves resources from them on behalf of clients. It is often used for load balancing and security.

Example: using HAProxy to distribute incoming HTTP requests to multiple web servers.

WHAT IS A CDN EDGE SERVER?

A CDN edge server is a server located at the edge of a network that stores cached content and delivers it to users from a location closer to them.

Example: using Cloudflare edge servers to deliver static content to users with low latency.

WHAT IS A MICROSERVICES GATEWAY?

A microservices gateway is an API gateway that routes requests to the appropriate microservice, often handling cross-cutting concerns such as authentication and rate limiting.

Example: using Kong or Apigee as a microservices gateway to manage API traffic.

No ratings yet
Grade 5 Math Exam Paper
3 pages
Tamilnadu State Board 2nd Standard Textbook
No ratings yet
Tamilnadu State Board 2nd Standard Textbook
60 pages
CS2301 Discrete Math Notes
No ratings yet
CS2301 Discrete Math Notes
23 pages
Grade 9 Notes Printed - 10 - 2010 - Databases
No ratings yet
Grade 9 Notes Printed - 10 - 2010 - Databases
4 pages
Syllabus-Class U.K.G: Important Note
No ratings yet
Syllabus-Class U.K.G: Important Note
16 pages
Circular 62 - Vi English (Term II)
100% (2)
Circular 62 - Vi English (Term II)
56 pages
Computer Note in Nepali
No ratings yet
Computer Note in Nepali
13 pages
Cambridge International General Certificate of Secondary Education
No ratings yet
Cambridge International General Certificate of Secondary Education
16 pages
Extended Essay - Chemistry - IB
No ratings yet
Extended Essay - Chemistry - IB
32 pages
Sample Happy Kids UKG Social Moral
No ratings yet
Sample Happy Kids UKG Social Moral
62 pages
Subject: English Level: A 2 Class:III Lesson: 1 The Magic Garden
No ratings yet
Subject: English Level: A 2 Class:III Lesson: 1 The Magic Garden
33 pages
Nursery GK Dec 19
No ratings yet
Nursery GK Dec 19
1 page
Homeworks MR
No ratings yet
Homeworks MR
3 pages
On Exercise 2I
No ratings yet
On Exercise 2I
10 pages
Test 3 - ELGA 16 Question Paper - Written
No ratings yet
Test 3 - ELGA 16 Question Paper - Written
1 page
Acron Product Ts
No ratings yet
Acron Product Ts
5 pages
Buffers and Titration Qs
No ratings yet
Buffers and Titration Qs
15 pages
Metric Metric Conversions
No ratings yet
Metric Metric Conversions
3 pages
Aquatic Respiration Study Guide
No ratings yet
Aquatic Respiration Study Guide
13 pages
Grade 1 Tamil
No ratings yet
Grade 1 Tamil
3 pages
Period 3 Chemistry and Reactions
No ratings yet
Period 3 Chemistry and Reactions
35 pages
All India Sainik School Entrance Exam (AISSEE) Class-VI Paper-I 2011
No ratings yet
All India Sainik School Entrance Exam (AISSEE) Class-VI Paper-I 2011
19 pages
CLASS UKG GK Test Paper St. Anthony Iiiterm Test Paper Our Helpers 2
100% (2)
CLASS UKG GK Test Paper St. Anthony Iiiterm Test Paper Our Helpers 2
2 pages