COMP3358 Distributed and Parallel Computing Final Notes

The document outlines various concepts related to client-server architecture, including the RMI process for remote procedure calls, the ACID properties of transactions, and consistency models. It also discusses caching performance in systems like Facebook's photo cache, the Raft consensus protocol for fault tolerance, and the scalability of MapReduce dataflow. Key takeaways include the importance of parallelization, data locality, and fault tolerance in distributed systems.


L02 Client Server

L02 RMI → Example of object middleware


In summary, the process follows this order:
1. Parameter collection: the client stub gathers the parameters for the remote procedure call.
2. Marshalling: the parameters are prepared for transmission (including type information, order, etc.).
3. Serialization: as part of marshalling, the parameters are converted into a stream of bytes.
4. Network transmission: the serialized data is sent over the network.
5. Deserialization: the server converts the byte stream back into data structures.
6. Unmarshalling: the server reconstructs the original parameter values.
7. Calling the server procedure/method.
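The seven steps above can be sketched in Python. This is a hedged, single-process sketch: `pickle` stands in for RMI's Java serialization, and the dispatch table plus the bytes hand-off (in place of a real socket) are illustrative assumptions.

```python
import pickle

def add(a, b):
    # The "remote" procedure implemented on the server side.
    return a + b

# 1-3. Client stub: collect the parameters, marshal them (method name,
#      argument order), and serialize the whole request into bytes.
request = {"method": "add", "args": (2, 3)}
wire_bytes = pickle.dumps(request)

# 4. Network transmission (simulated here as a plain bytes hand-off).

# 5-6. Server: deserialize the byte stream and reconstruct the parameters.
msg = pickle.loads(wire_bytes)

# 7. Call the actual server procedure.
dispatch = {"add": add}
result = dispatch[msg["method"]](*msg["args"])
print(result)  # 5
```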
L03 ACID & Transactions

Atomicity. In a txn involving two or more discrete pieces of info, either all of the pieces are committed or none are. (Example: transaction 1 is effectively incomplete, because transaction 2 has overwritten its effect.)
Consistency. A transaction either creates a new and valid state of data, or, if any failure occurs, returns all data to its state before the transaction was started. (Example: two people have booked the seat, but only one booking is recorded in the database.)
Isolation. A txn in process and not yet committed must remain isolated from any other txn. (Example: both transactions update the same location; transaction 1 is not isolated from transaction 2.)
Durability. Committed data is saved by the system such that, even in the event of a failure and system restart, the data is available in its correct state. (Example: the effects of transaction 1 have not endured because they have been overwritten by transaction 2.)
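A minimal sketch of atomicity using Python's `sqlite3` (the bank-transfer table and the simulated crash are illustrative assumptions, not from the lecture):

```python
import sqlite3

# Toy bank transfer: the debit and the credit must commit together or not at all.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

try:
    with conn:  # context manager: commit on success, roll back on exception
        conn.execute("UPDATE accounts SET balance = balance - 50 "
                     "WHERE name = 'alice'")
        # Simulated crash before the matching credit to bob ever runs:
        raise RuntimeError("crash mid-transaction")
except RuntimeError:
    pass

# Atomicity: the lone debit was rolled back, so no money disappeared.
row = conn.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
print(row[0])  # 100
```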
L04 Consistency Models

Parallelization and scalability, Amdahl's law

Synchronization, consistency
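Amdahl's law, listed above, bounds the speedup from parallelization: with parallel fraction p on n processors, speedup = 1 / ((1 − p) + p/n). A quick numerical sketch:

```python
def amdahl_speedup(p, n):
    """Speedup with parallel fraction p on n processors: 1 / ((1-p) + p/n)."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallelizable, 1000 processors give
# under 20x speedup: the serial 5% caps speedup at 1/0.05 = 20.
print(amdahl_speedup(0.95, 1000))
```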
L05 Facebook photo cache
Mutual exclusion, locking, and issues related to locking
Key takeaway: FIFO needs a cache of size X to reach a 59% hit ratio, while S4LRU needs only ⅓ X to reach the same performance.

Conclusion:
• Quantify caching performance
• Quantify popularity changes across layers of caches
• Recency, frequency, age, and social factors impact
cache
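The FIFO-vs-recency takeaway can be approximated with a toy simulation. This sketch compares plain FIFO and plain LRU (not the layered S4LRU from the paper) on an assumed Zipf-like trace; the item count, skew, and cache sizes are illustrative assumptions:

```python
import random
from collections import OrderedDict

def hit_ratio(trace, capacity, policy):
    """Simulate a cache over `trace`; policy is 'fifo' or 'lru'."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            if policy == "lru":
                cache.move_to_end(key)  # recency update; FIFO ignores hits
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict front: oldest / least recent
            cache[key] = True
    return hits / len(trace)

# Zipf-like popularity, as in photo workloads: a few items are very hot.
random.seed(0)
items = list(range(1000))
weights = [1 / (rank + 1) for rank in items]
trace = random.choices(items, weights=weights, k=20000)

print(hit_ratio(trace, 100, "fifo"), hit_ratio(trace, 100, "lru"))
```

On skewed traces like this, LRU typically reaches a given hit ratio with a smaller cache than FIFO, which is the direction of the S4LRU result above.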
L06 RAFT - http://thesecretlivesofdata.com/raft/
Reliable, Replicated, Redundant, And Fault-Tolerant.
Raft consensus is among the strongest protocols in terms of reliability: it always preserves sequential consistency, but progress is best effort (not 100% guaranteed); in some cases it can fail to elect a leader or to approve some operations.

If the network is really bad, the Raft protocol can make no progress at all because no node can achieve consensus. This is acceptable: no progress still counts as sequential consistency (better than making a local decision and diverging).
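A minimal sketch of the majority rule behind Raft leader election (the peer structure and `request_votes` helper are hypothetical simplifications; real Raft also compares log up-to-dateness and updates terms on each RPC):

```python
def request_votes(candidate_term, peers):
    """A candidate wins only with a strict majority of the whole cluster."""
    votes = 1  # the candidate votes for itself
    for peer in peers:
        # Simplified grant rule: the peer's term is not ahead of the
        # candidate's, and the peer has not yet voted in this term.
        if candidate_term >= peer["term"] and not peer["voted"]:
            votes += 1
    cluster_size = len(peers) + 1
    return votes > cluster_size // 2  # majority => elected leader

peers = [{"term": 3, "voted": False}, {"term": 3, "voted": False},
         {"term": 5, "voted": False}, {"term": 3, "voted": True}]
print(request_votes(4, peers))  # 3 of 5 votes: majority, so True
```

This is also why a bad network stalls Raft rather than corrupting it: without a majority, no leader is elected and no operation is approved.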
L07 MapReduce

What makes MapReduce dataflow so scalable?
Parallelization:
- Both map and reduce operations run in parallel across many machines
- Input data is automatically partitioned, allowing independent processing
Locality optimization:
- The system tries to assign mappers to machines where the input data is stored
- This minimizes network transfer and utilizes data locality
The shuffle mechanism:
- Though network-intensive, it's a structured communication pattern
- Data with the same keys is routed to the same reducer regardless of scale
Fault tolerance:
- Failed tasks are automatically redistributed and restarted
- Intermediate results are stored for recovery purposes

What is meant by "dataflow" in MapReduce?


In the context of MapReduce, "dataflow" refers to the controlled movement and transformation of data through the entire processing pipeline. Specifically, it describes:
- The path data takes: how data flows from input sources through mappers, through the shuffle phase, to reducers, and finally to output storage.
- Transformation stages: how data changes form at each stage, from raw input data to key-value pairs after mapping, to grouped key-value pairs after shuffling, to aggregated results after reducing.
- Data exchange patterns: how data moves between distributed components, particularly during the shuffle phase where data is reorganized across the network.
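The map → shuffle → reduce dataflow can be sketched with word count, the classic MapReduce example (function names and the tiny document set are illustrative):

```python
from collections import defaultdict

def map_phase(doc):
    # Map: raw input -> intermediate key-value pairs.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Shuffle: group values by key; the same key is always routed
    # to the same reducer, regardless of scale.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate each key's values into a final result.
    return key, sum(values)

docs = ["the quick fox", "the lazy dog", "the fox"]
intermediate = [p for doc in docs for p in map_phase(doc)]     # parallel per doc
grouped = shuffle(intermediate)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())  # parallel per key
print(counts["the"])  # 3
```

Each stage only depends on the output of the previous one, which is what lets maps and reduces run independently across machines.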
L09 HLF (Hyperledger Fabric) and BFT (Byzantine Fault
Tolerance)

Hyperledger Fabric (HLF)


SAMPLE FINAL

L08, L10, L11


L08 - Spark

Extra time:
BIDL: A High-throughput, Low-latency Permissioned
Blockchain Framework for Datacenter Networks
