What are the key differences between multicore CPU
architectures and many-core GPU architectures?
| Aspect | CPU (Multicore) | GPU (Many-core) |
|---|---|---|
| Core Count | Few cores (4-64) | Thousands of cores |
| Core Design | Complex, powerful cores | Simple, streamlined cores |
| Performance Focus | Low latency, single-thread performance | High throughput, parallel processing |
| Cache System | Large multilevel caches (L1/L2/L3) | Small caches, high-bandwidth memory |
| Memory Strategy | Minimize latency for random access | Hide latency through thread switching |
| Execution Model | Task parallelism | Data parallelism (SIMD) |
| Threading | 1-2 threads per core | Thousands of lightweight threads |
| Control Logic | Complex branch prediction, out-of-order execution | Simple control units shared across thread groups |
| Context Switching | Sophisticated scheduling | Rapid context switching |
| Branch Handling | Excellent with complex branching | Poor performance with divergent branches |
| Best Use Cases | Sequential tasks, general computing, complex algorithms | Graphics rendering, ML/AI, scientific computing |
| Optimization Goal | Minimize time per task | Maximize tasks completed per unit time |
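The task-parallel vs. data-parallel distinction above can be sketched in plain Python (illustrative only; real GPU code would use CUDA or OpenCL, and the function names here are hypothetical):

```python
# Illustrative sketch: CPU-style task parallelism vs. GPU-style
# data parallelism (SIMD). Pure Python, conceptual only.

data = [0, 1, 2, 3, 4, 5, 6, 7]

# Task parallelism (CPU): a few independent, heterogeneous tasks,
# each free to branch and run out of lockstep on its own core.
def parse(x):  return x + 1
def encode(x): return x * 2
task_results = [parse(data[0]), encode(data[1])]

# Data parallelism (GPU/SIMD): one "kernel" applied uniformly to
# every element; on a GPU, thousands of lightweight threads would
# execute this same instruction stream in lockstep.
def kernel(x):
    return x * x

simd_results = [kernel(x) for x in data]
print(simd_results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

On a real GPU, an `if` inside the kernel that takes different paths for different elements (branch divergence) forces a thread group to execute both paths serially, which is why the table lists GPUs as performing poorly with divergent branches.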
What are the major distributed computing technologies
that led to cloud computing?
Grid Computing (1990s-2000s)
Enabled resource sharing across geographically distributed systems
Projects like SETI@home demonstrated massive distributed processing
Laid groundwork for resource pooling and remote computation
Cluster Computing
Connected multiple machines to work as single system
Technologies like Beowulf clusters made parallel computing accessible
Established concepts of fault tolerance and load distribution
Service-Oriented Architecture (SOA)
Introduced loose coupling between services
Web services standards (SOAP, REST, XML) enabled remote service calls
Created foundation for microservices and API-based architectures
Virtualization Technologies
VMware, Xen, KVM enabled hardware abstraction
Multiple virtual machines on single physical hardware
Essential for resource isolation and multi-tenancy
Web Services & Middleware
CORBA, RMI, .NET Remoting for distributed object communication
Message queuing systems (MQSeries, RabbitMQ)
Application servers for scalable web applications
Peer-to-Peer (P2P) Networks
BitTorrent, Napster showed distributed content delivery
Demonstrated scalability without central control
Influenced CDN and edge computing concepts
Utility Computing
IBM's concept of computing as metered service
Pay-per-use models for computing resources
Direct precursor to cloud pricing models
Internet Infrastructure Evolution
High-speed broadband adoption
Improved network reliability and bandwidth
Made remote computing practical for enterprises
Explain the concept of scalability in distributed
computing and how it can be achieved
Scalability in distributed computing refers to the system's ability to handle increasing
workloads or expand by adding more resources (e.g., servers, nodes) without degrading
performance.
Types:
1. Horizontal Scalability – Adding more machines/nodes.
2. Vertical Scalability – Adding more power (CPU, RAM) to existing machines.
Achieving Scalability:
Load Balancing: Distribute tasks evenly across nodes.
Data Partitioning (Sharding): Split data across multiple databases or servers.
Caching: Store frequently accessed data in memory for faster access.
Asynchronous Processing: Use queues and background jobs to handle tasks efficiently.
Decentralization: Avoid bottlenecks by removing single points of failure.
Scalability ensures the system remains efficient and responsive as demand grows.
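Two of the mechanisms above, sharding and load balancing, can be sketched in a few lines of Python (a minimal sketch; the shard count and server names are hypothetical):

```python
import hashlib
from itertools import cycle

# Hash-based sharding: each key maps deterministically to one shard,
# so all reads/writes for that key hit the same database node.
NUM_SHARDS = 4  # hypothetical shard count

def shard_for(key: str) -> int:
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Round-robin load balancing: spread incoming requests evenly
# across a pool of stateless servers.
servers = ["node-a", "node-b", "node-c"]  # hypothetical pool
next_server = cycle(servers)

assigned = [next(next_server) for _ in range(6)]
print(assigned)  # ['node-a', 'node-b', 'node-c', 'node-a', 'node-b', 'node-c']
```

Hash-mod sharding is the simplest scheme; production systems often use consistent hashing instead, so that adding a node (horizontal scaling) remaps only a fraction of the keys.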
Describe the role of Hadoop in distributed computing
and its advantages for big data processing.
Core Function
Framework for storing and processing massive datasets across commodity
hardware clusters
Handles data distribution, fault tolerance, and parallel processing automatically
Key Components
HDFS: Distributed file system storing data across multiple nodes
MapReduce: Programming model for parallel data processing
YARN: Resource management and job scheduling
Advantages for Big Data Processing
Scalability
Scales horizontally by adding commodity servers
Handles petabytes of data across thousands of nodes
Cost-Effective
Uses inexpensive commodity hardware instead of specialized systems
Open-source reduces licensing costs
Fault Tolerance
Automatic data replication (default 3 copies)
Continues processing even when nodes fail
Self-healing through automatic recovery
Flexibility
Handles structured, semi-structured, and unstructured data
No predefined schema required (schema-on-read)
Parallel Processing
Distributes computation across cluster nodes
Processes data where it's stored (data locality)
Reduces network traffic and improves performance
High Throughput
Optimized for batch processing large volumes
Better for throughput than low-latency operations
Ecosystem Integration
Rich ecosystem (Hive, Pig, Spark, HBase)
Integrates with various data sources and analytics tools
Data Locality
Moves computation to data rather than data to computation
Minimizes network overhead and improves efficiency
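The MapReduce model can be illustrated with an in-memory word count in Python (a sketch of the programming model only, not Hadoop's actual Java API; in Hadoop, the shuffle/sort step between map and reduce is handled by the framework):

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for each word in the input.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle/sort: group values by key (done by the framework in Hadoop).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: aggregate the grouped values, here by summing counts.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big clusters", "data locality"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 2, 'data': 2, 'clusters': 1, 'locality': 1}
```

In a real cluster, mappers run on the nodes where the input blocks are stored (data locality), and reducers pull only the grouped intermediate pairs over the network.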
Describe Xen architecture with neat diagram and explain
its working.
Xen Hypervisor Architecture
Architecture Overview
Xen is a Type-1 (bare-metal) hypervisor that runs directly on hardware, managing multiple virtual machines called "domains."
Key Components
Xen Hypervisor
Thin layer running directly on hardware
Manages CPU scheduling, memory allocation, and interrupt handling
Provides isolation between domains
Minimal footprint for better performance
Domain Types
Domain0 (Dom0) - Control Domain
Privileged domain with special access to hardware
Runs control stack and device drivers
Manages other domains (create, destroy, migrate)
Handles I/O operations for guest domains
Only domain with direct hardware access
Guest Domains (DomU)
Unprivileged virtual machines
Run guest operating systems (Linux, Windows)
No direct hardware access
Communicate with Dom0 for I/O operations
Working
The Xen hypervisor sits directly on the hardware.
Dom0 is responsible for device I/O, VM management, and control.
Guest domains (DomU) run isolated virtual machines with their applications.
This setup enables efficient virtualization, isolation, and resource management
across multiple OSes.
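As a concrete illustration of how Dom0 manages guests, a DomU is typically described in a configuration file and started by Dom0's toolstack. The values below are hypothetical examples:

```
# guest.cfg -- illustrative Xen DomU configuration (all values are examples)
name   = "demo-vm"                      # domain name visible to Dom0
memory = 1024                           # MB of RAM allocated to the guest
vcpus  = 2                              # virtual CPUs scheduled by the hypervisor
disk   = ['phy:/dev/vg0/demo,xvda,w']   # block device exported to the DomU
vif    = ['bridge=xenbr0']              # virtual NIC bridged through Dom0
```

Dom0 then creates the domain with `xl create guest.cfg` and can inspect running domains with `xl list`, reflecting its role as the control domain.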
Differentiate full virtualization, para-virtualization, and OS-level virtualization.
Virtualization Types Comparison
| Aspect | Full Virtualization | Para-virtualization | OS-Level Virtualization |
|---|---|---|---|
| Definition | Complete hardware simulation | Modified guest OS aware of virtualization | Shared kernel with isolated user spaces |
| Guest OS Modification | No modification required | Requires OS kernel modification | No separate guest OS |
| Hypervisor Type | Type-1 or Type-2 | Type-1 (bare-metal) | Container runtime |
| Hardware Abstraction | Complete hardware emulation | Paravirtualized drivers | No hardware virtualization |
| Performance | Lower (emulation overhead) | Higher (reduced overhead) | Highest (native performance) |
| Isolation Level | Strong (hardware-level) | Strong (hypervisor-managed) | Process-level isolation |
| Resource Overhead | High | Medium | Low |
| Boot Time | Slow (full OS boot) | Moderate (modified OS boot) | Fast (container startup) |
| Memory Usage | High (separate OS instances) | Medium (shared components) | Low (shared kernel) |
| OS Diversity | Multiple different OS types | Multiple OS types (modified) | Same OS kernel only |
| Examples | VMware vSphere, VirtualBox | Xen (paravirt mode), VMware ESX | Docker, LXC, OpenVZ |
| Hardware Requirements | Virtualization extensions helpful | Standard hardware | Standard hardware |
| Security | Strong VM isolation | Strong domain isolation | Container-level isolation |
| Scalability | Limited (resource intensive) | Better than full virtualization | High (lightweight) |
| Use Cases | Legacy applications, multiple OSes | High-performance virtualization | Microservices, DevOps |
| Migration | VM migration supported | Domain migration supported | Container portability |
| Management Complexity | High | Medium | Low |
| Startup Overhead | High | Medium | Minimal |
| Network Performance | Lower (emulated network) | Better (paravirt network) | Native network performance |
| Storage Performance | Lower (emulated storage) | Better (paravirt storage) | Native storage performance |