Duration: ~45 minutes
Level: Intermediate
By the end of this tutorial, you will understand:
- The difference between CPU-bound and I/O-bound tasks
- When to use threads vs processes for parallelism
- How to use Python's `multiprocessing` module
- How to use Python's `concurrent.futures` module
- The Global Interpreter Lock (GIL) and its implications
- Performance tradeoffs in parallel programming
- Python 3.7+
- Basic understanding of Python functions and loops
- Familiarity with timing code execution
- CPU-bound tasks: Heavy computation (e.g., mathematical calculations, data processing)
  - Limited by processor speed
  - Best parallelized with processes (bypass GIL)
- I/O-bound tasks: Waiting for external resources (e.g., file I/O, disk operations)
  - Limited by I/O operations
  - Best parallelized with threads (lightweight, shared memory)
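One rough way to tell the two kinds of task apart is to compare wall-clock time with CPU time. The sketch below is illustrative only: `cpu_task`, `io_task`, and `measure` are hypothetical names, and `time.sleep` stands in for waiting on a real file or device.

```python
import time


def cpu_task(n):
    """CPU-bound: the interpreter is busy computing the whole time."""
    return sum(i * i for i in range(n))


def io_task(delay):
    """Stand-in for I/O: time.sleep simulates waiting on an external resource."""
    time.sleep(delay)
    return delay


def measure(func, arg):
    """Return (wall_seconds, cpu_seconds) spent running func(arg)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(arg)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    for name, func, arg in [
        ("cpu_task", cpu_task, 2_000_000),
        ("io_task", io_task, 0.5),
    ]:
        wall, cpu = measure(func, arg)
        print(f"{name}: wall={wall:.2f}s cpu={cpu:.2f}s")
```

For a CPU-bound task the two numbers should be close to each other. For a task that mostly waits, CPU time should be a small fraction of wall-clock time.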
Each exercise includes:
- A skeleton file in the root directory
- A completed solution in the `solutions/` directory
- Clear TODOs marking what you need to implement
File: exercise_01_serial.py
Establish a baseline by computing prime numbers serially. This will help us measure speedup in later exercises.
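A serial baseline along these lines might look like the sketch below. It is not the contents of the exercise file: `is_prime`, `count_primes`, and the limit of 200,000 are assumptions chosen for illustration.

```python
import time


def is_prime(n):
    """Trial division: check odd divisors up to the square root of n."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def count_primes(start, stop):
    """Count the primes in range(start, stop), one number at a time."""
    return sum(1 for n in range(start, stop) if is_prime(n))


if __name__ == "__main__":
    limit = 200_000  # illustrative workload size
    started = time.time()
    total = count_primes(0, limit)
    elapsed = time.time() - started
    print(f"{total} primes below {limit} in {elapsed:.2f}s (serial)")
```

The elapsed time printed here is the number that later parallel versions would be compared against.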
File: exercise_02_multiprocessing_pool.py
Learn to parallelize CPU-intensive tasks using multiprocessing.Pool to bypass Python's GIL.
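As a minimal sketch of the idea, the range can be split into chunks and handed to `multiprocessing.Pool.map`. The helper names (`make_chunks`, `count_primes_in_range`, `count_primes_parallel`) and the choice of four chunks per worker are assumptions, not the exercise's required design.

```python
from multiprocessing import Pool


def is_prime(n):
    """Trial division over odd divisors up to the square root of n."""
    if n < 2:
        return False
    if n < 4:
        return True
    if n % 2 == 0:
        return False
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def count_primes_in_range(bounds):
    """Worker function: takes a (start, stop) tuple so it works with Pool.map."""
    start, stop = bounds
    return sum(1 for n in range(start, stop) if is_prime(n))


def make_chunks(limit, n_chunks):
    """Split range(0, limit) into contiguous (start, stop) pairs; limit must be > 0."""
    step = -(-limit // n_chunks)  # ceiling division
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def count_primes_parallel(limit, workers=4):
    """Count primes below limit using a pool of worker processes."""
    chunks = make_chunks(limit, workers * 4)  # several chunks per worker
    with Pool(processes=workers) as pool:
        return sum(pool.map(count_primes_in_range, chunks))


if __name__ == "__main__":
    # The guard is required on platforms that start workers by re-importing this module.
    print(count_primes_parallel(200_000))
```

Using more chunks than workers is one way to even out the load, since larger numbers take longer to test than smaller ones.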
File: exercise_03_io_threading.py
Parallelize I/O-bound tasks (reading/processing multiple files) using concurrent.futures.ThreadPoolExecutor.
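The sketch below shows one possible shape for this, assuming the per-file work is as simple as counting lines. `count_lines`, `count_lines_threaded`, and the temporary sample files are hypothetical stand-ins for whatever the exercise actually reads.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_lines(path):
    """Read one file and return its line count (the I/O-bound unit of work)."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def count_lines_threaded(paths, max_workers=8):
    """Process many files concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(count_lines, paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(5):
            path = Path(tmp) / f"sample_{i}.txt"
            path.write_text("line\n" * (i + 1), encoding="utf-8")
            paths.append(path)
        print(count_lines_threaded(paths))
```

On small local files the speedup may be hard to see, because the operating system caches reads. The pattern matters more when each read involves real waiting.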
File: exercise_04_threads_vs_processes.py
See firsthand why threads don't help with CPU-bound tasks due to the GIL, but processes do. This exercise demonstrates both ThreadPoolExecutor and ProcessPoolExecutor side-by-side.
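A side-by-side comparison could be sketched as follows. `busy_sum` and `run_with` are hypothetical names, and the job sizes are arbitrary. The expectation that threads show little or no speedup applies to standard CPython builds that have the GIL enabled.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def busy_sum(n):
    """Pure-Python loop: holds the GIL for its whole duration."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_with(executor_cls, jobs, workers=4):
    """Run busy_sum over jobs with the given executor class; return (results, seconds)."""
    started = time.perf_counter()
    with executor_cls(max_workers=workers) as executor:
        results = list(executor.map(busy_sum, jobs))
    return results, time.perf_counter() - started


if __name__ == "__main__":
    jobs = [2_000_000] * 4
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        _, elapsed = run_with(executor_cls, jobs)
        print(f"{executor_cls.__name__}: {elapsed:.2f}s")
```

Because both executors share the same interface, swapping one class for the other is the only change needed, which keeps the comparison fair.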
File: exercise_05_mixed_workload.py
Handle tasks that combine both I/O and CPU work by using threading and multiprocessing together in a pipeline.
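One way to picture such a pipeline is a thread pool that reads files feeding a process pool that does the heavy computation. This is only a sketch under assumptions: `read_bytes`, `heavy_digest`, `run_pipeline`, and the repeated SHA-256 hashing used as fake CPU work are all invented for illustration.

```python
import hashlib
import tempfile
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from pathlib import Path


def read_bytes(path):
    """I/O stage: load one file's contents."""
    return Path(path).read_bytes()


def heavy_digest(data):
    """CPU stage: hash repeatedly to simulate expensive processing."""
    digest = data
    for _ in range(2000):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def run_pipeline(paths, io_workers=4, cpu_workers=2):
    """Read files in threads, then process their contents in worker processes."""
    with ThreadPoolExecutor(max_workers=io_workers) as readers:
        with ProcessPoolExecutor(max_workers=cpu_workers) as crunchers:
            blobs = readers.map(read_bytes, paths)  # reads run concurrently in threads
            return list(crunchers.map(heavy_digest, blobs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(4):
            path = Path(tmp) / f"input_{i}.bin"
            path.write_bytes(bytes([i]) * 1024)
            paths.append(path)
        for path, digest in zip(paths, run_pipeline(paths)):
            print(path.name, digest[:16])
```

Note that file contents are pickled when they are sent to the worker processes, so for large files that transfer cost can outweigh the benefit.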
Review and discussion of when to use each approach.
| Task Type | Best Approach | Module |
|---|---|---|
| CPU-bound | Processes | multiprocessing.Pool or ProcessPoolExecutor |
| I/O-bound | Threads | ThreadPoolExecutor |
Each exercise is standalone:
# Run skeleton to see TODOs
python exercise_01_serial.py
# Run solution to see expected output
python solutions/exercise_01_serial.py
No additional packages or network access required! All exercises use the Python standard library and local file operations.
# Optional: Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Start with the serial version to understand the baseline
- Time your code using `time.time()` to measure improvements
- Experiment with worker counts to see how they affect performance
- Watch for common pitfalls: serialization overhead, shared state issues
- Read error messages carefully: multiprocessing errors can be tricky
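The serialization pitfall can be checked up front, because objects sent to worker processes are pickled. The helper below, `is_picklable`, is a hypothetical convenience rather than part of any library. It is a quick way to see why a lambda passed to a process pool tends to fail with a confusing error.

```python
import pickle


def square(x):
    """A module-level function: pickled by reference to its name."""
    return x * x


def is_picklable(obj):
    """Return True if obj can be serialized with pickle, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print("module-level function:", is_picklable(square))
    print("lambda:", is_picklable(lambda x: x * x))
    print("open file handle:", is_picklable(open(__file__)))
```

When a process pool raises a pickling error, checking the worker function and its arguments this way is often a faster route to the cause than reading the full traceback.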
MIT License - Feel free to use for teaching and learning!