leiterrl/multiproc_tutorial

Python Parallel Programming Tutorial

Duration: ~45 minutes
Level: Intermediate

Learning Objectives

By the end of this tutorial, you will understand:

  1. The difference between CPU-bound and I/O-bound tasks
  2. When to use threads vs processes for parallelism
  3. How to use Python's multiprocessing module
  4. How to use Python's concurrent.futures module
  5. The Global Interpreter Lock (GIL) and its implications
  6. Performance tradeoffs in parallel programming

Prerequisites

  • Python 3.7+
  • Basic understanding of Python functions and loops
  • Familiarity with timing code execution

Tutorial Structure

Part 1: Understanding the Problem Types (~5 min)

  • CPU-bound tasks: Heavy computation (e.g., mathematical calculations, data processing)

    • Limited by processor speed
    • Best parallelized with processes (each process has its own interpreter and its own GIL)
  • I/O-bound tasks: Waiting for external resources (e.g., reading and writing files on disk)

    • Limited by the time spent waiting on I/O, not by processor speed
    • Best parallelized with threads (lightweight, shared memory; the GIL is released while a thread waits on I/O)
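
To make the two categories concrete, here is a minimal sketch of one task of each kind. The function names and workload sizes are illustrative assumptions, not taken from the exercise files. Note that a small local file is often served from the operating system's cache, so the I/O timing may look faster than a real disk read.

```python
import os
import tempfile
import time


def cpu_bound_task(n):
    """Sum of squares below n: the time goes into arithmetic on the CPU."""
    return sum(i * i for i in range(n))


def io_bound_task(path):
    """Read a file and return its size: the time goes into waiting on storage."""
    with open(path, "rb") as f:
        return len(f.read())


if __name__ == "__main__":
    start = time.perf_counter()
    cpu_bound_task(2_000_000)
    print(f"CPU-bound: {time.perf_counter() - start:.3f} s")

    # Write a throwaway file so the example needs no external data.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 1_000_000)
    start = time.perf_counter()
    size = io_bound_task(tmp.name)
    print(f"I/O-bound: {time.perf_counter() - start:.3f} s ({size} bytes)")
    os.remove(tmp.name)
```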

Part 2: Exercises (~35 min)

Each exercise includes:

  • A skeleton file in the root directory
  • A completed solution in the solutions/ directory
  • Clear TODOs marking what you need to implement

Exercise 1: Serial Baseline (5 min)

File: exercise_01_serial.py

Establish a baseline by computing prime numbers serially. This will help us measure speedup in later exercises.
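
The contents of exercise_01_serial.py are not reproduced here, so the following is only a sketch of what a serial baseline could look like. The names is_prime and count_primes and the limit of 200,000 are assumptions chosen for illustration.

```python
import time


def is_prime(n):
    """Trial division: return True if n is a prime number."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def count_primes(limit):
    """Count the primes in range(limit), one number at a time."""
    return sum(1 for n in range(limit) if is_prime(n))


if __name__ == "__main__":
    start = time.perf_counter()
    total = count_primes(200_000)
    elapsed = time.perf_counter() - start
    print(f"{total} primes below 200000 in {elapsed:.2f} s (serial)")
```

The elapsed time printed here is the number to compare against when the same work is later split across workers.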

Exercise 2: CPU-Bound with Multiprocessing Pool (8 min)

File: exercise_02_multiprocessing_pool.py

Learn to parallelize CPU-intensive tasks using multiprocessing.Pool to bypass Python's GIL.
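
One possible shape for this exercise, assuming the prime-counting workload from the baseline: split the range into chunks and hand each chunk to a worker process with Pool.map. The helper names, chunk count, and worker count below are hypothetical. The function passed to the pool is defined at module level so that it can be pickled and sent to the workers.

```python
import time
from multiprocessing import Pool


def is_prime(n):
    """Trial division: return True if n is a prime number."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True


def count_primes_in_range(bounds):
    """Count primes in [start, stop); takes one tuple so it works with Pool.map."""
    start, stop = bounds
    return sum(1 for n in range(start, stop) if is_prime(n))


def make_chunks(limit, n_chunks):
    """Split range(limit) into contiguous (start, stop) pairs; assumes limit > 0."""
    step = -(-limit // n_chunks)  # ceiling division
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


if __name__ == "__main__":
    chunks = make_chunks(200_000, 8)
    start = time.perf_counter()
    with Pool(processes=4) as pool:
        total = sum(pool.map(count_primes_in_range, chunks))
    elapsed = time.perf_counter() - start
    print(f"{total} primes below 200000 in {elapsed:.2f} s (4 processes)")
```

The `if __name__ == "__main__":` guard matters: on platforms that start workers by importing the main module again, unguarded pool creation would run in every worker.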

Exercise 3: I/O-Bound with Threading (8 min)

File: exercise_03_io_threading.py

Parallelize I/O-bound tasks (reading/processing multiple files) using concurrent.futures.ThreadPoolExecutor.
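
A minimal sketch of the pattern, assuming the per-file work is counting lines (the actual processing in exercise_03_io_threading.py may differ). The sample files are generated in a temporary directory so that the example is self-contained.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_lines(path):
    """Read one file and return (file name, number of lines)."""
    with open(path, encoding="utf-8") as f:
        return path.name, sum(1 for _ in f)


def count_lines_threaded(paths, max_workers=4):
    """Process all files with a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(count_lines, paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(5):
            p = Path(tmp) / f"sample_{i}.txt"
            p.write_text("line\n" * (i + 1), encoding="utf-8")
            paths.append(p)
        for name, n_lines in count_lines_threaded(paths):
            print(f"{name}: {n_lines} lines")
```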

Exercise 4: Threads vs Processes for CPU Tasks (7 min)

File: exercise_04_threads_vs_processes.py

See firsthand why threads don't speed up CPU-bound tasks (the GIL lets only one thread execute Python bytecode at a time) while processes do. This exercise runs ThreadPoolExecutor and ProcessPoolExecutor side by side.
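
Because both executors share the same interface, the comparison can be sketched with a single helper that takes the executor class as an argument. The helper name timed_map and the workload are assumptions made for illustration, and the measured times depend on the machine and the Python build.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def sum_of_squares(n):
    """CPU-bound work: pure Python arithmetic that holds the GIL."""
    return sum(i * i for i in range(n))


def timed_map(executor_cls, func, items, max_workers=4):
    """Run func over items with the given executor class; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=max_workers) as executor:
        results = list(executor.map(func, items))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    work = [1_000_000] * 8
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        _, seconds = timed_map(cls, sum_of_squares, work)
        print(f"{cls.__name__}: {seconds:.2f} s")
```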

Exercise 5: Bonus - Mixed Workloads (7 min, optional)

File: exercise_05_mixed_workload.py

Handle tasks that combine both I/O and CPU work by using threading and multiprocessing together in a pipeline.
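
One simple way to combine the two, sketched under the assumption that each input file holds one integer per line: a thread pool handles the reads, then a process pool handles the computation. The stages here run back to back, which is the simplest arrangement; the exercise's own pipeline may overlap them. All function names are hypothetical.

```python
import tempfile
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from pathlib import Path


def read_numbers(path):
    """I/O stage: read one integer per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [int(line) for line in f if line.strip()]


def is_prime(n):
    """Trial division: return True if n is a prime number."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True


def count_primes(numbers):
    """CPU stage: count how many of the given numbers are prime."""
    return sum(1 for n in numbers if is_prime(n))


def run_pipeline(paths, io_workers=4, cpu_workers=2):
    """Read all files with threads, then crunch each batch in a separate process."""
    with ThreadPoolExecutor(max_workers=io_workers) as threads:
        batches = list(threads.map(read_numbers, paths))
    with ProcessPoolExecutor(max_workers=cpu_workers) as procs:
        return list(procs.map(count_primes, batches))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(4):
            p = Path(tmp) / f"numbers_{i}.txt"
            lines = (str(n) for n in range(i * 1000, (i + 1) * 1000))
            p.write_text("\n".join(lines), encoding="utf-8")
            paths.append(p)
        print(run_pipeline(paths))
```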

Part 3: Key Takeaways (~5 min)

Review and discussion of when to use each approach.

Quick Reference

Task Type | Best Approach | Module
CPU-bound | Processes     | multiprocessing.Pool or ProcessPoolExecutor
I/O-bound | Threads       | ThreadPoolExecutor

Running the Exercises

Each exercise is standalone:

# Run skeleton to see TODOs
python exercise_01_serial.py

# Run solution to see expected output
python solutions/exercise_01_serial.py

Installation

No additional packages or network access required! All exercises use the Python standard library and local file operations.

# Optional: Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Tips for Participants

  1. Start with the serial version to understand the baseline
  2. Time your code using time.time() (or time.perf_counter(), which is designed for measuring short intervals) to measure improvements
  3. Experiment with worker counts to see how they affect performance
  4. Watch for common pitfalls: serialization overhead, shared state issues
  5. Read error messages carefully: multiprocessing errors can be tricky
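
One pitfall behind many confusing multiprocessing errors is serialization: anything sent to a worker process has to be picklable, and lambdas are not. The sketch below checks this with pickle directly, without starting any processes. The helper can_pickle is a hypothetical name introduced for illustration.

```python
import pickle


def square(x):
    """A module-level function: pickled by reference to its name."""
    return x * x


def can_pickle(obj):
    """Return True if obj can be serialized the way multiprocessing would need."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print("module-level function:", can_pickle(square))
    print("lambda:", can_pickle(lambda x: x * x))
```

If a pool call fails with a pickling error, moving the function to module level is usually the first thing to try.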

License

MIT License - Feel free to use for teaching and learning!
