Binary file handling in Python
Great, let's dive deep into binary file handling in Python, which is crucial when working with non-text files like images, audio, video, PDFs, ZIPs, and executable files.
🔹 What is a Binary File?
A binary file is any file that contains data in a format other than plain text — usually raw bytes, not
meant for human reading.
Examples:
.jpg, .png, .pdf, .mp3, .mp4, .zip, .exe, etc.
🔸 Why Handle Binary Files Differently?
Text files deal with characters (str), which Python encodes to and decodes from bytes for you. Binary files deal with the raw bytes themselves. So Python needs a different file mode and a different data type (bytes) when handling them.
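To make the character/byte distinction concrete, here is a small sketch. The sample word is only an illustration; it shows that one character can occupy more than one byte once encoded.

```python
text = "caf\u00e9"              # "café": 4 characters
raw = text.encode("utf-8")      # str -> bytes

print(len(text))                    # 4
print(len(raw))                     # 5, because "é" takes two bytes in UTF-8
print(raw)                          # b'caf\xc3\xa9'
print(raw.decode("utf-8") == text)  # True: decoding reverses encoding
```

Text mode does this encode/decode step for you; binary mode hands you `raw` directly.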
🔹 Opening Binary Files in Python
Use the open() function with a binary mode:
Mode     Description
'rb'     Read binary
'wb'     Write binary (creates the file, or overwrites an existing one)
'ab'     Append binary
'rb+'    Read & write binary (the file must already exist)
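As a rough sketch of how these modes differ in practice, the snippet below creates a file, appends to it, and then patches two bytes in place. The file name `demo.bin` and the temporary directory are assumptions made only for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")   # hypothetical file name

    with open(path, "wb") as f:            # create (or truncate) the file
        f.write(b"ABCDEF")

    with open(path, "ab") as f:            # append to the end
        f.write(b"GH")

    with open(path, "rb+") as f:           # update in place; file must exist
        f.seek(2)                          # jump to byte offset 2
        f.write(b"xy")                     # overwrite "CD" without truncating

    with open(path, "rb") as f:
        result = f.read()

print(result)  # b'ABxyEFGH'
```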
🔹 Basic Operations
1. Reading a Binary File
with open("image.jpg", "rb") as file:
    data = file.read()
    print(type(data))  # <class 'bytes'>
2. Writing to a Binary File
binary_data = b'\x89PNG\r\n\x1a\n'  # The 8-byte PNG file signature (not a complete image)
with open("output.png", "wb") as file:
    file.write(binary_data)
3. Copying a Binary File
with open("original.pdf", "rb") as src, open("copy.pdf", "wb") as dest:
    dest.write(src.read())
🔹 Binary vs Text Mode Comparison
Feature              Text Mode ('r', 'w')      Binary Mode ('rb', 'wb')
Data type            str                       bytes
Encoding/decoding    Automatically handled     Must handle manually
Newline handling     OS-specific (\n, \r\n)    No transformation
Use cases            .txt, .csv, .json         .jpg, .exe, .mp3, etc.
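The newline row can be seen directly with a short experiment. This is a sketch that passes `newline="\r\n"` explicitly so the result is the same on every OS; the file name is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lines.txt")   # hypothetical file name

    # Text mode: with newline="\r\n", every "\n" is written as "\r\n".
    with open(path, "w", encoding="utf-8", newline="\r\n") as f:
        f.write("a\nb\n")

    with open(path, "rb") as f:             # binary mode: bytes exactly as stored
        raw = f.read()

    with open(path, "r", encoding="utf-8") as f:   # text mode: "\r\n" read back as "\n"
        text = f.read()

print(raw)         # b'a\r\nb\r\n'
print(repr(text))  # 'a\nb\n'
```

For a text file this translation is convenient. For an image or archive, the same translation would alter bytes that only happen to look like line endings.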
🔹 Reading Binary Data in Chunks
For large files, read in chunks to save memory:
with open("large_file.zip", "rb") as file:
    while chunk := file.read(1024):  # 1 KB at a time; := needs Python 3.8+
        process(chunk)  # hypothetical function
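One possible concrete stand-in for the hypothetical process() step is hashing. The sketch below computes a SHA-256 digest chunk by chunk and checks that it matches hashing the whole payload at once. The helper name `sha256_of_file`, the chunk size, and the sample file are assumptions for illustration.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=64 * 1024):
    """Hash a file without loading it into memory all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):   # read() returns b"" at end of file
            digest.update(chunk)
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")   # hypothetical file name
    payload = os.urandom(200_000)            # random bytes as test data
    with open(path, "wb") as f:
        f.write(payload)

    chunked = sha256_of_file(path)
    whole = hashlib.sha256(payload).hexdigest()

print(chunked == whole)  # True
```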
🔹 Writing Binary from Strings
If you want to write a string to a binary file, you must encode it:
with open("text.bin", "wb") as file:
    file.write("Hello, binary!".encode('utf-8'))
To convert it back later:
with open("text.bin", "rb") as file:
    print(file.read().decode('utf-8'))
🧪 When to Use Binary File Handling
Use binary mode when:
Working with images, audio, video, PDFs
Writing or reading encrypted or compressed data
Creating or reading custom binary formats
Avoiding platform-dependent newline issues (\n vs \r\n)
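For the custom binary formats case, the standard-library struct module is one common tool. The record layout below (a 32-bit id, a 16-bit count, and a 32-bit float) is invented purely for illustration.

```python
import struct

# "<" = little-endian with no padding; I = uint32, H = uint16, f = float32
RECORD = struct.Struct("<IHf")

packed = RECORD.pack(1001, 7, 2.5)            # Python values -> bytes
print(len(packed))                            # 10 (4 + 2 + 4 bytes)

record_id, count, value = RECORD.unpack(packed)   # bytes -> Python values
print(record_id, count, value)                # 1001 7 2.5
```

The `packed` bytes can be written with a file opened in 'wb' mode and read back in 'rb' mode, `RECORD.size` bytes at a time.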
🔐 Important Notes
Always use with open(...) — it closes files automatically.
Do not open binary files in text mode: reading can raise UnicodeDecodeError or silently translate newline bytes, and writing the result back can corrupt the file.
Be careful with data types: binary mode expects and returns bytes, not str.
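The last note is easy to trip over, so here is a brief sketch. It uses io.BytesIO as an in-memory stand-in for a file opened in 'wb' mode (an assumption made to avoid touching the disk); a real binary file object rejects str the same way.

```python
import io

buffer = io.BytesIO()   # in-memory stand-in for open(..., "wb")

error = None
try:
    buffer.write("hello")                  # str is rejected in binary mode
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

buffer.write("hello".encode("utf-8"))      # bytes are accepted
print(buffer.getvalue())                   # b'hello'
```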
✅ Summary
Task                 Mode          Data Type    Function to Use
Read binary file     'rb'          bytes        read(), readline()
Write binary file    'wb'          bytes        write(), writelines()
Append to binary     'ab'          bytes        write()
Copy file            'rb', 'wb'    bytes        read() + write()
Excellent — now you're moving into more advanced and very useful territory: load() and dump()
functions in binary file handling, which come from Python’s pickle module.
These functions are used to serialize and deserialize Python objects — basically saving Python data
(like dictionaries, lists, objects) to a binary file and loading them back later.
🔹 What Is Serialization?
Serialization is the process of converting a Python object into a byte stream (binary), so it can be:
Written to a file
Sent over a network
Stored in a database
The reverse process is deserialization.
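Before involving files, the round trip can be tried in memory. pickle.dumps() and pickle.loads() are the bytes-based counterparts of dump() and load(); the sample dictionary is just an illustration.

```python
import pickle

data = {"name": "Alice", "age": 25}

blob = pickle.dumps(data)        # serialize: Python object -> bytes
print(type(blob))                # <class 'bytes'>

restored = pickle.loads(blob)    # deserialize: bytes -> new Python object
print(restored == data)          # True: same content
print(restored is data)          # False: it is a separate copy
```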
🔸 Module: pickle
import pickle
🔹 1. pickle.dump(obj, file) → Write to Binary File
✅ Purpose:
Serializes a Python object and writes it to a binary file.
📘 Example:
import pickle
data = {'name': 'Alice', 'age': 25, 'languages': ['English', 'Hindi']}
with open("data.pkl", "wb") as file:
    pickle.dump(data, file)
File must be opened in 'wb' mode (write binary)
Object can be any serializable Python data: dict, list, tuple, etc.
🔹 2. pickle.load(file) → Read from Binary File
✅ Purpose:
Reads from a binary file and deserializes (reconstructs) the original object.
📘 Example:
import pickle
with open("data.pkl", "rb") as file:
    loaded_data = pickle.load(file)
print(loaded_data)
Output:
{'name': 'Alice', 'age': 25, 'languages': ['English', 'Hindi']}
🔁 Summary: dump() vs load()
Function    Purpose                        File Mode    Input/Output
dump()      Serialize and write object     'wb'         Python object → file
load()      Read and deserialize object    'rb'         file → Python object
🔒 Safety Note
Never use pickle.load() on a file from an untrusted source — it can execute arbitrary code and is a
security risk.
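To see why, here is a deliberately harmless sketch. A pickle can name any importable callable and the arguments to call it with, and the loader makes that call. The class name `Payload` is hypothetical and the callable here is just len().

```python
import pickle

class Payload:
    # __reduce__ tells pickle how to rebuild the object:
    # "call this callable with these arguments".
    def __reduce__(self):
        return (len, ("abc",))   # harmless here; an attacker could pick any importable callable

blob = pickle.dumps(Payload())

# Loading does not rebuild a Payload at all: it calls len("abc").
result = pickle.loads(blob)
print(result)  # 3
```

Swap len for a function that runs shell commands and the same mechanism becomes an attack, which is why only trusted pickles should be loaded.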
🔹 What Can You Serialize?
Supported                                        Not Supported
int, float, str, bool                            Open file objects
list, dict, tuple, set                           Sockets, threads
Instances of classes defined at module level     Lambdas, functions defined inside other functions
🔸 Bonus: Saving Multiple Objects
with open("multi.pkl", "wb") as f:
    pickle.dump([1, 2, 3], f)
    pickle.dump("hello", f)
    pickle.dump({"x": 10}, f)
with open("multi.pkl", "rb") as f:
    print(pickle.load(f))  # [1, 2, 3]
    print(pickle.load(f))  # 'hello'
    print(pickle.load(f))  # {'x': 10}
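When you don't know how many objects a file holds, one common approach is to keep calling pickle.load() until it raises EOFError. The helper name `load_all` and the temporary file below are assumptions for this sketch.

```python
import os
import pickle
import tempfile

def load_all(path):
    """Yield every pickled object stored in the file, in order."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:       # raised when no more data is left
                break

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "multi.pkl")   # hypothetical file name
    with open(path, "wb") as f:
        for obj in ([1, 2, 3], "hello", {"x": 10}):
            pickle.dump(obj, f)

    objects = list(load_all(path))

print(objects)  # [[1, 2, 3], 'hello', {'x': 10}]
```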
🧠 When to Use pickle
Use pickle if:
You need to save/load Python data structures in a binary format
You're working with models, config data, or pre-processed objects
Speed and file size matter more to you than human readability
Avoid it when:
You need interoperability with other programming languages → use json instead
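For comparison, a minimal sketch of the json alternative is shown below. It produces plain text that other languages can read, at the cost of supporting fewer Python types (a tuple, for instance, comes back as a list).

```python
import json

data = {"name": "Alice", "age": 25, "languages": ["English", "Hindi"]}

text = json.dumps(data)          # Python object -> JSON text (str, not bytes)
print(text)                      # {"name": "Alice", "age": 25, "languages": ["English", "Hindi"]}

restored = json.loads(text)      # JSON text -> Python object
print(restored == data)          # True

print(json.loads(json.dumps((1, 2))))   # [1, 2]: the tuple became a list
```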
Excellent question! Comparing pickle with conventional write() and read() functions helps
understand why and when serialization is better than plain text or manual file handling.
✅ Quick Summary:
Feature                            pickle                             write() / read()
Handles complex data structures    ✅ Yes                             ❌ Manual formatting required
Binary format                      ✅ Compact & fast                  ❌ Text only (unless manually converted)
Human-readable                     ❌ No                              ✅ Yes
Cross-language support             ❌ Python only                     ✅ (e.g., with JSON/XML/CSV)
Easy to use                        ✅ Very easy                       ❌ Tedious for complex data
Secure to load from untrusted?     ❌ No! (Risk of code execution)    ✅ Yes (if using plain text)
🔍 Detailed Advantages of pickle over write() / read()
1. 🧠 Automatic Handling of Python Objects
With pickle, you can directly store:
Lists, dictionaries, sets
Nested data structures
Custom class objects
Example:
import pickle
data = {'name': 'Alice', 'scores': [85, 90, 78]}
with open("data.pkl", "wb") as file:
    pickle.dump(data, file)  # Just one line!
💥 But with write():
You'd have to convert it to a string manually, and parse it again with eval() or similar (which is
dangerous and error-prone).
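Here is a rough side-by-side of the two routes, kept in memory for brevity. The "name=...\nscores=..." text layout is invented for this sketch; it parses the text by hand instead of using eval().

```python
import pickle

data = {"name": "Alice", "scores": [85, 90, 78]}

# Manual route: invent a text layout, then write matching parsing code.
text = "name={}\nscores={}\n".format(
    data["name"], ",".join(str(s) for s in data["scores"])
)

parsed = {}
for line in text.splitlines():
    key, _, value = line.partition("=")     # split on the first "="
    parsed[key] = value
parsed["scores"] = [int(s) for s in parsed["scores"].split(",")]  # restore types by hand

# pickle route: one call each way, types preserved automatically.
unpickled = pickle.loads(pickle.dumps(data))

print(parsed == data)     # True
print(unpickled == data)  # True
```

The manual route works, but every new field or nesting level needs more formatting and parsing code.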
2. Performance (Speed and Size)
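Pickle uses a compact binary format, which for structured Python data:
Is often faster to read/write than hand-written text formatting and parsing
Often produces smaller file sizes for complex data
Text formats (like writing strings with write()) are usually larger and slower for structured data, though the result depends on the data and the pickle protocol, so measure before relying on it.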
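Size claims like these are easy to check for your own data. The sketch below only measures; which format comes out smaller depends on the data and the pickle protocol, so no particular result is assumed. The sample dictionary is made up.

```python
import json
import pickle

data = {"name": "Alice", "scores": list(range(1000))}   # made-up sample data

sizes = {
    "pickle": len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)),
    "json": len(json.dumps(data).encode("utf-8")),
    "repr": len(repr(data).encode("utf-8")),
}

for name, size in sizes.items():
    print(f"{name:>6}: {size} bytes")
```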
3. Less Code and No Manual Parsing
With write():
You'd need to format strings (str(), join(), split(), etc.)
Later you'd have to reconstruct the structure from text
With pickle, both serialization and deserialization are one-liners.
4. 🧩 Custom Class Support
You can store and reload instances of your own classes:
pickle.dump(my_object, file)
...
my_object = pickle.load(file)
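With basic write() you would have to build a full custom save/load system for this. Note that pickle stores a reference to the class, not its code, so the class definition must be importable when you load.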
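A minimal sketch of that round trip, using a hypothetical `Student` class defined at module level (which is what lets pickle find it again on load):

```python
import pickle

class Student:                      # hypothetical class for illustration
    def __init__(self, name, scores):
        self.name = name
        self.scores = scores

original = Student("Alice", [85, 90, 78])

# Round-trip through bytes; with a file you would use dump()/load() instead.
clone = pickle.loads(pickle.dumps(original))

print(type(clone).__name__)         # Student
print(clone.name, clone.scores)     # Alice [85, 90, 78]
print(clone is original)            # False: a new, equal-looking object
```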
5. 🔄 Handles References Automatically
If your data structure has shared references or cycles, pickle preserves them within a single dump() call, whereas a plain-text dump written with write() would lose or duplicate them.
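This behaviour can be checked with identity tests. The sketch below uses json as the text-based comparison (an assumption; any plain-text dump behaves similarly) and in-memory dumps()/loads() for brevity.

```python
import json
import pickle

shared = [1, 2, 3]
data = {"a": shared, "b": shared}        # both keys point at the same list

restored = pickle.loads(pickle.dumps(data))
print(restored["a"] is restored["b"])    # True: still one shared list

via_text = json.loads(json.dumps(data))
print(via_text["a"] is via_text["b"])    # False: duplicated into two lists

loop = []
loop.append(loop)                        # a list that contains itself
restored_loop = pickle.loads(pickle.dumps(loop))
print(restored_loop[0] is restored_loop) # True: the cycle survives
```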
🛑 Limitations / When Not to Use pickle
Issue                   Why                                  Use Alternative
Security                load() can execute arbitrary code    Only load from trusted sources
Human readability       Binary format, not editable          Use JSON/XML for text formats
Cross-language usage    Python-only format                   Use JSON/CSV/XML
✅ Best Use Cases for pickle
Saving trained machine learning models (.pkl files)
Storing cached processed data
Saving and restoring sessions or program states
Quick prototyping and testing
✅ Recap:
Use pickle when you want to:
Store complex data structures
Avoid manual parsing
Prioritize speed and convenience
Use write()/read() when:
You need plain text for logging, editing, or debugging
You're writing simple strings
You're sharing files with non-Python systems