Binary File Handling in Python

Uploaded by

Himanshu Mudgal

Binary file handling in Python

Binary file handling in Python is essential when working with non-text files such as images, audio, video, PDFs, ZIP archives, and executable files.

🔹 What is a Binary File?

A binary file is any file that contains data in a format other than plain text — usually raw bytes, not
meant for human reading.

Examples:

• .jpg, .png, .pdf, .mp3, .mp4, .zip, .exe, etc.

🔸 Why Handle Binary Files Differently?

Text files deal with characters (strings), while binary files deal with bytes (raw 0s and 1s). So Python
needs a different file mode and data type when handling them.

🔹 Opening Binary Files in Python

Use the open() function with a binary mode:

Mode     Description
'rb'     Read binary
'wb'     Write binary (overwrite)
'ab'     Append binary
'rb+'    Read & write binary
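The four modes can be tried out in one short sketch. The file name demo.bin and the temporary directory are assumptions made for illustration, and seek(0), used here with 'rb+' to jump back to the start of the file, is a standard file-object method that this guide does not otherwise cover.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "demo.bin")   # hypothetical scratch file

    with open(path, "wb") as f:      # 'wb' creates the file or overwrites it
        f.write(b"ABCD")

    with open(path, "ab") as f:      # 'ab' adds bytes at the end
        f.write(b"EF")

    with open(path, "rb+") as f:     # 'rb+' reads and writes an existing file
        f.seek(0)                    # move to the first byte
        f.write(b"xy")               # overwrite two bytes in place

    with open(path, "rb") as f:      # 'rb' reads the raw bytes back
        result = f.read()

print(result)  # b'xyCDEF'
```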

🔹 Basic Operations

1. Reading a Binary File

with open("image.jpg", "rb") as file:
    data = file.read()

print(type(data))  # <class 'bytes'>

2. Writing to a Binary File

binary_data = b'\x89PNG\r\n\x1a\n'  # The 8-byte PNG file signature (not a complete image)

with open("output.png", "wb") as file:
    file.write(binary_data)

3. Copying a Binary File

with open("original.pdf", "rb") as src, open("copy.pdf", "wb") as dest:
    dest.write(src.read())
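Calling src.read() loads the whole file into memory, which is fine for small files. For larger ones, a minimal sketch using shutil.copyfileobj() from the standard library, which copies in chunks, might look like this. The file names and the sample bytes are placeholders.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    src_path = os.path.join(workdir, "original.bin")   # placeholder names
    dest_path = os.path.join(workdir, "copy.bin")

    with open(src_path, "wb") as f:
        f.write(bytes(range(256)) * 100)   # sample data: 25,600 bytes

    # copyfileobj reads from src and writes to dest one chunk at a time.
    with open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)

    with open(src_path, "rb") as a, open(dest_path, "rb") as b:
        same = a.read() == b.read()
    copied_size = os.path.getsize(dest_path)

print(same, copied_size)  # True 25600
```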

🔹 Binary vs Text Mode Comparison

Feature             Text Mode ('r', 'w')       Binary Mode ('rb', 'wb')
Data type           str                        bytes
Encoding/decoding   Automatically handled      Must handle manually
Newline handling    OS-specific (\n, \r\n)     No transformation
Use cases           .txt, .csv, .json          .jpg, .exe, .mp3, etc.

🔹 Reading Binary Data in Chunks

For large files, read in chunks to save memory:

with open("large_file.zip", "rb") as file:
    while chunk := file.read(1024):  # 1 KB at a time (":=" needs Python 3.8+)
        process(chunk)  # hypothetical function
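As one concrete stand-in for the hypothetical process() call, the sketch below feeds each chunk to a SHA-256 hash from the standard hashlib module. The helper name sha256_of_file, the file name, and the sample payload are illustrative assumptions.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=1024):
    """Hash a file without loading it all into memory (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        while chunk := file.read(chunk_size):   # b"" at end of file stops the loop
            digest.update(chunk)
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "large_file.bin")
    payload = b"binary data " * 10_000          # sample content
    with open(path, "wb") as file:
        file.write(payload)
    chunked = sha256_of_file(path)

# The chunked result matches hashing the whole payload in one go.
print(chunked == hashlib.sha256(payload).hexdigest())
```

Only one chunk is held in memory at a time, so the same loop works for files far larger than available RAM.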

🔹 Writing Binary from Strings

If you want to write a string to a binary file, you must encode it:

with open("text.bin", "wb") as file:
    file.write("Hello, binary!".encode('utf-8'))

To convert it back later:

with open("text.bin", "rb") as file:
    print(file.read().decode('utf-8'))

🧪 When to Use Binary File Handling

Use binary mode when:

• Working with images, audio, video, PDFs

• Writing or reading encrypted or compressed data

• Creating or reading custom binary formats

• Avoiding platform-dependent newline issues (\n vs \r\n)


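🔐 Important Notes

• Always open files using with open(...), which closes the file automatically, even if an error occurs.

• Do not open binary files in text mode. Reading may fail with a UnicodeDecodeError, and newline translation can silently alter the bytes.

• Be careful with data types: binary mode expects and returns bytes, not str.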
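The last note can be seen directly: writing a str to a file opened in binary mode raises a TypeError, while the encoded bytes are accepted. A small sketch, with a placeholder file name:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "notes.bin")   # placeholder name

    with open(path, "wb") as file:
        try:
            file.write("plain text")            # str is rejected in binary mode
            outcome = "written"
        except TypeError:
            outcome = "TypeError"
        file.write("plain text".encode("utf-8"))  # bytes are accepted

    with open(path, "rb") as file:
        data = file.read()

print(outcome, type(data).__name__)  # TypeError bytes
```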

✅ Summary

Task                Mode          Data Type   Function to Use
Read binary file    'rb'          bytes       read(), readline()
Write binary file   'wb'          bytes       write(), writelines()
Append to binary    'ab'          bytes       write()
Copy file           'rb', 'wb'    bytes       read() + write()

A more advanced and very useful part of binary file handling is the pair of functions load() and dump(), which come from Python's pickle module.

These functions are used to serialize and deserialize Python objects — basically saving Python data
(like dictionaries, lists, objects) to a binary file and loading them back later.

🔹 What Is Serialization?

Serialization is the process of converting a Python object into a byte stream (binary), so it can be:

• Written to a file

• Sent over a network

• Stored in a database

The reverse process is deserialization.
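To look at the byte stream itself, a short sketch can use pickle.dumps() and pickle.loads(), the in-memory counterparts of dump() and load() that work with bytes objects instead of files. The sample record is made up.

```python
import pickle

record = {"name": "Alice", "age": 25}   # made-up sample data

stream = pickle.dumps(record)        # serialization: object -> bytes
restored = pickle.loads(stream)      # deserialization: bytes -> object

print(type(stream).__name__)         # bytes
print(restored == record)            # True
print(restored is record)            # False: a new, equal object is built
```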

🔸 Module: pickle

import pickle

🔹 1. pickle.dump(obj, file) → Write to Binary File

✅ Purpose:

Serializes a Python object and writes it to a binary file.


📘 Example:

import pickle

data = {'name': 'Alice', 'age': 25, 'languages': ['English', 'Hindi']}

with open("data.pkl", "wb") as file:
    pickle.dump(data, file)

• File must be opened in 'wb' mode (write binary)

• Object can be any serializable Python data: dict, list, tuple, etc.

🔹 2. pickle.load(file) → Read from Binary File

✅ Purpose:

Reads from a binary file and deserializes (reconstructs) the original object.

📘 Example:

import pickle

with open("data.pkl", "rb") as file:
    loaded_data = pickle.load(file)

print(loaded_data)

Output:

{'name': 'Alice', 'age': 25, 'languages': ['English', 'Hindi']}

🔁 Summary: dump() vs load()

Function   Purpose                       File Mode   Input/Output
dump()     Serialize and write object    'wb'        Python object → file
load()     Read and deserialize object   'rb'        file → Python object

🔒 Safety Note

Never use pickle.load() on a file from an untrusted source. It can execute arbitrary code and is a security risk.

🔹 What Can You Serialize?

Supported                                   Not Supported
int, float, str, bool                       Open file objects
list, dict, tuple, set                      Sockets, threads
Custom classes (defined at module level)    Lambdas, nested (local) functions and classes
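The table can be checked with a quick experiment. In this sketch, try_pickle is a hypothetical helper that reports whether pickle.dumps() accepts an object; the exact exception type raised for unsupported objects varies, so several are caught.

```python
import pickle
import tempfile

def try_pickle(obj):
    """Return True if obj can be pickled, else False (hypothetical helper)."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

results = {
    "dict": try_pickle({"x": [1, 2, 3]}),
    "set": try_pickle({1, 2, 3}),
    "lambda": try_pickle(lambda x: x + 1),
}

# An open file object is tied to the operating system and is rejected too.
with tempfile.TemporaryFile() as handle:
    results["open file"] = try_pickle(handle)

print(results)
# {'dict': True, 'set': True, 'lambda': False, 'open file': False}
```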

🔸 Bonus: Saving Multiple Objects

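import pickle

# Each dump() call appends one object to the file.
with open("multi.pkl", "wb") as f:
    pickle.dump([1, 2, 3], f)
    pickle.dump("hello", f)
    pickle.dump({"x": 10}, f)

# Each load() call returns the next object, in the order written.
with open("multi.pkl", "rb") as f:
    print(pickle.load(f))  # [1, 2, 3]
    print(pickle.load(f))  # 'hello'
    print(pickle.load(f))  # {'x': 10}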
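When the number of stored objects is not known in advance, one common approach is to keep calling pickle.load() until it raises EOFError at the end of the file. A sketch, where load_all is a hypothetical helper name:

```python
import os
import pickle
import tempfile

def load_all(path):
    """Yield every pickled object in the file, in order (hypothetical helper)."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:      # raised once no more objects remain
                break

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "multi.pkl")
    with open(path, "wb") as f:
        for item in ([1, 2, 3], "hello", {"x": 10}):
            pickle.dump(item, f)
    items = list(load_all(path))

print(items)  # [[1, 2, 3], 'hello', {'x': 10}]
```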

🧠 When to Use pickle

Use pickle if:

• You need to save/load Python data structures in a binary format

• You're working with models, config data, or pre-processed objects

• Speed and file size efficiency are important

Avoid it when:

• You need interoperability with other programming languages → use json instead
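For comparison, a minimal sketch of the same kind of round trip with the standard json module. The sample dictionary is reused from the earlier pickle example; the tuple check at the end is an added illustration of one trade-off.

```python
import json

data = {"name": "Alice", "age": 25, "languages": ["English", "Hindi"]}

text = json.dumps(data)          # a plain str that other languages can parse
restored = json.loads(text)

print(text)
print(restored == data)          # True

# JSON has fewer types than Python: a tuple comes back as a list.
print(json.loads(json.dumps((1, 2))))   # [1, 2]
```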

Comparing pickle with the conventional write() and read() functions helps explain why and when serialization is better than plain text or manual file handling.

✅ Quick Summary:

Feature                           pickle                            write() / read()
Handles complex data structures   ✅ Yes                            ❌ Manual formatting required
Binary format                     ✅ Compact & fast                 ❌ Text only (unless manually converted)
Human-readable                    ❌ No                             ✅ Yes
Cross-language support            ❌ Python only                    ✅ (e.g., with JSON/XML/CSV)
Easy to use                       ✅ Very easy                      ❌ Tedious for complex data
Secure to load from untrusted?    ❌ No! (Risk of code execution)   ✅ Yes (if using plain text)

🔍 Detailed Advantages of pickle over write() / read()

1. 🧠 Automatic Handling of Python Objects

With pickle, you can directly store:

• Lists, dictionaries, sets

• Nested data structures

• Custom class objects

Example:

import pickle

data = {'name': 'Alice', 'scores': [85, 90, 78]}

with open("data.pkl", "wb") as file:
    pickle.dump(data, file)  # Just one line!

💥 But with write():


You'd have to convert it to a string manually, and parse it again with eval() or similar (which is
dangerous and error-prone).
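A side-by-side sketch of the two approaches for the same dictionary. The key=value text layout and the helper names save_as_text and load_from_text are invented for illustration; every different data shape would need its own version of this formatting and parsing code.

```python
import os
import pickle
import tempfile

data = {"name": "Alice", "scores": [85, 90, 78]}

def save_as_text(record, path):
    # Manual formatting: one "key=value" line per field (made-up layout).
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"name={record['name']}\n")
        f.write("scores=" + ",".join(str(s) for s in record["scores"]) + "\n")

def load_from_text(path):
    # Manual parsing: split each line and convert types by hand.
    record = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            key, value = line.rstrip("\n").split("=", 1)
            if key == "scores":
                record[key] = [int(s) for s in value.split(",")]
            else:
                record[key] = value
    return record

with tempfile.TemporaryDirectory() as workdir:
    text_path = os.path.join(workdir, "data.txt")
    pickle_path = os.path.join(workdir, "data.pkl")

    save_as_text(data, text_path)
    from_text = load_from_text(text_path)

    with open(pickle_path, "wb") as f:      # pickle: one call each way
        pickle.dump(data, f)
    with open(pickle_path, "rb") as f:
        from_pickle = pickle.load(f)

print(from_text == data, from_pickle == data)  # True True
```

Both routes recover the dictionary, but the text route needs hand-written type conversion (the int() calls) that pickle performs automatically.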

2. Performance (Speed and Size)

Pickle uses a compact binary format, which:

• Is often faster to read/write than hand-formatted text

• Often produces smaller files for complex data

Text formats (like writing strings with write()) are usually larger and slower for structured data, although the actual difference depends on the data.
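Whether pickle is actually smaller depends on the data, so it is worth measuring. The sketch below compares sizes only, for one made-up data set of 1,000 floats, and makes no claim about speed.

```python
import pickle

values = [i / 7 for i in range(1000)]     # made-up sample data

as_pickle = pickle.dumps(values)           # binary: fixed 8-byte payload per float
as_text = repr(values).encode("utf-8")     # text: every digit is a character

print(len(as_pickle), len(as_text))
print(len(as_pickle) < len(as_text))       # True for this data set
```

Other data, such as short strings, may show a much smaller gap.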

3. Less Code and No Manual Parsing

With write():

• You'd need to format strings (str(), join(), split(), etc.)

• Later you'd have to reconstruct the structure from text

With pickle, both serialization and deserialization are one-liners.

4. 🧩 Custom Class Support

You can store and reload instances of your own classes:

pickle.dump(my_object, file)

...

my_object = pickle.load(file)

With basic write(), this would require building a full custom save/load system.
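A fuller sketch of that round trip, assuming a small hypothetical Student class defined at the top level of the script. Pickle stores the instance's attributes plus a reference to the class by name, so the class definition must be available when loading.

```python
import os
import pickle
import tempfile

class Student:                      # hypothetical example class
    def __init__(self, name, scores):
        self.name = name
        self.scores = scores

    def average(self):
        return sum(self.scores) / len(self.scores)

my_object = Student("Alice", [85, 90, 78])

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "student.pkl")
    with open(path, "wb") as file:
        pickle.dump(my_object, file)
    with open(path, "rb") as file:
        loaded = pickle.load(file)

# The loaded object is a new Student with the same attribute values.
print(type(loaded).__name__, loaded.name, loaded.scores)
```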

5. 🔄 Handles References Automatically

If your data structure has shared references or cycles, pickle preserves them — whereas write()
would lose or duplicate them.
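A small sketch of both cases, using in-memory pickle.dumps() and pickle.loads() and made-up sample data: a list shared by two dictionary keys, and a list that contains itself.

```python
import pickle

shared = [1, 2, 3]
data = {"a": shared, "b": shared}     # two keys, one list object

cycle = []
cycle.append(cycle)                   # a list that contains itself

restored = pickle.loads(pickle.dumps(data))
restored_cycle = pickle.loads(pickle.dumps(cycle))

print(restored["a"] is restored["b"])         # True: still one shared list
print(restored_cycle[0] is restored_cycle)    # True: the cycle survives
```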

🛑 Limitations / When Not to Use pickle

Issue                  Why                                Alternative
Security               load() can execute arbitrary code  Only load from trusted sources
Human readability      Binary format, not editable        Use JSON/XML for text formats
Cross-language usage   Python-only format                 Use JSON/CSV/XML

✅ Best Use Cases for pickle

• Saving trained machine learning models (.pkl files)

• Storing cached processed data

• Saving and restoring sessions or program states

• Quick prototyping and testing

✅ Recap:

Use pickle when you want to:

• Store complex data structures

• Avoid manual parsing

• Prioritize speed and convenience

Use write()/read() when:

• You need plain text for logging, editing, or debugging

• You're writing simple strings

• You're sharing files with non-Python systems
