Version: 0.14

Python Serialization Guide

Apache Fory™ is a blazing-fast multi-language serialization framework powered by JIT compilation and zero-copy techniques, delivering high performance while remaining easy to use and safe.

pyfory provides the Python implementation of Apache Fory™, offering both high-performance object serialization and advanced row-format capabilities for data processing tasks.

Key Features

Flexible Serialization Modes

  • Python-Native Mode: Full Python compatibility, drop-in replacement for pickle/cloudpickle
  • Cross-Language Mode: Optimized for multi-language data exchange
  • Row Format: Zero-copy row format for analytics workloads

Versatile Serialization Features

  • Shared/circular reference support for complex object graphs in both Python-native and cross-language modes
  • Polymorphism support for customized types with automatic type dispatching
  • Schema evolution support for backward/forward compatibility when using dataclasses in cross-language mode
  • Out-of-band buffer support for zero-copy serialization of large data structures like NumPy arrays and Pandas DataFrames, compatible with pickle protocol 5
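The out-of-band mechanism in the last bullet originates in pickle protocol 5 (PEP 574). The sketch below uses only the standard library's pickle module, not pyfory, to illustrate what "out-of-band" means: the large payload is handed to a callback instead of being copied into the serialized stream. The bytearray is a stand-in for array memory, and pyfory's own API for this feature is not shown here and may differ.

```python
import pickle

# A large, writable payload standing in for a NumPy array's memory.
payload = bytearray(b"\x00" * 1_000_000)

# buffer_callback receives each PickleBuffer; returning None (as list.append
# does) tells pickle to keep that buffer out of the main stream.
buffers = []
data = pickle.dumps(
    pickle.PickleBuffer(payload),
    protocol=5,
    buffer_callback=buffers.append,
)

# The stream itself stays small because the payload travels separately.
print(len(data), len(buffers))

# The receiver supplies the same buffers, in order, to rebuild the object.
restored = pickle.loads(data, buffers=buffers)
view = memoryview(restored)
print(view.nbytes == len(payload))
```

Because the payload never enters the pickle stream, a transport layer is free to send the buffers without an intermediate copy, which is the property that matters for large arrays and data frames.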

Blazing Fast Performance

  • Extremely fast performance compared to other serialization frameworks
  • Runtime code generation and Cython-accelerated core implementation for optimal performance

Compact Data Size

  • Compact object graph protocol with minimal space overhead, up to 3× smaller than pickle/cloudpickle
  • Meta packing and sharing to minimize the space overhead of forward/backward type compatibility

Security & Safety

  • Strict mode prevents deserialization of untrusted types through type registration and checks
  • Reference tracking for handling circular references safely

Installation

Basic Installation

pip install pyfory

Optional Dependencies

# Install with row format support (requires Apache Arrow)
pip install pyfory[format]

# Install from source for development
git clone https://github.com/apache/fory.git
cd fory/python
pip install -e ".[dev,format]"

Requirements

  • Python: 3.8 or higher
  • OS: Linux, macOS, Windows

Thread Safety

pyfory provides ThreadSafeFory for thread-safe serialization using thread-local storage:

import pyfory
import threading
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

# Create thread-safe Fory instance
fory = pyfory.ThreadSafeFory(xlang=False, ref=True)
fory.register(Person)

# Use in multiple threads safely
def serialize_in_thread(thread_id):
    person = Person(name=f"User{thread_id}", age=25 + thread_id)
    data = fory.serialize(person)
    result = fory.deserialize(data)
    print(f"Thread {thread_id}: {result}")

threads = [threading.Thread(target=serialize_in_thread, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Key Features:

  • Instance Pool: Maintains a pool of Fory instances protected by a lock for thread safety
  • Shared Configuration: All registrations must be done upfront and are applied to all instances
  • Same API: Drop-in replacement for Fory class with identical methods
  • Registration Safety: Prevents registration after first use to ensure consistency

When to Use:

  • Multi-threaded Applications: Web servers, concurrent workers, parallel processing
  • Shared Fory Instances: When multiple threads need to serialize/deserialize data
  • Thread Pools: Applications using thread pools or concurrent.futures

Quick Start

import pyfory
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

# Create Fory instance
fory = pyfory.Fory(xlang=False, ref=True)
fory.register(Person)

person = Person("Alice", 30)
data = fory.serialize(person)
result = fory.deserialize(data)
print(result) # Person(name='Alice', age=30)

Next Steps