Cloudpickle is set as the default pickle_library in 2.65.0; the previous default was dill. See https://s.apache.org/beam-cloudpickle-next-steps for background.
This can cause breakages in cases where the behavior of dill and cloudpickle diverge.
The cloudpickle_pickler_test tests demonstrate the behavior of cloudpickle in various cases. Notable behavior includes:
- Globals defined in the __main__ module are pickled by value
- Globals defined in importable modules are pickled by reference
- Module aliased globals are pickled by value
- All functions and classes defined in the __main__ module are pickled by value
- All closures and dynamic types are pickled by value
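As a rough illustration of what "pickled by reference" means, the sketch below uses only the standard library pickle module, not cloudpickle or Beam. It assumes that the mechanism is the same one cloudpickle applies to objects from importable modules: the payload records just the module and attribute name, and the unpickling side must be able to import that module to look the object up again. The by-value cases listed above cannot be reproduced with the standard library alone, so they are not shown.

```python
import json
import pickle

# json.dumps lives in an importable module, so the pickle payload stores
# only a reference (module name plus attribute name), not the function body.
payload = pickle.dumps(json.dumps)

# Unpickling imports the module and looks the name up again, which is why
# the restored object is the very same function object in this process.
restored = pickle.loads(payload)

print(b"json" in payload, b"dumps" in payload)
print(restored is json.dumps)
```

By contrast, an object pickled by value is rebuilt from its serialized definition, so the unpickled copy is a new object and does not share state with the original.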
Known issues include:
- Unit tests that rely on globals will fail. Cloudpickle assumes the __main__ module is not available in the unpickling environment and therefore redefines globals. To fix tests that rely on globals, use the apache_beam.utils.shared module as shown in test_globals_shared_are_pickled_by_reference.
- Closures and dynamic classes that reference unpicklable objects fail. This can be fixed by defining functions at the top level and binding arguments with functools.partial when necessary.
- When encountering types not picklable by cloudpickle, define these types in an importable module instead, in which case they will be pickled by reference.
Please report any new issues on this tracking bug. For any breakages that require reverting back to dill, specify the --pickle_library=dill pipeline option.
Issue Priority
Priority: 2 (default / most normal work should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner