Conversation

phoebusm
Collaborator

@phoebusm phoebusm commented Aug 4, 2025

Reference Issues/PRs

https://man312219.monday.com/boards/7852509418/pulses/9122332360

What does this implement or fix?

The PR that tried to align will_item_be_pickled and is_symbol_pickled has been scrapped, as the change is likely to break users' logic.
Instead, better docstrings are added for both. A better warning is added for will_item_be_pickled only, as doing the same for is_symbol_pickled requires substantial effort.

Any other comments?

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@phoebusm phoebusm added no-release-notes This PR shouldn't be added to release notes. minor Feature change, should increase minor version and removed no-release-notes This PR shouldn't be added to release notes. labels Aug 4, 2025
@phoebusm phoebusm changed the title Feature/add warning to pickle related funcs Add docstring and warning for disparity between will_item_be_pickled and is_symbol_pickle Aug 4, 2025
result = True

return norm_meta.WhichOneof("input_type") == "msg_pack_frame"
result |= norm_meta.WhichOneof("input_type") == "msg_pack_frame"
Collaborator

Is this correct if the item to be normalized is a POD dict like {"hello": "there"}? Then I think the norm meta will still say msgpack, but nothing will be pickled?

Collaborator Author

In [11]: dict_data = {"hello": "there"}

In [12]: lib._nvs.write("test", dict_data, recursive_normalizers=True)
Out[12]: VersionedItem(symbol='test', library='test', data=n/a, version=2, metadata=None, host='S3(endpoint=s3.eu-west-1.amazonaws.com, bucket=arcticdb-ci-test-bucket-02)', timestamp=1754503211644392787)

In [13]: lib._nvs.is_symbol_pickled("test")
Out[13]: True

In [14]: lib._nvs.will_item_be_pickled(dict_data, recursive_normalizers=True)
Out[14]: True

Yes, it'll be msgpack-normalized. Both APIs return True, but for different reasons:
is_symbol_pickled: because the type of the data is not native.
will_item_be_pickled: because it considers msgpack-normalized data "pickled".
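
To make the distinction concrete, here is a toy sketch of the three outcomes discussed in this thread. It is not ArcticDB's implementation: `Outcome`, `is_plain_data`, `classify` and `reported_as_pickled` are hypothetical names, and the standard-library `array` type merely stands in for natively normalized types such as DataFrames or NumPy arrays.

```python
from array import array
from enum import Enum


class Outcome(Enum):
    NATIVE = "native"    # stored in a native, queryable form
    MSGPACK = "msgpack"  # plain data serialized without pickle
    PICKLE = "pickle"    # arbitrary object that needs pickle


def is_plain_data(item):
    """Return True if item consists only of scalars, lists/tuples and str-keyed dicts."""
    if item is None or isinstance(item, (bool, int, float, str)):
        return True
    if isinstance(item, (list, tuple)):
        return all(is_plain_data(value) for value in item)
    if isinstance(item, dict):
        return all(isinstance(key, str) and is_plain_data(value) for key, value in item.items())
    return False


def classify(item):
    """Toy classification; `array` is an assumed stand-in for native types."""
    if isinstance(item, array):
        return Outcome.NATIVE
    if is_plain_data(item):
        return Outcome.MSGPACK
    return Outcome.PICKLE


def reported_as_pickled(outcome):
    """Mirror the behaviour described above: anything that is not native reports True."""
    return outcome is not Outcome.NATIVE
```

Under these assumptions, `{"hello": "there"}` classifies as `Outcome.MSGPACK`, so nothing is pickled, yet `reported_as_pickled` still answers True. That is the disparity the new docstrings and warning are meant to explain.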

if is_recursive_normalize_preferred:
    log.warning("As the library setting recursive_normalizers is enabled, the item "
                "will be recursively normalized in `write`. "
                "However, for backward compatibility, this API will still return True.")
Collaborator

It isn't necessarily the library setting, it could be the env var or the argument to this method.
I would also combine with the other call to log.warning, there is no guarantee these logs will appear near each other in a busy logfile.
I would also add more detail as to why this method has historically returned true even if pickling is not involved (not date_range searchable, queryable, etc).
I think we also discussed being able to disable this log with an env var?
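
One way to read the first three suggestions is sketched below using the standard `logging` module. It is not the code that was merged: `build_warning`, `warn_combined`, the logger name and the message wording are all hypothetical, and the sketch assumes the caller already knows where the recursive-normalizer preference came from.

```python
import logging

log = logging.getLogger("will_item_be_pickled_sketch")  # hypothetical logger name


def build_warning(recursive_source=None):
    """Build one message that covers both the recursive-normalizer case and the reason.

    recursive_source is an assumed label such as "library setting",
    "environment variable" or "method argument"; None means the preference is off.
    """
    parts = []
    if recursive_source is not None:
        parts.append(
            f"Recursive normalizers are enabled via the {recursive_source}, "
            "so the item will be recursively normalized in `write`."
        )
    parts.append(
        "For backward compatibility this API still returns True, because the stored "
        "data is not natively normalized and so cannot be sliced by date_range or queried."
    )
    return " ".join(parts)


def warn_combined(recursive_source=None):
    """Emit a single log record so the two halves cannot be separated in a busy logfile."""
    message = build_warning(recursive_source)
    log.warning(message)
    return message
```

Building the text first and logging once keeps the explanation and the recursive-normalizer note in the same record, whatever else is written to the log in between.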

Collaborator Author

The warning can be disabled by setting the log level to INFO.

Collaborator

Log level is a global setting though. We want to be able to disable just this message.
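
A minimal sketch of such a per-message switch, again with the standard `logging` module rather than ArcticDB's own logger. The variable name `SKETCH_WILL_ITEM_BE_PICKLED_WARNING`, the helper names and the accepted "off" values are assumptions made for illustration only.

```python
import logging
import os

log = logging.getLogger("pickle_warning_sketch")  # hypothetical logger name

# Hypothetical switch name; a real project would follow its own naming convention.
SWITCH = "SKETCH_WILL_ITEM_BE_PICKLED_WARNING"


def warning_enabled(environ=os.environ):
    """Return False only when the switch is explicitly set to an 'off' value."""
    value = environ.get(SWITCH, "1").strip().lower()
    return value not in ("0", "false", "no", "off")


def maybe_warn(message, environ=os.environ):
    """Log the message only if this specific switch is on and WARNING is active."""
    if warning_enabled(environ) and log.isEnabledFor(logging.WARNING):
        log.warning(message)
        return True
    return False
```

With this shape, setting the switch to `0` silences only this message, while every other warning still follows the global log level.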

{"a": [1, 2, 3], "b": {"c": np.arange(24)}, "d": [TestCustomNormalizer()]} # A random item that will be pickled
]
)
def test_will_item_be_pickled_recursive_normalizer(lmdb_version_store_v1, data, capfd):
Collaborator

We stopped using capfd because all the tests that used it were flaky. If all these tests passed for you locally, we can live with the warning messages not being tested.


return norm_meta.WhichOneof("input_type") == "msg_pack_frame"
result |= norm_meta.WhichOneof("input_type") == "msg_pack_frame"
log_warning_message = strtobool(os.getenv("VersionStore.WillItemBePickledWarningMsg", "1")) and log.is_active(_LogLevel.WARN)
Collaborator

Env var should follow our naming convention

Collaborator Author

It should be a config instead of env var

@phoebusm phoebusm force-pushed the feature/add_warning_to_pickle_related_funcs branch from efd68d9 to fee7b04 Compare August 22, 2025 16:55
@phoebusm phoebusm merged commit d552af4 into master Aug 26, 2025
399 of 404 checks passed
@phoebusm phoebusm deleted the feature/add_warning_to_pickle_related_funcs branch August 26, 2025 09:41
Labels
minor Feature change, should increase minor version
3 participants