[MRG] add warning when importing old or new pickle. #7248
So, this works nicely. My test is a bit crazy, removing ping @jnothman ;)
one question remains, should we leave the
sklearn/base.py
Outdated
pickle_version = state.get("__version__", "pre-0.18")
if pickle_version != __version__:
    warnings.warn(
        "Trying to unpickle estimator from version {} when with using "
drop "with"
Given the lack of traceback with warnings, we should probably mention the estimator. (Although we could use stacklevel, the previous stack entries will be in the pickle module.) The annoyance of that will be that the message will be printed for every estimator loaded.
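A minimal sketch of what naming the estimator in the warning could look like. `ToyEstimator`, the module-level `__version__` value, and the exact message wording are assumptions made for illustration; this is not the code in the PR.

```python
import warnings

__version__ = "0.18"  # hypothetical stand-in for sklearn.__version__


class ToyEstimator:
    """Hypothetical estimator used only for this sketch."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def __getstate__(self):
        # Store the library version next to the instance attributes.
        return dict(self.__dict__.items(), __version__=__version__)

    def __setstate__(self, state):
        # A state without a version entry is treated as "pre-0.18".
        pickle_version = state.pop("__version__", "pre-0.18")
        if pickle_version != __version__:
            # Name the class, since the warning carries no useful traceback.
            warnings.warn(
                "Trying to unpickle estimator {} from version {} when "
                "using version {}.".format(
                    type(self).__name__, pickle_version, __version__),
                UserWarning)
        self.__dict__.update(state)
```

With this shape, `pickle.loads(pickle.dumps(ToyEstimator()))` goes through both methods, and the message is emitted once per loaded estimator, which is the annoyance mentioned above.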
done
# check that no warning is raised for external estimators
DecisionTreeClassifier.__module__ = "notsklearn"
assert_no_warnings(pickle.loads, tree_pickle_noversion)
You probably need to reset the __module__ after the assert_no_warnings. I am guessing this is the reason for the test failures.
oh yeah that's true.
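One way to make sure the patched attribute is always restored, even when the assertion fails, is a small context manager. This is only a sketch: `temporary_attr` and `ToyTree` are hypothetical names, not helpers from scikit-learn.

```python
from contextlib import contextmanager


@contextmanager
def temporary_attr(obj, name, value):
    """Hypothetical helper: set obj.<name> to value and restore it on exit."""
    original = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Runs on normal exit and when the body raises.
        setattr(obj, name, original)


class ToyTree:
    """Hypothetical stand-in for the estimator class being patched."""
```

Usage would look like `with temporary_attr(ToyTree, "__module__", "notsklearn"): ...`. Note that this only guarantees the reset; the class is still modified globally while the block runs, so it does not by itself make the test safe under parallel execution.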
Is there a way to make this monkey-patching not break stuff if tests are run with parallelism?
maybe I should just remove the monkey patching. It's evil.
Ah, wait, it's the fiddling with the
if type(self).__module__.startswith('sklearn.'):
    return dict(self.__dict__.items(), __version__=__version__)
else:
    return dict(self.__dict__.items())
Why don't we simply add a version_ attribute to BaseEstimator?
That way the version number of sklearn is stored in the pickle, without such a hack, for everything that inherits from BaseEstimator.
Why hack? This adds the information to the file when the estimator is stored.
What would be the benefit of a version_ attribute outside of serialization?
The __setstate__ would still need to be changed the same way, right? (only without a pop)
> Why hack? This adds the information to the file when the estimator is stored.
> What would be the benefit of a version_ attribute outside of serialization?
Clarity and simplicity. Here we are relying on overriding the pickling mechanism. We wouldn't need to. Less overriding of __ methods leads to fewer tricky bugs to debug.
> The __setstate__ would still need to be changed the same way, right? (only without a pop)
I believe so.
I'm not fundamentally against adding it as an attribute to BaseEstimator but I'd rather make the attribute private to not pollute the autocomplete too much.
I think the class attribute issue is the bigger problem here. It doesn't get pickled unless it's set on the instance, does it?
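A quick way to check this point: by default, pickle stores only the instance `__dict__`, so a value that lives on the class is not written to the byte stream. The two classes below are hypothetical and exist only for this sketch.

```python
import pickle


class WithClassAttr:
    _version_ = "0.18"  # class attribute (hypothetical value)

    def __init__(self):
        self.alpha = 1.0


class WithInstanceAttr:
    def __init__(self):
        self.alpha = 1.0
        self._version_ = "0.18"  # instance attribute


class_level = pickle.dumps(WithClassAttr())
instance_level = pickle.dumps(WithInstanceAttr())

# Only the instance attribute shows up in the serialized bytes.
print(b"_version_" in class_level)
print(b"_version_" in instance_level)
```

After unpickling, `WithClassAttr()._version_` would resolve to whatever the class defines at load time, so it would report the current version rather than the one the pickle was written with.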
> Can we still have that only on models inside scikit-learn? It would be a class attribute,
+1: see in the class of BaseEstimator. It should be very simple, just adding
_version_ = sklearn.__version__
before the __init__ in the class.
The only doubt that I have is about circular imports by accessing the central sklearn module.
@GaelVaroquaux does that mean you don't object to the current solution?
ping @GaelVaroquaux [sorry, I know you're busy]
@jnothman wdyt?
# check that not including any version also works:
# TreeNoVersion has no getstate, like pre-0.18
tree = TreeNoVersion().fit(iris.data, iris.target)
This is pretty neat compared to the previous version with monkey-patching!
add tests for pickle warning
add test that loading something stored with no custom __getstate__ will still work (and raise a warning)
added "within sklearn" check, included estimator name in warning message.
add that pickle warnings only apply to sklearn estimators.
changed tests to be thread-safe and contain no monkey business
@jnothman added the tests to common tests. Only isotonic is not tested (because 1d input)....
1edd87c to b28cfbc
sklearn/utils/estimator_checks.py
Outdated
# pickle and unpickle!
pickled_estimator = pickle.dumps(estimator)
assert_true(b"version" in pickled_estimator)
Hmm... Perhaps only if module.startswith('sklearn.')!
yes
Done
# pickle and unpickle!
pickled_estimator = pickle.dumps(estimator)
if Estimator.__module__.startswith('sklearn.'):
    assert_true(b"version" in pickled_estimator)
"version", no?
took me a while to figure out what you mean ;) __version__? the underscores get mangled, I think. I felt this was enough. I can try to get the mangling to work, but I'd rather merge this and get 0.18 out of the door...
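For reference, a small check of whether the full key survives in the byte stream. With the standard pickle protocols, dictionary keys in the state are written as plain UTF-8 strings, so the stricter assertion would likely pass as well. `ToyEstimator` and the version string are hypothetical; this is a sketch, not the check in `estimator_checks.py`.

```python
import pickle


class ToyEstimator:
    """Hypothetical estimator that tags its pickled state with a version."""

    def __init__(self):
        self.alpha = 1.0

    def __getstate__(self):
        return dict(self.__dict__.items(), __version__="0.18")

    def __setstate__(self, state):
        state.pop("__version__", None)
        self.__dict__.update(state)


pickled_estimator = pickle.dumps(ToyEstimator())

# The looser check from the diff, and the stricter one with underscores.
has_plain_key = b"version" in pickled_estimator
has_dunder_key = b"__version__" in pickled_estimator
```

The looser `b"version"` check is sufficient for the purpose here; the stricter form only rules out an unrelated attribute that happens to contain "version".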
Apart from https://github.com/scikit-learn/scikit-learn/pull/7248/files#r78072506, I am also +1.
Merging!
sweet, thanks :)
needs a whatsnew, right?
Adds a warning when loading a possibly incompatible pickle from a different version of sklearn.
Fixes #7135.