@amueller
Member

Adds a warning when loading a possibly incompatible pickle from a different version of sklearn.
Fixes #7135.

@amueller amueller changed the title from "add warning when importing old or new pickle." to "[MRG] add warning when importing old or new pickle." Aug 25, 2016
@amueller
Member Author

So, this works nicely. My test is a bit crazy, removing __getstate__ from BaseEstimator by monkey-patching, to emulate the old (current) behavior. I could also overwrite the __getstate__ on a single estimator, making it the vanilla return self.__dict__.

ping @jnothman ;)

@amueller
Member Author

One question remains: should we leave the __version__ string stored in the estimator after unpickling? We could remove it to make pickle.loads(pickle.dumps(est)) a no-op.
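
To make the trade-off concrete, here is a minimal sketch of the "remove it on load" variant. It assumes a toy class rather than the real BaseEstimator, and `LIBRARY_VERSION` and `VersionedEstimator` are hypothetical names used only for illustration. The version is added in `__getstate__` and popped again in `__setstate__`, so the restored instance has the same `__dict__` as the original.

```python
import pickle

# Hypothetical stand-in for sklearn.__version__ (an assumption for this sketch).
LIBRARY_VERSION = "0.18"


class VersionedEstimator:
    """Toy estimator, not sklearn's BaseEstimator."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def __getstate__(self):
        # Copy the instance dict and add the version to the pickled state only.
        return dict(self.__dict__, __version__=LIBRARY_VERSION)

    def __setstate__(self, state):
        state = dict(state)
        # Pop the version so it never becomes an instance attribute.
        state.pop("__version__", None)
        self.__dict__.update(state)


est = VersionedEstimator(alpha=0.5)
payload = pickle.dumps(est)
restored = pickle.loads(payload)
```

With the pop in place, `restored.__dict__` matches `est.__dict__`, while the version string is still present in `payload`.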

@MechCoder MechCoder added this to the 0.18 milestone Aug 26, 2016
sklearn/base.py Outdated
pickle_version = state.get("__version__", "pre-0.18")
if pickle_version != __version__:
    warnings.warn(
        "Trying to unpickle estimator from version {} when with using "
Member

drop "with"

Member

Given the lack of traceback with warnings, we should probably mention the estimator. (Although we could use stacklevel, the previous stack entries will be in the pickle module.) The annoyance of that will be that the message will be printed for every estimator loaded.
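
As a rough illustration of naming the estimator in the message, the sketch below uses a hypothetical helper `warn_on_version_mismatch` and two toy classes. The message wording is illustrative, not the text that was merged. It also shows the downside mentioned above: one warning is emitted per estimator loaded.

```python
import warnings


def warn_on_version_mismatch(estimator, pickle_version, current_version):
    """Hypothetical helper: warn and name the estimator class in the message."""
    if pickle_version != current_version:
        warnings.warn(
            "Trying to unpickle estimator {0} from version {1} when using "
            "version {2}.".format(
                type(estimator).__name__, pickle_version, current_version
            ),
            UserWarning,
        )


class Tree:
    """Toy class standing in for an estimator."""


class Forest:
    """Toy class standing in for another estimator."""


with warnings.catch_warnings(record=True) as caught:
    # Record every warning instead of deduplicating by location.
    warnings.simplefilter("always")
    for est in (Tree(), Forest()):
        warn_on_version_mismatch(est, "pre-0.18", "0.18")

messages = [str(w.message) for w in caught]
```

Each entry in `messages` names the class that triggered it, which makes up for the missing traceback.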

@amueller
Member Author

done


# check that no warning is raised for external estimators
DecisionTreeClassifier.__module__ = "notsklearn"
assert_no_warnings(pickle.loads, tree_pickle_noversion)
Member

You probably need to reset the __module__ after the assert_no_warnings. I am guessing this is the reason for the test failures.

Member Author

oh yeah that's true.
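
One way to guarantee the reset is a try/finally block. This is only a sketch: `ToyTree` and `run_as_external` are hypothetical names, and the check passed in stands for something like the `assert_no_warnings` call above.

```python
class ToyTree:
    """Hypothetical stand-in for DecisionTreeClassifier."""


def run_as_external(cls, check):
    """Temporarily pretend cls lives outside the library, then restore it."""
    original_module = cls.__module__
    cls.__module__ = "notsklearn"
    try:
        return check()
    finally:
        # Always restore, even if the check raises, so later tests see the
        # real module name again.
        cls.__module__ = original_module


module_before = ToyTree.__module__
seen_inside = run_as_external(ToyTree, lambda: ToyTree.__module__)
module_after = ToyTree.__module__
```

Note that this still mutates shared class state while the check runs, so it does not by itself make the test safe when tests run concurrently in the same process.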

@jnothman
Member

Is there a way to make this monkey-patching not break stuff if tests are run with parallelism?

@amueller
Member Author

maybe I should just remove the monkey patching. It's evil.

@amueller
Member Author

Ah, wait, it's the fiddling with the __module__ that breaks tests, not the way more intrusive monkey-patching of BaseEstimator... I'll fix the module stuff in a bit, not sure if the monkey patching of the BaseEstimator test is worth the potential trouble.

if type(self).__module__.startswith('sklearn.'):
    return dict(self.__dict__.items(), __version__=__version__)
else:
    return dict(self.__dict__.items())
Member

Why don't we simply add a version_ attribute to BaseEstimator?

That way the version number of sklearn is stored on pickle without such a hack for everything that inherits from BaseEstimator.

Member Author

Why hack? This adds the information to the file when the estimator is stored.
What would be the benefit of a version_ attribute outside of serialization?

The __setstate__ would still need to be changed the same way, right? (only without a pop)

Member

> Why hack? This adds the information to the file when the estimator is stored.
> What would be the benefit of a version_ attribute outside of serialization?

Clarity and simplicity. Here we are relying on overriding the pickling
mechanism, which we wouldn't need to do. Less overriding of __ methods leads to
fewer tricky bugs to debug.

> The __setstate__ would still need to be changed the same way, right? (only
> without a pop)

I believe so.

Member Author

I'm not fundamentally against adding it as an attribute to BaseEstimator but I'd rather make the attribute private to not pollute the autocomplete too much.

Member

I think the class attribute issue is the bigger problem here. It doesn't get pickled unless it's set on the instance, does it?
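
A small check of that point, assuming a toy class and a made-up version string (`WithClassAttribute` and `"0.18-demo"` are hypothetical): by default pickle stores the instance `__dict__`, so a value that lives only on the class does not end up in the pickled bytes.

```python
import pickle


class WithClassAttribute:
    # Class-level attribute: stored on the class, not in the instance __dict__.
    _version_ = "0.18-demo"  # hypothetical version string

    def __init__(self):
        self.alpha = 1.0


obj = WithClassAttribute()
# Only the instance __dict__ is pickled, so the class attribute is absent.
class_attr_pickled = b"0.18-demo" in pickle.dumps(obj)

# Assigning it on the instance copies it into __dict__, so it is pickled.
obj._version_ = WithClassAttribute._version_
instance_attr_pickled = b"0.18-demo" in pickle.dumps(obj)
```

So a plain class attribute would have to be copied onto each instance (or added in `__getstate__`) before it could travel with the pickle.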

Member

> Can we still have that only on models inside scikit-learn? It would be a class attribute,

+1: see in the class of BaseEstimator. It should be very simple, just
adding

   _version_ = sklearn.__version__

before the init in the class.

The only doubt that I have is about circular imports by accessing the
central sklearn module.

Member Author

@GaelVaroquaux does that mean you don't object to the current solution?

Member Author

ping @GaelVaroquaux [sorry, I know you're busy]

@amueller
Member Author

amueller commented Sep 7, 2016

@jnothman wdyt?

@jnothman
Member

jnothman commented Sep 8, 2016

IsotonicRegression overrides __[gs]etstate__ without calling super.
This should be (part of) a common test. (Sorry for the added work.)
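
For illustration only, here is how a subclass with its own `__getstate__`/`__setstate__` could delegate to the base class so the version entry is kept. `Base`, `WithCache` and `LIBRARY_VERSION` are hypothetical; this is not how IsotonicRegression is implemented.

```python
import pickle

LIBRARY_VERSION = "0.18"  # hypothetical stand-in for sklearn.__version__


class Base:
    """Toy base class that adds the version to the pickled state."""

    def __getstate__(self):
        return dict(self.__dict__, __version__=LIBRARY_VERSION)

    def __setstate__(self, state):
        state = dict(state)
        state.pop("__version__", None)
        self.__dict__.update(state)


class WithCache(Base):
    """Hypothetical subclass that drops a non-picklable member when pickling."""

    def __init__(self):
        self.coef = [1.0, 2.0]
        self.cache = lambda x: x  # lambdas cannot be pickled

    def __getstate__(self):
        # Delegate first, so the version entry added by Base is preserved.
        state = super().__getstate__()
        state.pop("cache", None)
        return state

    def __setstate__(self, state):
        super().__setstate__(state)
        self.cache = lambda x: x  # rebuild the dropped member after loading


pickled_state = WithCache().__getstate__()
restored = pickle.loads(pickle.dumps(WithCache()))
```

A subclass that builds its state from scratch instead of calling `super()` would silently lose the version entry, which is the case a common test would catch.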


# check that not including any version also works:
# TreeNoVersion has no getstate, like pre-0.18
tree = TreeNoVersion().fit(iris.data, iris.target)
Member

This is pretty neat compared to the previous version with monkey-patching!
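
The subclass approach can be sketched without touching any shared state. The classes below are toy stand-ins (`ToyEstimator`, `ToyEstimatorNoVersion` and `LIBRARY_VERSION` are hypothetical, and the warning text is illustrative): the subclass returns the plain `__dict__` to emulate a pickle written without a version, and loading it should produce exactly one warning.

```python
import pickle
import warnings

LIBRARY_VERSION = "0.18"  # hypothetical stand-in for sklearn.__version__


class ToyEstimator:
    """Toy base class that records the version in its pickled state."""

    def __init__(self):
        # A non-empty state matters: pickle skips __setstate__ for falsy state.
        self.alpha = 1.0

    def __getstate__(self):
        return dict(self.__dict__, __version__=LIBRARY_VERSION)

    def __setstate__(self, state):
        state = dict(state)
        pickle_version = state.pop("__version__", "pre-0.18")
        if pickle_version != LIBRARY_VERSION:
            warnings.warn(
                "Unpickling {} from version {} with version {}.".format(
                    type(self).__name__, pickle_version, LIBRARY_VERSION
                ),
                UserWarning,
            )
        self.__dict__.update(state)


class ToyEstimatorNoVersion(ToyEstimator):
    """Emulates an old pickle: the state carries no version entry."""

    def __getstate__(self):
        return self.__dict__


def count_warnings_on_roundtrip(estimator):
    """Pickle and unpickle once, returning how many warnings were raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        pickle.loads(pickle.dumps(estimator))
    return len(caught)


current_warnings = count_warnings_on_roundtrip(ToyEstimator())
old_warnings = count_warnings_on_roundtrip(ToyEstimatorNoVersion())
```

Because nothing is patched on a shared class, this kind of test does not interfere with other tests running at the same time.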

add tests for pickle warning

add test that loading something stored with no custom __getstate__ will still work (and raise a warning)

added "within sklearn" check, included estimator name in warning message.

add that pickle warnings only apply to sklearn estimators.

changed tests to be threadsave and contain no monkey business
@amueller
Member Author

amueller commented Sep 8, 2016

@jnothman added the tests to common tests. Only isotonic is not tested (because 1d input)....


# pickle and unpickle!
pickled_estimator = pickle.dumps(estimator)
assert_true(b"version" in pickled_estimator)
Member

Hmm... Perhaps only if module.startswith('sklearn.')!

Member Author

yes

Member Author

Done

# pickle and unpickle!
pickled_estimator = pickle.dumps(estimator)
if Estimator.__module__.startswith('sklearn.'):
    assert_true(b"version" in pickled_estimator)
Member

"version", no?

Member Author

took me a while to figure out what you mean ;) __version__? the underscores get mangled, I think. I felt this was enough. I can try to get the mangling to work, but I'd rather merge this and get 0.18 out of the door...
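
For reference, a quick stand-alone check of the stricter assertion. It uses a plain dict as a hypothetical stand-in for the estimator state, so it is only a sketch. Python's name mangling does not apply to names that end with two underscores, and pickle stores string keys as text, so the full key should be findable in the bytes for every protocol.

```python
import pickle

# Hypothetical pickled state; a real estimator would carry more entries.
state = {"alpha": 1.0, "__version__": "0.18"}

# Check each available protocol for the full dunder key in the raw bytes.
found = {
    protocol: b"__version__" in pickle.dumps(state, protocol=protocol)
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1)
}
all_found = all(found.values())
```

If that also holds for real estimators, the test could assert on `b"__version__"` instead of `b"version"`.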

@ogrisel
Member

ogrisel commented Sep 8, 2016

@ogrisel
Member

ogrisel commented Sep 10, 2016

Merging!

@ogrisel ogrisel merged commit b4872fe into scikit-learn:master Sep 10, 2016
@amueller
Member Author

sweet, thanks :)

@amueller
Member Author

needs a whatsnew, right?

amueller added a commit to amueller/scikit-learn that referenced this pull request Sep 12, 2016
rsmith54 pushed a commit to rsmith54/scikit-learn that referenced this pull request Sep 14, 2016
rsmith54 pushed a commit to rsmith54/scikit-learn that referenced this pull request Sep 14, 2016
TomDLT pushed a commit to TomDLT/scikit-learn that referenced this pull request Oct 3, 2016
TomDLT pushed a commit to TomDLT/scikit-learn that referenced this pull request Oct 3, 2016
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
6 participants