

DEP PassiveAggressiveClassifier and PassiveAggressiveRegressor #29097


Open · wants to merge 15 commits into main

Conversation

lorentzenchr
Member

Reference Issues/PRs

#29088 (comment)

What does this implement/fix? Explain your changes.

This PR deprecates the two classes PassiveAggressiveClassifier and PassiveAggressiveRegressor. A user can easily use SGDClassifier and SGDRegressor instead, and I cannot see the added value of these two classes.

@scikit-learn/core-devs ping for decision (in particular in case you are against the deprecation)
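As an illustration of the suggested replacement, here is a minimal sketch comparing PassiveAggressiveClassifier with the closest public SGDClassifier configuration (hinge loss, no penalty). The toy data is made up for the example, and the private PA learning-rate schedule ("pa1") is not reproduced, so this shows the nearest public equivalent rather than an exact match:

```python
# Hedged sketch: PassiveAggressiveClassifier next to the SGDClassifier
# configuration suggested as its public replacement. The toy data is
# illustrative; the exact PA step-size schedule is private API and is
# not reproduced here.
from sklearn.linear_model import PassiveAggressiveClassifier, SGDClassifier

X = [[-2.0, -1.0], [-1.0, -1.0], [1.0, 1.0], [2.0, 1.0]]
y = [0, 0, 1, 1]

pa = PassiveAggressiveClassifier(C=1.0, max_iter=1000, random_state=0).fit(X, y)
sgd = SGDClassifier(loss="hinge", penalty=None, max_iter=1000, random_state=0).fit(X, y)

# On this linearly separable toy set both learn the same labels.
print(list(pa.predict(X)))
print(list(sgd.predict(X)))
```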


github-actions bot commented May 24, 2024

βœ”οΈ Linting Passed

All linting checks passed. Your pull request is in excellent shape! β˜€οΈ

Generated for commit: 05b7bbd.

@adrinjalali
Member

I'm +1 on this:

  • I haven't seen anybody use it (except maybe @amueller ?)
  • If we deprecate and people start complaining, we can revert the deprecation. So in a sense, deprecation is a way to check usage here.
  • The deprecation message clearly should always point to an alternative, which I think is the case here.

@glemaitre
Member

The deprecation message clearly should always point to an alternative, which I think is the case here.

I think that we should do a bit more and specify how you get a similar behaviour by setting the right parameters.

@lorentzenchr
Member Author

In fa3d842, I added C to SGDClassifier and SGDRegressor to make it more accessible.
I could imagine putting the equivalent PA estimators into the SGD docstrings, maybe once the deprecation is carried out.

@adrinjalali
Member

Nice. We need to add more information on the docstring about C though. It's very short and not clear what it does when I read it. Otherwise LGTM.

@lorentzenchr lorentzenchr added this to the 1.6 milestone Jun 17, 2024
@adrinjalali
Member

@lorentzenchr getting close to the release, would you mind resolving issues here?

@lorentzenchr
Member Author

@lorentzenchr getting close to the release, would you mind resolving issues here?

I can resolve merge conflicts, but I won't do

We need to add more information on the docstring about C though.

SGDRegressor(
    penalty=None,
    alpha=1.0,
    C=1.0,
)
Member


I must be missing something, our SGDRegressor doesn't have a C param.

Member Author


BaseSGDClassifier has an attribute C. This PR exposes this attribute as a parameter of SGDClassifier (and the same for the regressors).

I don't really like that. Proposition: I revert exposing C and instead write in the deprecation note, using a somewhat private API:

clf = SGDClassifier(
    penalty=None,
    alpha=1.0,
    eta0=1.0,
    learning_rate="pa1",
    loss="hinge",
)
clf.C = 1.0  # THIS IS USING PRIVATE API

Member Author


See c24c95f. I can revert if you dislike it.

Member


If we're deprecating these estimators and telling users to use SGDRegressor and SGDClassifier instead, it makes sense to actually have a public way of doing that. So I'm in favor of exposing C in both SGD estimators, documenting them, and then here simply creating an equivalent estimator using them.

Member Author


How about leaving them private for the time being?

I would like to have a dedicated discussion on whether to

  • expose C in SGD; xor
  • remove C and all passive aggressive things altogether

Member


I think this PR is the right place to have that discussion. I'd like others to weigh in about this point.

maybe @ogrisel @amueller @GaelVaroquaux @adam2392 would have an opinion?

Member


I'm not super familiar with passive aggressive vs SGD, but a loose read suggests passive aggressive allows more online updating. Though I thought SGD easily allows online training too, so why would someone use one over the other?

Is it true that the only parameter that makes these different is the "C" step-size parameter?

Pragmatically: if no one uses "C", then I kinda agree it might be simplest to just remove it? If there's something I'm missing tho, lmk.
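On the online-training point above: SGDClassifier already supports incremental updates via partial_fit, which covers the main streaming use case of the PA classes. A sketch with made-up mini-batches (a constant learning rate is chosen here only to keep the toy example well-behaved):

```python
# Sketch: incremental training with SGDClassifier.partial_fit, showing
# that online updates do not require the PassiveAggressive classes.
# Batches and hyperparameters are illustrative.
from sklearn.linear_model import SGDClassifier

batches = [
    ([[0.0, 0.0], [0.2, 0.1]], [0, 0]),
    ([[1.0, 1.0], [0.9, 1.1]], [1, 1]),
    ([[0.1, 0.2], [1.1, 0.9]], [0, 1]),
]

clf = SGDClassifier(
    loss="hinge", penalty=None, learning_rate="constant", eta0=0.1, random_state=0
)
for _ in range(20):  # a few passes over the stream
    for X_batch, y_batch in batches:
        # classes must be given so early batches may lack some labels
        clf.partial_fit(X_batch, y_batch, classes=[0, 1])

print(list(clf.predict([[0.0, 0.1], [1.0, 1.0]])))
```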

Member Author

@lorentzenchr lorentzenchr Oct 24, 2024


PassiveAggressiveClassifier was first implemented in #1259 in 2012. I have never seen it used anywhere, and I think our SGD* estimators offer enough for online learning (partial_fit). So I would be fine with removing it completely, i.e. also removing C in SGD, unless someone steps in to keep it.

Member Author


Having read through the code and parts of the papers, I could also live with making learning_rate="pa1" and "pa2", as well as C, public in SGD*. In that case I would rename C to something like pa_C.

Member


I am partially in favor of just simplifying and deprecating those parameters / the PA classifiers altogether, unless someone has an actual use case for them.

@lorentzenchr
Member Author

The decorator deprecated interferes with the signature, e.g. in check_do_not_raise_errors_in_init_or_set_params:

    params = signature(Estimator).parameters

I need help with how to avoid that (I won't fix deprecated), or rather with how to (temporarily) switch off those tests for PassiveAggressiveXX.
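The interference can be reproduced without scikit-learn. The sketch below uses a hypothetical deprecation decorator (not scikit-learn's actual deprecated implementation) that swaps __init__ for a *args/**kwargs wrapper: without functools.wraps, the parameter names disappear from inspect.signature, which is exactly what signature-based checks trip over.

```python
# Pure-Python sketch of why a class-level deprecation decorator can
# break signature introspection. `naive_deprecated` loses the original
# parameter names; `careful_deprecated` preserves them via
# functools.wraps. Both decorators are hypothetical illustrations.
import functools
import inspect
import warnings


def naive_deprecated(cls):
    original_init = cls.__init__

    def wrapped_init(self, *args, **kwargs):
        warnings.warn(f"{cls.__name__} is deprecated", FutureWarning)
        original_init(self, *args, **kwargs)

    cls.__init__ = wrapped_init  # no functools.wraps -> signature lost
    return cls


def careful_deprecated(cls):
    original_init = cls.__init__

    @functools.wraps(original_init)  # keeps __wrapped__, so signature survives
    def wrapped_init(self, *args, **kwargs):
        warnings.warn(f"{cls.__name__} is deprecated", FutureWarning)
        original_init(self, *args, **kwargs)

    cls.__init__ = wrapped_init
    return cls


@naive_deprecated
class Broken:
    def __init__(self, C=1.0):
        self.C = C


@careful_deprecated
class Fixed:
    def __init__(self, C=1.0):
        self.C = C


print(list(inspect.signature(Broken).parameters))  # ['args', 'kwargs']
print(list(inspect.signature(Fixed).parameters))   # ['C']
```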

@lorentzenchr lorentzenchr force-pushed the dep_passive_aggressive branch from 7b21cfb to c367698 Compare October 23, 2024 21:20
@lorentzenchr
Member Author

lorentzenchr commented Oct 24, 2024

Note that 306a5fa should be made a PR of its own in case this PR is not merged.

Edit: I opened #30145.

@glemaitre glemaitre removed this from the 1.6 milestone Nov 25, 2024
@glemaitre glemaitre added this to the 1.7 milestone Nov 25, 2024
@alexshtf

I think that the main advantage of those two is that they are algorithms that work without a learning rate. It's one less parameter to tune, so they are theoretically easier to use.

I think the main issue is with their name. I don't have a better name for "like SGD, but you don't need to know the best learning rate", but I think that their lack of usage is primarily a marketing issue stemming from the name.

In one of the videos on the Probabl YouTube channel, Vincent also demonstrated when it is useful.

@lorentzenchr
Member Author

@scikit-learn/core-devs @scikit-learn/documentation-team @scikit-learn/contributor-experience-team @mblondel @zaxtax ping for a decision

The Passive-Aggressive algorithm was added in #1259 over 12 years ago. Almost no active core developer seems to know it well, and Google searches show few hits, indicating little usage. This raises the question of what to do with it. Options:

  1. Keep status quo.
  2. Remove the class PassiveAggressiveClassifier via a deprecation cycle, but make it possible to parametrise an SGDClassifier equivalently (which needs to expose the additional parameter C).
  3. Remove it altogether, i.e. the class PassiveAggressiveClassifier and the private parameter C in SGD.

@adrinjalali
Member

I'm happy to remove these classes, but it'd be nice to keep the option of running them somehow, hence I'm in favor of option 2.

@betatim
Member

betatim commented May 23, 2025

I tried to quickly educate myself about what C is, and my "university of ten minutes effort" degree suggests it is some form of regularisation. Given that SGDClassifier has a way to have regularisation ... how about removing PassiveAggressive* and waiting to see if anyone comes complaining? If there are complaints that it is missing and that SGDClassifier isn't a good replacement, we can still add C as a hyper-parameter (though maybe with a better description than "max step size").

6 participants