@glemaitre
Member

closes #30016

This PR makes sure that we feed an array of the same dtype as `X` to `np.log`, so that `idf_` also has the same dtype as `X`.

This bug is only possible with NumPy < 2. The casting inference in NumPy >= 2 will pick up the higher-precision dtype from the left- and right-hand side operands.
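For illustration, here is a minimal sketch of the dtype-preserving idea described above. It is not necessarily the exact diff in this PR: the helper name `compute_idf` and its parameters are hypothetical, and it assumes the document frequencies are already available as a NumPy array. The point is that every operation happens on an array allocated with the target dtype, so the result does not depend on how a given NumPy version promotes the Python integer `n_samples`.

```python
import numpy as np


def compute_idf(df, n_samples, smooth_idf=True):
    """Hypothetical helper: smoothed idf that keeps the dtype of `df`."""
    df = np.asarray(df)
    # Keep float32/float64 as-is; fall back to float64 for anything else.
    dtype = df.dtype if df.dtype in (np.float32, np.float64) else np.float64
    df = df.astype(dtype, copy=True)

    # Optional smoothing: pretend one extra document contains every term.
    df += float(smooth_idf)
    n_samples += int(smooth_idf)

    # Allocate the numerator with the target dtype instead of dividing a
    # Python int by the array, so no value-based promotion can occur.
    idf = np.full_like(df, fill_value=n_samples, dtype=dtype)
    idf /= df
    # np.log with `out=` writes in place and therefore keeps the dtype.
    np.log(idf, out=idf)
    idf += 1.0
    return idf


df32 = np.array([1, 50_000, 100_000], dtype=np.float32)
print(compute_idf(df32, n_samples=100_000).dtype)
```

With a large `n_samples` and a `float32` input, the printed dtype should stay `float32` regardless of the installed NumPy major version.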

@github-actions

github-actions bot commented Oct 7, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: d0fc504.

@glemaitre glemaitre changed the title from "Is/30016" to "FIX make sure that TFIDFVectorizer set idf_ dtype based on X.dtype" Oct 7, 2024
Contributor

@OmarManzoor OmarManzoor left a comment

LGTM. Thanks @glemaitre

@OmarManzoor OmarManzoor added the "Waiting for Second Reviewer" label Oct 11, 2024
@adrinjalali adrinjalali enabled auto-merge (squash) October 14, 2024 11:51
@adrinjalali adrinjalali merged commit 691b00f into scikit-learn:main Oct 14, 2024

Labels

module:feature_extraction, Waiting for Second Reviewer

Development

Successfully merging this pull request may close these issues.

TfidfVectorizer does not preserve dtype for large size inputs

3 participants