thomasjpfan
Member

Reference Issues/PRs

Addresses part of #12327

What does this implement/fix? Explain your changes.

Removes the use of np.matrix in CountVectorizer.inverse_transform

Member

@lorentzenchr lorentzenchr left a comment

Thanks for this PR. A further step to get rid of np.matrix.

Member

@ogrisel ogrisel left a comment

LGTM.

Member

@lorentzenchr lorentzenchr left a comment

LGTM

@lorentzenchr lorentzenchr merged commit 863c552 into scikit-learn:main Jan 30, 2021
@glemaitre glemaitre mentioned this pull request Apr 22, 2021
12 tasks
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Apr 22, 2021
glemaitre pushed a commit that referenced this pull request Apr 28, 2021
@rachanagusain

Word n-grams do not work for Devanagari script. Tokens are cut off as soon as a modifier (a combining vowel sign or similar mark) is found, which should not be the case. By default, tokens should be split on whitespace.
Please check whether the regex module should be imported instead of re.

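from sklearn.feature_extraction.text import CountVectorizer

d = ['आई- जाइयै नुहाड़ा ध्यान उस्सै भेठा जा’रदा हा।',
'य बात सुणते ही सब गौं वाल नजदीक जंगलै तरफ भाज् और बांनर पकड़ पकड़ बेर 100- 100 रु में बेचंण लाग्।']

v = CountVectorizer()
x = v.fit_transform(d)
# get_feature_names() was deprecated in scikit-learn 1.0 and removed in 1.2;
# on newer releases use get_feature_names_out() instead.
f = v.get_feature_names()
print(f)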

Output: ['100', 'आई', 'इय', 'उस', 'और', 'गल', 'णत', 'तरफ', 'नजद', 'नर', 'पकड', 'रद', 'सब']
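
The truncated tokens above can be reproduced without scikit-learn. The sketch below assumes the vectorizer uses its documented default `token_pattern`, `(?u)\b\w\w+\b`, and applies it with the standard `re` module. The helper names (`default_tokens`, `whitespace_tokens`, `non_word_marks`) are hypothetical and only for illustration. It suggests the cause is that `re`'s `\w` does not match Devanagari combining marks (Unicode categories `Mn` and `Mc`), so each mark ends the current token.

```python
import re
import unicodedata

# Assumption: this is the default token_pattern documented for CountVectorizer.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"


def default_tokens(text):
    # Tokenize with the default pattern, as the vectorizer is assumed to do.
    return re.findall(DEFAULT_TOKEN_PATTERN, text)


def whitespace_tokens(text):
    # Alternative: split on whitespace only, keeping combining marks attached.
    return text.split()


def non_word_marks(text):
    # Collect combining marks (category M*) that the re module's \w rejects.
    return [
        (ch, unicodedata.category(ch))
        for ch in text
        if unicodedata.category(ch).startswith("M") and not re.match(r"\w", ch)
    ]


sample = "नजदीक जंगलै तरफ"  # three words taken from the second document above

print(default_tokens(sample))     # ['नजद', 'गल', 'तरफ']
print(whitespace_tokens(sample))  # ['नजदीक', 'जंगलै', 'तरफ']
print(non_word_marks(sample))     # the vowel signs and anusvara that split the words
```

As a possible workaround, a whitespace-based callable like `whitespace_tokens` could be passed to `CountVectorizer` through its `tokenizer` parameter, or a pattern such as `r"\S+"` through `token_pattern`. Note that either choice leaves punctuation attached to neighbouring tokens, so some extra cleanup may be needed.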


4 participants