@henrifroese
Collaborator

@henrifroese henrifroese commented Aug 26, 2020

Straightforward implementation of the description in #166.

Example:

>>> import texthero as hero
>>> import pandas as pd
>>> df = pd.read_csv("https://raw.githubusercontent.com/jbesomi/texthero/master/dataset/bbcsport.csv")
>>> hero.describe(df["text"], df["topic"])
                                                                                              Value
number of documents                                                                             737
number of unique documents                                                                      727
number of missing documents                                                                       0
most common words                                          [the, to, a, in, and, of, for, ", I, is]
most common words excluding stopwords             [said, first, england, game, one, year, two, w...
average document length                                                                     387.803
length of shortest document                                                                     119
length of longest document                                                                     1855
standard deviation of document lengths                                                      210.728
25th percentile document lengths                                                                241
50th percentile document lengths                                                                340
75th percentile document lengths                                                                494
label distribution                     football                                            0.359566
                                       rugby                                               0.199457
                                       cricket                                              0.16825
                                       athletics                                           0.137042
                                       tennis                                              0.135685
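For readers who want to see where numbers like these can come from, here is a rough sketch using plain pandas. It is not the code in this PR: `describe_sketch` is a hypothetical name, and the choices it makes (document length counted in characters, words split on whitespace, missing documents excluded from the document count) are assumptions that may differ from what `hero.describe` actually does. The stopword-filtered row and the 25th/75th percentiles are left out for brevity.

```python
from typing import Optional

import pandas as pd


def describe_sketch(s: pd.Series, labels: Optional[pd.Series] = None) -> pd.DataFrame:
    """Hypothetical helper (not texthero's code): summary statistics of a text Series."""
    docs = s.dropna()
    # Assumption: length is the number of characters per document.
    lengths = docs.str.len()
    # Assumption: words are obtained by naive whitespace splitting.
    words = docs.str.split().explode()

    rows = {
        # Assumption: missing documents are not counted here.
        ("number of documents", ""): int(s.count()),
        ("number of unique documents", ""): int(s.nunique()),
        ("number of missing documents", ""): int(s.isna().sum()),
        ("most common words", ""): words.value_counts().index[:10].tolist(),
        ("average document length", ""): lengths.mean(),
        ("length of shortest document", ""): int(lengths.min()),
        ("length of longest document", ""): int(lengths.max()),
        ("standard deviation of document lengths", ""): lengths.std(),
        ("50th percentile document lengths", ""): lengths.quantile(0.5),
    }
    if labels is not None:
        # Relative frequency of each label, one row per label.
        for label, share in labels.value_counts(normalize=True).items():
            rows[("label distribution", label)] = share

    # Two-level index so the label rows can nest under "label distribution".
    index = pd.MultiIndex.from_tuples(list(rows.keys()))
    values = pd.Series(list(rows.values()), index=index, dtype=object, name="Value")
    return values.to_frame()
```

Calling `describe_sketch(df["text"], df["topic"])` on the dataset above should give a table of the same general shape, though the exact values depend on the tokenization and length assumptions noted in the comments.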

Screenshot of pretty-printed output from Google Colab:

Screenshot from 2020-08-26 15-41-24

Note: so many lines are changed only because this builds upon #157.

mk2510 and others added 16 commits August 18, 2020 22:06
support MultiIndex as function parameter

returns MultiIndex, where Representation was returned

* missing: correct test


Co-authored-by: Henri Froese <[email protected]>
*missing: test adopting for new types


Co-authored-by: Henri Froese <[email protected]>
- add functionality for decorator @InputSeries to handle several allowed input types
- Add typing decorator/hints to representation.py
- add tests for _types DocumentTermDF

Co-authored-by: Maximilian Krahn <[email protected]>
Co-authored-by: Maximilian Krahm <[email protected]>
Co-authored-by: Henri Froese <[email protected]>
@mk2510
Collaborator

mk2510 commented Sep 22, 2020

this branch is now based on the master and ready for review/to be merged 🦸 🦸‍♂️ 🦸‍♀️

@mk2510 mk2510 marked this pull request as ready for review September 22, 2020 12:57
@jbesomi
Owner

jbesomi commented Apr 8, 2021

Amazing, will check soon and let you know! 🎉 🎉

Labels

enhancement New feature or request