Text Representation

🔖 Outline

To be added

🗒️ Notebooks

Set of notebooks associated with the chapter.

  1. One-Hot Encoding: Here we demonstrate one-hot encoding from first principles, as well as scikit-learn's implementation, on our toy corpus.

  2. Bag of Words: Here we demonstrate how to arrive at the bag-of-words representation for our toy corpus.

  3. Bag of N Grams: Here we demonstrate how the bag-of-n-grams representation works using our toy corpus.

  4. TF-IDF: Here we demonstrate how to obtain the TF-IDF representation of a document using sklearn's TfidfVectorizer (we will be using our toy corpus).

  5. Pre-trained Word Embeddings: Here we demonstrate how to represent words using pre-trained word embedding models and how to use those embeddings to get representations for the full text.

  6. Custom Word Embeddings: Here we demonstrate how to train a custom word embedding model (word2vec) using gensim on both our toy corpus and a subset of Wikipedia data.

  7. Vector Representations via averaging: Here we demonstrate how to obtain document vectors by averaging word vectors using spaCy.

  8. Doc2Vec Model: Here we demonstrate how to train your own doc2vec model.

  9. Visualizing Embeddings Using TSNE: Here we demonstrate how to use dimensionality reduction techniques such as t-SNE to visualize embeddings.

  10. Visualizing Embeddings using Tensorboard: Here we demonstrate how to visualize embeddings using TensorBoard.
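
As a rough illustration of what the first four notebooks cover, the sketch below builds one-hot, bag-of-words, bag-of-n-grams, and TF-IDF representations from first principles in plain Python. It is not the notebooks' code: the four-sentence corpus, the whitespace tokenizer, and the helper names (`tokenize`, `one_hot`, `bag_of_words`, `bag_of_ngrams`, `tf_idf`) are assumptions made for this example. The TF-IDF formula is the unsmoothed textbook variant, so its values will differ from those of sklearn's TfidfVectorizer, which by default smooths the IDF and L2-normalizes each row.

```python
import math
from collections import Counter

# Hypothetical toy corpus (an assumption; the notebooks may use a different one).
corpus = ["dog bites man", "man bites dog", "dog eats meat", "man eats food"]


def tokenize(doc):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return doc.lower().split()


# Vocabulary: each unique token mapped to an index, in sorted order.
vocab = {tok: i for i, tok in enumerate(sorted({t for d in corpus for t in tokenize(d)}))}


def one_hot(doc):
    """Return one vector per token, with a single 1 at the token's vocabulary index."""
    vectors = []
    for tok in tokenize(doc):
        vec = [0] * len(vocab)
        vec[vocab[tok]] = 1
        vectors.append(vec)
    return vectors


def bag_of_words(doc):
    """Return one vector per document holding the count of each vocabulary word."""
    counts = Counter(tokenize(doc))
    return [counts[tok] for tok in vocab]  # vocab iterates in index order


def bag_of_ngrams(doc, n=2):
    """Count contiguous n-token sequences instead of single words."""
    toks = tokenize(doc)
    return Counter(" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1))


def tf_idf(doc):
    """Textbook TF-IDF: (count / doc length) * log(N / document frequency)."""
    toks = tokenize(doc)
    counts = Counter(toks)
    weights = []
    for tok in vocab:
        tf = counts[tok] / len(toks)
        df = sum(1 for d in corpus if tok in tokenize(d))
        weights.append(tf * math.log(len(corpus) / df))
    return weights


print(list(vocab))                    # ['bites', 'dog', 'eats', 'food', 'man', 'meat']
print(bag_of_words("dog bites man"))  # [1, 1, 0, 0, 1, 0]
print(bag_of_ngrams("dog bites man"))
print(tf_idf("dog bites man"))
```

Note that "dog bites man" and "man bites dog" get the same bag-of-words vector, since word order is discarded; the bigram counts are what tell them apart.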

🖼️ Figures

Color figures as requested by the readers.

[Color figure images]