This repository contains notebooks covering the basics of NLP (Natural Language Processing): core algorithms and basic tools, organized into topic-specific folders. It is mainly intended for beginners who want to get started with NLP, and shows, with detailed explanations, what they should become familiar with.
- notebooks with the assignments and labs of the *Natural Language Processing in TensorFlow* course from DeepLearning.AI on Coursera
- notebooks showing different basic visualizations with matplotlib, one per cell
- notebooks for topic modeling with Latent Dirichlet Allocation
- stores data files such as `.csv` files and model files that the notebooks need to load
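To illustrate the kind of basic matplotlib cells the visualization notebooks contain, here is a minimal sketch (the figure contents and filename are invented for illustration, not taken from the notebooks):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# A line plot and a bar chart side by side, one basic visualization per axis.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
xs = [0, 1, 2, 3, 4]
ax1.plot(xs, [x ** 2 for x in xs], marker="o")
ax1.set_title("line plot")
ax2.bar(["a", "b", "c"], [3, 1, 2])
ax2.set_title("bar chart")
fig.savefig("basic_plots.png")
```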
`tokenize_basic_tensorflow_keras.ipynb`
- basic tokenization code that splits sentences on spaces using TensorFlow and Keras
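The core idea can be sketched in pure Python, mirroring what `tf.keras.preprocessing.text.Tokenizer` does (build a word index, then map sentences to integer sequences); this sketch is for illustration and runs without TensorFlow installed, whereas the notebook uses the Keras tokenizer itself:

```python
# Minimal space-based tokenization in the style of the Keras Tokenizer.
def fit_on_texts(sentences):
    word_index = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1  # indices start at 1, as in Keras
    return word_index

def texts_to_sequences(sentences, word_index):
    # Unknown words are dropped, matching the Keras default (no oov_token).
    return [[word_index[w] for w in s.lower().split() if w in word_index]
            for s in sentences]

sentences = ["I love my dog", "I love my cat"]
word_index = fit_on_texts(sentences)
print(word_index)  # {'i': 1, 'love': 2, 'my': 3, 'dog': 4, 'cat': 5}
print(texts_to_sequences(["I love my dog"], word_index))  # [[1, 2, 3, 4]]
```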
- checking synonyms and hypernyms in WordNet via NLTK
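WordNet organizes synsets into a hypernym ("is-a") hierarchy; the toy sketch below walks such a hierarchy over a hand-made dictionary to show the idea. The taxonomy here is invented for illustration; the notebook queries real WordNet data through `nltk.corpus.wordnet`:

```python
# Hand-made toy taxonomy standing in for WordNet's hypernym links.
hypernym_of = {
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "mammal",
    "mammal": "animal",
}

def hypernym_path(word):
    """Follow hypernym links up to the root, like synset.hypernym_paths() in NLTK."""
    path = [word]
    while path[-1] in hypernym_of:
        path.append(hypernym_of[path[-1]])
    return path

print(hypernym_path("dog"))  # ['dog', 'canine', 'carnivore', 'mammal', 'animal']
```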
- normalizing and tokenizing tweets: removing hyperlinks, punctuation, and stopwords, lowercasing, and stemming; needs to import `utils.py`
- the utility file imported by `preprocessing.ipynb` and `building_and_visualizing_word_frequencies.ipynb`
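The tweet-cleaning pipeline above can be sketched as follows. This is an illustration only: the notebook relies on NLTK's tokenizer, stopword list, and PorterStemmer, which are replaced here by a tiny stopword set and a crude suffix-stripping "stemmer":

```python
import re

STOPWORDS = {"a", "an", "the", "is", "and", "i", "am"}  # tiny illustrative set

def crude_stem(word):
    # Very rough stand-in for a real stemmer such as NLTK's PorterStemmer.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def process_tweet(tweet):
    tweet = re.sub(r"https?://\S+", "", tweet)   # strip hyperlinks
    tweet = tweet.replace("#", "")               # drop the hashtag sign, keep the word
    tweet = tweet.lower()                        # lowercase
    tokens = re.findall(r"[a-z]+", tweet)        # tokenize, dropping punctuation
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]

print(process_tweet("I am LOVING the #NLP course! https://example.com"))
# ['lov', 'nlp', 'course']
```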
- notebook showing how to do linear algebra with vectors and matrices using NumPy
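A few of the vector and matrix operations such a notebook typically walks through (the specific values here are illustrative):

```python
import numpy as np

v = np.array([1.0, 2.0])
w = np.array([3.0, 4.0])
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

print(np.dot(v, w))        # dot product -> 11.0
print(A @ v)               # matrix-vector product -> [2. 6.]
print(np.linalg.norm(w))   # Euclidean norm -> 5.0
print(A.T)                 # transpose
```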
`manipulating_word_embeddings.ipynb`
- shows how word vectors work and how to find relations between words; requires uploading the model file `word_embeddings_subset.p`
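The kind of vector arithmetic performed on such embeddings (e.g. the classic king − man + woman ≈ queen analogy) can be sketched with toy 2-D vectors; the embeddings below are invented for illustration, not taken from `word_embeddings_subset.p`:

```python
import numpy as np

# Hand-made toy "embeddings" chosen so the analogy works out exactly.
emb = {
    "king":  np.array([0.9, 0.8]),
    "queen": np.array([0.9, 0.2]),
    "man":   np.array([0.1, 0.8]),
    "woman": np.array([0.1, 0.2]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target = emb["king"] - emb["man"] + emb["woman"]
# Nearest word (by cosine similarity) to the analogy vector, excluding "king".
best = max((w for w in emb if w != "king"), key=lambda w: cosine(emb[w], target))
print(best)  # queen
```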
`building_and_visualizing_word_frequencies.ipynb`
- creates word frequencies for feature extraction; needs to import `utils.py`
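The frequency dictionary used for feature extraction in course-style notebooks maps a `(word, label)` pair to how often the word appears in tweets of that class; a minimal sketch (the data and helper name are illustrative):

```python
def build_freqs(tweets, labels):
    """Count (word, label) pairs across a labeled corpus."""
    freqs = {}
    for tweet, label in zip(tweets, labels):
        for word in tweet.lower().split():
            pair = (word, label)
            freqs[pair] = freqs.get(pair, 0) + 1
    return freqs

tweets = ["happy happy day", "sad day"]
labels = [1, 0]  # 1 = positive, 0 = negative
freqs = build_freqs(tweets, labels)
print(freqs[("happy", 1)])  # 2
print(freqs[("day", 0)])    # 1
```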
- PDF explaining PCA, based on the Singular Value Decomposition (SVD) of the covariance matrix of the original dataset, and the related eigenvalues and eigenvectors, which are used as the rotation matrix
- may need some images under the `images` directory for display in the notebook
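The PCA procedure described above (eigendecomposition of the covariance matrix, with the eigenvectors acting as a rotation matrix) can be sketched in NumPy; the data here is randomly generated for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data: isotropic noise pushed through a shearing matrix.
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

Xc = X - X.mean(axis=0)                 # center the data
C = np.cov(Xc, rowvar=False)            # covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)    # eigendecomposition (symmetric C)
order = np.argsort(eigvals)[::-1]       # sort components by explained variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

X_rot = Xc @ eigvecs                    # rotate data onto the principal axes
print(eigvals)                          # variance along each principal axis
# The rotated data is decorrelated: off-diagonal covariance is ~0.
print(np.cov(X_rot, rowvar=False))
```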
`logistic_regression_model.ipynb`
- visualizing and interpreting logistic regression
- uses `logistic_features.csv` under the `data` directory
`LogisticRegression_fromScratch.ipynb`
- building and evaluating Logistic Regression from scratch: preprocessing, feature extraction, and predicting on new tweets
- includes implementing the loss function and the gradient descent learning algorithm from scratch; needs to import `utils.py` and `w1_unittest.py`
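The from-scratch pieces named above (sigmoid, cross-entropy loss, batch gradient descent) fit in a short sketch; the toy one-feature dataset below is invented for illustration and is not the tweet features from the notebook:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(y, y_hat):
    eps = 1e-9  # avoid log(0)
    return -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))

# Toy data: first column is the bias term, second is a single feature.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = np.zeros(2)

alpha = 0.5
for _ in range(1000):                     # batch gradient descent
    y_hat = sigmoid(X @ theta)
    grad = X.T @ (y_hat - y) / len(y)     # gradient of the cross-entropy loss
    theta -= alpha * grad

preds = (sigmoid(X @ theta) >= 0.5).astype(float)
print(preds)                              # [0. 0. 1. 1.]
print(loss(y, sigmoid(X @ theta)))        # small after training
```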
- interpreting Naive Bayes performance; needs to upload `data/bayes_features.csv`
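The log-ratio scoring behind Naive Bayes sentiment classification can be sketched as follows: each word contributes log P(word|pos) − log P(word|neg), with Laplace smoothing. The counts here are invented for illustration, not from `bayes_features.csv`:

```python
import math

pos_counts = {"happy": 3, "great": 2, "day": 1}
neg_counts = {"sad": 3, "bad": 2, "day": 1}
vocab = set(pos_counts) | set(neg_counts)
n_pos, n_neg, V = sum(pos_counts.values()), sum(neg_counts.values()), len(vocab)

def log_ratio(word):
    # Laplace (add-one) smoothing so unseen words get nonzero probability.
    p_pos = (pos_counts.get(word, 0) + 1) / (n_pos + V)
    p_neg = (neg_counts.get(word, 0) + 1) / (n_neg + V)
    return math.log(p_pos / p_neg)

def score(tweet):
    # Positive score -> classified positive; negative -> negative.
    return sum(log_ratio(w) for w in tweet.lower().split() if w in vocab)

print(score("happy great day") > 0)  # True
print(score("sad bad day") < 0)      # True
```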
`wikipedia_library.ipynb`
- how to get the data from Wikipedia
TensorFlow Subword Text Encoder (subword tokenizer)
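The idea behind subword text encoders is to split words not in the vocabulary into known subword pieces; a sketch of greedy longest-match subword tokenization (the hand-made vocabulary is for illustration; TensorFlow's encoders learn theirs from a corpus):

```python
VOCAB = {"un", "break", "able", "token", "ize", "r", "s", "t", "o"}

def subword_tokenize(word):
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest match first
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])          # fall back to the raw character
            i += 1
    return pieces

print(subword_tokenize("unbreakable"))  # ['un', 'break', 'able']
print(subword_tokenize("tokenizer"))    # ['token', 'ize', 'r']
```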