Natural Language Processing

This repo contains examples of the following NLP techniques in Python.

Tokenization

Breaks up the text into component pieces called "tokens". Tokens are the basic building blocks of a document object.
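
A minimal sketch of tokenization, assuming spaCy (the library whose model is downloaded in the installation step). It uses a blank English pipeline, which contains only the tokenizer and needs no model download. The `tokenize` helper is a hypothetical name used for illustration.

```python
import spacy

# A blank English pipeline contains only the rule-based tokenizer.
nlp = spacy.blank("en")


def tokenize(text):
    """Return the text of each token in `text` (hypothetical helper)."""
    doc = nlp(text)
    return [token.text for token in doc]


if __name__ == "__main__":
    print(tokenize("Tokens are building blocks."))
```

Note that punctuation becomes a token of its own, so the final period is returned separately from "blocks".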

Stemming

Reduces a keyword to its stem so that variations of the word can be related to each other. For example, a search for "boat" might also return "boating" and "boats". There are two main stemmers: Porter and Snowball.
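
The "boat" example can be sketched with both stemmers. This assumes NLTK is available, which this README does not confirm. `stem_all` is a hypothetical helper.

```python
from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")


def stem_all(words):
    """Map each word to its (Porter stem, Snowball stem) pair."""
    return {word: (porter.stem(word), snowball.stem(word)) for word in words}


if __name__ == "__main__":
    for word, stems in stem_all(["boat", "boats", "boating"]).items():
        print(word, stems)
```

For these three words both stemmers return "boat". They can disagree on other words, because Snowball is a revised version of the Porter rules.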

Lemmatization

Lemmatization is another word-reduction approach, but it is based on a morphological analysis of the words. For example, the lemma of "meeting" might be "meet" or "meeting", depending on its use in a sentence.

Stop Words

Words that appear frequently and are not nouns, verbs or modifiers. These words do not require tagging.
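
A minimal sketch of stop-word filtering, assuming spaCy's built-in English stop-word list. A blank pipeline is enough because the list ships with the language data. `remove_stop_words` is a hypothetical helper.

```python
import spacy

nlp = spacy.blank("en")


def remove_stop_words(text):
    """Return the tokens of `text` that are not in spaCy's stop-word list."""
    return [token.text for token in nlp(text) if not token.is_stop]


if __name__ == "__main__":
    print(len(nlp.Defaults.stop_words), "stop words in the default list")
    print(remove_stop_words("This is a sentence about boats"))
```

The check is case-insensitive, so "This" is filtered out just like "this".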

Pattern Matching

Defines token patterns and finds whether they exist in the document.
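
A minimal sketch of token-based pattern matching, assuming spaCy's `Matcher` (v3 API). The "solar power" patterns and the `find_solar_power` helper are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Each pattern is a list of per-token constraints.
patterns = [
    # "solarpower" written as one word
    [{"LOWER": "solarpower"}],
    # "solar power" or "solar-power": optional punctuation between the words
    [{"LOWER": "solar"}, {"IS_PUNCT": True, "OP": "?"}, {"LOWER": "power"}],
]
matcher.add("SOLAR_POWER", patterns)


def find_solar_power(text):
    """Return the matched spans of `text` as strings."""
    doc = nlp(text)
    return [doc[start:end].text for _, start, end in matcher(doc)]


if __name__ == "__main__":
    print(find_solar_power("Solar power, solar-power and solarpower are the same thing."))
```

Matching on `LOWER` makes the patterns case-insensitive, and the optional punctuation token covers the hyphenated spelling.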

Phrase Matching

Defines a list of phrases and finds whether they exist in the document.
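
A minimal sketch of phrase matching, assuming spaCy's `PhraseMatcher` (v3 API). Unlike token patterns, the phrases are given as plain strings. The phrase list and the `find_phrases` helper are hypothetical.

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")

# attr="LOWER" compares lowercase token text, so matching ignores case.
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")

PHRASES = ["natural language processing", "topic modelling"]
# make_doc only tokenizes, which is all the matcher needs.
matcher.add("TERMS", [nlp.make_doc(phrase) for phrase in PHRASES])


def find_phrases(text):
    """Return the spans of `text` that match one of PHRASES."""
    doc = nlp(text)
    return [doc[start:end].text for _, start, end in matcher(doc)]


if __name__ == "__main__":
    print(find_phrases("Natural language processing often starts with topic modelling."))
```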

Part of Speech (POS)

The context defines the meaning of a word. The same words in a different order can mean something completely different.

Named Entity Recognition (NER)

Locates and classifies named entity mentions in unstructured text into predefined categories such as person names, organizations, locations, medical codes, time expressions, monetary values, quantities, percentages and so on.

Feature Extraction

Uses scikit-learn to pre-process text based on the frequency of the words.

Topic Modelling using LDA (Unsupervised Learning) & Non-Negative Matrix Factorization

Classifies large volumes of text by clustering documents into topics. Uses Latent Dirichlet Allocation (LDA) to group the words into clusters.

Sentiment Analysis

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a model for sentiment analysis that is sensitive to both the polarity (positive or negative) and the intensity of emotion. The score is calculated by summing the intensity (positive or negative, mild or strong) of each word in the text.

Installation using pip

pip install -r requirements.txt

python3 -m spacy download en_core_web_sm

