This repo contains examples of the following natural language processing (NLP) topics in Python.
Tokenization breaks up text into component pieces called "tokens". They are the basic building blocks of a document object.
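As a rough illustration of the idea, here is a minimal regex-based tokenizer that uses only the standard library. The `tokenize` function and its pattern are assumptions made for this sketch; a full NLP library applies many more language-specific rules (abbreviations, contractions, URLs and so on).

```python
import re

# A token is either a word (optionally with an internal apostrophe, e.g. "Let's")
# or a single punctuation character. This pattern is a simplification.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("Let's visit St. Louis, shall we?")
print(tokens)
# ["Let's", 'visit', 'St', '.', 'Louis', ',', 'shall', 'we', '?']
```

Note that this sketch splits "St." into two tokens, which is exactly the kind of case a real tokenizer handles with exception rules.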
Stemming reduces a keyword to a base form so that variations of it can be related to each other. For example, a search for "boat" might also return "boating" and "boats". Two widely used stemmers are Porter and Snowball.
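To make the idea concrete, the sketch below strips a few common suffixes. It is a deliberately naive stand-in and not the Porter or Snowball algorithm, both of which apply many more rules; the `SUFFIXES` tuple and `naive_stem` function are hypothetical names used only for illustration.

```python
# Hypothetical, far-from-complete suffix list, checked longest first.
SUFFIXES = ("ing", "ers", "er", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


stems = [naive_stem(w) for w in ["boat", "boats", "boating", "boater"]]
print(stems)
# ['boat', 'boat', 'boat', 'boat']
```

Because all four variations map to the same stem, a search index built on stems would treat them as the same keyword.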
Lemmatization is another word-reduction approach, but it is based on a morphological analysis of the words. For example, the lemma of "meeting" may be "meet" or "meeting", depending on how it is used in a sentence.
Stop words are words that appear frequently and are not nouns, verbs or modifiers. These words do not require tagging.
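Filtering such words out can be sketched in plain Python. The `STOP_WORDS` set below is a small hypothetical list chosen for this example; NLP libraries ship much larger, language-specific lists.

```python
# Hypothetical mini stop list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "of", "on", "and", "to", "in"}


def remove_stop_words(tokens):
    """Return the tokens that are not in the stop list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


tokens = "The cat is sitting on the mat".split()
filtered = remove_stop_words(tokens)
print(filtered)
# ['cat', 'sitting', 'mat']
```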
Define patterns and check whether they exist in the document.
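A minimal sketch of pattern matching, using a regular expression from the standard library rather than a dedicated matcher class. The "solar power" pattern and the `find_matches` helper are assumptions made for this example: the pattern accepts the phrase written with a space, with a hyphen, or as one word.

```python
import re

# Hypothetical pattern: "solar power", "solar-power" or "solarpower", any casing.
PATTERN = re.compile(r"\bsolar[\s-]?power\b", re.IGNORECASE)


def find_matches(text):
    """Return (start, end, matched_text) for every occurrence of the pattern."""
    return [(m.start(), m.end(), m.group()) for m in PATTERN.finditer(text)]


text = "Solar power is growing. Many homes use solar-power panels; solarpower is cheap."
for start, end, matched in find_matches(text):
    print(start, end, matched)
```

A token-based matcher works on tokens and their attributes (lemma, part of speech) instead of raw characters, which makes patterns less brittle than a regex like this one.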
Context defines the meaning of words. The same words in a different order can mean something completely different.
Locate and classify named entity mentions in unstructured text into predefined categories such as person names, organizations, locations, medical codes, time expressions, monetary values, quantities and percentages.
Use scikit-learn to pre-process text based on word frequencies.
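To show what a frequency-based representation looks like, the sketch below builds a document-term count matrix in plain Python. It is meant to mirror the default behaviour of scikit-learn's `CountVectorizer` (lowercasing, tokens of two or more word characters, alphabetically sorted vocabulary), but it is a simplified stand-in, and `count_matrix` is a hypothetical helper name.

```python
import re
from collections import Counter

# Words of two or more characters; single-character tokens are dropped.
TOKEN = re.compile(r"\b\w\w+\b")


def count_matrix(documents):
    """Return (vocabulary, rows), where rows[i][j] counts term j in document i."""
    tokenized = [TOKEN.findall(doc.lower()) for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    rows = []
    for doc in tokenized:
        counts = Counter(doc)  # missing terms count as 0
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


docs = ["The cat sat on the mat", "The dog sat"]
vocabulary, matrix = count_matrix(docs)
print(vocabulary)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(matrix)      # [[1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1]]
```

Each row is a fixed-length numeric vector, which is the form that downstream models expect.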
Classify large volumes of text by clustering documents into topics. Use Latent Dirichlet Allocation (LDA) to group words into clusters.
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a sentiment analysis model that is sensitive to both the polarity (positive or negative) and the intensity of emotion. The score is calculated by summing the intensity of each word in the text and normalizing the result; VADER reports negative, neutral and positive proportions along with a combined "compound" score.
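The summing step can be sketched with a toy lexicon. Everything below is an assumption made for illustration: the `LEXICON` values are made up rather than taken from the real VADER lexicon, and the normalization `x / sqrt(x^2 + 15)` is our understanding of how the reference implementation maps the sum into the range -1 to 1. The real model also handles negation, intensifiers, punctuation and capitalization, which this sketch ignores.

```python
import math

# Hypothetical word intensities; NOT the real VADER lexicon values.
LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -2.1}


def compound_score(text, alpha=15):
    """Sum per-word intensities and squash the total into (-1, 1)."""
    total = sum(LEXICON.get(word, 0.0) for word in text.lower().split())
    return total / math.sqrt(total * total + alpha)


for sentence in ["great movie", "terrible bad plot", "a plain sentence"]:
    print(sentence, "->", round(compound_score(sentence), 3))
```

Words missing from the lexicon contribute nothing, so a sentence with no known sentiment words scores exactly 0.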
pip install -r requirements.txt
python3 -m spacy download en_core_web_sm