Awesome Natural Language Processing (NLP)

A curated list of awesome frameworks, libraries, tools, datasets, tutorials, and research papers for Natural Language Processing (NLP). This list covers a range of NLP topics, from text processing and tokenization to pretrained language models and applications such as sentiment analysis and machine translation.

Frameworks and Libraries

Hugging Face Transformers - A comprehensive library of state-of-the-art NLP models like BERT, GPT, and RoBERTa.
spaCy - An open-source library for advanced natural language processing in Python.
NLTK (Natural Language Toolkit) - A comprehensive library for text processing and analysis.
Stanford NLP - A suite of NLP tools developed by the Stanford NLP Group.
AllenNLP - An open-source NLP research library built on top of PyTorch.
TextBlob - A simple library for processing textual data in Python.

Text Processing and Tokenization

Moses Tokenizer - A widely used tokenizer for machine translation tasks.
BPE (Byte Pair Encoding) - A subword tokenization technique used by models like GPT-2 and RoBERTa. (BERT uses the related WordPiece method.)
SentencePiece - A language-independent tokenization and text processing library.
RegexpTokenizer (NLTK) - A tokenizer that uses regular expressions to split text into tokens.
spaCy Tokenizer - A fast and efficient tokenizer integrated within the spaCy library.
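
To make the BPE entry above concrete, here is a minimal sketch of how merge rules might be learned from a small word list. It uses only the Python standard library and is not the implementation used by any of the listed tools. The function names `learn_bpe_merges` and `merge_pair` are hypothetical, and the tie-breaking rule (largest pair in lexicographic order) is an assumption made only to keep the result deterministic.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (illustrative only)."""
    # Start from single characters; identical words are collapsed into a frequency count.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count each adjacent symbol pair, weighted by how often the word occurs.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Assumption: ties are broken by lexicographic order of the pair.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        # Apply the new merge rule to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


if __name__ == "__main__":
    print(learn_bpe_merges(["low", "low", "lower", "lowest"], 2))
```

Production tokenizers typically add details this sketch omits, such as end-of-word markers or byte-level symbols.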

Pretrained Language Models

BERT (Bidirectional Encoder Representations from Transformers) - A Transformer-based model for a variety of NLP tasks.
GPT-3 (Generative Pre-trained Transformer 3) - A powerful generative language model by OpenAI.
RoBERTa - A variant of BERT trained with a robustly optimized pretraining procedure.
T5 (Text-to-Text Transfer Transformer) - A model that treats every NLP task as a text-to-text problem.
XLNet - A generalized autoregressive pretraining model that outperforms BERT on several tasks.
DistilBERT - A smaller, faster, and lighter version of BERT.

NLP Tasks

Sentiment Analysis: The process of determining the sentiment (positive, negative, or neutral) of a text.
- TextBlob Sentiment Analysis
- VADER Sentiment Analysis
Named Entity Recognition (NER): Identifying and classifying entities in text (e.g., names, dates).
- spaCy NER
- Stanford NER
Machine Translation: Translating text from one language to another.
- OpenNMT - A neural machine translation framework.
- Fairseq - A Facebook AI research framework for sequence-to-sequence models.
Text Summarization: Generating a concise summary of a given text.
- BART - A denoising sequence-to-sequence pretrained model for natural language generation, translation, and comprehension.
- PEGASUS - A pre-trained model specifically designed for text summarization.

Tools and Applications

Gensim - A Python library for topic modeling and document similarity.
Stanford CoreNLP - A suite of NLP tools for linguistic analysis.
FastText - A library for efficient text classification and representation learning.
Polyglot - A multilingual NLP toolkit supporting various languages.
LexRank - A text summarization library using graph-based ranking algorithms.
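
As a rough illustration of the graph-based ranking idea behind the LexRank entry above, the sketch below scores sentences by running a PageRank-style power iteration over a sentence similarity graph. It is a simplification, not the listed library's API: it uses raw term counts rather than TF-IDF weights, splits on whitespace only, and the names `cosine` and `rank_sentences` as well as the `damping` and `iterations` defaults are assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Return one centrality score per sentence; higher means more central."""
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    # Similarity graph: nodes are sentences, edge weights are cosine similarities.
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    # Row-normalize into transition probabilities; isolated sentences jump uniformly.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([v / total for v in row] if total else [1.0 / n] * n)
    # Power iteration with damping, as in PageRank.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - damping) / n
                  + damping * sum(scores[j] * trans[j][i] for j in range(n))
                  for i in range(n)]
    return scores


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the cat sat", "dogs bark loudly"]
    for sentence, score in zip(docs, rank_sentences(docs)):
        print(f"{score:.3f}  {sentence}")
```

An extractive summary would then keep the top-scoring sentences in their original order.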

Datasets

GLUE Benchmark - A collection of resources for evaluating natural language understanding systems.
SQuAD (Stanford Question Answering Dataset) - A dataset for reading comprehension and question answering tasks.
CoNLL-2003 - A dataset for named entity recognition.
IMDB Reviews - A dataset for sentiment analysis.
WikiText - A collection of high-quality text from Wikipedia for language modeling tasks.
tiny_qa_benchmark_pp - A multilingual question-answering dataset collection with tools for generating synthetic QA data.

Research Papers

Attention Is All You Need (2017) - The paper that introduced the Transformer architecture, revolutionizing NLP.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018) - The introduction of the BERT model.
GloVe: Global Vectors for Word Representation (2014) - A model for generating word embeddings.
Word2Vec: Efficient Estimation of Word Representations in Vector Space (2013) - The introduction of Word2Vec, a method for learning word embeddings.
ELMo: Deep Contextualized Word Representations (2018) - A model for contextual word embeddings.

Learning Resources

Coursera: Natural Language Processing Specialization - A multi-course series on NLP by DeepLearning.AI.
Stanford CS224N: Natural Language Processing with Deep Learning - A popular university course on NLP.
Fast.ai NLP Course - A practical course on NLP using the fastai library.
Hugging Face Tutorials - Official tutorials for using the Hugging Face NLP library.

Books

Speech and Language Processing by Daniel Jurafsky and James H. Martin - A comprehensive textbook on NLP.
Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper - An introduction to NLP using Python.
Deep Learning for Natural Language Processing by Palash Goyal, Sumit Pandey, and Karan Jain - A book covering deep learning techniques in NLP.

Community

Reddit: r/NLP - A subreddit for discussions on natural language processing.
Hugging Face Community - A forum for discussing the Hugging Face NLP library.
NLP Summit - An annual conference focused on NLP research and applications.

Contribute

Contributions are welcome. Please ensure your submission fully follows the requirements outlined in CONTRIBUTING.md, including formatting, scope alignment, and category placement.

Pull requests that do not adhere to the contribution guidelines may be closed.

