Python Natural Language Processing

Open-source Python projects categorized as Natural Language Processing

Top 23 Python Natural Language Processing Projects

  1. transformers

    🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

    Project mention: Using “ibm-granite/granite-speech-3.3–8b” 🪨 for ASR | dev.to | 2025-11-02

    python3.12 -m venv new_venv_312
    source new_venv_312/bin/activate
    pip install --upgrade pip
    pip install https://github.com/huggingface/transformers/archive/main.zip torchaudio peft soundfile torchcodec
    ### and also
    pip install librosa
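
Once the install commands above have run, one quick way to confirm the environment is usable is to check that each package can be located by the interpreter. The sketch below uses only the standard library. The import names are assumed to match the pip package names in the commands above, and `missing_packages` is a hypothetical helper, not part of any of these libraries.

```python
from importlib.util import find_spec

# Assumed import names, taken from the pip package names in the commands above.
REQUIRED = ["transformers", "torchaudio", "peft", "soundfile", "torchcodec", "librosa"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("All packages found." if not missing else f"Missing: {', '.join(missing)}")
```

Run it inside the activated virtual environment; anything reported as missing can be installed with another `pip install`.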

  3. funNLP

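    Chinese and English sensitive words, language detection, domestic and international mobile/phone number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English approximation of Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, segmentation of unspaced English text, various Chinese word vectors, company name list, classical poetry corpus, IT lexicon, finance lexicon, idiom (chengyu) lexicon, place name lexicon, historical figures lexicon, poetry lexicon, medical lexicon, food and drink lexicon, legal lexicon, automobile lexicon, animal lexicon, Chinese chat corpus, Chinese rumor data, Baidu Chinese question-answering dataset, collection of sentence similarity matching algorithms, BERT resources, text generation and summarization tools, cocoNLP information extraction tool, regex matching for Chinese domestic phone numbers, Tsinghua University XLORE: Chinese-English cross-lingual encyclopedic knowledge graph, Tsinghua University AI technology report series, natural language generation, "NLU is too hard" series, automatic couplet data and bot, username blacklist, criminal charge and legal terms with classification model, WeChat official account corpus, cs224n deep learning NLP course, Chinese handwritten character recognition, Chinese NLP corpora/datasets, variable naming tool, word segmentation corpus + code, English task-oriented dialogue dataset, ASR speech datasets + deep-learning-based Chinese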

  4. crewAI

    Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.

    Project mention: My Top Open-Source AI Tools for Building Smarter in 2025 | dev.to | 2025-08-14

    GitHub - crewAIInc/crewAI

  5. HanLP

    Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification

  6. Jieba

    Jieba Chinese text segmentation

    Project mention: Show HN: Mandarin Word Segmenter with Translation | news.ycombinator.com | 2025-02-04

    Thanks for the kind words!

    I'm using Jieba[0] because it hits a nice balance of fast and accurate. But I'm initializing it with a custom dictionary (~800k entries), and have added several layers of heuristic post-segmentation. For example, Jieba tends to split up chengyu into two words, but I've decided they should be displayed as a single word, since chengyu are typically a single entry in dictionaries.

    [0] https://github.com/fxsjy/jieba
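
The post-segmentation step described in the mention above (re-joining chengyu that the segmenter split in two) can be illustrated with a small sketch. This is not the poster's actual heuristic, which is not shown; it is a minimal greedy merge under the assumption that the idioms are available as a set of strings. `merge_known_phrases` and the sample data are hypothetical. In practice the tokens would come from a segmenter call such as `jieba.lcut(text)`; a hand-written token list is used here so the example runs without Jieba installed.

```python
def merge_known_phrases(tokens, phrases, max_span=4):
    """Greedily merge runs of adjacent tokens whose concatenation is in `phrases`."""
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest span first so a longer phrase wins over a shorter one.
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + span])
            if candidate in phrases:
                merged.append(candidate)
                i += span
                break
        else:
            # No multi-token phrase starts here; keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical data: one four-character idiom that was split into two tokens.
chengyu = {"一石二鸟"}
tokens = ["他", "想", "一石", "二鸟"]
result = merge_known_phrases(tokens, chengyu)
print(result)  # ['他', '想', '一石二鸟']
```

A set lookup keeps each check constant-time, so the pass stays cheap even with a dictionary of several hundred thousand entries.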

  7. spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: Strengthening Open-Source Integrity: My First Contribution to spaCy | dev.to | 2025-10-28

    🔗 Pull Request: #13877 — Remove spaCy Quickstart from Universe/Courses due to spam redirect

  8. d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

  10. Resume-Matcher

    Improve your resumes with Resume Matcher. Get insights, keyword suggestions and tune your resumes to job descriptions.

    Project mention: Ask HN: Someone has committed 20K+ LoC to a PR, exhausting my CI & AI workflows | news.ycombinator.com | 2025-08-26

    I'm maintaining an OSS project, and someone raised a PR a few days earlier, and since then, 20K+ LoC has been added to the PR. There are two new accounts, but they lack details on how to contact them, only providing usernames.

    PR: https://github.com/srbhr/Resume-Matcher/pull/497

    Accounts:

  11. NLP-progress

    Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

  12. datasets

    🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

    Project mention: Training with Big Data on Any Cloud | dev.to | 2025-06-20

    Hugging Face Datasets -- the library that lets you download and manage datasets from the Hugging Face Hub, as well as being a convenient vendor-neutral interface for your own datasets.

  13. rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

    Project mention: Eliza Reanimated Published in IEEE Annals of the History of Computing | news.ycombinator.com | 2025-06-20

    Right before LLMs broke into the scene we had a few techniques I was aware of:

    * Personality Forge uses a rules-based scripting approach [0]. This is basically ELIZA extended to take advantage of modern processing power.

    * Rasa [1] used traditional NLP/NLU techniques and small-model ML to match intents and parse user requests. This is the same kind of tooling that Google/Alexa historically used, just without the voice layer and with more effort to keep the context in mind.

    Rasa is actually open source [2], so you can poke around the internals to see how it's implemented. It doesn't look like it's changed architecture substantially since the pre-LLM days. Rhasspy [3] (also open source) uses similar techniques but in the voice assistant space rather than as a full chatbot.

    [0] https://www.personalityforge.com/developers/how-to-build-cha...

    [1] https://web.archive.org/web/20200801000000*/https://rasa.com... (old link because Rasa's marketing today is ambiguous about whether they're adding LLMs now).

    [2] https://github.com/RasaHQ/rasa

    [3] https://rhasspy.readthedocs.io/en/latest/
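
To make the idea of intent matching in the mention above concrete, here is a toy sketch. It is not Rasa's implementation and does not use Rasa's API; it only illustrates the general pattern of scoring a user utterance against example phrases per intent, here with plain word-overlap (Jaccard) similarity. The intents, example phrases, and function names are all invented for illustration.

```python
import re

# Hypothetical intents and example phrases, invented for this illustration.
INTENT_EXAMPLES = {
    "greet": ["hello there", "hi", "good morning"],
    "check_weather": ["what is the weather today", "will it rain tomorrow"],
}


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def match_intent(utterance, examples=INTENT_EXAMPLES):
    """Return (intent, score) for the example phrase with the best word overlap."""
    words = tokenize(utterance)
    best_intent, best_score = None, 0.0
    for intent, phrases in examples.items():
        for phrase in phrases:
            phrase_words = tokenize(phrase)
            union = words | phrase_words
            # Jaccard similarity: shared words divided by all distinct words.
            score = len(words & phrase_words) / len(union) if union else 0.0
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent, best_score


intent, score = match_intent("Will it rain today?")
print(intent, round(score, 2))  # check_weather 0.6
```

A production system would replace the overlap score with a trained classifier and add entity extraction, but the input and output shape (utterance in, intent and confidence out) is the same.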

  14. Ciphey

    ⚡ Automatically decrypt encryptions without knowing the key or cipher, decode encodings, and crack hashes ⚡

  15. Qwen

    The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

    Project mention: Running Qwen, Nearly as Powerful as DeepSeek, on a MacBook Pro | dev.to | 2025-02-05

    Qwen (Qwen GitHub Repository) has been gaining attention recently as a powerful open-source large language model (LLM). I decided to give it a spin on my MacBook Pro using Ollama, a platform designed for running local LLMs. While Qwen2.5-Max boasts the highest performance, my setup could only handle the smaller Qwen2.5 (32B) model. Here's what I found!

  16. DocsGPT

    Private AI platform for agents, assistants and enterprise search. Built-in Agent Builder, Deep research, Document analysis, Multi-model support, and API connectivity for agents.

    Project mention: 15 AI tools that almost replace a full dev team but please don’t fire us yet | dev.to | 2025-05-03

    DocsGPT: Lets users query your docs using GPT.

  17. gensim

    Topic Modelling for Humans

  18. camel

    🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org

    Project mention: Revisiting Minsky's Society of Mind in 2025 | news.ycombinator.com | 2025-06-18

    It seems like you might be confusing "research programs" with things like "branding" and superficial terminology. Here, enjoy this thing clearly building on SoM and edited earlier this week: ideas https://github.com/camel-ai/camel/blob/master/camel/societie...

  19. NLTK

    NLTK Source

    Project mention: What is the Most Effective AI Tool for App Development Today? | dev.to | 2025-08-17

    At the core of many AI-powered applications are foundational models—large language models (LLMs) and APIs that provide the intelligence for features like natural language processing, image recognition, and decision-making. These tools serve as the brain of the app, processing inputs and generating outputs that feel intuitive and human-like.

  20. flair

    A very simple framework for state-of-the-art Natural Language Processing (NLP)

    Project mention: WhisperNER: Unified Open Named Entity and Speech Recognition | news.ycombinator.com | 2024-11-21

    only the last string is a LOC named entity. Of course you can change definitions from the standard ones if you like, but then you should be careful not to compare with tools that use the original standard definition of NER such as flairNLP [1].

    [1] https://github.com/flairNLP/flair?tab=readme-ov-file

  21. MOSS

    An open-source tool-augmented conversational language model from Fudan University

  22. LLMSurvey

    The official GitHub page for the survey paper "A Survey of Large Language Models".

  23. ludwig

    Low-code framework for building custom LLMs, neural networks, and other AI models

  24. doccano

    Open source annotation tool for machine learning practitioners.

  25. autogluon

    Fast and Accurate ML in 3 Lines of Code

    Project mention: Gluon: a GPU programming language based on the same compiler stack as Triton | news.ycombinator.com | 2025-09-17

    Amazon (+ Microsoft) already released a language for ML called gluon 8 years ago: https://aws.amazon.com/blogs/aws/introducing-gluon-a-new-lib...

    autogluon is popular as well: https://github.com/autogluon/autogluon

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Natural Language Processing related posts

  • Learning to Model the World with Language

    1 project | news.ycombinator.com | 6 Nov 2025
  • Using “ibm-granite/granite-speech-3.3–8b” 🪨 for ASR

    1 project | dev.to | 2 Nov 2025
  • Strengthening Open-Source Integrity: My First Contribution to spaCy

    1 project | dev.to | 28 Oct 2025
  • 5 Ways to Detect AI Agent Hallucinations

    1 project | dev.to | 26 Oct 2025
  • Updating ASR examples in Hugging Face Transformers Hub datasets, clearer args, smoother Windows setup

    1 project | dev.to | 30 Sep 2025
  • A Simple Guide to Keyword Clustering with spaCy

    1 project | dev.to | 15 Sep 2025
  • Wikipedia survives while the rest of the internet breaks

    1 project | news.ycombinator.com | 4 Sep 2025

Index

What are some of the best open-source Natural Language Processing projects in Python? This list will help you:

#   Project          Stars
1   transformers     152,508
2   funNLP           77,063
3   crewAI           40,292
4   HanLP            35,840
5   Jieba            34,551
6   spaCy            32,785
7   d2l-en           26,601
8   Resume-Matcher   23,901
9   NLP-progress     22,963
10  datasets         20,844
11  rasa             20,840
12  Ciphey           20,165
13  Qwen             19,710
14  DocsGPT          17,365
15  gensim           16,267
16  camel            14,781
17  NLTK             14,382
18  flair            14,324
19  MOSS             12,049
20  LLMSurvey        11,956
21  ludwig           11,616
22  doccano          10,381
23  autogluon        9,564
