Implementing classic NLP models from scratch with clean code and easy-to-understand architecture.
This library is for educational purposes only: it is not optimized for production use, and it may currently contain bugs, so feel free to contribute and report issues.
So far we have only run simple tests, which is not enough; much more rigorous testing is planned. We will also add more documentation so you can run the models easily, plus more playgrounds for experimenting with the models and looking inside their implementations.
8 important NLP models ranging from 2003 to 2018:
| Model & Paper | Code | Doc(EN) | Blog(ZH) |
|---|---|---|---|
| NNLM (2003) | | | |
| Word2Vec (2013) | | | |
| Seq2Seq (2014) | | | |
| Attention (2014) | | | |
| fastText (2016) | | | |
| Transformer (2017) | | | |
| GPT (2018) | | | |
| BERT (2018) | | | |
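To give a taste of the "from scratch" style, here is a minimal sketch of the oldest model on the list, the NNLM (Bengio et al., 2003), in PyTorch. The class and parameter names (`NNLM`, `vocab_size`, `context_size`, etc.) are invented for this illustration and are not toynlp's actual API.

```python
# Illustrative sketch only, not toynlp's real implementation:
# a minimal NNLM (Bengio et al., 2003) that predicts the next word
# from the previous `context_size` words.
import torch
import torch.nn as nn


class NNLM(nn.Module):
    def __init__(self, vocab_size: int, context_size: int = 4,
                 embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (batch, context_size) word indices
        x = self.embedding(context).flatten(start_dim=1)  # concatenate embeddings
        h = torch.tanh(self.hidden(x))
        return self.output(h)  # logits over the vocabulary


logits = NNLM(vocab_size=10_000)(torch.randint(0, 10_000, (2, 4)))
print(logits.shape)  # torch.Size([2, 10000])
```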
Yes, there are some differences from the original papers. The goal of toynlp is to provide simple, educational implementations of these models, which may omit some of the optimizations and features described in the papers. The reason is that I want to focus on the core ideas behind each model rather than get bogged down in implementation details, especially where the original papers introduce complexities that are not essential to understanding their main contributions.
That said, I do need to add docs for each model to clarify these differences and to offer guidance on using the implementations effectively. I'll do this later: first make it work, then make it better.
As for LLMs: well, they're in toyllm! I separated the models into two libraries: toynlp for traditional, "small" NLP models and toyllm for LLMs, which are typically larger and more complex.
Glad you asked! The "toy" style is all about simplicity and educational value. Besides toynlp and toyllm, we have two other toys: toyml for traditional machine learning models and toyrl for deep reinforcement learning models.