This project tackles the task of building and training a large language model (LLM) from scratch. The model is trained on the FNSPID dataset, which aggregates financial news articles, to answer the question of whether a transformer can be trained to complete sentences in the style of financial news. We transformed and tokenized the FNSPID dataset into a form a transformer can ingest, built a transformer architecture in PyTorch, and trained it on a GPU, tuning hyperparameters with random search. The resulting model reaches a perplexity of 13.69 and completes sentences in a way that reads to a human as plausible financial news.
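To make the reported metric concrete, the sketch below shows one common way to derive perplexity from mean token-level cross-entropy loss in PyTorch. The tensor shapes and variable names are illustrative assumptions for this example, not taken from this repository.

```python
import torch
import torch.nn.functional as F

# Hypothetical evaluation snippet: logits produced by a language model and
# the target token ids it was asked to predict (shapes are illustrative).
batch_size, seq_len, vocab_size = 8, 128, 50257
logits = torch.randn(batch_size, seq_len, vocab_size)
targets = torch.randint(0, vocab_size, (batch_size, seq_len))

# Mean cross-entropy over all predicted tokens.
loss = F.cross_entropy(logits.view(-1, vocab_size), targets.view(-1))

# Perplexity is the exponential of that mean loss. A perplexity of 13.69
# corresponds to a mean cross-entropy of ln(13.69) ≈ 2.62 nats per token.
perplexity = torch.exp(loss)
print(perplexity.item())
```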