Dataset ideas:
- BoolQ
- IMDB (text sentiment classification)
- HateXplain
- MASSIVE
Interesting read: https://arxiv.org/abs/2305.07759
@misc{eldan2023tinystoriessmalllanguagemodels,
  title={TinyStories: How Small Can Language Models Be and Still Speak Coherent English?},
  author={Ronen Eldan and Yuanzhi Li},
  year={2023},
  eprint={2305.07759},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2305.07759},
}
Current demo ideas:
- character generation
- token generation
- sentence reversal
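
A rough sketch of how training pairs for the sentence reversal demo might be built. The notes do not say whether reversal is per word or per character, so both are shown; the helper name `make_reversal_pair` and the `level` parameter are hypothetical, chosen only for illustration.

```python
def make_reversal_pair(sentence: str, level: str = "word") -> tuple[str, str]:
    """Return (source, target), where target is the source reversed.

    level="word" reverses word order (assumption: whitespace tokenization);
    level="char" reverses the raw character sequence.
    """
    if level == "word":
        # split() with no arguments also collapses repeated whitespace in the target
        target = " ".join(reversed(sentence.split()))
    elif level == "char":
        target = sentence[::-1]
    else:
        raise ValueError(f"unknown level: {level!r}")
    return sentence, target


# Example sentences are made up for illustration, not taken from any dataset.
examples = ["the cat sat", "small models can speak"]
pairs = [make_reversal_pair(s) for s in examples]
```

Word-level reversal keeps each token intact, so it likely pairs more naturally with the token generation demo, while character-level reversal matches the character generation demo.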