This repository contains the code used to produce the models and data in the blog post *Language Models vs. The SAT Reading Test*.
Dataset: emozilla/sat-reading

Models: XXL (11B), XL (3B), Large (780M), Base (350M)
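The dataset can be pulled straight from the Hugging Face Hub. A minimal sketch (not part of this repository), assuming the standard `datasets` API:

```python
from datasets import load_dataset

# Download the SAT reading dataset used throughout this repository
dataset = load_dataset("emozilla/sat-reading")
print(dataset)  # inspect the available splits and columns
```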
| File | Description |
|---|---|
| `combine-raw-data.py` | Combine the data in the `raw-data` folder into a single JSON file |
| `create-dataset.py` | Create Hugging Face `datasets`-compatible datasets from the combined JSON |
| `process-dataset-for-training.py` | Create a tokenized version of an existing dataset for training |
| `prompt-loop.py` | Playground for loading and prompting models |
| `take-tests.py` | Evaluate models against a dataset |
| `train.py` | Finetune a FLAN-T5 model |
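For a quick sense of what `prompt-loop.py` does, the sketch below loads a FLAN-T5 checkpoint and generates an answer for a single prompt. The model id and prompt format are illustrative placeholders, not the exact ones used by the scripts:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Illustrative checkpoint; substitute a finetuned model as appropriate
model_id = "google/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Hypothetical prompt format for a passage/question pair
prompt = "Read the passage and answer the question.\n\nPassage: ...\n\nQuestion: ...\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```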
To check the generalization of finetuned models, install [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and run it on the SuperGLUE tasks `cb`, `copa`, `superglue_rte`, `wic`, and `wsc` (and any other tasks you'd like, of course).
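As a sketch, the harness can also be driven from Python via its `simple_evaluate` entry point (available in recent releases; older versions expose a `main.py` CLI instead). The checkpoint name below is illustrative, and task spellings may differ across harness versions:

```python
import lm_eval

# Evaluate a (finetuned) FLAN-T5 checkpoint on the SuperGLUE tasks listed above
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=google/flan-t5-large",  # illustrative checkpoint
    tasks=["cb", "copa", "superglue_rte", "wic", "wsc"],
)
print(results["results"])
```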