This repository contains the code used to produce the models and data in the blog post *Language Models vs. The SAT Reading Test*.
Dataset: emozilla/sat-reading

Models: XXL (11B), XL (3B), Large (780M), Base (350M)
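The dataset can be pulled straight from the Hugging Face Hub. A minimal sketch (not part of this repository), assuming the standard `datasets` API:

```python
from datasets import load_dataset

# Download the SAT reading dataset used throughout this repository
dataset = load_dataset("emozilla/sat-reading")
print(dataset)  # inspect the available splits and columns
```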
| File | Description |
|---|---|
| `combine-raw-data.py` | Combine the data in the `raw-data` folder into a single JSON file |
| `create-dataset.py` | Create Hugging Face `datasets`-compatible datasets from the combined JSON |
| `process-dataset-for-training.py` | Create a tokenized version of an existing dataset for training |
| `prompt-loop.py` | Playground for loading and prompting models |
| `take-tests.py` | Evaluate models against a dataset |
| `train.py` | Finetune a FLAN-T5 model |
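For a quick sense of what `prompt-loop.py` does, the sketch below loads a FLAN-T5 checkpoint and generates an answer for a single prompt. The model id and prompt format are illustrative placeholders, not the exact ones used by the scripts:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Illustrative checkpoint; substitute a finetuned model as appropriate
model_id = "google/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Hypothetical prompt format for a passage/question pair
prompt = "Read the passage and answer the question.\n\nPassage: ...\n\nQuestion: ...\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```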
To check the generalization of finetuned models, install [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and run it on the SuperGLUE tasks `cb`, `copa`, `superglue_rte`, `wic`, and `wsc` (and any other tasks you'd like, of course).
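As a sketch, the harness can also be driven from Python via its `simple_evaluate` entry point (available in recent releases; older versions expose a `main.py` CLI instead). The checkpoint name below is illustrative, and task spellings may differ across harness versions:

```python
import lm_eval

# Evaluate a (finetuned) FLAN-T5 checkpoint on the SuperGLUE tasks listed above
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=google/flan-t5-large",  # illustrative checkpoint
    tasks=["cb", "copa", "superglue_rte", "wic", "wsc"],
)
print(results["results"])
```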