Fine-Tuning Text Pre-Trained Transformers to In-Context Learn Simple Function Classes.

CS 182/282A Deep Learning Final Project

Project Team:

William Zhang, EECS Graduate
Yuanbo Chen, EECS Graduate
Eric Tai, EECS Graduate
Michelle Wang, EECS Undergraduate

Our project extends the paper:
What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Shivam Garg*, Dimitris Tsipras*, Percy Liang, Gregory Valiant
Paper: http://arxiv.org/abs/2208.01066

We find the practical value of the paper limited because its models are trained from scratch. Instead, we explore the in-context learning capability of a text pre-trained language model, GPT-2, on a linear regression task.

Given the substantial domain shift between natural-language text and numerical regression data, we propose two inference approaches. The first follows the methodology of the referenced paper and feeds vectorized xy-pairs as inputs. The second serializes the xy-pairs into a single line of text. In our experiments, the second approach fails, while the first successfully learns the linear regression task in context. This suggests that text pre-trained weights contain useful knowledge that can be transferred to this task.
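The two input formats can be sketched as follows. `combine_vectorized` mirrors the interleaving described in Garg et al.: each scalar y is zero-padded to the dimension of x so that x and y tokens can share one linear read-in layer before entering the transformer. `serialize_as_text` is purely hypothetical; the exact text format used for the second approach is not specified here, so the "x -> y" layout and two-decimal precision are illustrative assumptions.

```python
import numpy as np


def combine_vectorized(xs: np.ndarray, ys: np.ndarray) -> np.ndarray:
    """Interleave x's and y's into one sequence of d-dimensional tokens.

    xs: (n_points, d) inputs; ys: (n_points,) scalar targets.
    Each y is zero-padded to d dimensions so it matches the x tokens.
    Returns shape (2 * n_points, d) ordered x_1, y_1, x_2, y_2, ...
    """
    n_points, d = xs.shape
    ys_wide = np.zeros((n_points, d), dtype=xs.dtype)
    ys_wide[:, 0] = ys  # y occupies the first coordinate, the rest stay zero
    # Stack to (n_points, 2, d), then merge the pair axis into the sequence axis.
    return np.stack([xs, ys_wide], axis=1).reshape(2 * n_points, d)


def serialize_as_text(xs: np.ndarray, ys: np.ndarray, query: np.ndarray, precision: int = 2) -> str:
    """Hypothetical text format: one line of "x -> y" pairs ending with the query x."""
    def fmt(v: np.ndarray) -> str:
        return "[" + ", ".join(f"{c:.{precision}f}" for c in v) + "]"

    pairs = [f"{fmt(x)} -> {y:.{precision}f}" for x, y in zip(xs, ys)]
    # The model is expected to continue the text after the final "->".
    return "; ".join(pairs + [f"{fmt(query)} ->"])
```

For example, with xs = [[1, 2], [3, 4]], ys = [5, 6] and query [0.5, -1.0], the text version is `[1.00, 2.00] -> 5.00; [3.00, 4.00] -> 6.00; [0.50, -1.00] ->`, which the model must complete with a number.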

We then experiment with several fine-tuning techniques: full fine-tuning, soft prompting, and low-rank adaptation (LoRA). Comparing computational cost and in-context learning error across these methods, we find that LoRA is the most effective at transferring text pre-trained knowledge to in-context linear regression, with minimal additional computational cost.
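For reference, here is a minimal NumPy sketch of the two parameter-efficient methods; it is not the project's implementation. `LoRALinear` keeps a pre-trained weight W frozen and adds a trainable low-rank update scaled by alpha / r, with B initialized to zero so the layer initially matches the pre-trained one. `prepend_soft_prompt` prepends trainable prompt vectors to a frozen model's input embeddings. The rank r=4, alpha=8, and the 768-dimensional layer in the usage note (the hidden size of the smallest GPT-2) are assumptions, since the project's actual settings and target layers are not given here.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: np.ndarray, r: int = 4, alpha: float = 8.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                        # frozen pre-trained weight, (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, (r, d_in))   # trainable, small random init
        self.B = np.zeros((d_out, r))               # trainable, zero init: no change at start
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (..., d_in) -> (..., d_out); frozen path plus low-rank path.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self) -> int:
        # Only A and B are updated during fine-tuning.
        return self.A.size + self.B.size


def prepend_soft_prompt(embeddings: np.ndarray, prompt: np.ndarray) -> np.ndarray:
    """Soft prompting: prepend k trainable vectors (k, d) to frozen input embeddings (n, d)."""
    return np.concatenate([prompt, embeddings], axis=0)
```

For a single 768 x 768 projection, LoRA with r=4 trains 2 * 4 * 768 = 6,144 parameters instead of 589,824, which illustrates why it adds little training cost compared with full fine-tuning.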

In short, our project explores the in-context learning performance of text pre-trained GPT-2. We compare five approaches:

- inference from text pre-trained weights with the default (vectorized) input,
- inference from text pre-trained weights with a devised text input,
- full fine-tuning,
- fine-tuning with soft prompting,
- fine-tuning with low-rank adaptation (LoRA).

We compare the approaches primarily on the simple linear regression task.
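A minimal sketch of the linear regression task, assuming the setup described in Garg et al.: inputs and the weight vector are drawn from a standard Gaussian and y = w·x without noise. Dividing the squared error by d makes the trivial zero predictor score about 1 on average, since E[(w·x)^2] = d under these assumptions. The function names are hypothetical and do not come from this repository.

```python
import numpy as np


def sample_linear_regression_prompt(n_points: int, d: int, rng: np.random.Generator):
    """Sample one prompt: x_i ~ N(0, I_d), w ~ N(0, I_d), y_i = w . x_i (noiseless)."""
    w = rng.standard_normal(d)
    xs = rng.standard_normal((n_points, d))
    ys = xs @ w
    return xs, ys, w


def normalized_squared_error(y_pred: np.ndarray, y_true: np.ndarray, d: int) -> np.ndarray:
    """Per-point squared error divided by d; predicting 0 scores about 1.0 on average."""
    return (y_pred - y_true) ** 2 / d
```

In-context error is then typically reported as a function of the number of examples seen so far in the prompt, averaged over many sampled prompts.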

Getting started

Clone the repository:

```
git clone https://github.com/MstXy/in-context-learning.git
```

Install the dependencies using Conda:

```
conda env create -f environment.yml
conda activate icl
```

Download the model checkpoints and extract them into a models folder in the current directory:
```
unzip models.zip
```
To evaluate, run eval.ipynb; every step is already set up in the notebook.

Note: all metrics were already computed at the end of each training run, so we removed state.pt from the checkpoint folders to save space.

To train, first set entity in conf/wandb.yaml to your wandb username. Then run the command for the approach you want:

```
cd src

## Approach: only pre-train
python train.py --config conf/finetune_baseline_linear_regression.yaml

## Approach: full fine-tune
python train.py --config conf/finetune_unfreeze_linear_regression.yaml

## Approach: soft prompting fine-tune
python train.py --config conf/finetune_softprompt_linear_regression.yaml

## Approach: LoRA fine-tune
python train.py --config conf/finetune_lora_linear_regression.yaml

## Approach: train from scratch
python train.py --config conf/linear_regression.yaml

## Approach: text input from pre-trained weights
python train.py --config conf/finetune_textbase_linear_regression.yaml
# Note: this approach fails to produce numerical output, so it will throw an error.
# It is still instructive to run it in debug mode and inspect the generated outputs.
# A VS Code launch.json file is included in .vscode/.
```
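As a hypothetical illustration of why the text-input approach errors out: its prediction has to be parsed from GPT-2's generated text, and when the continuation contains no number there is nothing to score. The regex-based `parse_prediction` below is an assumption for illustration, not the parsing logic in train.py.

```python
import re

# Matches signed integers, decimals, and scientific notation, e.g. "-1.5", ".3", "2e-4".
_NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def parse_prediction(generated: str) -> float:
    """Return the first number in the model's continuation, or raise if there is none."""
    match = _NUMBER.search(generated)
    if match is None:
        raise ValueError(f"no numerical output in generated text: {generated!r}")
    return float(match.group())
```

A continuation such as `" 3.25; [0.1, 0.2] ->"` parses to 3.25, while purely verbal output raises `ValueError`, which is the kind of failure the note above describes.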
