
Implementation of Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning of GPT-2 on the SQuAD dataset for question answering, exploring training efficiency, loss masking, and performance metrics like F1 and Exact Match. Final Course project for Deep Learning at University of Kerman, Spring 2025.


LoRA-FineTuning-GPT2-QA

Overview

This repository contains the implementation of parameter-efficient fine-tuning using Low-Rank Adaptation (LoRA) on the GPT-2 model for extractive question answering tasks with the SQuAD dataset. Weights & Biases (W&B) is used for tracking experiments and logging metrics.

What is LoRA?

LoRA reduces the computational cost of fine-tuning LLMs by training only a low-rank decomposition of the weight update while leaving the original weights frozen. The update is:

$$W' = W + \Delta W = W + BA$$

where:

  • $W$ is the original weight matrix
  • $\Delta W = BA$ with $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$
  • $r \ll \min(d, k)$ controls the rank, minimizing trainable parameters.
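The decomposition above can be sketched as a thin PyTorch wrapper around a frozen linear layer (a minimal illustration, not this repository's implementation; the name `LoRALinear` is hypothetical):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Adds a trainable low-rank update BA to a frozen linear layer."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False        # W stays frozen
        d, k = base.out_features, base.in_features
        self.A = nn.Parameter(torch.randn(r, k) * 0.01)  # A: r x k
        self.B = nn.Parameter(torch.zeros(d, r))         # B: d x r (zero init => BA = 0 at start)
        self.scaling = alpha / r

    def forward(self, x):
        # W'x = Wx + (BA)x, scaled by alpha / r
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling
```

Because $B$ is initialized to zero, the adapted model starts out identical to the pretrained one, and only $A$ and $B$ receive gradients.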

Purpose

This project tackles the computational challenges of fine-tuning large language models like GPT-2 by applying LoRA to adapt only a small subset of parameters while achieving competitive performance on extractive question answering. It evaluates the impact of hyperparameters on convergence, gradient flow, and metrics such as F1-score and Exact Match.
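To make the savings concrete, here is a back-of-the-envelope count for GPT-2 small's fused attention projection (`c_attn`, 768 → 2304, across 12 blocks) at rank $r = 8$. The dimensions are standard GPT-2 small; which modules the notebook actually adapts may differ:

```python
# Illustrative parameter count: LoRA on GPT-2 small's c_attn layers.
d_in, d_out, n_blocks, r = 768, 2304, 12, 8

full = d_in * d_out * n_blocks        # fully fine-tuning c_attn weights
lora = r * (d_in + d_out) * n_blocks  # B (d_out x r) + A (r x d_in) per block

print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
# LoRA trains well under 2% of these weights
```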

Experiments

The training process used the following hyperparameters, detailed in the table below:

| Feature | Value |
| --- | --- |
| Batch Size | 8 |
| Number of Epochs | 3 |
| Optimizer | AdamW |
| Learning Rate | 0.0001, 0.0002, 0.0005 |
| LoRA Rank | 4, 8, 16, 32 |
| Target Modules | Attention, Attention + Projection |
| LoRA Alpha (scaling factor) | 16 |
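If the experiments are reproduced with Hugging Face's `peft` library (the notebook may instead implement LoRA by hand, so treat this as an assumed equivalent), the table's settings map onto a config like:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

# "Attention + Projection" corresponds to GPT-2's c_attn and c_proj
# modules; drop "c_proj" for the attention-only runs.
config = LoraConfig(
    r=8,                  # LoRA rank (swept over 4, 8, 16, 32)
    lora_alpha=16,        # effective scale is alpha / r
    target_modules=["c_attn", "c_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```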

For instance, the effect of varying target modules on loss is illustrated below, with Attention + Projection showing superior convergence.

*(Figures: evaluation-loss and training-loss curves for each target-module setting.)*

Explore additional visualizations in this Weights & Biases Project!

Results

The best configurations achieved the following performance, summarized in the table below:

| LoRA Rank | Target Module | Learning Rate | F1-Score | Exact Match (EM) |
| --- | --- | --- | --- | --- |
| 32 | Attention + Projection | 0.0005 | 90.67 | 80 |
| 8 | Attention | 0.0002 | 80 | 80 |
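The two metrics can be understood from a simplified version of SQuAD scoring (the official script additionally strips punctuation and articles before comparing; these helper names are illustrative):

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    """1.0 iff the normalized answer strings are identical."""
    return float(prediction.strip().lower() == reference.strip().lower())

def f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between predicted and reference answer spans."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

EM rewards only perfect span extraction, while F1 gives partial credit for overlapping tokens, which is why F1 typically exceeds EM.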

Setup

Dependencies

  • PyTorch
  • Transformers (Hugging Face)
  • Datasets (Hugging Face)
  • Evaluate (Hugging Face)
  • Weights & Biases (wandb)

Install dependencies via:

pip install -r requirements.txt

Running the Code

  1. Clone the repository:

    git clone https://github.com/AmirAAZ818/GPT2-LoRA-QA.git
    cd GPT2-LoRA-QA
  2. Set up Weights & Biases (optional but recommended for experiment tracking):

    • Sign up at wandb.ai and obtain your API key.
    • Run wandb login and paste your API key.
  3. Run the notebook:

    • Open parameter-effcient-fine-tuning-with-lora_Experiments.ipynb in Jupyter Notebook or Colab.
    • Experiments log metrics (e.g., F1, Exact Match, loss) to W&B; adjust configurations for hyperparameters like rank and learning rate.
