Train a lightweight Phi‑2 model using LoRA to perform chain‑of‑thought reasoning on 2D grid maps. Solve simple maze or spatial planning problems step by step.
- Grid‑based visual reasoning using spatial CoT prompts
- LoRA fine‑tuning to keep compute and model size low
- Example maze puzzles with Web UI (`map_interface.html`)
- Training and evaluation scripts included (`plot_trainer_state_cli.py`, `flask_api.py`)
Simulate an agent navigating a 10x10 grid using discrete action steps.
The objective is to compare different input formats and reasoning strategies:
- CoT with vector inputs
- NLP-based commands
- Direct vector-to-position reasoning (baseline)
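Whatever the input format, the ground-truth answer is the cumulative sum of the action vectors. A minimal sketch of that reference computation (the `(0, 0)` start and strict bounds check are illustrative assumptions, not the repo's exact conventions):

```python
def final_position(start, actions, size=10):
    """Apply a sequence of (dx, dy) steps on a size x size grid.

    Bounds handling is an assumption: here, leaving the grid raises.
    """
    x, y = start
    for dx, dy in actions:
        x, y = x + dx, y + dy
        if not (0 <= x < size and 0 <= y < size):
            raise ValueError(f"step leaves the grid at ({x}, {y})")
    return (x, y)

print(final_position((0, 0), [(1, 0), (0, 1), (1, 1)]))  # (2, 2)
```

The CoT and NLP variants produce an intermediate trace before this final position; the vector baseline emits only the position.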
| Model Folder | Format | Output | Description |
|---|---|---|---|
| `phi2-CoT-finetune5` | `(dx, dy)` | CoT trace + final pos | Full reasoning with 5 starting points |
| `phi2-NLP-finetune1` | `up/down/...` | CoT trace + final pos | Instruction-following version |
| `phi2-vec-finetune` | `(dx, dy)` | Final position only | Baseline model, no step-by-step explanation |

Each model is under `outputs/`, and each `.bin` file is under 100 MB.
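Each folder under `outputs/` is a LoRA adapter rather than a full model, which is why the checkpoints stay small. A sketch of how such an adapter would be attached to the base Phi-2 model with `peft` (the folder name and the loading flow are assumptions about this repo, not a documented API of it):

```python
def load_finetuned(adapter_dir="outputs/phi2-CoT-finetune5"):
    """Attach one of this repo's LoRA adapters to the base Phi-2 model.

    Requires `transformers` and `peft`; downloads microsoft/phi-2 on
    first use. The adapter directory is any folder listed above.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
    tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
    model = PeftModel.from_pretrained(base, adapter_dir)
    return model, tokenizer
```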
| Model Folder | Format | Output | Description |
|---|---|---|---|
| `phi2-CoT-finetune11x11` | `(dx, dy)` | CoT trace + final pos | Trained on 11x11 map-free world, perfect accuracy |
| `phi2-CoT-finetune11x11_map` | `(dx, dy)` | CoT trace + final pos + SG map | Input includes grid map with S; model returns final map with S and G |
| `phi2-Label-finetune1` | `(dx, dy)` | CoT trace + label | Labeled path validity on map with walls (future extension) |
- `11x11`: basic spatial trace task, vector action → position (no map)
- `11x11_map`: adds map context to the input; the model must parse visual structure
- `label`: data includes a correctness classification (`correct`, `loop`, etc.)
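The map variants feed the model an ASCII grid. The exact map syntax comes from the training data; the renderer below is a hypothetical sketch that marks the start with `S` and the goal with `G` (the `.` filler and row orientation are assumptions):

```python
def render_map(start, goal, size=11):
    """Render an ASCII grid with 'S' at start and 'G' at goal.

    Characters and layout are assumptions about the training format.
    """
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            if (x, y) == start:
                row.append("S")
            elif (x, y) == goal:
                row.append("G")
            else:
                row.append(".")
        rows.append("".join(row))
    return "\n".join(rows)

print(render_map((0, 0), (2, 1), size=3))
# S..
# ..G
# ...
```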
```text
GPT-CoT/
├── configs/            # LoRA training config files (YAML)
├── data/               # JSONL training files
├── outputs/            # Fine-tuned models
│   ├── phi2-CoT-finetune5/
│   ├── phi2-CoT-finetune11x11/
│   ├── phi2-CoT-finetune11x11_map/
│   ├── phi2-Label-finetune1/
│   ├── phi2-NLP-finetune1/
│   └── phi2-vec-finetune/
├── source/             # Training and inference scripts
├── .gitignore
├── README.md
└── requirements.txt
```
```bash
git clone https://github.com/Seanaaa0/GPT-CoT.git
cd GPT-CoT
conda activate gpt-env  # or your preferred environment
pip install -r requirements.txt
```

- New fine-tuned model: `phi2-Label-finetune1`
- Task: given a series of vector actions `(dx, dy)`, reason step by step to compute the final position and classify the path as one of: `correct`, `too short`, `too long`, `loop`, `out of bound`, `wrong`
- Training data: `10x10_vec_labeled.jsonl`
- Inference script: `inference_phi2_vec.py`
- Accuracy: ~95%, supports full CoT + label correctness tracking
- Output example includes a `"label"` and `"correct"` field for each prediction
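The labels above can be checked mechanically against the ground truth. This is a hypothetical re-implementation of such labeling rules for illustration; the repo's actual criteria (e.g. what exactly counts as `too short` or `loop`) may differ:

```python
def label_path(start, goal, actions, size=10):
    """Classify a (dx, dy) action sequence. Rules here are assumptions:
    leaving the grid -> 'out of bound', revisiting a cell -> 'loop',
    otherwise compare the step count to the Manhattan distance."""
    x, y = start
    visited = {(x, y)}
    for dx, dy in actions:
        x, y = x + dx, y + dy
        if not (0 <= x < size and 0 <= y < size):
            return "out of bound"
        if (x, y) in visited:
            return "loop"
        visited.add((x, y))
    if (x, y) == goal:
        return "correct"
    dist = abs(goal[0] - start[0]) + abs(goal[1] - start[1])
    if len(actions) < dist:
        return "too short"
    if len(actions) > dist:
        return "too long"
    return "wrong"

print(label_path((0, 0), (2, 0), [(1, 0), (1, 0)]))  # correct
print(label_path((0, 0), (2, 0), [(1, 0)]))          # too short
```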
- `map_interface.html`: displays a 10x10 grid and agent paths interactively
- `flask_api.py`: serves model predictions and links the frontend ↔ backend
- Future integration with live inference and editing
We provide a Python tool to visualize inference traces from `test_label.jsonl`.
This script will:
- Parse GPT output traces
- Generate per-sample visualizations
- Combine up to 25 images into a grid
Use `plot_trainer_state_cli.py` to visualize training loss and gradients:

```bash
python plot_trainer_state_cli.py --file results/trainer_state/trainer_stateX.json --metrics 1
```

- Metric options:
  - `1`: Loss
  - `2`: Grad norm
  - `3`: Learning rate

Output PNG files are saved to `results/png/`.
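The `trainer_state*.json` files follow Hugging Face's `trainer_state.json` layout, where `log_history` holds one dict per logging step. Extracting a metric series like the plotting script does is a few lines of stdlib code (the sample data below is fabricated for illustration):

```python
import json

def metric_series(trainer_state, key):
    """Return (step, value) pairs for one metric from a Trainer state dict."""
    return [(e["step"], e[key]) for e in trainer_state["log_history"] if key in e]

# Fabricated sample mirroring the trainer_state.json structure.
sample = json.loads("""{
  "log_history": [
    {"step": 10, "loss": 1.92, "learning_rate": 2e-4},
    {"step": 20, "loss": 1.41, "learning_rate": 1.8e-4},
    {"step": 20, "eval_loss": 1.50}
  ]
}""")
print(metric_series(sample, "loss"))  # [(10, 1.92), (20, 1.41)]
```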
```bash
cd source/data/test_output
python generate_trace_images.py
```
---
## TODO
- [x] Train LoRA on vector trace task
- [x] NLP command version
- [x] Multi-entry point generalization
- [x] Trace classification (valid/invalid)
- [ ] Decision Transformer for path generation
- [ ] Add goal-aware discriminator
---
## 📜 License
MIT