LoRA post-training and inference acceleration for e-commerce buyer intent classification.
This project turns a classroom notebook workflow into a reproducible LLM engineering pipeline:
- Build zero-shot and few-shot prompting baselines.
- Fine-tune an open LLM with LoRA / PEFT for intent classification.
- Evaluate label accuracy, macro F1, weighted F1, and error cases.
- Benchmark local Hugging Face generation against vLLM batch inference.
Given a buyer message, classify it into exactly one intent:
- Product Details
- Product Condition
- Product Availability
- Irrelevant Intent
- Prompt Injection
- Offensive Intent
- Price Negotiation
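A zero-shot or few-shot baseline for this label set could assemble its prompt roughly as follows. This is a minimal sketch, not the project's actual template: the function name `build_prompt`, the instruction wording, and the `Message:` / `Intent:` layout are all assumptions made for illustration.

```python
# Hypothetical prompt builder; the wording and layout are illustrative assumptions.
INTENT_LABELS = [
    "Product Details",
    "Product Condition",
    "Product Availability",
    "Irrelevant Intent",
    "Prompt Injection",
    "Offensive Intent",
    "Price Negotiation",
]


def build_prompt(query, examples=()):
    """Return a zero-shot prompt, or a few-shot prompt when examples are given.

    `examples` is an iterable of (query, label) pairs used as demonstrations.
    """
    lines = [
        "Classify the buyer message into exactly one intent.",
        "Allowed intents: " + ", ".join(INTENT_LABELS) + ".",
        "Answer with the intent name only.",
        "",
    ]
    # Each demonstration is rendered in the same layout as the final query.
    for example_query, example_label in examples:
        lines.append(f"Message: {example_query}")
        lines.append(f"Intent: {example_label}")
        lines.append("")
    lines.append(f"Message: {query}")
    lines.append("Intent:")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [("Can you do 20 dollars less?", "Price Negotiation")]
    print(build_prompt("Is this still in stock?", demo))
```

Ending the prompt with a bare `Intent:` encourages the model to complete with a label only, which keeps downstream parsing simple.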
The task is intentionally practical: the model must return only one normalized label, even when a query mixes multiple signals such as product questions and prompt injection.
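One way to enforce the "exactly one normalized label" requirement is to post-process the raw generation. The sketch below is an assumption about how that could be done, not the repository's implementation: `normalize_label`, the earliest-mention rule, and the choice of `Irrelevant Intent` as the fallback are all hypothetical.

```python
import re

INTENT_LABELS = (
    "Product Details",
    "Product Condition",
    "Product Availability",
    "Irrelevant Intent",
    "Prompt Injection",
    "Offensive Intent",
    "Price Negotiation",
)

# Assumption: unparseable generations map to this label.
FALLBACK_LABEL = "Irrelevant Intent"


def _squash(text):
    """Lowercase and collapse every run of non-letters into a single space."""
    return re.sub(r"[^a-z]+", " ", text.lower()).strip()


def normalize_label(raw_output, fallback=FALLBACK_LABEL):
    """Map free-form model output to one canonical label.

    If several labels are mentioned, the one that appears first wins
    (an illustrative rule, chosen here for simplicity).
    """
    text = _squash(raw_output)
    best = None  # (position, label) of the earliest match so far
    for label in INTENT_LABELS:
        position = text.find(_squash(label))
        if position != -1 and (best is None or position < best[0]):
            best = (position, label)
    return best[1] if best else fallback


if __name__ == "__main__":
    print(normalize_label("  product_details.\n"))
    print(normalize_label("Intent: Prompt Injection (also asks about price)"))
```

Normalizing before scoring means formatting noise such as casing, underscores, or trailing punctuation is not counted as a classification error.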
flowchart LR
A["Buyer Intent CSV<br/>Query + Intent + DatasetType"] --> B["Data Preparation<br/>clean + split train/valid/test"]
B --> C["Prompt Baselines<br/>zero-shot + few-shot"]
B --> D["LoRA SFT<br/>PEFT adapter training"]
C --> E["Evaluation<br/>accuracy + macro F1 + errors"]
D --> E
D --> F["Fine-tuned Model<br/>base LLM + LoRA adapter"]
F --> G["HF Transformers Inference"]
F --> H["vLLM Batch Inference"]
G --> I["Benchmark<br/>latency + throughput"]
H --> I
llm-intent-posttraining/
├── configs/ # YAML configs for training, evaluation, vLLM inference
├── data/ # Sample data and optional local full dataset
├── scripts/ # Reproducible shell/python entrypoints
├── src/intent_llm/ # Python package
├── tests/ # Lightweight unit tests
├── notebooks/ # Optional experiment notebooks
└── results/ # Metrics and prediction outputs
cd llm-intent-posttraining
python -m pip install -e ".[dev]"
For vLLM benchmarking on a CUDA GPU machine:
python -m pip install -e ".[vllm]"
Set your Hugging Face token:
export HF_TOKEN="your_huggingface_token"
Do not commit real tokens. Use .env.example as the template.
A tiny public-safe sample is included:
data/sample_intent_data.csv
For full experiments, copy the course dataset into this project:
python scripts/prepare_data.py
That creates:
data/buyer_intent_dataset_final.csv
data/processed/train.csv
data/processed/valid.csv
data/processed/test.csv
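A train/valid/test split of this kind is often stratified by label so that each intent appears in every file. The following is a minimal sketch under stated assumptions: it is not the logic of `scripts/prepare_data.py` (which may, for example, rely on the `DatasetType` column instead), and the function name, the 10% fractions, and the seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="Intent", valid_frac=0.1, test_frac=0.1, seed=42):
    """Split a list of dict rows into train/valid/test, label by label.

    Fractions and seed are illustrative defaults, not the project's settings.
    """
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)  # local RNG keeps the split reproducible
    train, valid, test = [], [], []
    for label in sorted(by_label):  # sorted for a deterministic order
        group = by_label[label]
        rng.shuffle(group)
        n_valid = int(len(group) * valid_frac)
        n_test = int(len(group) * test_frac)
        valid.extend(group[:n_valid])
        test.extend(group[n_valid:n_valid + n_test])
        train.extend(group[n_valid + n_test:])
    return train, valid, test


if __name__ == "__main__":
    rows = [{"Query": f"q{i}", "Intent": "Price Negotiation"} for i in range(20)]
    rows += [{"Query": f"p{i}", "Intent": "Product Details"} for i in range(10)]
    train, valid, test = stratified_split(rows)
    print(len(train), len(valid), len(test))
```

Rows loaded with `csv.DictReader` can be passed in directly, and each returned list can be written back out with `csv.DictWriter`.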
Edit configs/train_lora.yaml if you want a different base model or output path.
intent-train-lora --config configs/train_lora.yaml
Default base model:
meta-llama/Llama-3.2-3B-Instruct
The trained LoRA adapter is saved to:
outputs/llama3_intent_lora
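The adapter stays small because LoRA trains two low-rank factors per adapted weight matrix instead of the matrix itself: for a weight of shape `d_out x d_in` and rank `r`, the factors hold `r * (d_out + d_in)` parameters. The arithmetic can be sketched as below; the 3072 x 3072 projection size and rank 16 are hypothetical values chosen for illustration, not settings read from `configs/train_lora.yaml`.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def full_params(d_out, d_in):
    """Parameters in the frozen dense weight that the adapter augments."""
    return d_out * d_in


if __name__ == "__main__":
    d = 3072   # hypothetical projection width
    rank = 16  # hypothetical LoRA rank
    added = lora_trainable_params(d, d, rank)
    total = full_params(d, d)
    # prints: 98304 9437184 1.04%
    print(added, total, f"{added / total:.2%}")
```

Under these assumed values, the adapter trains roughly 1% as many parameters as the matrix it modifies, which is why the saved adapter directory is far smaller than the base model.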
Evaluate a base model:
intent-evaluate --config configs/eval.yaml
Evaluate a LoRA adapter by setting adapter_path in configs/eval.yaml:
adapter_path: outputs/llama3_intent_lora
Outputs:
results/eval_predictions.csv
results/eval_predictions.metrics.json
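For reference, accuracy, macro F1, and weighted F1 can be computed from gold and predicted labels as in the sketch below. It is a dependency-free illustration of the metric definitions, not the code behind `intent-evaluate`; the function name and the keys of the returned dictionary are assumptions and may differ from what the metrics file contains.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Compute accuracy, macro F1, and weighted F1 for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of gold examples per label
    pairs = list(zip(y_true, y_pred))

    f1_by_label = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1_by_label[label] = 2 * tp / denom if denom else 0.0

    n = len(y_true)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / n,
        # Macro F1: every label counts equally, so rare intents matter.
        "macro_f1": sum(f1_by_label.values()) / len(labels),
        # Weighted F1: each label's F1 is weighted by its gold support.
        "weighted_f1": sum(f1_by_label[l] * support[l] for l in labels) / n,
        "f1_by_label": f1_by_label,
    }


if __name__ == "__main__":
    gold = ["Product Details", "Product Details", "Prompt Injection", "Prompt Injection"]
    pred = ["Product Details", "Prompt Injection", "Prompt Injection", "Prompt Injection"]
    print(classification_metrics(gold, pred))
```

Reporting macro F1 alongside accuracy is useful here because a model can score well on accuracy while still missing low-frequency intents.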
intent-hf-infer \
--model-name meta-llama/Llama-3.2-3B-Instruct \
--data-path data/sample_intent_data.csv \
--output-path results/sample_hf_predictions.csv
On a GPU machine with vLLM installed:
intent-vllm-infer --config configs/inference_vllm.yaml
For a LoRA adapter, set:
adapter_path: outputs/llama3_intent_lora
Hugging Face Transformers:
intent-benchmark \
--engine hf \
--model-name meta-llama/Llama-3.2-3B-Instruct \
--data-path data/buyer_intent_dataset_final.csv \
--limit 100 \
--batch-size 8
vLLM:
intent-benchmark \
--engine vllm \
--model-name meta-llama/Llama-3.2-3B-Instruct \
--data-path data/buyer_intent_dataset_final.csv \
--limit 100
Benchmark results append to:
results/benchmark.jsonl
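The latency and throughput numbers can be understood through a simple timing loop like the one sketched here. This is an illustration under assumptions, not the `intent-benchmark` implementation: `predict_batch` stands in for either engine, and the record field names are hypothetical rather than the actual schema of the results file.

```python
import json
import time


def run_benchmark(predict_batch, queries, batch_size=8):
    """Time `predict_batch` over `queries` and return summary statistics.

    `predict_batch` is any callable that takes a list of query strings.
    """
    start = time.perf_counter()
    n_done = 0
    for i in range(0, len(queries), batch_size):
        batch = queries[i:i + batch_size]
        predict_batch(batch)
        n_done += len(batch)
    elapsed = time.perf_counter() - start
    return {
        "num_queries": n_done,
        "total_seconds": elapsed,
        # Amortized per-query time, not the latency of a single request.
        "avg_latency_seconds": elapsed / n_done if n_done else 0.0,
        "throughput_qps": n_done / elapsed if elapsed > 0 else 0.0,
    }


def append_jsonl(path, record):
    """Append one JSON object per line so repeated runs accumulate."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    def fake_engine(batch):
        return ["Irrelevant Intent"] * len(batch)  # placeholder model

    stats = run_benchmark(fake_engine, [f"query {i}" for i in range(100)])
    stats["engine"] = "fake"
    print(stats)
```

Appending rather than overwriting lets Hugging Face and vLLM runs share one file, with each line identifying the engine it came from.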
Fill this table after running experiments:
| Method | Model | Adapter | Accuracy | Macro F1 | Avg Latency | Throughput |
|---|---|---|---|---|---|---|
| Zero-shot | Llama-3.2-3B-Instruct | No | TBD | TBD | TBD | TBD |
| Few-shot | Llama-3.2-3B-Instruct | No | TBD | TBD | TBD | TBD |
| LoRA SFT | Llama-3.2-3B-Instruct | Yes | TBD | TBD | TBD | TBD |
| LoRA + vLLM | Llama-3.2-3B-Instruct | Yes | TBD | TBD | TBD | TBD |
This repository demonstrates the full post-training lifecycle for a small domain-specific LLM system:
- Prompt engineering baseline
- Parameter-efficient fine-tuning
- Structured evaluation
- Prediction exports ready for error analysis
- Production-oriented inference acceleration with vLLM
It is designed to be portfolio-friendly: the code is modular, secrets are not hardcoded, and large model artifacts are excluded from Git.