Compact, reproducible experiments demonstrating how to evaluate LLM-based classification strategies (zero-shot, one-shot, few-shot) and compare their performance.
Why this repo matters (for recruiters)
- Shows end-to-end LLM integration: adapter, data pipeline, experiment drivers, and evaluation.
- Demonstrates prompt engineering and measurement of outcomes (accuracy / F1 / comparison reports).
- Produces reproducible artifacts so results can be audited by reviewers.
Core features
- Experiment drivers: `zero_shot_classification.py`, `one_shot_classification.py`, `few_shot_classification.py`.
- LLM client adapter: `gemini_client.py` (reads credentials from the environment); a minimal sketch of this pattern follows the list.
- Evaluation utilities: `evaluate_llm_predictions.py`, `compare_results.py`, `analyze_saved_results.py`.
- Sanity checks: `test_api_keys.py`, `test_setup.py`.
- Results: CSV/JSON artifacts written to `results/` (sample artifacts included).
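
To make the adapter idea concrete, here is a minimal sketch; the function name, environment-variable names, endpoint, and response shape are all assumptions for illustration, not `gemini_client.py`'s actual interface.

```python
# Illustrative only: a minimal client adapter that reads credentials from the
# environment. Variable names, endpoint, and response shape are assumptions.
import json
import os
import urllib.request


def classify(text: str, prompt_template: str) -> str:
    """Send one classification prompt to a generic LLM HTTP endpoint and return its reply."""
    api_key = os.environ["LLM_API_KEY"]                                  # assumed variable name
    base_url = os.environ.get("LLM_API_URL", "https://example.invalid")  # assumed optional override

    body = json.dumps({"prompt": prompt_template.format(text=text)}).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/generate",                                       # hypothetical endpoint
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["output"].strip()                         # assumed response key
```

Keeping the adapter this small, as the design notes below suggest, lets retries or a provider switch be layered on without touching the experiment drivers.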
Quick start (Windows PowerShell)
- Create the environment and install dependencies:

  ```powershell
  python -m venv .venv
  .\.venv\Scripts\Activate.ps1
  pip install -r requirements.txt
  ```

- Add credentials:
  Copy `.env.example` to `.env` and fill in your API key and any provider URL. Do NOT commit secrets.
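
  A hypothetical layout (the actual variable names expected by `gemini_client.py` may differ) might look like:

  ```text
  GEMINI_API_KEY=your-api-key-here
  GEMINI_API_URL=https://your-provider.example/v1
  ```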
- Validate the setup:

  ```powershell
  python test_api_keys.py
  python test_setup.py
  ```

- Run an experiment:

  ```powershell
  python zero_shot_classification.py
  python one_shot_classification.py
  python few_shot_classification.py
  ```

- Analyze results:

  ```powershell
  python run_analysis.py
  python show_results.py
  ```

Files & what to look at
- `gemini_client.py`: where API calls are made; the first place to add retries or change provider settings.
- `data_loader.py`: dataset expectations and preprocessing.
- `*_classification.py`: experiment drivers that generate outputs and save them to `results/` (a sketch of the typical driver flow follows this list).
- `results/`: sample outputs, including prediction CSVs and metrics JSON files.
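
To show how these pieces fit together, here is a simplified driver loop; the helper names (`classify_fn`, the example dict keys) and the metric choice are assumptions for illustration, not the repo's actual code.

```python
# Illustrative outline of an experiment driver: classify every example, then
# persist predictions (CSV) and simple metrics (JSON) under results/.
import csv
import json
import os
from datetime import datetime


def run_experiment(examples, classify_fn, strategy_name):
    """examples: iterable of {"text": ..., "label": ...}; classify_fn: text -> predicted label."""
    rows, correct = [], 0
    for ex in examples:
        predicted = classify_fn(ex["text"])
        correct += int(predicted == ex["label"])
        rows.append({"text": ex["text"], "label": ex["label"], "predicted": predicted})

    os.makedirs("results", exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")

    with open(f"results/{strategy_name}_results_{stamp}.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "label", "predicted"])
        writer.writeheader()
        writer.writerows(rows)

    metrics = {"strategy": strategy_name, "accuracy": correct / max(len(rows), 1)}
    with open(f"results/{strategy_name}_metrics_{stamp}.json", "w") as f:
        json.dump(metrics, f, indent=2)
    return metrics
```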
Outputs produced
- `*_results_*.csv`: model predictions with metadata.
- `*_metrics_*.json`: computed evaluation metrics for a run.
- `comparison_report_*.json`: aggregated comparisons across runs.
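
These artifacts can be inspected with the standard library alone; the snippet below is a generic example and assumes nothing beyond the filename patterns listed above.

```python
# Inspect saved artifacts: print every metrics JSON, then peek at the newest predictions CSV.
import csv
import glob
import json

for path in sorted(glob.glob("results/*_metrics_*.json")):
    with open(path) as f:
        print(path, json.load(f))

latest = sorted(glob.glob("results/*_results_*.csv"))[-1]
with open(latest, newline="") as f:
    for i, row in enumerate(csv.DictReader(f)):
        print(row)           # whatever columns the driver wrote
        if i == 4:           # first five rows are enough for a peek
            break
```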
Design notes
- Keep prompts modular and colocated with the driver scripts for easy experimentation (a sketch of this pattern follows these notes).
- Store full raw outputs and parsed labels to allow post-hoc analysis.
- Keep the client adapter minimal so it can be swapped for other providers.
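
As an illustration of the "modular prompts" note, zero-, one-, and few-shot templates can live side by side in a driver script; the wording and labels below are placeholders, not the prompts actually used by the repo.

```python
# Illustrative prompt templates: the zero-shot template is reused by the
# one-/few-shot variants, which only prepend labelled examples.
ZERO_SHOT = (
    "Classify the following text as POSITIVE or NEGATIVE.\n"
    "Text: {text}\n"
    "Label:"
)

FEW_SHOT_EXAMPLES = [
    ("The battery lasts all day.", "POSITIVE"),
    ("The screen cracked within a week.", "NEGATIVE"),
]


def few_shot_prompt(text: str, examples=FEW_SHOT_EXAMPLES) -> str:
    """One example gives a one-shot prompt; several give a few-shot prompt."""
    demos = "\n".join(f"Text: {t}\nLabel: {lbl}" for t, lbl in examples)
    return f"{demos}\n{ZERO_SHOT.format(text=text)}"
```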
Recommended next steps (I can implement any of these)
- Add `.github/workflows/ci.yml` to run `test_setup.py` on pushes.
- Add a short `results/README.md` summarizing the included sample outputs.
- Add thorough retry/backoff in `gemini_client.py` for production stability (one possible shape is sketched after this list).
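
The retry/backoff item could take the shape below: a generic exponential-backoff wrapper around any callable, making no assumptions about `gemini_client.py`'s real function names or exception types.

```python
# Generic retry helper: exponential backoff with jitter around an arbitrary callable.
import random
import time


def call_with_backoff(fn, *args, max_attempts=5, base_delay=1.0, **kwargs):
    """Call fn(*args, **kwargs), retrying on any exception; re-raise after the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.5)
            time.sleep(delay)
```

In practice the `except` clause would be narrowed to the provider's transient errors (rate limits, timeouts) so that genuine bugs still fail fast.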
Contributing
- Fork and branch.
- Run tests and add new ones for new behavior.
- Submit a PR describing changes and sample outputs where relevant.
License
Add a LICENSE file or update this section.