B.E. Artificial Intelligence & Data Science @ SPPU · GPA 8.75/10
I train RL agents that learn what humans never state, and ship ML systems that survive production.
Business instructions contradict. ConflictBench teaches LLMs to resolve them.
ConflictBench is an RL environment that trains language models to resolve contradictory business directives. The authority hierarchy that settles each conflict has six tiers (Legal > C-Suite > VP > Director > Team Lead > IC) and is never stated in the prompt; the model must infer it from reward signal alone, across episodes of 8–28 directives containing 2–6 embedded conflict pairs.
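A minimal sketch of the resolution rule the reward is meant to teach, assuming each directive is tagged with its source tier. The `Directive` class, the `resolve` function, and the two example directives are hypothetical illustrations, not ConflictBench's actual code; only the tier ordering comes from the description above.

```python
from dataclasses import dataclass

# Authority tiers from highest to lowest, as listed in the description.
TIERS = ["Legal", "C-Suite", "VP", "Director", "Team Lead", "IC"]
RANK = {tier: i for i, tier in enumerate(TIERS)}  # lower index = more authority


@dataclass(frozen=True)
class Directive:  # hypothetical structure, for illustration only
    id: str
    source: str  # one of TIERS
    text: str


def resolve(a: Directive, b: Directive) -> Directive:
    """Return the directive that wins a conflict: the higher-authority source."""
    if RANK[a.source] == RANK[b.source]:
        # Assumption: same-tier conflicts need a rule beyond the hierarchy.
        raise ValueError("same-tier conflict: the hierarchy alone cannot decide")
    return a if RANK[a.source] < RANK[b.source] else b


legal = Directive("d1", "Legal", "Retain customer records for 7 years.")
vp = Directive("d2", "VP", "Purge customer records after 90 days.")
winner = resolve(legal, vp)
print(winner.id)  # prints d1
```

In the environment itself the model never sees `TIERS`; it only receives a reward that is higher when its resolutions agree with this ordering.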
┌──────────────────────────────────────────────────────────────────┐
│ Scenario Generator → 8–28 directives, 2–6 conflict pairs         │
│ Reward Function    → 5-rubric deterministic (no LLM judge)       │
│ Training           → GRPO + LoRA (r=32) on Qwen2.5-3B            │
│ Hardware           → Single A100 48GB · 2 epochs · 400 scenes    │
│ Output             → Conflict-free resolution + JSON schema      │
└──────────────────────────────────────────────────────────────────┘
| Metric | Result |
|---|---|
| Composite reward lift | 0.14 → 0.50 (+257%) over zero-shot baseline |
| Reward rubrics | Correctness · Contradiction-freedom · F1 · Efficiency · Schema |
| Training | GRPO + LoRA (r=32) on Qwen2.5-3B, A100 48GB |
| Recognition | Finalist — Meta × PyTorch × HuggingFace OpenEnv Hackathon, Bangalore |
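
The reported lift is a relative improvement: (0.50 - 0.14) / 0.14 ≈ 2.57, or +257%. The sketch below checks that arithmetic and shows one way five rubric scores could be combined into a single composite. The equal weighting, the function names, and the rubric keys are assumptions for illustration; the actual weights are not stated here.

```python
RUBRICS = ("correctness", "contradiction_freedom", "f1", "efficiency", "schema")


def composite_reward(scores: dict[str, float]) -> float:
    """Average the five rubric scores (equal weights are an assumption)."""
    missing = [r for r in RUBRICS if r not in scores]
    if missing:
        raise KeyError(f"missing rubric scores: {missing}")
    return sum(scores[r] for r in RUBRICS) / len(RUBRICS)


def relative_lift(baseline: float, trained: float) -> float:
    """Relative improvement over the baseline, as a percentage."""
    return (trained - baseline) / baseline * 100


lift = relative_lift(0.14, 0.50)
print(f"{lift:.0f}%")  # prints 257%
```

Because every rubric is computed deterministically, the same model output always receives the same reward, which keeps training runs comparable without an LLM judge.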

- Two-agent zero-sum adversarial loop for hallucination detection: a Red Agent generates plausible silent-failure hallucinations, and a Blue Agent acts as a factual gatekeeper.
- Production-deployed system for full forward and backward traceability across 6 entity types, from raw material lots to customer dispatch orders.
- Detects metacognitive miscalibration (when a student's confidence diverges from actual performance) and dynamically adjusts learning paths.
- End-to-end ML pipeline predicting SO₂ emissions from Indian coal power plants, deployed as a containerised microservice.
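
For the miscalibration detector described above, the core signal can be sketched as the gap between a student's stated confidence and their measured accuracy. The function names, the 0.15 threshold, and the sample data are hypothetical; the project's actual detection logic is not described here.

```python
def calibration_gap(confidences: list[float], correct: list[bool]) -> float:
    """Mean self-reported confidence (0-1) minus observed accuracy."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    mean_conf = sum(confidences) / len(confidences)
    accuracy = sum(correct) / len(correct)
    return mean_conf - accuracy


def classify(gap: float, threshold: float = 0.15) -> str:
    """Label the student; the threshold is an illustrative assumption."""
    if gap > threshold:
        return "overconfident"
    if gap < -threshold:
        return "underconfident"
    return "calibrated"


# Four answers: high confidence throughout, but only half are correct.
gap = calibration_gap([0.9, 0.8, 0.9, 0.8], [True, False, False, True])
print(classify(gap))  # prints overconfident
```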

| | Achievement | Project |
|---|---|---|
| 🥇 | Finalist — Meta × PyTorch × HuggingFace OpenEnv Hackathon, Bangalore | ConflictBench |
| 🏅 | Top 100 — Scaler School of Technology OpenEnv Pre-Selection | ConflictBench |
| 🥉 | 3rd Place — Pragyantra, PES Modern College of Engineering | Arivon |
LANGUAGES = ["Python", "SQL", "JavaScript", "TypeScript"]
ML_RL = ["PyTorch", "GRPO", "PPO", "LoRA/QLoRA", "TRL", "Unsloth",
         "Ray RLlib", "HuggingFace Transformers", "PEFT"]
VISION = ["YOLOv8", "ONNX Runtime", "OpenCV", "Albumentations"]
LLM_INFRA = ["RAG Pipelines", "RLHF", "Adversarial RL", "Agentic AI",
             "Haystack", "Groq Whisper"]
BACKEND = ["FastAPI", "Next.js 15", "React"]
INFRA_DB = ["Docker", "GitHub Actions", "Google Cloud",
            "PostgreSQL", "MongoDB", "Redis"]
CERTS = ["Deep Learning Specialization (Andrew Ng)",
         "Generative AI with LLMs (AWS / Coursera)",
         "LLM Fundamentals (Hugging Face)"]

- 🔬 Building adversarial RL systems and LLM fine-tuning pipelines
- 🏗️ Shipping production ML with FastAPI + Docker
- 📖 B.E. AI & Data Science @ SPPU · Open to ML engineering roles & research internships
- 📬 Reach me: [email protected] · linkedin.com/in/harsh-jain0621