Harsh Jain · Harsh-4210

Applied ML & AI Engineer · Adversarial RL · LLM Fine-Tuning · Production ML Systems

B.E. Artificial Intelligence & Data Science @ SPPU · GPA 8.75/10

I train RL agents that learn what humans never state, and ship ML systems that survive production.

🧠 Flagship Project — ConflictBench

Business instructions contradict. ConflictBench teaches LLMs to resolve them.

ConflictBench is an RL environment that trains language models to resolve contradictory business directives by discovering an implicit 6-tier authority hierarchy — Legal > C-Suite > VP > Director > Team Lead > IC — entirely from reward signal. The hierarchy is never stated in the prompt; the model discovers it through episodes of 8–28 directives with 2–6 embedded conflict pairs.
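Because the hierarchy exists only inside the environment, the reward needs a ground-truth resolution to score against. Below is a minimal sketch of that lookup. It assumes each directive records its author's tier and that the higher tier always wins. `Directive`, `TIER_RANK`, `resolve_conflict`, and `expected_resolutions` are hypothetical names, not ConflictBench's code, and same-tier conflicts are deliberately left out.

```python
from dataclasses import dataclass

# Hidden authority order from the README; a lower rank means more authority.
# The dict layout is a hypothetical illustration, not the project's data model.
TIER_RANK = {
    "Legal": 0,
    "C-Suite": 1,
    "VP": 2,
    "Director": 3,
    "Team Lead": 4,
    "IC": 5,
}


@dataclass(frozen=True)
class Directive:
    """One business instruction inside a scenario (hypothetical schema)."""
    directive_id: str
    author_tier: str
    text: str


def resolve_conflict(a: Directive, b: Directive) -> Directive:
    """Return the winning directive, assuming higher authority always wins."""
    if TIER_RANK[a.author_tier] == TIER_RANK[b.author_tier]:
        raise ValueError("same-tier conflicts need a tie-break rule not covered here")
    return a if TIER_RANK[a.author_tier] < TIER_RANK[b.author_tier] else b


def expected_resolutions(pairs):
    """Map each (id, id) conflict pair to the id of its winning directive."""
    return {
        (a.directive_id, b.directive_id): resolve_conflict(a, b).directive_id
        for a, b in pairs
    }


if __name__ == "__main__":
    # Made-up example directives.
    legal = Directive("d3", "Legal", "Retain customer records for 7 years.")
    vp = Directive("d7", "VP", "Delete inactive customer records after 1 year.")
    print(expected_resolutions([(legal, vp)]))  # → {('d3', 'd7'): 'd3'}
```

The model never sees `TIER_RANK`. Only the reward code does, which is why the ordering has to be inferred from reward signal.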

┌──────────────────────────────────────────────────────────────────┐
│  Scenario Generator  →  8–28 directives, 2–6 conflict pairs      │
│  Reward Function     →  5-rubric deterministic (no LLM judge)    │
│  Training            →  GRPO + LoRA (r=32) on Qwen2.5-3B         │
│  Hardware            →  1× A100 48GB · 2 epochs · 400 scenarios  │
│  Output              →  Conflict-free resolution + JSON schema   │
└──────────────────────────────────────────────────────────────────┘
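The training row names GRPO. GRPO drops PPO's learned value baseline in favour of a group baseline: several completions are sampled for the same prompt, and each completion's advantage is its reward standardised against the rest of its group. Below is a minimal plain-Python sketch of that step. It is not the TRL implementation the project may have used, and `group_relative_advantages` and its `eps` default are illustrative.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-4):
    """Standardise each reward against its own group (one group = one prompt).

    Completions that beat their siblings get a positive advantage and worse
    ones get a negative advantage, so no separate critic network is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical composite rewards for 4 sampled resolutions of one scenario.
    rewards = [0.2, 0.5, 0.5, 0.8]
    print([round(a, 3) for a in group_relative_advantages(rewards)])
```

If every sample in a group earns the same reward, all advantages are zero and that prompt contributes no policy-gradient signal.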

Metric	Result
Composite reward lift	0.14 → 0.50 (+257%) over zero-shot baseline
Reward rubrics	Correctness · Contradiction-freedom · F1 · Efficiency · Schema
Training	GRPO + LoRA (r=32) on Qwen2.5-3B, A100 48GB
Recognition	Finalist — Meta × PyTorch × HuggingFace OpenEnv Hackathon, Bangalore
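All five rubrics are deterministic, so ordinary code can compute the composite without a judge model. Below is a minimal sketch. It assumes each rubric returns a score in [0, 1] and that the composite is their weighted mean. The equal weights, the rubric keys, and reading the "F1" rubric as F1 over flagged conflict pairs are all assumptions, not ConflictBench's published formula. The same snippet also checks the +257% lift in the table.

```python
RUBRICS = ("correctness", "contradiction_free", "f1", "efficiency", "schema")
WEIGHTS = {name: 1 / len(RUBRICS) for name in RUBRICS}  # assumption: equal weights


def pair_f1(predicted, gold):
    """F1 between conflict pairs the model flagged and the embedded ones.

    Pairs are (id, id) tuples, assumed to be written in a canonical order.
    """
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def composite_reward(scores):
    """Weighted mean of rubric scores, each expected in [0, 1]."""
    missing = set(RUBRICS) - scores.keys()
    if missing:
        raise KeyError(f"missing rubric scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in RUBRICS)


def relative_lift(baseline, trained):
    """Relative improvement of the trained score over the baseline."""
    return (trained - baseline) / baseline


if __name__ == "__main__":
    # One of two flagged pairs is correct, and one of two gold pairs is found.
    print(pair_f1({("d1", "d4"), ("d2", "d9")}, {("d1", "d4"), ("d3", "d5")}))  # → 0.5
    print(f"{relative_lift(0.14, 0.50):.0%}")  # → 257%
```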

🔬 Projects

ARMSRACE — Adversarial Oversight Arena

Two-agent zero-sum adversarial loop for hallucination detection — Red Agent generates plausible silent-failure hallucinations, Blue Agent acts as a factual gatekeeper.

Expert Correction Training (ECT): converts failed RL steps into a supervised signal, preventing policy collapse
Hallucination detection: 25% → 100% with 4% false-alarm rate
96% OOD generalisation across unseen domains
Asymmetric rewards (TP+0.6, FP−2.0, FN−0.6) + zero-sum Elo tracking

PPO · LoRA · PEFT · REINFORCE · SFT
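The asymmetric payoffs can be written as a lookup on the Blue Agent's verdict. Red receives the negation, which keeps each round zero-sum, and an Elo update tracks the two agents' relative strength. Below is a minimal sketch. The TP/FP/FN values come from the list above. The true-negative payoff of 0, the Red = −Blue convention, the K-factor of 32, and counting an Elo "win" for whichever agent gained reward are assumptions.

```python
# Payoffs to the Blue (gatekeeper) agent, taken from the README.
# TN = 0.0 is an assumption; the README does not list it.
BLUE_REWARD = {"TP": 0.6, "FP": -2.0, "FN": -0.6, "TN": 0.0}


def outcome(hallucination_present: bool, blue_flagged: bool) -> str:
    """Classify one round from the gatekeeper's point of view."""
    if hallucination_present:
        return "TP" if blue_flagged else "FN"
    return "FP" if blue_flagged else "TN"


def round_rewards(hallucination_present: bool, blue_flagged: bool):
    """Return (blue, red) rewards; Red gets the negation so each round is zero-sum."""
    blue = BLUE_REWARD[outcome(hallucination_present, blue_flagged)]
    return blue, -blue


def elo_update(r_blue: float, r_red: float, blue_score: float, k: float = 32.0):
    """Standard Elo update; blue_score is 1 for a Blue win, 0 for a loss, 0.5 for a draw."""
    expected_blue = 1.0 / (1.0 + 10 ** ((r_red - r_blue) / 400.0))
    delta = k * (blue_score - expected_blue)
    return r_blue + delta, r_red - delta  # rating points are transferred, not created


if __name__ == "__main__":
    blue_r, red_r = round_rewards(hallucination_present=True, blue_flagged=False)
    print(blue_r, red_r)  # → -0.6 0.6 (a missed hallucination)
    score = 1.0 if blue_r > 0 else 0.0 if blue_r < 0 else 0.5
    print(elo_update(1500.0, 1500.0, score))  # → (1484.0, 1516.0)
```

The −2.0 false-positive penalty is more than three times the true-positive payoff. Under this scoring, Blue only gains by flagging when it is fairly sure, which is consistent with the low false-alarm rate reported above.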

TraceLink — Manufacturing Traceability

Production-deployed system for full forward/backward traceability across 6 entity types — raw material lots to customer dispatch orders.
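Forward and backward traceability both reduce to graph reachability. Each entity is a node, and each "consumed into" or "shipped as" link is a directed edge. A trace is then a breadth-first walk along the edges (forward) or against them (backward). Below is a minimal sketch that assumes an in-memory adjacency list. Apart from raw material lots and dispatch orders, the entity names are hypothetical, as is the `TraceGraph` class. The deployed system presumably queries its database instead of holding the graph in memory.

```python
from collections import defaultdict, deque


class TraceGraph:
    """Directed lineage graph: an edge u -> v means u flowed into v."""

    def __init__(self):
        self.downstream = defaultdict(set)
        self.upstream = defaultdict(set)

    def link(self, parent: str, child: str) -> None:
        """Record that `parent` was consumed into or became `child`."""
        self.downstream[parent].add(child)
        self.upstream[child].add(parent)

    def _walk(self, start: str, edges) -> list:
        """Breadth-first search from `start`, excluding `start` itself."""
        seen, order, queue = {start}, [], deque([start])
        while queue:
            node = queue.popleft()
            for nxt in sorted(edges[node]):  # sorted for deterministic output
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order

    def forward(self, entity: str) -> list:
        """Everything produced from `entity`, e.g. the scope of a bad lot."""
        return self._walk(entity, self.downstream)

    def backward(self, entity: str) -> list:
        """Everything `entity` was made from, e.g. the lots behind an order."""
        return self._walk(entity, self.upstream)


if __name__ == "__main__":
    g = TraceGraph()
    # Hypothetical chain: lot -> batch -> finished good -> dispatch order
    g.link("LOT-17", "BATCH-3")
    g.link("LOT-22", "BATCH-3")
    g.link("BATCH-3", "FG-101")
    g.link("FG-101", "DISPATCH-9")
    print(g.forward("LOT-17"))       # → ['BATCH-3', 'FG-101', 'DISPATCH-9']
    print(g.backward("DISPATCH-9"))  # → ['FG-101', 'BATCH-3', 'LOT-17', 'LOT-22']
```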

6-role RBAC with Firebase ID-token verification
CSV ingestion with full rollback + request-level audit trail
Natural-language AI query endpoint for non-technical users
Containerised (Bun + FastAPI) → deployed on Render with auto-deploy

FastAPI · React · Firebase Auth · SQLite · Docker
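All-or-nothing CSV ingestion maps directly onto a database transaction: insert every row inside one transaction, and roll back if any row fails. Below is a minimal sketch using Python's built-in `sqlite3` and `csv` modules (SQLite is listed in the stack). The `lots` and `audit_log` tables, their columns, and the `init_db` and `ingest_csv` helpers are hypothetical. The audit trail is reduced to one log row per request.

```python
import csv
import io
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical entity table plus a request-level audit log."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS lots (
            lot_id   TEXT PRIMARY KEY,
            material TEXT NOT NULL,
            qty_kg   REAL NOT NULL CHECK (qty_kg > 0)
        );
        CREATE TABLE IF NOT EXISTS audit_log (
            request_id TEXT, status TEXT, rows INTEGER, detail TEXT
        );
        """
    )


def ingest_csv(conn: sqlite3.Connection, request_id: str, text: str) -> int:
    """Insert every row or none of them, and record the outcome in audit_log."""
    rows = list(csv.DictReader(io.StringIO(text)))
    try:
        with conn:  # commits on success, rolls back if anything inside raises
            for row in rows:
                conn.execute(
                    "INSERT INTO lots (lot_id, material, qty_kg) VALUES (?, ?, ?)",
                    (row["lot_id"], row["material"], float(row["qty_kg"])),
                )
            conn.execute(
                "INSERT INTO audit_log VALUES (?, 'committed', ?, '')",
                (request_id, len(rows)),
            )
    except (sqlite3.Error, KeyError, ValueError, TypeError) as exc:
        with conn:  # the failure is logged in its own transaction so it survives
            conn.execute(
                "INSERT INTO audit_log VALUES (?, 'rolled_back', 0, ?)",
                (request_id, str(exc)),
            )
        raise
    return len(rows)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    init_db(db)
    print(ingest_csv(db, "req-1", "lot_id,material,qty_kg\nL1,steel,10\nL2,copper,5\n"))
```

Keeping the success audit row inside the same transaction as the data means a committed import and its log entry cannot disagree.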

Arivon — Adaptive Learning Platform

Detects metacognitive miscalibration — when a student's confidence diverges from actual performance — and dynamically adjusts learning paths.
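One simple way to quantify this kind of miscalibration is the gap between a student's average self-reported confidence and their actual accuracy on recent answers. Below is a minimal sketch that assumes confidence is collected per question on a 0 to 1 scale. The 0.15 tolerance, the label strings, and the function names are hypothetical, and Arivon's own detector may use a different measure.

```python
from statistics import fmean


def calibration_gap(confidences, correct):
    """Mean confidence minus accuracy: > 0 is overconfident, < 0 underconfident."""
    if len(confidences) != len(correct) or not correct:
        raise ValueError("need one confidence value per answered question")
    return fmean(confidences) - fmean(1.0 if c else 0.0 for c in correct)


def classify(gap, threshold=0.15):
    """Label the gap; `threshold` is a hypothetical tolerance, not a project value."""
    if gap > threshold:
        return "overconfident"
    if gap < -threshold:
        return "underconfident"
    return "calibrated"


if __name__ == "__main__":
    conf = [0.9, 0.8, 0.95, 0.85]       # the student feels sure
    right = [True, False, False, True]  # but gets half the questions wrong
    gap = calibration_gap(conf, right)
    print(round(gap, 3), classify(gap))  # → 0.375 overconfident
```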

Bloom's taxonomy difficulty engine
Voice-based exam interface via Groq Whisper
RAG-powered study mentor (Haystack) + React Flow knowledge graph
🥉 3rd Place — Pragyantra, PES Modern College of Engineering

Next.js 15 · FastAPI · Groq Whisper · Haystack RAG · MongoDB · Redis
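A difficulty engine built on Bloom's taxonomy can treat the six levels of the revised taxonomy as an ordered ladder and move a student up or down one rung based on recent results. Below is a minimal sketch. The promote and demote thresholds, the window of recent answers, and `next_level` are hypothetical and do not reflect Arivon's actual rules.

```python
# The six levels of the revised Bloom's taxonomy, from lowest to highest.
BLOOM_LEVELS = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]


def next_level(current: str, recent_correct, promote_at=0.8, demote_at=0.5) -> str:
    """Step one Bloom level up or down based on accuracy over recent answers."""
    idx = BLOOM_LEVELS.index(current)
    if not recent_correct:
        return current  # no evidence yet, so stay put
    accuracy = sum(recent_correct) / len(recent_correct)
    if accuracy >= promote_at:
        idx = min(idx + 1, len(BLOOM_LEVELS) - 1)
    elif accuracy < demote_at:
        idx = max(idx - 1, 0)
    return BLOOM_LEVELS[idx]


if __name__ == "__main__":
    print(next_level("Apply", [True, True, True, True, False]))  # → Analyze
    print(next_level("Apply", [False, False, True, False]))      # → Understand
    print(next_level("Create", [True] * 5))                      # → Create
```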

SO₂ Emission Prediction System

End-to-end ML pipeline predicting SO₂ emissions from Indian coal power plants, deployed as a containerised microservice.

85% accuracy on held-out test data via cross-validation
Optuna-based hyperparameter tuning on XGBoost
20% efficiency boost through feature engineering + pipeline automation
Comprehensive REST API with structured error handling

XGBoost · FastAPI · Docker · PostgreSQL · Optuna
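K-fold cross-validation splits the data into k folds and scores the model k times. Each time, it trains on k−1 folds and tests on the one held out. Below is a minimal, dependency-free sketch of that loop. A one-variable least-squares line stands in for XGBoost, mean absolute error stands in for the project's metric, and `k_fold_indices`, `fit_line`, and `cross_val_mae` are hypothetical helpers. A real pipeline would more typically use scikit-learn's splitters. In an Optuna tuning loop, a cross-validated score like this is what each trial's objective would return.

```python
import random
from statistics import fmean


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; a stand-in for the real XGBoost model."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def cross_val_mae(xs, ys, k=5, seed=0):
    """Mean absolute error on each held-out fold, averaged over all k folds."""
    fold_scores = []
    for train, test in k_fold_indices(len(xs), k, seed):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        fold_scores.append(fmean(abs(a * xs[i] + b - ys[i]) for i in test))
    return fmean(fold_scores)


if __name__ == "__main__":
    # Synthetic data only: a noisy linear relationship, not real plant readings.
    rng = random.Random(1)
    xs = [float(x) for x in range(40)]
    ys = [3.0 * x + 5.0 + rng.gauss(0, 2) for x in xs]
    print(f"5-fold MAE: {cross_val_mae(xs, ys):.2f}")
```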

🏆 Hackathons & Awards

🥇	Finalist — Meta × PyTorch × HuggingFace OpenEnv Hackathon, Bangalore	`ConflictBench`
🏅	Top 100 — Scaler School of Technology OpenEnv Pre-Selection	`ConflictBench`
🥉	3rd Place — Pragyantra, PES Modern College of Engineering	`Arivon`

⚙️ Tech Stack

LANGUAGES    = ["Python", "SQL", "JavaScript", "TypeScript"]

ML_RL        = ["PyTorch", "GRPO", "PPO", "LoRA/QLoRA", "TRL", "Unsloth",
                "Ray RLlib", "HuggingFace Transformers", "PEFT"]

VISION       = ["YOLOv8", "ONNX Runtime", "OpenCV", "Albumentations"]

LLM_INFRA    = ["RAG Pipelines", "RLHF", "Adversarial RL", "Agentic AI",
                "Haystack", "Groq Whisper"]

BACKEND      = ["FastAPI", "Next.js 15", "React"]

INFRA_DB     = ["Docker", "GitHub Actions", "Google Cloud",
                "PostgreSQL", "MongoDB", "Redis"]

CERTS        = ["Deep Learning Specialization (Andrew Ng)",
                "Generative AI with LLMs (AWS / Coursera)",
                "LLM Fundamentals (Hugging Face)"]

🎯 Currently

🔬 Building adversarial RL systems and LLM fine-tuning pipelines
🏗️ Shipping production ML with FastAPI + Docker
📖 B.E. AI & Data Science @ SPPU · Open to ML engineering roles & research internships
📬 Reach me: [email protected] · linkedin.com/in/harsh-jain0621

"I don't just train models — I build systems that ship, scale, and survive production."
