I build end-to-end AI systems, from RL research and agentic pipelines to production deployment.
Financial Agentic RAG - LangGraph agent routing financial report queries across Qdrant vector search and SQLite, with CRAG-style relevance loops and SSE streaming to a React frontend.
VoiceNav - LLM browser automation agent with Text vs Vision planner ablation (+17pp task success), two-layer failure attribution, and CDP-based live browser streaming.
Web World Model - RL training system for web navigation agents. SFT + GRPO on Qwen2.5-3B on 2×A100 with vLLM rollout serving and 150K synthetic samples; +44% over CoT, 4.4× faster than Tree Search.
Spatial VLM Investigator - Spatial reasoning in VLMs via CoT and RL fine-tuning. GRPO beats SFT on OOD generalization (3.17% vs 12.03% ID-OOD gap).
Academic Knowledge Graph - End-to-end KG pipeline: crawling → ontology construction → BERT+BiLSTM+CRF NER → Neo4j → semantic retrieval.
ML Inference on Kubernetes - PyTorch training Job + inference Deployment on GKE, shared PersistentVolume, liveness/readiness probes, LoadBalancer REST API.

