Second-year B.Tech CSE (Data Science) student at R.C. Patel Institute of Technology, Shirpur. My focus is applied ML and AI systems — building pipelines that work under real-world constraints, not just on clean benchmarks. I want to understand where models fail, how to connect LLMs to actual data, and what it takes to ship inference APIs that hold up in production.
Interned at Vault of Code twice — first as a Software Development Intern building production web interfaces, then as an AI & Prompt Engineering Intern designing LLM pipelines that reduced generation latency by 15%.
Document authenticity verification system using deep learning. Achieves 84%+ classification accuracy on a dataset of 150+ document images and serves real-time predictions through a live Flask REST API.
Architecture:
- EfficientNetB0 transfer learning (fine-tuned from ImageNet weights) chosen after benchmarking against simpler CNN baselines on accuracy vs. inference speed tradeoff
- OpenCV preprocessing pipeline: noise removal and adaptive thresholding applied before model input, with augmentation during training, to improve robustness across document quality variations
- Flask REST API with a responsive web interface — returns prediction label and confidence score on each request
Three decisions I had to think through:
How deep to fine-tune — Freezing all EfficientNetB0 layers gave lower accuracy; fine-tuning the top layers pushed it to 84%+. The risk was overfitting on a small dataset (150+ images), managed with dropout and data augmentation on the minority fraud class.
Preprocessing as a first-class concern — Early runs showed high variance on low-quality scans. Adding adaptive thresholding and noise reduction in OpenCV before inference stabilised predictions significantly. The preprocessing pipeline ended up being as important as the model architecture.
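To show why adaptive thresholding helps on unevenly lit scans, here is a minimal pure-Python sketch of the mean-based variant. It is not the project's code: the real pipeline uses OpenCV, and the function name `adaptive_threshold`, the 3x3 window, the offset `c`, and the sample pixel values are all assumptions made for illustration.

```python
def adaptive_threshold(image, block=3, c=2):
    """Binarise a grayscale image given as a list of rows of 0-255 ints.

    Each pixel is compared with the mean of its block x block
    neighbourhood minus a constant c, so the cut-off follows local
    lighting instead of being one global value.
    """
    height, width = len(image), len(image[0])
    radius = block // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect neighbours, clipping the window at the image border
            # (an assumption; libraries may pad the border differently).
            window = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            row.append(255 if image[y][x] > local_mean - c else 0)
        out.append(row)
    return out


# Hypothetical scan: a dark ink stroke (middle column) on a page whose
# background darkens from top to bottom, as in an unevenly lit photo.
scan = [
    [200, 120, 200],
    [150, 70, 150],
    [100, 20, 100],
]
binary = adaptive_threshold(scan)
```

On this sample, a single global threshold of 128 would classify the bottom-row background (value 100) as ink, while the local comparison keeps the background white and the stroke black in every row.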
API design for live inference — Chose Flask over a heavier framework to keep the serving layer lightweight. The REST API accepts an image upload, runs the full preprocessing + inference pipeline, and returns structured JSON with label and confidence — usable directly from a frontend or another service.
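The response shape described above could be built roughly as follows. This is a sketch under assumptions, not the project's actual handler: the class names `genuine` and `fraud`, the field names `label` and `confidence`, and the helper `build_response` are hypothetical.

```python
import json

# Hypothetical class names; the real label set is not given in this README.
CLASS_NAMES = ("genuine", "fraud")


def build_response(probabilities, class_names=CLASS_NAMES):
    """Turn per-class probabilities into a JSON-ready payload."""
    if len(probabilities) != len(class_names):
        raise ValueError("expected one probability per class")
    # Pick the index of the most probable class.
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return {
        "label": class_names[best],
        "confidence": round(float(probabilities[best]), 4),
    }


# Example with made-up model output for a two-class problem.
payload = build_response([0.13, 0.87])
body = json.dumps(payload)
```

In a Flask view, a dict like `payload` can be returned through `flask.jsonify`, which keeps the endpoint usable from both a browser frontend and another service.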
Natural language analytics platform — upload a CSV or Excel file, ask questions in plain English, get SQL results and visualisations back. No SQL knowledge required.
Architecture:
- Groq API (Llama 3.3 70B) translates natural-language questions into optimised SQL queries in real time; the LLM acts as a query compiler, not a chatbot
- DuckDB runs the generated SQL directly on the in-memory dataframe — no external database needed, low latency, works entirely on the uploaded file
- Streamlit frontend handles file upload, chat interface, and dynamic chart rendering in a single script
What made this interesting:
LLM as a query layer — The core design decision was treating the LLM as a SQL translator rather than a general assistant. This keeps outputs structured and auditable: every answer traces back to a SQL query the user can inspect.
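One way to keep the LLM in a strict translator role is to ask for SQL only and reject any reply that is not a single read-only query. The sketch below illustrates that idea with the standard library alone; the prompt wording, the table name `uploaded`, and the helpers `build_prompt` and `extract_select` are assumptions, and no Groq call is shown.

```python
import re

FENCE = "`" * 3  # markdown code fence that models often wrap SQL in

# Conservative keyword block list; it may also reject harmless queries
# that use these words as column names.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|copy|pragma)\b",
    re.IGNORECASE,
)


def build_prompt(table, columns, question):
    """Compose a prompt that asks for SQL only (wording is hypothetical)."""
    schema = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"Table {table}({schema}).\n"
        "Reply with a single SQL SELECT statement and nothing else.\n"
        f"Question: {question}"
    )


def extract_select(reply):
    """Return the SQL from an LLM reply, or raise if it is not one SELECT."""
    sql = reply.strip()
    if sql.startswith(FENCE) and sql.endswith(FENCE):
        # Drop the fences and an optional "sql" language tag on the first line.
        sql = sql[len(FENCE):-len(FENCE)]
        first_line, _, rest = sql.partition("\n")
        if first_line.strip().lower() in ("", "sql"):
            sql = rest
    sql = sql.strip().rstrip(";").strip()
    if ";" in sql:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(?is)^(select|with)\b", sql):
        raise ValueError("only SELECT queries are allowed")
    if FORBIDDEN.search(sql):
        raise ValueError("query contains a write or DDL keyword")
    return sql
```

The string returned by `extract_select` is what would be shown to the user and passed to the query engine, so every answer stays traceable to one inspectable statement.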
DuckDB for in-process analytics — Using DuckDB meant I could run analytical SQL on Pandas DataFrames without spinning up a database server. It handles aggregations and joins on uploaded files in milliseconds, which matters for a demo-able interactive tool.
Personalised learning path recommender that generates career-aligned course sequences from a user's skill profile and interests.
Architecture:
- Collaborative and content-based filtering via Scikit-learn — hybrid approach handles cold-start (new users with no history) better than either method alone
- MongoDB stores user skill profiles, interaction history, and course metadata — document structure fits naturally since user profiles are heterogeneous
- Flask REST API exposes recommendation endpoints consumed by a JavaScript frontend; career roadmap generation is a separate endpoint that chains recommendations into a learning sequence
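The cold-start behaviour of a hybrid recommender can be sketched as a weighted blend whose collaborative weight grows with the user's history. This is a pure-Python illustration, not the project's Scikit-learn code: the linear ramp, the `ramp` parameter, the feature layout, and all course names and scores are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_scores(user_skills, course_features, collab_scores, n_interactions, ramp=10):
    """Blend content-based and collaborative scores per course.

    The collaborative weight grows linearly with the user's interaction
    count and is capped at 1, so a brand-new user (0 interactions) is
    ranked purely on content similarity.
    """
    alpha = min(n_interactions / ramp, 1.0)
    scores = {}
    for course, features in course_features.items():
        content = cosine(user_skills, features)
        collab = collab_scores.get(course, 0.0)
        scores[course] = (1 - alpha) * content + alpha * collab
    return scores


# Hypothetical data: feature order is [python, sql, design].
courses = {
    "ml_basics": [1, 0, 0],
    "sql_analytics": [1, 1, 0],
    "ui_design": [0, 0, 1],
}
collab = {"ml_basics": 0.9, "sql_analytics": 0.2, "ui_design": 0.4}
skills = [1, 1, 0]

cold = hybrid_scores(skills, courses, collab, n_interactions=0)
warm = hybrid_scores(skills, courses, collab, n_interactions=10)
```

With no history, the ranking follows the skill profile alone; once enough interactions accumulate, the collaborative signal takes over.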
AI & Prompt Engineering Intern — Vault of Code (Jan 2025 – Mar 2025)
Designed structured prompt engineering pipelines to automate content generation workflows using LLM APIs. Built reusable prompt templates that improved response accuracy and reduced generation latency by 15%. Integrated AI models into internal tools, cutting manual effort in document processing and data extraction.
Software Development Intern — Vault of Code (Jun 2024 – Aug 2024)
Developed responsive UI components in HTML, CSS, and JavaScript for the EDITKARO.IN production platform. Optimised cross-device layout and visual consistency across 10+ web pages. Collaborated via Git/GitHub for version control and iterative design improvements.
| Domain | Stack |
|---|---|
| Languages | Python · Java · C · PHP |
| ML & AI | TensorFlow/Keras · Scikit-learn · OpenCV · NumPy · Pandas |
| NLP & LLMs | Groq API (Llama 3.3 70B) · Prompt Engineering · LLM Pipelines |
| Backend | Flask · REST APIs |
| Frontend | HTML5 · CSS3 · JavaScript · Streamlit |
| Databases | MySQL · MongoDB · SQLite · DuckDB |
| Tools | Git · GitHub · Docker · Postman · VS Code |
| Concepts | Deep Learning · Transfer Learning · NLP · OOP · Data Structures |
- Extending the document fraud detection system with broader document type support
- Exploring MLOps fundamentals — model versioning, monitoring, and CI/CD for ML pipelines
- Preparing for Amazon ML Summer School 2026 — revisiting probability, optimisation, and deep learning fundamentals
- Contributing to Data Polaris, the AI & Data Science club at RCPIT
- Looking for ML / Data Science / Data Analyst internship roles (remote preferred)
- 🧩 500+ problems solved on CodeChef — consistent practice on Data Structures & Algorithms
- 🤖 National Finalist — IIT Indore Robo Soccer Competition
- ☁️ Google Cloud Arcade Trooper Milestone
- 🛠️ AWS AI for Bharat Hackathon participant — built a Government Schemes Chatbot

