I'm an engineering leader and hands-on AI/ML builder, focused on deeply understanding and implementing the nuts and bolts of modern LLMs. From training tokenizers and building BPE from scratch to implementing transformers line by line, I love demystifying AI, one project and one workshop at a time.
Project | What I Did |
---|---|
🧩 Byte Pair Encoding (BPE) from Scratch | Wrote a custom BPE tokenizer in Python with support for special tokens, regex splitting, and vocab merging. |
🧠 LLM from Scratch | Implemented a transformer-based LLM (embedding → attention → MLP → logits) using only PyTorch. Includes training loop, sampling, and inference. |
Agentic RAG Pipeline | Built end-to-end Retrieval-Augmented Generation workflows using LangChain, DuckDB, FAISS, and streaming token-by-token inference. |
Text-to-SQL for CSVs | Built a system to parse natural language queries into SQL and run them over uploaded CSVs. Added Vespa-style search for LIKE queries. |
🇮🇳 Hindi Tokenizer (WIP) | Training a BPE tokenizer from scratch on Hindi corpora to enable better subword tokenization for Indian languages. |
Secure LLM Workflows | Integrated Cloudflare Zero Trust, IP whitelisting, and API key validation in LangChain-based pipelines. |
SmartInvestReturns | A personal finance site to calculate SIP, retirement corpus, and mutual fund strategies. Built with Next.js & TypeScript. |
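
For a flavor of the BPE project, here is a minimal sketch of the core idea behind "vocab merging": repeatedly find the most frequent adjacent pair of token ids and replace it with a new id. This is an illustration only, not the code from the project; the function names (`most_frequent_pair`, `merge_pair`, `train_bpe`) are hypothetical, it starts from raw UTF-8 bytes, and it leaves out special tokens and regex splitting.

```python
from collections import Counter


def most_frequent_pair(ids):
    """Return the most common adjacent pair of token ids, or None if there is none."""
    pairs = Counter(zip(ids, ids[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2  # skip both members of the merged pair
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`.

    Returns the final token ids and a dict mapping each merged pair to its new id.
    New ids start at 256 because ids 0..255 are reserved for raw bytes.
    """
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:  # fewer than two tokens left, nothing to merge
            break
        new_id = 256 + step
        ids = merge_pair(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges
```

Calling `train_bpe("aaaa", 1)` merges the byte pair `(97, 97)` into id `256`, leaving two tokens. A real tokenizer would also store the merges in order so that encoding new text can replay them.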
- YouTube Channel: @VivekNayyar
  I create short explainers and tutorials on AI topics like tokenization, transformers, and building your own RAG pipeline.
  Recent videos include "No code whatsapp bot" and "Chat with any CSV using langchain".
- RAG Beyond Basics Workshop
  Covers advanced topics like agentic workflows, text-to-SQL, streaming outputs, observability, and PII-safe deployments.
  Delivered at internal events, React Summit 2024, and community meetups.
- YouTube
- LinkedIn
- Twitter / X
- Substack (for AI content)