vivek12345/README.md

👋 Hi, I'm Vivek Nayyar

🚀 Engineering Leader | 🧠 AI Builder

I’m an engineering leader and hands-on AI/ML builder, focused on deeply understanding and implementing the nuts and bolts of modern LLMs. From training tokenizers and building BPE from scratch to implementing transformers line by line, I love demystifying AI, one project and one workshop at a time.

🧠 Projects & Experiments

| Project | What I Did |
| --- | --- |
| 🧩 Byte Pair Encoding (BPE) from Scratch | Wrote a custom BPE tokenizer in Python with support for special tokens, regex splitting, and vocab merging. |
| 🧠 LLM from Scratch | Implemented a transformer-based LLM (embedding → attention → MLP → logits) using only PyTorch. Includes training loop, sampling, and inference. |
| 🦙 Agentic RAG Pipeline | Built end-to-end Retrieval-Augmented Generation workflows using LangChain, DuckDB, FAISS, and streaming token-by-token inference. |
| 📊 Text-to-SQL for CSVs | Built a system to parse natural language queries into SQL and run them over uploaded CSVs. Added Vespa-style search for LIKE queries. |
| 🇮🇳 Hindi Tokenizer (WIP) | Training a BPE tokenizer from scratch on Hindi corpora to enable better subword tokenization for Indian languages. |
| 🔐 Secure LLM Workflows | Integrated Cloudflare Zero Trust, IP whitelisting, and API key validation in LangChain-based pipelines. |
| 📦 SmartInvestReturns | A personal finance site to calculate SIP, retirement corpus, and mutual fund strategies. Built with Next.js & TypeScript. |
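
For readers new to BPE, the core merge loop behind the tokenizer projects listed above can be sketched roughly as follows. This is a minimal byte-level illustration, not the code from the bpe-tokenizer repository: the function names (`most_frequent_pair`, `merge`, `train_bpe`, `decode`) are hypothetical, and the regex splitting and special-token handling mentioned in the table are left out for brevity.

```python
from collections import Counter


def most_frequent_pair(ids):
    """Return the most common adjacent pair of token ids, or None if there is none."""
    pairs = Counter(zip(ids, ids[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from raw UTF-8 bytes (ids 0..255)."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new_id, kept in merge order
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:  # fewer than two tokens left, nothing to merge
            break
        new_id = 256 + step
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


def decode(ids, merges):
    """Rebuild the byte string for each token id and decode it back to text."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():  # dicts preserve insertion order
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


ids, merges = train_bpe("aaabdaaabac", num_merges=3)
print(len(ids), decode(ids, merges))
```

Each merge shortens the sequence by replacing the most frequent adjacent pair with a new id, so the vocabulary grows by one entry per step while the encoded text shrinks.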

🎓 Workshops & Knowledge Sharing

  • 🎥 YouTube Channel → @VivekNayyar
    I create short explainers and tutorials on AI topics like tokenization, transformers, and building your own RAG pipeline.
    Recent videos include “No code whatsapp bot” and “Chat with any CSV using langchain”.

  • 🧠 RAG Beyond Basics Workshop
    Covers advanced topics like agentic workflows, text-to-SQL, streaming outputs, observability, and PII-safe deployments.
    Delivered at internal events, React Summit 2024, and community meetups.


🔗 Let's Connect


Pinned

  1. bpe-tokenizer Public

    A pure Python implementation of Byte Pair Encoding (BPE) tokenization, inspired by GPT-4's tokenization approach

    Python

  2. rag-beyond-basics-workshop Public

    This is the repository for the RAG Beyond Basics workshop.

    Python

  3. gpt2-from-scratch Public

    A clean, educational implementation of GPT-2 built from scratch using PyTorch. This project demonstrates the architecture and training of transformer-based language models.

    Python

  4. llama-with-gqa-and-rope Public

    Python

  5. react-polling Public

    🔔 Polling an API made easy with react-polling

    JavaScript · 43 stars · 9 forks

  6. babel-plugin-better-async-await Public

    Babel plugin for better error handling using async await

    JavaScript · 8 stars
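
The transformer projects above (gpt2-from-scratch and the "LLM from Scratch" entry) centre on the attention step in the embedding → attention → MLP → logits pipeline. As a rough illustration of that step only, here is a single-head causal self-attention sketch in plain Python. It is an assumption-laden teaching example, not code from those repositories, which are described as using PyTorch; the names `softmax` and `causal_attention` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are lists of T vectors (lists of floats); q and k share dimension d.
    Position t only attends to positions 0..t, which is what lets a decoder-only
    model be trained to predict the next token.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Scores against the current and earlier positions only (the causal mask).
        scores = [
            sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Weighted average of the visible value vectors.
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out


q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
for row in causal_attention(q, k, v):
    print(row)
```

The first output row equals the first value vector, because position 0 can only see itself; later rows are mixtures of all earlier value vectors.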