🎉🎉 See you in CVPR'26 (Denver, USA) and AISTATS'26 (Tangier, Morocco)! 🎉🎉
CVPR 2026 Accepted: https://arxiv.org/pdf/2602.23295
1. RAG (Retrieval-Augmented Generation) for long-context queries (32k paragraphs in the query space) + LLM (a. local Falcon LLM serving, b. OpenAI API) + semantic search in a vector DB (Chroma), deployed end-to-end on GCP
2. Neo4j + FastAPI + Prometheus + Grafana (observability, maintainability), deployed end-to-end on AWS with load testing
Cloud Deploy: Delivered a semantic-search chat app end-to-end (React UI, Python APIs), deployed on GCP with CI/CD. Backend Maintainability & Scalability: Built an ANN vector database (Chroma) over 11 novels with sub-second retrieval; evaluated quality and performance (Recall@5/10 0.88/0.94, MRR@10 0.82; latency p50/p95 95/220 ms; 14 QPS).
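The retrieval metrics quoted above (Recall@k, MRR@10, p50/p95 latency) can be computed roughly as follows. This is a minimal sketch with toy query data and a nearest-rank percentile; the actual evaluation harness and its data are not shown here, and all names are illustrative.

```python
import math
from typing import Sequence


def recall_at_k(ranked: Sequence[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def reciprocal_rank_at_k(ranked: Sequence[str], relevant: set[str], k: int) -> float:
    """1/rank of the first relevant result within the top k, else 0."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Toy queries: (ranked paragraph ids returned by the vector DB, ground-truth relevant ids).
queries = [
    (["p3", "p7", "p1", "p9", "p2"], {"p7"}),
    (["p4", "p5", "p6", "p8", "p0"], {"p5", "p11"}),
]
latencies_ms = [80.0, 95.0, 110.0, 210.0, 220.0]

mean_recall_5 = sum(recall_at_k(r, rel, 5) for r, rel in queries) / len(queries)
mrr_10 = sum(reciprocal_rank_at_k(r, rel, 10) for r, rel in queries) / len(queries)
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
```

Averaging per-query scores (rather than pooling hits) weights every query equally, which is the usual convention for Recall@k and MRR.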
[Method] Added a 20-line PortAudio guard that raises RuntimeError when an output device vanishes. [Issue] Eliminates an infinite-block bug. [Impact] Prevents stalls in Spotify’s Safe-and-Sound pipelines (used to vet 7M+ podcasts for 696M MAU).
[Method] Added a gin-configurable HSTU attention backend dispatcher (auto: C++ on Hopper, otherwise logs and uses a safe fallback). [Issue] Addresses the H100 efficiency gap and missing integration raised in the issue. [Impact] Lets Hopper installs use the optimized attention path while preserving public pipelines; HSTU is reported to be deployed across multiple Meta surfaces serving billions of users, underpins Meta’s large-scale GR stack (incl. recent context-parallel training results), and is referenced by NVIDIA’s RecSys examples.
VMAF Netflix — Enhancing Video Streaming Quality
Advanced encoding & evaluation techniques
https://github.com/TeleViaBox/vid-stream-quality-public
My App 1 — RAG LLM Chat App (GCP)
Retrieval-augmented generation chat application, deployed & hosted on GCP
https://github.com/TeleViaBox/society-llm-opinions-public
My App 2 — RealEstateProject
Full-stack project
https://github.com/TeleViaBox/RealEstateProject_beautified_beau_new_upload
External Interface (B2B / B2C Users)
- GraphQL API (HTTP): Q&A requests, text uploads, job status queries, and result fetching from frontend/web.
- GraphQL Subscriptions: Real-time updates (job progress, streaming tokens).
Internal Communication (between backend services)
- gRPC: Fast, type-safe calls across embedding, retrieval, generation, classification services.
- Event Bus (SNS→SQS or Kafka/Redpanda): Async, decoupled workflows for indexing, streaming updates, long-job orchestration.
Background Task Processing
- Celery (recommended) or RQ: Long-running tasks like indexing, embedding generation, offline analysis, batch reporting.
Persistent Storage & Dependencies
- Chroma / Vector DB: Embedding storage for semantic retrieval (already in use).
- Redis: Task-queue broker (Celery/RQ), cache, and pub/sub for GraphQL subscriptions.
- (Optional) PostgreSQL: Tenants, billing, quotas, audit logs, job metadata.
- Object Storage (S3 or equivalent): Original files, intermediate artifacts, exported results.
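The background-processing path can be pictured with a stdlib-only sketch: a worker thread stands in for a Celery/RQ worker, a `queue.Queue` stands in for the Redis broker, and a second queue collects the progress events that a GraphQL subscription would stream to the client. `Job`, `fake_embed`, and `indexing_worker` are hypothetical names; a real deployment would use Celery's own task API and Redis pub/sub instead.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    paragraphs: list[str]
    status: str = "queued"
    progress: float = 0.0
    embeddings: list[int] = field(default_factory=list)


def fake_embed(text: str) -> int:
    # Hypothetical stand-in for a call to the embedding service (gRPC in the design above).
    return len(text.split())


def indexing_worker(jobs: queue.Queue, events: queue.Queue) -> None:
    """Consume indexing jobs and publish (job_id, progress) events until a None sentinel."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: stop the worker
            break
        job.status = "running"
        total = len(job.paragraphs)
        for i, para in enumerate(job.paragraphs, start=1):
            job.embeddings.append(fake_embed(para))
            job.progress = i / total
            events.put((job.job_id, job.progress))  # what a GraphQL subscription would stream
        job.status = "done"


# Usage: one job, one worker thread, then drain the progress events.
jobs, events = queue.Queue(), queue.Queue()
worker = threading.Thread(target=indexing_worker, args=(jobs, events))
worker.start()
job = Job("job-1", ["a b c", "d e", "f"])
jobs.put(job)
jobs.put(None)
worker.join()
progress_updates = [events.get() for _ in range(events.qsize())]
```

Keeping job state (status, progress) separate from the event stream mirrors the split above: job-status queries read stored state, while subscriptions consume events.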
Business Scope (Feb 2024)
1) Metrics: Primary, Secondary, Guardrails
- Primary: single main outcome; attributable, sensitive, stable.
Examples: daily engagement per user, average time spent, 7-day retention.
- Secondary: aid understanding & side effects.
Examples: share rate, cold-start content views, creator diversity.
- Guardrails: protect UX, performance, and business health.
Examples: latency (p95/p99), crash rate, content quality, policy violations, ad revenue.
- Rule: Primary decides go/no-go; guardrails prevent “winning dirty.”
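The go/no-go rule can be written as a small function. This is a sketch under assumptions: every guardrail is expressed so that higher is better (e.g. latency and crash rate are negated), and the significance level and the -0.5% tolerance are illustrative defaults, not values from the source.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    name: str
    lift: float     # relative change vs control; assumed oriented so that higher is better
    p_value: float


def ship_decision(primary: MetricResult, guardrails: list[MetricResult],
                  alpha: float = 0.05, tolerance: float = -0.005) -> str:
    """Primary metric decides go/no-go; a significant guardrail regression blocks launch."""
    regressions = [g.name for g in guardrails
                   if g.lift < tolerance and g.p_value < alpha]
    if regressions:
        return "no-go: guardrail regression in " + ", ".join(regressions)
    if primary.lift > 0 and primary.p_value < alpha:
        return "go"
    return "no-go: primary metric not significant"


primary = MetricResult("daily_engagement_per_user", lift=0.012, p_value=0.01)
guardrails = [
    MetricResult("neg_p95_latency", lift=-0.001, p_value=0.40),  # latency negated: higher = better
    MetricResult("neg_crash_rate", lift=-0.020, p_value=0.001),
]
decision = ship_decision(primary, guardrails)
```

Checking guardrails before the primary metric encodes the "no winning dirty" rule: a significant primary lift cannot override a significant guardrail regression.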
2) Power & Sample Size
- Use historical baselines + calculator/internal tooling.
- Reduce sample size via:
  - Trigger-based sampling: include only exposed/affected users.
  - Variance reduction (CUPED): pre-experiment behavior covariates.
- Clustering adjustment: feeds aren’t IID (observations from the same user are correlated), so the real sample needs may be higher.
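A sketch of the standard two-sample normal-approximation formula, n per arm = 2(z_{1-α/2} + z_{1-β})² σ² / δ², extended with two adjustments from the list: CUPED scales the variance by (1 - ρ²), where ρ is the correlation with the pre-period covariate, and clustering multiplies n by the design effect 1 + (m - 1)·ICC for m correlated observations per user. Parameter names and the toy numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(sigma: float, mde_abs: float, alpha: float = 0.05,
                        power: float = 0.8, cuped_corr: float = 0.0,
                        obs_per_cluster: float = 1.0, icc: float = 0.0) -> int:
    """Units per arm to detect an absolute difference `mde_abs` in a mean metric."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided test
    z_beta = z.inv_cdf(power)
    variance = sigma ** 2 * (1 - cuped_corr ** 2)       # CUPED variance reduction
    n = 2 * (z_alpha + z_beta) ** 2 * variance / mde_abs ** 2
    # Non-IID inflation when the analysis unit (e.g. user-day) is nested in users.
    design_effect = 1 + (obs_per_cluster - 1) * icc
    return math.ceil(n * design_effect)


baseline = sample_size_per_arm(sigma=1.0, mde_abs=0.1)
with_cuped = sample_size_per_arm(sigma=1.0, mde_abs=0.1, cuped_corr=0.5)
clustered = sample_size_per_arm(sigma=1.0, mde_abs=0.1, obs_per_cluster=7, icc=0.1)
```

With these toy inputs, a pre-period correlation of 0.5 cuts the requirement by 25%, while 7 user-days per user at ICC 0.1 inflates it by 60%.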
3) Feed-Specific Stats Considerations
- Design: user-level randomization; trigger-based (only users who open feed); switchback tests for infra (alternate by time/geography).
- Analysis: aggregate at user-day; cluster-robust SEs; pre-bucket users; IPW/position correction at impression level.
- Leakage control: avoid splitting viral content/creators; for highly connected systems consider ghost experiments or community-level randomization.
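For the "aggregate at user-day; cluster-robust SEs" point, a minimal sketch: the treatment-minus-control difference in user-day means, with a CR0 (no small-sample correction) standard error clustered by user. Because each user sits in exactly one arm, the per-arm variance contributions simply add. The row layout `(user_id, arm, value)` is an assumption.

```python
import math
from collections import defaultdict


def diff_in_means_cluster_se(rows):
    """rows: (user_id, arm, value) at user-day level; arm is 'control' or 'treatment'.

    Returns (treatment_mean - control_mean, CR0 cluster-robust SE clustered by user).
    """
    by_arm = defaultdict(list)
    for user, arm, value in rows:
        by_arm[arm].append((user, value))

    means, variances = {}, {}
    for arm, obs in by_arm.items():
        n = len(obs)
        mean = sum(v for _, v in obs) / n
        resid_sum = defaultdict(float)          # summed residuals per user (cluster)
        for user, v in obs:
            resid_sum[user] += v - mean
        means[arm] = mean
        # Var(mean) = sum over clusters of (cluster residual sum)^2 / n^2
        variances[arm] = sum(s ** 2 for s in resid_sum.values()) / n ** 2

    diff = means["treatment"] - means["control"]
    se = math.sqrt(variances["treatment"] + variances["control"])
    return diff, se


rows = [
    ("u1", "control", 2.0), ("u1", "control", 2.0),
    ("u2", "control", 4.0), ("u2", "control", 4.0),
    ("u3", "treatment", 5.0), ("u3", "treatment", 5.0),
    ("u4", "treatment", 7.0), ("u4", "treatment", 7.0),
]
diff, se = diff_in_means_cluster_se(rows)
```

This equals the clustered standard error on the treatment coefficient from a regression of value on a treatment indicator; an IID formula would understate it whenever a user's days move together.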
Implementation Details (Jan 2024)
1) Trustworthiness: SRM, Stopping Rules, Multiplicity
- SRM: chi-square check for expected control/treatment ratios; investigate routing/triggering if mismatched.
- Stopping rules: pre-register window & analysis; avoid peeking without correction; use alpha-spending or Bayesian approaches.
- Multiple comparisons: one primary metric; FDR (Benjamini–Hochberg) for secondaries/variants; bandits for exploration, confirm final rollout with traditional testing.
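Two of the checks above in code, as a sketch: an SRM chi-square test for a two-arm split (with one degree of freedom, the chi-square survival function equals erfc(√(x/2))), and the Benjamini–Hochberg step-up procedure. The expected 50/50 split and the toy counts are assumptions.

```python
import math


def srm_p_value(control: int, treatment: int, expected_treatment_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for the observed control/treatment counts."""
    total = control + treatment
    exp_t = total * expected_treatment_share
    exp_c = total - exp_t
    chi2 = (control - exp_c) ** 2 / exp_c + (treatment - exp_t) ** 2 / exp_t
    return math.erfc(math.sqrt(chi2 / 2))   # survival function, 1 degree of freedom


def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> list[bool]:
    """Reject flags controlling the false discovery rate at level q (step-up)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff_rank = rank               # largest rank passing its threshold
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= cutoff_rank
    return reject


srm_ok = srm_p_value(5000, 5000)
srm_bad = srm_p_value(10000, 10500)
rejected = benjamini_hochberg([0.010, 0.030, 0.035, 0.50])
```

In the BH example, 0.030 fails its own threshold (0.025) but is still rejected because a larger p-value (0.035) passes at rank 3; that step-up behavior is what distinguishes BH from a per-rank cutoff.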
2) Practical Execution Plan
- Define goal & thresholds
– e.g., “+1% lift in daily engagement,” MDE=+1%, α=0.05, power=0.8
- Estimate sample size
– historical data → triggered users per group
- Randomization
– user-level bucketing; include only triggered users; prevent creator/content leakage
- Variance reduction
– use previous 7-day behavior as covariates
- Metrics & significance
– aggregate at user-day; cluster-robust t-tests or regression
- Health checks
– SRM; latency/crash; guardrails (harmful content, complaints)
- Interpretation & rollout
– primary passed + guardrails stable; analyze heterogeneity; roll out / iterate / rollback; optional follow-up for long-term effects (e.g., 28-day retention)
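The randomization and variance-reduction steps can be sketched as follows: deterministic user-level bucketing by hashing a per-experiment salt with the user id, and a CUPED adjustment using the previous 7-day behavior as the covariate. The salt format, the 100-slot split, and the toy data are assumptions; in practice θ is estimated on pooled data from both arms.

```python
import hashlib
from statistics import fmean


def bucket(user_id: str, experiment_salt: str, treatment_pct: int = 50) -> str:
    """Deterministic user-level assignment: same user + salt always maps to the same arm."""
    digest = hashlib.sha256(f"{experiment_salt}:{user_id}".encode()).hexdigest()
    slot = int(digest, 16) % 100
    return "treatment" if slot < treatment_pct else "control"


def cuped_adjust(y: list[float], x: list[float]) -> list[float]:
    """CUPED: y_adj = y - theta * (x - mean(x)), with theta = cov(x, y) / var(x).

    x is the pre-experiment covariate (e.g. each user's previous 7-day engagement).
    """
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    theta = cov / var
    return [b - theta * (a - mx) for a, b in zip(x, y)]


arm = bucket("user-42", "feed-ranking-exp-1")
pre_7d = [1.0, 2.0, 3.0, 4.0]   # toy pre-period engagement
post = [3.0, 4.0, 7.0, 8.0]     # toy in-experiment engagement
adjusted = cuped_adjust(post, pre_7d)
```

The adjustment keeps the metric's mean unchanged while removing the part of its variance explained by pre-period behavior, so the treatment effect estimate stays unbiased but tighter.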
Spotify/Pedalboard — Fixed #411
Method: 20-line PortAudio guard that raises RuntimeError when an output device vanishes.
Issue: An infinite-block bug that could stall Spotify’s Safe-and-Sound pipelines.
Impact: Removes the hang from pipelines used to vet 7M+ podcasts for 696M MAU.
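The actual #411 patch is not reproduced here. As an illustration of the pattern only, the Python sketch below polls for completion and raises RuntimeError instead of waiting forever when the output device disappears; `wait_for_playback` and its callback parameters are hypothetical and are not Pedalboard's API.

```python
import time
from typing import Callable


def wait_for_playback(is_done: Callable[[], bool],
                      device_present: Callable[[], bool],
                      poll_s: float = 0.01) -> None:
    """Block until playback finishes, but fail fast if the output device disappears.

    Without the device check, this loop would spin forever once the device is
    unplugged, because the stream never reports completion.
    """
    while not is_done():
        if not device_present():
            raise RuntimeError("Audio output device was disconnected during playback")
        time.sleep(poll_s)


# Usage: simulate a device that vanishes on the third poll.
_polls = iter([True, True, False])
try:
    wait_for_playback(lambda: False, lambda: next(_polls), poll_s=0.0)
    outcome = "finished"
except RuntimeError as err:
    outcome = f"error: {err}"
```

Raising rather than returning silently lets a batch pipeline mark the item as failed and move on instead of hanging a worker.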
Meta-RecSys/Generative-Recommenders — Fixed #308
Method: Gin-configurable HSTU attention backend dispatcher (auto→C++ on Hopper; safe fallback otherwise) without changing defaults.
Issue: Addresses the H100 efficiency gap and the missing backend integration.
Impact: Hopper gets the optimized attention path without public-pipeline changes; HSTU underpins large-scale GR (incl. context-parallel training) and appears in NVIDIA’s RecSys examples.
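A sketch of the dispatch logic described for #308, written as a plain function: "auto" picks the C++ kernel only on Hopper (compute capability 9.x) when the extension is available, otherwise it logs and falls back. The function name, backend strings, and arguments are hypothetical; in a gin-configured codebase one would typically decorate such a function with `@gin.configurable` so the backend can be bound from a .gin file.

```python
import logging
from typing import Optional

logger = logging.getLogger("hstu_dispatch")


def select_attention_backend(requested: str,
                             compute_capability: Optional[tuple[int, int]],
                             cpp_available: bool) -> str:
    """Resolve 'auto' | 'cpp' | 'python' to a concrete backend name."""
    if requested == "python":
        return "python"
    if requested == "cpp":
        if not cpp_available:
            raise RuntimeError("C++ attention backend requested but not installed")
        return "cpp"
    if requested == "auto":
        is_hopper = compute_capability is not None and compute_capability[0] == 9
        if is_hopper and cpp_available:
            return "cpp"
        logger.warning("HSTU attention: falling back to the Python backend "
                       "(hopper=%s, cpp_available=%s)", is_hopper, cpp_available)
        return "python"
    raise ValueError(f"unknown attention backend: {requested!r}")
```

Keeping the explicit "python" and "cpp" choices alongside "auto" is what allows existing pipelines to keep their current behavior while Hopper users opt into the fast path.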
- Languages: C, C++, MATLAB, Python, Java
- Software engineering: OOP, large-scale system scalability design
- Hardware-adjacent: signal & image processing; encoding/decoding computation
- Tools: VMware, Postman
- Lightweight search engine on GCP (Compute Engine) with LLM chat via RAG on Project Gutenberg novels. Repo
- Sepolia smart contract for real-estate trading + full-stack website (Flask) deployed on GCP. Repo
- https://github.com/TeleViaBox/vidtrans/
- https://github.com/TeleViaBox/vqaeffic/blob/main/README.md
- (not yet public) https://github.com/TeleViaBox/leos-vehicle-network
- ML toolbox for information-space search (vector DB, loss functions)
- Dashboard website for easy data import & visualization
- Travel-spot search & visualization
- PageRank & reverse-connected network analysis
- Tabular machine & system-stability controller hyper-parameter optimization (heuristics)
- Course: Operating Systems — pintos-prac
- Course: Computer Security experiments
- Course: Computer Networks experiments
- Course: Algorithm Analysis & Design
- Hard-problem solving — repo
- My understanding of computer science — repo
- Java: multimedia 2D Windows desktop application
- Data-science EDA on a school course list
- IQA (Image) & VQA (Video) study — repo
- Optimization theory study — repo
- My understanding of computer networks
- Pac-Man design
- WhatsApp design
- ML aspects
- Search engine aspects
- Algorithm aspects
- System & network aspects: distributed systems, networking, operating systems
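For the PageRank item in the list above, here is a minimal power-iteration sketch. It uses the textbook formulation (damping 0.85, rank from dangling nodes redistributed uniformly) and is not taken from the original project.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    """Power iteration over an adjacency dict {node: [out-neighbors]}."""
    nodes = set(graph) | {v for outs in graph.values() for v in outs}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Nodes with no out-links spread their rank uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, outs in graph.items():
            if outs:
                share = damping * rank[src] / len(outs)
                for dst in outs:
                    new[dst] += share
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


cycle_ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
star_ranks = pagerank({"a": ["c"], "b": ["c"], "c": []})
```

The uniform redistribution keeps the ranks summing to 1 even when the graph has sinks, such as `c` in the star example.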
My best practice in DevOps: https://gist.github.com/TeleViaBox/f702ec4454e783216f072d7dd43615eb
I’m building the next generation of LLM search and practical AI systems you can actually run.
- 🧠 LLM Search & RAG — retrieval pipelines, vector DBs, hybrid search, agentic workflows
- 🎥 Video IQA/VQA & streaming quality — VMAF, encoding & evaluation tooling
- 📈 Recommendation systems — experiment design, metrics, guardrails, and trustworthy analysis
- 🛠 Distributed backends — gRPC services, event buses, background workers
- 🔗 Blockchain + full-stack demos — Sepolia smart contracts, GCP deployments
- Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions — https://www.youtube.com/watch?v=150ceiAVDCY
- Breaking the Sample Size Barrier in RL via Model-Based Algorithms — https://www.youtube.com/watch?v=7PYfv9KRZfQ
- Improving & Generalizing Flow-Based GMs with Minibatch Optimal Transport (Alex Tong) — https://www.youtube.com/watch?v=UhDtH7Ia9Ag
- Application-oriented goals (e.g., cancer image recognition)
- Integrate & utilize two large repositories (Case One / Case Two)
- High-difficulty replication where needed
- Prep status updates for LeetCode
- Semantic commit messages: https://gist.github.com/joshbuchea/6f47e86d2510bce28f8e7f42ae84c716
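A small checker for the `type(scope): subject` format can keep commit history consistent. This sketch assumes the type list from the linked gist (feat, fix, docs, style, refactor, test, chore) with an optional scope; many teams extend the list, so treat it as a starting point.

```python
import re

SEMANTIC_COMMIT = re.compile(
    r"^(?P<type>feat|fix|docs|style|refactor|test|chore)"  # allowed types (assumed list)
    r"(?:\((?P<scope>[\w./-]+)\))?"                          # optional (scope)
    r": (?P<subject>\S.*)$"                                  # colon, space, non-empty subject
)


def check_subject(line: str) -> bool:
    """True if a commit subject line follows the semantic format."""
    return SEMANTIC_COMMIT.match(line) is not None
```

A function like this could be called from a `commit-msg` git hook on the first line of the message file.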
