I'm obsessed with the intersection of reliability engineering and autonomous intelligence, building Agentic AI systems that don't just "chat" but act: diagnosing infrastructure, mitigating incidents, and reasoning over complex telemetry with minimal human-in-the-loop latency.
Currently: Lead Machine Learning Engineer @ ING Nederland
Agentic Architectures
Designing stateful, multi-agent systems and custom orchestration layers to solve non-deterministic problems in SRE and DevOps.
LLMOps & Evals
Engineering rigorous evaluation harnesses and CI/CD pipelines for non-deterministic software, ensuring safety and alignment in enterprise deployments.
ML System Optimization
Accelerating inference (vLLM, TGI, quantization) and training (3D parallelism: DP/TP/PP) for open-weights models (Mistral, Gemma, Llama) to achieve cost-effective scale.
DeepSeek Implementation Series
Built educational implementations of DeepSeek-R1 and DeepSeek-V3.2 from scratch: every tensor operation, every attention mechanism, every expert routing decision.
Three implementations:
- Rust + Candle (Metal/GPU): inference-optimized for Apple Silicon
- PyTorch (CUDA/MPS/CPU): distributed training with Flash Attention
- MLX: native Apple Silicon development
Key architectures implemented:
- Multi-Head Latent Attention (MLA): 93% KV cache reduction
- DeepSeek Sparse Attention (DSA): hybrid local + dilated patterns
- 256-expert MoE with hierarchical routing
- Multi-Token Prediction for improved sample efficiency
- 5D Parallelism (Tensor, Pipeline, Data, Expert, Sequence)
Current Research
- Model Context Protocol (MCP) for standardizing agent-observability integrations
- GNN-based anomaly detection for 3D parallelism in distributed training systems
- DualPipe implementation for advanced pipeline parallelism
My background spans the full computing stack, from embedded C++ optimization on ARM microcontrollers to distributed training pipelines on AWS/GCP. This "bits to billions" perspective allows me to build AI systems that are not only intelligent but fundamentally performant and secure.
Tech Stack:
- Languages: Python, Rust, Go, C++
- ML/AI: PyTorch, JAX, CUDA, MLX, Candle, TensorFlow
- LLM Infra: vLLM, TGI, Ray, Modal, DeepSpeed, FSDP, Flash Attention
- MLOps: Kubernetes, Docker, Airflow, MLflow, ZeRO, 3D Parallelism
- Cloud: AWS (GenAI SME), GCP, Azure
Technical Writing
17+ articles on MLwithDev (Medium) covering:
- Production MLOps and the messy realities of production AI
- Multimodal AI and advanced architectures
- Security testing of ML systems
- How to make models reliable and handle failures gracefully
- Employee of the Year 2022 @ Lox Solution
- AWS Subject Matter Expert: Generative AI Certification
- 15% accuracy improvement in breast cancer detection using GANs @ ScreenPoint Medical
- 40% faster insights and 45% shorter delivery time in multi-cloud AI/ML solutions
- 30% performance gains in production data and ML systems
I bring the rare ability to understand both cutting-edge AI architectures and the operational realities of serving them at scale, which is exactly what's needed to bridge research and production.
My work focuses on the messy realities: How to make models reliable. How to handle failures gracefully. How to build systems that scale beyond proof-of-concept.
I'm passionate about production AI systems, multi-agent architectures, and AI safety.
- LinkedIn
- [email protected]
- Medium - MLwithDev
- deviahc.com - SRE Agent Services
Open to: Research collaborations, speaking opportunities at AI/ML conferences, and impactful roles bridging cutting-edge AI research with production systems.
Currently working on: DeepSeek implementations on three different backends (Rust, PyTorch, MLX)



