I specialize in High-Performance Computing (HPC) and Large Language Model (LLM) Optimization. My passion lies in pushing the boundaries of hardware constraints through algorithmic innovation and surgical memory management.
Currently, I am building next-generation inference engines that enable massive AI models to run on consumer-grade hardware.
Adaptive Hybrid Quantization Framework for Low-VRAM Devices
I am the creator of QKV Core, a groundbreaking framework designed to democratize AI.
- The Problem: Running 7B+ parameter models typically requires expensive enterprise GPUs (24GB+ VRAM).
- My Solution: Developed a "Surgical Alignment" and "Adaptive Hybrid Compression" algorithm that eliminates memory fragmentation.
- The Result: Achieved 34% faster I/O and enabled 7B models to run seamlessly on a 4GB GTX 1050, strictly adhering to physical memory limits.
I don't just use libraries; I optimize them at the kernel level.
| Domain | Technologies |
|---|---|
| Core AI & Kernels | Python, PyTorch, Numba (JIT), CUDA Kernels, Transformers |
| Systems Engineering | C#, .NET 8, C++, Memory Alignment, Low-level I/O |
| Data & DevOps | PostgreSQL, Docker, GitHub Actions (CI/CD), Hugging Face Hub |
| Quantization | GGUF, GPTQ, Custom Bit-Packing Algorithms |
I am open to collaborations on Open Source AI, Model Compression, and Fintech Security.