trying to eke out the most from every CPU cycle, seeing the hardware not as an abstraction, but as a key participant in execution.
To do this I focus on:
- Memory Hierarchy: Designing for L1/L2 cache residency. I use Data-Oriented Design (DOD) and strict 64-byte cache-line alignment to minimize cache misses and avoid false sharing.
- CPU Pipeline: Optimizing for the Hardware Prefetcher via linear access patterns. I minimize pipeline stalls through branchless programming, bit manipulation, and std::intrinsics, so hot paths do not depend on the Branch Predictor guessing right.
- Execution Concurrency: Maximizing ILP (Instruction-Level Parallelism) and Out-of-Order execution by breaking data dependencies. I leverage SIMD and inline assembly when the compiler reaches its limit.
- Zero-Cost Resource Management: Avoiding per-object heap allocation and needless pointer indirection by prioritizing stack allocation and pre-allocated arenas.
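
To make a few of these points concrete, here is a minimal Rust sketch, not production code. The names `PaddedCounter`, `branchless_max`, and `sum_ilp` are hypothetical and exist only for illustration, and the 64-byte line size is an assumption that holds on most x86-64 parts but not on every CPU. It shows cache-line alignment against false sharing, a mask-based select in place of a branch, and independent accumulators that break one long dependency chain.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical per-thread counter aligned (and therefore padded) to 64 bytes,
/// so two counters stored side by side never share a cache line.
/// Assumption: the target's cache line is 64 bytes.
#[repr(align(64))]
pub struct PaddedCounter {
    pub value: AtomicU64,
}

/// Branchless max: builds an all-ones or all-zeros mask from the comparison
/// and selects with bitwise operations instead of an `if`.
pub fn branchless_max(a: u32, b: u32) -> u32 {
    // `(a < b) as u32` is 0 or 1; negating it gives 0x0000_0000 or 0xFFFF_FFFF.
    let mask = ((a < b) as u32).wrapping_neg();
    (b & mask) | (a & !mask)
}

/// Wrapping sum using four independent accumulators, so consecutive additions
/// do not all wait on the same register.
pub fn sum_ilp(data: &[u64]) -> u64 {
    let mut acc = [0u64; 4];
    let mut chunks = data.chunks_exact(4);
    for c in &mut chunks {
        acc[0] = acc[0].wrapping_add(c[0]);
        acc[1] = acc[1].wrapping_add(c[1]);
        acc[2] = acc[2].wrapping_add(c[2]);
        acc[3] = acc[3].wrapping_add(c[3]);
    }
    // Elements left over when the length is not a multiple of four.
    let tail = chunks
        .remainder()
        .iter()
        .fold(0u64, |s, &x| s.wrapping_add(x));
    acc[0]
        .wrapping_add(acc[1])
        .wrapping_add(acc[2])
        .wrapping_add(acc[3])
        .wrapping_add(tail)
}

fn main() {
    // Each counter occupies its own 64-byte slot.
    let counters = [
        PaddedCounter { value: AtomicU64::new(0) },
        PaddedCounter { value: AtomicU64::new(0) },
    ];
    counters[0].value.fetch_add(1, Ordering::Relaxed);
    counters[1].value.fetch_add(2, Ordering::Relaxed);

    println!(
        "size = {}, align = {}",
        size_of::<PaddedCounter>(),
        align_of::<PaddedCounter>()
    );
    println!("max = {}", branchless_max(3, 7));
    println!("sum = {}", sum_ilp(&[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]));
}
```

Whether any of this beats the straightforward version depends on the target and the optimizer: the compiler may already emit a conditional move for a plain `max`, and may vectorize a simple loop on its own, so I treat each trick as a hypothesis to confirm with a profiler and the generated assembly.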
"I only care if it is possible to improve, no matter the difficulty. Code is an Art! And at my very best, my goal is to write code that honors the most complex human invention ever, the microprocessor."
🦀 Rust · ⚡ Performance Engineering · The stack Ofc! · Branchless logic! · ⚙️ RISC-V · Embedded systems · 🧬 Evolving Code · Parallelism even though it hurts
✉️ Email: [email protected]
💼 LinkedIn: Hadrian Lazic


