#cache
Blocking
Array Blocking. Array blocking divides an array into contiguous chunks and processes one chunk at a time. This can improve cache locality because nearby elements are reused before moving to the next block. You use it when large arrays exceed cache capacity or when work should be processed in batches. Problem: Given an array $A$ of length $n$ and a block size $b$, process the array in ranges: $$ [0,...
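The excerpt above cuts off before the ranges are spelled out, so the following is only a minimal sketch. It assumes half-open ranges `[0, b)`, `[b, 2b)`, and so on, with a shorter final block when `b` does not divide `n`. The names `block_ranges` and `blocked_sum` are hypothetical and chosen for illustration.

```python
def block_ranges(n, b):
    """Yield half-open (start, end) ranges that cover 0..n in blocks of at most b."""
    if b <= 0:
        raise ValueError("block size must be positive")
    for start in range(0, n, b):
        # The last block is clipped to n when b does not divide n.
        yield start, min(start + b, n)


def blocked_sum(a, b):
    """Sum a sequence one block at a time (assumed example workload)."""
    total = 0
    for start, end in block_ranges(len(a), b):
        for i in range(start, end):
            total += a[i]
    return total
```

For example, `list(block_ranges(10, 4))` gives `[(0, 4), (4, 8), (8, 10)]`. In Python the cache effect itself is hard to observe; the sketch only shows the index pattern.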
Padding
Array Padding. Array padding inserts unused bytes between elements or groups of elements. It adjusts layout to satisfy alignment constraints or to avoid contention such as false sharing. You use it when memory layout affects performance or correctness in concurrent settings. Problem: Given elements of size $s$ and required alignment or spacing $p$, place elements so that consecutive elements are separated by a stride: $$ \text{stride} \ge s $$ and...
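The second condition on the stride is truncated in the excerpt above. As a sketch only, the code below assumes it requires the stride to be a multiple of `p`, so the stride becomes the smallest multiple of `p` that is at least `s`. The helper names `padded_stride` and `padded_offsets` are hypothetical.

```python
def padded_stride(s, p):
    """Smallest multiple of p that is >= s (assumed padding rule)."""
    if s <= 0 or p <= 0:
        raise ValueError("size and alignment must be positive")
    return ((s + p - 1) // p) * p


def padded_offsets(count, s, p):
    """Byte offset of each element when elements are laid out at the padded stride."""
    stride = padded_stride(s, p)
    return [i * stride for i in range(count)]
```

With `s = 12` and `p = 8`, each element occupies a 16-byte slot, of which 4 bytes are padding. With `p = 64`, a common cache-line size, each small element would get its own 64-byte slot, which is the usual way to avoid false sharing.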
Tiling
Array Tiling. Array tiling divides a multidimensional array into small rectangular regions called tiles. Each tile is processed before moving to the next tile. You use it when matrix or grid operations touch nearby elements repeatedly and cache locality affects performance. Problem: Given a matrix $A$ with $r$ rows and $c$ columns, process all elements in tiles of size: $$ t_r \times t_c $$ where $t_r$ is the tile height...
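One possible way to enumerate the tiles is sketched below. It assumes tiles are visited in row-major order and that tiles on the right and bottom edges are clipped when `t_r` or `t_c` does not divide the matrix dimensions. The names `tile_ranges` and `tiled_sum` are hypothetical.

```python
def tile_ranges(r, c, tr, tc):
    """Yield (row_start, row_end, col_start, col_end) for each tile, half-open."""
    if tr <= 0 or tc <= 0:
        raise ValueError("tile dimensions must be positive")
    for r0 in range(0, r, tr):
        for c0 in range(0, c, tc):
            # Edge tiles are clipped to the matrix bounds.
            yield r0, min(r0 + tr, r), c0, min(c0 + tc, c)


def tiled_sum(matrix, tr, tc):
    """Sum a list-of-lists matrix one tile at a time (assumed example workload)."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    total = 0
    for r0, r1, c0, c1 in tile_ranges(rows, cols, tr, tc):
        for i in range(r0, r1):
            for j in range(c0, c1):
                total += matrix[i][j]
    return total
```

Every element falls in exactly one tile, so the tiled traversal touches the same elements as a plain double loop, only in a different order.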
Stride Access
Array Stride Access. Array stride access visits elements with a fixed step between consecutive indices. A stride of $1$ scans every element. Larger strides skip elements and may reduce cache locality. You use it when data is sampled, stored in interleaved form, or traversed by columns in row-major storage. Problem: Given an array $A$ of length $n$ and a stride $s$, process indices: $$ 0, s, 2s, 3s, \dots $$...
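The index pattern above can be sketched as follows. The column helper assumes a matrix stored as a flat row-major list, in which case column `j` is a strided scan that starts at index `j` with stride equal to the number of columns. The names `strided_indices` and `column` are hypothetical.

```python
def strided_indices(n, s):
    """Indices 0, s, 2s, ... that are below n."""
    if s <= 0:
        raise ValueError("stride must be positive")
    return range(0, n, s)


def column(flat, cols, j):
    """Column j of a row-major matrix stored in a flat list with `cols` columns."""
    if not 0 <= j < cols:
        raise IndexError("column index out of range")
    # Start at offset j and step by one full row each time.
    return flat[j::cols]
```

For instance, `list(strided_indices(10, 3))` gives `[0, 3, 6, 9]`, and for a 2 by 3 matrix stored as `[0, 1, 2, 3, 4, 5]`, `column(flat, 3, 1)` gives `[1, 4]`.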
Memory Layout
Array Memory Layout. Array memory layout describes how elements are arranged in physical memory. The layout determines how efficiently the CPU can access data due to cache behavior and alignment. You use this knowledge to write code that is fast in practice, not only in asymptotic terms. Problem: Given an array $A$ of length $n$, understand how its storage layout affects access time, cache utilization, and traversal performance. Structure: A standard...
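The description of the standard structure is cut off above. As an illustration only, the sketch below assumes a contiguous array of fixed-size elements, where element `i` sits at byte offset `i * itemsize` from the start of the buffer. It uses Python's standard `array` and `struct` modules. The helper names `byte_offset` and `read_at` are hypothetical.

```python
import struct
from array import array


def byte_offset(index, itemsize):
    """Byte offset of element `index` in a contiguous array of fixed-size items."""
    return index * itemsize


def read_at(a, index):
    """Read element `index` of an array.array by decoding its raw bytes."""
    raw = memoryview(a).cast("B")  # view the same buffer as plain bytes
    off = byte_offset(index, a.itemsize)
    (value,) = struct.unpack(a.typecode, raw[off:off + a.itemsize])
    return value
```

Decoding the bytes at the computed offset returns the same value as `a[index]`, which shows that neighbouring indices occupy neighbouring bytes in the buffer.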