
Conversation

@devarajabc
Contributor

@devarajabc devarajabc commented Nov 11, 2025

Implements a block cache in findBlock() to avoid repeated RB-tree lookups when accessing the same memory block consecutively.
Exploits both temporal locality (repeated access to same block) and spatial locality (newly created blocks are immediately used).
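For readers unfamiliar with the pattern, here is a minimal sketch of a one-entry "last found block" cache in front of a slower lookup. It is not the code from this PR: `block_t`, `slow_lookup()`, `find_block()`, and the hit/miss counters are hypothetical names, and a binary search over a sorted array stands in for the RB-tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block descriptor: a half-open address range [start, end). */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} block_t;

/* Illustrative data only: three sorted, non-overlapping ranges. */
static block_t blocks[] = {
    {0x1000, 0x2000}, {0x4000, 0x8000}, {0x9000, 0xA000}
};
static const size_t n_blocks = sizeof(blocks) / sizeof(blocks[0]);

static block_t *last_found_block = NULL;   /* the one-entry cache */
static unsigned long cache_hits = 0, cache_misses = 0;

/* Stand-in for the O(log n) RB-tree lookup (assumption: binary search). */
static block_t *slow_lookup(uintptr_t addr)
{
    size_t lo = 0, hi = n_blocks;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < blocks[mid].start)
            hi = mid;
        else if (addr >= blocks[mid].end)
            lo = mid + 1;
        else
            return &blocks[mid];
    }
    return NULL;
}

/* Check the cached block first; fall back to the slow lookup on a miss. */
static block_t *find_block(uintptr_t addr)
{
    block_t *b = last_found_block;
    if (b && addr >= b->start && addr < b->end) {
        cache_hits++;
        return b;                          /* O(1) path */
    }
    cache_misses++;
    b = slow_lookup(addr);
    if (b)
        last_found_block = b;              /* remember for the next call */
    return b;
}
```

Two consecutive calls such as `find_block(0x4100)` followed by `find_block(0x4200)` take the slow path once and the cached path once, which is the temporal-locality case described above.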

Real-world cache hit rates measured (same configuration as in issue #2511, but with the new version of box64):

  1. Small program (qsort): 64.7% (249 hits / 136 misses)
  2. Large Linux game (petsitting): 72.7% (35,392 hits / 13,302 misses)
  3. Large Windows game with Wine (petsitting): 85.3% (64,432 hits / 11,097 misses)

Impact: Converts 65-85% of $O(log n)$ RB-tree lookups to $O(1)$ cache hits, reducing findBlock() overhead in allocation-heavy workloads.
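The percentages above follow from hits / (hits + misses). A small helper along these lines (the name `hit_rate_percent()` is hypothetical, not part of box64) reproduces them from the raw counters:

```c
/* Hit rate in percent from raw counters; returns 0 when there were no lookups. */
static double hit_rate_percent(unsigned long hits, unsigned long misses)
{
    unsigned long total = hits + misses;
    return total ? 100.0 * (double)hits / (double)total : 0.0;
}
```

For example, `hit_rate_percent(249, 136)` gives 249 / 385, which rounds to the 64.7% quoted for qsort.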

Side note:
The cache is invalidated before box_realloc() to prevent a use-after-free when the p_blocks array grows. Without this, the cached pointer would reference freed memory once the reallocation moves the array. The cache is repopulated after the new block is created, with a valid pointer into the array's new location.

    if (n_blocks > c_blocks) {
        last_found_block = NULL;           // Invalidate before realloc
        p_blocks = box_realloc(...);       // Array may move
    }
    // ... create block ...
    last_found_block = &p_blocks[i];       // Repopulate with valid pointer
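A fuller, self-contained sketch of the same idea follows. It reuses the names `p_blocks`, `n_blocks`, `c_blocks`, and `last_found_block` from the snippet above, but everything else is an assumption for illustration: `blk_t` and `add_block()` are hypothetical, the growth policy is invented, and plain `realloc()` stands in for `box_realloc()`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical block record; the real structure in box64 differs. */
typedef struct {
    uintptr_t start;
    size_t    size;
} blk_t;

static blk_t *p_blocks = NULL;          /* growable array of blocks */
static size_t n_blocks = 0;             /* entries in use */
static size_t c_blocks = 0;             /* allocated capacity */
static blk_t *last_found_block = NULL;  /* cached pointer INTO p_blocks */

/* Append a block, growing the array when it is full. */
static blk_t *add_block(uintptr_t start, size_t size)
{
    if (n_blocks == c_blocks) {
        size_t new_cap = c_blocks ? c_blocks * 2 : 4;  /* assumed policy */
        last_found_block = NULL;        /* invalidate: array may move */
        blk_t *tmp = realloc(p_blocks, new_cap * sizeof(*tmp));
        if (!tmp)
            return NULL;                /* old array intact, cache empty */
        p_blocks = tmp;
        c_blocks = new_cap;
    }
    blk_t *b = &p_blocks[n_blocks++];
    b->start = start;
    b->size  = size;
    last_found_block = b;               /* repopulate with a valid pointer */
    return b;
}
```

Clearing the cache before the call, rather than after it, means no code path can observe a pointer into the old array, even if the reallocation fails or something re-enters the lookup in between.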

@devarajabc devarajabc force-pushed the block_cache branch 2 times, most recently from 6fc5fe8 to 390e68a Compare November 11, 2025 09:08
Cache last found block to skip RB-tree lookup on consecutive accesses.
Safe during re-entrance via direct pointer dereference (reads NULL when
blocks temporarily zeroed, causing natural cache miss).

Cache hit rates measured in real workloads:
- Small program (qsort): 64.7% (249/136)
- Large game ELF: 72.7% (35,392/13,302)
- Large game PE+Wine: 85.3% (64,432/11,097)

Wine workloads benefit most due to intensive allocation patterns.
Converts 65-85% of O(log n) lookups to O(1) cache hits.
@ptitSeb
Owner

ptitSeb commented Nov 11, 2025

Thanks, interesting stats. Do you think it could improve if there was one cache per block type?

Also, would it be interesting to do that on Dynarec Blocks too?

@ptitSeb ptitSeb merged commit a1bf126 into ptitSeb:main Nov 11, 2025
27 checks passed
@devarajabc
Contributor Author

devarajabc commented Nov 11, 2025

> Thanks, interesting stats. Do you think it could improve if there was one cache per block type?
>
> Also, would it be interesting to do that on Dynarec Blocks too?

Good idea! I will do more stats on it.

@devarajabc
Contributor Author

devarajabc commented Nov 11, 2025

For the "per-type block cache" idea, I ran a temporal clustering analysis on the large Windows game with Wine (petsitting), covering 46,518 block accesses, to measure how densely the three block types are used by findBlock() during execution. Here are the findings:

TYPE DISTRIBUTION:
BTYPE_MAP64 : 45,352 (97.5%)
BTYPE_LIST : 656 ( 1.4%)
BTYPE_MAP : 510 ( 1.1%)

[Figure: wine_timeline_timeline]

The visualization above shows the temporal access pattern across all 46,518 accesses. The data reveals strong temporal clustering - only 2.39% of accesses involve switching between block types, meaning blocks of the same type are accessed consecutively 97.61% of the time.

I analyzed the continuous sequences of each type and found:
Total runs : 1,114
Average run length : 41.8 consecutive accesses
Max run length : 36,368 consecutive accesses
Type switches : 1,113 (2.39% of total)
[Figure: cache_figure]
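The run statistics above can be derived from the logged type sequence with a single pass. The sketch below is not the linked counter tool; `run_stats_t` and `analyze_runs()` are hypothetical names, and block types are assumed to be encoded as small integers.

```c
#include <stddef.h>

typedef struct {
    size_t total;     /* number of accesses in the trace */
    size_t runs;      /* maximal sequences of one repeated type */
    size_t switches;  /* transitions between different types */
    size_t max_run;   /* length of the longest run */
} run_stats_t;

/* One pass over a trace of block types (integer encoding is an assumption). */
static run_stats_t analyze_runs(const int *types, size_t n)
{
    run_stats_t s = {0, 0, 0, 0};
    size_t cur = 0;                       /* length of the current run */
    s.total = n;
    for (size_t i = 0; i < n; i++) {
        if (i == 0 || types[i] != types[i - 1]) {
            if (i != 0)
                s.switches++;             /* type changed: a switch */
            s.runs++;                     /* ...and a new run begins */
            cur = 1;
        } else {
            cur++;
        }
        if (cur > s.max_run)
            s.max_run = cur;
    }
    return s;
}
```

For a non-empty trace, the number of switches is always one less than the number of runs, which matches the figures reported here (1,114 runs, 1,113 switches), and the average run length is `total / runs` (46,518 / 1,114 ≈ 41.8).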

For this Wine gaming workload, I'm not sure if there will be a significant improvement in performance, since MAP64 dominates and programs typically access the same block type repeatedly before switching.

How to Reproduce:

  1. Insert a printf in findBlock() that prints the type of each block it returns.
  2. Pipe the output into a text file.
  3. Process the text file with: https://github.com/TYL0102/cache-sequence-counter
