
Name: Lê Minh Quang

Class: A4

Student ID: 2410814

Chapter 5
Exercise 1:

a. Each 64-bit integer is 8 bytes. A 16-byte cache block can store 16 / 8 = 2 integers.

-> Two 64-bit integers per block.

b. Temporal locality is observed when the same memory location is accessed repeatedly in a
short span. In the C code, B[I][0] is accessed repeatedly inside the inner loop for fixed I.

-> B[I][0] exhibits temporal locality.

c. Spatial locality is observed when nearby memory locations are accessed. Since A[I][J]
accesses all elements of a row sequentially (J from 0 to 7999), it has spatial locality.

-> A[I][J] exhibits spatial locality.
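
A minimal C sketch of the patterns described in (b) and (c). The array shapes follow the exercise (8 × 8000 64-bit integers); the statement body is an assumption for illustration, not the exercise's exact code:

    #include <stdint.h>

    #define ROWS 8
    #define COLS 8000

    int64_t A[ROWS][COLS];   /* row-major in C: A[I][J] and A[I][J+1] are adjacent */
    int64_t B[ROWS][1];

    void kernel(void) {
        for (int I = 0; I < ROWS; I++)
            for (int J = 0; J < COLS; J++)
                /* B[I][0]: same address on every inner iteration -> temporal locality */
                /* A[I][J]: walks one row an element at a time    -> spatial locality  */
                A[I][J] = A[I][J] + B[I][0];
    }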

d. In MATLAB, matrices are stored in column-major order. B(I,0) is accessed repeatedly for each fixed I, showing temporal locality.

-> B(I,0) exhibits temporal locality.

e. A(J,I) accesses consecutive elements down a column (J varying), which are adjacent in MATLAB's column-major layout, so it has spatial locality.

-> A(J,I) exhibits spatial locality.

f. Total elements = 8x8000 (A) + 8 (B) = 64008 64-bit integers. Each 16-byte cache block
holds 2 integers. Blocks needed = 64008 / 2 = 32004 blocks.

-> 32004 cache blocks in both MATLAB and C.

Exercise 2:

a. Direct-mapped cache with 16 one-word blocks:

Each 64-bit address maps to one of the 16 blocks via its 4 least-significant bits. Since every address in the trace is distinct and the cache starts empty, every access is a miss.

b. Cache with 2-word blocks and 8 blocks total:

Offset = 1 bit, Index = 3 bits, and the remaining bits form the tag. There are 4 hits out of 12 accesses (at addresses 0x02, 0xbe, 0xb5, 0xfd).
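
Assuming the trace values are word addresses, a quick C check of how one of them splits into the fields above (0xbe is taken from the hit list; the program is illustrative, not part of the exercise):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t addr   = 0xbe;               /* word address from the trace       */
        uint32_t offset = addr & 0x1;         /* 1 bit: word within a 2-word block */
        uint32_t index  = (addr >> 1) & 0x7;  /* 3 bits: one of the 8 blocks       */
        uint32_t tag    = addr >> 4;          /* remaining upper bits              */
        printf("0x%x -> tag 0x%x, index %u, offset %u\n", addr, tag, index, offset);
        /* prints: 0xbe -> tag 0xb, index 7, offset 0 */
        return 0;
    }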

c. Cache designs (8 words of data total):

C1 (1-word blocks): the most blocks, but no spatial-locality benefit on sequential accesses.

C2 (2-word blocks): a balance between block count and spatial locality.

C3 (4-word blocks): fewest blocks, but captures the most spatial locality on sequential access.

-> C2 is often the best compromise.

Exercise 3:

a. Cache block size is 16 bytes = 4 words.

Offset = 4 bits.

b. Cache has 64 blocks.

Index = 6 bits.

Tag = 22 bits (from 32 - 6 - 4).

c. Tag = upper 22 bits

Index = 6 bits

Offset = 4 bits
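
A small C check of this field split, using the first block address from the final state in part (e) below (the decomposition follows directly from the bit widths above):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t addr   = 0xD000;             /* start of a block cached in part (e) */
        uint32_t offset = addr & 0xF;         /* 4 bits: byte within 16-byte block   */
        uint32_t index  = (addr >> 4) & 0x3F; /* 6 bits: one of 64 blocks            */
        uint32_t tag    = addr >> 10;         /* upper 22 bits                       */
        printf("0x%x -> tag 0x%x, index %u, offset %u\n", addr, tag, index, offset);
        /* prints: 0xd000 -> tag 0x34, index 0, offset 0 */
        return 0;
    }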

d. Hit ratio = hits / total accesses = 6/11 ≈ 54.5%.

e. Final cache state, as <index, tag, data>: <0, 0x34, Mem[0xD000–0xD00F]>, <2, 0x30, Mem[0xC020–0xC02F]>.

Exercise 4:

a. Buffers between caches:

- Between L1 and L2: write buffer, read buffer

- Between L2 and memory: write buffer

b. L1 write-miss handling (sketched in code below):

1. Look the block up in L2.

2. If the L1 line being replaced is dirty, write it back to L2.

3. Bring the block from L2 into L1.

4. Update the LRU state.

5. Write the data and mark the line dirty.
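
A rough C sketch of those five steps for a write-back, write-allocate L1. Everything here is illustrative: the field widths borrow the Exercise 3 geometry, and l2_read_block/l2_write_block are hypothetical stubs, not a real interface:

    #include <stdint.h>
    #include <stdbool.h>
    #include <string.h>

    #define BLOCK_WORDS 4
    #define NBLOCKS     64

    typedef struct {
        bool     valid, dirty;
        uint32_t tag;
        uint32_t data[BLOCK_WORDS];
    } line_t;

    static line_t l1[NBLOCKS];

    /* Hypothetical L2 interface -- stubs standing in for the real hierarchy. */
    static void l2_write_block(uint32_t addr, const uint32_t *d) { (void)addr; (void)d; }
    static void l2_read_block(uint32_t addr, uint32_t *d) { (void)addr; memset(d, 0, BLOCK_WORDS * 4); }
    static void lru_update(uint32_t index) { (void)index; } /* no-op for direct-mapped */

    void l1_write_miss(uint32_t addr, uint32_t value) {
        uint32_t word  = (addr >> 2) & (BLOCK_WORDS - 1);
        uint32_t index = (addr >> 4) & (NBLOCKS - 1);
        uint32_t tag   = addr >> 10;
        line_t *line = &l1[index];

        /* step 1, the L2 lookup, happens inside l2_read_block below */
        if (line->valid && line->dirty)               /* step 2: write back victim */
            l2_write_block((line->tag << 10) | (index << 4), line->data);

        l2_read_block(addr & ~0xFu, line->data);      /* step 3: fetch block from L2 */
        line->valid = true;
        line->tag   = tag;

        lru_update(index);                            /* step 4: update LRU state */
        line->data[word] = value;                     /* step 5: write the data   */
        line->dirty = true;
    }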

c. Exclusive caches:

L1 write miss: fetch the block from L2 and move it into L1 (removing the L2 copy). Blocks evicted from L1 are placed in L2, so dirty data is preserved.

L1 read miss: move the block from L2 into L1 and invalidate the L2 copy.

Exercise 5:

a. 64 KiB cache, 32-byte blocks, 512 KiB working set:

Working set = 512 KiB / 32 B = 16384 blocks; the cache holds 64 KiB / 32 B = 2048 blocks → every block is evicted before it can be reused.

-> 100% miss rate. Misses are compulsory + capacity (3C model).
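
A sketch of the access pattern that produces this 100% miss rate; only the sizes come from the answer above, the loop body is an assumption:

    #include <stddef.h>
    #include <stdint.h>

    #define WORKING_SET (512 * 1024)   /* 512 KiB, vs. a 64 KiB cache */
    #define BLOCK_SIZE  32

    static uint8_t data[WORKING_SET];

    /* Touch each 32-byte block once per sweep: the first sweep takes only
       compulsory misses; later sweeps still miss everywhere (capacity),
       because 16384 blocks cannot fit in a 2048-block cache. */
    uint64_t sweep(void) {
        uint64_t sum = 0;
        for (size_t i = 0; i < WORKING_SET; i += BLOCK_SIZE)
            sum += data[i];
        return sum;
    }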

b. Varying block sizes:

Smaller blocks → more misses for the same data. Larger blocks → each miss brings in more sequentially used data, so spatial locality is exploited.

-> Workload exploits spatial locality.

c. Prefetching using stream buffer (2-entry):

Miss only on every 3rd access due to prefetching → Approx. 1/3 miss rate.

Exercise 6:

a. TLB hits: 0, page table hits: 0, page faults: 10 (every page is referenced for the first time).

b. Using 16 KiB pages: each entry covers more memory, so fewer TLB entries are needed and TLB misses are rarer.

Trade-off: more internal fragmentation.

c. With a 2-way set-associative TLB, the miss rate improves slightly over direct-mapped due to fewer conflict misses.

d. With direct-mapped TLB: only one entry per set → more conflict misses.

-> Least flexible, highest miss rate.

e. The TLB is needed to avoid an expensive page-table lookup on every memory access. Without a TLB, each reference would require at least one additional memory access just for address translation.
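
A toy C sketch of why the TLB pays off: a hit is a cheap table probe, while a miss costs an extra memory access into the page table. All structures here are hypothetical simplifications (single-level page table, direct-mapped TLB):

    #include <stdint.h>
    #include <stdbool.h>

    #define PAGE_BITS   12           /* assumed 4 KiB pages */
    #define TLB_ENTRIES 16

    typedef struct { bool valid; uint32_t vpn, ppn; } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];
    static uint32_t page_table[1u << 20];   /* toy flat table: VPN -> PPN */

    uint32_t translate(uint32_t vaddr) {
        uint32_t vpn = vaddr >> PAGE_BITS;
        tlb_entry_t *e = &tlb[vpn % TLB_ENTRIES];   /* direct-mapped TLB probe */
        if (!e->valid || e->vpn != vpn) {
            /* TLB miss: this page-table walk is the slow memory access
               the TLB exists to avoid on the common path */
            e->ppn   = page_table[vpn];
            e->vpn   = vpn;
            e->valid = true;
        }
        return (e->ppn << PAGE_BITS) | (vaddr & ((1u << PAGE_BITS) - 1));
    }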

Exercise 7:

a. LRU policy: evicts the least recently used block, so recently reused blocks stay resident. Hits begin once the working blocks have been brought in.

Total hits: 3 (steps 6, 7, 8)

Total misses: 17

Hit rate: 3/20 = 15%
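
A small C simulator for true LRU on a fully associative cache. The 4-block capacity and the trace below are placeholders (the exercise's actual reference string is not reproduced here), so the printed count will not match the 3/20 above:

    #include <stdio.h>

    #define WAYS      4    /* assumed capacity, not the exercise's */
    #define TRACE_LEN 12

    /* Recency list: slot 0 is most recently used, slot WAYS-1 is the victim. */
    static int cache[WAYS] = {-1, -1, -1, -1};

    static int access_block(int block) {
        int pos = WAYS - 1;                 /* default: evict the LRU slot */
        for (int i = 0; i < WAYS; i++)
            if (cache[i] == block) { pos = i; break; }
        int hit = (cache[pos] == block);
        for (int i = pos; i > 0; i--)       /* shift others toward LRU end */
            cache[i] = cache[i - 1];
        cache[0] = block;                   /* (re)insert at MRU position  */
        return hit;
    }

    int main(void) {
        /* Placeholder trace, NOT the reference string from the exercise. */
        const int trace[TRACE_LEN] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
        int hits = 0;
        for (int i = 0; i < TRACE_LEN; i++)
            hits += access_block(trace[i]);
        printf("hits: %d / %d\n", hits, TRACE_LEN);
        return 0;
    }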

b. MRU policy: evicts the most recently used block → generally worse than LRU for this access pattern.

Hits: 3

Misses: 17

c. Random policy: picks a victim at random. Results vary from run to run; simulate with coin flips.

Expected hits: 2-3

d. Optimal (Belady) replacement: evict the block whose next use lies farthest in the future. Requires knowledge of future references (impractical).

Hits: 0

e. OPT is hard to implement because it requires perfect knowledge of future references.
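
The point in (d) and (e) is visible in code: choosing the OPT victim forces a scan of the future trace, which real hardware never has. A minimal sketch (names are illustrative):

    /* Belady/OPT victim selection: evict the cached block whose next use
       lies farthest in the future (or that is never used again). */
    int opt_victim(const int cache[], int ways,
                   const int trace[], int trace_len, int now) {
        int victim = 0, farthest = -1;
        for (int i = 0; i < ways; i++) {
            int next = trace_len;                /* never referenced again */
            for (int t = now + 1; t < trace_len; t++)
                if (trace[t] == cache[i]) { next = t; break; }
            if (next > farthest) { farthest = next; victim = i; }
        }
        return victim;   /* index into cache[] of the block to evict */
    }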

f. Deciding per access whether to cache data at all (cache bypassing) can reserve space for data that will be reused, lowering the miss rate.
