Indian Institute of Science (IISc), Bangalore, India

Memory consistency models and synchronizations, Part 1

www.csa.iisc.ac.in
Acknowledgements
▪ Several of the slides in the deck are from Luis Ceze (Washington), Nima Honarmand (Stony Brook), Mark Hill, David Wood, Karu Sankaralingam (Wisconsin), and Abhishek Bhattacharjee (Rutgers).
▪ Development of this course is partially supported by Western Digital Corporation.

8/9/2018 2
Coherence vs. Consistency

A=0 FLAG=0
C0               C1
ST A 1;          L1: LD r1 FLAG;
ST FLAG 1;           if (r1 == 0) JMP L1;   // spin until FLAG is set
                     LD r2 A;

What value should r2 contain? (We expect r2 = 1.)

But what does coherence say? Nothing.

Coherence is about the total ordering of stores and loads to one given memory location (in practice, a single cache block).

It says nothing about ordering across different memory locations/addresses.

But correctly specifying and writing multi-threaded programs requires guarantees on the ordering of loads/stores to different locations.

Memory consistency models

▪ A memory consistency model specifies the global order of writes to all memory locations relative to each other
▪ A memory consistency model specifies which reorderings of loads/stores to different memory locations are legal
▪ Different memory consistency models are possible
  ‣ Tradeoff between ease of programmability and performance
  ‣ “Relaxed” models enforce less ordering (better performance) but are harder to program
▪ The memory consistency model is part of the ISA
  ‣ x86 and ARM have different memory consistency models
  ‣ Cache coherence protocols are not part of the ISA: software only needs to know whether the hardware supports cache coherence

Memory consistency models

▪ Specifies which reorderings of loads and stores (to different addresses) are allowed

A=0 FLAG=0
C0               C1
ST A 1;          L1: LD r1 FLAG;
ST FLAG 1;           if (r1 == 0) JMP L1;   // spin until FLAG is set
                     LD r2 A;

▪ For example, this program works as expected if loads and stores within a thread are not allowed to bypass each other, even when they are to different addresses
▪ Parallel programmers rely on the memory model to reason about the correctness of their programs

Who defines memory consistency models?
▪ Any programmer of parallel shared-memory programs should care about memory consistency models
▪ Defined by the hardware and described in the ISA
▪ Programming languages also need to define their memory consistency models
  ‣ This determines which optimizations (e.g., by the compiler) are allowed

Many possible memory consistency models
▪ Sequential consistency
  ‣ Most intuitive and programmer friendly
  ‣ But the most restrictive in terms of performance optimizations
  ‣ e.g., implemented in MIPS R10K
▪ Total store order (processor consistency model)
  ‣ Less restrictive than sequential consistency
  ‣ Intel and AMD x86-64 processors
  ‣ Sun UltraSPARC
▪ Weak memory models (release consistency models)
  ‣ Most relaxed; burdens the programmer but allows more hardware optimizations
  ‣ ARM and IBM POWER processors

Sequential consistency

[Figure: processors P1, P2, P3, ..., PN, issuing st A, st C, ld C and ld A respectively, all connected to a single shared memory]

Program order of two cores:
C1: st A; ld C; st C
C2: st A; st C; ld D

Two possible legal global orders:
(1) st A, st A, ld C, st C, st C, ld D
(2) st A, ld C, st A, st C, st C, ld D

▪ Per-processor program order: memory operations from individual processors maintain program order
▪ Single sequential order (total order): memory operations from all processors maintain a single sequential order [Lamport’79]
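Under SC, a legal global order is an interleaving of the per-core program orders. The minimal Python sketch below (the helper name `interleavings` and the tuple encoding of operations are illustrative assumptions, not part of the slides) enumerates every interleaving of C1 and C2 and checks that both global orders listed above are among them.

```python
from itertools import combinations


def interleavings(p, q):
    """Yield every merge of sequences p and q that preserves each one's order."""
    n = len(p) + len(q)
    # Choose which global positions hold p's operations; q fills the rest.
    for p_slots in combinations(range(n), len(p)):
        chosen = set(p_slots)
        it_p, it_q = iter(p), iter(q)
        yield tuple(next(it_p) if i in chosen else next(it_q) for i in range(n))


# Program order of the two cores, tagged with the issuing core.
C1 = [("C1", "st A"), ("C1", "ld C"), ("C1", "st C")]
C2 = [("C2", "st A"), ("C2", "st C"), ("C2", "ld D")]

merges = list(interleavings(C1, C2))
# Drop the core tags so we can compare against the untagged orders.
legal_orders = {tuple(op for _, op in m) for m in merges}

order_1 = ("st A", "st A", "ld C", "st C", "st C", "ld D")
order_2 = ("st A", "ld C", "st A", "st C", "st C", "ld D")

print(len(merges))                                  # 20
print(order_1 in legal_orders, order_2 in legal_orders)  # True True
```

With three operations per core there are C(6, 3) = 20 candidate global orders. An order that starts with `ld D` is rejected because it violates C2's program order.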

Sequential consistency (SC)

A=0 FLAG=0
C0               C1
ST A 1;          L1: LD r1 FLAG;
ST FLAG 1;           if (r1 == 0) JMP L1;   // spin until FLAG is set
                     LD r2 A;

▪ What will be the value loaded in r2 under SC?
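A small brute-force check supports the expected answer. This sketch (the helpers `interleavings` and `run` are hypothetical) assumes it is enough to model only the final iteration of C1's spin loop, that is, the iteration whose `LD r1 FLAG` returns 1. Earlier iterations only read FLAG = 0 and have no other effect.

```python
from itertools import combinations


def interleavings(p, q):
    """Yield every merge of p and q that keeps each list in program order."""
    n = len(p) + len(q)
    for slots in combinations(range(n), len(p)):
        chosen = set(slots)
        it_p, it_q = iter(p), iter(q)
        yield [next(it_p) if i in chosen else next(it_q) for i in range(n)]


def run(order):
    """Execute one global order against a single shared memory, as SC requires."""
    mem = {"A": 0, "FLAG": 0}
    regs = {}
    for op, *args in order:
        if op == "ST":
            addr, val = args
            mem[addr] = val
        else:  # "LD"
            reg, addr = args
            regs[reg] = mem[addr]
    return regs


C0 = [("ST", "A", 1), ("ST", "FLAG", 1)]
# Final iteration of C1's spin loop: the one whose LD r1 FLAG returns 1.
C1 = [("LD", "r1", "FLAG"), ("LD", "r2", "A")]

runs = [run(order) for order in interleavings(C0, C1)]
print({regs["r2"] for regs in runs if regs["r1"] == 1})  # {1}
```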

Reordering of loads/stores

                        Earlier operation (program order)
                        LD     ST
Later operation  LD     NO     NO
                 ST     NO     NO

Allowed reordering of loads/stores to different addresses under sequential consistency.

Loads/stores to the same address are never reordered; this is ensured by coherence.

Food for thought (assume SC)

• Answer the following questions:
• Initially: all variables are zero (that is, x is 0, y is 0, flag is 0, A is 0)
• What (x, y) value pairs can be read by the two loads?
Core 0     Core 1
LD x       ST y 1
LD y       ST x 1
How about (1,0)?
• What (x, y) value pairs can be read by the two loads?
Core 0     Core 1
ST y 1     ST x 1
LD x       LD y
How about (0,0)?
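To check answers mechanically, a sketch like the following (the helper `sc_outcomes` is hypothetical, and it assumes only the variables x and y matter) executes every SC interleaving of the two cores against a single shared memory and collects the (x, y) pairs the loads return.

```python
from itertools import combinations


def sc_outcomes(core0, core1):
    """Return every (x, y) pair of load results reachable under SC."""
    n = len(core0) + len(core1)
    results = set()
    # Each choice of slots for core0's operations is one SC global order.
    for slots in combinations(range(n), len(core0)):
        chosen = set(slots)
        it0, it1 = iter(core0), iter(core1)
        mem = {"x": 0, "y": 0}
        loaded = {}
        for i in range(n):
            op, addr, *val = next(it0) if i in chosen else next(it1)
            if op == "ST":
                mem[addr] = val[0]
            else:  # "LD": record the value seen for this variable
                loaded[addr] = mem[addr]
        results.add((loaded["x"], loaded["y"]))
    return results


first = sc_outcomes([("LD", "x"), ("LD", "y")], [("ST", "y", 1), ("ST", "x", 1)])
second = sc_outcomes([("ST", "y", 1), ("LD", "x")], [("ST", "x", 1), ("LD", "y")])
print(sorted(first))   # [(0, 0), (0, 1), (1, 1)]
print(sorted(second))  # [(0, 1), (1, 0), (1, 1)]
```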

Problems with SC memory model

▪ Difficult to implement efficiently in hardware
  ‣ Straightforward implementations of SC dictate:
    • Strict ordering of memory accesses at each processor
    • This essentially precludes most out-of-order CPU benefits
    → Conflicts with common latency-hiding techniques
▪ Constrains compiler optimizations
  ‣ Disallows optimizations that reorder memory accesses, such as code motion and common subexpression elimination
▪ Implementations of SC that try to extract concurrency among accesses are complex
  ‣ e.g., MIPS R10K
▪ No commercial processors implement SC today

Constraints of SC: Write buffer

▪ Why have a write buffer (store buffer)?
▪ Can the existence of a write buffer break SC?
[Figure: Core → write buffer (WB) → L1 cache]

Write buffer breaks SC

[Figure: Core 0 and Core 1, each with a private write buffer (WB) in front of its L1 cache; the L1 caches are coherent and share the LLC]

Core 0     Core 1
ST y 1     ST x 1
LD x       LD y

1. Core 0 executes ST y 1: the store enters Core 0's WB (y 1) and is not yet visible to Core 1.
2. Core 1 executes ST x 1: the store enters Core 1's WB (x 1).
3. Core 0 executes LD x, which reads x = 0 from its L1 cache, because x = 1 is still in Core 1's WB.
4. Core 1 executes LD y, which reads y = 0 from its L1 cache, because y = 1 is still in Core 0's WB.
5. Result: x = 0, y = 0, which is NOT allowed by SC!
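The scenario above can be reproduced with a simplified operational model, assuming: each core owns a FIFO write buffer, a store retires into that buffer, buffered stores drain to the coherent memory one at a time in any interleaving, and a load reads the newest matching entry in its own buffer before falling back to memory. This is a sketch under those assumptions (the function name `store_buffer_outcomes` is hypothetical), not a model of any specific processor.

```python
def store_buffer_outcomes(programs):
    """Enumerate final register states when each core has a FIFO write buffer.

    Instructions are ("ST", addr, value) or ("LD", reg, addr); memory starts at 0.
    """
    results = set()

    def explore(pcs, bufs, mem, regs):
        moved = False
        for c, prog in enumerate(programs):
            if bufs[c]:
                # Drain core c's oldest buffered store into the coherent caches.
                (addr, val), rest = bufs[c][0], bufs[c][1:]
                explore(pcs, bufs[:c] + (rest,) + bufs[c + 1:],
                        {**mem, addr: val}, regs)
                moved = True
            if pcs[c] == len(prog):
                continue
            instr = prog[pcs[c]]
            nxt = pcs[:c] + (pcs[c] + 1,) + pcs[c + 1:]
            if instr[0] == "ST":
                # The store retires into the WB; other cores cannot see it yet.
                _, addr, val = instr
                new_buf = bufs[c] + ((addr, val),)
                explore(nxt, bufs[:c] + (new_buf,) + bufs[c + 1:], mem, regs)
            else:
                # The load reads the newest matching WB entry, else memory.
                _, reg, addr = instr
                own = [v for a, v in bufs[c] if a == addr]
                val = own[-1] if own else mem.get(addr, 0)
                explore(nxt, bufs, mem, {**regs, reg: val})
            moved = True
        if not moved:  # every core finished and every WB drained
            results.add(tuple(sorted(regs.items())))

    n = len(programs)
    explore((0,) * n, ((),) * n, {}, {})
    return results


# The example above; registers are named after the variable they load.
sb = [
    [("ST", "y", 1), ("LD", "x", "x")],  # Core 0
    [("ST", "x", 1), ("LD", "y", "y")],  # Core 1
]
outcomes = store_buffer_outcomes(sb)
print((("x", 0), ("y", 0)) in outcomes)  # True: the non-SC result is reachable
```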

Alternative memory model

▪ Total store order (TSO) memory model
  ‣ Implemented by Intel, AMD and Sun/Oracle SPARC processors
  ‣ It is sometimes called the processor consistency model

                        Earlier operation (program order)
                        LD     ST
Later operation  LD     NO     YES
                 ST     NO     NO

Allowed reordering of loads/stores under TSO

Remember that reordering is allowed only if the ST and LD are to different addresses.
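The SC and TSO tables differ in exactly one entry. As a sketch, both can be encoded as lookup tables (the names `SC`, `TSO` and `may_reorder` are illustrative, not from the slides):

```python
# Keys are (later operation, earlier operation) in program order.
# True means the later operation may be performed before the earlier one,
# assuming the two access different addresses.
SC = {
    ("LD", "LD"): False, ("LD", "ST"): False,
    ("ST", "LD"): False, ("ST", "ST"): False,
}
TSO = {**SC, ("LD", "ST"): True}  # only a later load may bypass an earlier store


def may_reorder(model, earlier, later, same_address):
    """Return True if `later` may be performed before `earlier` under `model`."""
    if same_address:
        return False  # the tables only relax order across different addresses
    return model[(later, earlier)]


print([pair for pair, ok in TSO.items() if ok])          # [('LD', 'ST')]
print(may_reorder(TSO, "ST", "LD", same_address=False))  # True
print(may_reorder(TSO, "ST", "LD", same_address=True))   # False
print(may_reorder(SC, "ST", "LD", same_address=False))   # False
```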

TSO vs. SC
▪ What performance optimization does TSO allow (that SC does not)?
  ‣ A FIFO write buffer
    • Store-to-store order must still be maintained
▪ What is the disadvantage of TSO?
  ‣ Some programs that are correct under SC will break

What breaks under TSO?

A=0 FLAG=0
C0               C1
ST A 1;          L1: LD r1 FLAG;
ST FLAG 1;           if (r1 == 0) JMP L1;   // spin until FLAG is set
                     LD r2 A;

▪ This will work as expected
▪ TSO allows only a later load to bypass earlier stores to different addresses, so C0's two stores still become visible in program order and C1's two loads still execute in program order
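A simplified FIFO write-buffer model of TSO (a hypothetical sketch; `store_buffer_outcomes` is an illustrative name and not a full x86 model) confirms that message passing still works: whenever C1 observes FLAG = 1, it also observes A = 1. As in the SC case, only the final iteration of the spin loop is modeled.

```python
def store_buffer_outcomes(programs):
    """Enumerate final register states when each core has a FIFO write buffer.

    Instructions are ("ST", addr, value) or ("LD", reg, addr); memory starts at 0.
    """
    results = set()

    def explore(pcs, bufs, mem, regs):
        moved = False
        for c, prog in enumerate(programs):
            if bufs[c]:
                # FIFO drain: the oldest store always becomes visible first.
                (addr, val), rest = bufs[c][0], bufs[c][1:]
                explore(pcs, bufs[:c] + (rest,) + bufs[c + 1:],
                        {**mem, addr: val}, regs)
                moved = True
            if pcs[c] == len(prog):
                continue
            instr = prog[pcs[c]]
            nxt = pcs[:c] + (pcs[c] + 1,) + pcs[c + 1:]
            if instr[0] == "ST":
                _, addr, val = instr
                new_buf = bufs[c] + ((addr, val),)
                explore(nxt, bufs[:c] + (new_buf,) + bufs[c + 1:], mem, regs)
            else:
                # Loads execute in program order: own WB first, else memory.
                _, reg, addr = instr
                own = [v for a, v in bufs[c] if a == addr]
                val = own[-1] if own else mem.get(addr, 0)
                explore(nxt, bufs, mem, {**regs, reg: val})
            moved = True
        if not moved:  # every core finished and every WB drained
            results.add(tuple(sorted(regs.items())))

    n = len(programs)
    explore((0,) * n, ((),) * n, {}, {})
    return results


mp = [
    [("ST", "A", 1), ("ST", "FLAG", 1)],         # C0
    [("LD", "r1", "FLAG"), ("LD", "r2", "A")],   # C1: last spin iteration
]
outcomes = [dict(o) for o in store_buffer_outcomes(mp)]
print({o["r2"] for o in outcomes if o["r1"] == 1})  # {1}
```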

What breaks under TSO?

Core 0     Core 1
ST y 1     ST x 1
LD x       LD y

▪ Is x=0, y=0 possible?
▪ Yes, because a later load to a different address can bypass an earlier store

What if the programmer wants SC-like ordering?
▪ Special fence instructions explicitly introduce ordering
  ‣ Example: the mfence instruction in x86/x86-64
  ‣ The programmer needs to insert them manually

                          Earlier operation
                          LD     ST     mfence
Later operation  LD       NO     YES    NO
                 ST       NO     NO     NO
                 mfence   NO     NO     NO

Fences to order loads/stores

Core 0     Core 1
ST y 1     ST x 1
mfence     mfence
LD x       LD y

▪ x=0, y=0 is not possible anymore → SC compliant
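A simplified FIFO write-buffer model can be extended with a fence that cannot complete until its core's buffer has drained. This is one common way to describe mfence's store-load ordering, and the encoding here is an assumption. With that extension the (0, 0) outcome disappears:

```python
def store_buffer_outcomes(programs):
    """Enumerate final register states for cores with FIFO write buffers (WBs).

    Instructions are ("ST", addr, value), ("LD", reg, addr) or ("MFENCE",);
    memory starts at 0.
    """
    results = set()

    def explore(pcs, bufs, mem, regs):
        moved = False
        for c, prog in enumerate(programs):
            if bufs[c]:
                # Drain core c's oldest buffered store into the coherent caches.
                (addr, val), rest = bufs[c][0], bufs[c][1:]
                explore(pcs, bufs[:c] + (rest,) + bufs[c + 1:],
                        {**mem, addr: val}, regs)
                moved = True
            if pcs[c] == len(prog):
                continue
            instr = prog[pcs[c]]
            if instr[0] == "MFENCE" and bufs[c]:
                continue  # the fence waits until this core's WB is empty
            nxt = pcs[:c] + (pcs[c] + 1,) + pcs[c + 1:]
            if instr[0] == "ST":
                _, addr, val = instr
                new_buf = bufs[c] + ((addr, val),)
                explore(nxt, bufs[:c] + (new_buf,) + bufs[c + 1:], mem, regs)
            elif instr[0] == "LD":
                _, reg, addr = instr
                own = [v for a, v in bufs[c] if a == addr]
                val = own[-1] if own else mem.get(addr, 0)
                explore(nxt, bufs, mem, {**regs, reg: val})
            else:  # MFENCE with an empty WB completes immediately
                explore(nxt, bufs, mem, regs)
            moved = True
        if not moved:  # all cores finished and all WBs drained
            results.add(tuple(sorted(regs.items())))

    n = len(programs)
    explore((0,) * n, ((),) * n, {}, {})
    return results


sb_fenced = [
    [("ST", "y", 1), ("MFENCE",), ("LD", "x", "x")],  # Core 0
    [("ST", "x", 1), ("MFENCE",), ("LD", "y", "y")],  # Core 1
]
outcomes = store_buffer_outcomes(sb_fenced)
print((("x", 0), ("y", 0)) in outcomes)  # False
```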

Fences to order loads/stores

Core 0     Core 1
ST y 1     ST x 1
LD y       LD x
LD x       LD y

▪ Is x=0, y=0 still possible? (That is, can Core 0's LD x and Core 1's LD y both return 0?)

Store atomicity
▪ A memory consistency model supports store atomicity iff all cores see the stores in the same order
▪ TSO as implemented in x86-64 does not guarantee store atomicity
  ‣ A core can “see” its own store early

No store atomicity in x86-64

[Figure: Core 0 and Core 1, each with a write buffer (WB) in front of its L1 cache; the L1 caches are coherent and share the LLC]

▪ Core 0's ST y 1 and Core 1's ST x 1 are still waiting in their WBs
▪ Core 0's LD y and Core 1's LD x each get the value 1 from their own WB (load-to-store bypass)
▪ Core 0's LD x = 0 and Core 1's LD y = 0 read their L1 caches, where the other core's store is not yet visible
▪ Result: x = 0, y = 0, which is NOT allowed by SC!

Store atomicity
▪ In contrast, TSO as implemented in IBM 370 guaranteed store atomicity
  ‣ A load can see the bypassed value during execution, but it stalls until the earlier store makes it to the cache
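The difference between the two TSO implementations can be sketched with a `forward` switch on a simplified write-buffer model. The model is hypothetical and abstracts away everything except the buffers. With forwarding on, a load may read its own buffered store early, as on x86-64. With it off, the load stalls until that store has reached the cache, as described for IBM 370. Registers r1 to r4 are illustrative names for the four loads of the earlier Core 0/Core 1 example.

```python
def store_buffer_outcomes(programs, forward=True):
    """Enumerate final register states for cores with FIFO write buffers (WBs).

    Instructions are ("ST", addr, value) or ("LD", reg, addr); memory starts at 0.
    forward=True:  a load may read its own buffered store early (x86-64 style).
    forward=False: such a load stalls until that store drains (IBM 370 style).
    """
    results = set()

    def explore(pcs, bufs, mem, regs):
        moved = False
        for c, prog in enumerate(programs):
            if bufs[c]:
                # Drain core c's oldest buffered store into the coherent caches.
                (addr, val), rest = bufs[c][0], bufs[c][1:]
                explore(pcs, bufs[:c] + (rest,) + bufs[c + 1:],
                        {**mem, addr: val}, regs)
                moved = True
            if pcs[c] == len(prog):
                continue
            instr = prog[pcs[c]]
            nxt = pcs[:c] + (pcs[c] + 1,) + pcs[c + 1:]
            if instr[0] == "ST":
                _, addr, val = instr
                new_buf = bufs[c] + ((addr, val),)
                explore(nxt, bufs[:c] + (new_buf,) + bufs[c + 1:], mem, regs)
            else:
                _, reg, addr = instr
                own = [v for a, v in bufs[c] if a == addr]
                if own and not forward:
                    continue  # stall; only draining the WB can make progress
                val = own[-1] if own else mem.get(addr, 0)
                explore(nxt, bufs, mem, {**regs, reg: val})
            moved = True
        if not moved:  # all cores finished and all WBs drained
            results.add(tuple(sorted(regs.items())))

    n = len(programs)
    explore((0,) * n, ((),) * n, {}, {})
    return results


prog = [
    [("ST", "y", 1), ("LD", "r1", "y"), ("LD", "r2", "x")],  # Core 0
    [("ST", "x", 1), ("LD", "r3", "x"), ("LD", "r4", "y")],  # Core 1
]


def both_zero(outcomes):
    """True if some outcome has Core 0's LD x == 0 and Core 1's LD y == 0."""
    return any(dict(o)["r2"] == 0 and dict(o)["r4"] == 0 for o in outcomes)


print(both_zero(store_buffer_outcomes(prog, forward=True)))   # True
print(both_zero(store_buffer_outcomes(prog, forward=False)))  # False
```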
