
From C/C++11 to POWER and ARM:

What is Shared-Memory Concurrency, Anyway?


Susmit Sarkar
University of St Andrews

MMnet, Heriot Watt


May, 2013

Shared Memory Concurrency: Since 1962


Burroughs D825

(first multiprocessing computer)

Outstanding features include truly modular hardware with parallel processing throughout.
FUTURE PLANS
The complement of compiling languages is to be expanded.

And Since 2011: In C/C++

ISO C/C++11: introduces a new concurrency model



Example: Message Passing (racy)

Initially:

d = 0; f = 0;

Thread 0
d = 1;
f = 1;

Thread 1
while (f == 0)
{};
r = d;

Finally: r = 0 ??

Programmer would hope this is Forbidden


In C/C++11, this has undefined semantics
Data race on d and f variables


C11: A Data Race Free Model


Idea: it is a programmer mistake to write Data Races

Basis of C11 Concurrency


Example (contd.): mark atomics

Mark atomic variables (accesses have memory order parameter)


Initially:

atomic d = 0; f = 0;

Thread 0
d.store(1,sc);
f.store(1,sc);

Thread 1
while (f.load(sc) == 0)
{};
r = d.load(sc);

Finally: r = 0 ??
Races on Atomic Accesses ignored (now have defined semantics)
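
The message-passing example above can be written out in standard C++11 roughly as follows. This is a minimal sketch: the function name message_passing_sc is an assumption made for illustration, and the slide's sc is spelled std::memory_order_seq_cst.

```cpp
#include <atomic>
#include <thread>

// Message passing with seq_cst atomics; d and f mirror the slide's variables.
// The function name is hypothetical, chosen only for this sketch.
int message_passing_sc() {
    std::atomic<int> d{0}, f{0};
    int r = -1;
    std::thread t0([&] {
        d.store(1, std::memory_order_seq_cst);
        f.store(1, std::memory_order_seq_cst);
    });
    std::thread t1([&] {
        // Spin until the flag is set, then read the data.
        while (f.load(std::memory_order_seq_cst) == 0) {}
        r = d.load(std::memory_order_seq_cst);
    });
    t0.join();
    t1.join();
    return r;  // r == 0 is not allowed here, so this returns 1
}
```

Calling message_passing_sc() from main (and linking with the platform's thread library) should always yield 1, since the accesses are atomic and sequentially consistent.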


Shared Memory Concurrency


Multiple threads with a single shared memory
Question: How do we reason about it?
Answer [1979]: Sequential Consistency
. . . the result of any execution is the same
as if the operations of all the processors
were executed in some sequential order,
respecting the order specified by the program.
[Lamport, 1979]


Sequential Consistency
[Figure: Thread 0, Thread 1, Thread 2 and Thread 3 above a single (Shared) Memory]

Traditional assumption (concurrent algorithms, semantics, verification): Sequential Consistency (SC)
Implies: can use interleaving semantics
False on modern (since 1972) multiprocessors, or with optimizing
compilers
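
To make "interleaving semantics" concrete, the sketch below enumerates every SC interleaving of a simplified message-passing program (Thread 0: d = 1; f = 1; Thread 1: rf = f; rd = d;) and collects the possible outcomes. The function name sc_outcomes and the single read of f (instead of a loop) are simplifications assumed for this illustration.

```cpp
#include <set>
#include <utility>

// Enumerate every SC interleaving of
//   Thread 0: d = 1; f = 1;     Thread 1: rf = f; rd = d;
// and return the set of (rf, rd) outcomes. Hypothetical helper for this sketch.
std::set<std::pair<int, int>> sc_outcomes() {
    std::set<std::pair<int, int>> outcomes;
    // A schedule is a 4-bit mask: bit i set means step i is taken by Thread 1.
    for (unsigned mask = 0; mask < 16; ++mask) {
        int d = 0, f = 0, rf = 0, rd = 0;
        int pc0 = 0, pc1 = 0;  // per-thread program counters
        bool valid = true;     // false if a thread is asked to run a 3rd step
        for (int step = 0; step < 4; ++step) {
            if ((mask >> step) & 1u) {  // Thread 1 runs its next instruction
                if (pc1 == 0) rf = f;
                else if (pc1 == 1) rd = d;
                else valid = false;
                ++pc1;
            } else {                    // Thread 0 runs its next instruction
                if (pc0 == 0) d = 1;
                else if (pc0 == 1) f = 1;
                else valid = false;
                ++pc0;
            }
        }
        if (valid) outcomes.insert(std::make_pair(rf, rd));
    }
    return outcomes;
}
```

Under this model the outcome rf = 1, rd = 0 never appears, which is exactly the result the programmer hopes is forbidden.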


Our world is not SC

Not since IBM System 370/158MP (1972)

... nor in x86, ARM, POWER, SPARC, Itanium, ...

... nor in C, C++, Java, ...


Example (contd.): mark atomics relaxed


Mark atomic variables as relaxed (a memory-order parameter)
Initially:

atomic d = 0; f = 0;

Thread 0
d.store(1,rlx);
f.store(1,rlx);

Thread 1
while (f.load(rlx) == 0)
{};
r = d.load(rlx);

Finally: r = 0 ??
(Forbidden on SC)
Defined, and possible, in C/C++11
Allows for hardware (and compiler) optimisations
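
A C++11 rendering of the relaxed variant might look like the sketch below, with the slide's rlx spelled std::memory_order_relaxed. The function name is an assumption for illustration. The program is race-free, but the language permits either result, so no particular value should be relied upon.

```cpp
#include <atomic>
#include <thread>

// Message passing with relaxed atomics. Hypothetical function name.
// The result is defined but may be 0 or 1 according to the C++11 model.
int message_passing_relaxed() {
    std::atomic<int> d{0}, f{0};
    int r = -1;
    std::thread t0([&] {
        d.store(1, std::memory_order_relaxed);
        f.store(1, std::memory_order_relaxed);  // no ordering with the store to d
    });
    std::thread t1([&] {
        while (f.load(std::memory_order_relaxed) == 0) {}
        r = d.load(std::memory_order_relaxed);  // may still observe d == 0
    });
    t0.join();
    t1.join();
    return r;
}
```

Whether r = 0 is ever seen in practice depends on the compiler and the hardware; on strongly ordered machines it may never show up even though the model allows it.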


C11 Concurrency: An Axiomatic Model

Complete executions are considered (threadwise operational, reading arbitrary values)
Relations defined over memory events (e.g. happens-before)
Predicate says whether execution is consistent
Further, no consistent execution should have races


Example (contd.): release-acquire synchronization


Mark release stores and acquire loads
Initially:

atomic d = 0; f = 0;

Thread 0
d.store(1,rlx);
f.store(1,rel);

Thread 1
while (f.load(acq) == 0)
{};
r = d.load(rlx);

Finally: r = 0 ??

(Forbidden on SC)
Forbidden in C/C++11 due to release-acquire synchronization
Implementation must ensure result not observed
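
A common use of this pattern publishes ordinary, non-atomic data behind an atomic flag. The sketch below is a variation on the slide, not a transcription: the payload is a plain int (the slide uses a relaxed atomic d), and the names payload, ready and publish_and_read are assumptions. The release store and acquire load create the happens-before edge that makes the plain accesses race-free.

```cpp
#include <atomic>
#include <thread>

// Release/acquire message passing with a non-atomic payload.
// All names here are hypothetical and chosen for this sketch.
int publish_and_read() {
    int payload = 0;             // plain, non-atomic data
    std::atomic<int> ready{0};   // synchronisation flag
    int r = -1;
    std::thread producer([&] {
        payload = 42;                               // ordinary write
        ready.store(1, std::memory_order_release);  // publish
    });
    std::thread consumer([&] {
        while (ready.load(std::memory_order_acquire) == 0) {}
        r = payload;  // ordered after the producer's write, so no data race
    });
    producer.join();
    consumer.join();
    return r;
}
```

Because the acquire load that reads 1 synchronizes with the release store, the consumer must observe payload == 42.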

Implementation of acquire/release on POWER


Initially:

d = 0; f = 0;

Thread 0
st d 1;
lwsync;
st f 1;

Thread 1
loop:

ld f rtmp;
cmp rtmp 0;
beq loop;

isync;
ld d r;
Finally: r = 0 ??

Forbidden (and not observed) on POWER7, and ARM


lwsync prevents write reordering
control dependency with isync prevents read speculation
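
At the source level, the barrier in Thread 0 can be written explicitly with a standalone fence between two relaxed stores. The sketch below uses std::atomic_thread_fence on both sides; this is an illustration under assumptions, not the code a compiler is required to emit, and the function name is hypothetical. Note that the slide's Thread 1 uses a control dependency plus isync rather than a fence instruction.

```cpp
#include <atomic>
#include <thread>

// Message passing with explicit fences around relaxed accesses.
// Hypothetical function name; a sketch, not a prescribed implementation.
int message_passing_fences() {
    std::atomic<int> d{0}, f{0};
    int r = -1;
    std::thread t0([&] {
        d.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_release);  // orders the two stores
        f.store(1, std::memory_order_relaxed);
    });
    std::thread t1([&] {
        while (f.load(std::memory_order_relaxed) == 0) {}
        std::atomic_thread_fence(std::memory_order_acquire);  // orders the two loads
        r = d.load(std::memory_order_relaxed);
    });
    t0.join();
    t1.join();
    return r;  // the two fences synchronize, so r == 1
}
```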

Correct implementations of C/C++ on hardware

Can it be done?
. . . on highly relaxed hardware? e.g. POWER/ARM

What is involved?
Mapping new constructs to assembly
Optimizations: which ones legal?


Implementing C/C++11 on POWER: Pointwise Mapping


C/C++11 Operation     POWER Implementation
Store (non-atomic)    st
Load (non-atomic)     ld

Store relaxed         st
Store release         lwsync; st
Store seq-cst         lwsync; st

Load relaxed          ld
Load consume          ld (and preserve dependency)
Load acquire          ld; cmp; bc; isync
Load seq-cst          hwsync; ld; cmp; bc; isync

Fence acquire         lwsync
Fence release         lwsync
Fence seq-cst         hwsync

CAS relaxed           loop: lwarx; cmp; bc exit; stwcx.; bc loop; exit:
CAS seq-cst           hwsync; loop: lwarx; cmp; bc exit; stwcx.; bc loop; isync; exit:
...                   ...

(From Paul McKenney and Raul Silvera)



Is that mapping correct?

Answer: No! The Store seq-cst entry has to change from lwsync; st to hwsync; st.

Implementing C/C++11 on POWER: Pointwise Mapping


C/C++11 Operation     POWER Implementation
Store (non-atomic)    st
Load (non-atomic)     ld

Store relaxed         st
Store release         lwsync; st
Store seq-cst         hwsync; st

Load relaxed          ld
Load consume          ld (and preserve dependency)
Load acquire          ld; cmp; bc; isync
Load seq-cst          hwsync; ld; cmp; bc; isync

Fence acquire         lwsync
Fence release         lwsync
Fence seq-cst         hwsync

CAS relaxed           loop: lwarx; cmp; bc exit; stwcx.; bc loop; exit:
CAS seq-cst           hwsync; loop: lwarx; cmp; bc exit; stwcx.; bc loop; isync; exit:
...                   ...

Is that mapping correct?

Answer: Yes!

(From Paul McKenney and Raul Silvera)
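
Seq-cst accesses give a guarantee that release/acquire alone does not: in the store-buffering shape below, both threads cannot read 0. The sketch is an illustration at the C++ level only (it is not claimed to be the counterexample used against the earlier mapping), and the function name is an assumption. On POWER, lwsync does not order a store before a later load, which is one reason the seq-cst rows of the table use hwsync.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Store buffering with seq_cst atomics. Hypothetical function name.
// Returns (r0, r1); the outcome (0, 0) is forbidden by the C++11 model.
std::pair<int, int> store_buffering_sc() {
    std::atomic<int> x{0}, y{0};
    int r0 = -1, r1 = -1;
    std::thread t0([&] {
        x.store(1, std::memory_order_seq_cst);
        r0 = y.load(std::memory_order_seq_cst);
    });
    std::thread t1([&] {
        y.store(1, std::memory_order_seq_cst);
        r1 = x.load(std::memory_order_seq_cst);
    });
    t0.join();
    t1.join();
    return std::make_pair(r0, r1);
}
```

With both stores weakened to release and both loads to acquire, the (0, 0) outcome would become allowed.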



Is that the only correct mapping?

Answer: No!

Implementing C/C++11 on POWER: Pointwise Mapping


C/C++11 Operation     POWER Implementation            Alternative
Store seq-cst         hwsync; st                      hwsync; st; hwsync;
Load seq-cst          hwsync; ld; cmp; bc; isync      ld; hwsync

The remaining entries (non-atomic, relaxed, release, consume and acquire accesses, fences, CAS) are unchanged.

All compilers must agree for separate compilation
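
The CAS rows of the mapping correspond at the source level to compare_exchange_strong, which on POWER is implemented with a lwarx/stwcx. retry loop. The sketch below shows the source side only; try_claim and count_winners are hypothetical names, and the memory order is a parameter so both the relaxed and the seq-cst rows can be exercised.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Try to change slot from 0 to id with a single strong CAS.
// Hypothetical helper for this sketch.
bool try_claim(std::atomic<int>& slot, int id, std::memory_order order) {
    int expected = 0;
    return slot.compare_exchange_strong(expected, id, order);
}

// Let nthreads threads race to claim one slot and count the successes.
int count_winners(int nthreads, std::memory_order order) {
    std::atomic<int> slot{0};
    std::atomic<int> winners{0};
    std::vector<std::thread> pool;
    for (int id = 1; id <= nthreads; ++id) {
        pool.emplace_back([&slot, &winners, id, order] {
            if (try_claim(slot, id, order))
                winners.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& t : pool) t.join();
    return winners.load();
}
```

Atomicity of the CAS does not depend on the memory order, so exactly one thread wins in either case; the order only affects what else the winner is guaranteed to observe.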



Implementing C/C++11 on POWER correctly


Theorem: For any sane, non-optimising compiler following the mapping:

C/C++ prog  --[C/C++11 semantics]-->  C/C++11 execution  -->  observations
    |
    | compilation
    v
POWER prog  --[POWER semantics]-->  POWER execution  -->  observations

Showed previous mapping incorrect


Easily adapt proof for an alternative mapping


Benefits of a formal proof

Reasoning about industrial-strength concurrency


Enables:
Confidence in C/C++ and Power concurrency models
Confidence in compiler implementations [gcc]
Reasoning about C/C++ and Power
(Path to) Reasoning about ARM ??


POWER: Hardware Modeling

Hard to see an axiomatic characterisation


Model the microarchitecture (operational model)
But, have to be abstract

POWER operational model


[Figure: threads send Write requests, Read requests and Barrier requests to the Storage Subsystem, which returns Read responses and Barrier acks]

Operational model of POWER [PLDI11]

Abstract view of microarchitecture
Abstract (topology-independent) Storage Subsystem
Speculation in threads visible

Labelled transition systems, synchronising on messages

2500 lines of formal mathematics, described in 3 pages of prose


Topology-Independent Storage Subsystem


[Figure: threads Thread1 to Thread5 and memories Memory1 to Memory5, with writes (W) between them]

Do not expose topology


Equivalently: Copy of memory per thread
Have to take into account barriers/ordering instructions

Cumulativity: Programming on many threads

Initially:

d = 0; f = 0;

Thread 0
st d 1

Thread 1
ld rd d
lwsync
st f 1

Thread 2
loop: ld r1 f;
cmp r1 0;
beq loop;
isync;
ld r d;

Finally: rd = 1, r1 = 1, r = 0 ??
The lwsync is cumulative: it keeps the stores in order for all threads
Flipping the dependency and barrier does not recover SC
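
The same three-thread shape can be expressed in C++11 with a release store in the middle thread and an acquire load in the last one. The sketch below adapts the slide: Thread 1 spins until it has seen d = 1 (the slide uses a single load and asks about the case rd = 1), and the function name is an assumption. If the middle thread saw the write to d, the last thread must see it as well.

```cpp
#include <atomic>
#include <thread>

// Write-to-read causality with release/acquire. Hypothetical function name.
int wrc_release_acquire() {
    std::atomic<int> d{0}, f{0};
    int r = -1;
    std::thread t0([&] {
        d.store(1, std::memory_order_relaxed);
    });
    std::thread t1([&] {
        while (d.load(std::memory_order_relaxed) == 0) {}  // observe Thread 0's write
        f.store(1, std::memory_order_release);             // then publish the flag
    });
    std::thread t2([&] {
        while (f.load(std::memory_order_acquire) == 0) {}
        r = d.load(std::memory_order_relaxed);  // cannot be older than what t1 read
    });
    t0.join();
    t1.join();
    t2.join();
    return r;
}
```

The guarantee comes from happens-before between the two readers combined with coherence on d; on POWER it is the cumulativity of lwsync that lets the mapping deliver it.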


A (slightly) More Complex Example

Initially:

data = 0; flag = 0;

Thread 0
data = 1;
lwsync;
flag = 1;

Thread 1
while (flag == 0)
{};
tmp = 1;
r1 = tmp;
r = data + (r1 ^ r1);

Finally: r = 0 ??

Is that behaviour Allowed? Observable?


Observed on Power7; Allowed by the model


Overall Model Size

Explanation in 3 pages of prose


Microarchitectural intuitions
No extraneous concrete details
2500 lines of machine-processed math
In LEM [ITP11], a simple new semantic metalanguage
Can extract executable code, and theorem-prover code
With OCaml harness: interactive and exhaustive checker
Compilable to browser!


Validating the model


Extract executable code from definition, exhaustively enumerate
possible behaviours of tests
Run many iterations of tests on real hardware (Power G5, 6, 7)
Excerpt of results:
Test            Model    POWER 6               POWER 7
WRC+sync+addr   Forbid   ok      0 / 16G       ok      0 / 110G
WRC+data+sync   Allow    ok      150k / 12G    ok      56k / 94G
PPOCA           Allow    unseen  0 / 39G       ok      62k / 141G
PPOAA           Forbid   ok      0 / 39G       ok      0 / 157G
LB              Allow    unseen  0 / 31G       unseen  0 / 176G

Agreed with key IBM Power designers/architects
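
The "observed / runs" counts in the table come from running each test many times on hardware. A very crude version of that idea, at the C++ level, is sketched below: it repeats the message-passing test and counts how often the reader sees the flag but stale data. The function name and parameters are assumptions, and spawning two threads per iteration is far less effective at provoking weak behaviour than a dedicated test harness.

```cpp
#include <atomic>
#include <thread>

// Repeat the message-passing test and count runs in which the reader saw
// f == 1 but then read d == 0. Hypothetical helper for this sketch.
long count_mp_violations(long iterations,
                         std::memory_order store_order,
                         std::memory_order load_order) {
    long seen = 0;
    for (long i = 0; i < iterations; ++i) {
        std::atomic<int> d{0}, f{0};
        int r = -1;
        std::thread t0([&] {
            d.store(1, std::memory_order_relaxed);
            f.store(1, store_order);             // relaxed, release or seq_cst
        });
        std::thread t1([&] {
            while (f.load(load_order) == 0) {}   // relaxed, acquire or seq_cst
            r = d.load(std::memory_order_relaxed);
        });
        t0.join();
        t1.join();
        if (r == 0) ++seen;
    }
    return seen;
}
```

With release and acquire the count must be 0; with relaxed on both sides a non-zero count is allowed by the language but may or may not appear on a given machine.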



C/C++11 Implementation Proof


And Its Consequences

Proof outline

Theorem: For any sane, non-optimising compiler following the mapping:

DRF C/C++ prog  --[C/C++11 semantics]-->  C/C++11 execution  -->  observations
    |
    | compilation
    v
POWER prog  --[POWER semantics]-->  POWER execution  -->  observations

Proof outline (contd.)

compilation:
Preserves memory accesses;
Uses the mapping table;
Respects the thread local semantics of C/C++, preserving dependencies

Proof outline (contd.)

From POWER trace, build key relations (happens-before, SC order)
Required properties from abstract machine properties
If trace looks like it produces data race, build the C/C++ data race for contradiction

Building up happens-before (outline)

C11                                  Power correspondence
Base case: release-acquire           lwsync and isync
Transitive (multiple rel/acq)        Cumulativity of lwsync
Release-consume with dependencies    lwsync and dependencies
Special rules for CAS                coherence-point reasoning
...                                  ...


Using Proofs for Hardware Design

Previously, similar C11 proof for x86-TSO
There, much simpler

What properties of Hardware were necessary?

Turns out: x86 Compare-and-Swap has strong properties
Weakening guarantees: Better implementation, just as good programming [PLDI13]


Using Proofs for Hardware Design (2)


Initially:

data = 0; flag = 0;

Thread 0
data = 1;
sync;
flag = 1;

Thread 1
while (flag == 0)
{};
atomically (flag = 2);
r1 = flag;
r = data + (r1 ^ r1);

Finally: r = 0 ??
Is that Allowed? Observable?
C11/C++11 mapping would break (and no good way of fixing)
Fortunately, current hardware does not do this
. . . and now we know why future hardware should not

Conclusion

Reasoning about industrial-strength concurrency


Correct compilation of C/C++ concurrency primitives on Power
Confidence in both models
Compiler implementation relevance
Isolate relevant properties of h/w (Path to Hardware Design)
Reasoning about machine code at C/C++ level


Thank You!
More details at:
http://www.cl.cam.ac.uk/~pes20/cppppc
Understanding POWER Multiprocessors [PLDI11]
Clarifying and Compiling C/C++ Concurrency: From C++11 to POWER
[POPL12]
Synchronising C/C++ and POWER [PLDI12]
Fast RMWs for TSO: Semantics and Implementation [PLDI13]
The ppcmem tool at:
http://www.cl.cam.ac.uk/~pes20/ppcmem

Model Excerpt
Propagate write to another thread
The storage subsystem can propagate a write w (by thread tid) that it has seen to another thread tid', if:
the write has not yet been propagated to tid';
w is coherence-after any write to the same address that has already been propagated to tid'; and
all barriers that were propagated to tid before w (in s.events_propagated_to (tid)) have already been propagated to tid'.
Action: append w to s.events_propagated_to (tid').

Explanation: This rule advances the thread tid' view of the coherence order to w, which is needed before tid' can read from w, and is also needed before any barrier that is in tid's view after w (has w in its Group A) can be propagated to tid'.

Model Excerpt
Propagate write to another thread
(* Precondition: may the write w be propagated to thread tid? *)
let write_announce_cand m s w tid =
  (w IN s.writes_seen) &&
  (tid IN s.threads) &&
  (* w has not yet been propagated to tid *)
  (not (List.mem (SWrite w) (s.events_propagated_to tid))) &&
  (* w is coherence-after every same-address write already propagated to tid *)
  (forall (w' IN s.writes_seen).
     if List.mem (SWrite w') (s.events_propagated_to tid) && w'.w_addr = w.w_addr
     then (w',w) IN s.coherence
     else true) &&
  (* barriers propagated to w's own thread before w have already reached tid *)
  (forall (b IN barriers_seen s).
     if (ordered_before_in (s.events_propagated_to w.w_thread)
           (SBarrier b) (SWrite w))
     then List.mem (SBarrier b) (s.events_propagated_to tid) else true)

(* Action: append w to the events propagated to tid *)
let write_announce_action s w tid =
  let events_propagated_to = funupd s.events_propagated_to tid
      (add_event (s.events_propagated_to tid) (SWrite w)) in
  <| s with events_propagated_to = events_propagated_to |>
