Property-based tests of the OCaml multicore compiler and run time.
This project contains
- a randomized test suite of OCaml 5.x, packaged up in
multicoretests.opam - two reusable testing libraries:
Linpackaged up inqcheck-lin.opamandSTMpackaged up inqcheck-stm.opam
All of the above build on QCheck, a black-box, property-based testing library in the style of QuickCheck.
The two libraries are further described below, in three blog posts, and in a 2022 OCaml Workshop paper:
- Under the Hood: Developing Multicore Property-Based Tests for OCaml 5
- Multicore Property-Based Tests for OCaml 5: Challenges and Lessons Learned
- OCaml 5 Multicore Testing Tools
- Multicoretests - Parallel Testing Libraries for OCaml 5.0
The multicore test suite requires OCaml 5.x:
opam update
opam switch create 5.3.0
It is designed to stress test OCaml 5's multicore features. As such, it requires a multicore machine to run successfully and will fail on a single-CPU setup. We recommend running the test suite with 3 or more (virtual) CPUs.
The two testing libraries are available as packages qcheck-lin
and qcheck-stm from the opam repository. The full versions require
OCaml 5.x and reduced, non-Domain versions are available for OCaml
4.12.x to 4.14.x. They can be installed in the usual way:
opam install qcheck-lin
opam install qcheck-stm
Bleeding edge users can pin and install the latest main as follows:
opam pin -y https://github.com/ocaml-multicore/multicoretests.git#main
To use the Lin library in parallel mode on a Dune project, add the
following dependency to your dune rule:
(libraries qcheck-lin.domain)
Using the STM library in sequential mode requires the dependency
(libraries qcheck-stm.sequential) and the parallel mode similarly
requires the dependency (libraries qcheck-stm.domain).
We have not released the test suite on the opam repository at this point. The test suite can be built and run from a clone of this repository with the following commands:
opam install . --deps-only --with-test
dune build
dune runtest -j1 --no-buffer --display=quiet
As some of the tests are negative, e.g., confirming known domain unsafety by
finding a counterexample, we recommend running only one property-based test at a
time (-j1) to prevent contention between the individual tests for CPU
resources.
Individual tests can be run by invoking dune exec. For example:
$ dune exec src/atomic/stm_tests.exe -- -v
random seed: 51501376
generated error fail pass / total time test name
[✓] 1000 0 0 1000 / 1000 0.2s sequential atomic test
[✓] 1000 0 0 1000 / 1000 180.8s parallel atomic test
================================================================================
success (ran 2 tests)
See src/README.md for an overview of the current PBTs of OCaml 5.x.
It is also possible to run the test suite in the CI, by altering .github/workflows/common.yml to target a particular compiler PR:
COMPILER_REF: 'refs/pull/12345/head'
or a particular branch of a particular fork:
COMPILER_REPO: 'login/ocaml'
COMPILER_REF: 'refs/heads/test-me'
Since ocaml/ocaml#13458
the test suite can be triggered on an ocaml/ocaml PR (or on a fork of it)
by adding the run-multicoretests label.
The Lin module lets a user test an API for sequential consistency,
i.e., it performs a sequence of random commands in parallel, records
the results, and checks whether the observed results can be linearized
and reconciled with some sequential execution. The library offers an
embedded, combinator DSL to describe signatures succinctly. As an
example, the required specification to test (a small part of) the
Hashtbl module is as follows:
module HashtblSig =
struct
type t = (char, int) Hashtbl.t
let init () = Hashtbl.create ~random:false 42
let cleanup _ = ()
open Lin
let a,b = char_printable,nat_small
let api =
[ val_ "Hashtbl.add" Hashtbl.add (t @-> a @-> b @-> returning unit);
val_ "Hashtbl.remove" Hashtbl.remove (t @-> a @-> returning unit);
val_ "Hashtbl.find" Hashtbl.find (t @-> a @-> returning_or_exc b);
val_ "Hashtbl.mem" Hashtbl.mem (t @-> a @-> returning bool);
val_ "Hashtbl.length" Hashtbl.length (t @-> returning int); ]
end
module HT = Lin_domain.Make(HashtblSig)
;;
QCheck_base_runner.run_tests_main [
HT.lin_test `Domain ~count:1000 ~name:"Lin Hashtbl test";
]The first line indicates the type of the system under test along with
bindings init and cleanup for setting it up and tearing it down.
The api then contains a list of type signature descriptions using
combinators unit, bool, int, returning, returning_or_exc,
... in the style of Ctypes.
The functor Lin_domain.Make expects a description of the tested
commands and outputs a module with a QCheck test lin_test that
performs the linearization test.
The QCheck linearization test iterates a number of test
instances. Each instance consists of a "sequential prefix" of calls to
the above commands, followed by a spawn of two parallel Domains
that each call a sequence of operations. Lin chooses the individual
operations and arguments arbitrarily and records their results. The
framework then performs a search for a sequential interleaving of the
same calls, and succeeds if it finds one.
Since Hashtbls are not safe for parallelism, if you run
dune exec doc/example/lin_tests.exe the output can produce the
following output, where each tested command is annotated with its result:
Messages for test Lin Hashtbl test:
Results incompatible with sequential execution
|
|
.------------------------------------.
| |
Hashtbl.add t 'a' 0 : () Hashtbl.add t 'a' 0 : ()
Hashtbl.length t : 1 Hashtbl.length t : 1
In this case, the test tells us that there is no sequential
interleaving of these calls which would return 1 from both calls to
Hashtbl.length. For example, in the following sequential interleaving
the last call should return 2:
Hashtbl.add t 'a' 0;;
let res1 = Hashtbl.length t;;
Hashtbl.add t 'a' 0;;
let res2 = Hashtbl.length t;;See src/atomic/lin_tests.ml for
another example of testing the Atomic module.
STM contains a revision of qcstm
extended to run parallel state-machine tests akin to Erlang
QuickCheck, Haskell Hedgehog, ScalaCheck, ....
To do so, the STM library also performs a sequence of random
operations in parallel and records the results. In contrast to Lin,
STM then checks whether the observed results are linearizable by
reconciling them with a sequential execution of a model description.
The model expresses the intended meaning of each tested command. As
such, it requires more of the user compared to Lin. The
corresponding code to describe a Hashtbl test using STM is
given below:
open QCheck
open STM
(** parallel STM tests of Hashtbl *)
module HashtblModel =
struct
type sut = (char, int) Hashtbl.t
type state = (char * int) list
type cmd =
| Add of char * int
| Remove of char
| Find of char
| Mem of char
| Length [@@deriving show { with_path = false }]
let init_sut () = Hashtbl.create ~random:false 42
let cleanup (_:sut) = ()
let arb_cmd (s:state) =
let char =
if s=[]
then Gen.printable
else Gen.(oneof [oneofl (List.map fst s); printable]) in
let int = Gen.nat in
QCheck.make ~print:show_cmd
(Gen.oneof
[Gen.map2 (fun k v -> Add (k,v)) char int;
Gen.map (fun k -> Remove k) char;
Gen.map (fun k -> Find k) char;
Gen.map (fun k -> Mem k) char;
Gen.return Length; ])
let next_state (c:cmd) (s:state) = match c with
| Add (k,v) -> (k,v)::s
| Remove k -> List.remove_assoc k s
| Find _
| Mem _
| Length -> s
let run (c:cmd) (h:sut) = match c with
| Add (k,v) -> Res (unit, Hashtbl.add h k v)
| Remove k -> Res (unit, Hashtbl.remove h k)
| Find k -> Res (result int exn, protect (Hashtbl.find h) k)
| Mem k -> Res (bool, Hashtbl.mem h k)
| Length -> Res (int, Hashtbl.length h)
let init_state = []
let precond (_:cmd) (_:state) = true
let postcond (c:cmd) (s:state) (res:res) = match c,res with
| Add (_,_), Res ((Unit,_),_)
| Remove _, Res ((Unit,_),_) -> true
| Find k, Res ((Result (Int,Exn),_),r) -> r = (try Ok (List.assoc k s) with Not_found -> Error Not_found)
| Mem k, Res ((Bool,_),r) -> r = List.mem_assoc k s
| Length, Res ((Int,_),r) -> r = List.length s
| _ -> false
end
module HT_seq = STM_sequential.Make(HashtblModel)
module HT_dom = STM_domain.Make(HashtblModel)
;;
QCheck_base_runner.run_tests_main
(let count = 200 in
[HT_seq.agree_test ~count ~name:"Hashtbl test sequential";
HT_dom.agree_test_par ~count ~name:"Hashtbl test parallel"; ])Again this requires a type sut for the system under test, and
bindings init_sut and cleanup for setting it up and tearing it
down. The type cmd describes the tested commands.
The type state = (char * int) list describes with a pure association
list the internal state of a Hashtbl. The init_state represents
the empty Hashtbl mode and the state transition function
next_state describes how it changes across each cmd. For
example, Add (k,v) appends the key-value pair onto the association
list.
arb_cmd is a generator of cmds, taking state as a parameter.
This allows for state-dependent cmd generation, which we use
to increase the chance of producing a Remove 'c', Find 'c', ...
following an Add 'c'. Internally arb_cmd uses QCheck combinators
Gen.return, Gen.map, and Gen.map2 to generate one of
5 different commands.
run executes the tested cmd over the sut and wraps the result up
in a result type res offered by STM. Combinators unit, bool,
int, ... allow to annotate the result with the expected type.
postcond expresses a post-condition by matching the received res,
for a cmd with the corresponding answer from the model. For
example, this compares the Boolean result r from Hashtbl.mem with
the result from List.mem_assoc. Similarly precond expresses a
pre-condition.
The module is phrased as functors:
- the functor
STM_sequential.Makeproduces a module with a functionagree_testto test whether the model agrees with thesutacross a sequential run of an arbitrary command sequence and - the functor
STM_domain.Makeproduces a module with a functionagree_test_parwhich tests in parallel byspawning two domains withDomainsimilarly toLinand searches for a sequential interleaving over the model.
When running the above with the command dune exec doc/example/stm_tests.exe
one may obtain the following output:
Messages for test Hashtbl test parallel:
Results incompatible with linearized model
|
|
.------------------------------------.
| |
(Add ('e', 5268)) : () (Add ('!', 4)) : ()
Length : 1 Length : 1
This illustrates how two hashtable Add commands may interfere when
executed in parallel, leaving only 1 entry in the resulting
Hashtbl - which is not reconcilable with the declarative model
description.
The above examples are available from the doc/example directory. The doc directory also contains our 2022 OCaml Workshop paper presenting the project in a bit more detail.
Both Lin and STM perform randomized property-based testing with
QCheck. When rerunning a test to shrink/reduce the test input, QCheck
thus starts from the same Random seed to limit non-determinism.
This is however not suffient for multicore programs where CPU
scheduling and garbage collection may hinder reproducibility.
Lin and STM primarily uses test repetition to increase
reproducibility and it is sufficient that only a single repetition
triggers an issue. Currently repeating a non-deterministic QCheck
property can be done in two different ways:
- a
repeat-combinator lets you test a property, e.g., 50 times rather than just 1. (Pro: a failure is found faster, Con: wasted, repetitive testing when there are no failures) - a
QCheckPR extendsTest.makewith a~retriesparameter causing it to only perform repetition during shrinking. (Pro: each test is cheaper so we can run more, Con: more tests are required to trigger a race)
Since PR#540 the
Lin_thread and STM_thread modes both utilize a Gc.Memprof callback to
trigger more frequent context switching at allocation points, effectively
increasing the chance of surfacing Thread-based concurrency issues.
A Gc stress test including Gc.compact could trigger hangs and segfaults,
due to a race condition on orphaned major heap pools, which could leave
dangling heap pointers.
A Lin Dynarray test observed rare unsuspected values (and even rarer crashes)
on the CI's musl build, which turned out to be due to tearing in the memcpy
underlying Array.sub andArray.copy.
A recently merged PR replacing blocking functions
by unblocking ones caused a regression in the form of deadlocks in the parallel Sys STM test.
Early STM tests of the unboxed Dynarray PR triggered
segfaults caused by mixing flat float arrays with boxed arrays.
An assertion error revealed a race condition between two atomic updates underlying the coordination between a spawned domain and its backup thread.
An assertion error during the upstreaming of a mark-delay improvement, revealed a problem with the marking color of orphaned Ephemerons.
The Out_channel tests found that flush may raise a Sys_error
exception when used in parallel with a close.
Both the Gc and Domain.DLS tests triggered crashes due to bytecode fragments in
callbacks not being properly unregistered,
fixed in a separate PR.
Tests of the Gc module revealed that Gc.control records contain
constant zero fields ignored by Gc.set
Tests of the Gc module revealed that Gc.quick_stat did not
return a record with 4 zero fields as documented
New GC tests offered a simple reproducer for consistently triggering a shared heap assertion error
New GC tests spotted an issue with unsafe root registration in
Gc.counters in 5.2.0, already fixed upstream
Tests of In_channel would trigger an occasional race in a debug
assertion, due to a TOCTOU
race for incoming interrupts during backup thread termination
Tests of Dynlink on Windows revealed that the underlying FlexDLL was
unsafe for parallel usage
Earlier fixes to bring Windows behaviour closer to other platforms introduced
an unfortunate cornercase regression
if attempting to Sys.rename a parent directory to an empty child directory
A configure PR accidentally introduced a regression causing a flexlink test to fail for a Cygwin build
We observed crashes and hangs of the threadomain test under
MinGW, which turned out to be due
to unsafe systhread yielding.
While revising out Out_channel tests we discovered a regression when
outputting to a closed Out_channel
A change to the OCaml compiler's internals revealed that dune was not
using CAML_INTERNALS according to the OCaml manual
The tests found a regression where a failure to create a domain would trigger an abort rather than an exception
A PR merged to trunk reintroduced off-by-one assertion errors in caml_reset_young_limit
The thread_joingraph test triggered an assertion boundary case in
caml_memprof_renew_minor_sample from memprof.c
The thread_joingraph test triggered an assertion boundary case in
caml_reset_young_limit which was too strict
A repro test case submitted upstream from multicoretests to the ocaml
compiler test suite and two separate
multicoretests all triggered an race condition in install_backup_thread
The sequential Float.Array STM test revealed that a float register
was not properly preserved on ppc64, sometimes resulting in
random float values appearing
The sequential STM tests of Array, Bytes, and Float.Array
would trigger segfaults on ppc64
Negative Lin Effect tests exercising exceptions for unhandled
Effects triggered a crash on a frame pointer switch
Negative Lin Effect tests exercising exceptions for unhandled
Effects also triggered a crash on the newly restored s390x backend
Sequential STM tests targeting Sys.rename found two corner cases
where MingW behaves differently
Parallel Lin tests of the Dynlink module found a race
condition in accesses to
the global variables storing the last error.
Parallel STM tests of the Buffer module found a segfault, leading
to the discovery of an assertion failure
revealing a race condition in the add_string function
Parallel STM tests found a combination of Weak Hashset functions
that may cause the run-time to abort or segfault
Sequential STM tests of Sys showed how Sys.readdir of a
non-existing directory on MingW Windows returns an empty array, thus
disagreeing with the Linux and macOS behavior
A failure of Lin In_channel tests revealed that seek on a closed
in_channel may read uninitialized memory
Racing Weak.set and Weak.get can in some cases produce strange values
In seldom cases the tests would trigger a segfault in the bytecode interpreter when treating an unhandled Effect
In some cases (even sequential) the Ephemeron tests can trigger an assertion failure and abort.
The Bytes tests triggered a segfault which turned out to be caused by an unsafe Bytes.escaped definition.
The tests triggered an apparent infinite loop on ARM64 while amd64 would complete the tests as expected.
The tests found that the Buffer module implementation is unsafe under parallel usage - initially described in multicoretests#63.
The tests found an issue causing a segfault on MacOS.
The tests found a problem with In_channel and Out_channel which
could trigger segfaults under parallel usage. For details see
issue ocaml-multicore/multicoretests#13 and
this ocaml/ocaml#10960 comment.
The tests found an issue in Domainslib.parallel_for_reduce which
would yield the wrong result for empty arrays.
As of domainslib#100
the Domainslib tests have been moved to the Domainslib repo.
The initial tests of ws_deque just applied the parallelism property agree_prop_par.
However that is not sufficient, as only the original domain (thread)
is allowed to call push, pop, ..., while a spawned domain
should call only steal.
A custom, revised property test runs a cmd prefix, then
spawns a "stealer domain" with steal, ... calls, while the
original domain performs calls across a broder random selection
(push, pop, ...). As of
lockfree#43
this test has now been moved to the lockfree repo.
Here is an example output illustrating how size may return -1 when
used in a "stealer domain". The first line in the Failure section lists
the original domain's commands and the second lists the stealer
domains commands (Steal,...). The second Messages section lists a
rough dump of the corresponding return values: RSteal (Some 73) is
the result of Steal, ... Here it is clear that the spawned domain
successfully steals 73, and then observes both a -1 and 0 result from
size depending on timing. Size should therefore not be considered
threadsafe (none of the
two
papers make any such
promises though):
$ dune exec src/ws_deque_test.exe
random seed: 55610855
generated error fail pass / total time test name
[✗] 318 0 1 317 / 10000 2.4s parallel ws_deque test (w/repeat)
--- Failure --------------------------------------------------------------------
Test parallel ws_deque test (w/repeat) failed (8 shrink steps):
Seq.prefix: Parallel procs.:
[] [(Push 73); Pop; Is_empty; Size]
[Steal; Size; Size]
+++ Messages ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Messages for test parallel ws_deque test (w/repeat):
Result observations not explainable by linearized model:
Seq.prefix: Parallel procs.:
[] [RPush; (RPop None); (RIs_empty true); (RSize 0)]
[(RSteal (Some 73)); (RSize -1); (RSize 0)]
================================================================================
failure (1 tests failed, 0 tests errored, ran 1 tests)Testing Domainslib.Tasks with one dependency and with 2 work pools
found a segfault in domainslib.
As of domainslib#100
the domainslib/task_one_dep.ml test in question has been moved to
the Domainslib repo.
A reported deadlock in domainslib motivated the development of these tests:
The tests domainslib/task_one_dep.ml and
domainslib/task_more_deps.ml were run with a timeout to prevent
deadlocking indefinitely. domainslib/task_one_dep.ml could trigger one
such deadlock. As of domainslib#100
these tests have been moved to the Domainslib repo.
The test exhibits no non-determistic behaviour when repeating the same tested property from within the QCheck test. However it fails (due to timeout) on the following test input:
$ dune exec -- src/task_one_dep.exe -v
random seed: 147821373
generated error fail pass / total time test name
[✗] 25 0 1 24 / 100 36.2s Task.async/await
--- Failure --------------------------------------------------------------------
Test Task.async/await failed (2 shrink steps):
{ num_domains = 3; length = 6;
dependencies = [|None; (Some 0); None; (Some 1); None; None|] }
================================================================================
failure (1 tests failed, 0 tests errored, ran 1 tests)This corresponds to the following program with 3+1 domains and 6 promises. It loops infinitely with both bytecode/native:
...
open Domainslib
(* a simple work item, from ocaml/testsuite/tests/misc/takc.ml *)
let rec tak x y z =
if x > y then tak (tak (x-1) y z) (tak (y-1) z x) (tak (z-1) x y)
else z
let work () =
for _ = 1 to 200 do
assert (7 = tak 18 12 6);
done
let pool = Task.setup_pool ~num_additional_domains:3 ()
let p0 = Task.async pool work
let p1 = Task.async pool (fun () -> work (); Task.await pool p0)
let p2 = Task.async pool work
let p3 = Task.async pool (fun () -> work (); Task.await pool p1)
let p4 = Task.async pool work
let p5 = Task.async pool work
let () = List.iter (fun p -> Task.await pool p) [p0;p1;p2;p3;p4;p5]
let () = Task.teardown_pool poolThis project has been created by Tarides. It is now maintained with support from the OCaml Software Foundation.