Easy Data Parallelism
Richard Warburton
Raoul-Gabriel Urma
Overview
● Why is parallelism important?
● What is data parallelism?
● Parallelising your Streams
● Performance and Internals
Why is Parallelism Important?
source: http://www.gotw.ca/images/CPU.png
Multicore
What is Data Parallelism?
Concurrency is not Parallelism!
● Concurrency
○ At least two threads are making progress
○ May not run at the same time
○ Eg: Chrome and Eclipse both running
● Parallelism
○ At least two threads are executing simultaneously
○ A specific case of concurrency
○ Eg: servlet container dealing with two users at
once on a multicore machine
Parallelism
● Task
○ Distribute different tasks over different processes
○ Threads and Executors in Java
○ Eg: each thread services a user in JEE App
● Data
○ Distribute data over different processes
○ Support built on top of Streams
○ Eg: process a payroll and give each core 100
employees’ salaries
What are good data parallel problems?
● Big Batch Jobs
○ Transaction Processing
○ Analytics/Reporting
● Web crawlers / parsers
● Maths
○ Monte Carlo Simulations
○ Linear Algebra
What’s a good data parallel problem from your workplace?
Parallelising your Streams
Data Parallelism
● Useful when
○ you have a lot of data
○ you want to process each element in a similar way
● API aims to be explicit, but unobtrusive
○ .parallelStream()
○ .parallel()
● Can flip between sequential and parallel (sketch below)
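A minimal sketch of flipping modes (class name FlipDemo is illustrative): the whole pipeline runs in whichever mode the last parallel()/sequential() call before the terminal operation selected.

import java.util.Arrays;
import java.util.List;

public class FlipDemo {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);

        // start sequential, opt in to parallelism ...
        int parallelSum = numbers.stream()
                                 .parallel()
                                 .mapToInt(i -> i)
                                 .sum();

        // ... or start parallel and drop back to sequential
        int sequentialSum = numbers.parallelStream()
                                   .sequential()
                                   .mapToInt(i -> i)
                                   .sum();

        System.out.println(parallelSum + " " + sequentialSum); // 15 15
    }
}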
Data Parallelism
// Replace stream() with parallelStream()
// (assumes a static import of Collectors.toSet)
Set<String> origins = musicians
    .parallelStream()
    .filter(artist -> artist.getName().startsWith("The"))
    .map(artist -> artist.getNationality())
    .collect(toSet());
Not all serial code works in parallel.
DON’T interfere with data sources
// BROKEN: adds each doubled value to the source list while
// streaming it; can throw ConcurrentModificationException
List<Integer> numbers = getNumbers();
numbers.parallelStream()
    .forEach(i -> numbers.add(i * 2));
Interfering with data sources: fixed
// add each value and its double to the list
// (assumes a static import of Collectors.toList)
List<Integer> numbers = getNumbers();
numbers = numbers.parallelStream()
    .flatMap(i -> Stream.of(i, i * 2))
    .collect(toList());
DON’T misuse reduce
int totalCost(List<Purchase> items) {
    return items.parallelStream()
        .mapToInt(Purchase::getCost)
        // BROKEN in parallel: DELIVERY_FEE is not an identity
        // value, so it is added once per chunk of the data
        .reduce(DELIVERY_FEE,
            (tally, cost) -> tally + cost);
}
Associativity
“you can regroup the operations and the result is unchanged”
(4 + 2) + 1 = 4 + (2 + 1) = 7
(4 * 2) * 1 = 4 * (2 * 1) = 8
Identity
“the do nothing value”
0 + 5 = 5
1 * 5 = 5
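Why the identity matters: in a parallel reduce, each chunk of data starts from the seed, so a non-identity seed gets counted more than once. A minimal sketch (values illustrative):

import java.util.stream.IntStream;

public class ReduceSeedDemo {
    public static void main(String[] args) {
        // Sequential: 100 + (1 + 2 + ... + 10) = 155
        int sequential = IntStream.rangeClosed(1, 10)
                                  .reduce(100, Integer::sum);

        // Parallel: each chunk starts from 100, so the seed is
        // counted once per chunk -- typically prints more than 155
        int parallel = IntStream.rangeClosed(1, 10)
                                .parallel()
                                .reduce(100, Integer::sum);

        System.out.println(sequential + " vs " + parallel);
    }
}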
How to fix reduce
int totalCost(List<Purchase> items) {
    return DELIVERY_FEE
        + items.parallelStream()
            .mapToInt(Purchase::getCost)
            .reduce(0, (tally, cost) -> tally + cost);
}
How to fix reduce (2)
int totalCost(List<Purchase> items) {
return DELIVERY_FEE
+ items.parallelStream()
.mapToInt(Purchase::getCost)
.sum();
}
DON’T hold locks
List<Integer> values = getValues();
CountDownLatch latch = new CountDownLatch(values.size());
values.parallelStream()
    .forEach(i -> {
        try {
            doSomething(i);
            // Potential Deadlock
            latch.countDown();
        } catch (Exception e) {
            e.printStackTrace();
        }
    });
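One way out (a sketch, assuming doSomething is thread-safe and throws no checked exception): forEach on a parallel stream does not return until every element has been processed, so the latch is unnecessary.

// No latch needed: this call blocks until all elements are done
values.parallelStream()
      .forEach(i -> doSomething(i));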
No mutable state!
public static long sideEffectParallelSum(long n) {
    Accumulator accumulator = new Accumulator();
    LongStream.rangeClosed(1, n).parallel()
        .forEach(accumulator::add); // BROKEN: data race on total
    return accumulator.total;
}

public static class Accumulator {
    private long total = 0;
    public void add(long value) {
        total += value; // not atomic: read-modify-write
    }
}
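The safe alternative is to let the stream own the state with a built-in reduction, e.g.:

import java.util.stream.LongStream;

public static long parallelSum(long n) {
    // the reduction is managed internally,
    // so there is no shared mutable state
    return LongStream.rangeClosed(1, n)
                     .parallel()
                     .sum();
}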
Parallel Code Summary
● Very easy to make your code parallel,
but …
● Sometimes you can get away with things
sequentially that you can’t in parallel
○ sources
○ reduce
○ locks
○ unprotected mutable data
Performance and Internals
Under the hood
● Work distributed using Fork/Join framework
● Distributed by data
● New abstraction: Spliterator
Parallel Integer Sums
int sum =
    values.parallelStream()
        .mapToInt(i -> i) // unbox Integers into an IntStream
        .sum();
Spliterator
// Simplified excerpt: the real java.util.Spliterator also declares
// tryAdvance(), estimateSize() and characteristics()
public interface Spliterator<T> {
    /** Carve off a portion of the data
        into a separate Spliterator */
    Spliterator<T> trySplit();

    /** Iterate the data described by this Spliterator */
    void forEachRemaining(Consumer<? super T> action);

    /** The size of the data described
        by this Spliterator, if known */
    long getExactSizeIfKnown();
}
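A small sketch of splitting in action (values illustrative): trySplit() carves off roughly half the data into the returned Spliterator, and the framework applies this recursively to divide work across threads.

import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

public class SplitDemo {
    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

        Spliterator<Integer> rest = data.spliterator();
        // carve off roughly half the elements for another thread
        Spliterator<Integer> half = rest.trySplit();

        System.out.println(half.getExactSizeIfKnown()); // 4
        System.out.println(rest.getExactSizeIfKnown()); // 4
    }
}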
Always a tradeoff ...
● Parallelism eats more CPU time
○ Thread communication
○ Distributing & Decomposing work
○ Potentially increased memory pressure
○ Competing for the CPU with other processes
● It can reduce wall time
○ Time from beginning to end of the processes’
execution
○ Ideally only need to wait for 1/N of the execution
time
Decomposition Performance
● Data Size
● Source Data Structure
● Packing
● Number of Cores
● Cost per Element
Data Structures
● Good
○ ArrayList / IntStream.range / Stream.of
○ Random Access + Easy to balance
● Meh
○ HashSet / TreeSet
○ Usually good balance
● Bad
○ LinkedList / BufferedReader.lines() / Stream.iterate()
○ Unknown length (sketch below)
○ Poor random access performance
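Why unknown length hurts, in one sketch: a sized source can report its element count up front so the work can be balanced in advance, while an iterated source cannot.

import java.util.Arrays;
import java.util.stream.Stream;

public class SizeDemo {
    public static void main(String[] args) {
        // an array-backed source knows its size: prints 3
        System.out.println(
            Arrays.asList(1, 2, 3).spliterator().getExactSizeIfKnown());

        // Stream.iterate() does not: prints -1 (unknown)
        System.out.println(
            Stream.iterate(0, i -> i + 1).spliterator().getExactSizeIfKnown());
    }
}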
Stateful Operations
● Stateless
○ no need to keep state when evaluated
○ eg: map, reduce
○ superior parallel decomposition
○ bounded amounts of data
● Stateful
○ accumulate state during evaluation
○ eg: sorted
○ may buffer unbounded amounts of data (sketch below)
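A sketch contrasting the two (ranges illustrative): map() handles each element independently, while sorted() must buffer the whole stream before emitting anything downstream.

import java.util.stream.IntStream;

public class StatefulDemo {
    public static void main(String[] args) {
        // stateless: map processes each element independently
        int sum = IntStream.range(0, 1_000)
                           .parallel()
                           .map(i -> i * 2)
                           .sum();

        // stateful: sorted() buffers every element before
        // a single one can flow downstream
        int[] ascending = IntStream.range(0, 1_000)
                                   .parallel()
                                   .map(i -> 999 - i)
                                   .sorted()
                                   .toArray();

        System.out.println(sum + " " + ascending[0]); // 999000 0
    }
}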
Benchmarking and Testing
● Don’t assume parallel = faster, measure it
● Use jmh:
http://openjdk.java.net/projects/code-tools/jmh/
● Best Practices (minimal JMH sketch below)
○ Warmup
○ Repeatability
○ Evade JIT optimisations such as dead-code elimination
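A minimal JMH sketch (class, method, and parameter values are illustrative) comparing sequential and parallel sums; returning the result lets JMH consume it, so the JIT cannot eliminate the work as dead code.

import java.util.concurrent.TimeUnit;
import java.util.stream.LongStream;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 5)       // let the JIT settle first
@Measurement(iterations = 5)  // repeat for stable numbers
@Fork(1)
@State(Scope.Benchmark)
public class SumBenchmark {

    @Param({"1000", "10000000"}) // small and large inputs
    long n;

    @Benchmark
    public long sequentialSum() {
        return LongStream.rangeClosed(1, n).sum();
    }

    @Benchmark
    public long parallelSum() {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }
}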
Summary
Lesson Summary
● Easy to obtain Data Parallelism
● Pick your situation well
● A lot of performance influencers
● Benchmark your parallel code
The End
Exercise
In: com.java_8_training.problems.data_parallelism
1. Look at OptimisationExample
2. Try to improve the performance of this code
3. Measure performance using the benchmark harness
4. Don’t make the code uglier!
Exercise
In: com.java_8_training.problems.data_parallelism
1. Parallelise the sum of squares method
Question1Test
2. Fix the bug in the "multiplyThrough" method
Question2Test
3. Remove the locks and keep the code safe
Question3Test
Amdahl’s Law
● Defines upper bound for parallel speedup
● Time(n) = Time(1) * (s + 1/n * (1 - s))
○ n = number of cores
○ s = proportion of code that is strictly serial
● Speedup(n) = 1 / (s + 1/n * (1 - s))
● Example
○ 1024 cores, 50% serial
○ 1 / (0.5 + 1/1024 * (1 - 0.5)) ~= 2x speedup