Java Stream API & Collectors — Complete Practitioner’s Guide
Generated on August 16, 2025 15:15
How to Read This Guide
This document is a practical, interview-ready reference to the Java Stream ecosystem:
Stream/BaseStream, primitive streams (IntStream, LongStream, DoubleStream), Optional types, and the
Collectors toolkit. For each method, you’ll find purpose, key behaviors, examples, and corner cases.
Common/most useful APIs are {\f1\cf5 highlighted like this}.
Streams in One Minute
• A Stream is a sequence of elements supporting aggregate operations in a pipelined fashion.
• Pipelines have three parts: source → intermediate operations → terminal operation.
• Intermediate operations are lazy: nothing executes until a terminal operation is invoked, and a stream can be consumed only once.
• Streams don’t store data; they view data. Most operations are non-mutating.
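A minimal sketch of laziness, using only the JDK (the class name `LazinessDemo` is hypothetical). The filter records every element it examines. Building the pipeline records nothing. Only the terminal `findFirst()` pulls elements, and it stops as soon as one matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class LazinessDemo {
    // Builds a pipeline, optionally runs a terminal op, and reports which elements the filter saw.
    static List<String> trace(boolean runTerminal) {
        List<String> seen = new ArrayList<>();
        Stream<String> pipeline = Stream.of("apple", "banana", "cherry", "date")
                .filter(s -> {
                    seen.add(s);                 // tracing side effect, for illustration only
                    return s.startsWith("b");
                });
        if (runTerminal) {
            pipeline.findFirst();                // terminal op pulls elements until the first match
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(trace(false)); // []: intermediate ops alone do nothing
        System.out.println(trace(true));  // [apple, banana]: short-circuits after the match
    }
}
```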
Creating Streams (Sources)
• {\f1\cf5 Collection.stream(), parallelStream()} — from in-memory collections.
• {\f1\cf5 Stream.of(T...)}, {\f1\cf5 Stream.ofNullable(T)} — varargs / a single possibly-null element (JDK 9+).
• {\f1\cf5 Stream.generate(Supplier)} — infinite stream; use {\f1\cf5 limit(n)}.
• {\f1\cf5 Stream.iterate(seed, UnaryOperator)} — infinite; {\f1\cf5 iterate(seed, hasNext, next)} is bounded (JDK 9+).
• {\f1\cf5 Arrays.stream(array)} — arrays; boxed or primitive variants.
• {\f1\cf5 Files.lines(Path)}, {\f1\cf5 BufferedReader.lines()} — I/O-backed streams; close the {\f1\cf5 Files.lines} stream (or the reader) with try-with-resources.
• {\f1\cf5 Pattern.splitAsStream(CharSequence)} — splitting text as a stream.
--- code ---
// Common sources
Stream<String> a = Stream.of("a", "b", "c"); // common
Stream<String> maybeOne = Stream.ofNullable(System.getenv("USER")); // JDK 9+
Stream<Integer> evens = Stream.iterate(0, x -> x + 2).limit(5); // 0,2,4,6,8
--- end code ---
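A few more of the sources listed above, sketched as small static methods in a hypothetical `SourcesDemo` class: the bounded three-argument `iterate`, `Pattern.splitAsStream`, `Arrays.stream` on an `int[]`, and `Stream.ofNullable` with a null argument.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class SourcesDemo {
    // Bounded iterate (JDK 9+): seed, hasNext predicate, next function, like a for loop.
    static List<Integer> powersOfTwoBelow(int limit) {
        return Stream.iterate(1, x -> x < limit, x -> x * 2).toList(); // toList() is JDK 16+
    }

    // splitAsStream splits lazily instead of building a String[] first.
    static List<String> splitCsv(String line) {
        return Pattern.compile("\\s*,\\s*").splitAsStream(line).toList();
    }

    // Arrays.stream on an int[] yields an IntStream, so there is no boxing.
    static int sum(int[] values) {
        return Arrays.stream(values).sum();
    }

    // ofNullable (JDK 9+) yields an empty stream for null, a one-element stream otherwise.
    static long countPresent(String maybeNull) {
        return Stream.ofNullable(maybeNull).count();
    }
}
```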
Intermediate Operations
Stateless
• {\f1\cf5 filter(Predicate)} — keep matching elements.
• {\f1\cf5 map(Function)} — transform each element.
• {\f1\cf5 mapToInt/Long/Double} — specialized projections.
• {\f1\cf5 flatMap(Function<T,Stream<R>>)} — flatten one level.
• {\f1\cf5 flatMapToInt/Long/Double} — flatten to primitives.
• {\f1\cf5 mapMulti(BiConsumer<T,Consumer<R>>) (JDK 16+)} — emit 0..N results per input
without creating intermediate streams (perf).
• {\f1\cf5 peek(Consumer)} — for debugging/observation only; don’t rely on its side effects.
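A sketch contrasting `flatMap` and `mapMulti` on the same one-to-many job (emit `n` and `-n` for positive inputs, nothing otherwise); the class name is hypothetical. `flatMap` builds a small stream per element, while `mapMulti` pushes results into a consumer. Note the explicit `<Integer>` type witness, which `mapMulti` usually needs because the result type cannot be inferred from the lambda.

```java
import java.util.List;
import java.util.stream.Stream;

class MapMultiDemo {
    // flatMap: each input returns a (possibly empty) Stream.
    static List<Integer> withFlatMap(List<Integer> input) {
        return input.stream()
                .flatMap(n -> n > 0 ? Stream.of(n, -n) : Stream.empty())
                .toList();
    }

    // mapMulti (JDK 16+): each input pushes 0..N results into the downstream Consumer.
    static List<Integer> withMapMulti(List<Integer> input) {
        return input.stream()
                .<Integer>mapMulti((n, downstream) -> {
                    if (n > 0) {                 // emit nothing for non-positive inputs
                        downstream.accept(n);
                        downstream.accept(-n);
                    }
                })
                .toList();
    }
}
```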
Stateful
• {\f1\cf5 distinct()} — deduplicate via equals/hashCode.
• {\f1\cf5 sorted() / sorted(Comparator)} — natural/custom order.
• {\f1\cf5 limit(n), skip(n)} — truncation / offset.
• {\f1\cf5 takeWhile(Predicate), dropWhile(Predicate) (JDK 9+)} — keep / drop the longest prefix whose elements match the predicate.
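A small sketch of how `takeWhile` and `dropWhile` differ from `filter` on the same ordered data (class name hypothetical). The slicing operations stop at the first element that fails the predicate; `filter` tests every element independently.

```java
import java.util.List;

class SlicingDemo {
    static final List<Integer> DATA = List.of(1, 2, 3, 10, 2, 1);

    // Longest matching prefix; stops at the first failure (10).
    static List<Integer> takeSmall() {
        return DATA.stream().takeWhile(x -> x < 5).toList();   // [1, 2, 3]
    }

    // Drops that prefix and keeps everything after it, including later matches.
    static List<Integer> dropSmall() {
        return DATA.stream().dropWhile(x -> x < 5).toList();   // [10, 2, 1]
    }

    // filter checks each element on its own.
    static List<Integer> filterSmall() {
        return DATA.stream().filter(x -> x < 5).toList();      // [1, 2, 3, 2, 1]
    }
}
```

On an unordered stream where only some elements match, the JDK documents the result of these two operations as nondeterministic, so they are best used on ordered sources.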
Stream configuration
• {\f1\cf5 sequential(), parallel()} — set the execution mode for the whole pipeline; the last call wins.
• {\f1\cf5 unordered()} — allow relaxed ordering when safe.
• {\f1\cf5 onClose(Runnable)} — handler that runs when {\f1\cf5 close()} is called on the stream.
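A minimal sketch of the configuration methods (hypothetical class name): calling `parallel()` and later `sequential()` leaves the whole pipeline sequential, as reported by `isParallel()`, and an `onClose` handler runs when try-with-resources closes the stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class ConfigDemo {
    // The mode applies to the entire pipeline; the last parallel()/sequential() call decides it.
    static boolean lastCallWins() {
        return Stream.of(1, 2, 3).parallel().map(x -> x * 2).sequential().isParallel(); // false
    }

    // onClose handlers run when close() is invoked, here by try-with-resources.
    static List<String> closeLog() {
        List<String> log = new ArrayList<>();
        try (Stream<String> s = Stream.of("a", "b").onClose(() -> log.add("closed"))) {
            log.add("count=" + s.count());
        }
        return log; // [count=2, closed]
    }
}
```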
--- code ---
// Example: flatten and deduplicate sorted tags
List<String> tags = posts.stream()
.flatMap(p -> p.getTags().stream())
.map(String::toLowerCase) // common
.distinct()
.sorted()
.toList(); // JDK 16+
--- end code ---
Terminal Operations
• {\f1\cf5 forEach(Consumer) / forEachOrdered(Consumer)} — consume; ordered variant
preserves encounter order.
• {\f1\cf5 toArray() / toArray(IntFunction<A[]>)} — materialize array.
• {\f1\cf5 reduce(identity, accumulator) / reduce(accumulator) / reduce(identity, accumulator,
combiner)} — fold.
• {\f1\cf5 collect(Collector) / collect(supplier, accumulator, combiner)} — general reduction.
• {\f1\cf5 min/max(Comparator)}, {\f1\cf5 count()} — extremes as an Optional; element count as a long.
• {\f1\cf5 anyMatch/allMatch/noneMatch(Predicate)} — short-circuit checks.
• {\f1\cf5 findFirst()/findAny()} — Optional results; in parallel, {\f1\cf5 findAny} may be faster.
• {\f1\cf5 toList() (JDK 16+)} — unmodifiable List (common & recommended).
--- code ---
// Reduce vs Collect
int sum = numbers.stream().reduce(0, Integer::sum);                    // reduce
int sum2 = numbers.stream().collect(Collectors.summingInt(x -> x));  // collect
--- end code ---
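The three-argument forms of `reduce` and `collect` are easier to see side by side. This is a sketch with a hypothetical class name: `reduce` folds into an immutable value whose type differs from the element type, `collect` fills a fresh mutable container, and the combiner in each merges partial results when the stream runs in parallel. `anyMatch` shows a short-circuiting terminal operation.

```java
import java.util.ArrayList;
import java.util.List;

class ReductionDemo {
    // reduce(identity, accumulator, combiner): String elements folded into an int.
    static int totalLength(List<String> words) {
        return words.stream().reduce(
                0,                                // identity: neutral element for +
                (acc, w) -> acc + w.length(),     // fold one element into a partial sum
                Integer::sum);                    // merge two partial sums (parallel case)
    }

    // collect(supplier, accumulator, combiner): mutable reduction.
    static ArrayList<String> upperCased(List<String> words) {
        return words.stream().collect(
                ArrayList::new,                           // one container per (sub)task
                (list, w) -> list.add(w.toUpperCase()),   // add an element to a container
                ArrayList::addAll);                       // merge two containers
    }

    // Short-circuits as soon as one element matches.
    static boolean anyLong(List<String> words) {
        return words.stream().anyMatch(w -> w.length() > 5);
    }
}
```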
Primitive Streams (IntStream, LongStream, DoubleStream) — What’s Special?
• Avoid boxing overhead and add numeric ops: {\f1\cf5 sum(), average(), summaryStatistics()}; the factories {\f1\cf5 range(), rangeClosed()} exist on IntStream and LongStream only.
• Conversions: {\f1\cf5 mapToObj, boxed, asLongStream, asDoubleStream} ({\f1\cf5 asLongStream} is on IntStream only)
• Corner case: {\f1\cf5 average()} returns {\f1\cf5 OptionalDouble} — handle empty streams.
--- code ---
IntSummaryStatistics s = IntStream.of(1,2,3).summaryStatistics();
System.out.println(s.getCount()+", "+s.getSum()+", "+s.getAverage());
--- end code ---
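A sketch of the primitive-stream specifics listed above (hypothetical class name): `range` is half-open while `rangeClosed` includes its end, `average()` must be unwrapped from an `OptionalDouble`, and `boxed()` / `mapToObj()` move back to object streams.

```java
import java.util.List;
import java.util.OptionalDouble;
import java.util.stream.IntStream;

class PrimitiveDemo {
    // range is [start, end); rangeClosed is [start, end].
    static int sumRange(int n)       { return IntStream.range(0, n).sum(); }
    static int sumRangeClosed(int n) { return IntStream.rangeClosed(0, n).sum(); }

    // average() is empty for an empty stream, so pick a default explicitly.
    static double averageOrZero(int[] values) {
        OptionalDouble avg = IntStream.of(values).average();
        return avg.orElse(0.0);
    }

    // boxed() when you need a List<Integer>; mapToObj() for arbitrary objects.
    static List<Integer> boxedSquares(int n) {
        return IntStream.rangeClosed(1, n).map(x -> x * x).boxed().toList();
    }

    static List<String> labels(int n) {
        return IntStream.range(0, n).mapToObj(i -> "item-" + i).toList();
    }
}
```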
Optionals from Streams
• {\f1\cf5 findFirst/findAny/min/max} → {\f1\cf5 Optional<T>}
• Primitive variants: {\f1\cf5 OptionalInt/Long/Double}
• Common handling: {\f1\cf5 orElse, orElseGet, orElseThrow, ifPresent, ifPresentOrElse}
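A short sketch of the common Optional handling patterns on stream results (hypothetical class and method names). `orElse` supplies a fixed default, `orElseGet` computes one lazily, `orElseThrow()` on `OptionalInt` (JDK 10+) throws `NoSuchElementException` when empty, and `ifPresentOrElse` (JDK 9+) handles both cases.

```java
import java.util.List;
import java.util.Optional;
import java.util.OptionalInt;
import java.util.stream.IntStream;

class OptionalDemo {
    static String firstLongWord(List<String> words) {
        Optional<String> first = words.stream().filter(w -> w.length() > 4).findFirst();
        return first.orElse("<none>");              // fixed default when nothing matched
    }

    static String firstOrComputed(List<String> words) {
        // The supplier runs only if the Optional is empty.
        return words.stream().findFirst().orElseGet(() -> "default-" + words.size());
    }

    static int maxOrThrow(int[] values) {
        OptionalInt max = IntStream.of(values).max();
        return max.orElseThrow();                   // NoSuchElementException if empty
    }

    static String describe(List<String> words) {
        StringBuilder out = new StringBuilder();
        words.stream().findAny().ifPresentOrElse(
                w -> out.append("found ").append(w),
                () -> out.append("empty"));
        return out.toString();
    }
}
```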
Parallel Streams — Use with Care
• Good for CPU-bound, associative operations over large, non-contentious data.
• Avoid with I/O, synchronization, or tiny datasets.
• Ensure the accumulator and {\f1\cf5 combiner} in {\f1\cf5 reduce/collect} are associative and side-effect free, and that the identity is a true identity (e.g., 0 for +).
--- code ---
// Parallel frequency count (Collector is associative)
Map<String, Long> freq = words.parallelStream()
.collect(Collectors.groupingByConcurrent(w -> w,
Collectors.counting()));
--- end code ---
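A sketch of what "associative and side-effect free" buys you (hypothetical class name): with a true identity and an associative operator, the parallel reduction matches the sequential one, and collecting with `toList()` keeps encounter order even in parallel. The commented counter-example is not executed.

```java
import java.util.List;
import java.util.stream.IntStream;

class ParallelDemo {
    // 0 is a true identity for +, and Long::sum is associative, so both runs agree.
    static boolean sumsAgree(int n) {
        long sequential = IntStream.rangeClosed(1, n).asLongStream().sum();
        long parallel = IntStream.rangeClosed(1, n).parallel().asLongStream()
                .reduce(0L, Long::sum);
        return sequential == parallel;
    }

    // Collecting preserves encounter order in parallel (parallel forEach would not).
    static List<Integer> parallelSquares(int n) {
        return IntStream.rangeClosed(1, n).parallel().map(x -> x * x).boxed().toList();
    }

    // Counter-example (not run): reduce(10, Integer::sum) in parallel may add 10 once
    // per split, because 10 is not an identity for addition.
}
```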
Collectors — The Swiss Army Knife
Materializing
• {\f1\cf5 toList()} (no guarantees on type, mutability, or thread-safety) and {\f1\cf5 toUnmodifiableList()} (JDK 10+)
• {\f1\cf5 toSet()}, {\f1\cf5 toUnmodifiableSet()} (JDK 10+)
• {\f1\cf5 toCollection(Supplier<C>)} — choose collection type (e.g., LinkedHashSet).
• {\f1\cf5 joining() / joining(delim[, prefix, suffix])} — concatenate CharSequence.
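A sketch of the materializing collectors (hypothetical class name): `toCollection` picks the container type, here a `TreeSet` that sorts and deduplicates; `joining` takes a delimiter, prefix, and suffix; and `toUnmodifiableList` rejects mutation.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

class MaterializeDemo {
    // toCollection: a TreeSet gives sorted, de-duplicated output.
    static TreeSet<String> sortedUnique(List<String> words) {
        return words.stream().collect(Collectors.toCollection(TreeSet::new));
    }

    // joining(delimiter, prefix, suffix); an empty stream still gets prefix + suffix.
    static String bracketed(List<String> words) {
        return words.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    // toUnmodifiableList (JDK 10+): add() throws UnsupportedOperationException.
    static boolean isUnmodifiable(List<String> words) {
        List<String> result = words.stream().collect(Collectors.toUnmodifiableList());
        try {
            result.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```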
Maps
• {\f1\cf5 toMap(keyMapper, valueMapper)} — throws {\f1\cf5 IllegalStateException} on duplicate keys.
• {\f1\cf5 toMap(kMapper, vMapper, mergeFn)} — resolve duplicates (common).
• {\f1\cf5 toMap(kMapper, vMapper, mergeFn, mapSupplier)} — choose map type.
• {\f1\cf5 toUnmodifiableMap(...)} (JDK 10+)
• {\f1\cf5 toConcurrentMap(...)} — concurrent accumulation.
--- code ---
// Safe toMap with merge on duplicate keys (keep larger value)
Map<String,Integer> bestScore =
entries.stream().collect(Collectors.toMap(
e -> e.name(),
e -> e.score(),
Integer::max
));
--- end code ---
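A sketch of the other two `toMap` behaviors listed above (hypothetical class name): the two-argument form throws `IllegalStateException` when keys collide, and the four-argument form adds a merge function plus a map supplier, here a `TreeMap` so keys come out sorted.

```java
import java.util.List;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapDemo {
    // Two words with the same initial collide on the key.
    static boolean duplicateKeyThrows(List<String> words) {
        try {
            words.stream().collect(Collectors.toMap(w -> w.charAt(0), Function.identity()));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Merge function resolves collisions; the supplier chooses the map type.
    static TreeMap<Character, String> longestByInitial(List<String> words) {
        return words.stream().collect(Collectors.toMap(
                w -> w.charAt(0),                             // key: first letter
                Function.identity(),                          // value: the word itself
                (a, b) -> a.length() >= b.length() ? a : b,   // keep the longer word
                TreeMap::new));                               // keys kept sorted
    }
}
```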
Grouping & Partitioning
• {\f1\cf5 groupingBy(classifier)} — Map<K,List<V>>
• {\f1\cf5 groupingBy(classifier, downstream)} — Map<K,R>
• {\f1\cf5 groupingBy(classifier, mapFactory, downstream)} — choose map type (e.g.,
LinkedHashMap, TreeMap).
• {\f1\cf5 groupingByConcurrent(...)} — concurrent version.
• {\f1\cf5 partitioningBy(predicate)} — Map<Boolean,List<T>>
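• {\f1\cf5 partitioningBy(predicate, downstream)} — apply a downstream collector to each side; both the {\f1\cf5 true} and {\f1\cf5 false} keys are always present.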
--- code ---
// Group employees by department and count
Map<String, Long> counts =
emps.stream().collect(Collectors.groupingBy(
Emp::dept, Collectors.counting()
));
--- end code ---
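A sketch of the three-argument `groupingBy` combined with `mapping`, and of `partitioningBy` with a downstream collector (hypothetical class name). The `TreeMap` factory sorts groups by key, and the partition map keeps an empty side with a count of 0.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupingDemo {
    // groupingBy(classifier, mapFactory, downstream): sorted keys, reshaped values.
    static TreeMap<Integer, List<String>> byLength(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(
                String::length,                                  // key: word length
                TreeMap::new,                                    // sorted map
                Collectors.mapping(String::toUpperCase, Collectors.toList())));
    }

    // partitioningBy always yields both keys, even when one side has no elements.
    static Map<Boolean, Long> evenOddCounts(List<Integer> numbers) {
        return numbers.stream().collect(Collectors.partitioningBy(
                n -> n % 2 == 0, Collectors.counting()));
    }
}
```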
Math & Stats
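• {\f1\cf5 counting()} — element count as a Long
• {\f1\cf5 summingInt/Long/Double(mapper)} — sum (0 for no elements)
• {\f1\cf5 averagingInt/Long/Double(mapper)} — mean as a Double; 0 (not an empty Optional) for no elements
• {\f1\cf5 summarizingInt/Long/Double(mapper)} — count, sum, min, average, and max in one pass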
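A sketch of the stats collectors over a hypothetical `Sale` record (JDK 16+), inside a hypothetical `StatsDemo` class. Note the contrast with `IntStream.average()`: `averagingInt` returns a plain `Double`, which is 0.0 for an empty input.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class StatsDemo {
    record Sale(String region, int amount) {}   // hypothetical element type

    // Per-group mean, one Double per region.
    static Map<String, Double> averageByRegion(List<Sale> sales) {
        return sales.stream().collect(Collectors.groupingBy(
                Sale::region, Collectors.averagingInt(Sale::amount)));
    }

    // Count, sum, min, average, and max in a single pass.
    static IntSummaryStatistics summary(List<Sale> sales) {
        return sales.stream().collect(Collectors.summarizingInt(Sale::amount));
    }

    // No elements: averagingInt yields 0.0 rather than an empty Optional.
    static double averageOfNone() {
        return List.<Sale>of().stream().collect(Collectors.averagingInt(Sale::amount));
    }
}
```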
Transformers & Advanced
• {\f1\cf5 mapping(mapper, downstream)} — map + collect in one pass.
• {\f1\cf5 flatMapping(mapperToStream, downstream)} (JDK 9+) — flatMap + collect.
• {\f1\cf5 filtering(predicate, downstream)} (JDK 9+) — filter within group.
• {\f1\cf5 reducing(identity, mapper, op)} — reduction as a collector.
• {\f1\cf5 collectingAndThen(downstream, finisher)} — post-process result.
• {\f1\cf5 teeing(down1, down2, merger) (JDK 12+)} — combine two collectors.
--- code ---
// Top N per group using collectingAndThen
Map<String, List<Employee>> top3 =
emps.stream().collect(Collectors.groupingBy(
Employee::dept,
Collectors.collectingAndThen(
Collectors.toList(),
list -> list.stream()
.sorted(Comparator.comparing(Employee::score).reversed())
.limit(3)
.toList()
)
));
--- end code ---
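Two of the advanced collectors deserve a concrete look. This sketch uses a hypothetical `Emp` record nested in a hypothetical class. Filtering before grouping drops a department with no matches entirely, while `filtering` (JDK 9+) keeps it with an empty list. `teeing` (JDK 12+) runs two collectors over the same elements and merges their results; computing an average this way only illustrates the shape, since `averagingInt` already does it directly.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class AdvancedCollectorsDemo {
    record Emp(String dept, int salary) {}   // hypothetical element type

    // filter() first: departments with no high earner disappear from the map.
    static Map<String, List<Emp>> highEarnersFilterFirst(List<Emp> emps) {
        return emps.stream()
                .filter(e -> e.salary() > 100)
                .collect(Collectors.groupingBy(Emp::dept));
    }

    // filtering() inside the group: every department stays, possibly with an empty list.
    static Map<String, List<Emp>> highEarnersFiltering(List<Emp> emps) {
        return emps.stream().collect(Collectors.groupingBy(
                Emp::dept,
                Collectors.filtering(e -> e.salary() > 100, Collectors.toList())));
    }

    // teeing: sum and count in one pass, merged into a mean (0.0 when empty).
    static double averageSalary(List<Emp> emps) {
        return emps.stream().collect(Collectors.teeing(
                Collectors.summingInt(Emp::salary),
                Collectors.counting(),
                (sum, count) -> count == 0 ? 0.0 : (double) sum / count));
    }
}
```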
Collector Mechanics (for Custom Collectors)
• A Collector has {\f1\cf5 supplier, accumulator, combiner, finisher, characteristics}
• Characteristics: {\f1\cf5 UNORDERED, CONCURRENT, IDENTITY_FINISH}
• Rule: the combiner must merge two partial results associatively so the collector stays correct under parallel execution.
--- code ---
// Minimal custom Collector: joining ints with brackets
Collector<Integer,StringJoiner,String> bracketJoin =
Collector.of(
() -> new StringJoiner(", ", "[", "]"),
(sj, i) -> sj.add(String.valueOf(i)),
(a, b) -> a.merge(b),
StringJoiner::toString
);
String s = Stream.of(1,2,3).collect(bracketJoin); // [1, 2, 3]
--- end code ---
Corner Cases & Gotchas (Quick Hits)
• Empty {\f1\cf5 min/max/average} → empty Optional; handle default.
• {\f1\cf5 Collectors.toMap} throws {\f1\cf5 IllegalStateException} on duplicate keys unless a merge function is provided, and {\f1\cf5 NullPointerException} if a mapped value is null.
• {\f1\cf5 peek} never runs without a terminal operation, and may be skipped even with one (e.g., {\f1\cf5 count()} on a sized source, JDK 9+); don’t rely on its side effects.
• Parallel {\f1\cf5 forEach} is unordered; use {\f1\cf5 forEachOrdered} for order (slower).
• {\f1\cf5 Stream.toList()} is unmodifiable; trying to {\f1\cf5 add} throws {\f1\cf5 UnsupportedOperationException}.
• {\f1\cf5 Files.lines} creates a stream that must be closed (try-with-resources).
• {\f1\cf5 distinct} uses equals/hashCode; mutable elements can break it.
• Avoid shared mutable state in lambdas; use collectors instead.
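A sketch of three of these gotchas (hypothetical class name). `Stream.toList()` rejects `add`. `forEachOrdered` on a parallel stream keeps encounter order and runs its actions one at a time, so appending to a plain `ArrayList` there is safe, unlike with parallel `forEach`. Letting a collector build the result avoids shared mutable state altogether.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class GotchasDemo {
    // Stream.toList() (JDK 16+) returns an unmodifiable list.
    static boolean toListRejectsAdd() {
        List<String> list = Stream.of("a", "b").toList();
        try {
            list.add("c");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // forEachOrdered: encounter order, one action at a time, even in parallel.
    static List<Integer> orderedFromParallel(int n) {
        List<Integer> out = new ArrayList<>();
        IntStream.range(0, n).parallel().boxed().forEachOrdered(out::add);
        return out;
    }

    // Preferred: no shared mutable list; the collecting terminal op builds the result.
    static List<Integer> collected(int n) {
        return IntStream.range(0, n).parallel().boxed().toList();
    }
}
```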
Most Useful Day-to-Day APIs
• {\f1\cf5 filter → map → collect(toList())}
• {\f1\cf5 flatMap} for one-to-many transformations
• {\f1\cf5 groupingBy + counting/summing/collectingAndThen}
• {\f1\cf5 toMap with merge function}
• {\f1\cf5 toList()} (JDK 16+) over {\f1\cf5 Collectors.toList()} when you want unmodifiable results
• {\f1\cf5 takeWhile/dropWhile} for streaming prefixes/suffixes
Worked Examples
Frequency Map
--- code ---
Map<String, Long> freq =
words.stream()
.map(String::toLowerCase)
.collect(Collectors.groupingBy(w -> w,
Collectors.counting()));
--- end code ---
First Non-Empty String
--- code ---
Optional<String> first =
strings.stream().filter(s -> s != null && !s.isBlank()).findFirst();
--- end code ---
Safe toMap with Duplicates
--- code ---
Map<String, String> latest =
entries.stream().collect(Collectors.toMap(
e -> e.key(),
e -> e.value(),
(a,b) -> b // keep last
));
--- end code ---
API Reference — Stream<T> (by category)
Creation
of, ofNullable, empty, generate, iterate (2 overloads), builder, concat; Arrays.stream;
Collection.stream/parallelStream; Files.lines; Pattern.splitAsStream, BufferedReader.lines
Intermediate
filter, map, mapToInt/Long/Double, flatMap, flatMapToInt/Long/Double, mapMulti (JDK 16+), distinct, sorted,
peek, limit, skip, takeWhile, dropWhile, parallel, sequential, unordered, onClose
Terminal
forEach, forEachOrdered, toArray, reduce (3 overloads), collect(Collector), collect(supplier,acc,combiner), min, max,
count, anyMatch, allMatch, noneMatch, findFirst, findAny, toList (JDK 16+)
API Reference — IntStream/LongStream/DoubleStream
range, rangeClosed (Int/Long); sum, average, min, max, count, summaryStatistics; map, mapToObj,
flatMap, mapMulti (JDK 16+); distinct, sorted, limit, skip; boxed; asDoubleStream (Int/Long), asLongStream (Int only);
parallel, sequential; collect (supplier/accumulator/combiner form only, no collect(Collector)); reduce;
anyMatch/allMatch/noneMatch; findFirst/findAny; toArray; iterate/generate
API Reference — Collectors
toList, toUnmodifiableList, toSet, toUnmodifiableSet, toCollection, toMap (3 overloads),
toUnmodifiableMap, toConcurrentMap (3 overloads), joining (3 overloads), counting,
summingInt/Long/Double, averagingInt/Long/Double, summarizingInt/Long/Double, mapping, filtering,
flatMapping, reducing (3 overloads), collectingAndThen, partitioningBy (2 overloads), groupingBy (3
overloads), groupingByConcurrent (3 overloads), teeing (JDK 12+)
Under the Hood: Spliterators
• A Stream is backed by a {\f1\cf5 Spliterator} with characteristics like {\f1\cf5 ORDERED,
DISTINCT, SORTED, SIZED, NONNULL, IMMUTABLE, CONCURRENT, SUBSIZED}
• Parallel splits work best with balanced, efficiently splittable sources (e.g., ArrayList).
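A sketch that inspects spliterators directly (hypothetical class name). An `ArrayList` spliterator reports `ORDERED`, `SIZED`, and `SUBSIZED` and splits into two non-empty halves (at the midpoint in OpenJDK), which is what makes it a good parallel source. An unbounded `Stream.iterate` source reports no size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;

class SpliteratorDemo {
    // hasCharacteristics returns true only if all the given flags are set.
    static boolean arrayListIsSized() {
        Spliterator<Integer> sp = new ArrayList<>(List.of(1, 2, 3, 4)).spliterator();
        return sp.hasCharacteristics(Spliterator.ORDERED | Spliterator.SIZED | Spliterator.SUBSIZED);
    }

    // trySplit hands off a prefix and keeps the rest; it may return null in general.
    static long[] splitSizes() {
        Spliterator<Integer> right = new ArrayList<>(List.of(1, 2, 3, 4)).spliterator();
        Spliterator<Integer> left = right.trySplit();   // non-null for a 4-element ArrayList
        return new long[] { left.estimateSize(), right.estimateSize() };
    }

    // An infinite iterate() source cannot know its size.
    static boolean iterateIsSized() {
        return Stream.iterate(0, x -> x + 1).spliterator().hasCharacteristics(Spliterator.SIZED);
    }
}
```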
Interview Tips & Patterns
• Explain laziness and stateless vs stateful operations with examples.
• Know {\f1\cf5 toMap} duplicate handling and {\f1\cf5 groupingBy + downstream} combos.
• Avoid side effects; prefer collectors and pure functions.
• Choose sequential vs parallel based on workload and data size.