ZSink.collectAll and Zstream.runAll use chunk #3587

borissmidt · 2020-05-14T21:21:21Z

Change ZSink.collectAll to return Chunk
Change ZStream.runAll to return Chunk

I think I left one TODO for ZIO.foreach, which also returns a List, but changing it would make this PR too big.

There was one inconvenience with the current test setup: I only had one compilation error left,
but because it didn't check all test cases, I had to compile many times, each time with only a single error reported.

#3575

CLAassistant · 2020-05-14T21:21:26Z

All committers have signed the CLA.

iravid

Thanks. One small comment.

streams/shared/src/main/scala/zio/stream/ZSink.scala

iravid

Taking a step back here, @borissmidt I'd like to see a benchmark of this first.

Building a list (and yes, reversing it) is very fast in Scala because the prepends are O(1) and the reversal can happen with a pre-allocated buffer.

According to this: https://www.lihaoyi.com/post/BenchmarkingScalaCollections.html#construction-performance the only thing that is faster than prepending to a list is pre-allocating an array and filling it up, but we can't do that because we don't know the size of the stream up front.

Another thing to consider when collecting to a chunk is the size limit: a ChunkBuilder accumulates elements into an array, so it has a hard size limit of about 2^31 elements. Not sure if this is a showstopper or not.

borissmidt · 2020-05-15T18:58:39Z

Maybe it would be better to use Seq as the return type instead? Then you could later switch to whatever gives the best performance without breaking binary compatibility.

I can imagine there are better structures than an array that has to be copied every so often, or a list with a lot of indirection. Also, doesn't reversing have to iterate over the whole list?
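A minimal sketch of the Seq idea, assuming a hypothetical collectAll over a plain Iterator rather than ZIO's actual sink: the public signature only promises Seq[A], so the concrete collection behind it (a Vector here) could be swapped later without changing the signature callers compile against.

```scala
// Hypothetical helper, not part of ZIO: callers only see the Seq interface.
object SeqReturnSketch {
  def collectAll[A](elements: Iterator[A]): Seq[A] = {
    // Today this happens to build a Vector; another Seq implementation could be
    // used later without changing the declared (and erased) return type.
    val builder = Vector.newBuilder[A]
    elements.foreach(builder += _)
    builder.result()
  }
}
```

The trade-off is that callers can no longer rely on the performance characteristics of a specific type (for example O(1) indexing), because the signature only guarantees Seq.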

How could I best benchmark this?

Looking at the Scala 2.13 source for reverse, it doesn't use a pre-allocated buffer, so it has to iterate over the full list:

  final override def reverse: List[A] = {
    var result: List[A] = Nil
    var these = this
    while (!these.isEmpty) {
      result = these.head :: result
      these = these.tail
    }
    result
  }

In the case of a list builder (ListBuffer, which is what List.newBuilder returns), it uses mutable cons cells and updates the 'next' field of the last cell each time you append to it.
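To illustrate the mutable-cons technique, here is a minimal sketch with a hypothetical null-terminated Cell type (Scala's own `::` keeps its mutable tail field private to the standard library, so user code cannot do this with List directly). Appending only updates the last cell's next pointer, and the chain is already in insertion order, so no reversal pass is needed.

```scala
// Hypothetical mutable cons cell, terminated by null. ListBuffer does the same
// with the internal `next` field of `::`.
final class Cell[A](val head: A, var next: Cell[A])

final class AppendOnlyCells[A] {
  private var first: Cell[A] = null
  private var last: Cell[A] = null

  // O(1) append: link a new cell after the current last cell.
  def addOne(a: A): this.type = {
    val cell = new Cell[A](a, null)
    if (last == null) first = cell else last.next = cell
    last = cell
    this
  }

  // Hands out the existing chain as-is (no copy), already in insertion order.
  def firstCell: Cell[A] = first
}

object Cells {
  // Helper for inspecting a chain, e.g. in tests.
  def toList[A](c: Cell[A]): List[A] =
    if (c == null) Nil else c.head :: toList(c.next)
}
```

The real ListBuffer also has to guard against mutating cells after they have been handed out as a List; this sketch does not.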

ArrayBuilder pre-allocates an array and doubles its size each time it runs out of capacity (it reminds me of the C++ vector class).
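A minimal sketch of that growth strategy, using a hypothetical IntArrayBuilder specialised to Int for brevity. The standard library's ArrayBuilder is generic, and its exact initial capacity and resizing details differ between Scala versions.

```scala
import java.util.Arrays

// Hypothetical growable builder illustrating amortised O(1) appends by doubling.
final class IntArrayBuilder(initialCapacity: Int = 16) {
  require(initialCapacity > 0)
  private var elems = new Array[Int](initialCapacity)
  private var size = 0

  def addOne(x: Int): this.type = {
    if (size == elems.length) {
      // Full: allocate twice the capacity and copy the existing elements once.
      // (Overflow near Int.MaxValue is not handled in this sketch.)
      elems = Arrays.copyOf(elems, elems.length * 2)
    }
    elems(size) = x
    size += 1
    this
  }

  def capacity: Int = elems.length

  // Trim to the exact size so the result has no unused slots.
  def result(): Array[Int] = Arrays.copyOf(elems, size)
}
```

Starting from a capacity of 2, five appends grow the backing array twice (2 to 4, then 4 to 8).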

The benchmark doesn't use builders for construction. Should I extend it and post an update?
https://github.com/lihaoyi/scala-bench/blob/master/bench/src/main/scala/bench/Benchmark.scala

I'll rerun the construction benchmarks with a list builder and an array builder on Scala 2.11, 2.12 and 2.13, since these results are from Scala 2.11.8.
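A rough sketch of the three construction strategies under discussion (prepend then reverse, List's builder, Array's builder) with a naive System.nanoTime timer. The object and method names are hypothetical and not taken from the scala-bench project; trustworthy numbers need a proper harness such as JMH, with warm-up iterations, forking and a Blackhole to stop the JIT from discarding results.

```scala
object ConstructionSketch {
  // Strategy 1: prepend with :: (O(1) each), then reverse once at the end.
  def prependThenReverse(n: Int): List[Int] = {
    var acc: List[Int] = Nil
    var i = 0
    while (i < n) { acc = i :: acc; i += 1 }
    acc.reverse
  }

  // Strategy 2: List's builder (a ListBuffer), appending in order.
  def listBuilder(n: Int): List[Int] = {
    val b = List.newBuilder[Int]
    var i = 0
    while (i < n) { b += i; i += 1 }
    b.result()
  }

  // Strategy 3: Array's builder, which grows a backing array as needed.
  def arrayBuilder(n: Int): Array[Int] = {
    val b = Array.newBuilder[Int]
    var i = 0
    while (i < n) { b += i; i += 1 }
    b.result()
  }

  // Naive timer: average nanoseconds per run after a few untimed warm-up runs.
  def averageNanos(runs: Int)(body: => Any): Long = {
    require(runs > 0)
    (1 to 3).foreach(_ => body)
    val start = System.nanoTime()
    (1 to runs).foreach(_ => body)
    (System.nanoTime() - start) / runs
  }
}
```

For example, ConstructionSketch.averageNanos(20)(ConstructionSketch.prependThenReverse(250000)) gives a rough per-run figure; like the builders above, it calls result() inside each run so the comparison includes finalisation.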

As for the maximum size: List, Seq and Vector are also indexed by Int, so they can only index up to 2^31 - 1 elements. Maybe use a Long as the index for Chunk ;)
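A quick check of where that bound comes from: the standard collections' apply and length, like Array's length, are Int-based, and Int's largest value is 2^31 - 1. (In practice the largest allocatable JVM array may be slightly smaller, depending on the JVM.) The object name below is hypothetical.

```scala
object IndexLimit {
  // Seq#apply, Seq#length and Array#length all use Int, so the largest
  // possible index is Int.MaxValue.
  val maxIndex: Long = Int.MaxValue.toLong

  // The same bound written as 2^31 - 1 (Long arithmetic avoids Int overflow).
  val twoPow31Minus1: Long = (1L << 31) - 1

  // One more than Int.MaxValue wraps around to a negative Int.
  val wrapped: Int = Int.MaxValue + 1
}
```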

borissmidt · 2020-05-15T22:18:38Z

I've updated the scala-bench project to use builders and will run the benchmarks overnight.
Could it also be used to benchmark collecting into a Chunk?

https://github.com/borissmidt/scala-bench

iravid · 2020-05-16T09:40:14Z

@borissmidt I think it could be easier to just add another test case to ChunkBenchmarks.scala in this project.

borissmidt · 2020-05-17T09:52:45Z

bench-result.zip
I was just curious myself whether the array builder is faster than the normal list add operation, and indeed it is. The results are the average runtime in nanoseconds per iteration of a run.

So in these results I see that ArrayBuilder is significantly faster than appending to an array with :+.
Building a list can be faster than ArrayBuilder, but it is always slower if you reverse it afterwards.
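The gap between :+ and ArrayBuilder comes from copying. Using :+ on an array allocates a new array and copies every existing element, so n appends copy 0 + 1 + ... + (n - 1) = n(n - 1)/2 elements. A doubling buffer copies fewer than 2n elements across all of its resizes, plus possibly one final copy to trim to size. The sketch below counts copies under that simplified cost model; it is arithmetic, not a measurement, and the names are hypothetical.

```scala
object CopyCounts {
  // Naive `:+` on an array: the i-th append (0-based) copies the i elements
  // already present into a fresh array.
  def naiveAppendCopies(n: Long): Long = n * (n - 1) / 2

  // Doubling buffer starting at `initialCapacity`: whenever it is full, all
  // current elements are copied into an array twice as large.
  def doublingCopies(n: Long, initialCapacity: Long = 1): Long = {
    var capacity = initialCapacity
    var copies = 0L
    var size = 0L
    while (size < n) {
      if (size == capacity) { copies += size; capacity *= 2 }
      size += 1
    }
    copies
  }
}
```

For 1000 appends, the naive approach copies 499500 elements, while doubling from a capacity of 1 copies 1 + 2 + ... + 512 = 1023.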

iravid · 2020-05-17T10:20:21Z

Can you post the results here?

iravid · 2020-05-17T10:20:50Z

You were testing list prepend, not append, right?

borissmidt · 2020-05-17T10:26:31Z

Yes, list prepend: the :: operation. You can view the code and run it yourself; I've posted it.
I also call builder.result at the end of each run, so it's a fair comparison.

Another surprising fact: very big arrays (more than 1M elements) become slower to allocate than lists.

iravid · 2020-05-17T10:37:30Z

Right. I'm looking at the results of ListReverse and ArrayBuilder which is what we'd be using in practice. At 250k elements, the List becomes faster.

I think this is good evidence that ChunkBuilder and Chunk are good for the runCollect use case!

borissmidt · 2020-05-17T13:07:54Z

I think an added advantage of Chunk is that it's part of the library, so you can improve it. Maybe you could make it a linked list of arrays of at most 128k elements :)
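A minimal sketch of that suggestion, assuming a hypothetical SegmentedBuilder (not an existing ZIO or Scala type). Elements go into fixed-size arrays of at most segmentSize elements, full segments are kept in a linked ListBuffer, and nothing is copied when the builder grows. Only Int is handled to keep it short. The element count is tracked as a Long so it is not bounded by a single array's length, although toVector, which exists only for inspection, is still Int-bounded.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical builder: a linked sequence of fixed-size Int segments instead of one big array.
final class SegmentedBuilder(segmentSize: Int = 128 * 1024) {
  require(segmentSize > 0)
  private val full = ListBuffer.empty[Array[Int]] // completed segments, in order
  private var current = new Array[Int](segmentSize)
  private var used = 0 // slots filled in `current`
  private var total = 0L

  def addOne(x: Int): this.type = {
    if (used == segmentSize) {
      // Current segment is full: keep it as-is and start a fresh one (no copying).
      full += current
      current = new Array[Int](segmentSize)
      used = 0
    }
    current(used) = x
    used += 1
    total += 1
    this
  }

  def size: Long = total
  def segmentCount: Int = full.length + (if (used > 0) 1 else 0)

  // Flatten for inspection; only sensible for small totals.
  def toVector: Vector[Int] =
    (full.iterator.flatMap(_.iterator) ++ current.iterator.take(used)).toVector
}
```

The trade-off is that random access needs two steps (find the segment, then the offset), which a flat array avoids.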


simpadjo · 2020-05-17T19:21:24Z

ZSink is expected to be the final stage of stream processing. If users decide to collect a huge stream into a List/Chunk, they have opted in to having abnormally large objects in their program and will face problems anyway.
I don't think we should spend too much effort optimizing for this case.

@borissmidt you can get rid of linting errors by running sbt prepare (see https://github.com/zio/zio/blob/master/docs/about/contributing.md)

borissmidt · 2020-05-18T12:08:09Z

There is still an open decision: keep the old API and add the Chunk versions as separate methods,
or replace them with Chunk as I've done now.

borissmidt · 2020-05-20T16:58:10Z

The test "size can be modified locally" at src/test/scala/zio/test/GenSpec.scala:607 fails, and I don't understand how it is related to the changes I made.

iravid · 2020-05-21T14:34:05Z

@adamgfraser Can you peek at the test failure here and give us a hint?

adamgfraser · 2020-05-21T14:35:27Z

@iravid Yes let me take a look.

streams/shared/src/main/scala/zio/stream/ZSink.scala

borissmidt · 2020-05-21T15:25:34Z

Ah, so I'm sharing a reference to a single mutable array (the ChunkBuilder) because it's passed by value and not as a thunk. That makes sense to me. I'll fix it.

Quoting adamgfraser's review comment (May 21, 2020, 17:01) on streams/shared/src/main/scala/zio/stream/ZSink.scala:

     */
  - def collectAll[A]: ZSink[Any, Nothing, A, List[A]] =
  -   foldLeftChunks(Chunk[A]())(_ ++ (_: Chunk[A])).map(_.toList)
  + def collectAll[A]: ZSink[Any, Nothing, A, Chunk[A]] =
  +   foldLeftChunks[A, ChunkBuilder[A]](ChunkBuilder.make[A]())((b, chunk) => b ++= chunk)

I believe the issue is that ChunkBuilder.make() is not wrapped in an effect constructor and is violating referential transparency.
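To illustrate the problem in plain Scala (no ZIO types; Fold, Folds.eager and Folds.fresh are hypothetical names for this sketch): a reusable fold description that captures one mutable buffer created up front shares that buffer between every run, so a second run sees the first run's elements. Deferring creation with a by-name parameter gives each run its own buffer. That is the same idea as creating the ChunkBuilder per run instead of once when the sink is defined; the exact code of the fix in this PR is not shown here.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical reusable "sink" description: an initial state and a step function.
final class Fold[A, S](initial: () => S, step: (S, A) => S) {
  // Each run asks for an initial state and folds the input into it.
  def run(input: Seq[A]): S = input.foldLeft(initial())(step)
}

object Folds {
  // Buggy: `init` is evaluated once, at the call site, so every run reuses the same buffer.
  def eager[A](init: ArrayBuffer[A]): Fold[A, ArrayBuffer[A]] =
    new Fold[A, ArrayBuffer[A]](() => init, (b, a) => b += a)

  // Fixed: `init` is by-name, so it is re-evaluated (a new buffer) on every run.
  def fresh[A](init: => ArrayBuffer[A]): Fold[A, ArrayBuffer[A]] =
    new Fold[A, ArrayBuffer[A]](() => init, (b, a) => b += a)
}
```

Running the eager version twice leaks the first run's elements into the second result; the fresh version returns only the current run's elements.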

borissmidt · 2020-05-22T14:38:20Z

I fixed it, but I'm not sure it's a clean way of doing it. I tried to make a 'lazy ChunkBuilder' that creates a new instance on the first += operation, but that wasn't possible given the return type.
An alternative would have been to use an Option, but then I wonder what the impact of the extra allocations would be.
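A minimal sketch of the Option alternative, in plain Scala with a hypothetical collectWithLazyBuilder helper rather than the ZIO sink. The fold state starts as None, an immutable value that is safe to share between runs, and the builder is created when the first chunk arrives. The extra cost is one Some allocation per run, because the same Some is reused for later chunks, plus a pattern match per chunk.

```scala
import scala.collection.mutable.ArrayBuilder

object LazyBuilderSketch {
  // State is None until the first chunk of a run arrives.
  type State = Option[ArrayBuilder[Int]]

  def step(state: State, chunk: Array[Int]): State = state match {
    case None =>
      // First chunk of this run: allocate a fresh builder and one Some wrapper.
      val b = Array.newBuilder[Int]
      b ++= chunk
      Some(b)
    case s @ Some(b) =>
      // Later chunks reuse the same builder and the same Some instance.
      b ++= chunk
      s
  }

  def collectWithLazyBuilder(chunks: Seq[Array[Int]]): Array[Int] =
    chunks.foldLeft(None: State)(step) match {
      case Some(b) => b.result()
      case None    => Array.empty[Int] // no chunks at all
    }
}
```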

streams/shared/src/main/scala/zio/stream/ZSink.scala

iravid

Looks good to me. Thank you @borissmidt. Can you fix the conflicts?

streams-tests/shared/src/test/scala/zio/stream/ZSinkSpec.scala

streams-tests/shared/src/test/scala/zio/stream/ZStreamSpec.scala

borissmidt · 2020-05-25T21:49:07Z

I think it should be OK now? I've resolved the merge conflicts.

borissmidt · 2020-05-27T08:24:01Z

I ran the fmt and fix commands, but no more changes were detected, so I don't know why fmtCheck still fails. Is there any reason why fix and fmt aren't run during the build?

iravid · 2020-05-27T08:26:49Z

Use the prepare command in sbt. I think the contributor guide mentions that.

The CI doesn't commit to GitHub, so it doesn't format the code there.

borissmidt · 2020-05-27T08:53:15Z

Aha, you have to run sbt clean prepare; otherwise it has cached the formatting and won't do anything. I have more experience with Gradle.

borissmidt · 2020-05-27T10:44:55Z

OK, I fixed all the issues.

borissmidt · 2020-05-29T06:54:54Z

@iravid Could this be merged before there is another merge conflict?

iravid · 2020-05-29T10:54:36Z

Thank you @borissmidt!

borissmidt requested a review from iravid as a code owner May 14, 2020 21:21

borissmidt force-pushed the zstream-collectall-to-chunk branch 2 times, most recently from 8c4ca99 to 96aa0f7 May 14, 2020 22:08

iravid requested changes May 15, 2020

streams/shared/src/main/scala/zio/stream/ZSink.scala (outdated)

borissmidt force-pushed the zstream-collectall-to-chunk branch from 96aa0f7 to 46aebc3 May 15, 2020 16:30

iravid requested changes May 15, 2020

iravid added the enhancement (New feature or request) and stream (ZIO Stream) labels May 18, 2020

borissmidt force-pushed the zstream-collectall-to-chunk branch from 46aebc3 to c6725ea May 19, 2020 08:54

adamgfraser reviewed May 21, 2020

streams/shared/src/main/scala/zio/stream/ZSink.scala (outdated)

adamgfraser reviewed May 22, 2020

streams/shared/src/main/scala/zio/stream/ZSink.scala (outdated)

borissmidt force-pushed the zstream-collectall-to-chunk branch 2 times, most recently from 32f5d26 to 039f344 May 22, 2020 16:17

iravid reviewed May 22, 2020

streams-tests/shared/src/test/scala/zio/stream/ZSinkSpec.scala (outdated)

streams-tests/shared/src/test/scala/zio/stream/ZStreamSpec.scala (outdated)

borissmidt force-pushed the zstream-collectall-to-chunk branch from 039f344 to a5543e2 May 23, 2020 11:55

borissmidt added 2 commits May 25, 2020 21:45

merge conflict

efedd2c

Make collectAll of Sink referential transparent.

6b79b95

borissmidt force-pushed the zstream-collectall-to-chunk branch from a5543e2 to 6b79b95 May 25, 2020 21:13

borissmidt force-pushed the zstream-collectall-to-chunk branch from 27c5fc9 to 091d70a May 26, 2020 16:17

remove commented test

793080f

borissmidt force-pushed the zstream-collectall-to-chunk branch from 091d70a to 793080f May 27, 2020 08:51

iravid approved these changes May 29, 2020

iravid merged commit 2756324 into zio:master May 29, 2020
