
@borissmidt
Contributor

@borissmidt borissmidt commented May 14, 2020

Change ZSink.collectAll to return Chunk
Change ZStream.runAll to return Chunk

I think I left one TODO for ZIO.foreach, which also returns a List, but changing it would make this PR too big.

There was one inconvenience with the current test setup: because not all test cases were checked in one pass, each compile reported only a single error, so I had to recompile many times.

#3575

@borissmidt borissmidt requested a review from iravid as a code owner May 14, 2020 21:21
@CLAassistant

CLAassistant commented May 14, 2020

CLA assistant check
All committers have signed the CLA.

@borissmidt borissmidt force-pushed the zstream-collectall-to-chunk branch 2 times, most recently from 8c4ca99 to 96aa0f7 on May 14, 2020 22:08
Member

@iravid iravid left a comment


Thanks. One small comment.

@borissmidt borissmidt force-pushed the zstream-collectall-to-chunk branch from 96aa0f7 to 46aebc3 on May 15, 2020 16:30
Member

@iravid iravid left a comment


Taking a step back here, @borissmidt I'd like to see a benchmark of this first.

Building a list (and yes, reversing it) is very fast in Scala because prepends are O(1) and the reversal can happen with a pre-allocated buffer.

According to this: https://www.lihaoyi.com/post/BenchmarkingScalaCollections.html#construction-performance the only thing that is faster than prepending to a list is pre-allocating an array and filling it up, but we can't do that because we don't know the size of the stream up front.

Another thing to consider when collecting to a chunk is the size limit: a ChunkBuilder accumulates elements into an array, so it has a hard size limit of about 2^31 elements (the maximum array length). Not sure if this is a showstopper or not.
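A minimal sketch of the two construction strategies discussed above, using only the Scala standard library (no ZIO types). The names PrependReverse, collectAll and fillKnownSize are hypothetical: collectAll handles an iterator of unknown length by prepending with :: and reversing once at the end, while fillKnownSize shows the pre-allocated array that only works when the size is known up front.

```scala
object PrependReverse {
  // Unknown length: each :: prepend is O(1), and a single O(n) reverse
  // at the end restores the original element order.
  def collectAll[A](it: Iterator[A]): List[A] = {
    var acc: List[A] = Nil
    while (it.hasNext) acc = it.next() :: acc
    acc.reverse
  }

  // Known length: allocate the array once and fill it by index,
  // which a stream of unknown size cannot do.
  def fillKnownSize(n: Int)(f: Int => Int): Array[Int] = {
    val arr = new Array[Int](n)
    var i = 0
    while (i < n) {
      arr(i) = f(i)
      i += 1
    }
    arr
  }
}
```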

@borissmidt
Contributor Author

borissmidt commented May 15, 2020


Maybe it would be better to change the return type to Seq instead? Then you could later switch to whatever implementation gives the best performance without breaking binary compatibility.

I can imagine there are better structures than an array, which has to be copied every so often as it grows, or a list, which has a lot of indirection. Also, doesn't reversing require iterating over the whole list?
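A sketch of the Seq-return-type idea, assuming plain standard-library collections; SeqReturn and its methods are hypothetical names, and Vector merely stands in for whatever structure is eventually chosen. Because callers only see Seq[A], the private implementation can be swapped without changing the declared signature.

```scala
object SeqReturn {
  // The public signature promises only Seq[A]; callers cannot rely on the concrete type.
  def collectAll[A](it: Iterator[A]): Seq[A] = collectWithVector(it)

  // Implementation detail: could later be replaced (e.g. by a List- or array-based
  // version) without touching the signature above.
  private def collectWithVector[A](it: Iterator[A]): Vector[A] = {
    val b = Vector.newBuilder[A]
    while (it.hasNext) b += it.next()
    b.result()
  }
}
```

The trade-off is that callers lose whatever guarantees the concrete type would give them, such as type-specific operations or predictable indexing cost.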

How could I best benchmark this?

Looking at the Scala 2.13 source, reverse doesn't use a pre-allocated buffer, so it has to iterate over the full list:

  final override def reverse: List[A] = {
    var result: List[A] = Nil
    var these = this
    while (!these.isEmpty) {
      result = these.head :: result
      these = these.tail
    }
    result
  }

The List builder (ListBuffer), on the other hand, uses mutable cons cells and replaces the 'next' pointer of the last cell each time you append to it.
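To illustrate the mutable-cons technique, here is a minimal sketch with a hypothetical Node/TailBuilder pair. It is not the standard library's ListBuffer (whose mutable field on :: is private to the scala package); it only shows the idea of keeping a pointer to the last cell and linking each new cell after it, so an append is O(1) and no reverse is needed.

```scala
object TailAppend {
  // Hypothetical cons cell with a mutable tail pointer.
  final class Node[A](val head: A, var next: Node[A])

  final class TailBuilder[A] {
    private var first: Node[A] = null
    private var last: Node[A] = null
    private var count = 0

    // O(1) append: link a new cell after the current last cell.
    def +=(a: A): this.type = {
      val n = new Node[A](a, null)
      if (last eq null) first = n else last.next = n
      last = n
      count += 1
      this
    }

    def size: Int = count

    // Walks the chain once, yielding the elements in insertion order.
    def toList: List[A] = {
      val b = List.newBuilder[A]
      var cur = first
      while (cur ne null) {
        b += cur.head
        cur = cur.next
      }
      b.result()
    }
  }
}
```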

ArrayBuilder pre-allocates an array and doubles its size each time it runs out of capacity (it reminds me of the C++ vector class).
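A sketch of the doubling strategy, written as a hypothetical IntBuffer rather than the real scala.collection.mutable.ArrayBuilder, whose initial capacity and internals may differ. Capacity doubles whenever the array is full, so n appends copy O(n) elements in total, and result() in this sketch makes one final copy trimmed to the exact size.

```scala
object Doubling {
  // Hypothetical growable Int buffer with amortized O(1) appends.
  final class IntBuffer(initialCapacity: Int = 16) {
    require(initialCapacity > 0, "initial capacity must be positive")
    private var elems = new Array[Int](initialCapacity)
    private var len = 0
    var resizes = 0 // exposed only to make the doubling observable

    def +=(x: Int): this.type = {
      if (len == elems.length) grow()
      elems(len) = x
      len += 1
      this
    }

    private def grow(): Unit = {
      // Array lengths are Ints on the JVM, so growth must stop at Int.MaxValue.
      if (elems.length == Int.MaxValue)
        throw new IllegalStateException("maximum array length reached")
      val newCap =
        if (elems.length >= Int.MaxValue / 2) Int.MaxValue else elems.length * 2
      elems = java.util.Arrays.copyOf(elems, newCap)
      resizes += 1
    }

    def length: Int = len
    def capacity: Int = elems.length

    // One final copy trimmed to the number of elements actually added.
    def result(): Array[Int] = java.util.Arrays.copyOf(elems, len)
  }
}
```

With an initial capacity of 4, appending 10 elements grows the array from 4 to 8 to 16, i.e. two resizes.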

The benchmark doesn't use builders for construction; should I extend it and give an update?
https://github.com/lihaoyi/scala-bench/blob/master/bench/src/main/scala/bench/Benchmark.scala

I'll rerun the construction benchmarks with a List builder and an ArrayBuilder for Scala 2.11, 2.12, and 2.13, since these results are from Scala 2.11.8.

As for the maximum size: List, Seq and Vector are also indexed by Int, so they only index up to 2^31 - 1 elements as well. Maybe use a Long as the index for Chunk ;)
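A small worked check of the Int limit, using only standard Scala. Seq's apply takes an Int index and length returns an Int, so Int.MaxValue (2^31 - 1) is the ceiling shared by List, Vector and arrays alike, and 2^32 does not fit. IndexLimits and narrowed are hypothetical names; narrowed shows that converting a Long index with toInt silently wraps around instead of failing.

```scala
object IndexLimits {
  // Int.MaxValue is the largest value an Int index or length can hold.
  val intMax: Long = Int.MaxValue.toLong
  val twoPow31: Long = 1L << 31
  val twoPow32: Long = 1L << 32

  // Passing a Long index to Seq#apply requires narrowing it to an Int first;
  // toInt keeps only the low 32 bits, so large values wrap around.
  def narrowed(index: Long): Int = index.toInt
}
```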

@borissmidt
Contributor Author

I've updated the scala-bench project to use builders and will run the benchmarks overnight.
Could it also be used to benchmark Chunk collection?

https://github.com/borissmidt/scala-bench

@iravid
Member

iravid commented May 16, 2020

@borissmidt I think it could be easier to just add another test case to ChunkBenchmarks.scala in this project.

@borissmidt
Contributor Author

borissmidt commented May 17, 2020

bench-result.zip
I was curious myself whether the ArrayBuilder is faster than the normal List add operation, and indeed it is. The results are the average runtime per iteration of a run, in nanoseconds.

In these results, ArrayBuilder is significantly faster than repeated array :+.
Building a List can be faster than ArrayBuilder, but it's always slower if you reverse it afterwards.

@iravid
Member

iravid commented May 17, 2020

Can you post the results here?

@iravid
Member

iravid commented May 17, 2020

You were testing list prepend, not append, right?

@borissmidt
Contributor Author

borissmidt commented May 17, 2020

Yes, List prepend: the :: operation. I've posted the code, so you can view it and run it yourself.
I also call builder.result() at the end of each run, so it's a fair comparison.
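A rough sketch of the methodology described here, assuming only System.nanoTime and the standard library; CrudeBench and its methods are hypothetical. Each variant produces its final collection (reverse or result()) inside the timed body, so that cost is included in the comparison. This is no substitute for JMH: it does a single warm-up run and has no protection against JIT dead-code elimination, so any numbers it produces are indicative at best.

```scala
object CrudeBench {
  // Runs `body` once to warm up, then `runs` more times, and returns
  // (average nanoseconds per run, result of the last run).
  def averageNanos[A](runs: Int)(body: => A): (Long, A) = {
    require(runs > 0, "runs must be positive")
    var last: A = body
    val start = System.nanoTime()
    var i = 0
    while (i < runs) {
      last = body
      i += 1
    }
    ((System.nanoTime() - start) / runs, last)
  }

  // Prepend with ::, then reverse once: the reverse is part of the measured work.
  def listPrependReverse(n: Int): List[Int] = {
    var acc: List[Int] = Nil
    var i = 0
    while (i < n) { acc = i :: acc; i += 1 }
    acc.reverse
  }

  // List builder, finished with result() inside the measured work.
  def listBuilder(n: Int): List[Int] = {
    val b = List.newBuilder[Int]
    var i = 0
    while (i < n) { b += i; i += 1 }
    b.result()
  }

  // Array builder, finished with result() inside the measured work.
  def arrayBuilder(n: Int): Array[Int] = {
    val b = Array.newBuilder[Int]
    var i = 0
    while (i < n) { b += i; i += 1 }
    b.result()
  }
}
```

For example, CrudeBench.averageNanos(100)(CrudeBench.arrayBuilder(250000)) returns the average time per run together with the last array built.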

Another surprising fact: very big arrays (more than 1M elements) become slower to allocate than lists.

@iravid
Member

iravid commented May 17, 2020

Right. I'm looking at the results of ListReverse and ArrayBuilder, which are what we'd be using in practice. At 250k elements, the List becomes faster.

I think this is good evidence that ChunkBuilder and Chunk are good for the runCollect use case!

@borissmidt
Contributor Author

borissmidt commented May 17, 2020 via email

@simpadjo
Contributor

ZSink is expected to be the final stage of stream processing. If a user decides to collect a huge stream into a List/Chunk, they have opted in to having abnormally large objects in their program and will face problems anyway.
I don't think we should spend too much effort optimizing for this case.

@borissmidt you can get rid of linting errors by running sbt prepare (see https://github.com/zio/zio/blob/master/docs/about/contributing.md)

@borissmidt
Contributor Author

One decision is still open: keep the old API and add the Chunk versions as separate methods,
or replace the existing methods with Chunk versions as I've done now.
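A sketch of the first option (keep the old API and add a separate method), using Vector as a stand-in for Chunk so it runs on the standard library alone; ApiOptions and its method names are hypothetical. The old List-returning signature survives as a thin wrapper, at the cost of one extra copy per call.

```scala
object ApiOptions {
  // New method returning the new collection type (Vector stands in for Chunk here).
  def collectAllVector[A](it: Iterator[A]): Vector[A] = {
    val b = Vector.newBuilder[A]
    while (it.hasNext) b += it.next()
    b.result()
  }

  // Old signature kept as a wrapper; toList copies the elements once.
  // It could be deprecated in a later release.
  def collectAll[A](it: Iterator[A]): List[A] = collectAllVector(it).toList
}
```

The second option, which this PR takes, changes collectAll itself to return the new type: simpler, but it breaks callers that expect a List.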

@iravid iravid added the enhancement (New feature or request) and stream (ZIO Stream) labels May 18, 2020
@borissmidt borissmidt force-pushed the zstream-collectall-to-chunk branch from 46aebc3 to c6725ea on May 19, 2020 08:54
@borissmidt
Contributor Author

borissmidt commented May 20, 2020

The test "size can be modified locally" at src/test/scala/zio/test/GenSpec.scala:607 fails, and I don't understand how it relates to the changes I made.

@iravid
Member

iravid commented May 21, 2020

@adamgfraser Can you peek at the test failure here and give us a hint?

@adamgfraser
Contributor

@iravid Yes let me take a look.

@borissmidt
Contributor Author

borissmidt commented May 21, 2020 via email

@borissmidt
Contributor Author

I fixed it, but I'm not sure it's a clean way of doing it. I tried to make a 'lazy ChunkBuilder' that creates a new instance on the first += operation, but that wasn't possible with the typing of the return type.
An alternative could have been using Option, but then I wonder what the impact of the extra allocations would have been.
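A minimal sketch of the 'lazy builder' idea, assuming a plain standard-library ArrayBuilder rather than ZIO's ChunkBuilder; LazyBuilding and LazyArrayBuilder are hypothetical names. The underlying builder is allocated on the first +=, and result() returns an empty array if nothing was ever added. A nullable field avoids the one Some allocation an Option-based version would make, at the cost of a null check on every append.

```scala
import scala.collection.mutable.ArrayBuilder
import scala.reflect.ClassTag

object LazyBuilding {
  // Hypothetical wrapper: the real builder is only created on the first append.
  final class LazyArrayBuilder[A: ClassTag] {
    private var underlying: ArrayBuilder[A] = null

    def +=(a: A): this.type = {
      if (underlying eq null) underlying = Array.newBuilder[A]
      underlying += a
      this
    }

    def isAllocated: Boolean = underlying ne null

    // An empty result never allocates an ArrayBuilder.
    def result(): Array[A] =
      if (underlying eq null) Array.empty[A] else underlying.result()
  }
}
```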

@borissmidt borissmidt force-pushed the zstream-collectall-to-chunk branch 2 times, most recently from 32f5d26 to 039f344 on May 22, 2020 16:17
Member

@iravid iravid left a comment


Looks good to me. Thank you @borissmidt. Can you fix the conflicts?

@borissmidt borissmidt force-pushed the zstream-collectall-to-chunk branch from 039f344 to a5543e2 on May 23, 2020 11:55
@borissmidt borissmidt force-pushed the zstream-collectall-to-chunk branch from a5543e2 to 6b79b95 on May 25, 2020 21:13
@borissmidt
Contributor Author

I think it should be OK now? I've resolved the merge conflicts.

@borissmidt borissmidt force-pushed the zstream-collectall-to-chunk branch from 27c5fc9 to 091d70a on May 26, 2020 16:17
@borissmidt
Contributor Author

I ran the fmt and fix commands, but no more changes are detected, so I don't know why fmtCheck still fails. Is there a reason fix and fmt aren't run during the build?

@iravid
Member

iravid commented May 27, 2020

Use the prepare command in sbt. I think the contributor guide mentions that.

The CI doesn't commit to GitHub, so it doesn't format the code there.

@borissmidt borissmidt force-pushed the zstream-collectall-to-chunk branch from 091d70a to 793080f on May 27, 2020 08:51
@borissmidt
Contributor Author

Aha, you have to run sbt clean prepare; otherwise it has cached the formatting and will not do anything. I have more experience with Gradle.

@borissmidt
Contributor Author

OK, I fixed all issues.

@borissmidt
Contributor Author

@iravid Could this be merged before there is another merge conflict?

@iravid iravid merged commit 2756324 into zio:master May 29, 2020
@iravid
Member

iravid commented May 29, 2020

Thank you @borissmidt!


5 participants