ZSink.collectAll and ZStream.runAll use Chunk #3587
Thanks. One small comment.
Taking a step back here, @borissmidt I'd like to see a benchmark of this first.
Building a list (and yes, reversing it) is very fast in Scala because prepends are O(1) and the reversal can happen with a pre-allocated buffer.
According to https://www.lihaoyi.com/post/BenchmarkingScalaCollections.html#construction-performance, the only thing faster than prepending to a list is pre-allocating an array and filling it up, but we can't do that because we don't know the size of the stream up front.
Another thing to consider when collecting to a Chunk is the size limit: a ChunkBuilder accumulates elements into an array, so it has a hard size limit of roughly 2^31 elements (JVM arrays are indexed by Int). Not sure whether this is a showstopper.
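Since the question is how to benchmark this, here is a crude timing sketch, assuming Scala 2.13 and only the standard library (the `ConstructionSketch` object and its method names are hypothetical). It compares the three construction strategies under discussion for input of unknown size: prepend with `::` then `reverse`, the List builder, and an `ArrayBuilder`. It times single runs with `System.nanoTime`, so JIT warm-up and GC make the numbers noisy; a proper harness such as JMH is the reliable way to compare them.

```scala
// Crude timing sketch, not a JMH benchmark: three ways to build a collection
// whose final size is not known up front, as a stream collector must.
object ConstructionSketch {

  // Strategy 1: prepend with :: (O(1) per element), then reverse once at the end.
  def prependThenReverse(n: Int): List[Int] = {
    var acc: List[Int] = Nil
    var i = 0
    while (i < n) { acc = i :: acc; i += 1 }
    acc.reverse
  }

  // Strategy 2: the List builder (a ListBuffer in 2.13), which appends by
  // mutating the tail cons cell, so no reversal is needed.
  def listBuilder(n: Int): List[Int] = {
    val b = List.newBuilder[Int]
    var i = 0
    while (i < n) { b += i; i += 1 }
    b.result()
  }

  // Strategy 3: ArrayBuilder, which grows a backing array as it fills up.
  def arrayBuilder(n: Int): Array[Int] = {
    val b = Array.newBuilder[Int]
    var i = 0
    while (i < n) { b += i; i += 1 }
    b.result()
  }

  // Runs `body` once and prints the elapsed wall-clock time in milliseconds.
  def time[A](label: String)(body: => A): A = {
    val start = System.nanoTime()
    val result = body
    println(f"$label%-20s ${(System.nanoTime() - start) / 1e6}%.2f ms")
    result
  }

  def main(args: Array[String]): Unit = {
    val n = 1000000
    val a = time("prepend + reverse")(prependThenReverse(n))
    val b = time("list builder")(listBuilder(n))
    val c = time("array builder")(arrayBuilder(n))
    // Sanity check: all strategies yield the same elements in the same order.
    assert(a == b && b == c.toList)
  }
}
```

The final assertion only checks that the strategies agree on the result; it says nothing about which one is faster.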
Maybe it would be a better idea to change the return type to Seq, so that the underlying structure can later be changed to whatever gives the best performance without breaking binary compatibility? I can imagine there are better structures than an array that has to be copied every so often, or a list that has a lot of indirection. Also, reversing has to iterate over the whole list. How could I best benchmark this?
Looking at the Scala 2.13 source, reverse doesn't use a pre-allocated buffer, so it has to iterate over the full list. The List builder (ListBuffer) uses mutable cons cells and replaces the 'next' pointer each time you append to it. ArrayBuilder pre-allocates an array and doubles its size each time it runs out of capacity (it reminds me of the C++ vector class). The benchmark doesn't use builders for construction; I could extend it and give an update. I'll rerun the construction benchmarks with a list builder and an array builder for Scala 2.11/2.12/2.13, since these results are from Scala 2.11.8.
As for the maximum size, List, Seq and Vector only index up to
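The capacity-doubling strategy described for ArrayBuilder (allocate, double when full, copy once) can be sketched as below. `GrowableIntBuffer` is a hypothetical illustration, not the Scala library's implementation; it also shows where the Int-indexed array limit comes in.

```scala
// Hypothetical growable Int buffer illustrating capacity doubling,
// the same amortized-O(1) append idea as C++ std::vector.
final class GrowableIntBuffer(initialCapacity: Int = 16) {
  require(initialCapacity > 0, "initial capacity must be positive")

  private var elems = new Array[Int](initialCapacity)
  private var count = 0

  def size: Int = count
  def capacity: Int = elems.length

  def +=(x: Int): this.type = {
    if (count == elems.length) grow()
    elems(count) = x
    count += 1
    this
  }

  // Double the capacity, copying the existing elements once. Arrays are
  // indexed by Int, so capacity can never exceed Int.MaxValue (in practice
  // the JVM's maximum array length is slightly lower).
  private def grow(): Unit = {
    if (elems.length == Int.MaxValue)
      throw new IllegalStateException("array capacity exhausted")
    val newCapacity =
      if (elems.length > Int.MaxValue / 2) Int.MaxValue else elems.length * 2
    elems = java.util.Arrays.copyOf(elems, newCapacity)
  }

  // Trim to the exact size on the way out (one final copy).
  def result(): Array[Int] = java.util.Arrays.copyOf(elems, count)
}

object GrowableIntBufferDemo {
  def main(args: Array[String]): Unit = {
    val buf = new GrowableIntBuffer(initialCapacity = 4)
    (1 to 10).foreach(buf += _)
    println(s"size=${buf.size} capacity=${buf.capacity}") // size=10 capacity=16
  }
}
```

Each element is copied only a constant number of times on average, but every resize needs one contiguous block twice the old size, which is one reason very large arrays can become expensive to build.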
I've updated the scala-bench project to use builders and will run the benchmarks overnight.
@borissmidt I think it could be easier to just add another test case to
bench-result.zip
So in these results I see that the
Can you post the results here?
You were testing list prepend, not append, right?
Yes, list prepend: the :: operation. You can view the code and run it yourself; I've posted it. Another surprising fact: very big arrays (more than 1M elements) become slower to allocate than lists.
Right. I'm looking at the results of ListReverse and ArrayBuilder, which is what we'd be using in practice. At 250k elements, the List becomes faster. I think this is good evidence that ChunkBuilder and Chunk are good for the runCollect use case!
I think the added advantage of Chunk is that it's part of the library, so you can improve it.
Maybe you could make it a linked list of arrays of at most 128k elements :)
(Email reply to Itamar Ravid's comment of Sun, May 17, 2020, 12:37.)
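The "linked list of arrays" idea can be sketched as a segmented builder: elements go into fixed-size arrays (defaulting here to 128k elements, per the suggestion), and a full segment is kept as-is instead of being copied into a bigger array. `SegmentedBuilder` is hypothetical and is not how ZIO's Chunk or ChunkBuilder is implemented.

```scala
import scala.collection.mutable.ListBuffer
import scala.reflect.ClassTag

// Hypothetical segmented builder: a linked list of fixed-size arrays.
// Appending never copies elements that were already written.
final class SegmentedBuilder[A: ClassTag](segmentSize: Int = 128 * 1024) {
  require(segmentSize > 0, "segment size must be positive")

  private val full = ListBuffer.empty[Array[A]] // completed segments, in order
  private var current = new Array[A](segmentSize)
  private var used = 0 // number of elements written into `current`

  def +=(a: A): this.type = {
    if (used == segmentSize) {
      full += current // keep the full segment; no copy
      current = new Array[A](segmentSize)
      used = 0
    }
    current(used) = a
    used += 1
    this
  }

  // Total size as a Long, so it is not bounded by a single array's Int index.
  def size: Long = full.length.toLong * segmentSize + used

  def segmentCount: Int = full.length + (if (used > 0) 1 else 0)

  // Iterates all elements in insertion order. Do not append while iterating.
  def iterator: Iterator[A] = {
    val last = current // capture the current segment and fill level now
    val n = used
    full.iterator.flatMap(_.iterator) ++ last.iterator.take(n)
  }
}

object SegmentedBuilderDemo {
  def main(args: Array[String]): Unit = {
    val b = new SegmentedBuilder[Int](segmentSize = 4)
    (1 to 10).foreach(b += _)
    println(s"size=${b.size} segments=${b.segmentCount}") // size=10 segments=3
  }
}
```

The trade-off is that the result is no longer one contiguous array: indexed access would need a division to locate the segment, and turning it into a flat array still costs one full copy.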
ZSink is expected to be the final stage of stream processing. If a user decides to collect a huge stream into a List/Chunk, they have opted in to having abnormally large objects in their program and will face problems anyway. @borissmidt you can get rid of the linting errors by running
There is still the open decision whether to keep the old API and add the Chunk versions as separate functions.
@adamgfraser Can you peek at the test failure here and give us a hint?
@iravid Yes, let me take a look.
Adam Fraser commented on this pull request (Thu, May 21, 2020, 17:01), in streams/shared/src/main/scala/zio/stream/ZSink.scala:
   */
-  def collectAll[A]: ZSink[Any, Nothing, A, List[A]] =
-    foldLeftChunks(Chunk[A]())(_ ++ (_: Chunk[A])).map(_.toList)
+  def collectAll[A]: ZSink[Any, Nothing, A, Chunk[A]] =
+    foldLeftChunks[A, ChunkBuilder[A]](ChunkBuilder.make[A]())((b, chunk) => b ++= chunk)
I believe the issue is that ChunkBuilder.make() is not wrapped in an effect constructor and is violating referential transparency.
Ah, so I'm sharing a reference to a mutable array because it's passed by value rather than as a thunk; that makes sense. I'll fix it.
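Why an eagerly created builder breaks referential transparency can be shown with a simplified, hypothetical model of a fold-based collector using only the standard library (`Fold`, `collectEager` and `collectFresh` are illustrative names, not ZIO APIs). When the mutable buffer is created once, at the moment the collector value is built, every run of that value appends to the same buffer; creating it inside a thunk gives each run fresh state.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified fold-based collector (NOT the ZIO API).
// `initial` is a thunk, so each run *can* start from fresh state.
final case class Fold[A, S](initial: () => S, step: (S, A) => S) {
  def run(input: Iterable[A]): S = input.foldLeft(initial())(step)
}

object SharedStateDemo {
  // Buggy: the buffer is allocated once, when the description is built,
  // and every run of this value mutates that same instance.
  def collectEager[A]: Fold[A, ArrayBuffer[A]] = {
    val shared = ArrayBuffer.empty[A]
    Fold[A, ArrayBuffer[A]](() => shared, (buf, a) => buf += a)
  }

  // Fixed: a new buffer is allocated each time a run starts.
  def collectFresh[A]: Fold[A, ArrayBuffer[A]] =
    Fold[A, ArrayBuffer[A]](() => ArrayBuffer.empty[A], (buf, a) => buf += a)

  def main(args: Array[String]): Unit = {
    val eager = collectEager[Int]
    eager.run(List(1, 2))
    println(eager.run(List(3))) // ArrayBuffer(1, 2, 3): state leaked from run 1

    val fresh = collectFresh[Int]
    fresh.run(List(1, 2))
    println(fresh.run(List(3))) // ArrayBuffer(3)
  }
}
```

In the diff above, `ChunkBuilder.make[A]()` plays the role of `shared`: it is evaluated when the sink value is constructed rather than each time the sink is run.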
I fixed it, but I'm not sure it's a clean way of doing it. I tried to make a 'lazy ChunkBuilder' that creates a new instance on the first += operation, but it wasn't possible with the typing of the return type.
Looks good to me. Thank you @borissmidt. Can you fix the conflicts?
Review comment on streams-tests/shared/src/test/scala/zio/stream/ZStreamSpec.scala (outdated, resolved).
I think it should be OK now? I've fixed the merge conflicts.
I ran the fmt and fix commands, but no more changes were detected, so I don't know why fmtCheck still fails. Is there any reason why fix and fmt aren't run during the build?
Use the
The CI doesn't commit to GitHub, so it doesn't format the code there.
Aha, you have to do
OK, I fixed all the issues.
@iravid Could this be merged before there is another merge conflict?
Thank you @borissmidt!
Change ZSink.collectAll to return Chunk
Change ZStream.runAll to return Chunk
I think I left one TODO for ZIO.foreach, which returns a List as well, but it would make this PR too big.
There was one inconvenience with the current test setup: I only had one compilation error left, but because it didn't check all test cases, I had to compile many times, with only a single error reported each time.
#3575