Sink drop while bug #9405

Merged
kyri-petrou merged 26 commits into zio:series/2.x from eyalfa:sink_dropWhile_bug
Jan 18, 2025

Conversation

@eyalfa
Contributor

@eyalfa eyalfa commented Dec 13, 2024

Addresses #9395

@eyalfa eyalfa marked this pull request as ready for review December 13, 2024 09:09
@eyalfa
Contributor Author

eyalfa commented Dec 15, 2024

cc: @kyri-petrou @jdegoes

assert(stripped)(isNone)
}
),
suite("filter")(
Member

Can you remove the Cause changes that are not related to this PR?

Funny, I thought I had branched off the series/2.x branch;
even funnier, these changes are already merged.

@eyalfa
Contributor Author

eyalfa commented Dec 24, 2024

cc: @regiskuckaertz,
@ghostdogpr mentioned you on Discord as a potential reviewer; I'd appreciate your input on this PR.

Comment on lines 233 to 234
// I believe the original assertion had wrong expectations
}(equalTo(Chunk(Right(3) /*, Left("Aie")*/ )))
Contributor

I'm confused, why do you think the original assertion was wrong? To me it seems like the correct one

Contributor Author

The sink succeeds while processing element 3. At that point it has no leftovers, so it emits nothing downstream and succeeds immediately without reading anything else from upstream. Hence upstream never gets to execute the failing effect.

Comment on lines +178 to +184
test("correct leftovers behavior") {
assertZIO(
ZStream
.range(0, 20, chunkSize = 3)
.run(ZSink.dropWhile[Int](_ <= 10).collectLeftover)
.map(_._2)
)(equalTo(Chunk(11)))
Contributor

Shouldn't this be Chunk(11, 12, ..., 19)?

Contributor Author

No, that's exactly the bug.
Once a sink succeeds it is not supposed to read any more from upstream. Because of chunking it may have leftovers, which it must write downstream. In this case upstream emits chunks of 3, so when the sink completes it has a single leftover element (11), which is exactly what the assertion tests.
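
The chunk arithmetic behind that assertion can be sketched without ZIO. The model below is only an illustration: `DropWhileLeftoverModel` and `dropWhileSink` are hypothetical names, plain `List`s stand in for chunks, and it assumes `ZStream.range(0, 20, chunkSize = 3)` emits the same groups as `grouped(3)`.

```scala
object DropWhileLeftoverModel {
  /** Chunks as upstream is assumed to emit them. */
  type Chunks[A] = List[List[A]]

  /**
   * Hypothetical model of the fixed sink: read one chunk at a time and stop at
   * the first chunk that still has elements after dropping.
   * Returns (leftover of that chunk, chunks that were never read).
   */
  @annotation.tailrec
  def dropWhileSink[A](upstream: Chunks[A])(p: A => Boolean): (List[A], Chunks[A]) =
    upstream match {
      case Nil => (Nil, Nil)
      case chunk :: rest =>
        val out = chunk.dropWhile(p)
        if (out.nonEmpty) (out, rest) // sink completes; `rest` stays unread
        else dropWhileSink(rest)(p)   // whole chunk dropped, keep reading
    }

  // Same element layout as the test input: 0 until 20 in groups of 3
  val upstream: Chunks[Int] = (0 until 20).toList.grouped(3).toList

  // The sink stops inside the chunk [9, 10, 11], so only 11 is left over
  val (leftover, unread) = dropWhileSink(upstream)(_ <= 10)
}
```

In this model the elements 12 to 19 are never read by the sink, so they cannot show up as its leftovers.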

Comment on lines 187 to 196
assertZIO(
ZStream(0, 0, 0, 1, 0, 0, 2)
.rechunk(3)
.transduce {
ZSink.dropWhile[Int](_ == 0) *> ZSink.head[Int]
}
.take(10)
.runCollect
//seems like the 'trailing' sink invocation is by design
)(equalTo(Chunk(Some(1), Some(2), None)))
Contributor

I feel like the correct behaviour here should be Chunk(Some(1)) since we have ZSink.head[Int]?

Contributor Author

transduce (based on ZStream.fromSink) runs the sink repeatedly until upstream completes. each time the sink completes, its result is emitted and the cycle begins again.

in this specific example:
first invocation of the sink consumes 0, 0, 0, 1 and emits Some(1),
second invocation consumes 0, 0, 2 and emits Some(2),
third invocation operates on an empty stream and emits None, then the entire stream completes.
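
The three invocations described above can be traced with a small element-level model. This is a sketch only, not the ZIO implementation: `TransduceModel`, `runSinkOnce` and `transduce` are hypothetical names, chunking is ignored, and the rule that the final run on exhausted input still emits a result is taken from the explanation above.

```scala
object TransduceModel {
  /** One run of `dropWhile(_ == 0) *> head`: returns (result, remaining input). */
  def runSinkOnce(input: List[Int]): (Option[Int], List[Int]) = {
    val afterDrop = input.dropWhile(_ == 0)
    (afterDrop.headOption, afterDrop.drop(1))
  }

  /** Re-run the sink until a run starts on exhausted input; that last run still emits. */
  def transduce(input: List[Int]): List[Option[Int]] = {
    @annotation.tailrec
    def loop(remaining: List[Int], acc: List[Option[Int]]): List[Option[Int]] = {
      val (result, rest) = runSinkOnce(remaining)
      if (remaining.isEmpty) (result :: acc).reverse // trailing run on empty input
      else loop(rest, result :: acc)
    }
    loop(input, Nil)
  }

  // Input from the test under discussion
  val results: List[Option[Int]] = transduce(List(0, 0, 0, 1, 0, 0, 2))
}
```

Under these assumptions `results` is `List(Some(1), Some(2), None)`, which lines up with the asserted `Chunk(Some(1), Some(2), None)`.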

in => {
val out = in.dropWhile(p)
if (out.nonEmpty)
ZChannel.write(out) *> ZChannel.unit
Contributor

I'm not sure I follow what's going on here. In the case that out.nonEmpty == true, won't this write the current Chunk and stop reading? Meaning that any subsequent chunks won't be read?

Contributor Author

Exactly: when out.nonEmpty is true the sink actually succeeds (with a unit); however, it must first write the leftovers downstream.

Once the sink succeeds it will no longer read from upstream. Other sinks can keep reading the leftovers plus the remaining upstream (achieved by composing sinks, i.e. using flatMap).

.pipeThrough(ZSink.dropWhileZIO[Any, String, Int](x => ZIO.succeed(x < 3)))
.either
.runCollect
}(equalTo(Chunk(Left("Aie"))))
Contributor Author

@kyri-petrou in this case the sink is still reading from upstream after consuming 1,2,2, so upstream actually gets to execute the failing effect.

),
test("late error")(
assertZIO {
(ZStream(1, 2, 3) ++ ZStream.fail("Aie") ++ ZStream(5, 1, 2, 3, 4, 5))
Contributor Author

@kyri-petrou , here you can see the same 'error handling' behavior with the head sink in case of a 'late' failure

test("early error")(
assertZIO {
(ZStream.fail("Aie") ++ ZStream(1, 2, 3) ++ ZStream(5, 1, 2, 3, 4, 5))
.pipeThrough(ZSink.head)
Contributor Author

same as above, with an early failure this time

@eyalfa eyalfa requested a review from kyri-petrou December 26, 2024 12:45
Contributor

@kyri-petrou kyri-petrou left a comment

I'm still trying to wrap my head around ZSink and how it's supposed to work, but in the meantime we need to revisit any usages of ZSink.dropWhile and ZSink.dropWhileZIO and make sure that they'll work under the new behaviour. One usage I can see that is likely affected is in ZStream#dropWhileZIO

@eyalfaZS

eyalfaZS commented Dec 30, 2024

@kyri-petrou, this PR fixes dropWhileZIO as well.
Note that the current behavior is flawed: feeding an infinite stream to this sink never ends. I suggest comparing it with ZSink.head's behavior, which is a bit easier to follow.

One way to wrap your head around ZSink is to consider what you would expect if it operated at the element level (instead of the chunk level). In that case ZSink.head consumes exactly one element (or none if upstream is empty) and completes. ZSink.dropWhile reads one element, determines whether it is still dropping, and then either completes without reading further from upstream or repeats the process.

Enter chunks: ZSink.head now reads a chunk instead of just a single element. When this chunk has more than one element, the sink must expose the remaining leftovers in order to support proper ZSink composition; it does so by writing the leftovers downstream before completing.

I think a good reference for sink behavior is ZSink.fold, which many sinks (such as head) are defined in terms of.
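
To make the chunk-level view of head concrete, here is a rough model in plain Scala. It is an assumption-laden sketch rather than ZSink's code: `HeadSinkModel` and `headSink` are hypothetical names, and a `List[List[A]]` stands in for the chunks arriving from upstream.

```scala
object HeadSinkModel {
  /**
   * Chunk-level head: skip empty chunks, take the first element of the first
   * non-empty chunk as the result, and expose the rest of that chunk as
   * leftovers. Later chunks are never looked at.
   */
  def headSink[A](upstream: List[List[A]]): (Option[A], List[A]) =
    upstream.find(_.nonEmpty) match {
      case Some(chunk) => (chunk.headOption, chunk.drop(1))
      case None        => (None, Nil) // empty upstream: no result, no leftovers
    }

  // Result is Some(1); 2 and 3 are leftovers; the chunk [4, 5] is not read
  val example: (Option[Int], List[Int]) = headSink(List(List(1, 2, 3), List(4, 5)))
}
```

A sink composed after this one would start from the leftovers `2, 3` and then continue with whatever upstream has not yet emitted.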

@kyri-petrou
Contributor

I think the part that I'm trying to wrap my head around is whether having ZSinks such as dropWhile even makes sense. In the case of dropWhile there is no success value (it's actually Unit even though the type is set to Any for some reason), and the remainder of the stream becomes a leftover.

As for ZStream#dropWhileZIO, it is currently implemented differently from ZStream#dropWhile (one uses a ZSink, the other does not). Shouldn't both implementations use the same underlying mechanism?

@eyalfa
Contributor Author

eyalfa commented Jan 1, 2025

@kyri-petrou , the dropWhile and dropWhileZIO sinks are useful when composed with other sinks, like the composition in @oridag 's original use case (combined with ZSink.head).

Re. the inconsistency in ZStream's operator implementations: I can easily fix this as well. Let me know if you think it belongs in this PR.

are you convinced about the validity of the fixes introduced by this pr?

def dropWhileZIO[R, InErr, In](
p: In => ZIO[R, InErr, Boolean]
)(implicit trace: Trace): ZSink[R, InErr, In, In, Any] =
new ZSink(ZPipeline.dropWhileZIO(p).toChannel)
Collaborator

Shouldn't this change target the dropWhile pipeline?

Contributor Author

@hearnadam, I'm not sure what you mean by that.
Implementing the sink in terms of the pipeline resulted in a bug, since the pipeline keeps pulling the entire stream while the sink is expected to stop immediately.
It is possible to change the pipeline and implement it in terms of the sink (combined with identity to make sure it keeps pulling), but I can't see a good reason to do so, as it would reduce the readability of the pipeline's code.

@eyalfa eyalfa requested a review from kyri-petrou January 5, 2025 13:43
@eyalfa
Contributor Author

eyalfa commented Jan 9, 2025

@kyri-petrou @ghostdogpr ,
any chance to finalize this review?

@eyalfaZS

@kyri-petrou, thanks for pulling series/2.x into my branch.
I've looked into the failure and I think it just shows that implementing ZStream.dropWhileXXX in terms of ZSink.dropWhileXXX is wrong.
I think the following test using ZSink.head and ZStream.pipeThrough (equivalent to the failing ZStream.dropWhileXXX implementation) shows this:

test("pipeThrough(ZSink.head)")(
  check(pureStreamOfInts) { ints =>
    for {
      res1 <- ints.pipeThrough(ZSink.head).runCollect
      res2 <- ints.chunks.filter(_.nonEmpty).run(ZSink.head)
      res3  = res2.toSeq.flatMap(_.drop(1))
    } yield assert(res1)(equalTo(res3))
  }
)

In short: a sink stops consuming as soon as it can produce a result, and at that point it emits its leftovers downstream.
ZStream.pipeThrough runs the sink (once) and emits its leftovers; in the case of ZSink.head this is the tail of the first non-empty chunk (or nothing if the stream is empty).
ZSink.dropWhileXXX used to consume the entire upstream as leftovers, which is a bug; it is also the reason the faulty implementation of ZStream.dropWhileXXX passed the tests.

Thanks @kyri-petrou for adding this ZStream test showcasing your concern. I'll fix the ZStream.dropWhileXXX operators as well.

@eyalfaZS

@kyri-petrou, by the way, it seems ZStream.dropWhile is properly implemented in terms of ZPipeline.dropWhile; I'm not really sure why ZStream.dropWhileZIO was implemented in terms of the sink rather than the pipeline.

What's more alarming is that ZSink.dropUntil also seems to be implemented in terms of the pipeline, hence emitting the entire upstream as leftovers... I'll add a fix as part of this PR (assuming that's acceptable to you).

@kyri-petrou
Contributor

kyri-petrou commented Jan 13, 2025

are you convinced about the validity of the fixes introduced by this pr?

There are things that I still don't understand with ZSink. e.g., why does this happen? Is this valid or not?

ZStream(0, 1, 2, 0, 2, 0, 2)
  .rechunk(3)
  .transduce {
    ZSink.dropWhile[Int](_ == 0) *> ZSink.head[Int]
  }
  .take(10)
  .runCollect // produces Chunk(Some(1), Some(2), Some(2), Some(2), None)

Why does the 2nd Some(2) exist when we used ZSink.head? If it's valid that it exists, why did the 0 after 1 get removed? It feels like transduce is broken altogether 😕

Either way, the part that worries me the most about these changes is not their validity per se but the fact that they alter the underlying behaviour of ZSink.dropWhile. Effectively its behaviour changes entirely, so users that are currently using it will likely end up with broken code unknowingly. It's even worse if libraries are using it, because users will not even be able to change its usage.

@eyalfaZS

Re. transduce: it applies the sink multiple times until upstream is exhausted. In this case you can think of it as breaking the stream into these chunks:

[0, 1] --> Some(1)
[2] --> Some(2)
[0, 2] --> Some(2)
[0, 2] --> Some(2)
[] --> None

Re. the change in behavior: the existing behavior is a bug. It may manifest itself in multiple ways:

  1. non-completing streams
  2. memory leaks as a result of accumulation
  3. eager 'error consumption'
  4. broken sink composition

Most of these won't happen when using sinks the 'normal' way; they appear only when using operators like transduce, ZPipeline.fromSink, and the like.
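
The first item, non-completing streams, can be illustrated with an unbounded upstream. The sketch below uses plain Scala iterators instead of ZIO; `EarlyStopModel`, `dropWhileSink` and `infiniteChunks` are hypothetical names, and counting pulled chunks is just a device for showing that the intended sink behavior terminates.

```scala
object EarlyStopModel {
  /**
   * Model of a dropWhile-style sink that stops at the first chunk with a
   * surviving element. Returns (leftover, number of chunks pulled).
   */
  def dropWhileSink[A](upstream: Iterator[List[A]])(p: A => Boolean): (List[A], Int) = {
    var pulled = 0
    var leftover: List[A] = Nil
    while (leftover.isEmpty && upstream.hasNext) {
      pulled += 1
      leftover = upstream.next().dropWhile(p)
    }
    (leftover, pulled)
  }

  /** An unbounded upstream: 0, 1, 2, ... in chunks of 3. */
  def infiniteChunks: Iterator[List[Int]] =
    Iterator.from(0).grouped(3).map(_.toList)

  // Terminates after the chunk [9, 10, 11] even though upstream never ends
  val (leftover, pulled) = dropWhileSink(infiniteChunks)(_ <= 10)
}
```

A sink that instead forwarded everything after the dropped prefix as leftovers would have to keep pulling from `infiniteChunks` forever, which is the non-completing case listed above.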

is there a process to test this kind of change at least on the zio ecosystem? major libraries like zio-http, zio-grpc, zio-kafka, zio-aws (feel free to add more), perhaps by releasing a side branch and then running test PRs on these projects?

@kyri-petrou
Contributor

is there a process to test this kind of change at least on the zio ecosystem

No, not at the moment at least. I recently proposed to create a nightly build of sorts that tests the latest ZIO snapshot against all the zio-* libraries to ensure that there aren't any regressions. In the meantime, the only option will be to search for usages of dropWhile and dropWhileZIO in the zio org

@eyalfaZS

and another fun fact, seems like ZPipeline.dropUntilZIO has a bug as well, I suspect it happens when the last dropped element is the last one in a chunk.

see this test:

test("dropUntilZIO") {
  check(pureStreamOfInts, Gen.function(Gen.boolean)) { (s, p) =>
    for {
      res1 <- (s >>> ZPipeline.dropUntilZIO(p.andThen(ZIO.succeed(_)))).runCollect
      res2 <- s.runCollect.map(_.dropWhile(!p(_)).drop(1))
    } yield assert(res1)(equalTo(res2))
  }
}

fails with:

✗ Empty$[0] : expected '447334053' got 'null'

      res1 did not satisfy equalTo(res2)
      res1 = Chunk()

I'll take a stab at this as well.

@oridag

oridag commented Jan 13, 2025

are you convinced about the validity of the fixes introduced by this pr?

There are things that I still don't understand with ZSink. e.g., why does this happen? Is this valid or not?

ZStream(0, 1, 2, 0, 2, 0, 2)
  .rechunk(3)
  .transduce {
    ZSink.dropWhile[Int](_ == 0) *> ZSink.head[Int]
  }
  .take(10)
  .runCollect // produces Chunk(Some(1), Some(2), Some(2), Some(2), None)

Why does the 2nd Some(2) exist when we used ZSink.head? If it's valid that it exists, why did the 0 after 1 get removed? It feels like transduce is broken altogether 😕

Either way, the part that worries me the most about these changes is not their validity per se but the fact that they alter the underlying behaviour of ZSink.dropWhile. Effectively its behaviour changes entirely, so users that are currently using it will likely end up with broken code unknowingly. It's even worse if libraries are using it, because users will not even be able to change its usage.

Yes, this is the expected behavior (the trailing None is arguable). transduce takes a sink and runs the stream through it multiple times. Given a stream Stream[A] and a sink Sink[B], stream.transduce(sink) will transform the stream of A into a stream of B. In this case, the sink is composed from two sinks, one that consumes (and discards) all zeros, and then one that takes the first element. Running the stream into this sink multiple times should yield all non-zero elements.

@eyalfa
Contributor Author

eyalfa commented Jan 13, 2025

@kyri-petrou, I searched for ZSink in the major zio org repos, scalapb's zio-grpc, caliban...
It seems ZSinks are quite sparsely used; in any case, none of the sinks mentioned in this PR are used.

Contributor

@kyri-petrou kyri-petrou left a comment

Sorry for the nitpicks, code in the streams package is quite difficult to read as is so I prefer to have it as clean as possible. Happy to merge once they're addressed

new ZPipeline(loop)
}
)(implicit trace: Trace): ZPipeline[Env, Err, In, In] =
ZPipeline.dropWhileZIO(p(_: In).map(!_)) >>>
Contributor

You can use p(_: In).negate by the way


new ZPipeline(loop)
}
)(implicit trace: Trace): ZPipeline[Env, Err, In, In] =
Contributor

A bit outside the scope of this PR but since we're at it:

When I first read dropUntil and dropUntilZIO I assumed that the method would drop up to, but not including, the element where the predicate evaluated to true. It might be good to change the scaladoc to mention that it drops elements up to and including the element where the predicate evaluated to true.
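
For readers comparing the two readings, the inclusive semantics can be written down on a plain list. This sketch reuses the reference expression from the property test earlier in the thread (`dropWhile(!p(_)).drop(1)`); `DropUntilModel` and its `dropUntil` are hypothetical helpers, not the ZIO operators.

```scala
object DropUntilModel {
  /** Drops elements up to and including the first one that satisfies `p`. */
  def dropUntil[A](xs: List[A])(p: A => Boolean): List[A] =
    xs.dropWhile(!p(_)).drop(1)

  // 3 is the first element satisfying the predicate, and it is dropped too
  val inclusive: List[Int] = dropUntil(List(1, 2, 3, 4, 5))(_ >= 3)

  // If no element satisfies the predicate, everything is dropped
  val nothingMatches: List[Int] = dropUntil(List(1, 2))(_ > 5)
}
```

Here `inclusive` is `List(4, 5)`, whereas a non-inclusive reading would have kept the 3.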

Contributor Author

@kyri-petrou , since this text appears multiple times in the scaladocs, let's agree on the wording here, then I'm willing to make the change. either in this pr or a separate one.

p: In => ZIO[R, InErr, Boolean]
)(implicit trace: Trace): ZSink[R, InErr, In, In, Any] =
new ZSink(ZPipeline.dropUntilZIO(p).toChannel)
ZSink.dropWhileZIO(p(_: In).map(!_)) *> ZSink.head
Contributor

Use negate

@eyalfa eyalfa requested a review from kyri-petrou January 18, 2025 09:23
@kyri-petrou kyri-petrou merged commit 2c28eb2 into zio:series/2.x Jan 18, 2025
18 checks passed
6 participants