Fix root cause of duplicate summer execution in Memory platform #667
AlsoProducers. The root cause actually has to do with the summer. In the toStream method we keep a map from Producer to stream so that we don't run parts of the graph again and again. The map uses value equality. The problem is that the summer includes the store, which is mutable. If a summer is referenced by two AlsoProducers, and the store has changed by the time the summer is revisited, then the key differs from the original summer and isn't found in the map, so the summer is planned again.
d3b5808 seems to fix the issue and passes a unit test because that test has only one AlsoProducer, where planning left and right before forcing left means that the store doesn't change in between. But it likely wouldn't fix the case where there are multiple AlsoProducers at different levels. Even if it did fix those cases, it is an indirect solution; it's better not to take any chances and to fix the root of the issue.
This is a case where reference equality fits the bill. I don't know of an immutable map implementation that uses reference equality, so I'm using java.util.IdentityHashMap. If there is one, I would love to use that instead.
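To make the failure mode described above concrete, here is a minimal sketch. `FakeSummer` is a hypothetical stand-in, not Summingbird's `Summer`, and its hand-written `hashCode` is an assumption chosen so the effect is deterministic; a case class holding a mutable store would be expected to behave similarly, since its hash is derived from its fields.

```scala
import java.util.{HashMap => JHashMap, IdentityHashMap}
import scala.collection.mutable

object MutableKeyDemo {
  // Hypothetical stand-in for a summer: equality and hash depend on a mutable store.
  final class FakeSummer(val store: mutable.Map[String, Int]) {
    override def equals(that: Any): Boolean = that match {
      case s: FakeSummer => store == s.store
      case _ => false
    }
    // Deliberately simple (assumption) so the hash visibly changes when the store grows.
    override def hashCode: Int = store.size
  }

  // Returns (found by value-equality map, found by reference-equality map).
  def run(): (Boolean, Boolean) = {
    val summer = new FakeSummer(mutable.Map.empty)
    val byValue = new JHashMap[FakeSummer, String]()
    val byReference = new IdentityHashMap[FakeSummer, String]()
    byValue.put(summer, "stream")
    byReference.put(summer, "stream")

    // The store changes between the first and second visit of the same summer.
    summer.store("k") = 1

    (byValue.containsKey(summer), byReference.containsKey(summer))
  }
}
```

In this sketch the value-equality map loses track of the key once the store is mutated, while the identity-based map still finds it, which is the behavior the description relies on.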
| val st = s.asInstanceOf[Stream[T]] | ||
| (st, m + (outerProducer -> st)) | ||
| def toStream[T, K, V](outerProducer: Prod[T], jamfs: JamfMap): (Stream[T], JamfMap) = { | ||
| val stream = jamfs.get(outerProducer) |
Could we do: Option(jamfs.get(outerProducer)) match { so that the diff can be smaller? The current version is harder to read.
| private def toStream[T](outerProducer: Prod[T], jamfs: JamfMap): (Stream[T], JamfMap) = | ||
| jamfs.get(outerProducer) match { | ||
| case Some(s) => (s, jamfs) | ||
| def toStream[T, K, V](outerProducer: Prod[T], jamfs: JamfMap): (Stream[T], JamfMap) = |
K, V don't seem to be used?
Good catch, fixed.
Some context for the review. This change is basically a revert of d3b5808 followed by the use of IdentityHashMap instead of Map for JamfMap. By the way, I don't know what Jamf stands for and am curious to find out.
| private type Prod[T] = Producer[Memory, T] | ||
| private type JamfMap = HMap[Prod, Stream] | ||
| private type JamfMap = util.IdentityHashMap[Prod[_], Stream[_]] |
This is mutable now, and before it was immutable. Does this not cause any problems? Can you add some comments explaining why it is safe? Also, why thread it through if you are mutating it? There is no need to return JamfMap if you are just returning the input.
I would prefer not to introduce a mutable map, and instead add:

    class Identity[T](val unwrap: T) {
      override def equals(that: Any) = that match {
        case i: Identity[_] => unwrap eq i.unwrap
        case _ => false
      }
      override def hashCode = System.identityHashCode(unwrap)
    }
    private type Prod[T] = Identity[Producer[Memory, T]]

then wrap keys with new Identity(key).
I really hate to give up the reasoning we get with immutable types. We should only do so for big performance wins, and we don't care about performance here (planning is fast, and only happens at submit).
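As a sketch of how the suggested wrapper could be used with an ordinary immutable Map, consider the following. `FakeSummer` is a hypothetical stand-in for a producer with a mutable store, and the `T <: AnyRef` bound plus the cast are assumptions added here so that `eq` type-checks; they are not part of the comment above.

```scala
import scala.collection.mutable

object IdentityKeyDemo {
  // Wrapper in the spirit of the review comment: equality and hashing by reference.
  final class Identity[T <: AnyRef](val unwrap: T) {
    override def equals(that: Any): Boolean = that match {
      case i: Identity[_] => unwrap eq i.unwrap.asInstanceOf[AnyRef]
      case _ => false
    }
    override def hashCode: Int = System.identityHashCode(unwrap)
  }

  // Hypothetical stand-in for a summer whose store is mutable.
  final case class FakeSummer(name: String, store: mutable.Map[String, Int])

  // Returns (same instance found after mutation, equal-by-value copy found).
  def run(): (Boolean, Boolean) = {
    val summer = FakeSummer("s", mutable.Map.empty)
    // The map itself stays immutable; only the key wrapper changes the equality.
    val planned: Map[Identity[FakeSummer], String] =
      Map(new Identity(summer) -> "stream")

    summer.store("k") = 1 // mutate the store after the key was inserted

    val sameInstance = planned.contains(new Identity(summer))
    val equalCopy = planned.contains(new Identity(summer.copy()))
    (sameInstance, equalCopy)
  }
}
```

The same instance is still found after its store changes, while a structurally equal copy is treated as a different key, so the map keeps the immutable-threading style of the original code.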
This is a good idea. I would really love to keep everything immutable too. Let me give this a shot.
He he, I love Pulp Fiction dialogues. I've updated the review to use an immutable map with reference equality.
| lazy val lforcedEmpty = left.filter(_ => false) | ||
| (right.append(lforcedEmpty), rightM) | ||
| val lforcedEmpty = left.filter(_ => false) | ||
| val (right, rightM) = toStream(r, leftM) |
Why do we have to revert this change? Calling toStream(r, leftM) first seems more correct to me, as does using the lazy concatenation (rather than the strict one that we have reverted to).
It breaks a bunch of our tests that rely on the current execution order. Since we don't need to change the ordering to fix the duplication issue, it would be great if we could preserve the same execution order.
As we discussed, can you just copy this file into the repo rather than hardcode a bug just because a few tests assumed something false?
I fear that "keeping bugs for a few tests" is not scalable. I doubt you would be happy if you hit a bug that I want us to keep because my tests assumed the buggy behavior.
just to link to the prior discussion:
https://github.com/twitter/summingbird/pull/647/files#r47308385
#647 may indeed be fixed more than one way, but I still don't like that planning forces the left hand side.
Could you get the tests to pass if you did something like:

    def merge[A, B](a: Stream[A], b: Stream[B]): Stream[Either[A, B]] = {
      val itA = a.iterator
      val itB = b.iterator
      var left = itA.hasNext
      // try to alternate as much as possible
      def next: Option[Either[A, B]] =
        if (!(itA.hasNext || itB.hasNext)) None
        else if (left) {
          left = false
          if (itA.hasNext) Some(Left(itA.next())) else next
        } else {
          left = true
          if (itB.hasNext) Some(Right(itB.next())) else next
        }
      Stream.continually(next).takeWhile(_.isDefined).map(_.get)
    }

Then we can return merge(left, right).collect { case Right(r) => r } as the result of the Also. In this way, we are closer to modeling a real streaming system.
If the contract we are presenting here is that the left will always be consumed at planning, I just don't like that. It leads to a side effect from planning, which was never our intent. I think the test needs to not assume that.
I think we need to make a clear case as to why the behavior is a certain way, and that case cannot be "some broken test at Twitter will fail if we change this". If that is the only reason, you should have an internal fork of the platform.
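To illustrate what the proposed merge would do, here is a small self-contained sketch. The merge body follows the comment above; the `run` helper, the sample streams, and the `seen` buffer are hypothetical and only there to observe side effects on the left side. Stream is used to match the code under review; on Scala 2.13 and later it is deprecated in favor of LazyList, and exactly when the first element is evaluated depends on how strict the collection's head is.

```scala
import scala.collection.mutable.ListBuffer

object MergeDemo {
  // merge as proposed in the review comment.
  def merge[A, B](a: Stream[A], b: Stream[B]): Stream[Either[A, B]] = {
    val itA = a.iterator
    val itB = b.iterator
    var left = itA.hasNext
    // Alternate between the two sides as long as both have elements.
    def next: Option[Either[A, B]] =
      if (!(itA.hasNext || itB.hasNext)) None
      else if (left) {
        left = false
        if (itA.hasNext) Some(Left(itA.next())) else next
      } else {
        left = true
        if (itB.hasNext) Some(Right(itB.next())) else next
      }
    Stream.continually(next).takeWhile(_.isDefined).map(_.get)
  }

  // Returns (elements kept from the right side, left-side effects observed).
  def run(): (List[String], List[Int]) = {
    val seen = ListBuffer.empty[Int]
    // Hypothetical left branch whose only purpose is its side effect.
    val leftSide = Stream(1, 2, 3).map { x => seen += x; x }
    val rightSide = Stream("a", "b")
    // Keep only the right results, as suggested for the Also case.
    val out = merge(leftSide, rightSide).collect { case Right(r) => r }.toList
    (out, seen.toList)
  }
}
```

Only the right-hand values come out, yet every left-hand element has been evaluated by the time the result is fully consumed, so the left side's effects are driven by consumption of the output instead of by a separate forcing step.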
One of the broken tests actually got fixed by this cl. Other teams have agreed to work on fixing their tests or comment out the broken test for now. I've updated the cl to restore the lazy evaluation of left producer. Please take a look.
happens during planning.
| package com.twitter.summingbird.memory | ||
| import java.util |
We aren't using this anymore, are we?
Okay, given that this was always an example and toy platform, if it is really important for Twitter that it have the (unrealistic) semantics that, in an Also, the left is fully evaluated before the right, I'm okay with that (since it hasn't caused an issue for anyone that we know of yet). If the tests can be fixed to not assume that, even better. I'll leave it up to @pankajroark 👍
In this case we've decided to remove or fix the tests that are failing due to wrong assumptions. So we're going ahead and keeping the change that makes execution of the left producer lazy.
@sritchie Never heard of Gravity's Rainbow, I've got to read it now :)