Manually block up sections of output from caches into lists to avoid flo... #453
I think we should tackle these two items separately:
Breaking this into two issues (the second being the higher priority) seems like a good way to go.
--> When you come in you mean to the FFM bolt?
|
this should no longer be needed. This was added to Algebird as the default for tuples.
…l-partition Conflicts: summingbird-storm/src/main/scala/com/twitter/summingbird/storm/StormPlatform.scala
going to wait to review this until #455 is in so we can look at a clean diff.
why is this better than the default Semigroup?
…eature/manual-partition Conflicts: summingbird-online/src/main/scala/com/twitter/summingbird/online/executor/FinalFlatMap.scala summingbird-online/src/main/scala/com/twitter/summingbird/online/executor/Summer.scala summingbird-storm/src/main/scala/com/twitter/summingbird/storm/StormPlatform.scala
this is still unclear to me.
So, from the above, it seems that we are using this number to be the max number of keys in a cache on the flatmappers, and then using the fact that Maps can be summed, and then emitted as a single batch.
Can you explain more the whole algorithm here?
I would prefer if we could just get the effect of batching here. For instance, the cache emits blocks of results, those could be written as a List, and sent over (perhaps with a max size and chunked).
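As a rough illustration of the batching idea in this comment (not code from this PR), a flushed cache could be split into lists with a maximum size before being sent on. The object name `ChunkedEmit`, the method `chunk`, and the `maxSize` parameter are all hypothetical.

```scala
object ChunkedEmit {
  // Hypothetical helper: split one cache flush into lists of at most
  // maxSize key/value pairs, so each list can be emitted as one message.
  def chunk[K, V](flushed: Map[K, V], maxSize: Int): List[List[(K, V)]] = {
    require(maxSize > 0, "maxSize must be positive")
    flushed.toList.grouped(maxSize).toList
  }
}
```

Note that chunking by size alone says nothing about which downstream consumer each pair is routed to.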
So we can't just emit the cache as a list of keys; we need an outer key grouping. Since we are at a key-grouped level, we need a stable mapping of K -> OuterK. Here we are configuring how big a space we should map the original K into, as a multiple of the available consumers. I don't think generic batching would really solve this?
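A minimal sketch of the stable K -> OuterK mapping described here, assuming the outer key is simply the key's hash taken modulo the bucket space. `KeyBuckets`, `bucketSpace`, `outerKey`, `partition`, and the `multiplier` parameter are hypothetical names for illustration and are not taken from the PR.

```scala
object KeyBuckets {
  // Assumption: the bucket space is a multiple of the number of
  // downstream consumers.
  def bucketSpace(consumers: Int, multiplier: Int): Int = {
    require(consumers > 0 && multiplier > 0, "both values must be positive")
    consumers * multiplier
  }

  // Stable mapping K -> OuterK: the same key always lands in the same
  // bucket. floorMod keeps the result non-negative for negative hashes.
  def outerKey[K](key: K, space: Int): Int =
    java.lang.Math.floorMod(key.hashCode, space)

  // Group one cache flush by outer key, so each bucket can travel as a
  // single Map under a key-grouped transport.
  def partition[K, V](flushed: Map[K, V], space: Int): Map[Int, Map[K, V]] =
    flushed.groupBy { case (k, _) => outerKey(k, space) }
}
```

Because the mapping depends only on the key and the space size, every flatmapper sends a given key to the same bucket, which plain size-based batching would not guarantee.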
shouldn't this go in AllOpts and get a comment? I really don't know what this does.
It's not there because it's not a user-set variable; I'm not really sure where it should go. I didn't want to pass around a bare integer. It contains the size of the space into which the hash of the key is modded (calculated in the Storm platform).
Should it be private in that case?
Yep, will change
Why wrap S here rather than leave it opaque? It seems like the previous PR did the opposite, moving from (Timestamp, T) => T. Here we are taking what I think is an opaque parameter S and adding structure we don't use. Why not just instantiate with S=InputState[S1] for some S1?
I use the InputState portion further down in the Summer class now for fan out. I could put S in the type parameter of Summer to be S <: InputState[_] ?
ahh... here is where you use the concrete nature of InputState. This is new, right?
Yep exactly. S <: InputState[_] compiles nicely though, so we could use it?
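To make the proposed bound concrete, here is a small sketch. The `InputState` below is a hypothetical stand-in with an invented `pending` field, not the real class, and `Summer` is reduced to one illustrative method.

```scala
object StateBound {
  // Hypothetical stand-in for InputState; only the shape matters here.
  final case class InputState[T](payload: T, pending: Int)

  // The bound accepts any InputState, whatever its inner type, while
  // still allowing access to the members that do not depend on it.
  class Summer[S <: InputState[_]] {
    def pendingOf(state: S): Int = state.pending
  }
}
```

The wildcard bound documents that the class relies on the InputState structure but never inspects the wrapped value.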
BTW: This seems super easy to get wrong. fanOut(size) seems more intuitive.
Also, state.fanOut can throw, but that will cause an exception, not return a Future.exception. What about:
try {
// code you have above here
}
catch {
case t: Throwable => Future.exception(t)
}
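A self-contained sketch of the same pattern, with two substitutions worth naming: it uses the standard library's `scala.concurrent.Future.failed` in place of the `Future.exception` call mentioned above, and it matches on `NonFatal` rather than every `Throwable`. `SafeFanOut`, `fanOut`, and `emit` are hypothetical names.

```scala
import scala.concurrent.Future
import scala.util.control.NonFatal

object SafeFanOut {
  // Hypothetical stand-in for a fan-out step that may throw synchronously.
  def fanOut(size: Int): Int = {
    if (size <= 0) throw new IllegalArgumentException("size must be positive")
    size
  }

  // Convert a synchronous throw into a failed Future, so callers see
  // errors through a single channel.
  def emit(size: Int): Future[Int] =
    try {
      Future.successful(fanOut(size))
    } catch {
      case NonFatal(t) => Future.failed(t)
    }
}
```

With this shape, `SafeFanOut.emit(0)` returns a failed Future instead of throwing at the call site.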
I've addressed all the comments so far except the S <: InputState[_] change, which I have locally. If you would rather have it, I'll push it too. I kind of like it, as it makes it obvious we don't care what the inner type is.
why is 1 not enough here? Can you add a comment?
This really came from Aaron; I think the notion was to guard against the space getting lopsided in some manner. I can't really see why it shouldn't be one if everything is well behaved.
…the output of increasing its size. Wrap the fan out code in a try catch for a future exception
size > 0 or we would not have received anything, right? Can we put:
assert(size > 0, "Input maps must be non-empty")
else, we are not going to do the acking correctly, right?
.nonEmpty
...oding the underlying transport
@johnynek , @jcoveney Somewhat here as an RFC, going to do more tests with this new branch/build.
Part of this is just a cleanup of Timestamp bleeding further into places where it is not needed, so that I could output from a final flatmap without a timestamp. We basically partition the keyspace into a set of buckets before passing off to Storm.