Make Chunk Extend IndexedSeq #3342
Love this. Great idea.
Had one question about the inference regression, but other than that I'm 👍 on this!
  c.foreach(_ => ())
- assert(c.filter(_ => false).map(_ * 2).length)(equalTo(0))
+ assert(c.filter(_ => false).map[Int](_ * 2).length)(equalTo(0))
Could you explain why the annotations are required on map here?
The issue is that we now have two overloaded versions of map on Scala 2.11 and 2.12:
// From Chunk
def map[B](f: A => B): Chunk[B]
// from IndexedSeq
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[List[A], B, That]): That
So in the presence of that, the Scala compiler, especially on 2.11, has a harder time inferring these types and complains that the type of an anonymous function should be fully known (even though it seems like it should be inferable here).
I think to address this we need to consider moving methods like this to ChunkLike and only having one implementation on Scala 2.11 / Scala 2.12 that uses the signature from IndexedSeq but has an efficient implementation.
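A minimal sketch of the overload shape described above. `MiniChunk` and `BuildEvidence` are hypothetical stand-ins, not `Chunk` or `CanBuildFrom`, and the example assumes nothing about how any particular compiler version infers the unannotated call. It only shows why the `map[Int]` annotation in the test helps: one explicit type argument can only match the overload with one type parameter.

```scala
object OverloadSketch {
  // Hypothetical stand-in for the implicit builder evidence (CanBuildFrom on 2.11/2.12).
  trait BuildEvidence[From, Elem, To] {
    def build(elems: Vector[Elem]): To
  }

  // Hypothetical miniature collection carrying both `map` shapes.
  final case class MiniChunk[A](values: Vector[A]) {
    // Shape of the collection-specific signature: one type parameter.
    def map[B](f: A => B): MiniChunk[B] =
      MiniChunk(values.map(f))

    // Shape of the inherited signature: two type parameters plus implicit evidence.
    def map[B, That](f: A => B)(implicit ev: BuildEvidence[MiniChunk[A], B, That]): That =
      ev.build(values.map(f))
  }

  val chunk: MiniChunk[Int] = MiniChunk(Vector(1, 2, 3))

  // A single explicit type argument rules out the two-type-parameter overload,
  // so the parameter type of the lambda is known before it is type-checked.
  val annotated: MiniChunk[Int] = chunk.map[Int](_ * 2)
}
```

Keeping a single `map` per Scala version, as proposed above, would remove the need for the annotation altogether.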
Ah, I see. Will we be able to retain the efficient implementation given that signature? We'd have to go through the CBF and won't be able to use an array directly.
That's the question. I think we can. We can match on the CBF to get back a ChunkBuilder and add methods on the ChunkBuilder to support an efficient implementation in that case.
Ah yeah. That sounds viable!
Worth writing up in tickets so we don't forget!
I am incorporating a solution in this PR. Will push an updated version shortly.
  check(chunkWithIndex(Gen.unit)) {
    case (chunk, i) =>
-     assert(chunk.apply(i))(equalTo(chunk.toSeq.apply(i)))
+     assert(chunk.apply(i))(equalTo(chunk.toList.apply(i)))
We can omit the whole .toList and use the apply directly on Chunk.
Ah never mind, it's testing that it's the same as List implementation.
I think the goal in this test was to make sure the apply method itself is accurate, and so we are comparing the apply implementation in Chunk with the known correct value from accessing the same index in the list. I suppose maybe it would be better to check against Vector here.
  // val empty: Chunk[B] = Chunk.empty
- val _: NonEmptyChunk[A] = empty ++ Chunk(new A {})
+ // val _: NonEmptyChunk[A] = empty ++ Chunk(new A {})
Dead code in light of the changes to ++? Here and above.
Yes, I think we are going to have to add a new operator with a different name, something like concatNonEmpty.
Tricky and delicate work here; love the attention to detail.
This exciting work is going to make Chunk the go-to collection type!
This is ready for another review. I added a
while (i < len) {
  val chunk = f(self(i)) match {
    case chunk: Chunk[B] => chunk
    case other           => Chunk.fromIterable(other.toList)
This toList will be detrimental to performance. Probably best to do Chunk.fromArray(other.toArray) if we need to do that.
Will change.
We can't call toArray here because we don't have a ClassTag at this point. May make sense to refactor this so we just iterate over the original collection and either get the class tag from the Chunk if it is a chunk or otherwise from the first value of the collection using Tag.fromValue.
Right. I think if we can only iterate over it once, then the best we can do is use the chunk builder to turn the "iterable once" into a chunk.
OTOH if we match against a few other cases, e.g. Vector, List, etc., we know what collection type we are dealing with and can handle it more specially (e.g. Vector we can preallocate, List cannot be pre-allocated since it's O(n) to determine the size, etc.).
But open question how much of that performance work to do here. I'm fine merging this in now, we can create tickets for performance work.
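A rough sketch of the single-pass idea discussed above, assuming Scala 2.13 or later (`IterableOnce` and `knownSize` do not exist on 2.12). `drainOnce` is a hypothetical helper, and it fills a standard `ArrayBuffer` rather than a `ChunkBuilder`. Instead of matching on Vector, List, and so on by hand, it asks the source for `knownSize`, which is -1 when the size is not cheaply available, and preallocates only when a size is reported.

```scala
import scala.collection.mutable.ArrayBuffer

object SinglePassSketch {
  // Hypothetical helper: drains `source` exactly once into a growable buffer.
  // No ClassTag is needed because ArrayBuffer stores its elements boxed.
  def drainOnce[A](source: IterableOnce[A]): ArrayBuffer[A] = {
    val buffer = ArrayBuffer.empty[A]

    // Preallocate only when the source reports its size without traversal;
    // knownSize returns -1 otherwise (for example, for a non-empty List).
    val size = source.knownSize
    if (size >= 0) buffer.sizeHint(size)

    // Single traversal: the iterator is requested once and consumed once.
    val iterator = source.iterator
    while (iterator.hasNext) buffer += iterator.next()
    buffer
  }
}
```

The same approach works for a one-shot `Iterator`, which is the case where going through an intermediate List costs the most.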
I've got an idea about how we can do this efficiently. Will do a turn on it now.
 * and is not referentially transparent. It is provided for compatibility
 * with Scala's collection library and should not be used for other purposes.
 */
override protected[this] def newBuilder: ChunkBuilder[A] =
By convention, side-effecting methods should have a nullary parameter list: def newBuilder().
Yes. The signature of the newBuilder method we are overriding doesn't have the nullary parameter list. Let me see if I can add and still have it be recognized as a valid override.
Yes, unfortunately adding the nullary parameter list generates a compiler error, so I think we will have to leave it as is.
/**
 * Constructs a new `ChunkBuilder`.
 */
def make[A]: ChunkBuilder[A] =
- def make[A]: ChunkBuilder[A] =
+ def make[A](capacity: Int = 10): ChunkBuilder[A] =
I saw you could call ensureSize on ArrayBuilder if you subclass it, which would allow "pre-allocation" in cases where it's known roughly how many things will be added.
Yes, I think we want to override the sizeHint method, because these methods will mostly be called by Scala collection library combinators that are outside our control.
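To make the sizeHint suggestion concrete, here is a hedged sketch against the Scala 2.13 `Builder` interface. `HintedBuilder` is a hypothetical class, not ZIO's `ChunkBuilder`, and the `lastHint` field exists only so the example can show that the hint arrived. The builder wraps a standard `ArrayBuilder` and forwards size hints to it.

```scala
import scala.collection.mutable.{ArrayBuilder, Builder}
import scala.reflect.ClassTag

object SizeHintSketch {
  // Hypothetical builder backed by an ArrayBuilder (Scala 2.13+ Builder API).
  final class HintedBuilder[A: ClassTag] extends Builder[A, Array[A]] {
    private val underlying: ArrayBuilder[A] = ArrayBuilder.make[A]

    // Demo-only instrumentation: records the most recent hint, -1 if none.
    var lastHint: Int = -1

    override def addOne(elem: A): this.type = {
      underlying += elem
      this
    }

    override def clear(): Unit = underlying.clear()

    override def result(): Array[A] = underlying.result()

    // Forward the hint so the backing array can be sized before elements arrive.
    override def sizeHint(size: Int): Unit = {
      lastHint = size
      underlying.sizeHint(size)
    }
  }
}
```

A caller that knows the element count can invoke `builder.sizeHint(n)` before `builder ++= elements`. Callers that never give a hint still work and simply grow the array on demand.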
while (i < len) {
  val chunk = f(self(i)) match {
    case chunk: Chunk[B] => chunk
    case other           => Chunk.fromIterable(other.iterator.to(List))
Chunk.fromArray(...toArray) if possible.
If not possible maybe we should directly make a Chunk.fromIterator or something since going "through" List will not be efficient.
I think adding that method is a good idea.
 * Constructs a `Chunk` from the specified `IterableOnce`.
 */
def from[A](source: IterableOnce[A]): Chunk[A] =
  Chunk.fromIterable(source.iterator.to(Iterable))
Same here.
  CrossType.Full.sharedSrcDir(baseDirectory.value, "test").toList.map(f => file(f.getPath + "-2.12+")),
- CrossType.Full.sharedSrcDir(baseDirectory.value, "main").toList.map(f => file(f.getPath + "-dotty"))
+ CrossType.Full.sharedSrcDir(baseDirectory.value, "main").toList.map(f => file(f.getPath + "-dotty")),
+ CrossType.Full.sharedSrcDir(baseDirectory.value, "main").toList.map(f => file(f.getPath + "-2.13+"))
Sbt's not going to like this. 😆
I was surprised this worked as smoothly as it did!
Excellent work! A few minor tips to improve performance, but overall looks great.
Done.
Makes Chunk extend IndexedSeq from the Scala standard library. This will improve interoperability with other code and pave the way for us to unify the Iterable and Chunk versions of a variety of combinators.
Because of changes to Scala's collection library in 2.13, this is implemented in terms of a new version-specific trait ChunkLike that Chunk extends. ChunkLike extends the appropriate traits for each version and implements the corresponding builder interface so that code in Chunk can continue to be written on a cross-version basis.
Potential issues:
- empty ++ nonempty returning a nonempty chunk because we now have another ++ variant on IndexedSeq. We could address this by picking another operator for combining two chunks that doesn't clash, to get back the current behavior.
- ChunkLike due to version-specific differences. In some cases Scala 2.13 provides a signature that matches ours because it doesn't include CanBuildFrom, but Scala 2.12 doesn't, so we need to override in one case but not the other. In other cases methods are defined as final in IndexedSeq in Scala 2.13 (e.g. size is defined in terms of length), so we can't override them.
- ChunkBuilder implementation could be optimized further. It is already backed by an ArrayBuilder that has specialized implementations for primitives, but we could override the ++= method for increased efficiency.
- Chunk#fold has to be renamed Chunk#foldLeft. I think this is fine and it probably should have been this way all along, but it is a breaking change.