@adamgfraser
Contributor

Makes Chunk extend IndexedSeq from the Scala standard library. This will improve interoperability with other code and pave the way for us to unify the Iterable and Chunk versions of a variety of combinators.

Because of changes to Scala's collection library in 2.13, this is implemented in terms of a new version-specific trait, ChunkLike, that Chunk extends. ChunkLike extends the appropriate traits for each version and implements the corresponding builder interface, so that code in Chunk can continue to be written on a cross-version basis.
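To illustrate the interoperability point, here is a minimal sketch. ArrayChunk is a hypothetical, heavily simplified stand-in for Chunk (not ZIO's implementation): by supplying only apply and length it inherits the rest of the IndexedSeq API and can be passed anywhere a Seq is expected. Without any builder support, map falls back to a plain IndexedSeq rather than an ArrayChunk; keeping results as a Chunk is presumably what the builder machinery in ChunkLike is for.

```scala
import scala.collection.immutable

object ArrayChunkSketch {

  // Hypothetical, simplified stand-in for Chunk: an immutable sequence backed
  // by an Array[Any]. Implementing `apply` and `length` is enough to inherit
  // the rest of the IndexedSeq API from the standard library.
  final class ArrayChunk[A] private (array: Array[Any]) extends immutable.IndexedSeq[A] {
    def apply(index: Int): A = array(index).asInstanceOf[A]
    def length: Int = array.length
  }

  object ArrayChunk {
    def apply[A](as: A*): ArrayChunk[A] = new ArrayChunk[A](as.toArray[Any])
  }

  // Ordinary library-style code that knows nothing about ArrayChunk.
  def total(xs: Seq[Int]): Int = xs.sum

  val chunk: ArrayChunk[Int] = ArrayChunk(1, 2, 3)
  val sum: Int = total(chunk)

  // With no custom builder, map falls back to the default immutable.IndexedSeq.
  val doubled: immutable.IndexedSeq[Int] = chunk.map(_ * 2)
}
```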

Potential issues:

  • I think we are going to have to give up on empty ++ nonempty returning a nonempty chunk, because we now have another ++ variant on IndexedSeq. We could address this by picking a different, non-clashing operator for combining two chunks to get back the current behavior.
  • A few methods had to be moved to ChunkLike due to version-specific differences. In some cases Scala 2.13 provides a signature that matches ours, because it doesn't include CanBuildFrom, while the Scala 2.12 signature does not match, so we need to override in one version but not the other. In other cases methods are defined as final in IndexedSeq in Scala 2.13 (e.g. size is defined in terms of length), so we can't override them.
  • The ChunkBuilder implementation could be optimized further. It is already backed by an ArrayBuilder, which has specialized implementations for primitives, but we could override the ++= method for increased efficiency.
  • Chunk#fold has to be renamed Chunk#foldLeft. I think this is fine and it probably should have been this way all along but it is a breaking change.
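A short illustration of why the fold rename is plausibly unavoidable, assuming the old Chunk#fold took an accumulator of arbitrary type (foldLeft-style): IndexedSeq already defines fold, whose accumulator must be a supertype of the element type, so a Chunk method with the same name and parameter shapes would clash with it. The sketch uses a plain Vector only to contrast the two standard-library signatures.

```scala
object FoldSketch {
  val xs: IndexedSeq[Int] = Vector(1, 2, 3)

  // Standard-library fold: fold[A1 >: A](z: A1)(op: (A1, A1) => A1): A1.
  // The accumulator and the elements share one type.
  val summed: Int = xs.fold(0)(_ + _)

  // foldLeft[B](z: B)(op: (B, A) => B): B lets the accumulator have an
  // unrelated type, the shape existing Chunk#fold call sites would move to.
  val rendered: String = xs.foldLeft("")((acc, x) => acc + x)
}
```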

Member

@iravid left a comment

Love this. Great idea.

Had one question about the inference regression, but other than that I'm 👍 on this!

c.foreach(_ => ())

assert(c.filter(_ => false).map(_ * 2).length)(equalTo(0))
assert(c.filter(_ => false).map[Int](_ * 2).length)(equalTo(0))
Member

Could you explain why the annotations are required on map here?

Contributor Author

The issue is that we now have two overloaded versions of map on Scala 2.11 and 2.12:

// From Chunk
def map[B](f: A => B): Chunk[B]

// from IndexedSeq
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Chunk[A], B, That]): That

So with both overloads present, the Scala compiler (especially on 2.11) has a harder time inferring these types and complains that the parameter types of an anonymous function must be fully known, even though it seems like they should be inferable here.

To address this, I think we need to consider moving methods like this to ChunkLike and having only one implementation on Scala 2.11 / Scala 2.12, one that uses the signature from IndexedSeq but still has an efficient implementation.

Member

Ah, I see. Will we be able to retain the efficient implementation given that signature? We'd have to go through the CBF and won't be able to use an array directly.

Contributor Author

That's the question. I think we can. We can match on the CBF to get back a ChunkBuilder and add methods on the ChunkBuilder to support an efficient implementation in that case.

Member

Ah yeah. That sounds viable!

Member

Worth writing up in tickets so we don't forget!

Contributor Author

I am incorporating a solution in this PR. Will push an updated version shortly.

check(chunkWithIndex(Gen.unit)) {
case (chunk, i) =>
assert(chunk.apply(i))(equalTo(chunk.toSeq.apply(i)))
assert(chunk.apply(i))(equalTo(chunk.toList.apply(i)))
Member

We can omit the whole .toList and use apply directly on the Chunk.

Member

Ah never mind, it's testing that it's the same as List implementation.

Contributor Author

I think the goal in this test was to make sure the apply method itself is accurate, so we compare the apply implementation in Chunk with the known correct value from accessing the same index in the list. It might be better to check against Vector here.

// val empty: Chunk[B] = Chunk.empty

val _: NonEmptyChunk[A] = empty ++ Chunk(new A {})
// val _: NonEmptyChunk[A] = empty ++ Chunk(new A {})
Member

Dead code in light of the changes to ++? Here and above.

Contributor Author

Yes, I think we are going to have to add a new operator with a different name, something like concatNonEmpty.
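A minimal sketch of the separately named operator idea, under stated assumptions: NonEmptyVector is a hypothetical stand-in for NonEmptyChunk built on Vector, and the signature of concatNonEmpty is illustrative only. Because the name does not overlap with the ++ inherited from IndexedSeq, it is free to return the more precise non-empty type even when the left-hand side is empty.

```scala
object ConcatNonEmptySketch {

  // Hypothetical non-empty wrapper: `head` guarantees at least one element.
  final case class NonEmptyVector[+A](head: A, tail: Vector[A]) {
    def toVector: Vector[A] = head +: tail
  }

  // A distinctly named operator does not compete with the inherited `++`,
  // so it can promise a non-empty result whenever `right` is non-empty.
  def concatNonEmpty[A](left: Vector[A], right: NonEmptyVector[A]): NonEmptyVector[A] =
    left.headOption match {
      case Some(h) => NonEmptyVector(h, left.tail ++ right.toVector)
      case None    => right
    }

  val empty: Vector[Int] = Vector.empty
  val combined: NonEmptyVector[Int] = concatNonEmpty(empty, NonEmptyVector(1, Vector(2, 3)))
}
```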

jdegoes previously approved these changes Apr 12, 2020
Member

@jdegoes left a comment

Tricky and delicate work here; love the attention to detail.

This exciting work is going to make Chunk the go-to collection type!

@adamgfraser
Contributor Author

This is ready for another review. I added a ChunkCanBuildFrom subtype of CanBuildFrom. We can pattern match against it to prove that the target type That is a Chunk, which lets us use all of our existing efficient implementations of Chunk combinators.
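The pattern-matching approach can be sketched independently of either collection library version. BuildFrom and VectorBuildFrom below are hypothetical stand-ins for CanBuildFrom and ChunkCanBuildFrom, and Vector stands in for Chunk. Matching on the marker subtype tells the method that the result type That is the fast-path collection, so it can skip the generic builder; a cast is still needed because Scala 2 does not refine That from the match.

```scala
object BuildFromSketch {

  // Hypothetical stand-in for CanBuildFrom: builds a `To` from elements of type `B`.
  trait BuildFrom[-B, +To] {
    def fromSeq(bs: Seq[B]): To
  }

  // Marker subtype, analogous to ChunkCanBuildFrom: it only ever builds Vectors.
  final class VectorBuildFrom[B] extends BuildFrom[B, Vector[B]] {
    def fromSeq(bs: Seq[B]): Vector[B] = bs.toVector
  }

  object BuildFrom {
    // Default instance picked up by implicit search, like CanBuildFrom instances.
    implicit def vectorBuildFrom[B]: BuildFrom[B, Vector[B]] = new VectorBuildFrom[B]
  }

  // Generic signature, specialized implementation when the target is a Vector.
  def mapWith[A, B, That](xs: Vector[A])(f: A => B)(implicit bf: BuildFrom[B, That]): That =
    bf match {
      case _: VectorBuildFrom[_] =>
        // Fast path: build directly. The cast is sound because, when bf is a
        // VectorBuildFrom, That must be a supertype of Vector[B].
        xs.map(f).asInstanceOf[That]
      case other =>
        // Slow path: defer to whatever builder the caller supplied.
        other.fromSeq(xs.map(f))
    }

  // A non-marker instance, used to exercise the slow path.
  val listBuildFrom: BuildFrom[Int, List[Int]] = new BuildFrom[Int, List[Int]] {
    def fromSeq(bs: Seq[Int]): List[Int] = bs.toList
  }
}
```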

@adamgfraser requested review from iravid and jdegoes April 13, 2020 02:28
while (i < len) {
val chunk = f(self(i)) match {
case chunk: Chunk[B] => chunk
case other => Chunk.fromIterable(other.toList)
Member

This toList will be detrimental to performance. Probably best to do Chunk.fromArray(other.toArray) if we need to do that.

Contributor Author

Will change.

Contributor Author

We can't call toArray here because we don't have a ClassTag at this point. It may make sense to refactor this so that we iterate over the original collection once and get the class tag either from the Chunk, if it is a chunk, or otherwise from the first value of the collection using Tag.fromValue.

Member

Right. I think if we can only iterate over it once, then the best we can do is use the chunk builder to turn the "iterable once" into a chunk.

OTOH if we match against a few other cases, e.g. Vector, List, etc., we know what collection type we are dealing with and can handle it more specially (e.g. Vector we can preallocate, List cannot be pre-allocated since it's O(n) to determine the size, etc.).

But it's an open question how much of that performance work to do here. I'm fine merging this in now; we can create tickets for the performance work.
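Along the lines suggested above, the inner collection could be copied into array-backed storage without a ClassTag and without going through List: preallocate when the size is available in constant time, and otherwise traverse once. This sketch stores elements as Any; toArrayAny and fromIterator are hypothetical helpers (the latter mirrors the Chunk.fromIterator idea raised later in this thread), not ZIO API.

```scala
import scala.collection.mutable.ArrayBuffer

object FromIterableSketch {

  // Copies `source` into an Array[Any]. No ClassTag is required because the
  // elements are stored as Any.
  def toArrayAny[A](source: Iterable[A]): Array[Any] =
    source match {
      // Length is cheap for indexed sequences (e.g. Vector): allocate exactly once.
      case indexed: IndexedSeq[_] =>
        val array = new Array[Any](indexed.length)
        var i = 0
        while (i < array.length) {
          array(i) = indexed(i)
          i += 1
        }
        array
      // Size unknown or O(n) to compute (e.g. List): traverse a single time.
      case other =>
        fromIterator(other.iterator)
    }

  // Drains an iterator in one pass, letting the buffer grow as needed.
  def fromIterator[A](iterator: Iterator[A]): Array[Any] = {
    val buffer = ArrayBuffer.empty[Any]
    while (iterator.hasNext) buffer += iterator.next()
    buffer.toArray
  }
}
```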

Contributor Author

I've got an idea about how we can do this efficiently. Will take a pass at it now.

* and is not referentially transparent. It is provided for compatibility
* with Scala's collection library and should not be used for other purposes.
*/
override protected[this] def newBuilder: ChunkBuilder[A] =
Member

By convention, side-effecting methods should have a nullary parameter list: def newBuilder().

Contributor Author

@adamgfraser Apr 13, 2020

Yes. The signature of the newBuilder method we are overriding doesn't have the nullary parameter list. Let me see if I can add it and still have it recognized as a valid override.

Contributor Author

Yes, unfortunately adding the nullary parameter list generates a compiler error, so I think we will have to leave it as is.

/**
* Constructs a new `ChunkBuilder`.
*/
def make[A]: ChunkBuilder[A] =
Member

Suggested change
def make[A]: ChunkBuilder[A] =
def make[A](capacity: Int = 10): ChunkBuilder[A] =

I saw that you can call ensureSize on ArrayBuilder if you subclass it, which would allow pre-allocation in cases where we know roughly how many elements will be added.

Contributor Author

Yes, I think we want to override the sizeHint method, because these methods will mostly be called by Scala collection library combinators that are outside our control.
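For context on why sizeHint matters, here is a sketch of the calling side using the standard library's own array builder: when the element count is known up front, the caller passes it to sizeHint so the builder can allocate its backing array once instead of growing it repeatedly. mapToArray is a hypothetical helper; a ChunkBuilder that overrides sizeHint would presumably forward the hint to its underlying ArrayBuilder in a similar way.

```scala
import scala.reflect.ClassTag

object SizeHintSketch {

  // Maps over an indexed sequence into an Array, hinting the final size first
  // so the ArrayBuilder can reserve capacity before any element is added.
  def mapToArray[A, B: ClassTag](xs: IndexedSeq[A])(f: A => B): Array[B] = {
    val builder = Array.newBuilder[B]
    builder.sizeHint(xs.length)
    var i = 0
    while (i < xs.length) {
      builder += f(xs(i))
      i += 1
    }
    builder.result()
  }
}
```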

while (i < len) {
val chunk = f(self(i)) match {
case chunk: Chunk[B] => chunk
case other => Chunk.fromIterable(other.iterator.to(List))
Member

Chunk.fromArray(...toArray) if possible.

If not possible, maybe we should directly add a Chunk.fromIterator or something similar, since going "through" List will not be efficient.

Contributor Author

I think adding that method is a good idea.

* Constructs a `Chunk` from the specified `IterableOnce`.
*/
def from[A](source: IterableOnce[A]): Chunk[A] =
Chunk.fromIterable(source.iterator.to(Iterable))
Member

Same here.

CrossType.Full.sharedSrcDir(baseDirectory.value, "test").toList.map(f => file(f.getPath + "-2.12+")),
CrossType.Full.sharedSrcDir(baseDirectory.value, "main").toList.map(f => file(f.getPath + "-dotty"))
CrossType.Full.sharedSrcDir(baseDirectory.value, "main").toList.map(f => file(f.getPath + "-dotty")),
CrossType.Full.sharedSrcDir(baseDirectory.value, "main").toList.map(f => file(f.getPath + "-2.13+"))
Member

Sbt's not going to like this. 😆

Contributor Author

I was surprised this worked as smoothly as it did!

jdegoes previously approved these changes Apr 13, 2020
Member

@jdegoes left a comment

Excellent work! A few minor tips to improve performance, but overall looks great.

@adamgfraser
Contributor Author

Done.

@jdegoes jdegoes merged commit e72fa8a into zio:master Apr 13, 2020
@adamgfraser adamgfraser deleted the chunk branch April 22, 2020 01:59