
Conversation

@ghostdogpr
Member

Constructing a Chunk of size 1 is faster with Chunk.single than with Chunk.apply, which goes through Chunk.fromIterable. I found the call coming from interruptAsFork while profiling an app:
[Screenshot 2024-04-09 at 3:27:40 PM]

@kyri-petrou
Contributor

On second thought, would it work if Chunk.apply was overloaded to accept a single element? Or would that not be binary-compatible?

  def apply[A](as: A): Chunk[A] =
    single(as)

@ghostdogpr
Member Author

> On second thought, would it work if Chunk.apply was overloaded to accept a single element? Or would that not be binary-compatible?
>
>   def apply[A](as: A): Chunk[A] =
>     single(as)

Mima seems to be happy, so I will open another PR.

@ghostdogpr ghostdogpr closed this Apr 9, 2024
@ghostdogpr ghostdogpr deleted the chunk-2.0 branch April 9, 2024 07:06
@ghostdogpr ghostdogpr restored the chunk-2.0 branch April 9, 2024 07:16
@ghostdogpr ghostdogpr reopened this Apr 9, 2024
@ghostdogpr
Member Author

@kyri-petrou it didn't work with 2.12 so reopening this one

@jdegoes
Member

jdegoes commented Apr 9, 2024

@ghostdogpr

Can I instead suggest that the existing .apply constructor call nonEmpty on the sequence to decide internally whether to call Chunk.single? Then the API does not need to be used carefully, and it does not need to be expanded.

@ghostdogpr
Member Author

Actually, apply is probably only going to be called with very few elements, so it's okay to check the size. Added this:

override def apply[A](as: A*): Chunk[A] =
    if (as.size == 1) single(as.head) else fromIterable(as)
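
To make the dispatch above easier to try outside of ZIO, here is a minimal sketch using only the standard library. `MiniChunk`, `Single` and `Many` are hypothetical stand-ins invented for this illustration; they are not ZIO's `Chunk` implementation, and only the shape of `apply` mirrors the snippet above.

```scala
// Hypothetical stand-in for Chunk; none of these names are ZIO's API.
sealed trait MiniChunk[+A]

object MiniChunk {
  // Dedicated representation for exactly one element (no intermediate builder).
  final case class Single[A](a: A) extends MiniChunk[A]
  // General representation; copying into a Vector stands in for fromIterable's work.
  final case class Many[A](as: Vector[A]) extends MiniChunk[A]

  def single[A](a: A): MiniChunk[A] = Single(a)

  def fromIterable[A](as: Iterable[A]): MiniChunk[A] = Many(as.toVector)

  // Same shape as the apply in the comment above: take the cheap path for one element.
  def apply[A](as: A*): MiniChunk[A] =
    if (as.size == 1) single(as.head) else fromIterable(as)
}
```

With this sketch, `MiniChunk(1)` takes the `single` path, while `MiniChunk(1, 2)` and `MiniChunk()` fall through to `fromIterable`.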

@kyri-petrou
Contributor

> Actually apply is probably going to be called only with very small values so it's okay to check the size. Added this:
>
> override def apply[A](as: A*): Chunk[A] =
>     if (as.size == 1) single(as.head) else fromIterable(as)

I didn't know this until now, but the Seq created by def apply[A](as: A*) is actually an ArraySeq and has a knownSize defined. So calling .size should have O(1) complexity. In case you want to try it out:

object Foo {
  def apply(vs: String*): Unit = {
    println(vs)
    println(vs.knownSize)
  }

  @main def m = Foo("asd", "asd", "")
}

@jdegoes jdegoes merged commit 582270a into zio:series/2.0.x Apr 9, 2024
@ghostdogpr ghostdogpr deleted the chunk-2.0 branch April 9, 2024 23:42
@erikvanoosten
Contributor

erikvanoosten commented Apr 17, 2024

What I have always understood from my Java days is that a varargs method is actually implemented with arrays. Scala presents this as a Seq, but presumably it is still an array at runtime. According to https://users.scala-lang.org/t/storage-backing-varargs/1997 we can get the underlying array with toArray. With that in place, the more efficient implementation would be:

override def apply[A](as: A*): Chunk[A] =
    if (as.size == 1) single(as.head) else fromArray(as.toArray)

WDYT?

@erikvanoosten
Contributor

> but presumably it actually is still an array at runtime

Hmm, after reading some more, I am no longer sure this is true 🤔

@ghostdogpr
Member Author

ghostdogpr commented Apr 17, 2024

Looks like we get an ArraySeq so I think you're right that toArray is free. It would rely on internal implementation details that could change though, but I think it's okay.
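
Whether toArray is really free can be checked directly. The sketch below assumes the Scala 2.13+ / Scala 3 standard library, where, as far as I can tell, `immutable.ArraySeq#toArray` allocates a fresh array and `unsafeArray` is the accessor that exposes the backing array without copying. `ToArrayCost` and its members are names made up for this example.

```scala
import scala.collection.immutable.ArraySeq

// Hypothetical demo object; only ArraySeq and its methods come from the standard library.
object ToArrayCost {
  val backing: Array[String] = Array("a", "b")

  // unsafeWrapArray wraps the given array without copying it.
  val wrapped: ArraySeq[String] = ArraySeq.unsafeWrapArray(backing)

  // toArray builds a new array and copies the elements into it.
  val copied: Array[String] = wrapped.toArray

  // True when toArray returned a different array instance than the backing one.
  def toArrayCopies: Boolean = copied ne backing

  // True when unsafeArray hands back the very same backing array.
  def unsafeArrayShares: Boolean = (wrapped.unsafeArray: AnyRef) eq backing
}
```

If both checks hold on the Scala version in use, an ArraySeq fast path would need `unsafeArray` rather than `toArray` to avoid the extra copy, at the cost of relying on the caller not mutating the array.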

@erikvanoosten
Contributor

erikvanoosten commented Apr 18, 2024

> Looks like we get an ArraySeq ...

Nope, I was wrong:

scala> def foo(as: String*) = { println(as.getClass) }
def foo(as: String*): Unit

scala> foo("a","b")
class scala.collection.immutable.ArraySeq$ofRef  // all fine so far

scala> foo(Vector("a","b")*)
class scala.collection.immutable.Vector1   // but not here

We could specialize for ArraySeq though...

@kyri-petrou
Contributor

I wonder if it's worth bringing in a dependency on scala-collection-compat. It's very widely used, and it allows for a lot of optimizations to be done by having sizeCompare available on iterables.

@erikvanoosten
Contributor

scala>   def bar(as: String*) =
     |     if (as.isInstanceOf[scala.collection.immutable.ArraySeq[?]]) "array!"
     |     else "iterable"
def bar(as: String*): String

scala> bar("a", "b")
val res0: String = array!

scala> bar(Vector("a", "b")*)
val res1: String = iterable

This works, so a good Chunk.apply implementation could be:

  override def apply[A](as: A*): Chunk[A] =
    if (as.size == 1) single(as.head)
    else if (as.isInstanceOf[scala.collection.immutable.ArraySeq[?]]) fromArray(as.toArray)
    else fromIterable(as)

Not sure how to test the performance of this, though.

@kyri-petrou
Contributor

@erikvanoosten I think there are 2 separate things here:

  1. Applying an optimization for cases where as is an ArraySeq.
  2. Making sure that as.size doesn't iterate through a large collection.

For (1), I think we can probably ignore optimizations for this case, since the number of elements passed is likely to be very small; they're usually written in this kind of fashion: Chunk(1,2,3)

Now (2) is indeed a bit dangerous (performance-wise), and we should probably try and work something out. The most dangerous case I can think of is this one:

val someList = List.fill(10_000)("foo")
val chunk = Chunk(someList*)

In this case, the Seq that is passed to the apply(as: A*) method will be a List, and the call to size will need to iterate through the entire list to determine its size.
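
The difference between literal arguments and a spliced List can be observed with `knownSize`, which returns -1 when a collection cannot report its size cheaply. This is a small sketch assuming Scala 3 syntax and the 2.13+ collections; `VarargsSize` and `knownSizeOf` are hypothetical names used only here.

```scala
// Hypothetical helper: reports what a varargs method actually receives.
object VarargsSize {
  // knownSize is >= 0 only when the size is available without traversal.
  def knownSizeOf[A](as: A*): Int = as.knownSize

  // Runtime class of the Seq that was passed in, for inspection.
  def classOfArgs[A](as: A*): String = as.getClass.getName
}

object VarargsSizeDemo {
  val someList: List[String] = List.fill(10000)("foo")

  // Literal arguments arrive wrapped in an array-backed Seq, so the size is known.
  val literal: Int = VarargsSize.knownSizeOf("a", "b", "c")

  // A spliced List is passed through as-is; a non-empty List does not know its size.
  val splicedList: Int = VarargsSize.knownSizeOf(someList*)

  // A spliced Vector is indexed, so its size is known without traversal.
  val splicedVector: Int = VarargsSize.knownSizeOf(someList.toVector*)
}
```

Here `literal` and `splicedVector` are non-negative, while `splicedList` is -1, which is exactly the case where calling `size` has to walk all 10,000 elements.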

@erikvanoosten
Contributor

erikvanoosten commented Apr 18, 2024

@kyri-petrou You are right: since varargs methods can be called with any Seq (via the * splice), you cannot be sure whether size needs to traverse the collection.

So the optimization in this PR is questionable.

> For (1), I think we can probably ignore optimizations for this case, since the number of elements passed is likely to be very small; they're usually written in this kind of fashion: Chunk(1,2,3)

If we're here for micro-optimizations: my intuition says that the extra if statement will be faster than an extra allocation.

We can fix (2) by using knownSize; I kept the if statement for (1):

  override def apply[A](as: A*): Chunk[A] =
    if (as.knownSize == 1) single(as.head)
    else if (as.isInstanceOf[scala.collection.immutable.ArraySeq[?]]) fromArray(as.toArray)
    else fromIterable(as)

@kyri-petrou
Contributor

@erikvanoosten problem with knownSize is that it's not available in Scala 2.12. Thus why I recommended this #8718 (comment)

@erikvanoosten
Contributor

> @erikvanoosten problem with knownSize is that it's not available in Scala 2.12. Thus why I recommended this #8718 (comment)

Ah, now I understand why you wanted this.

I believe IndexedSeq does exist in 2.12. So we could also check against that:

  override def apply[A](as: A*): Chunk[A] =
    if (as.isInstanceOf[scala.collection.immutable.IndexedSeq[?]] && as.size == 1) single(as.head)
    else if (as.isInstanceOf[scala.collection.immutable.ArraySeq[?]]) fromArray(as.toArray)
    else fromIterable(as)

@kyri-petrou
Contributor

@erikvanoosten this feels like a workaround for something that already has a solution, which is to bring in scala-collection-compat as a dependency. I know it's not ideal to bring a dependency into a project that has none, but most zio-* projects already depend on it. For Scala 2.13 and Scala 3, the dependency doesn't bring much code in (none for 3, almost none for 2.13).

The reason I'm advocating for bringing it in is that I had to jump through similar hoops in zio-query myself until I decided to add it as a dependency. In the end, we're very likely going to miss out on optimizations if we don't do it. As an example, the implementation of foreachParX currently uses .size, when a simple sizeCompare would be much cheaper for Lists:

private def foreachParDiscard[R, E, A](
  n: => Int
)(as0: => Iterable[A])(f: A => ZIO[R, E, Any])(implicit trace: Trace): ZIO[R, E, Unit] =
  ZIO.suspendSucceed {
    val as   = as0
    val size = as.size
    if (size == 0) ZIO.unit
    else if (size == 1) f(as.head).unit
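
For comparison, here is a sketch of the same empty / one / many decision made with `sizeCompare` instead of `size`. It assumes the Scala 2.13+ standard library, where `sizeCompare` is built in (on 2.12 it would come from scala-collection-compat, as discussed above). `SizeCheck`, `Arity` and its cases are hypothetical names for this example and are not part of ZIO.

```scala
// Hypothetical helper showing a bounded-cost size check.
object SizeCheck {
  sealed trait Arity
  case object Empty extends Arity
  case object One   extends Arity
  case object Many  extends Arity

  // sizeCompare(1) inspects at most two elements of a linear collection,
  // whereas size would walk the whole thing.
  def arity[A](as: Iterable[A]): Arity =
    if (as.isEmpty) Empty
    else if (as.sizeCompare(1) == 0) One
    else Many
}
```

Because the check is bounded, `SizeCheck.arity(List.fill(10000)("foo"))` does not need to traverse the list, and it even terminates on an infinite `LazyList`.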

@ghostdogpr
Member Author

Sounds okay to me to depend on scala-collection-compat. @jdegoes any opinion?

@jdegoes
Member

jdegoes commented Apr 18, 2024

I think it's okay, but .nonEmpty / .isEmpty is sufficient for many cases where the class is not known. And if the class is known (via type casing), you don't need size because you can use .length.
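
One way to read this suggestion, sketched with only the standard library and no extra dependency: match on the runtime class, use `.length` when the Seq is indexed, and fall back to emptiness checks otherwise. `ArityByType` and `isSingle` are hypothetical names, and this is only an illustration of the idea, not code from the PR.

```scala
// Hypothetical helper: decides "exactly one element" without calling size on unknown Seqs.
object ArityByType {
  def isSingle[A](as: Seq[A]): Boolean = as match {
    // Indexed sequences (ArraySeq, Vector, ...) answer length in constant time.
    case xs: IndexedSeq[?] => xs.length == 1
    // For anything else, two emptiness checks are enough and never traverse the whole Seq.
    case other             => other.nonEmpty && other.tail.isEmpty
  }
}
```

For a List this costs two constant-time checks, so a large spliced List no longer triggers a full traversal just to find out that it has more than one element.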
