Lazier LazyList. #6880

Closed
hrhino wants to merge 1 commit into scala:2.13.x from hrhino:topic/laziest-list

@hrhino
Contributor

@hrhino hrhino commented Jul 3, 2018

As lamented in scala/bug#10696 and bemoaned in scala/collection-strawman#367, LazyList (and Stream before it) does not have a way of representing a collection with uncomputed emptiness. This adds a third subclass of LazyList, LazyList.Suspended, which wraps a closure returning a LazyList, and only evaluates it when needed. This is about as lazy of a list as I can imagine, now.

Two tests are currently failing, both because ll.filter(...).map(...) is even lazier than contemplated. I'm not sure how to test for the linked bug (scala/bug#9134); I'll take suggestions or think about it over the vacation.

BTW, this is the very first thing I've done with the new collections, so I hope it's stylistically alright. Let me know if I've broken convention or something.

Fixes scala/bug#10696.
Fixes scala/collection-strawman#367.
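
For readers who want to see what "uncomputed emptiness" looks like from the outside, here is a minimal sketch. It assumes Scala 2.13.0 or later, where a fully lazy `LazyList` shipped; it is not code from this branch, and `UncomputedEmptiness` and `firstEven` are names made up for the illustration.

```scala
// Sketch only: assumes the LazyList released in Scala 2.13.0+, not this PR's branch.
object UncomputedEmptiness {
  /** Returns (predicate calls made before forcing anything, first matching element). */
  def firstEven(): (Int, Int) = {
    var calls = 0
    val evens = LazyList.from(1).filter { n => calls += 1; n % 2 == 0 }
    val callsBeforeForcing = calls   // filter alone has not inspected any element
    (callsBeforeForcing, evens.head) // head forces just enough of the list to answer
  }
}
```

Building the filtered list does not need to know whether it is empty; that question is only answered when something such as `head` or `isEmpty` asks.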

@hrhino hrhino requested a review from julienrf July 3, 2018 12:35
@scala-jenkins scala-jenkins added this to the 2.13.0-M5 milestone Jul 3, 2018
else iterableFactory.empty).asInstanceOf[C]
}
private[immutable] def filterImpl(p: A => Boolean, isFlipped: Boolean): C =
suspend0 {
Contributor Author

just using suspend here complains that we got C but expected CC[A]

Contributor

You can add a constraint that C <: CC[A], if that helps. But such constraints usually make things hard to abstract over.

Contributor Author

C is defined as

+C <: CC[A] with LazyListOps[A, CC, C]

The problem (I think) is that we don't know that the iterableFactory is going to return the exact same collection type (C) when asked to produce a collection type wrapping A, even though we know/hope that C =:= CC[A].

@hrhino hrhino requested a review from adriaanm July 3, 2018 12:48
f(s) match {
case None => empty[A]
case Some((a, s1)) => newCons(a, loop(s1))
}
Contributor

if #6851 gets merged, this won't be needed

eval = null
while (res.isInstanceOf[Suspended[_]]) {
// skip through multiple suspensions in a loop rather than using the stack
res = res.asInstanceOf[Suspended[A]].next
Contributor

Is there a test to make sure this is stack-safe?

Contributor Author

There will be.

  • test for stack safety

Contributor

@NthPortal NthPortal Jul 6, 2018

I'm actually fairly certain this isn't stack-safe, because each call to next checks isInstanceOf[Suspended[_]], and then calls next on the result. Thus, you get a call stack (of sorts) like the one below

    res.res.res.res.next
 at res.res.res.next
 at res.res.next
 at res.next
 at this.next

Edit: I have tested it

scala> def nest(n: Int): LazyList[Int] = {
     | if (n > 0) LazyList.suspend(nest(n - 1))
     | else LazyList.empty[Int]
     | }
nest: (n: Int)LazyList[Int]

scala> nest(10000)
java.lang.StackOverflowError
  at .$anonfun$nest$1(<console>:2)
  at scala.collection.immutable.LazyList$Suspended.next$lzycompute(LazyList.scala:685)
  at scala.collection.immutable.LazyList$Suspended.next(LazyList.scala:683)
  at scala.collection.immutable.LazyList$Suspended.next$lzycompute(LazyList.scala:689)
  at scala.collection.immutable.LazyList$Suspended.next(LazyList.scala:683)

(it also seems to be losing the laziness somewhere, but I'm not sure where)
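
One way to keep the "skip through multiple suspensions in a loop" idea off the stack is to let each suspension run only its own closure and leave the chasing to a single tail-recursive driver. The sketch below uses toy types (`Lz`, `LzNil`, `LzCons`, `Suspended`, `Lz.force`, `Lz.nest`) invented for illustration; it is not the code in this PR, and it makes no attempt at thread safety.

```scala
import scala.annotation.tailrec

// Toy model, not the PR's LazyList: every name here is hypothetical.
sealed trait Lz[A]
final case class LzNil[A]() extends Lz[A]
final case class LzCons[A](head: A, tail: Lz[A]) extends Lz[A]

final class Suspended[A](private[this] var eval: () => Lz[A]) extends Lz[A] {
  private[this] var result: Lz[A] = _

  /** Runs this node's closure at most once; never chases the result further. */
  def step(): Lz[A] = {
    if (eval ne null) {
      result = eval()
      eval = null // drop the closure so it can be garbage collected
    }
    result
  }
}

object Lz {
  def suspend[A](body: => Lz[A]): Lz[A] = new Suspended[A](() => body)

  /** Unwraps nested suspensions iteratively, so nesting depth uses no extra stack. */
  @tailrec
  def force[A](ll: Lz[A]): Lz[A] = ll match {
    case s: Suspended[A] => force(s.step())
    case other           => other
  }

  /** Same shape as the REPL test above: n suspensions wrapped around an empty list. */
  def nest(n: Int): Lz[Int] =
    if (n > 0) suspend(nest(n - 1)) else LzNil[Int]()
}
```

With this split, `Lz.force(Lz.nest(100000))` walks the chain one node at a time, because `step()` returns the next suspension instead of recursing into it.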

*
* @param eval a closure which will create another [[LazyList]]
*/
def suspend[A](eval: => LazyList[A]): LazyList[A] =
Contributor

I'm not sure how I feel about the name suspend, both internally and externally, but especially as an externally visible API. How do you feel about the names defer and Deferred?

Contributor Author

I'm okay with defer. Since we had the Deferrer class, I didn't want to overload the word.

}

@SerialVersionUID(3L)
final class Suspended[A](var eval: () => LazyList[A]) extends LazyList[A] {
Contributor

I'm not sure we need the Deferrer class anymore, if we have this

Contributor Author

I'd love to get rid of the Deferrer class, but doing so makes this break:

val cycle1: LazyList[Int] = 1 #:: 2 #:: cycle1

and call me a crazy functional programmer, but I like being able to do that.

(1 #:: 2 #:: suspend(cycle1) is a workaround, but not one I find all that pretty.)
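
For reference, the self-referential definition can be tried as follows. This is a small sketch assuming the `LazyList` released in Scala 2.13.0+, where the tail operand of `#::` is taken by name; `CycleDemo` and `firstFive` are illustrative names.

```scala
// Assumes Scala 2.13.0+; CycleDemo is a made-up wrapper for the example.
object CycleDemo {
  // The tail operand of #:: is evaluated lazily, so cycle1 may refer to itself.
  val cycle1: LazyList[Int] = 1 #:: 2 #:: cycle1

  def firstFive: List[Int] = cycle1.take(5).toList // List(1, 2, 1, 2, 1)
}
```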

tl
val res = tl()
tl = null
res
Contributor

👍

Contributor

@NthPortal NthPortal left a comment

I think it could use a few more tests to assert that it's fully lazy, but that's about it.

Awesome!

}

assertEquals(true, Try { wf.map(identity) }.isFailure) // throws on n == 5
assertEquals(true, Try { wf.map(identity).force }.isFailure) // throws on n == 5
Contributor

It might be more worthwhile to change the tests to assert better laziness


@SerialVersionUID(3L)
final class Suspended[A](var eval: () => LazyList[A]) extends LazyList[A] {
private[this] var evaluated: Boolean = false
Contributor

should this be @volatile? (same question for hdDefined and tlDefined in Cons)

Contributor Author

I have no idea. I think maybe, although since head's protected by its own volatile bitmap field, the danger is that one thread will call head, and the other thread will get false from headDefined, which is unfortunate but probably not harmful.

I think the evaluated = true should go after the call to eval, though, now that I think on it.

Contributor

@NthPortal NthPortal Jul 6, 2018

I've been thinking about this a bunch, and I think that, because it's not volatile, the JMM allows another thread to read a value of true from evaluated, as long as it gets set to true at some point (possibly only if set by another thread?). Basically, without @volatile, there is no happens-before relationship between the write of true and the read of true if they (might?) happen on different threads, so it might not happen before.

I would ask a JMM expert before taking that as gospel truth, but it seems racy to me.
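
A conservative way to sidestep the visibility question is to publish the result through a `@volatile` flag that is written only after the value, with a lock guarding the first evaluation. The `Memo` class below is a hypothetical stand-alone sketch of that pattern, not what `Suspended` or `Cons` do in this PR; whether the extra synchronization would be acceptable for `LazyList` is a separate question.

```scala
// Hypothetical helper illustrating safe publication; not part of the PR.
final class Memo[A](private[this] var thunk: () => A) {
  @volatile private[this] var evaluated = false
  private[this] var value: A = _

  def get: A = {
    if (!evaluated) synchronized {
      if (!evaluated) {
        value = thunk()
        thunk = null     // let the closure be garbage collected
        evaluated = true // volatile write last: a reader that sees true also sees value
      }
    }
    value
  }
}

object Memo {
  def apply[A](body: => A): Memo[A] = new Memo[A](() => body)
}
```

Setting the flag after the closure has run also matches the ordering suggested above: if the closure throws, `evaluated` stays false and a later call tries again.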

@sjrd
Member

sjrd commented Jul 4, 2018

Thank you for this!

I won't be able to properly review this until July 12. I'm just leaving two high-level comments for now:

  • There is no need for the tail of a LazyList to be lazy anymore. Indeed, it can be an eager LazyList that happens to be Suspended instead!
  • I would highly recommend exploring a design where the subclasses of LazyList are private. The existence of Suspended should be more or less an implementation detail. And you don't want people to try and explicitly pattern-match on Cons/Empty/Suspended, because it will most likely not do what they think it's doing.

As added food for thought: I wonder whether it is still useful that the head be lazy. Are there use cases where one already knows that the lazy list is non-empty, but doesn't know yet what its head will be (and cannot synchronously compute it)? My experience from programming in Oz (where this lazier LazyList is basically the base data type used everywhere--and optimized by the VM) doesn't support any such use case. Combined with my first comment that tail need not be lazy anymore, I expect that LazyList.Cons can be fully eager.

@NthPortal
Contributor

@sjrd It is conceivable to me that there exists some situation where Iterator.hasNext is cheap, but .next() is not; in such a case, it would be valuable to have a lazy head (when creating a LazyList from the iterator). That being said, I can't think of such a situation.

@sjrd
Member

sjrd commented Jul 4, 2018

@NthPortal Yes, that is definitely conceivable. But at call site, is it likely that you want to call hasNext and not follow up immediately with next(), in the same synchronous "step"? Because if you always follow hasNext with next(), then it doesn't matter whether the potentially expensive operation is performed during hasNext or during next().

hd
val res = hd()
hd = null
res
Contributor

@julienrf julienrf Jul 4, 2018

The goal here is to let the thunk be garbage collected?

Contributor Author

Yes.

Contributor

@julienrf julienrf left a comment

Thanks @hrhino for working on this!

It seems that your design is equivalent to the one proposed by @sjrd in scala/collection-strawman#367 (comment) but I must say that I have a slight preference for his design because it prevents returning an eager LazyList by mistake (like we currently do in Process#lazyLines).

Also, I have a preference for renaming suspend into defer, as this was suggested by someone else.

@julienrf
Contributor

julienrf commented Jul 4, 2018

Maybe it still makes sense to have a lazy head so that we can know whether the lazy list is empty without evaluating its next element:

def expensiveList(n: Int): LazyList[Int] =
  LazyList {
    if (n == 0) State.empty
    else State.nonEmpty(heavyComputation(n), expensiveList(n - 1))
  }

In this example, it might be useful in some cases to evaluate the fact that the underlying state is NonEmpty but still defer the computation of its head to a later point in the program?
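
To make the question concrete, here is a toy model in which the state is lazy and the head is lazy on top of that. `Lazier`, `State`, `LazyHeadDemo`, and the body of `heavyComputation` are stand-ins written for this sketch (the snippet above does not define them), so treat it as one possible reading of the idea rather than the proposed API.

```scala
// Toy model: Lazier and State are hypothetical, not the API proposed in the PR.
sealed trait State[A]
object State {
  final class Empty[A] extends State[A]
  final class NonEmpty[A](hd: => A, val tail: Lazier[A]) extends State[A] {
    lazy val head: A = hd // computed on first access only
  }
  def empty[A]: State[A] = new Empty[A]
  def nonEmpty[A](hd: => A, tl: Lazier[A]): State[A] = new NonEmpty[A](hd, tl)
}

final class Lazier[A](init: => State[A]) {
  lazy val state: State[A] = init // emptiness is unknown until this is forced
  def isEmpty: Boolean = state.isInstanceOf[State.Empty[_]]
}

object LazyHeadDemo {
  var heavyCalls = 0 // counts how often the expensive head is actually computed
  def heavyComputation(n: Int): Int = { heavyCalls += 1; n * n }

  def expensiveList(n: Int): Lazier[Int] =
    new Lazier[Int](
      if (n == 0) State.empty[Int]
      else State.nonEmpty[Int](heavyComputation(n), expensiveList(n - 1))
    )
}
```

In this model, calling `isEmpty` on `LazyHeadDemo.expensiveList(3)` forces the state but leaves `heavyCalls` at 0; the counter only moves once `head` is read on the resulting `NonEmpty`.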

@sjrd
Member

sjrd commented Jul 4, 2018

Your example is again based on the definition site. It does not address what I said in #6880 (comment) any more than what has previously been said.

@hrhino
Contributor Author

hrhino commented Jul 4, 2018

@julienrf: I did originally have a design similar to @sjrd's, but I figured there was too much indirection in calling this.state.foo, hence this design.

> it prevents returning an eager LazyList by mistake (like we currently do in Process#lazyLines)

How do you mean? The relevant bit called by lazyLines looks now like

def next(): LazyList[T] = LazyList.suspend(q.take match {
  case Left(0)    => LazyList.empty
  case Left(code) => if (nonzeroException) scala.sys.error("Nonzero exit code: " + code) else LazyList.empty
  case Right(s)   => LazyList.cons(s, next())
})

(previously the suspend call wasn't there, of course). The overeager evaluation here is at the definition site; there's no way to drop the suspend and have a lazy-enough list.

I'm going to take Sébastien's advice and make the subclasses private; if we ensure that the Cons constructor isn't called with a strictly-evaluated tail, then overstrictness won't be a problem.


With reference to @sjrd's point about strictness in the head: iterator is always going to force the heads, but a sufficiently-lazy operation doesn't have to. drop is an easy example. foldLeft and foldRight could avoid evaluating heads, except the parameters to the combining function are strict, so in effect they won't. Personally I prefer the lazy head, but given the current strictness of the collections operations, taking advantage of it may wind up being tough.

Would lazyFold{Right,Left}s be considered useful specialized operations?

@julienrf
Contributor

julienrf commented Jul 4, 2018

> > it prevents returning an eager LazyList by mistake
>
> How do you mean?

For instance when we implement transformation operations such as map or filter, we now have to carefully wrap the result in a defer call to not eagerly evaluate the receiver’s state.

@julienrf
Contributor

julienrf commented Jul 4, 2018

@sjrd I guess there can be some algorithms that care about the fact that a lazy list is empty or not, and then defer when they actually evaluate the head to a later point. OK, I have no examples…

@julienrf
Contributor

julienrf commented Jul 9, 2018

@hrhino What do you think about #6880 (comment) ?

@hrhino
Contributor Author

hrhino commented Jul 9, 2018

@julienrf I think that it's a fair point (and I'm rewriting this to use Sébastien's design), but I think it doesn't go far enough on its own; we also need to take care of those SeqOps implementations which don't go through views. If I can make all of the inherited operations that return a LazyList use fromSpecificIterable applied to a view, then I think your point is a good idea. I'll double-check that all the methods can be lazy enough (I don't have the source code with me right now), but there is always the danger that someone adds a new operation or an override in a superclass of LazyList and accidentally makes it strict. I'll put tests in, at least.

@julienrf
Contributor

julienrf commented Jul 9, 2018

@hrhino For methods such as dropRight or groupBy we have no other choice than being strict. And these methods clearly document that they are strict. So I think it’s fine to inherit them in LazyList.

@julienrf
Contributor

@hrhino Will you have time to continue this work? Do you want any help?

@hrhino
Contributor Author

hrhino commented Jul 12, 2018

Yes, I'll pick it up this weekend. It's been really busy at work; sorry. I'll shout in the strawman gitter if I get confused. Thanks for the poke.

When's the feature freeze deadline?

@julienrf
Contributor

The targeted deadline is the 10th of August.

@julienrf
Contributor

Any update on this @hrhino? If you are too busy maybe you can push your current state somewhere and someone else can finish the work?

@hrhino
Contributor Author

hrhino commented Jul 22, 2018

Yes, sorry, I may need to do that.

@NthPortal
Contributor

NthPortal commented Jul 31, 2018

Status summary

We have decided it is best to go with an implementation like the one suggested by @sjrd; specifically, LazyList should contain a lazy state, so that whether or not it is empty does not need to be known at construction time.

With this new internal structure, significantly less is shared with Stream, and it doesn't make sense for them to share a trait with implementation (LazyListOps) anymore. Thus, in order to allow for a more sensible implementation of LazyList, the implementations need to be separated.

Thus, there are two major tasks to be accomplished to make a better LazyList a reality:

  1. Modify LazyList to be fully lazy
    • use a lazy internal state, as mentioned above
    • make the implementation of all methods specific to LazyList, instead of inheriting from a shared trait
    • remove the implementation of Stream which inherits from LazyListOps
    • remove LazyListOps
  2. Port the implementation of Stream from 2.12 and update it to work with the new collections design. It should still be deprecated in favour of LazyList (though it's possible the deprecation message will need to be tweaked slightly from what it is now).

I am currently working on (1). There are various aspects of the previous Stream implementation which make porting it non-trivial (at least for me), so it would be very helpful if someone more familiar could take care of (2).

@julienrf
Contributor

Regarding (2), I’m not sure porting (again) the 2.12 implementation to the new design would be simpler than just inlining the LazyListOps inheritance relation in Stream.

@NthPortal
Contributor

@julienrf I'm open to that as well.

@hrhino
Contributor Author

hrhino commented Aug 2, 2018

I'm closing this in favor of @NthPortal's work. Sorry for the false alarm.

@hrhino hrhino closed this Aug 2, 2018
@SethTisue SethTisue removed this from the 2.13.0-M5 milestone Aug 2, 2018
@NthPortal
Contributor

See #7000
