Change ZPipeline split implementations to use Trie Data Structure #10178
base: series/2.x
ee3c8d9 to 83085da
7e35eba to 0d075fc
|
Please fix the tests. Once everything's done, change the status of your PR by clicking on the "Ready to review" button.
| def splitOn(delimiter: => String)(implicit trace: Trace): ZPipeline[Any, Nothing, String, String] = | ||
| ZPipeline.mapChunks[String, Char](_.flatMap(string => Chunk.fromArray(string.toArray))) >>> | ||
| ZPipeline.splitOnChunk[Char](Chunk.fromArray(delimiter.toArray)) >>> | ||
| ZPipeline.splitOnChunk[Char, Char](delimiter.toList, true) >>> |
Please use named arguments for booleans like this one:
| ZPipeline.splitOnChunk[Char, Char](delimiter.toList, true) >>> | |
| ZPipeline.splitOnChunk[Char, Char](delimiter.toList, allowEmpty = true) >>> |
|
|
||
| private object Trie { | ||
| def empty[T] = Trie(Map.empty[T, Trie[T]], 0, false) | ||
| def next[T](depth: Int) = Trie(Map.empty[T, Trie[T]], depth + 1, false) |
| def next[T](depth: Int) = Trie(Map.empty[T, Trie[T]], depth + 1, false) | |
| def next[T](depth: Int): Trie[T] = Trie(structure = Map.empty[T, Trie[T]], depth = depth + 1, isLeaf = false) |
| seq match { | ||
| case head +: tail => { | ||
| val next = trie.structure.getOrElse(head, Trie.next[T](trie.depth)) | ||
| trie.copy(structure = trie.structure.updated(head, insert(next, tail))) |
You can avoid one Trie allocation in the else case of getOrElse by doing:
| trie.copy(structure = trie.structure.updated(head, insert(next, tail))) | |
| val next = trie.structure.getOrElse(head, null) | |
| val nextStructure = trie.structure.updated(head, insert(next, tail)) | |
| if (next eq null) Trie.next(nextStructure, trie.depth) // Note that you'll need to change the `next` signature | |
| else trie.copy(structure = nextStructure) |
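A minimal sketch of one way to restructure the insert so that a missing child is built directly instead of being created empty and then copied. The names `ImmutableTrie` and `chain` are hypothetical and only mirror the fields visible in the diff; this variant uses `Option` matching rather than the `null` check suggested above, so it is an illustration of the idea, not the code in this PR.

```scala
// Hypothetical immutable trie modelled on the fields shown in the diff.
final case class ImmutableTrie[T](
  structure: Map[T, ImmutableTrie[T]],
  depth: Int,
  isLeaf: Boolean
)

object ImmutableTrie {
  def empty[T]: ImmutableTrie[T] =
    ImmutableTrie[T](Map.empty[T, ImmutableTrie[T]], depth = 0, isLeaf = false)

  // Builds a brand-new branch for `seq` in one pass, so no placeholder
  // node is allocated and then replaced by a copy.
  private def chain[T](depth: Int, seq: List[T]): ImmutableTrie[T] =
    seq match {
      case Nil =>
        ImmutableTrie[T](Map.empty[T, ImmutableTrie[T]], depth, isLeaf = true)
      case head :: tail =>
        ImmutableTrie[T](Map[T, ImmutableTrie[T]](head -> chain(depth + 1, tail)), depth, isLeaf = false)
    }

  // Inserts `seq` below `trie`; the node reached by the last element becomes a leaf.
  def insert[T](trie: ImmutableTrie[T], seq: List[T]): ImmutableTrie[T] =
    seq match {
      case Nil =>
        if (trie.isLeaf) trie else trie.copy(isLeaf = true)
      case head :: tail =>
        val child = trie.structure.get(head) match {
          case Some(existing) => insert(existing, tail)          // reuse the existing branch
          case None           => chain(trie.depth + 1, tail)     // build the new branch directly
        }
        trie.copy(structure = trie.structure.updated(head, child))
    }
}
```

For example, inserting `"ab".toList` and then `"ac".toList` into `ImmutableTrie.empty[Char]` yields a root with a single `'a'` child whose own children are `'b'` and `'c'`, both leaves at depth 2.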
|
I reviewed a few minor details but didn't review everything. I haven't looked at it in detail, but for the
|
I had some trouble converting the apply into the mutable/iterable pattern, any thoughts? Using a MutableBoolean would be an obvious solution to the leaf issue, but maybe not the preferred one.

    private object Trie {
      private val _empty = Trie(structure = scala.collection.mutable.Map.empty[Any, Trie[Any]], depth = 0, isLeaf = false)
      def empty[T]: Trie[T] = _empty.asInstanceOf[Trie[T]]
      def next[T](depth: Int): Trie[T] = Trie(structure = scala.collection.mutable.Map.empty[T, Trie[T]], depth = depth + 1, isLeaf = false) // Using a reference to empty here created a circular reference
      def apply[T](delimiters: Seq[Seq[T]]): Trie[T] = {
        var index = 0
        val outerIterator = delimiters.iterator
        var root = Trie.empty[T]
        while (outerIterator.hasNext) {
          val innerIterator = outerIterator.next().iterator
          var curNode = root
          index = 0
          if (innerIterator.hasNext) {
            while (innerIterator.hasNext) {
              val elem = innerIterator.next()
              curNode = curNode.structure.get(elem) match {
                case Some(nextNode) => nextNode
                case None => {
                  var nextNode = Trie.next[T](depth = index)
                  curNode.structure += (elem -> nextNode)
                  nextNode
                }
              }
              index += 1
            }
            curNode = curNode.copy(isLeaf = true) // Doesn't do anything
          }
        }
        return root
      }
    }
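One possible way around the leaf issue, sketched under assumptions: in the snippet above, `curNode.copy(isLeaf = true)` produces a new node that is never linked back into its parent's map, so the trie reachable from `root` is unchanged. Making `isLeaf` a `var` on a plain (non-case) class lets the already-linked node be updated in place. The class name `MutableTrie` is hypothetical. The sketch also allocates a fresh root for every trie, because a shared `_empty` instance holding a mutable map would be mutated by every trie built from it.

```scala
import scala.collection.mutable

// Hypothetical mutable node: `isLeaf` is a var, so marking the end of a
// delimiter updates the node that is already linked into the trie.
final class MutableTrie[T](val depth: Int) {
  val structure: mutable.Map[T, MutableTrie[T]] = mutable.Map.empty
  var isLeaf: Boolean = false
}

object MutableTrie {
  def apply[T](delimiters: Seq[Seq[T]]): MutableTrie[T] = {
    // A fresh root per call; nothing mutable is shared between tries.
    val root = new MutableTrie[T](depth = 0)
    val outer = delimiters.iterator
    while (outer.hasNext) {
      val inner = outer.next().iterator
      if (inner.hasNext) { // empty delimiters are skipped
        var node = root
        while (inner.hasNext) {
          val parent = node
          // Reuse the child for this element, or create and link it.
          node = parent.structure.getOrElseUpdate(inner.next(), new MutableTrie[T](parent.depth + 1))
        }
        node.isLeaf = true // mutates the linked node, no copy involved
      }
    }
    root
  }
}
```

Whether a `var` field is acceptable here, compared with a `MutableBoolean` or an immutable rebuild, is a style decision for the maintainers; the sketch only shows that in-place mutation avoids the lost update.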
|
6b91527 to 76ff788
|
@bahuber The exposed |
|
@bahuber You need to iterate with a |
|
Let me know if you want this change added: I think the `_empty` trick with the mutable objects caused some bugs in the tests.
Tries are a good tool for this type of problem; using them allows the two similar implementations to be consolidated. I added a somewhat contrived test case to show that a Trie ensures correctness, whereas simple resets could produce incorrect results. Please let me know if there are more efficient ways to do the processing (e.g., calling writeChunk per chunk rather than at the end).
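To make the correctness point concrete, here is a small, non-streaming sketch of splitting a string on several delimiters with a trie. It is not the ZPipeline implementation in this PR: `TrieSplitSketch`, `Node`, `build`, `longestMatch`, and `splitOnAny` are hypothetical names, the whole input is assumed to be in memory, and the walk is simply retried from every input position, preferring the longest delimiter that matches there.

```scala
import scala.collection.mutable

object TrieSplitSketch {
  // Hypothetical node type used only by this sketch.
  final class Node {
    val children: mutable.Map[Char, Node] = mutable.Map.empty
    var isLeaf: Boolean = false
  }

  // Builds a trie containing every non-empty delimiter.
  def build(delimiters: Seq[String]): Node = {
    val root = new Node
    delimiters.filter(_.nonEmpty).foreach { delimiter =>
      var node = root
      delimiter.foreach(c => node = node.children.getOrElseUpdate(c, new Node))
      node.isLeaf = true
    }
    root
  }

  // Length of the longest delimiter starting at `from`, or 0 if none starts there.
  def longestMatch(root: Node, input: String, from: Int): Int = {
    var node = root
    var i = from
    var best = 0
    var walking = true
    while (walking && i < input.length) {
      node.children.get(input.charAt(i)) match {
        case Some(next) =>
          node = next
          i += 1
          if (node.isLeaf) best = i - from
        case None =>
          walking = false
      }
    }
    best
  }

  // Splits `input` on any of `delimiters`, keeping empty segments.
  def splitOnAny(input: String, delimiters: Seq[String]): List[String] = {
    val root = build(delimiters)
    val out = List.newBuilder[String]
    val current = new java.lang.StringBuilder
    var i = 0
    while (i < input.length) {
      val len = longestMatch(root, input, i)
      if (len > 0) {
        out += current.toString // close the current segment
        current.setLength(0)
        i += len                // skip the delimiter
      } else {
        current.append(input.charAt(i))
        i += 1
      }
    }
    out += current.toString
    out.result()
  }
}
```

With the delimiter `"aab"`, `TrieSplitSketch.splitOnAny("aaab-c", Seq("aab"))` returns `List("a", "-c")`. A matcher that consumes `"aa"`, hits the third `'a'`, and resets its state without reconsidering the characters it already consumed could miss the delimiter that starts at the second character, which is one way a reset-based approach can go wrong.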