Conversation

@bahuber
Contributor

@bahuber bahuber commented Sep 25, 2025

Tries are a good fit for this type of problem, and using one allows the two similar implementations to be consolidated. I added a somewhat contrived test case to show that a Trie ensures correctness, whereas simple resets can produce incorrect results. Please let me know if there are more efficient ways to do the processing (e.g., calling writeChunk per chunk rather than once at the end).
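To illustrate the kind of failure a "simple reset" matcher can hit, here is a minimal standalone sketch. It is an assumption about the failure mode being described, not the test case added in this PR: `Node`, `build`, `naiveContains`, and `trieContains` are hypothetical names, and tracking every in-progress match is just one way a trie-based matcher could stay correct.

```scala
object ResetVsTrie {
  // Hypothetical minimal trie node; not the Trie class from this PR.
  final case class Node(children: Map[Char, Node], isLeaf: Boolean)

  private val emptyNode = Node(Map.empty, isLeaf = false)

  // Inserts every delimiter into an immutable trie.
  def build(delimiters: Seq[String]): Node = {
    def insert(node: Node, rest: List[Char]): Node = rest match {
      case Nil => node.copy(isLeaf = true)
      case head :: tail =>
        val child = node.children.getOrElse(head, emptyNode)
        node.copy(children = node.children.updated(head, insert(child, tail)))
    }
    delimiters.foldLeft(emptyNode)((root, d) => insert(root, d.toList))
  }

  // Naive matcher: on a mismatch it drops all progress and restarts.
  // Assumes a non-empty delimiter.
  def naiveContains(input: String, delimiter: String): Boolean = {
    var matched = 0
    var found   = false
    input.foreach { c =>
      if (!found) {
        if (c == delimiter(matched)) matched += 1 else matched = 0
        if (matched == delimiter.length) found = true
      }
    }
    found
  }

  // Trie matcher: keeps one cursor per in-progress match and starts a
  // new cursor at the root for every input character.
  def trieContains(input: String, root: Node): Boolean = {
    var active = List.empty[Node]
    var found  = false
    input.foreach { c =>
      if (!found) {
        active = (root :: active).flatMap(_.children.get(c))
        found = active.exists(_.isLeaf)
      }
    }
    found
  }
}
```

With the input "aaab" and the delimiter "aab", the naive matcher resets on the third 'a' and never recovers, while the cursor-based matcher still has a match in progress when the 'b' arrives.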

@bahuber
Contributor Author

bahuber commented Oct 18, 2025

I noticed that the iterative version takes 2x as long to run this test. Are there underlying benefits to using @tailrec recursion over purely iterative methods, or did I just make it less efficient some other way?
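For context on this question: the Scala compiler rewrites a self-recursive method that passes the @tailrec check into a loop, so the two styles are generally expected to perform similarly. A minimal sketch of the two shapes, with illustrative function names that are not from this PR:

```scala
import scala.annotation.tailrec

object TailrecVsLoop {
  // Tail-recursive form: @tailrec makes compilation fail unless the
  // recursive call can be turned into a loop.
  @tailrec
  def countMatchesRec(chars: List[Char], target: Char, acc: Int = 0): Int =
    chars match {
      case Nil          => acc
      case head :: tail => countMatchesRec(tail, target, if (head == target) acc + 1 else acc)
    }

  // Hand-written loop doing the same traversal with local vars.
  def countMatchesLoop(chars: List[Char], target: Char): Int = {
    var remaining = chars
    var acc       = 0
    while (remaining.nonEmpty) {
      if (remaining.head == target) acc += 1
      remaining = remaining.tail
    }
    acc
  }
}
```

If that holds here, the 2x difference is more likely to come from something else in the rewrite, such as extra allocations or a different collection type, though only a benchmark can confirm it.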

@guizmaii guizmaii marked this pull request as draft October 19, 2025 10:03
@guizmaii
Member

Please fix the tests. Once everything is done, change the status of your PR by clicking the "Ready for review" button.

  def splitOn(delimiter: => String)(implicit trace: Trace): ZPipeline[Any, Nothing, String, String] =
    ZPipeline.mapChunks[String, Char](_.flatMap(string => Chunk.fromArray(string.toArray))) >>>
-     ZPipeline.splitOnChunk[Char](Chunk.fromArray(delimiter.toArray)) >>>
+     ZPipeline.splitOnChunk[Char, Char](delimiter.toList, true) >>>
Member

Please use named arguments for booleans like this one

Suggested change
- ZPipeline.splitOnChunk[Char, Char](delimiter.toList, true) >>>
+ ZPipeline.splitOnChunk[Char, Char](delimiter.toList, allowEmpty = true) >>>


private object Trie {
  def empty[T] = Trie(Map.empty[T, Trie[T]], 0, false)
  def next[T](depth: Int) = Trie(Map.empty[T, Trie[T]], depth + 1, false)
Member

@guizmaii guizmaii Oct 19, 2025

Suggested change
- def next[T](depth: Int) = Trie(Map.empty[T, Trie[T]], depth + 1, false)
+ def next[T](depth: Int): Trie[T] = Trie(structure = Map.empty[T, Trie[T]], depth = depth + 1, isLeaf = false)

seq match {
  case head +: tail => {
    val next = trie.structure.getOrElse(head, Trie.next[T](trie.depth))
    trie.copy(structure = trie.structure.updated(head, insert(next, tail)))
Member

You can avoid one Trie allocation in the else case of getOrElse by doing:

Suggested change
- trie.copy(structure = trie.structure.updated(head, insert(next, tail)))
+ val next = trie.structure.getOrElse(head, null)
+ val nextStructure = trie.structure.updated(head, insert(next, tail))
+ if (next eq null) Trie.next(nextStructure, trie.depth) // Note that you'll need to change the `next` signature
+ else trie.copy(structure = nextStructure)
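One possible way to get the same effect, sketched here outside the PR's code: when the key is missing, build the new branch directly instead of creating an empty node and then copying it. This is only an illustration under assumptions. The `Trie` case class below is a hypothetical stand-in that reuses the field names seen in this thread, `fresh` is an invented helper, and the sketch uses `Option` rather than `null`, which trades the intermediate node for a small `Some` allocation on a hit.

```scala
object TrieInsertSketch {
  // Hypothetical stand-in reusing the field names from this thread.
  final case class Trie[T](structure: Map[T, Trie[T]], depth: Int, isLeaf: Boolean)

  def empty[T]: Trie[T] = Trie[T](Map.empty, depth = 0, isLeaf = false)

  // Builds the chain for a suffix that is not in the trie yet,
  // allocating exactly one node per remaining element.
  private def fresh[T](seq: List[T], depth: Int): Trie[T] = seq match {
    case Nil          => Trie[T](Map.empty, depth, isLeaf = true)
    case head :: tail => Trie[T](Map(head -> fresh(tail, depth + 1)), depth, isLeaf = false)
  }

  def insert[T](trie: Trie[T], seq: List[T]): Trie[T] = seq match {
    case Nil => if (trie.isLeaf) trie else trie.copy(isLeaf = true)
    case head :: tail =>
      val child = trie.structure.get(head) match {
        case Some(existing) => insert(existing, tail)       // descend into the existing branch
        case None           => fresh(tail, trie.depth + 1)  // no empty placeholder node
      }
      trie.copy(structure = trie.structure.updated(head, child))
  }
}
```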

@guizmaii
Member

I reviewed a few minor details, not everything. I haven't looked at it in detail, but for the Trie algorithm implementations I'd try using mutable data structures and variables instead of, for example, the foldLeft used in insert.

@bahuber
Contributor Author

bahuber commented Oct 20, 2025

I had some trouble converting apply to the mutable/iterative pattern. Any thoughts? Using a MutableBoolean would be an obvious solution to the leaf issue, but maybe not the preferred one.

  private object Trie {
    private val _empty               = Trie(structure = scala.collection.mutable.Map.empty[Any, Trie[Any]], depth = 0, isLeaf = false)
    def empty[T]: Trie[T]            = _empty.asInstanceOf[Trie[T]]
    def next[T](depth: Int): Trie[T] = Trie(structure = scala.collection.mutable.Map.empty[T, Trie[T]], depth = depth + 1, isLeaf = false) // Using a reference to empty here created a circular reference

    def apply[T](delimiters: Seq[Seq[T]]): Trie[T] = {
      var index                      = 0
      val outerIterator              = delimiters.iterator
      var root                       = Trie.empty[T]

      while (outerIterator.hasNext) {
        val innerIterator = outerIterator.next().iterator
        var curNode = root
        index   = 0
        if (innerIterator.hasNext) {
          while (innerIterator.hasNext) {
            val elem     = innerIterator.next()
            curNode = curNode.structure.get(elem) match {
              case Some(nextNode) => nextNode
              case None           => {
                var nextNode = Trie.next[T](depth = index)
                curNode.structure += (elem -> nextNode)
                nextNode
              }
            }
            index += 1
          }
          curNode = curNode.copy(isLeaf = true) //Doesn't do anything
        }
      }
      return root
    }
  }
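The "Doesn't do anything" line comes down to how `copy` works on a case class: it returns a new instance and rebinds only the local variable, while the parent's map still points at the old node. A minimal sketch of that behaviour, with a plain mutable field as one alternative to a MutableBoolean (the `Node` and `BuilderNode` types are hypothetical and not part of this PR):

```scala
import scala.collection.mutable

object CopyDoesNotMutate {
  // Same shape as the snippet above: mutable children, immutable flag.
  final case class Node(children: mutable.Map[Char, Node], isLeaf: Boolean)

  // Hypothetical builder node whose flag can be set in place.
  final class BuilderNode {
    val children: mutable.Map[Char, BuilderNode] = mutable.Map.empty
    var isLeaf: Boolean                          = false
  }

  // Returns (flag seen through the parent after copy, flag seen after an in-place set).
  def demo(): (Boolean, Boolean) = {
    val root  = Node(mutable.Map.empty, isLeaf = false)
    val child = Node(mutable.Map.empty, isLeaf = false)
    root.children += ('a' -> child)

    var cur = child
    cur = cur.copy(isLeaf = true) // rebinds `cur`; root still references `child`
    val viaCopy = root.children('a').isLeaf

    val builderRoot  = new BuilderNode
    val builderChild = new BuilderNode
    builderRoot.children += ('a' -> builderChild)
    builderChild.isLeaf = true // mutates the node the parent references
    val viaVar = builderRoot.children('a').isLeaf

    (viaCopy, viaVar)
  }
}
```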

@bahuber bahuber marked this pull request as ready for review October 20, 2025 02:18
@guizmaii
Member

guizmaii commented Oct 20, 2025

@bahuber The exposed structure field in the Trie case class needs to be an immutable Map.

@guizmaii
Member

@bahuber You need to iterate with a mutable.Map to accumulate the structure, and build the Trie only at the end.

@bahuber
Contributor Author

bahuber commented Oct 21, 2025

Let me know if you want this change added:

  private object Trie {
    private final case class MutableTrie[T](structure: scala.collection.mutable.Map[T, MutableTrie[T]], depth: Int, isLeaf: AtomicBoolean) {
      def toTrie: Trie[T] =
        Trie(
          structure = this.structure.map{ case (key, mutableTrie) => (key, mutableTrie.toTrie)}.toMap,
          depth = this.depth,
          isLeaf = this.isLeaf.get()
        )
    }

    private object MutableTrie {
      def empty[T]: MutableTrie[T]            = MutableTrie(structure = scala.collection.mutable.Map.empty[T, MutableTrie[T]], depth = 0, isLeaf = new AtomicBoolean(false))
      def next[T](depth: Int): MutableTrie[T] = MutableTrie(structure = scala.collection.mutable.Map.empty[T, MutableTrie[T]], depth = depth, isLeaf = new AtomicBoolean(false))
    }

    def apply[T](delimiters: Seq[Seq[T]]): Trie[T] = {
      val outerIterator = delimiters.iterator
      val root          = MutableTrie.empty[T]
      var index         = 0

      while (outerIterator.hasNext) {
        val innerIterator = outerIterator.next().iterator
        var curNode       = root
        index = 0
        // Skip empty delimiters so the root is never marked as a leaf
        if (innerIterator.hasNext) {
          while (innerIterator.hasNext) {
            val elem = innerIterator.next()
            curNode = curNode.structure.get(elem) match {
              case Some(nextNode) => nextNode
              case None =>
                val nextNode = MutableTrie.next[T](depth = index + 1)
                curNode.structure += (elem -> nextNode)
                nextNode
            }
            index += 1
          }
          curNode.isLeaf.set(true)
        }
      }
      root.toTrie
    }
  }

I think the _empty trick with the Mutable objects caused some bugginess in the tests.
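A likely explanation, sketched under assumptions since the failing tests are not shown here: a cached `_empty` instance is safe only while its contents are immutable. Once the node holds a mutable map, every caller of `empty[T]` receives the same map, so entries added while building one trie show up in the next. The names below (`Node`, `emptyShared`, `emptyFresh`) are hypothetical.

```scala
import scala.collection.mutable

object SharedEmptyPitfall {
  final case class Node[T](children: mutable.Map[T, Node[T]])

  // Cached instance, mirroring the `_empty` pattern from the earlier snippet.
  private val sharedEmpty: Node[Any] = Node[Any](mutable.Map.empty[Any, Node[Any]])

  def emptyShared[T]: Node[T] = sharedEmpty.asInstanceOf[Node[T]]
  def emptyFresh[T]: Node[T]  = Node[T](mutable.Map.empty[T, Node[T]])

  // Returns (entries visible in a second "empty" shared root, entries in a fresh root).
  def demo(): (Int, Int) = {
    val first = emptyShared[Char]
    first.children += ('a' -> emptyFresh[Char])

    val second = emptyShared[Char] // same instance as `first`
    val fresh  = emptyFresh[Char]
    (second.children.size, fresh.children.size)
  }
}
```

Allocating a new root per call, as `MutableTrie.empty` does above, avoids the sharing.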
