Description
I get an OutOfMemoryError when using ZTransducer.splitLines, even though I can read the full content of the same file into a String using Apache Commons FileUtils.readFileToString.
I would expect a streaming library that processes a file line by line to need no more heap space than the size of the file itself.
The file is ~107 MB (110,499,590 bytes) and has 1568 lines. The longest line is a String of ~224,000 characters.
The JVM InitialHeapSize is set to 512 MB and MaxHeapSize to 4 GB (this shouldn't matter, since FileUtils can read the whole file into a String).
    for {
      file <- args.headOption
                .map(filePath => IO(Paths.get(filePath)))
                .getOrElse(IO.fail("No file specified"))
      inputStream <- IO(Files.newInputStream(file))
      //_ <- console.putStrLn("reading file to memory using FileUtils.readFileToString...")
      //s = FileUtils.readFileToString(file.toFile, Charset.forName("UTF-8"))
      //_ <- console.putStrLn(s"length of file in memory: ${s.length}")
      stream = ZStream.fromInputStream(inputStream)
                 .transduce(ZTransducer.utf8Decode)
                 .transduce[ZEnv, IOException, String](ZTransducer.splitLines)
      _ <- stream.runDrain
    } yield ()

    java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
        at java.lang.StringBuilder.append(StringBuilder.java:136)
        at zio.stream.ZTransducer$.$anonfun$splitLines$6(ZTransducer.scala:635)
        at zio.stream.ZTransducer$.$anonfun$splitLines$6$adapted(ZTransducer.scala:634)
        at zio.stream.ZTransducer$$$Lambda$205/1628294067.apply(Unknown Source)
        at zio.Chunk$Singleton.foreach(Chunk.scala:1146)
        at zio.Chunk$Concat.foreach(Chunk.scala:1124)
        at zio.stream.ZTransducer$.$anonfun$splitLines$4(ZTransducer.scala:634)
I'm willing to fix this bug myself, with some help.
Looking at the implementation, I'd also suggest replacing the String concatenation in ZTransducer.splitLines with a StringBuilder held as local mutable state. However, I'm not sure whether there are restrictions on using local state in ZTransducers.
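
To illustrate the idea, here is a minimal sketch in plain Scala of what I mean by keeping a StringBuilder as local mutable state. It deliberately does not use the ZTransducer API, since I don't know what is allowed there. `LineSplitter`, `push`, `finish` and `LineSplitterDemo` are hypothetical names of mine, not anything from ZIO, and the sketch assumes only `\n` and `\r\n` count as line terminators.

```scala
// Hypothetical helper, not part of ZIO: splits text that arrives in
// arbitrary pieces into lines, buffering only the unfinished line.
final class LineSplitter {
  // Holds the current, still incomplete line between calls to push.
  private val pending = new java.lang.StringBuilder

  /** Feeds one piece of text and returns every line that it completes. */
  def push(piece: String): Vector[String] = {
    val lines = Vector.newBuilder[String]
    var start = 0
    var nl = piece.indexOf('\n', start)
    while (nl >= 0) {
      pending.append(piece, start, nl)
      // Drop a trailing '\r' so that "\r\n" is handled even when the
      // two characters arrive in different pieces.
      val len = pending.length()
      if (len > 0 && pending.charAt(len - 1) == '\r') pending.setLength(len - 1)
      lines += pending.toString
      pending.setLength(0) // reuse the buffer for the next line
      start = nl + 1
      nl = piece.indexOf('\n', start)
    }
    // Whatever follows the last '\n' belongs to a line that is not finished yet.
    pending.append(piece, start, piece.length)
    lines.result()
  }

  /** Returns the trailing line, if the input did not end with a newline. */
  def finish(): Option[String] =
    if (pending.length() == 0) None
    else {
      val last = pending.toString
      pending.setLength(0)
      Some(last)
    }
}

object LineSplitterDemo {
  // Pieces are cut at arbitrary points, including between '\r' and '\n'.
  def demo(): List[String] = {
    val splitter = new LineSplitter
    val pieces = List("first li", "ne\r", "\nsecond\nthi", "rd")
    pieces.flatMap(piece => splitter.push(piece)) ++ splitter.finish().toList
  }
}
```

With this approach the buffer only ever holds the line currently being assembled, so as long as emitted lines are consumed and dropped, the buffered text should be bounded by the longest line (~224,000 characters for my file) rather than by the file size.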