Pre-process iCalendars line by line to avoid OOM on large files #92
Pull Request Overview
This PR addresses Out Of Memory (OOM) errors when processing very large iCalendar files by processing them in chunks instead of loading entire files into memory. The changes replace the Reader-based preprocessing approach with a line-by-line system that handles iCalendar data in groups of 1000 lines, keeping memory use low while preserving functionality.
Key changes:
- Refactored stream preprocessing to process data in configurable chunks rather than loading entire files
- Converted StreamPreprocessor from abstract class to interface with simplified API
- Added SequenceReader utility class to convert processed line sequences back to Reader interface
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| SequenceReader.kt | New utility class that converts String sequences into Reader interface |
| StreamPreprocessor.kt | Simplified from abstract class to interface, removing preprocessing logic |
| ICalPreprocessor.kt | Refactored to implement chunked processing with configurable chunk sizes |
| FixInvalidUtcOffsetPreprocessor.kt | Updated to implement StreamPreprocessor interface |
| FixInvalidDayOffsetPreprocessor.kt | Updated to implement StreamPreprocessor interface |
| ICalPreprocessorInstrumentedTest.kt | Added test with large iCalendar file generator to verify memory efficiency |
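
To illustrate the `SequenceReader` row in the table above, here is a minimal sketch of a `Reader` backed by a lazy sequence of lines. The class name `LineSequenceReader` and the choice to re-append `\n` after every line are assumptions made for this example; the actual `SequenceReader.kt` in the PR may look different.

```kotlin
import java.io.Reader

// Hypothetical sketch (not the PR's SequenceReader): serves a sequence of
// lines through the Reader API, pulling one line at a time.
class LineSequenceReader(lines: Sequence<String>) : Reader() {
    private val iterator = lines.iterator()
    private var current = ""   // line currently being served, including its line break
    private var pos = 0        // read position inside `current`

    override fun read(cbuf: CharArray, off: Int, len: Int): Int {
        if (len == 0) return 0
        var written = 0
        while (written < len) {
            if (pos >= current.length) {
                // Current line is used up: fetch the next one, or stop at the end.
                if (!iterator.hasNext()) break
                current = iterator.next() + "\n"   // assumption: LF as line separator
                pos = 0
            }
            val n = minOf(len - written, current.length - pos)
            for (i in 0 until n) cbuf[off + written + i] = current[pos + i]
            pos += n
            written += n
        }
        // Reader contract: -1 signals end of stream.
        return if (written == 0) -1 else written
    }

    override fun close() {
        // Nothing to release in this sketch.
    }
}
```

Only one line is held in memory at a time. Because the class inherits the defaults of `java.io.Reader`, `markSupported()` returns `false` and `reset()` throws an `IOException`.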
@bitfireAT/app-dev should be ready :) A bit ugly, but I don't know how to simplify it to be honest. Mainly
sunkup left a comment
Pretty cool, but unfortunately not very necessary and might introduce new problems. The current focus for synctools is the refactoring for stability, which is already breaking things a bit ... So it's not really a good time to be adding this. I will do an actual review if @rfc2822 thinks we should add it to synctools now anyways.
rfc2822 left a comment
Because the large file / OOM was a real problem, I think it's a good idea to make the preprocessors more stable, too.
Some comments.
Signed-off-by: Arnau Mora <[email protected]>
rfc2822 left a comment
Some more comments :)
# Conflicts: # gradle/libs.versions.toml
I tried it out with 100,000 events, and processing the stream in chunks of 1000 lines was not faster than processing it line by line (both ~10 sec). So I'll remove the chunking for the benefit of having an easy-to-understand preprocessor interface (one that processes only one line).
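
A minimal sketch of what such a one-line-at-a-time interface could look like. All names here (`LinePreprocessor`, `fixLine`, `FixDayDuration`, `preprocessLines`) and the example repair rule are assumptions for illustration, not the actual synctools API.

```kotlin
import java.io.BufferedReader
import java.io.Reader
import java.io.StringReader

// Hypothetical interface: a preprocessor only ever sees a single line.
fun interface LinePreprocessor {
    /** Returns the repaired line, or the input unchanged if nothing needs fixing. */
    fun fixLine(line: String): String
}

// Illustrative rule (an assumption, not taken from the PR):
// rewrite "DURATION:PT2D" to "DURATION:P2D".
object FixDayDuration : LinePreprocessor {
    private val pattern = Regex("^(DURATION:-?P)T(\\d+D)$")

    override fun fixLine(line: String): String =
        // Cheap prefix check first, so the regex only runs on candidate lines.
        if (line.startsWith("DURATION:")) pattern.replace(line, "$1$2") else line
}

// Lazily applies every preprocessor to every line; nothing is read from
// `input` until the returned sequence is consumed.
fun preprocessLines(input: Reader, preprocessors: List<LinePreprocessor>): Sequence<String> =
    BufferedReader(input).lineSequence().map { line ->
        preprocessors.fold(line) { acc, preprocessor -> preprocessor.fixLine(acc) }
    }

fun main() {
    val ical = "BEGIN:VEVENT\nDURATION:PT2D\nEND:VEVENT"
    preprocessLines(StringReader(ical), listOf(FixDayDuration)).forEach(::println)
}
```

The sequence returned by `lineSequence()` can be iterated only once, which fits a streaming pipeline but means the result cannot be re-read.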
rfc2822 left a comment
@ArnyminerZ I did some last modifications, please double-check and wait for @sunkup's review before merging (the PR is quite important because it will be applied to every single synced iCalendar).
Pull Request Overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
I've checked them, they look good. Let's wait for @sunkup's approval.
sunkup left a comment
Looks a bit scary tbh, but I think you covered a good amount of test cases, so let's hope it goes well :)
As far as I can see the repairs are applied line by line, so this note in the PR description is confusing. Might want to update the PR description.
> Note: to avoid having to apply regex conditions (which can get expensive) hundreds of thousands of times, the lines are chunked in groups of 1000 lines (arbitrary, can be adjusted).
Context
Right now we load the whole iCalendar into memory when applying the preprocessors, which defeats the point of using Readers in the first place. This can lead to Out Of Memory exceptions on very large iCalendar documents, or on devices with limited memory.
More info and reproduction in #90
Changes
- Added a test (`ICalPreprocessorInstrumentedTest`) that generates a very large iCalendar file (with `Int.MAX_VALUE` events).
- Changed how `ICalPreprocessor.preprocessStream` works:
  - `StreamPreprocessor`s are no longer applied on the whole document at once; they are applied line-wise.
  - If the `Reader` given to the `ICalPreprocessor` supports `reset`, the result of this function will be a `SequenceReader`. This is a new class that converts sequences into a `Reader`. Note that the result does not support `reset`, because of the way sequences work in Kotlin.
  - If the `Reader` doesn't support `reset`, it will have to be loaded fully into memory anyway.
- `StreamPreprocessor`s are now simpler (most logic has been moved into `ICalPreprocessor`); they are now interfaces.

Note
Running the preprocessors on a per-line basis is possible because every current fix only touches a single line. If at some point we require a preprocessor that needs to fix multiple lines at once (maybe description fixes, which allow multi-line values), we might have to redo this.
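
One way a multi-line value could still be handled in a streaming fashion: iCalendar folds long content lines by inserting a line break followed by a single space or tab (RFC 5545, section 3.1), so continuation lines can be joined lazily before a preprocessor sees them. The following is only a sketch of that idea; the extension name `unfoldLines` is hypothetical and nothing like it is part of this PR.

```kotlin
// Hypothetical helper: joins folded iCalendar lines (continuations start with
// a space or tab) while still holding only one logical line in memory.
fun Sequence<String>.unfoldLines(): Sequence<String> = sequence {
    var pending: StringBuilder? = null
    for (line in this@unfoldLines) {
        val current = pending
        if (current != null && (line.startsWith(' ') || line.startsWith('\t'))) {
            // Continuation: drop the single leading whitespace character and append.
            current.append(line.substring(1))
        } else {
            // A new logical line starts: emit the previous one first.
            if (current != null) yield(current.toString())
            pending = StringBuilder(line)
        }
    }
    // Emit the last logical line, if any.
    pending?.let { yield(it.toString()) }
}

fun main() {
    val folded = sequenceOf("DESCRIPTION:long te", " xt continues", "SUMMARY:x")
    folded.unfoldLines().forEach(::println)
}
```

A fix written against the unfolded sequence would see `DESCRIPTION:long text continues` as one line, at the cost of buffering each logical line in full.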