Conversation

@ArnyminerZ
Member

@ArnyminerZ ArnyminerZ commented Sep 28, 2025

Context

Right now we are loading the whole iCalendar into memory when applying the preprocessors, which defeats the whole point of using Readers in the first place. This can lead to Out Of Memory exceptions on very large iCalendar documents, or on devices with limited memory.

More info and reproduction in #90

Changes

  • Added a new test (ICalPreprocessorInstrumentedTest) that generates a very large iCalendar file with Int.MAX_VALUE events.
  • Changed the way ICalPreprocessor.preprocessStream works:
    • Instead of applying the StreamPreprocessors on the whole document at once, they are applied line-wise.
    • If the Reader given to the ICalPreprocessor supports reset, the result of this function will be a SequenceReader. This is a new class that converts sequences into a Reader. Note that the result does not support reset, because of the way sequences work in Kotlin.
    • If the Reader doesn't support reset, it will have to be loaded fully into memory anyway.
  • Since StreamPreprocessor is now simpler (most logic has been moved into ICalPreprocessor), it is now an interface.
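
As a rough illustration of the line-wise approach described in the list above, here is a minimal sketch. It is not the code from this PR: `LineFixer`, `LineSequenceReader` and `preprocessLineWise` are hypothetical names, and the sketch assumes that every fix can be expressed as a function from one line to one line and that lines should be re-emitted with CRLF terminators.

```kotlin
import java.io.BufferedReader
import java.io.Reader

/** Hypothetical stand-in for a line-wise preprocessor (not the real StreamPreprocessor API). */
fun interface LineFixer {
    fun fixLine(line: String): String
}

/** Exposes a lazily evaluated sequence of lines as a Reader. Does not support reset. */
class LineSequenceReader(lines: Sequence<String>) : Reader() {
    private val iterator = lines.iterator()
    private var current = ""  // line currently being served, including its terminator
    private var pos = 0       // read position inside `current`

    override fun read(cbuf: CharArray, off: Int, len: Int): Int {
        if (len == 0) return 0
        // Pull the next line only when the current one is exhausted,
        // so at most one line is held in memory at a time.
        while (pos >= current.length) {
            if (!iterator.hasNext()) return -1
            current = iterator.next() + "\r\n"
            pos = 0
        }
        val n = minOf(len, current.length - pos)
        for (i in 0 until n) cbuf[off + i] = current[pos + i]
        pos += n
        return n
    }

    override fun close() {
        // Nothing to release in this sketch; a real implementation would close the source.
    }
}

/** Applies all fixers to each line lazily and returns the result as a Reader. */
fun preprocessLineWise(input: Reader, fixers: List<LineFixer>): Reader {
    val fixedLines = BufferedReader(input).lineSequence()
        .map { line -> fixers.fold(line) { acc, fixer -> fixer.fixLine(acc) } }
    return LineSequenceReader(fixedLines)
}
```

Because the sequence is consumed lazily, nothing is read from `input` until the returned Reader is read, which is also why such a Reader cannot be reset.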

Note

Running the pre-processors on a line-by-line basis is only possible because each of them works on a single line at a time. If at some point we require a preprocessor that needs to fix multiple lines at once (maybe description fixes, since descriptions can span multiple lines), we might have to redo this.
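
If a multi-line fix ever became necessary, one possible direction (not part of this PR) would be to unfold content lines lazily before fixing them. In iCalendar, a long content line can be folded so that each continuation line starts with a single space or tab. The sketch below uses a hypothetical `unfold` extension and still keeps only one logical line in memory at a time.

```kotlin
/**
 * Hypothetical helper: joins folded iCalendar lines back into logical lines.
 * A line starting with a space or tab continues the previous line;
 * that single leading whitespace character is dropped when joining.
 */
fun Sequence<String>.unfold(): Sequence<String> = sequence {
    var pending: StringBuilder? = null
    for (line in this@unfold) {
        val isContinuation = line.startsWith(' ') || line.startsWith('\t')
        val buffer = pending
        if (isContinuation && buffer != null) {
            // Append everything after the folding whitespace
            buffer.append(line, 1, line.length)
        } else {
            // A new logical line starts: emit the previous one first
            if (buffer != null) yield(buffer.toString())
            pending = StringBuilder(line)
        }
    }
    pending?.let { yield(it.toString()) }
}
```

A preprocessor could then work on the unfolded logical lines, at the cost of having to fold long lines again before writing them out.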

@ArnyminerZ ArnyminerZ self-assigned this Sep 28, 2025
@ArnyminerZ ArnyminerZ added the refactoring Quality improvement of existing functions label Sep 28, 2025
@ArnyminerZ ArnyminerZ requested a review from a team as a code owner September 28, 2025 13:05
@ArnyminerZ ArnyminerZ linked an issue Sep 28, 2025 that may be closed by this pull request
@ArnyminerZ ArnyminerZ requested a review from Copilot September 28, 2025 13:07
Contributor

Copilot AI left a comment

Pull Request Overview

This PR addresses Out Of Memory (OOM) issues when processing super large iCalendar files by implementing chunked processing instead of loading entire files into memory. The changes move from a Reader-based preprocessing approach to a line-by-line chunked processing system that processes iCalendar data in groups of 1000 lines to maintain memory efficiency while preserving functionality.

Key changes:

  • Refactored stream preprocessing to process data in configurable chunks rather than loading entire files
  • Converted StreamPreprocessor from abstract class to interface with simplified API
  • Added SequenceReader utility class to convert processed line sequences back to Reader interface

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

File | Description
--- | ---
SequenceReader.kt | New utility class that converts String sequences into Reader interface
StreamPreprocessor.kt | Simplified from abstract class to interface, removing preprocessing logic
ICalPreprocessor.kt | Refactored to implement chunked processing with configurable chunk sizes
FixInvalidUtcOffsetPreprocessor.kt | Updated to implement StreamPreprocessor interface
FixInvalidDayOffsetPreprocessor.kt | Updated to implement StreamPreprocessor interface
ICalPreprocessorInstrumentedTest.kt | Added test with large iCalendar file generator to verify memory efficiency

@ArnyminerZ ArnyminerZ marked this pull request as draft September 28, 2025 13:08
@ArnyminerZ
Member Author

@bitfireAT/app-dev should be ready :)

A bit ugly, but I don't know how to simplify it to be honest. Mainly SequenceReader is a bit verbose.

@ArnyminerZ ArnyminerZ marked this pull request as ready for review September 28, 2025 13:46
Member

@sunkup sunkup left a comment

Pretty cool, but unfortunately not very necessary and might introduce new problems. The current focus for synctools is the refactoring for stability, which is already breaking things a bit ... So it's not really a good time to be adding this. I will do an actual review if @rfc2822 thinks we should add it to synctools now anyways.

Member

@rfc2822 rfc2822 left a comment

Because the large file / OOM was a real problem, I think it's a good idea to make the preprocessors more stable, too.

Some comments.

Member

@rfc2822 rfc2822 left a comment

Some more comments :)

@rfc2822 rfc2822 self-assigned this Nov 6, 2025
@rfc2822
Member

rfc2822 commented Nov 6, 2025

I tried it out with 100,000 events, and processing the stream in chunks of 1000 lines was not faster than line by line (both ~ 10 sec). So I'll remove the chunking for the benefit of having an easy-to-understand pre-processor interface (processes only one line).
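
For readers who want to see the two variants side by side, here is a minimal sketch of what "chunked" versus "line by line" could look like. It is not the benchmarked code: `toyFix`, `fixLineByLine` and `fixChunked` are hypothetical, and the toy regex merely stands in for a real repair. The chunked variant is only equivalent as long as the regex never matches across a line break.

```kotlin
// Toy repair: collapse runs of spaces. Stands in for a real regex-based fix.
private val toyFix = Regex(" {2,}")

/** Applies the regex to every line individually. */
fun fixLineByLine(lines: Sequence<String>): Sequence<String> =
    lines.map { line -> toyFix.replace(line, " ") }

/** Joins chunkSize lines, applies the regex once per chunk, then splits the result again. */
fun fixChunked(lines: Sequence<String>, chunkSize: Int = 1000): Sequence<String> =
    lines.chunked(chunkSize).flatMap { chunk ->
        toyFix.replace(chunk.joinToString("\n"), " ").lineSequence()
    }
```

Both functions stay lazy, but the chunked one has to build and split an intermediate string per chunk, and a preprocessor written for it must cope with embedded line breaks. The line-by-line form avoids both.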

Member

@rfc2822 rfc2822 left a comment

@ArnyminerZ I did some last modifications, please double-check and wait for @sunkup's review before merging (the PR is quite important because it will be applied to every single synced iCalendar).

@rfc2822 rfc2822 requested review from Copilot and sunkup November 6, 2025 10:55
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.



@rfc2822 rfc2822 changed the title Fix OOM issues with super large iCalendar files Pre-process iCalendars line by line to avoid OOM on large iCalendar files Nov 6, 2025
@rfc2822 rfc2822 changed the title Pre-process iCalendars line by line to avoid OOM on large iCalendar files Pre-process iCalendars line by line to avoid OOM on large files Nov 6, 2025
@ArnyminerZ
Member Author

> I did some last modifications

I've checked them, they look good. Let's wait for @sunkup's approval.

Member

@sunkup sunkup left a comment

Looks a bit scary tbh, but I think you covered a good amount of test cases, so let's hope it goes well :)

As far as I can see the repairs are applied line by line, so this note in the PR description is confusing. Might want to update the PR description.

> Note: to avoid having to apply regex conditions (which can get expensive) hundred of thousands of times, the lines are chunked in groups of 1000 lines (arbitrary, can be adjusted).

@rfc2822 rfc2822 merged commit 7af49dd into main Nov 12, 2025
7 checks passed
@rfc2822 rfc2822 deleted the 90-oom-exception-with-large-icalendar-files branch November 12, 2025 09:29
Labels

refactoring Quality improvement of existing functions

Development

Successfully merging this pull request may close these issues.

OOM Exception with large icalendar files

3 participants