iget_records leaks memory when column count 15,000+ #198

@sodiray

Description

The title is a little accusatory, but this is currently the most likely cause of an issue I'm having 🙏 😄

I have a client who is uploading an .xlsx file to a service that uses pyexcel to parse it. The client somehow managed to get a little over 16,000 columns into the latest sheet they uploaded, and our hosted server died. We're using iget_records along with free_resources. According to the docs, this should let us hold a single row in memory at a time rather than reading the entire file at once (seen here and here).
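For context, our usage pattern is roughly the following. This is a minimal sketch rather than our actual code; the file path and the `process` handler are placeholders:

```python
import pyexcel

def import_sheet(path):
    # Stream records one row at a time; each record is a dict-like object
    # keyed by the sheet's column headers.
    for record in pyexcel.iget_records(file_name=path):
        process(record)  # hypothetical downstream handler
    # Release the file handle and any cached readers.
    pyexcel.free_resources()
```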

The Issue

With .xlsx files that have fewer than 200 columns, memory is managed correctly. Using a profiler I can see that the block iterating the iget_records generator grows the process's memory by roughly one row's worth, then releases it on the next iteration. However, when parsing a file with over 15,000 columns, the profiling indicates that the memory allocated for each row yielded by iget_records is not released at the end of the block. The process memory soars past 3 GB in around 20 seconds.

Reproducibility

My powers that be aren't as hip on open source as you (my thanks and apologies to you for your hard work), so I can't post exactly what we have. I wanted to get this posted to start the dialogue and get your feedback, @chfw. I'm going to start working on a small reproducible script/test; a rough sketch of what that might look like is below.
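A rough sketch of the reproduction I have in mind (assumptions: a generated .xlsx with ~16,000 columns, memory sampled with the standard library's tracemalloc rather than the profiler I used above; the file name and row/column counts are placeholders):

```python
import pyexcel
import tracemalloc

WIDE_FILE = "wide.xlsx"  # placeholder path
COLS = 16_000            # mirrors the width of the client's sheet
ROWS = 200               # enough rows to make memory growth visible

def build_wide_file():
    # Header row plus ROWS data rows, all very wide.
    data = [[f"col_{i}" for i in range(COLS)]]
    data += [[j for _ in range(COLS)] for j in range(ROWS)]
    pyexcel.save_as(array=data, dest_file_name=WIDE_FILE)

def measure():
    tracemalloc.start()
    for n, record in enumerate(pyexcel.iget_records(file_name=WIDE_FILE)):
        if n % 50 == 0:
            current, peak = tracemalloc.get_traced_memory()
            print(f"row {n}: current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")
    pyexcel.free_resources()

if __name__ == "__main__":
    build_wide_file()
    measure()
```

If the generator is truly streaming, `current` should stay roughly flat as rows are consumed; a steady climb would reproduce the behaviour described above.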

Similar Issue

I see #131 has a similar story. However, it refers to iget_array; possibly the same thing is happening with iget_records?
