Description
The title is a little accusatory, but this is currently the most likely cause of an issue I'm having.
I have a client who is uploading an .xlsx file to a service that uses pyexcel to parse it. The client somehow managed to get a little over 16,000 columns into the latest sheet they uploaded, and our hosted server died. We're using iget_records along with free_resources. Reading the docs, this should allow us to iterate over a single row in memory at a time rather than reading the entire file at once (seen here and here). A minimal sketch of the pattern we use is shown below.
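This is roughly the shape of our processing loop (the file path and the per-row handler are illustrative placeholders, not our actual service code):

```python
import pyexcel

def process_upload(path):
    # iget_records yields one record (an OrderedDict keyed by column header)
    # per row, so memory use should stay at roughly one row at a time.
    for record in pyexcel.iget_records(file_name=path):
        handle(record)  # hypothetical per-row processing
    # free_resources releases the file handles/streams held by the generator.
    pyexcel.free_resources()

def handle(record):
    pass  # placeholder for the real per-row logic
```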
The Issue
With .xlsx files having fewer than 200 columns, memory is managed correctly: using a profiler I can see that the block iterating the iget_records generator increases the process's memory by about one row, then releases it on the next iteration. However, when parsing a file with over 15,000 columns, the profiling indicates that the memory allocated for each row yielded by iget_records is not released at the end of the block. The process memory climbs past 3 GB in around 20 seconds.
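For what it's worth, the growth can also be observed with nothing but the standard library; this is not the profiler we actually used, just an illustrative way to watch per-iteration memory (the file name is a placeholder):

```python
import tracemalloc
import pyexcel

tracemalloc.start()
for i, record in enumerate(pyexcel.iget_records(file_name="wide_sheet.xlsx")):
    if i % 100 == 0:
        # With a narrow sheet, "current" stays flat; with a very wide sheet
        # we see it keep climbing instead of dropping after each row.
        current, peak = tracemalloc.get_traced_memory()
        print(f"row {i}: current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")
pyexcel.free_resources()
```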
Reproducibility
My powers that be aren't as hip on open source as you are (my thanks and apologies to you for your hard work), so I can't post exactly what we have. I wanted to get this posted to start the dialogue and get your feedback @chfw. I'm going to start working on a small reproducible script/test; a possible starting point is sketched below.
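Something along these lines is what I have in mind for the reproduction: generate a sheet with ~16,000 columns, then iterate it with iget_records and watch the process memory. This assumes the xlsx plugin (pyexcel-xlsx) is installed; column/row counts are arbitrary.

```python
import pyexcel

COLS = 16000
ROWS = 50

# Build a wide sheet: one header row plus ROWS data rows of COLS cells each.
header = [f"col_{i}" for i in range(COLS)]
rows = [[j for _ in range(COLS)] for j in range(ROWS)]
pyexcel.save_as(array=[header] + rows, dest_file_name="wide_sheet.xlsx")

# Iterate it the same way the service does and observe memory while it runs.
for record in pyexcel.iget_records(file_name="wide_sheet.xlsx"):
    pass
pyexcel.free_resources()
```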
Similar Issue
I see #131 has a similar story. However, it refers to iget_array; possibly the same applies to iget_records?