Update npy_append_array.py #3

Closed
wants to merge 5 commits into from

Conversation

gityoav
Contributor

@gityoav gityoav commented Jan 5, 2022

  • support for both write/append for a smooth switching between the two
  • I did this by splitting the self.write() from the __init() code
  • if original file did not support appending, we will try in the background to re-save it so it does.
  • if original array is not contiguous, we will try in the background to make it contiguous

- support for both write/append for a smooth switching between the two
- if original file did not support appending, we will try in the background to save it so it does
self.filename = filename
self.fp = None
self.__is_init = False
if os.path.isfile(filename):
    self.__init()
Contributor Author

__init is now called only when we are appending, so the call is done in .append
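To illustrate the split described in this comment, here is a minimal sketch, assuming a toy class that stores raw bytes rather than .npy data. The class name `LazyAppendFile` and its methods are hypothetical and are not the NpyAppendArray code from the pull request; the sketch only shows the existing file being opened lazily on the first append instead of in the constructor.

```python
import os


class LazyAppendFile:
    """Toy stand-in (hypothetical): write() never inspects the existing file,
    append() opens it on first use."""

    def __init__(self, filename):
        self.filename = filename
        self.fp = None
        self._is_init = False  # nothing is opened at construction time

    def _init(self):
        # Only needed for appending: open the existing file and seek to its end.
        self.fp = open(self.filename, "rb+")
        self.fp.seek(0, os.SEEK_END)
        self._is_init = True

    def write(self, data):
        # Overwrite from scratch; release any handle left over from appending.
        self.close()
        with open(self.filename, "wb") as fp:
            fp.write(data)

    def append(self, data):
        if not self._is_init:
            if not os.path.isfile(self.filename):
                # Nothing to append to yet, so the first append is a plain write.
                self.write(data)
                return
            self._init()
        self.fp.write(data)
        self.fp.flush()

    def close(self):
        if self.fp is not None:
            self.fp.close()
            self.fp = None
        self._is_init = False
```

With this layout, constructing the object has no side effects, and the cost of opening the existing file is paid only by callers that actually append.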

@xor2k
Owner

xor2k commented Jan 6, 2022

Thank you for the pull request, always makes me happy to see them!

However, I don't want to add more features at this point since I want NpyAppendArray to become part of Numpy (compare numpy/numpy#20321). Currently, I want NpyAppendArray to develop as close as possible to the Numpy pull request. Also, I see the following aspects:

  1. write/append: I'd rather add a flag overwrite_existing to __init__ than fall back to the "r/w/a" schemes of File.write, as "a" would work differently from File.write: NpyAppendArray not only appends, but also updates the header (at least for now, could be different with the Numpy pull request I've posted, see above).
  2. Automatically resaving on append is potentially dangerous, in particular for large files: if files are very large they can block the machine/a process for a (relatively) long time. If the machine needs a reboot or the command is run with a timeout (e.g. 1 second and after that, the Python process gets terminated), the file may remain in an unfinished state and can thus be damaged. Such actions should be triggered by an extra function call by the user. If this library had been comfort/feature focussed, I'd rather add a method make_file_appendable or so.
  3. Making arrays contiguous automatically can also be problematic due to RAM use and should be done with an extra function call by the user (e.g. by creating a contiguous copy of the array). Would be a nice feature for Numpy though to support efficient writing of contiguous .npy files from non-contiguous arrays (if this has not already been implemented, I don't know).
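The damage scenario in point 2 can be reduced by rewriting into a temporary file and swapping it in only once it is complete. The following is a minimal sketch of that general pattern, not code from this library: the function name `resave_atomically` and the `rewrite` callback are hypothetical, and the sketch says nothing about how a .npy header would actually be rewritten.

```python
import os
import tempfile


def resave_atomically(path, rewrite):
    """Rewrite `path` via a temporary file so a failure leaves the original intact.

    `rewrite(src, dst)` is a caller-supplied (hypothetical) callback that reads
    the old contents from `src` and writes the new contents to `dst`.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            rewrite(src, dst)
            dst.flush()
            os.fsync(dst.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, path)  # rename over the original in one step
    except BaseException:
        # Any failure, including KeyboardInterrupt: drop the partial copy.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A hard kill or reboot can still leave a stray `.tmp` file behind, but the original file is only replaced after the new copy has been fully written. The pattern also temporarily needs disk space for a second copy, which matters for the very large files mentioned above.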

I will leave the pull request open and maybe some variant of it might be merged at some point in the future.

@gityoav
Contributor Author

gityoav commented Jan 6, 2022

All made sense. I removed the write/append entirely. The write method is enough: if the user wants to write, let her use npa.write explicitly. I also removed the contiguous check; this can be done by the user before appending. As suggested, I added a make_file_appendable method.
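The contiguity check that is left to the user here could look roughly like the sketch below. It uses only standard NumPy calls (`ndarray.flags` and `np.ascontiguousarray`); the helper name `ensure_c_contiguous` is hypothetical and not part of NpyAppendArray.

```python
import numpy as np


def ensure_c_contiguous(arr):
    """Return a C-contiguous array with the same values, copying only if needed."""
    if arr.flags.c_contiguous:
        return arr  # already row-major in memory, no copy
    # Allocates a new row-major buffer, so this costs extra RAM for large arrays.
    return np.ascontiguousarray(arr)


base = np.arange(12).reshape(3, 4)
view = base.T  # transposed view: same data, non-contiguous strides
chunk = ensure_c_contiguous(view)
```

Keeping this step explicit makes the extra memory use visible to the caller, which is the concern raised about doing it automatically.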

@gityoav
Contributor Author

gityoav commented Jan 6, 2022

btw, I am using it in
https://github.com/gityoav/pyg-npy
It is insanely fast for writing pd.DataFrame :-)

@xor2k
Owner

xor2k commented Dec 28, 2022

After a lot of work and discussions here and there, the pull request went through and I've got some feedback from the Numpy team: they wanted to have the AppendArray functionality in an external module. However, now .npy files are appendable by default (given the Numpy version is recent enough). I've got lots of inspiration though and reworked the entire module. It probably now supports all the features you were suggesting in this pull request.

@gityoav
Contributor Author

gityoav commented Dec 28, 2022

That's good to hear. I now have a listener in production appending to npy files continuously throughout the day. It's production code so it will take some time to migrate, but I will have a look. Thanks for keeping me updated and happy new year.

@xor2k
Owner

xor2k commented Dec 28, 2022

If there is something missing for your use case, just write here and we can add it then. Happy new year, too!

@xor2k
Owner

xor2k commented Apr 27, 2023

  • support for both write/append for a smooth switching between the two
  • I did this by splitting the self.write() from the __init() code
  • if original file did not support appending, we will try in the background to re-save it so it does.
  • if original array is not contiguous, we will try in the background to make it contiguous

I have implemented ensure_appendable to make non-appendable files appendable. Non-contiguous arrays will automatically be made contiguous on write anyway. This should resolve the issue of the pull request.

@xor2k xor2k closed this Apr 27, 2023
@xor2k
Owner

xor2k commented Apr 27, 2023

And sorry for the late answer, took me a while to get it into the state I wanted to have it.
