gh-51067: add ZipFile.remove() #103033

Open: wants to merge 11 commits into main

danny0838 commented Mar 25, 2023

This is a revision of #19358 (for issue #51067), as the original author no longer seems to be working on it.

Notable changes:

  • Added docs and tests.

  • Support modes 'w' and 'x', as noted in #51067 (comment).

  • Support removing multiple members and non-physical removal through the internal _remove_members method, since some users may find them useful, as noted in #51067 (comment) and #51067 (comment).

    They are not currently exposed in the public remove API, because doing so would require more complicated changes to the public APIs (e.g. introducing error handling for multiple members, and an extra method that purges the stale data left behind by non-physical removal), and other ZipFile-related APIs do not support similar operations.

  • Move physical data in chunks, to avoid excessive memory usage on large files.

  • Fixed a flaw in the previous implementation where self.NameToInfo ends up with a missing key when one of several duplicated arcnames is removed.
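
The chunked move mentioned above can be illustrated with a minimal sketch. This is not the PR's code: `move_block` and its parameters are hypothetical names, and the sketch only handles moving data toward the start of the file, which is the direction needed when closing the gap left by a removed member.

```python
import io


def move_block(fp, src, dst, length, chunk_size=1 << 20):
    """Move `length` bytes from offset `src` to an earlier offset `dst`.

    Hypothetical helper: at most `chunk_size` bytes are held in memory
    at any time, regardless of how large the moved region is.
    """
    if dst > src:
        raise ValueError("this sketch only moves data toward the start")
    moved = 0
    while moved < length:
        n = min(chunk_size, length - moved)
        fp.seek(src + moved)
        chunk = fp.read(n)
        if not chunk:
            raise EOFError("unexpected end of file while moving data")
        fp.seek(dst + moved)
        fp.write(chunk)
        moved += len(chunk)
    return moved


# Layout: 4 bytes kept, 4 bytes belonging to a removed member, 6 bytes to shift.
buf = io.BytesIO(b"AAAA" + b"xxxx" + b"BCDEFG")
move_block(buf, src=8, dst=4, length=6, chunk_size=4)
buf.truncate(4 + 6)  # drop the stale tail after the move
result = buf.getvalue()
```

Because each chunk is read before the destination is written, and the destination always lies before the unread source bytes, overlapping source and destination ranges are safe in this direction.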



📚 Documentation preview 📚: https://cpython-previews--103033.org.readthedocs.build/

ghost commented Mar 25, 2023

All commit authors signed the Contributor License Agreement.
CLA signed

bedevere-bot commented

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

danny0838 force-pushed the gh-51067 branch 3 times, most recently from 76722fa to f3450f1 on March 26, 2023 12:37

barneygale (Contributor) commented

Automatically compacting the .zip file seems like overkill. Suggestion: split this into two methods:

  • ZipInfo.remove() removes the record for the file, and by default zeroes out its data.
  • ZipFile.repack() reclaims free space.

The latter method is dangerous for self-extracting .zip files, which have an executable header before the zip data begins. That header would be stripped out, I think.
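
As a rough illustration of the self-extracting case, the sketch below checks whether an archive has bytes in front of its first local file entry. It relies only on the existing `ZipInfo.header_offset` attribute; `leading_bytes` is a hypothetical helper, the stub is made-up data, and the sketch assumes the stub was simply concatenated in front of an ordinary archive.

```python
import io
import zipfile


def leading_bytes(zf):
    """Return the offset of the first local file entry (0 if the archive is empty)."""
    return min((zi.header_offset for zi in zf.infolist()), default=0)


# Build an ordinary archive in memory.
plain = io.BytesIO()
with zipfile.ZipFile(plain, "w") as zf:
    zf.writestr("a.txt", "hello")

# Prepend a made-up stub, the way a self-extractor header sits before the zip data.
stub = b"#!/bin/sh\n# hypothetical self-extractor stub\n"
prefixed = io.BytesIO(stub + plain.getvalue())

with zipfile.ZipFile(prefixed) as zf:
    prefix_len = leading_bytes(zf)
    names = zf.namelist()
```

A non-zero result suggests that a repacking step would need to preserve that leading region rather than treat it as free space.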

barneygale commented Mar 30, 2023

Also, please don't force-push to an open PR. It makes it harder for reviewers to follow changes! Thanks

arhadthedev added the stdlib (Python modules in the Lib dir) label Mar 30, 2023
bpepple added a commit to Metron-Project/darkseid that referenced this pull request Aug 31, 2024
bpepple added a commit to Metron-Project/darkseid that referenced this pull request Aug 31, 2024
* Add remove method to ZipFile

Refer to: python/cpython#103033

* Make use of `ZipFileWithRemove`
merwok added the type-feature (A feature request or enhancement) label Apr 24, 2025

python-cla-bot bot commented May 22, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

bedevere-app bot commented May 22, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

danny0838 added a commit to danny0838/cpython that referenced this pull request May 22, 2025
This is a revision of commit 659eb04 (PR python#19358), notably with the following changes:

- Add documentation and tests.
- Raise `ValueError` for a bad mode, as in other methods.
- Support multi-member removal in `_remove_members()`.
- Support non-physical removal in `_remove_members()`.
- Move physical file data in chunks to prevent excessive memory usage on large files.
- Fix missing entry in `self.NameToInfo` when removing a duplicated archive name.
- Also update `ZipInfo._end_offset` for physically moved files.

Co-authored-by: Éric <[email protected]>

(cherry picked from commit e6bc82a (PR python#103033))
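
The `self.NameToInfo` fix listed above concerns archives that contain several members with the same name. The sketch below shows one way to keep a name lookup consistent in that situation. It is not necessarily what the commit does: `Entry` and `drop_member` are hypothetical stand-ins, and the choice to fall back to the last remaining entry of the same name is an assumption.

```python
class Entry:
    """Hypothetical stand-in for ZipInfo with only the fields the sketch needs."""

    def __init__(self, filename, header_offset):
        self.filename = filename
        self.header_offset = header_offset


def drop_member(filelist, name_to_info, member):
    """Remove `member` and keep the name lookup pointing at a surviving entry."""
    filelist.remove(member)
    if name_to_info.get(member.filename) is member:
        # Assumption: fall back to the last remaining entry with the same name.
        for other in reversed(filelist):
            if other.filename == member.filename:
                name_to_info[member.filename] = other
                break
        else:
            # No duplicate is left, so the name disappears from the lookup.
            del name_to_info[member.filename]


first = Entry("a.txt", 0)
second = Entry("a.txt", 40)
other = Entry("b.txt", 80)
filelist = [first, second, other]
name_to_info = {"a.txt": second, "b.txt": other}

drop_member(filelist, name_to_info, second)
after_first_drop = name_to_info["a.txt"]

drop_member(filelist, name_to_info, first)
remaining_names = sorted(name_to_info)
```

Simply deleting the key whenever a member is removed would leave the surviving duplicate unreachable by name, which is the kind of missing entry the commit message describes.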

danny0838 changed the title from "gh-51067: add ZipInfo.remove()" to "gh-51067: add ZipFile.remove()" on May 22, 2025

danny0838 (Author) commented

The PR has grown old, and the Python base it was built on can hardly be compiled any more. I rebased it onto the latest Python and made sure it builds, squashing a few trivial commits together.

@@ -0,0 +1 @@
Add ``ZipFile.remove()``

Review comment (Member):

Suggested change
- Add ``ZipFile.remove()``
+ Add :meth:`ZipFile.remove <zipfile.ZipFile.remove>`.

danny0838 and others added 10 commits June 19, 2025 23:44
- The file is not truncated in mode 'w'/'x', which results in a file that does not shrink.
- This cannot be resolved simply by adding truncation for mode 'w'/'x', since those modes may be used on an unseekable file buffer, where truncation is not allowed.
- The seek will be automatically called in `ZipFile.close`.

danny0838 commented Jun 19, 2025

Rebased to use the same base as PR #134627 for easier comparison.

In favor of #134627, I'm probably not going to keep working on this PR, unless it proves to be the final accepted approach for the issue.

NOTE: Compared to #134627, this PR is rather quick and naive, and there are some caveats to be aware of:

  1. Overlapping entries are not checked. It may be unsafe to call remove on a ZIP file that has such entries.
  2. When stripping a local file entry, the algorithm simply strips the bytes between the start of that entry and the start of the next local file entry, instead of actually calculating the entry's size. This means any extra bytes after the entry are removed as well, which may be a problem in some cases. (See c80d21b for details)

Unfortunately, fixing either of them requires a non-trivial code rework.
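
The second caveat can be made concrete with a small sketch of the "start to next start" rule. The names here are hypothetical, and treating the start of the central directory as the end of the last entry is an assumption of this sketch rather than something stated above.

```python
def naive_span(header_offsets, target, start_dir):
    """Return the (start, end) byte range stripped for the entry at `target`.

    The end is the next local file entry's offset, or `start_dir`
    (assumed here to be the central directory offset) for the last entry.
    """
    later = [off for off in header_offsets if off > target]
    end = min(later) if later else start_dir
    return target, end


offsets = [0, 100, 250]   # local file entry offsets of three members
start_dir = 400           # assumed offset of the central directory

start, end = naive_span(offsets, 100, start_dir)
actual_size = 120         # hypothetical real size of the entry at offset 100
extra_removed = (end - start) - actual_size
last_span = naive_span(offsets, 250, start_dir)
```

In this made-up layout the entry at offset 100 really occupies 120 bytes, yet 150 bytes are stripped, so 30 bytes of unrelated data that followed the entry would be lost.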

Labels: awaiting review, stdlib (Python modules in the Lib dir), type-feature (A feature request or enhancement)

5 participants