Remove reference cycle when writing tarfiles #115256

Closed
pan324 opened this issue Feb 10, 2024 · 7 comments
Labels
type-feature A feature request or enhancement

Comments

@pan324
Contributor

pan324 commented Feb 10, 2024

Feature or enhancement

Proposal:

The following code keeps a file handle open indefinitely, and even gc.collect() does not release it.

>>> import tarfile
>>> tarfile.TarFile("archive.tar.gz","w").add("somefile.py")

The culprit is a line that itself says that it is not needed. Indeed, it is never used anywhere. https://github.com/python/cpython/blob/main/Lib/tarfile.py#L2033

The example above may be rather bad style, but because of this line even a typical use case, where a TarFile variable is defined and closed afterwards, keeps some objects in memory until gc.collect() runs. Removing the line also makes the bad-style example emit a ResourceWarning, which currently does not appear.
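
For comparison, here is a minimal sketch of the "typical use case" written with a context manager, so that the handle is closed at a known point rather than whenever the object is reclaimed. The helper name `write_archive` and the temporary file names are assumptions made for illustration; `tarfile.open`, `TarFile.add`, `TarFile.getnames` and the `closed` attribute are the standard-library API.

```python
import os
import tarfile
import tempfile


def write_archive(archive_path, source_path):
    """Write one file into a tar archive and report whether the handle was closed."""
    # Leaving the with-block calls TarFile.close(), even if add() raises.
    with tarfile.open(archive_path, "w") as tf:
        tf.add(source_path, arcname=os.path.basename(source_path))
    return tf.closed


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical input file, created only so the sketch is self-contained.
    source = os.path.join(tmp, "somefile.py")
    with open(source, "w") as fh:
        fh.write("print('hello')\n")

    archive = os.path.join(tmp, "archive.tar")
    closed = write_archive(archive, source)

    # Read the archive back to confirm the member was written completely.
    with tarfile.open(archive) as tf:
        names = tf.getnames()
```

Closing explicitly does not remove the reference cycle discussed in this issue; it only ensures the underlying file is flushed and closed without waiting for the garbage collector.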

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

@furkanonder
Contributor

CC: @encukou

@encukou
Member

encukou commented Feb 12, 2024

How can we know that no one is using this member?
If it needs to be removed, it should be deprecated according to PEP-387.

@furkanonder Do you have a particular question? I'm not sure why you mentioned me here.

@serhiy-storchaka
Member

When and why was this field added? How was it used in earlier versions? When and why was the comment added?

@pan324
Contributor Author

pan324 commented Feb 12, 2024

It was not used even when it was introduced. Several other classes at the time also define self.tarfile and use it in methods but TarInfo does not inherit from them. Here's what things looked like 17 years ago:

tarinfo.tarfile = self
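
To illustrate why a back-reference of this shape delays cleanup, here is a small sketch using hypothetical `Archive` and `Member` classes. They are stand-ins invented for this example, not the tarfile module's code, and the behavior shown relies on CPython's reference counting. The container holds its members and, optionally, each member points back at the container, which forms a cycle that only the cyclic garbage collector can reclaim.

```python
import gc
import weakref


class Member:
    """Hypothetical stand-in for a per-file metadata object."""

    def __init__(self, name):
        self.name = name


class Archive:
    """Hypothetical stand-in for an archive that tracks its members."""

    def __init__(self):
        self.members = []

    def add(self, name, back_reference):
        member = Member(name)
        if back_reference:
            # Same shape as `tarinfo.tarfile = self`: the member now
            # refers to the archive that refers to the member.
            member.archive = self
        self.members.append(member)
        return member


def make_probe(back_reference):
    """Build an archive, drop the only direct reference, return a weak reference."""
    archive = Archive()
    archive.add("somefile.py", back_reference)
    return weakref.ref(archive)  # the local name `archive` disappears on return


gc.disable()  # keep the automatic collector from running mid-demonstration
try:
    plain = make_probe(back_reference=False)
    plain_freed = plain() is None  # no cycle: freed by reference counting

    cyclic = make_probe(back_reference=True)
    cyclic_alive_before = cyclic() is not None  # cycle keeps the archive alive
    gc.collect()
    cyclic_alive_after = cyclic() is not None  # reclaimed only by the collector
finally:
    gc.enable()
```

Without the back-reference the archive is released as soon as its last name goes away; with it, the archive and everything it holds stay alive until a collection pass runs.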

The comment was added by @vadmium 8 years ago when he extended some of the docstrings. f817a48

The only public git repo I can find that accesses info.tarfile uses it to find symlink infos and replace them with their resolved infos: https://github.com/jlevy/instaclone/blob/master/instaclone/archives.py#L65-L75

It appears that this is not just about lingering handles but may actually corrupt archives as reported here: #78843

@vadmium
Member

vadmium commented Feb 12, 2024

I don't remember why I added the comment. I was probably studying how the module used TarInfo objects, and noticed it setting the tarfile attribute without any documentation or usage of that attribute.

The attribute itself seems to come from implementing the Pax format (c64e402).

@furkanonder
Contributor

> How can we know that no one is using this member? If it needs to be removed, it should be deprecated according to PEP-387.
>
> @furkanonder Do you have a particular question? I'm not sure why you mentioned me here.

I've seen you fixing issues with the tarfile module. I tagged you in case you might have an idea about the issue raised here. Sorry if I've disturbed you.

@hugovk
Member

hugovk commented Jan 29, 2025

Triage: looks like this can be closed; please re-open if there is more to do. Thanks!

Projects
Status: Done
6 participants