Commit 7a80389

Issue 23193: Add numeric_owner to tarfile.TarFile.extract() and tarfile.TarFile.extractall().
1 parent 28edf12 commit 7a80389

6 files changed

Lines changed: 195 additions & 28 deletions

File tree

Doc/library/tarfile.rst

Lines changed: 16 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -367,7 +367,7 @@ be finalized; only the internally used file object will be closed. See the
367367
available.
368368

369369

370-
.. method:: TarFile.extractall(path=".", members=None)
370+
.. method:: TarFile.extractall(path=".", members=None, *, numeric_owner=False)
371371

372372
Extract all members from the archive to the current working directory or
373373
directory *path*. If optional *members* is given, it must be a subset of the
@@ -377,22 +377,33 @@ be finalized; only the internally used file object will be closed. See the
377377
reset each time a file is created in it. And, if a directory's permissions do
378378
not allow writing, extracting files to it will fail.
379379

380+
If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
381+
are used to set the owner/group for the extracted files. Otherwise, the named
382+
values from the tarfile are used.
383+
380384
.. warning::
381385

382386
Never extract archives from untrusted sources without prior inspection.
383387
It is possible that files are created outside of *path*, e.g. members
384388
that have absolute filenames starting with ``"/"`` or filenames with two
385389
dots ``".."``.
386390

391+
.. versionchanged:: 3.5
392+
Added the *numeric_owner* parameter.
393+
387394

388-
.. method:: TarFile.extract(member, path="", set_attrs=True)
395+
.. method:: TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False)
389396

390397
Extract a member from the archive to the current working directory, using its
391398
full name. Its file information is extracted as accurately as possible. *member*
392399
may be a filename or a :class:`TarInfo` object. You can specify a different
393400
directory using *path*. File attributes (owner, mtime, mode) are set unless
394401
*set_attrs* is false.
395402

403+
If *numeric_owner* is :const:`True`, the uid and gid numbers from the tarfile
404+
are used to set the owner/group for the extracted files. Otherwise, the named
405+
values from the tarfile are used.
406+
396407
.. note::
397408

398409
The :meth:`extract` method does not take care of several extraction issues.
@@ -405,6 +416,9 @@ be finalized; only the internally used file object will be closed. See the
405416
.. versionchanged:: 3.2
406417
Added the *set_attrs* parameter.
407418

419+
.. versionchanged:: 3.5
420+
Added the *numeric_owner* parameter.
421+
408422
.. method:: TarFile.extractfile(member)
409423

410424
Extract a member from the archive as a file object. *member* may be a filename
@@ -827,4 +841,3 @@ In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
827841
because all the metadata is stored using *UTF-8*. *encoding* is only used in
828842
the rare cases when binary pax headers are decoded or when strings with
829843
surrogate characters are stored.
830-

Doc/whatsnew/3.5.rst

Lines changed: 14 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -479,26 +479,32 @@ socket
479479
:meth:`socket.socket.send`.
480480
(Contributed by Giampaolo Rodola' in :issue:`17552`.)
481481

482+
subprocess
483+
----------
484+
485+
* The new :func:`subprocess.run` function runs subprocesses and returns a
486+
:class:`subprocess.CompletedProcess` object. It Provides a more consistent
487+
API than :func:`~subprocess.call`, :func:`~subprocess.check_call` and
488+
:func:`~subprocess.check_output`.
489+
482490
sysconfig
483491
---------
484492

485493
* The user scripts directory on Windows is now versioned.
486494
(Contributed by Paul Moore in :issue:`23437`.)
487495

488-
489496
tarfile
490497
-------
491498

492499
* The :func:`tarfile.open` function now supports ``'x'`` (exclusive creation)
493500
mode. (Contributed by Berker Peksag in :issue:`21717`.)
494501

495-
subprocess
496-
----------
497-
498-
* The new :func:`subprocess.run` function runs subprocesses and returns a
499-
:class:`subprocess.CompletedProcess` object. It Provides a more consistent
500-
API than :func:`~subprocess.call`, :func:`~subprocess.check_call` and
501-
:func:`~subprocess.check_output`.
502+
* The :meth:`~tarfile.TarFile.extractall` and :meth:`~tarfile.TarFile.extract`
503+
methods now take a keyword parameter *numeric_owner*. If set to ``True``,
504+
the extracted files and directories will be owned by the numeric uid and gid
505+
from the tarfile. If set to ``False`` (the default, and the behavior in
506+
versions prior to 3.5), they will be owned by the named user and group in the
507+
tarfile. (Contributed by Michael Vogt and Eric Smith in :issue:`23193`.)
502508

503509
time
504510
----
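The behavior described in the tarfile entry above can be tried with a small script. This is only a sketch: the archive name, member name, and the uid/gid values are made up for illustration, and the helper functions `build_archive` and `extract_numeric` are hypothetical, not part of `tarfile`. Ownership is only applied when the process runs as root, so for an unprivileged user the call simply extracts the file.

```python
import io
import os
import tarfile
import tempfile


def build_archive(path):
    """Write a one-member tar whose owner name and numeric ids disagree."""
    payload = b"content"
    info = tarfile.TarInfo("example.txt")
    info.size = len(payload)
    info.uid, info.gid = 1234, 5678      # numeric owner stored in the header
    info.uname = info.gname = "root"     # named owner stored in the header
    with tarfile.open(path, "w") as tar:
        tar.addfile(info, io.BytesIO(payload))


def extract_numeric(archive, dest):
    """Extract everything, asking tarfile to prefer uid/gid over names."""
    with tarfile.open(archive) as tar:
        # numeric_owner is keyword-only, as introduced by this commit.
        tar.extractall(dest, numeric_owner=True)
    return sorted(os.listdir(dest))


workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "example.tar")
dest_dir = os.path.join(workdir, "out")
os.makedirs(dest_dir)

build_archive(archive_path)
extracted = extract_numeric(archive_path, dest_dir)
```

With `numeric_owner=False` (the default), a root process would instead look up the names stored in the archive and fall back to the numeric ids only when a name is unknown on the local system. On recent Python versions, extraction filters may additionally alter or drop ownership information.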

Lib/tarfile.py

Lines changed: 28 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -1972,12 +1972,13 @@ def addfile(self, tarinfo, fileobj=None):
19721972

19731973
self.members.append(tarinfo)
19741974

1975-
def extractall(self, path=".", members=None):
1975+
def extractall(self, path=".", members=None, *, numeric_owner=False):
19761976
"""Extract all members from the archive to the current working
19771977
directory and set owner, modification time and permissions on
19781978
directories afterwards. `path' specifies a different directory
19791979
to extract to. `members' is optional and must be a subset of the
1980-
list returned by getmembers().
1980+
list returned by getmembers(). If `numeric_owner` is True, only
1981+
the numbers for user/group names are used and not the names.
19811982
"""
19821983
directories = []
19831984

@@ -1991,7 +1992,8 @@ def extractall(self, path=".", members=None):
19911992
tarinfo = copy.copy(tarinfo)
19921993
tarinfo.mode = 0o700
19931994
# Do not set_attrs directories, as we will do that further down
1994-
self.extract(tarinfo, path, set_attrs=not tarinfo.isdir())
1995+
self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),
1996+
numeric_owner=numeric_owner)
19951997

19961998
# Reverse sort directories.
19971999
directories.sort(key=lambda a: a.name)
@@ -2001,7 +2003,7 @@ def extractall(self, path=".", members=None):
20012003
for tarinfo in directories:
20022004
dirpath = os.path.join(path, tarinfo.name)
20032005
try:
2004-
self.chown(tarinfo, dirpath)
2006+
self.chown(tarinfo, dirpath, numeric_owner=numeric_owner)
20052007
self.utime(tarinfo, dirpath)
20062008
self.chmod(tarinfo, dirpath)
20072009
except ExtractError as e:
@@ -2010,12 +2012,14 @@ def extractall(self, path=".", members=None):
20102012
else:
20112013
self._dbg(1, "tarfile: %s" % e)
20122014

2013-
def extract(self, member, path="", set_attrs=True):
2015+
def extract(self, member, path="", set_attrs=True, *, numeric_owner=False):
20142016
"""Extract a member from the archive to the current working directory,
20152017
using its full name. Its file information is extracted as accurately
20162018
as possible. `member' may be a filename or a TarInfo object. You can
20172019
specify a different directory using `path'. File attributes (owner,
2018-
mtime, mode) are set unless `set_attrs' is False.
2020+
mtime, mode) are set unless `set_attrs' is False. If `numeric_owner`
2021+
is True, only the numbers for user/group names are used and not
2022+
the names.
20192023
"""
20202024
self._check("r")
20212025

@@ -2030,7 +2034,8 @@ def extract(self, member, path="", set_attrs=True):
20302034

20312035
try:
20322036
self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
2033-
set_attrs=set_attrs)
2037+
set_attrs=set_attrs,
2038+
numeric_owner=numeric_owner)
20342039
except OSError as e:
20352040
if self.errorlevel > 0:
20362041
raise
@@ -2076,7 +2081,8 @@ def extractfile(self, member):
20762081
# blkdev, etc.), return None instead of a file object.
20772082
return None
20782083

2079-
def _extract_member(self, tarinfo, targetpath, set_attrs=True):
2084+
def _extract_member(self, tarinfo, targetpath, set_attrs=True,
2085+
numeric_owner=False):
20802086
"""Extract the TarInfo object tarinfo to a physical
20812087
file called targetpath.
20822088
"""
@@ -2114,7 +2120,7 @@ def _extract_member(self, tarinfo, targetpath, set_attrs=True):
21142120
self.makefile(tarinfo, targetpath)
21152121

21162122
if set_attrs:
2117-
self.chown(tarinfo, targetpath)
2123+
self.chown(tarinfo, targetpath, numeric_owner)
21182124
if not tarinfo.issym():
21192125
self.chmod(tarinfo, targetpath)
21202126
self.utime(tarinfo, targetpath)
@@ -2203,19 +2209,24 @@ def makelink(self, tarinfo, targetpath):
22032209
except KeyError:
22042210
raise ExtractError("unable to resolve link inside archive")
22052211

2206-
def chown(self, tarinfo, targetpath):
2207-
"""Set owner of targetpath according to tarinfo.
2212+
def chown(self, tarinfo, targetpath, numeric_owner):
2213+
"""Set owner of targetpath according to tarinfo. If numeric_owner
2214+
is True, use .gid/.uid instead of .gname/.uname.
22082215
"""
22092216
if pwd and hasattr(os, "geteuid") and os.geteuid() == 0:
22102217
# We have to be root to do so.
2211-
try:
2212-
g = grp.getgrnam(tarinfo.gname)[2]
2213-
except KeyError:
2218+
if numeric_owner:
22142219
g = tarinfo.gid
2215-
try:
2216-
u = pwd.getpwnam(tarinfo.uname)[2]
2217-
except KeyError:
22182220
u = tarinfo.uid
2221+
else:
2222+
try:
2223+
g = grp.getgrnam(tarinfo.gname)[2]
2224+
except KeyError:
2225+
g = tarinfo.gid
2226+
try:
2227+
u = pwd.getpwnam(tarinfo.uname)[2]
2228+
except KeyError:
2229+
u = tarinfo.uid
22192230
try:
22202231
if tarinfo.issym() and hasattr(os, "lchown"):
22212232
os.lchown(targetpath, u, g)
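The owner selection in the `chown` hunk above can be isolated into a small standalone function for illustration. This is a sketch, not the library's code: `resolve_owner` and its `user_lookup` / `group_lookup` hooks are hypothetical names introduced here so the logic can be exercised without root privileges or real system accounts.

```python
try:
    import grp
    import pwd
except ImportError:          # e.g. Windows, where these modules do not exist
    grp = pwd = None


def resolve_owner(uname, gname, uid, gid, numeric_owner,
                  user_lookup=None, group_lookup=None):
    """Return the (uid, gid) pair that would be handed to os.chown().

    user_lookup and group_lookup are hypothetical hooks (not part of
    tarfile). Each maps a name to a numeric id and raises KeyError for
    an unknown name, mirroring pwd.getpwnam() and grp.getgrnam().
    """
    if user_lookup is None:
        user_lookup = lambda name: pwd.getpwnam(name)[2]
    if group_lookup is None:
        group_lookup = lambda name: grp.getgrnam(name)[2]

    if numeric_owner:
        # Trust the numbers in the archive; never consult local names.
        return uid, gid

    # Default path: prefer the local id for the stored name and fall
    # back to the stored number only if the name is unknown locally.
    try:
        g = group_lookup(gname)
    except KeyError:
        g = gid
    try:
        u = user_lookup(uname)
    except KeyError:
        u = uid
    return u, g


# Fake account databases, used here instead of the real system ones.
fake_users = {"alice": 1000}
fake_groups = {"staff": 50}

by_name = resolve_owner("alice", "staff", 99, 98, False,
                        fake_users.__getitem__, fake_groups.__getitem__)
by_number = resolve_owner("alice", "staff", 99, 98, True,
                          fake_users.__getitem__, fake_groups.__getitem__)
fallback = resolve_owner("nobody-here", "no-group", 99, 98, False,
                         fake_users.__getitem__, fake_groups.__getitem__)
```

The sketch makes the difference visible: with `numeric_owner=True` the local account database is never consulted, which matters when an archive is unpacked on a machine whose names map to different ids than on the machine that created it.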

Lib/test/test_tarfile.py

Lines changed: 132 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,8 +2,10 @@
22
import os
33
import io
44
from hashlib import md5
5+
from contextlib import contextmanager
56

67
import unittest
8+
import unittest.mock
79
import tarfile
810

911
from test import support, script_helper
@@ -2264,6 +2266,136 @@ def test_partial_input_bz2(self):
22642266
self._test_partial_input("r:bz2")
22652267

22662268

2269+
def root_is_uid_gid_0():
2270+
try:
2271+
import pwd, grp
2272+
except ImportError:
2273+
return False
2274+
if pwd.getpwuid(0)[0] != 'root':
2275+
return False
2276+
if grp.getgrgid(0)[0] != 'root':
2277+
return False
2278+
return True
2279+
2280+
2281+
class NumericOwnerTest(unittest.TestCase):
2282+
# mock the following:
2283+
# os.chown: so we can test what's being called
2284+
# os.chmod: so the modes are not actually changed. if they are, we can't
2285+
# delete the files/directories
2286+
# os.geteuid: so we can lie and say we're root (uid = 0)
2287+
2288+
@staticmethod
2289+
def _make_test_archive(filename_1, dirname_1, filename_2):
2290+
# the file contents to write
2291+
fobj = io.BytesIO(b"content")
2292+
2293+
# create a tar file with a file, a directory, and a file within that
2294+
# directory. Assign various .uid/.gid values to them
2295+
items = [(filename_1, 99, 98, tarfile.REGTYPE, fobj),
2296+
(dirname_1, 77, 76, tarfile.DIRTYPE, None),
2297+
(filename_2, 88, 87, tarfile.REGTYPE, fobj),
2298+
]
2299+
with tarfile.open(tmpname, 'w') as tarfl:
2300+
for name, uid, gid, typ, contents in items:
2301+
t = tarfile.TarInfo(name)
2302+
t.uid = uid
2303+
t.gid = gid
2304+
t.uname = 'root'
2305+
t.gname = 'root'
2306+
t.type = typ
2307+
tarfl.addfile(t, contents)
2308+
2309+
# return the full pathname to the tar file
2310+
return tmpname
2311+
2312+
@staticmethod
2313+
@contextmanager
2314+
def _setup_test(mock_geteuid):
2315+
mock_geteuid.return_value = 0 # lie and say we're root
2316+
fname = 'numeric-owner-testfile'
2317+
dirname = 'dir'
2318+
2319+
# the names we want stored in the tarfile
2320+
filename_1 = fname
2321+
dirname_1 = dirname
2322+
filename_2 = os.path.join(dirname, fname)
2323+
2324+
# create the tarfile with the contents we're after
2325+
tar_filename = NumericOwnerTest._make_test_archive(filename_1,
2326+
dirname_1,
2327+
filename_2)
2328+
2329+
# open the tarfile for reading. yield it and the names of the items
2330+
# we stored into the file
2331+
with tarfile.open(tar_filename) as tarfl:
2332+
yield tarfl, filename_1, dirname_1, filename_2
2333+
2334+
@unittest.mock.patch('os.chown')
2335+
@unittest.mock.patch('os.chmod')
2336+
@unittest.mock.patch('os.geteuid')
2337+
def test_extract_with_numeric_owner(self, mock_geteuid, mock_chmod,
2338+
mock_chown):
2339+
with self._setup_test(mock_geteuid) as (tarfl, filename_1, _,
2340+
filename_2):
2341+
tarfl.extract(filename_1, TEMPDIR, numeric_owner=True)
2342+
tarfl.extract(filename_2 , TEMPDIR, numeric_owner=True)
2343+
2344+
# convert to filesystem paths
2345+
f_filename_1 = os.path.join(TEMPDIR, filename_1)
2346+
f_filename_2 = os.path.join(TEMPDIR, filename_2)
2347+
2348+
mock_chown.assert_has_calls([unittest.mock.call(f_filename_1, 99, 98),
2349+
unittest.mock.call(f_filename_2, 88, 87),
2350+
],
2351+
any_order=True)
2352+
2353+
@unittest.mock.patch('os.chown')
2354+
@unittest.mock.patch('os.chmod')
2355+
@unittest.mock.patch('os.geteuid')
2356+
def test_extractall_with_numeric_owner(self, mock_geteuid, mock_chmod,
2357+
mock_chown):
2358+
with self._setup_test(mock_geteuid) as (tarfl, filename_1, dirname_1,
2359+
filename_2):
2360+
tarfl.extractall(TEMPDIR, numeric_owner=True)
2361+
2362+
# convert to filesystem paths
2363+
f_filename_1 = os.path.join(TEMPDIR, filename_1)
2364+
f_dirname_1 = os.path.join(TEMPDIR, dirname_1)
2365+
f_filename_2 = os.path.join(TEMPDIR, filename_2)
2366+
2367+
mock_chown.assert_has_calls([unittest.mock.call(f_filename_1, 99, 98),
2368+
unittest.mock.call(f_dirname_1, 77, 76),
2369+
unittest.mock.call(f_filename_2, 88, 87),
2370+
],
2371+
any_order=True)
2372+
2373+
# this test requires that uid=0 and gid=0 really be named 'root'. that's
2374+
# because the uname and gname in the test file are 'root', and extract()
2375+
# will look them up using pwd and grp to find their uid and gid, which we
2376+
# test here to be 0.
2377+
@unittest.skipUnless(root_is_uid_gid_0(),
2378+
'uid=0,gid=0 must be named "root"')
2379+
@unittest.mock.patch('os.chown')
2380+
@unittest.mock.patch('os.chmod')
2381+
@unittest.mock.patch('os.geteuid')
2382+
def test_extract_without_numeric_owner(self, mock_geteuid, mock_chmod,
2383+
mock_chown):
2384+
with self._setup_test(mock_geteuid) as (tarfl, filename_1, _, _):
2385+
tarfl.extract(filename_1, TEMPDIR, numeric_owner=False)
2386+
2387+
# convert to filesystem paths
2388+
f_filename_1 = os.path.join(TEMPDIR, filename_1)
2389+
2390+
mock_chown.assert_called_with(f_filename_1, 0, 0)
2391+
2392+
@unittest.mock.patch('os.geteuid')
2393+
def test_keyword_only(self, mock_geteuid):
2394+
with self._setup_test(mock_geteuid) as (tarfl, filename_1, _, _):
2395+
self.assertRaises(TypeError,
2396+
tarfl.extract, filename_1, TEMPDIR, False, True)
2397+
2398+
22672399
def setUpModule():
22682400
support.unlink(TEMPDIR)
22692401
os.makedirs(TEMPDIR)
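The `test_keyword_only` case above relies on `numeric_owner` being declared after the bare `*` in the signature. A lighter way to confirm the same property, without building an archive, is to inspect the signatures directly. The helper `is_keyword_only` below is a hypothetical name used only for this sketch, which assumes a Python version that includes this change (3.5 or later).

```python
import inspect
import tarfile


def is_keyword_only(func, name):
    """Report whether parameter `name` of `func` can only be passed by keyword."""
    param = inspect.signature(func).parameters[name]
    return param.kind is inspect.Parameter.KEYWORD_ONLY


extract_kw_only = is_keyword_only(tarfile.TarFile.extract, "numeric_owner")
extractall_kw_only = is_keyword_only(tarfile.TarFile.extractall, "numeric_owner")
# set_attrs predates the bare `*`, so it remains positional-or-keyword.
set_attrs_kw_only = is_keyword_only(tarfile.TarFile.extract, "set_attrs")
```

Making the new parameter keyword-only keeps call sites readable (`numeric_owner=True` rather than a bare `True`) and leaves room to add further parameters later without breaking positional callers.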

Misc/ACKS

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1458,6 +1458,7 @@ Norman Vine
14581458
Pauli Virtanen
14591459
Frank Visser
14601460
Johannes Vogel
1461+
Michael Vogt
14611462
Radu Voicilas
14621463
Alex Volkov
14631464
Martijn Vries

Misc/NEWS

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,10 @@ Core and Builtins
3232
Library
3333
-------
3434

35+
- Issue #23193: Add a numeric_owner parameter to
36+
tarfile.TarFile.extract and tarfile.TarFile.extractall. Patch by
37+
Michael Vogt and Eric Smith.
38+
3539
- Issue #23342: Add a subprocess.run() function than returns a CalledProcess
3640
instance for a more consistent API than the existing call* functions.
3741
