Commit e680c3d

CAM-Gerlach authored and serhiy-storchaka committed
bpo-36268: Change default tar format to pax from GNU. (GH-12355)
1 parent ed5e29c commit e680c3d

5 files changed

Lines changed: 30 additions & 10 deletions

Doc/library/tarfile.rst

Lines changed: 10 additions & 4 deletions
@@ -229,7 +229,11 @@ details.

 .. data:: DEFAULT_FORMAT

-   The default format for creating archives. This is currently :const:`GNU_FORMAT`.
+   The default format for creating archives. This is currently :const:`PAX_FORMAT`.
+
+   .. versionchanged:: 3.8
+      The default format for new archives was changed to
+      :const:`PAX_FORMAT` from :const:`GNU_FORMAT`.


 .. seealso::
@@ -820,8 +824,10 @@ There are three tar formats that can be created with the :mod:`tarfile` module:

 * The POSIX.1-2001 pax format (:const:`PAX_FORMAT`). It is the most flexible
   format with virtually no limits. It supports long filenames and linknames, large
-  files and stores pathnames in a portable way. However, not all tar
-  implementations today are able to handle pax archives properly.
+  files and stores pathnames in a portable way. Modern tar implementations,
+  including GNU tar, bsdtar/libarchive and star, fully support extended *pax*
+  features; some older or unmaintained libraries may not, but should treat
+  *pax* archives as if they were in the universally-supported *ustar* format.

   The *pax* format is an extension to the existing *ustar* format. It uses extra
   headers for information that cannot be stored otherwise. There are two flavours
@@ -871,7 +877,7 @@ converted. Possible values are listed in section :ref:`error-handlers`.
 The default scheme is ``'surrogateescape'`` which Python also uses for its
 file system calls, see :ref:`os-filenames`.

-In case of :const:`PAX_FORMAT` archives, *encoding* is generally not needed
+For :const:`PAX_FORMAT` archives (the default), *encoding* is generally not needed
 because all the metadata is stored using *UTF-8*. *encoding* is only used in
 the rare cases when binary pax headers are decoded or when strings with
 surrogate characters are stored.
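
Reviewer note: the claim that *encoding* rarely matters for pax archives can be checked with a small round trip. The sketch below is illustrative only; the helper names `write_one` and `first_name` and the sample file name are assumptions, not part of this commit. It writes one member with a non-ASCII name in each format, then reads both archives back with a deliberately mismatched `encoding`.

```python
import io
import tarfile


def write_one(name: str, fmt: int) -> bytes:
    """Write a single empty member called `name` using tar format `fmt`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt, encoding="utf-8") as tar:
        tar.addfile(tarfile.TarInfo(name=name))
    return buf.getvalue()


def first_name(archive: bytes, encoding: str) -> str:
    """Read the archive with the given header encoding; return the first member name."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r", encoding=encoding) as tar:
        return tar.getmembers()[0].name


name = "caf\u00e9.txt"  # hypothetical non-ASCII file name
pax_archive = write_one(name, tarfile.PAX_FORMAT)
gnu_archive = write_one(name, tarfile.GNU_FORMAT)

# The pax archive carries the name in a UTF-8 extended header, so a reader
# configured with a different encoding still recovers it.
pax_name = first_name(pax_archive, "iso8859-1")

# The GNU archive stores the raw encoded bytes in the header, so the same
# mismatched reader decodes them incorrectly.
gnu_name = first_name(gnu_archive, "iso8859-1")
```

With the GNU format, writer and reader have to agree on `encoding` out of band; with pax they do not, which is the portability benefit the documentation change describes.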

Doc/whatsnew/3.8.rst

Lines changed: 10 additions & 0 deletions
@@ -316,6 +316,16 @@ and manipulating normal distributions of a random variable.
 [7.672102882379219, 12.000027119750287, 4.647488369766392]


+tarfile
+-------
+
+The :mod:`tarfile` module now defaults to the modern pax (POSIX.1-2001)
+format for new archives, instead of the previous GNU-specific one.
+This improves cross-platform portability with a consistent encoding (UTF-8)
+in a standardized and extensible format, and offers several other benefits.
+(Contributed by C.A.M. Gerlach in :issue:`36268`.)
+
+
 tokenize
 --------


Lib/tarfile.py

Lines changed: 1 addition & 1 deletion
@@ -105,7 +105,7 @@
 USTAR_FORMAT = 0 # POSIX.1-1988 (ustar) format
 GNU_FORMAT = 1 # GNU tar format
 PAX_FORMAT = 2 # POSIX.1-2001 (pax) format
-DEFAULT_FORMAT = GNU_FORMAT
+DEFAULT_FORMAT = PAX_FORMAT

 #---------------------------------------------------------
 # tarfile constants
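
Reviewer note: callers that depended on the old default can pin the format explicitly instead of relying on `DEFAULT_FORMAT`. The following is a minimal sketch, not code from this commit; `make_archive`, `read_pax_headers`, and the sample member name are hypothetical. It uses a name longer than the 100-byte ustar name field so that the two formats visibly differ in how they store it.

```python
import io
import tarfile


def make_archive(name: str, payload: bytes, fmt: int) -> bytes:
    """Return the bytes of a one-member archive written in tar format `fmt`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_pax_headers(archive: bytes) -> dict:
    """Return the pax extended headers attached to the first member, if any."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return dict(tar.getmembers()[0].pax_headers)


# 124 characters with no "/" to split on, so the name cannot fit in a ustar header.
long_name = "d" * 120 + ".txt"

pax_headers = read_pax_headers(make_archive(long_name, b"hello", tarfile.PAX_FORMAT))
gnu_headers = read_pax_headers(make_archive(long_name, b"hello", tarfile.GNU_FORMAT))

# pax records the long name in an extended "path" header; GNU uses its own
# long-name extension, so no pax headers are present on that member.
```

Passing `format=tarfile.GNU_FORMAT` to `tarfile.open()` keeps the pre-3.8 output for consumers that cannot handle pax extended headers.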

Lib/test/test_tarfile.py

Lines changed: 6 additions & 5 deletions
@@ -2136,15 +2136,16 @@ def test_read_number_fields(self):
     def test_write_number_fields(self):
         self.assertEqual(tarfile.itn(1), b"0000001\x00")
         self.assertEqual(tarfile.itn(0o7777777), b"7777777\x00")
-        self.assertEqual(tarfile.itn(0o10000000),
+        self.assertEqual(tarfile.itn(0o10000000, format=tarfile.GNU_FORMAT),
                          b"\x80\x00\x00\x00\x00\x20\x00\x00")
-        self.assertEqual(tarfile.itn(0xffffffff),
+        self.assertEqual(tarfile.itn(0xffffffff, format=tarfile.GNU_FORMAT),
                          b"\x80\x00\x00\x00\xff\xff\xff\xff")
-        self.assertEqual(tarfile.itn(-1),
+        self.assertEqual(tarfile.itn(-1, format=tarfile.GNU_FORMAT),
                          b"\xff\xff\xff\xff\xff\xff\xff\xff")
-        self.assertEqual(tarfile.itn(-100),
+        self.assertEqual(tarfile.itn(-100, format=tarfile.GNU_FORMAT),
                          b"\xff\xff\xff\xff\xff\xff\xff\x9c")
-        self.assertEqual(tarfile.itn(-0x100000000000000),
+        self.assertEqual(tarfile.itn(-0x100000000000000,
+                                     format=tarfile.GNU_FORMAT),
                          b"\xff\x00\x00\x00\x00\x00\x00\x00")

         # Issue 32713: Test if itn() supports float values outside the
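
Reviewer note: the test changes above are needed because `tarfile.itn()` takes its `format` default from `DEFAULT_FORMAT`, and only the GNU format allows the base-256 encoding for values that do not fit in an octal field. The sketch below illustrates this on Python 3.8 or later. `itn()` is an undocumented internal helper, so treat its signature and behaviour as assumptions that may change; `try_itn` is a hypothetical wrapper.

```python
import tarfile


def try_itn(value: int, **kwargs):
    """Call the internal tarfile.itn() helper; return its bytes or the ValueError."""
    try:
        return tarfile.itn(value, **kwargs)
    except ValueError as exc:
        return exc


# Fits in 7 octal digits, so every format encodes it the same way.
small = try_itn(0o7777777)

# One past the octal limit: with the default (pax) format the field overflows...
overflow = try_itn(0o10000000)

# ...while the GNU format falls back to its base-256 encoding.
base256 = try_itn(0o10000000, format=tarfile.GNU_FORMAT)
```

The expected byte strings for `small` and `base256` are the ones asserted in `test_write_number_fields` above; large values in pax archives are stored in extended headers instead of the fixed-width field.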
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+Switch the default format used for writing tars with mod:`tarfile` to
+the modern POSIX.1-2001 pax standard, from the vendor-specific GNU.
+Contributed by C.A.M. Gerlach.
