Commit c64e402

This is the implementation of POSIX.1-2001 (pax) format read/write
support.

The TarInfo class now contains all necessary logic to process and create tar header data which has been moved there from the TarFile class. The fromtarfile() method was added. The new path and linkpath properties are aliases for the name and linkname attributes in correspondence to the pax naming scheme.

The TarFile constructor and classmethods now accept a number of keyword arguments which could only be set as attributes before (e.g. dereference, ignore_zeros). The encoding and pax_headers arguments were added for pax support. There is a new tarinfo keyword argument that allows using subclassed TarInfo objects in TarFile.

The boolean TarFile.posix attribute is deprecated, because three tar formats are now supported. Instead, the desired format for writing is specified using the constants USTAR_FORMAT, GNU_FORMAT and PAX_FORMAT as the format keyword argument. This change affects TarInfo.tobuf() as well.

The test suite has been heavily reorganized and partially rewritten. A new testtar.tar was added that contains sample data in many formats from 4 different tar programs.

Some bugs and quirks that have also been fixed:
Directory names no longer have a trailing slash in TarInfo.name or TarFile.getnames().
Adding the same file twice does not create a hardlink file member.
The TarFile constructor no longer needs a name argument.
The TarFile._mode attribute was renamed to mode and contains either 'r', 'w' or 'a'.
1 parent bdd0f39 commit c64e402
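
As a rough illustration of the keyword arguments described in the commit message, the sketch below writes a pax archive to memory with a global header and reads it back. It is not part of the commit: it assumes a current Python 3 `tarfile` module (the commit itself targets Python 2.6), and the helper names `build_pax_archive` and `read_pax_archive`, the member name and the header value are made up for the example.

```python
import io
import tarfile


def build_pax_archive(payload: bytes) -> bytes:
    """Write one member into an in-memory pax archive (hypothetical helper)."""
    buf = io.BytesIO()
    # format and pax_headers are passed as keyword arguments, as the commit
    # message describes; pax_headers ends up in a pax global header.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                      pax_headers={"comment": "demo archive"}) as tar:
        info = tarfile.TarInfo(name="data/hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_pax_archive(raw: bytes):
    """Return (global pax headers, member path, member data)."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as tar:
        member = tar.getmembers()[0]
        # member.path is the alias for member.name mentioned above.
        data = tar.extractfile(member).read()
        return dict(tar.pax_headers), member.path, data
```

Calling `read_pax_archive(build_pax_archive(b"hello"))` should give back the `comment` header, the member path and the payload. No `name` argument is needed because a `fileobj` is supplied.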

Doc/lib/libtarfile.tex

Lines changed: 104 additions & 53 deletions
@@ -12,21 +12,24 @@ \section{\module{tarfile} --- Read and write tar archive files}

 \begin{itemize}
 \item reads and writes \module{gzip} and \module{bzip2} compressed archives.
-\item creates \POSIX{} 1003.1-1990 compliant or GNU tar compatible archives.
-\item reads GNU tar extensions \emph{longname}, \emph{longlink} and
-\emph{sparse}.
-\item stores pathnames of unlimited length using GNU tar extensions.
+\item read/write support for the \POSIX{}.1-1988 (ustar) format.
+\item read/write support for the GNU tar format including \emph{longname} and
+\emph{longlink} extensions, read-only support for the \emph{sparse}
+extension.
+\item read/write support for the \POSIX{}.1-2001 (pax) format.
+\versionadded{2.6}
 \item handles directories, regular files, hardlinks, symbolic links, fifos,
 character devices and block devices and is able to acquire and
 restore file information like timestamp, access permissions and owner.
 \item can handle tape devices.
 \end{itemize}

-\begin{funcdesc}{open}{\optional{name\optional{, mode
-\optional{, fileobj\optional{, bufsize}}}}}
+\begin{funcdesc}{open}{name\optional{, mode\optional{,
+fileobj\optional{, bufsize}}}, **kwargs}
 Return a \class{TarFile} object for the pathname \var{name}.
-For detailed information on \class{TarFile} objects,
-see \citetitle{TarFile Objects} (section \ref{tarfile-objects}).
+For detailed information on \class{TarFile} objects and the keyword
+arguments that are allowed, see \citetitle{TarFile Objects}
+(section \ref{tarfile-objects}).

 \var{mode} has to be a string of the form \code{'filemode[:compression]'},
 it defaults to \code{'r'}. Here is a full list of mode combinations:
@@ -130,6 +133,31 @@ \section{\module{tarfile} --- Read and write tar archive files}
 \versionadded{2.6}
 \end{excdesc}

+\begin{datadesc}{USTAR_FORMAT}
+\POSIX{}.1-1988 (ustar) format. It supports filenames up to a length of
+at best 256 characters and linknames up to 100 characters. The maximum
+file size is 8 gigabytes. This is an old and limited but widely
+supported format.
+\end{datadesc}
+
+\begin{datadesc}{GNU_FORMAT}
+GNU tar format. It supports arbitrarily long filenames and linknames and
+files bigger than 8 gigabytes. It is the defacto standard on GNU/Linux
+systems.
+\end{datadesc}
+
+\begin{datadesc}{PAX_FORMAT}
+\POSIX{}.1-2001 (pax) format. It is the most flexible format with
+virtually no limits. It supports long filenames and linknames, large files
+and stores pathnames in a portable way. However, not all tar
+implementations today are able to handle pax archives properly.
+\end{datadesc}
+
+\begin{datadesc}{DEFAULT_FORMAT}
+The default format for creating archives. This is currently
+\constant{GNU_FORMAT}.
+\end{datadesc}
+
 \begin{seealso}
 \seemodule{zipfile}{Documentation of the \refmodule{zipfile}
 standard module.}
@@ -152,19 +180,70 @@ \subsection{TarFile Objects \label{tarfile-objects}}
 \class{TarInfo} object, see \citetitle{TarInfo Objects} (section
 \ref{tarinfo-objects}) for details.

-\begin{classdesc}{TarFile}{\optional{name
-\optional{, mode\optional{, fileobj}}}}
-Open an \emph{(uncompressed)} tar archive \var{name}.
+\begin{classdesc}{TarFile}{name=None, mode='r', fileobj=None,
+format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False,
+ignore_zeros=False, encoding=None, pax_headers=None, debug=0,
+errorlevel=0}
+
+All following arguments are optional and can be accessed as instance
+attributes as well.
+
+\var{name} is the pathname of the archive. It can be omitted if
+\var{fileobj} is given. In this case, the file object's \member{name}
+attribute is used if it exists.
+
 \var{mode} is either \code{'r'} to read from an existing archive,
 \code{'a'} to append data to an existing file or \code{'w'} to create a new
-file overwriting an existing one. \var{mode} defaults to \code{'r'}.
+file overwriting an existing one.

 If \var{fileobj} is given, it is used for reading or writing data.
 If it can be determined, \var{mode} is overridden by \var{fileobj}'s mode.
 \var{fileobj} will be used from position 0.
 \begin{notice}
 \var{fileobj} is not closed, when \class{TarFile} is closed.
 \end{notice}
+
+\var{format} controls the archive format. It must be one of the constants
+\constant{USTAR_FORMAT}, \constant{GNU_FORMAT} or \constant{PAX_FORMAT}
+that are defined at module level.
+\versionadded{2.6}
+
+The \var{tarinfo} argument can be used to replace the default
+\class{TarInfo} class with a different one.
+\versionadded{2.6}
+
+If \var{dereference} is \code{False}, add symbolic and hard links to the
+archive. If it is \code{True}, add the content of the target files to the
+archive. This has no effect on systems that do not support symbolic links.
+
+If \var{ignore_zeros} is \code{False}, treat an empty block as the end of
+the archive. If it is \var{True}, skip empty (and invalid) blocks and try
+to get as many members as possible. This is only useful for reading
+concatenated or damaged archives.
+
+\var{debug} can be set from \code{0} (no debug messages) up to \code{3}
+(all debug messages). The messages are written to \code{sys.stderr}.
+
+If \var{errorlevel} is \code{0}, all errors are ignored when using
+\method{extract()}. Nevertheless, they appear as error messages in the
+debug output, when debugging is enabled. If \code{1}, all \emph{fatal}
+errors are raised as \exception{OSError} or \exception{IOError} exceptions.
+If \code{2}, all \emph{non-fatal} errors are raised as \exception{TarError}
+exceptions as well.
+
+The \var{encoding} argument defines the local character encoding. It
+defaults to the value from \function{sys.getfilesystemencoding()} or if
+that is \code{None} to \code{"ascii"}. \var{encoding} is used only in
+connection with the pax format which stores text data in \emph{UTF-8}. If
+it is not set correctly, character conversion will fail with a
+\exception{UnicodeError}.
+\versionadded{2.6}
+
+The \var{pax_headers} argument must be a dictionary whose elements are
+either unicode objects, numbers or strings that can be decoded to unicode
+using \var{encoding}. This information will be added to the archive as a
+pax global header.
+\versionadded{2.6}
 \end{classdesc}

 \begin{methoddesc}{open}{...}
@@ -279,43 +358,11 @@ \subsection{TarFile Objects \label{tarfile-objects}}
 \end{methoddesc}

 \begin{memberdesc}{posix}
-If true, create a \POSIX{} 1003.1-1990 compliant archive. GNU
-extensions are not used, because they are not part of the \POSIX{}
-standard. This limits the length of filenames to at most 256,
-link names to 100 characters and the maximum file size to 8
-gigabytes. A \exception{ValueError} is raised if a file exceeds
-this limit. If false, create a GNU tar compatible archive. It
-will not be \POSIX{} compliant, but can store files without any
-of the above restrictions.
+Setting this to \constant{True} is equivalent to setting the
+\member{format} attribute to \constant{USTAR_FORMAT},
+\constant{False} is equivalent to \constant{GNU_FORMAT}.
 \versionchanged[\var{posix} defaults to \constant{False}]{2.4}
-\end{memberdesc}
-
-\begin{memberdesc}{dereference}
-If false, add symbolic and hard links to archive. If true, add the
-content of the target files to the archive. This has no effect on
-systems that do not support symbolic links.
-\end{memberdesc}
-
-\begin{memberdesc}{ignore_zeros}
-If false, treat an empty block as the end of the archive. If true,
-skip empty (and invalid) blocks and try to get as many members as
-possible. This is only useful for concatenated or damaged
-archives.
-\end{memberdesc}
-
-\begin{memberdesc}{debug=0}
-To be set from \code{0} (no debug messages; the default) up to
-\code{3} (all debug messages). The messages are written to
-\code{sys.stderr}.
-\end{memberdesc}
-
-\begin{memberdesc}{errorlevel}
-If \code{0} (the default), all errors are ignored when using
-\method{extract()}. Nevertheless, they appear as error messages
-in the debug output, when debugging is enabled. If \code{1}, all
-\emph{fatal} errors are raised as \exception{OSError} or
-\exception{IOError} exceptions. If \code{2}, all \emph{non-fatal}
-errors are raised as \exception{TarError} exceptions as well.
+\deprecated{2.6}{Use the \member{format} attribute instead.}
 \end{memberdesc}

 %-----------------
@@ -343,12 +390,16 @@ \subsection{TarInfo Objects \label{tarinfo-objects}}
 invalid.]{2.6}
 \end{methoddesc}

-\begin{methoddesc}{tobuf}{posix}
-Create a string buffer from a \class{TarInfo} object.
-See \class{TarFile}'s \member{posix} attribute for information
-on the \var{posix} argument. It defaults to \constant{False}.
+\begin{methoddesc}{fromtarfile}{tarfile}
+Read the next member from the \class{TarFile} object \var{tarfile} and
+return it as a \class{TarInfo} object.
+\versionadded{2.6}
+\end{methoddesc}

-\versionadded[The \var{posix} parameter]{2.5}
+\begin{methoddesc}{tobuf}{\optional{format}}
+Create a string buffer from a \class{TarInfo} object. See
+\class{TarFile}'s \member{format} argument for information.
+\versionchanged[The \var{format} parameter]{2.6}
 \end{methoddesc}

 A \code{TarInfo} object has the following public data attributes:
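
The format limits documented in this diff can be checked with a small experiment. The sketch below is not taken from the commit: it assumes a current Python 3 `tarfile` module, and the helper name `can_store_name` and the 300-character test name are invented for illustration. The format is always passed explicitly, so the example does not depend on the value of `DEFAULT_FORMAT`.

```python
import io
import tarfile


def can_store_name(name: str, fmt: int) -> bool:
    """Report whether a member called `name` can be written in format `fmt`.

    Hypothetical helper: it writes an empty member to an in-memory archive
    and treats ValueError as "this format cannot represent the name".
    """
    buf = io.BytesIO()
    try:
        with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
            tar.addfile(tarfile.TarInfo(name=name))
    except ValueError:
        return False
    return True


# A single path component longer than the ustar limit; it contains no "/"
# at which the name could be split between the prefix and name fields.
LONG_NAME = "x" * 300

# tobuf() accepts the same format constants and returns the raw header data.
short_header = tarfile.TarInfo("a.txt").tobuf(tarfile.USTAR_FORMAT)
long_header = tarfile.TarInfo(LONG_NAME).tobuf(tarfile.GNU_FORMAT)
```

Under these assumptions, `can_store_name(LONG_NAME, tarfile.USTAR_FORMAT)` is expected to be false, while the GNU and pax formats accept the same name. The GNU header returned by `tobuf()` is longer than one 512-byte block because the long name is stored in an extra \emph{longname} member ahead of the real header.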
