Add support for png_text metadata, allow to customize metadata for other backends. #7349

Merged
merged 21 commits into from
Dec 25, 2016

Conversation

Xarthisius
Contributor

The PNG specification allows custom png_text structures to be stored as image attributes. The keywords given in the PNG spec are:

    Title            Short (one line) title or
                     caption for image
    Author           Name of image's creator
    Description      Description of image (possibly long)
    Copyright        Copyright notice
    Creation Time    Time of original image creation
                     (usually RFC 1123 format, see below)
    Software         Software used to create the image
    Disclaimer       Legal disclaimer
    Warning          Warning of nature of content
    Source           Device used to create the image
    Comment          Miscellaneous comment; conversion
                     from other image format

This PR allows passing a dictionary of metadata that is used to fill those key/value pairs in the PNG writer. The structure of the data and its types are not validated.

Additionally, this PR allows customizing the "software/creator" values in other backends such as PDF and PS.
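
For readers unfamiliar with how these keyword/value pairs end up in a file, here is a minimal sketch, using only the Python standard library, of a PNG that carries tEXt chunks. It is not the code from this PR, which goes through libpng's png_set_text; the helper names `png_chunk`, `text_chunk` and `tiny_png` are invented for illustration.

```python
import struct
import zlib


def png_chunk(chunk_type, data):
    """Return one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def text_chunk(keyword, text):
    """Build a tEXt chunk: Latin-1 keyword, NUL separator, Latin-1 text."""
    payload = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    return png_chunk(b"tEXt", payload)


def tiny_png(metadata):
    """Return the bytes of a 1x1 grayscale PNG with `metadata` as tEXt chunks."""
    signature = b"\x89PNG\r\n\x1a\n"
    # IHDR: width 1, height 1, bit depth 8, colour type 0 (grayscale),
    # compression 0, filter 0, interlace 0.
    ihdr = png_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    texts = b"".join(text_chunk(k, v) for k, v in metadata.items())
    # One scanline: filter-type byte 0 followed by a single black pixel.
    idat = png_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
    return signature + ihdr + texts + idat + png_chunk(b"IEND", b"")


png_bytes = tiny_png({"Title": "Sine wave", "Software": "example writer"})
```

Each dictionary entry becomes one chunk, which is why nothing about the dictionary's structure needs to be known beyond "string key, string value".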

@QuLogic
Member

QuLogic commented Oct 25, 2016

Instead of replacing the entire metadata, I'd have the argument update the existing dictionary that was originally generated in the code. That way, the user doesn't need to figure out the default strings (version info and such), but can still provide other metadata or override it if really wanted.
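
A minimal sketch of the "update, don't replace" behavior suggested here, assuming the defaults are held in a plain dictionary. The function name `merged_metadata` and the `version` argument (standing in for matplotlib's `__version__`) are hypothetical.

```python
def merged_metadata(user_metadata=None, version="0.0.0"):
    """Start from the generated defaults, then let user entries override them."""
    # `version` is a placeholder for matplotlib's __version__; the string
    # mirrors the default used in the diff under review.
    defaults = {
        "Software": "matplotlib version " + version + ", http://matplotlib.org/",
    }
    merged = dict(defaults)             # copy, so the defaults stay untouched
    merged.update(user_metadata or {})  # user keys win on conflict
    return merged


extra = merged_metadata({"Title": "My plot"})
overridden = merged_metadata({"Software": "custom pipeline"})
```

With this shape, a caller who only wants to add a title never has to reconstruct the version string, while one who really wants to replace "Software" still can.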

@QuLogic QuLogic added this to the 2.1 (next point release) milestone Oct 25, 2016
@tacaswell
Member

I agree with @QuLogic that user supplied data should update the default data, not completely replace it.

@mdboom should check the c, but this looks reasonable to me otherwise.

This will need an entry in whats_new.

Member

@tacaswell tacaswell left a comment

Please update the existing metadata rather than completely replacing it with the user-supplied metadata.

Member

@QuLogic QuLogic left a comment

A few small issues.

@@ -552,8 +552,16 @@ def print_png(self, filename_or_obj, *args, **kwargs):
    else:
        close = False

    version_str = 'matplotlib version ' + __version__ + \
        ', http://matplotlib.org/'
    metadata = {six.b('Software'): six.b(version_str)}
Member

This would be confusing for users, as they would probably provide regular strings (as that's what all the other ones accept.) If bytestrings are really required here, it should do the encoding after obtaining the user's metadata.

Member

You also don't need six.b because we don't need to support Python 2.5.
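
One way to follow both comments above is to accept ordinary strings from the user and encode only at the last moment, using `str.encode` rather than `six.b`. This is a sketch under the assumption that the writer needs bytes; `encode_metadata` is a hypothetical helper, not a function in matplotlib.

```python
def encode_metadata(metadata, encoding="latin-1"):
    """Encode str keys and values to bytes; pass existing bytes through."""

    def to_bytes(value):
        return value if isinstance(value, bytes) else value.encode(encoding)

    return {to_bytes(key): to_bytes(val) for key, val in metadata.items()}


encoded = encode_metadata({"Software": "matplotlib", b"Title": "Plot"})
```

Users keep passing regular strings, as they do for the other backends, and the byte-level detail stays inside the PNG code path.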

@@ -469,10 +469,13 @@ def __init__(self, filename):

     revision = ''
     self.infoDict = {
-        'Creator': 'matplotlib %s, http://matplotlib.org' % __version__,
+        'Creator': 'matplotlib ' + __version__ +
+                   ', http://matplotlib.org',
Member

Unnecessary change, looks like.

#ifdef PNG_TEXT_SUPPORTED
// Save the metadata
if (metadata != NULL)
{
Member

Brace is on the wrong line (comparatively speaking.)

// Save the metadata
if (metadata != NULL)
{
meta_size = PyDict_Size(metadata);
Member

Should be 4-space indent.

@tacaswell tacaswell dismissed their stale review October 27, 2016 17:59

Concern has been addressed.

text[meta_pos].key = PyBytes_AsString(meta_key);
text[meta_pos].text = PyBytes_AsString(meta_val);
if (PyUnicode_Check(meta_key)) {
PyObject *temp_key = PyUnicode_AsEncodedString(meta_key, "ASCII", "strict");
Member

I believe it's actually (some small subset of) latin1/iso-8859-1.
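
The practical difference between the two encodings can be seen in a few lines of Python. This is only an illustration of strict ASCII versus Latin-1 encoding, not the C code from the diff; `can_encode` is a hypothetical helper.

```python
def can_encode(text, encoding):
    """Return True if `text` survives strict encoding in `encoding`."""
    try:
        text.encode(encoding, "strict")
    except UnicodeEncodeError:
        return False
    return True


accented = "Caf\u00e9 \u00a9 2016"  # e-acute and the copyright sign
euro = "Price: 5 \u20ac"            # the euro sign is outside Latin-1

ascii_ok = can_encode(accented, "ascii")
latin1_ok = can_encode(accented, "latin-1")
euro_ok = can_encode(euro, "latin-1")
```

With strict ASCII, an author name containing an accented character would be rejected, whereas Latin-1 accepts it; characters beyond Latin-1 fail under either encoding.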

@@ -2423,8 +2425,11 @@ def __init__(self, filename, keep_empty=True):
keep_empty: bool, optional
If set to False, then empty pdf files will be deleted automatically
when closed.
metadata: dictionary, optional
Information dictionary object (see PDF reference section 10.2.1
Member

I don't find this documentation particularly useful. Could you add at least some examples of metadata?
Also, numpydoc format requires a space before the colon: metadata : dictionary, …

Member

None of the original parameters have a space before the colon, and they rendered fine, so I don't think that's a requirement.

Member

It is. The current documentation is not rendered properly.

Member

Oh yes, I missed that the bolding is wrong.
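
For reference, a numpydoc-style parameter entry with the space before the colon, plus the kind of metadata example requested above, might look like the sketch below. `open_pdf` is a hypothetical stand-in, not the class being documented in the diff; 'Title', 'Author' and 'Subject' are standard keys of the PDF information dictionary.

```python
def open_pdf(filename, keep_empty=True, metadata=None):
    """Hypothetical stand-in used only to show the docstring layout.

    Parameters
    ----------
    filename : str
        Path of the file to write.
    keep_empty : bool, optional
        If set to False, empty files are deleted when closed.
    metadata : dict, optional
        Entries for the PDF information dictionary, e.g.
        ``{'Title': 'Annual report', 'Author': 'Jane Doe',
        'Subject': 'Sales figures'}``.
    """
    # Return the arguments so the sketch is executable without a PDF writer.
    return {
        "filename": filename,
        "keep_empty": keep_empty,
        "metadata": dict(metadata or {}),
    }


doc = open_pdf.__doc__
```

The "name : type" form, with spaces on both sides of the colon, is what lets numpydoc separate the parameter name from its type when rendering.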

@@ -1073,13 +1073,19 @@ def write(self, *kl, **kwargs):
self.figure.set_facecolor(origfacecolor)
self.figure.set_edgecolor(origedgecolor)

# check for custom metadata
metadata = kwargs.pop("metadata", None)
Member

Mild preference for adding metadata=None to the signature.

Member

This pop needs to be removed?

Member

err, sorry, there were two of these and I grew confused reading the diffs.
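
The two styles discussed in this thread behave the same at call time; the difference is discoverability. A small sketch, with hypothetical function names rather than the actual backend methods:

```python
import inspect


def write_with_pop(filename, **kwargs):
    """Style used in the diff: pull metadata out of **kwargs."""
    metadata = kwargs.pop("metadata", None)
    return metadata, kwargs


def write_with_keyword(filename, metadata=None, **kwargs):
    """Style preferred in the review: metadata is part of the signature."""
    return metadata, kwargs


popped = write_with_pop("a.png", metadata={"Title": "x"}, dpi=100)
explicit = write_with_keyword("a.png", metadata={"Title": "x"}, dpi=100)

# Only the explicit form advertises the parameter to help() and to tools
# that inspect signatures.
visible_in_pop = "metadata" in inspect.signature(write_with_pop).parameters
visible_in_keyword = "metadata" in inspect.signature(write_with_keyword).parameters
```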

@tacaswell
Member

Sorry in advance if I am asking obvious questions, just trying to understand the libpng API.

From http://www.libpng.org/pub/png/libpng-manual.txt

    png_set_text(png_ptr, info_ptr, text_ptr, num_text);

    text_ptr       - array of png_text holding image
                     comments

    text_ptr[i].compression - type of compression used
                 on "text" PNG_TEXT_COMPRESSION_NONE
                           PNG_TEXT_COMPRESSION_zTXt
                           PNG_ITXT_COMPRESSION_NONE
                           PNG_ITXT_COMPRESSION_zTXt
    text_ptr[i].key   - keyword for comment.  Must contain
                 1-79 characters.
    text_ptr[i].text  - text comments for current
                         keyword.  Can be NULL or empty.
    text_ptr[i].text_length - length of text string,
                 after decompression, 0 for iTXt
    text_ptr[i].itxt_length - length of itxt string,
                 after decompression, 0 for tEXt/zTXt
    text_ptr[i].lang  - language of comment (NULL or
                         empty for unknown).
    text_ptr[i].translated_keyword  - keyword in UTF-8 (NULL
                         or empty for unknown).

    Note that the itxt_length, lang, and lang_key
    members of the text_ptr structure only exist when the
    library is built with iTXt chunk support.  Prior to
    libpng-1.4.0 the library was built by default without
    iTXt support. Also note that when iTXt is supported,
    they contain NULL pointers when the "compression"
    field contains PNG_TEXT_COMPRESSION_NONE or
    PNG_TEXT_COMPRESSION_zTXt.

  • Do we also need to set text_length?
  • From these docs it is not clear to me that setting the key to NULL is allowed.
  • Do we need to check the length of the key?

I cannot quite sort out the difference between tEXt, TEXT, and iTXt in the docs. It seems we are using tEXt (given the compression flag that is set), which seems like a good idea, as iTXt has a number of comments about bugs that result in unreadable PNGs with some versions of libpng.

This is valuable work; thank you again for taking this on!
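
As background for the question about chunk types: the PNG specification defines three textual chunks, tEXt, zTXt and iTXt, and libpng chooses among them based on the compression field (the PNG_TEXT_* values select tEXt or zTXt, the PNG_ITXT_* values select iTXt). The sketch below builds the data portion of each chunk in Python as I understand the layouts from the PNG specification; the function names are invented for illustration and this is not libpng code.

```python
import zlib


def text_payload(keyword, text):
    """tEXt: Latin-1 keyword, NUL, uncompressed Latin-1 text."""
    return keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")


def ztxt_payload(keyword, text):
    """zTXt: Latin-1 keyword, NUL, method byte 0 (deflate), compressed text."""
    compressed = zlib.compress(text.encode("latin-1"))
    return keyword.encode("latin-1") + b"\x00" + b"\x00" + compressed


def itxt_payload(keyword, text, language="", translated_keyword=""):
    """iTXt, uncompressed: keyword, NUL, compression flag 0, method 0,
    language tag, NUL, translated keyword (UTF-8), NUL, UTF-8 text."""
    return (
        keyword.encode("latin-1") + b"\x00"
        + b"\x00\x00"
        + language.encode("ascii") + b"\x00"
        + translated_keyword.encode("utf-8") + b"\x00"
        + text.encode("utf-8")
    )


plain = text_payload("Title", "Plot")
packed = ztxt_payload("Comment", "hello")
international = itxt_payload("Title", "\u00e9")
```

tEXt is the simplest of the three, which fits the preference expressed above for avoiding iTXt.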

@Xarthisius
Contributor Author

Do we also need to set text_length?

AFAICT, it's totally ignored. There's an explicit strlen in png_set_text that populates text_length in the struct.

From these docs it is not clear to me that setting the key to NULL is allowed.

I'm not sure how to interpret the standard either. In libpng there's an explicit check:

if (text_ptr[i].key == NULL)
    continue;

in the loop over text_ptr.

Do we need to check the length of the key?

It's checked internally. If the key exceeds 79 characters, it is truncated and a warning is issued:

libpng warning: keyword truncated

@tacaswell
Member

It makes sense that it takes care of the length itself (so we cannot lie to it; that seems like an obvious attack vector).

I am a tad concerned about the NULL, as that behavior may be a libpng implementation detail. It might be safer to go with 'BAD KEY N' or something like that? Keys can be repeated, but it seems safer to make them unique (coming from a Python dictionary, they will be unique on the way in).

Might be worth turning that warning into a python warning? I am 👍 with punting on that for this PR.
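
If the truncation warning were ever surfaced on the Python side, it could look roughly like the sketch below. This was explicitly left out of the PR, so everything here is hypothetical: the function name, the choice of `UserWarning`, and raising `ValueError` for an empty key are assumptions; only the 1-79 character limit comes from the libpng manual quoted earlier in the thread.

```python
import warnings

MAX_KEYWORD_LENGTH = 79  # the libpng manual says keys must be 1-79 characters


def checked_keyword(keyword):
    """Return a keyword that fits the PNG limit, warning if it was truncated."""
    if not keyword:
        # Assumption: reject empty keys instead of passing NULL to libpng.
        raise ValueError("PNG text keyword must contain at least one character")
    if len(keyword) > MAX_KEYWORD_LENGTH:
        warnings.warn(
            "PNG text keyword truncated to %d characters" % MAX_KEYWORD_LENGTH,
            UserWarning,
            stacklevel=2,
        )
        return keyword[:MAX_KEYWORD_LENGTH]
    return keyword


short = checked_keyword("Title")
```

Checking in Python first would make the behavior independent of what a particular libpng version does with over-long or missing keys.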

@Xarthisius
Contributor Author

ping, is there anything else left to do before this PR is merged?

@efiring
Member

efiring commented Nov 16, 2016

This looks like a good contribution, but it would be nice to have @mdboom look at the agg part and @jkseppan look at the pdf and ps parts.

As @tacaswell noted, it will need a whats-new entry.

I have a question about the API and behavior: it looks like the present scheme, in which metadata comes in as a dictionary, has the undesirable side effect of making the order in which the metadata appear in the file non-deterministic (and at the very least, not under the control of the user). This could be avoided by using an OrderedDict, or by using a sequence of (key, value) tuples. Correct?

@Xarthisius
Contributor Author

As @tacaswell noted, it will need a whats-new entry.

Sorry, it wasn't clear who's supposed to add it. I'll work on it.

I have a question about the API and behavior: it looks like the present scheme, in which metadata comes in as a dictionary, has the undesirable side effect of making the order in which the metadata appear in the file non-deterministic (and at the very least, not under the control of the user). This could be avoided by using an OrderedDict, or by using a sequence of (key, value) tuples. Correct?

In principle you are right. However, since there are no methods to read a text value by its index, why would it matter? I would say that random order is a desirable side effect: it prevents people from writing software that makes unwarranted assumptions about metadata.

@tacaswell
Member

On the bright side, with python 3.6 dictionary order is deterministic again!

@efiring
Member

efiring commented Nov 16, 2016

I know zilch about metadata in image files, but I am guessing that utilities dump the metadata in the order in which it is found, e.g. for display to the screen, in which case it would be nice to control the order--wouldn't it?

Regarding deterministic vs non-deterministic, see #6317 for a discussion.
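
The ordering alternatives raised in this exchange (an OrderedDict or a sequence of (key, value) tuples) could both be accepted by a small normalizing helper. This is a sketch of that idea only; `ordered_items` is a hypothetical name and the merged PR does not necessarily work this way.

```python
from collections import OrderedDict


def ordered_items(metadata):
    """Return (key, value) pairs in the order the caller supplied them.

    Accepts either a mapping or a sequence of (key, value) pairs.
    """
    if hasattr(metadata, "items"):
        return list(metadata.items())
    return [(key, value) for key, value in metadata]


from_pairs = ordered_items([("Title", "a"), ("Author", "b")])
from_ordered = ordered_items(OrderedDict([("Author", "b"), ("Title", "a")]))
```

Note that a plain dict keeps insertion order in CPython 3.6 as an implementation detail and in Python 3.7+ as a language guarantee, so on those versions a regular dictionary already gives the user control over the order.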

@codecov-io

codecov-io commented Nov 20, 2016

Current coverage is 62.07% (diff: 81.81%)

Merging #7349 into master will decrease coverage by <.01%

@@             master      #7349   diff @@
==========================================
  Files           174        174          
  Lines         56007      56021    +14   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          34765      34773     +8   
- Misses        21242      21248     +6   
  Partials          0          0          

Powered by Codecov. Last update e73625e...935e02f

@tacaswell tacaswell closed this Nov 20, 2016
@tacaswell tacaswell reopened this Nov 20, 2016
@tacaswell
Member

'power cycled' to restart against current master

@NelleV NelleV changed the title from "Add support for png_text metadata, allow to customize metadata for other backends." to "[MRG+1] Add support for png_text metadata, allow to customize metadata for other backends." Dec 19, 2016
@NelleV
Member

NelleV commented Dec 19, 2016

Hi @Xarthisius
Do you mind rebasing? I think this is ready to be merged. Sorry it took so long.
Thanks,
N

@tacaswell tacaswell merged commit ab98852 into matplotlib:master Dec 25, 2016
@tacaswell
Member

@Xarthisius Thanks!

Sorry this took so long to get merged.

@QuLogic QuLogic changed the title from "[MRG+1] Add support for png_text metadata, allow to customize metadata for other backends." to "Add support for png_text metadata, allow to customize metadata for other backends." Dec 26, 2016
6 participants