Don't include the postscript title if it is not latin-1 encodable. #11130
👍 An alternative would be to use the replace or ignore options for encode; see https://docs.python.org/3/howto/unicode.html#converting-to-bytes |
That'll happen after someone does an exegesis of the 700 pages of the postscript standard to figure out what the correct approach (if any) is, but my limited understanding is summarized above. Anyways, I think "saving with missing optional metadata" is better than "failing to save". |
@anntzer I do not understand your comment. If you do as in your PR and catch any exception, the title will be blank whenever it contains a non-latin-1 character. If, on the other hand, you do
you will get the title, but with every invalid character replaced by a "?". I don't see how this requires reading 700 pages of the standard. |
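The two behaviours contrasted in this comment can be sketched in a few lines of Python. This is only an illustration, not the code from the PR: the function names `title_or_none` and `title_replaced` are hypothetical, and catching `UnicodeEncodeError` specifically (rather than any exception) is an assumption.

```python
def title_or_none(title):
    """Drop the title entirely if it is not latin-1 encodable (hypothetical helper)."""
    try:
        return title.encode("latin-1")
    except UnicodeEncodeError:
        # Assumption: the caller treats None as "write no title at all".
        return None


def title_replaced(title):
    """Keep the title, substituting '?' for each unencodable character (hypothetical helper)."""
    return title.encode("latin-1", errors="replace")


# U+03B1 (Greek alpha) is outside latin-1; U+00FC (u with diaeresis) is inside it.
title = "plot of \u03b1 vs. \u00fc"

print(title_or_none(title))   # None
print(title_replaced(title))  # b'plot of ? vs. \xfc'
```

With `errors="ignore"` instead of `errors="replace"`, the unencodable character would be silently dropped rather than replaced by a "?".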
+1 for
|
The idea was perhaps the standard specifies a way to include non-ascii strings, or perhaps it doesn't, I don't know. In fact it is not even clear that latin-1 encoding (which we use right now) is correct, as PostScript traditionally uses something else. |
From my understanding, the PostScript standard supports more general strings than ASCII. However, you have to do it yourself. From https://unix.stackexchange.com/questions/269659/graphviz-how-to-get-utf-8-and-external-postscript-procedures
IMO not worth looking into. Let's just |
099efbc to 9e311d9
fixed accordingly |
Thanks all! |
After update to the latest release of Anaconda including matplotlib 3.0.1, savefig to eps fails, if the filename contains e.g. an "ü" = x'fc which is part of the ISO 8859-1 character set.
Savefig to png works without problems. -- System is Windows 7 pro 64bit.
Q: Is this the right place for this issue, or is this an Anaconda issue?
Traceback:
  File "C:\Anaconda3\lib\site-packages\matplotlib\pyplot.py", line 689, in savefig
  File "C:\Anaconda3\lib\site-packages\matplotlib\figure.py", line 2094, in savefig
  File "C:\Anaconda3\lib\site-packages\matplotlib\backend_bases.py", line 2075, in print_figure
  File "C:\Anaconda3\lib\site-packages\matplotlib\backends\backend_ps.py", line 921, in print_eps
  File "C:\Anaconda3\lib\site-packages\matplotlib\backends\backend_ps.py", line 950, in _print_ps
  File "C:\Anaconda3\lib\site-packages\matplotlib\backends\backend_ps.py", line 976, in _print_figure |
fwiw I looked a bit more into the standard. The following excerpts are relevant: PostScript standard section 3.2.2
PostScript Document Structuring Conventions
This is a modified version of the elementary type. If the first character encountered is a left parenthesis, it is equivalent to a string. If not, the token is considered to be the rest of the characters on the line until end of line

A text string comprises any printable characters and is usually considered to be delimited by blanks. If blanks or special characters are desired inside the text string, the entire string should be enclosed in parentheses. Document managers parsing text strings should be prepared to handle multiple parentheses. Special characters can be denoted using the PostScript language string \ escape mechanism.

A quick test shows that at least okular does convert \ddd escapes (while it gets confused by latin-1-encoded, non-ASCII strings) when using the "Import PostScript as PDF" functionality (the metadata can then be checked using "Properties"). However, it does not, for example, check for a starting opening parenthesis as required by the spec. This is somewhat strange, as I think okular relies on libspectre, which explicitly does handle this (https://github.com/freedesktop/libspectre/blob/48696f7e724923564dd6c8908afdb7c9d4893f02/libspectre/ps.c#L1305).

So I guess we could implement \ddd escapes and get some small additional correctness there. |
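As a rough illustration of what "implementing \ddd escapes" could look like, here is a minimal sketch. The helper `ps_text_escape` is hypothetical and is not what matplotlib does. It assumes latin-1 byte values are the right ones to escape (which, as noted earlier in the thread, is itself uncertain), and it replaces characters outside latin-1 with "?".

```python
def ps_text_escape(title):
    """Render title as a parenthesized text string with octal escapes (hypothetical helper)."""
    out = []
    # Assumption: latin-1 byte values are used; unencodable characters become '?'.
    for byte in title.encode("latin-1", errors="replace"):
        char = chr(byte)
        if char in "\\()":
            # Backslash and parentheses are special inside a PostScript string.
            out.append("\\" + char)
        elif 32 <= byte < 127:
            # Printable ASCII passes through unchanged.
            out.append(char)
        else:
            # Everything else becomes a three-digit octal escape.
            out.append("\\%03o" % byte)
    return "(" + "".join(out) + ")"


print(ps_text_escape("f\u00fcr (test)"))  # (f\374r \(test\))
```

The result always starts with an opening parenthesis, so a parser that follows the excerpt above would treat it as a text string rather than as the rest of the line.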
I don't agree. |
I agree with your interpretation. |
Sorry, I don't (yet) know how to submit a PR. Up to now I only used the (bug)trackers of github, and I have no knowledge of the git system. |
Done in #12890. |
PR Summary
Closes #11124.
There does not appear to be a complete Unicode encoding available for postscript, so even if certain non-latin1 characters can be handled (won't be done in this PR, in any case), we'll always need to know what to do in the case we can't encode the title.
The %%Title is optional, as is clear from the is_writable_file_like(outfile) clause.
PR Checklist