Make pdftex.map parsing stricter #20400

Merged
merged 4 commits into from
Jun 16, 2021

QuLogic
Member

@QuLogic QuLogic commented Jun 10, 2021

PR Summary

A test has started failing in Fedora Rawhide with Texlive 2021; while I think there are some issues in the pdftex.map there (see my investigation here), I found the parser in Matplotlib to be a bit laxer than it should be. Annoyingly, dvipdfm and pdflatex appear to use different parsers for this file, but I chose to emulate what pdflatex does, as it appears to match what's in the pdfTeX manual, which we claim to follow.

Some more details are available in the commit messages, but the behaviours copied from pdflatex include:

  • ignoring duplicate lines
  • ignoring lines with out-of-range special entries, or with special entries on the wrong font type
  • failing on subset TrueType fonts without encoding files
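
The three rules above can be sketched roughly as follows. This is only an illustration, not the code from this PR: `MapEntry`, `is_type1`, `is_truetype` and `filter_entries` are hypothetical names, the entries are assumed to be already tokenized, fonts are classified by file extension (with a missing font file treated as Type 1), the range limits are assumed to be inclusive, and the error is raised eagerly rather than when the font is embedded.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MapEntry:
    """Hypothetical, already-tokenized pdftex.map line (not Matplotlib's type)."""
    tfmname: str
    psname: str
    fontfile: Optional[str] = None      # e.g. "cmr12.pfb"
    encodingfile: Optional[str] = None  # e.g. "8r.enc"
    subset: bool = True                 # assumed flag: font is subset when embedded
    effects: dict = field(default_factory=dict)  # e.g. {"slant": 0.167}


def is_truetype(entry):
    # Assumption: classify by file extension only.
    return entry.fontfile is not None and entry.fontfile.lower().endswith(".ttf")


def is_type1(entry):
    # Assumption: no font file, or a .pfa/.pfb file, counts as Type 1.
    if entry.fontfile is None:
        return True
    return entry.fontfile.lower().endswith((".pfa", ".pfb"))


def filter_entries(entries):
    """Return {tfmname: entry} after applying the three rules."""
    kept = {}
    for entry in entries:
        # Rule 1: a later line for an already-kept tfmname is ignored.
        if entry.tfmname in kept:
            continue
        # Rule 2: SlantFont/ExtendFont only on Type 1 fonts, within range.
        if entry.effects:
            if not is_type1(entry):
                continue
            if abs(entry.effects.get("slant", 0.0)) > 1:
                continue
            if abs(entry.effects.get("extend", 0.0)) > 2:
                continue
        # Rule 3: a subset TrueType font must name an encoding file.
        if is_truetype(entry) and entry.subset and entry.encodingfile is None:
            raise ValueError(
                f"{entry.tfmname}: subset TrueType font needs an encoding file")
        kept[entry.tfmname] = entry
    return kept
```

With two entries sharing the tfmname `cmr12`, only the first one (and its psname) survives, which is the same first-line-wins outcome the commit message for the duplicate-line change reports for pdflatex.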

PR Checklist

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (run flake8 on changed files to check).
  • [n/a] New features are documented, with examples if plot related.
  • [n/a] Documentation is sphinx and numpydoc compliant (the docs should build without error).
  • Conforms to Matplotlib style conventions (install flake8-docstrings and run flake8 --docstring-convention=all).
  • [n/a] New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • [n/a] API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).

@QuLogic QuLogic added this to the v3.5.0 milestone Jun 10, 2021
@QuLogic
Member Author

QuLogic commented Jun 10, 2021

Also, I noticed that encodingfile and fontfile have inconsistent types. If they're not absolute, they're passed to find_tex_file, which returns a str, but otherwise they're bytes. We seem to pass these results to open or pathlib.Path, which accept both, but I wonder if we should reconcile this difference?

self._unparsed = defaultdict(list)
for line in file:
    tfmname = line.split(b' ', 1)[0]
    self._unparsed[tfmname].append(line)
Contributor

I usually write this as

self._unparsed = {}
for line in file: ...; self._unparsed.setdefault(tfmname, []).append(line)

(defaultdict's autovivification always makes me a bit nervous) but I guess you should time whichever is fastest.
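
The autovivification concern can be shown with a small standalone example. The sample lines are illustrative (the `cmr10` line is made up for the demonstration), and the variable names are not taken from the PR.

```python
from collections import defaultdict

lines = [
    b"cmr12 CMR10 <cmr12.pfb",
    b"cmr12 CMR12 <cmr12.pfb",
    b"cmr10 CMR10 <cmr10.pfb",  # made-up extra line for illustration
]

by_default = defaultdict(list)
by_setdefault = {}
for line in lines:
    tfmname = line.split(b" ", 1)[0]
    by_default[tfmname].append(line)
    by_setdefault.setdefault(tfmname, []).append(line)

# Both approaches build the same grouping.
assert dict(by_default) == by_setdefault

# Merely reading a missing key creates it in the defaultdict ...
by_default[b"missing"]
assert b"missing" in by_default

# ... whereas the plain dict raises KeyError and is left unchanged.
try:
    by_setdefault[b"missing"]
except KeyError:
    pass
assert b"missing" not in by_setdefault
```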

Member Author

Using timeit.timeit('dviread.PsFontsMap(".../pdftex.map")', setup='from matplotlib import dviread') taken 5 times, throwing away slowest and fastest times, the average for defaultdict is 0.202 microseconds, and for setdefault is 0.183 microseconds, for a map file that is 40827 lines long. Not sure if that's long or short.
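
A comparable measurement can be sketched without a TeX installation. The snippet below is an assumption-laden stand-in: it times only the grouping loop on synthetic lines rather than `dviread.PsFontsMap` on a real `pdftex.map`, and `trimmed_mean` is a hypothetical helper. Note that `timeit.repeat` returns the total seconds for each batch of `number` calls, so the result is divided by `number`.

```python
import statistics
import timeit
from collections import defaultdict

# Synthetic stand-in for a map file: 40827 lines, many sharing a tfmname.
LINES = [b"font%d PSName%d <font%d.pfb" % (i % 5000, i, i % 5000)
         for i in range(40827)]


def group_defaultdict():
    unparsed = defaultdict(list)
    for line in LINES:
        unparsed[line.split(b" ", 1)[0]].append(line)
    return unparsed


def group_setdefault():
    unparsed = {}
    for line in LINES:
        unparsed.setdefault(line.split(b" ", 1)[0], []).append(line)
    return unparsed


def trimmed_mean(func, repeat=5, number=3):
    """Seconds per call: drop the fastest and slowest batch, average the rest."""
    totals = sorted(timeit.repeat(func, repeat=repeat, number=number))
    return statistics.mean(totals[1:-1]) / number


if __name__ == "__main__":
    for func in (group_defaultdict, group_setdefault):
        print(func.__name__, trimmed_mean(func))
```

The absolute numbers depend on the machine and on the synthetic data, so they are not comparable to the figures quoted above.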

Contributor

I'd say that's on the long side, thanks for checking.

@anntzer
Contributor

anntzer commented Jun 10, 2021

I guess the spirit of the module would be to keep everything as bytes (so convert back the result of find_tex_file using os.fsencode).
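
A minimal sketch of that normalization, assuming a hypothetical helper `as_bytes`; `os.fsencode` accepts `str`, `bytes` or path-like objects and always returns `bytes`.

```python
import os


def as_bytes(path):
    """Return path as bytes, whether it arrived as str or bytes."""
    return os.fsencode(path)


# A str result (as find_tex_file is described as returning) and a bytes
# path read from the map file end up with the same type.
assert as_bytes("cmr12.pfb") == b"cmr12.pfb"
assert as_bytes(b"/abs/cmr12.pfb") == b"/abs/cmr12.pfb"
```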

Contributor

@anntzer anntzer left a comment

modulo comments above.

QuLogic added 3 commits June 10, 2021 02:20
This can be tested by placing two lines with the same `tfmname`, but
different `psname` in a `pdftex.map`:

```
cmr12 CMR10 <cmr12.pfb
cmr12 CMR12 <cmr12.pfb
```

and then running `TEXFONTMAPS=/path/to/pdftex.map pdflatex` on a file
using Computer Modern. It will warn about the second line, and embed
`CMR10` as the name in the resulting PDF.

As noted in the pdftex manual, `SlantFont` and `ExtendFont` are only
allowed for T1 fonts, and within range ±1 and ±2, respectively.

This can be confirmed the same way as the previous commit, by copying
the lines from the `test.map` (though using a _real_ tfmname).

As noted in the pdftex manual,

> The *encodingfile* field may be omitted if you are sure that the font
> resource has the correct built-in encoding. In general this option is
> highly recommended, and it is *required* when subsetting a TrueType
> font.

This can be confirmed in a similar way to the previous commits, though
instead of ignoring the line, pdflatex quits while attempting to embed
the font.
@QuLogic QuLogic force-pushed the stricter-psfontsmap branch from d9d727b to ccaf495 Compare June 10, 2021 06:30
@QuLogic
Member Author

QuLogic commented Jun 10, 2021

Oops, I thought I tried it, but I guess Path doesn't like bytes, so I'll have to go with str.
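
The difference is easy to reproduce: `open` accepts a `bytes` path, but `pathlib.Path` raises `TypeError` for one. The file name below is only an example.

```python
import os
from pathlib import Path

raw = b"cmr12.pfb"  # example bytes file name

try:
    Path(raw)
    accepted = True
except TypeError:
    accepted = False

# Path refuses bytes, so the value has to be decoded to str first.
assert accepted is False
assert Path(os.fsdecode(raw)).name == "cmr12.pfb"
```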

@QuLogic QuLogic force-pushed the stricter-psfontsmap branch from ccaf495 to b447363 Compare June 10, 2021 07:20
if not encodingfile.startswith(b"/"):
    encodingfile = find_tex_file(encodingfile)
else:
    encodingfile = encodingfile.decode('utf-8', errors='replace')
Contributor

that should be os.fsdecode then? at least on linux... (using surrogateescape rather than replace may matter)
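
Why the error handler may matter can be seen on a path containing a byte that is not valid UTF-8. The path is invented for the example; on POSIX systems `os.fsdecode` uses the filesystem encoding with the `surrogateescape` handler, which is the behaviour imitated here with an explicit UTF-8 codec.

```python
raw = b"/fonts/caf\xe9.enc"  # invented path; \xe9 is Latin-1, invalid as UTF-8

replaced = raw.decode("utf-8", errors="replace")
escaped = raw.decode("utf-8", errors="surrogateescape")

# "replace" substitutes U+FFFD, so the original bytes cannot be recovered.
assert "\ufffd" in replaced
assert replaced.encode("utf-8") != raw

# "surrogateescape" keeps the stray byte and round-trips exactly.
assert escaped.encode("utf-8", errors="surrogateescape") == raw
```

A path decoded with `replace` would therefore no longer refer to the same file if it contained such a byte.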

Member Author

This matches what find_tex_file does.

Contributor

that's what find_tex_file does for filename which should effectively be just a filename (not an absolute path); the absolute path's encoding is determined by the kwargs a bit further down (well, things are a bit more complicated, but still...).

Member Author

But that's the encoding for communicating to/from kpsewhich only. For converting the input to find_tex_file (which is directly from this file), it uses replaced utf-8 like above.

Contributor

Can we split that out to a separate issue/PR, and keep the type instability for now? I am not convinced that this is correct, but mostly just need to spend some time setting up a system with weird fsencoding for testing...

Member Author

Sure, can do that.

Contributor

Actually, looking at it again, it seems that we can just remove the startswith("/") check and always pass things to find_tex_file. I have checked that kpsewhich (whether called directly or via luatex) will just happily pass-through absolute paths, so we don't need to pre-filter them out. It is true that in theory this may make things slightly slower (due to the subprocess interaction), but in practice I haven't seen any absolute paths in pdftex.map either on my machine or on the shared macos...
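
The "always ask kpsewhich" idea can be sketched as below. This is not Matplotlib's `find_tex_file`: `lookup_tex_file` is a hypothetical helper that simply runs `kpsewhich NAME` and returns what it prints, or `None` when the tool is missing or finds nothing. That absolute paths are passed through unchanged is the behaviour reported in the comment above, not something this sketch verifies.

```python
import shutil
import subprocess


def lookup_tex_file(name):
    """Hypothetical helper: resolve a file name through kpsewhich, if installed."""
    exe = shutil.which("kpsewhich")
    if exe is None:
        return None  # no TeX installation on this machine
    result = subprocess.run([exe, name], capture_output=True, text=True)
    return result.stdout.strip() or None


if __name__ == "__main__":
    # Relative names and absolute paths go through the same call.
    print(lookup_tex_file("8r.enc"))
    print(lookup_tex_file("/path/to/8r.enc"))
```

The cost of this simplification is one subprocess call per lookup, which is the slowdown mentioned in the comment.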

Member Author

Hmm, okay, changed to call that always then.

@QuLogic QuLogic force-pushed the stricter-psfontsmap branch from b447363 to aa8e129 Compare June 10, 2021 22:41
It should work for absolute paths as well.
Member

@jklymak jklymak left a comment

I'm approving on the merits of @anntzer's review.

@jklymak jklymak merged commit 7d50020 into matplotlib:master Jun 16, 2021
@QuLogic QuLogic deleted the stricter-psfontsmap branch June 16, 2021 19:26