Make pdftex.map parsing stricter #20400
Also, I noticed that
lib/matplotlib/dviread.py
```python
self._unparsed = defaultdict(list)
for line in file:
    tfmname = line.split(b' ', 1)[0]
    self._unparsed[tfmname].append(line)
```
I usually write this as

```python
self._unparsed = {}
for line in file:
    ...
    self._unparsed.setdefault(tfmname, []).append(line)
```

(defaultdict's autovivification always makes me a bit nervous), but I guess you should time both and use whichever is faster.
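For reference, a minimal sketch of the two grouping styles, run on a few made-up map lines (the sample data and function names are hypothetical). Both build the same groups; the difference is that reading a missing key from the `defaultdict` silently inserts an empty list, while the plain `dict` raises `KeyError`.

```python
from collections import defaultdict

# Hypothetical pdftex.map-style lines; the first word is the tfmname.
LINES = [
    b"cmr10 CMR10 <cmr10.pfb\n",
    b"cmr12 CMR12 <cmr12.pfb\n",
    b"cmr12 CMR10 <cmr12.pfb\n",
]


def group_defaultdict(lines):
    unparsed = defaultdict(list)
    for line in lines:
        tfmname = line.split(b" ", 1)[0]
        unparsed[tfmname].append(line)
    return unparsed


def group_setdefault(lines):
    unparsed = {}
    for line in lines:
        tfmname = line.split(b" ", 1)[0]
        unparsed.setdefault(tfmname, []).append(line)
    return unparsed


dd = group_defaultdict(LINES)
sd = group_setdefault(LINES)
assert dict(dd) == sd  # same grouping either way

# Autovivification: merely looking up a missing key adds it.
_ = dd[b"missing"]
assert b"missing" in dd

# The plain dict refuses instead.
try:
    sd[b"missing"]
except KeyError:
    pass
assert b"missing" not in sd
```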
Using `timeit.timeit('dviread.PsfontsMap(".../pdftex.map")', setup='from matplotlib import dviread')`, run 5 times and throwing away the slowest and fastest results, the average is 0.202 microseconds for `defaultdict` and 0.183 microseconds for `setdefault`, for a map file that is 40827 lines long. Not sure if that's long or short.
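A self-contained variant of that benchmark, sketched under the assumption that timing only the grouping loop is representative. It does not call the real `PsfontsMap` constructor; the 40827 line count comes from the comment above, but the line contents are synthetic, and the helper names are hypothetical. Absolute numbers depend on the machine, so none are shown.

```python
import timeit
from collections import defaultdict

# Synthetic map lines: the count matches the file above, the contents are invented.
LINES = [b"font%d FONT%d <font%d.pfb\n" % (i, i, i) for i in range(40827)]


def with_defaultdict():
    unparsed = defaultdict(list)
    for line in LINES:
        unparsed[line.split(b" ", 1)[0]].append(line)
    return unparsed


def with_setdefault():
    unparsed = {}
    for line in LINES:
        unparsed.setdefault(line.split(b" ", 1)[0], []).append(line)
    return unparsed


def trimmed_mean(func, repeats=5, number=3):
    """Seconds per call: run `repeats` rounds of `number` calls each,
    drop the fastest and slowest rounds, and average the rest."""
    times = sorted(timeit.repeat(func, repeat=repeats, number=number))
    kept = times[1:-1]
    return sum(kept) / len(kept) / number


if __name__ == "__main__":
    print("defaultdict:", trimmed_mean(with_defaultdict))
    print("setdefault: ", trimmed_mean(with_setdefault))
```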
I'd say that's on the long side, thanks for checking.
I guess the spirit of the module would be to keep everything as bytes (so convert the result of `find_tex_file` back to bytes using `os.fsencode`).
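A minimal illustration of that conversion, assuming `find_tex_file` hands back a `str` path (the path below is made up): `os.fsencode` turns it into bytes using the filesystem encoding and error handler, and `os.fsdecode` undoes it.

```python
import os

# Made-up lookup result, as str (assumed to be what find_tex_file returns).
path_str = "/usr/share/texmf/fonts/enc/dvips/base/8r.enc"

# Back to bytes, so the rest of the module can keep working in bytes.
path_bytes = os.fsencode(path_str)
assert isinstance(path_bytes, bytes)

# For names the filesystem encoding can represent, the two are inverses.
assert os.fsdecode(path_bytes) == path_str
```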
modulo comments above.
This can be tested by placing two lines with the same `tfmname` but different `psname` in a `pdftex.map`:

```
cmr12 CMR10 <cmr12.pfb
cmr12 CMR12 <cmr12.pfb
```

and then running `TEXFONTMAPS=/path/to/pdftex.map pdflatex` on a file using Computer Modern. It will warn about the second line and embed `CMR10` as the name in the resulting PDF.
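A hypothetical Python sketch of the `pdflatex` behaviour described in this commit: keep the first entry for each `tfmname` and warn about later ones. The function name and warning text are illustrative only, not Matplotlib's actual implementation.

```python
import warnings


def first_entry_wins(lines):
    """Map each tfmname to its first line; warn about and skip duplicates."""
    entries = {}
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        tfmname = words[0]
        if tfmname in entries:
            warnings.warn(f"ignoring duplicate entry for {tfmname!r}: {line!r}")
            continue
        entries[tfmname] = line
    return entries


MAP = [b"cmr12 CMR10 <cmr12.pfb", b"cmr12 CMR12 <cmr12.pfb"]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = first_entry_wins(MAP)
```

With the two lines from the commit message, `result[b"cmr12"]` is the `CMR10` line and one warning is recorded, mirroring what `pdflatex` does.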
As noted in the pdfTeX manual, `SlantFont` and `ExtendFont` are only allowed for Type 1 fonts, and only within the ranges ±1 and ±2, respectively. This can be confirmed the same way as in the previous commit, by copying the lines from `test.map` (though using a _real_ tfmname).
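A sketch of that check, assuming the quoted special string (e.g. `".167 SlantFont"`) has already been extracted from the map line, that the caller knows whether the font is Type 1, and that the range bounds are inclusive. `parse_effects` and `LIMITS` are hypothetical names; returning `None` means "ignore this line", as `pdflatex` does.

```python
# Operator -> (result key, maximum absolute value); bounds assumed inclusive.
LIMITS = {b"SlantFont": ("slant", 1.0), b"ExtendFont": ("extend", 2.0)}


def parse_effects(special, is_type1):
    """Parse PostScript-style '<value> <operator>' pairs from a map special.

    Returns a dict such as {'slant': 0.167}, or None if the line should be
    rejected (effect on a non-Type 1 font, missing or bad operand, or a
    value out of range).
    """
    effects = {}
    words = special.split()
    for i, word in enumerate(words):
        if word not in LIMITS:
            continue  # other operators (e.g. ReEncodeFont) are not checked here
        key, limit = LIMITS[word]
        if not is_type1 or i == 0:
            return None  # only Type 1 fonts; the operand must precede the operator
        try:
            value = float(words[i - 1].decode("ascii"))
        except ValueError:  # also covers UnicodeDecodeError
            return None
        if not -limit <= value <= limit:
            return None
        effects[key] = value
    return effects
```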
As noted in the pdfTeX manual,

> The *encodingfile* field may be omitted if you are sure that the font
> resource has the correct built-in encoding. In general this option is
> highly recommended, and it is *required* when subsetting a TrueType
> font.

This can be confirmed in a similar way to the previous commits, though instead of ignoring the line, pdflatex quits while attempting to embed the font.
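A simplified sketch of that rule. It assumes the map line has already been split into words with any quoted specials removed, and relies on our reading of the pdfTeX map syntax: `<file` means the font is subsetted, `<<file` embeds the whole font, and `<[file` (or a `.enc` name after `<`) marks the encoding file. Only `.ttf` files are treated as TrueType here. `should_skip_line` is a hypothetical name.

```python
def should_skip_line(fields):
    """Return True if a map line subsets a TrueType font without an encoding.

    `fields` are the whitespace-separated words (bytes) of one map line,
    with quoted specials already removed (simplifying assumption).
    """
    fontfile = None
    subset = False
    encoding = None
    for tok in fields:
        if tok.startswith(b"<<"):  # embed the whole font
            fontfile, subset = tok[2:], False
        elif tok.startswith(b"<["):  # explicitly an encoding file
            encoding = tok[2:]
        elif tok.startswith(b"<"):
            name = tok[1:]
            if name.endswith(b".enc"):
                encoding = name
            else:  # a single '<' means the font is subsetted
                fontfile, subset = name, True
    return (
        fontfile is not None
        and fontfile.lower().endswith(b".ttf")
        and subset
        and encoding is None
    )
```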
Force-pushed from d9d727b to ccaf495.
Oops, I thought I tried it, but I guess
Force-pushed from ccaf495 to b447363.
lib/matplotlib/dviread.py
```python
if not encodingfile.startswith(b"/"):
    encodingfile = find_tex_file(encodingfile)
else:
    encodingfile = encodingfile.decode('utf-8', errors='replace')
```
That should be `os.fsdecode` then? At least on Linux... (using `surrogateescape` rather than `replace` may matter.)
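On Linux, `os.fsdecode` typically decodes with UTF-8 and the `surrogateescape` error handler. The difference the reviewer points out can be seen by doing both decodings explicitly on a made-up path containing a non-UTF-8 byte:

```python
# 0xE9 is "é" in Latin-1 but is not valid UTF-8 on its own.
raw = b"/tmp/caf\xe9.enc"

lossy = raw.decode("utf-8", errors="replace")
escaped = raw.decode("utf-8", errors="surrogateescape")

# 'replace' substitutes U+FFFD, so the original bytes cannot be recovered...
assert "\ufffd" in lossy
assert lossy.encode("utf-8") != raw

# ...while 'surrogateescape' round-trips exactly.
assert escaped.encode("utf-8", errors="surrogateescape") == raw
```

So a path decoded with `replace` may no longer name the file that was on disk, whereas a `surrogateescape`-decoded path still does.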
This matches what `find_tex_file` does.
That's what `find_tex_file` does for `filename`, which should effectively be just a filename (not an absolute path); the encoding of the absolute path is determined by the kwargs a bit further down (well, things are a bit more complicated, but still...).
But that's only the encoding used for communicating with `kpsewhich`. To convert its own input (which comes directly from this file), `find_tex_file` uses UTF-8 with `errors='replace'`, as above.
Can we split that out to a separate issue/PR, and keep the type instability for now? I am not convinced that this is correct, but mostly just need to spend some time setting up a system with weird fsencoding for testing...
Sure, can do that.
Actually, looking at it again, it seems that we can just remove the `startswith("/")` check and always pass things to `find_tex_file`. I have checked that kpsewhich (whether called directly or via luatex) will happily pass absolute paths through, so we don't need to filter them out beforehand. In theory this may make things slightly slower (due to the subprocess interaction), but in practice I haven't seen any absolute paths in `pdftex.map`, either on my machine or on the shared macOS...
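A sketch of the simplified control flow, with the lookup passed in so the example runs without TeX installed. `resolve_encoding` and `fake_find_tex_file` are hypothetical; the stub only imitates the pass-through behaviour for absolute paths described above and is not Matplotlib's `find_tex_file`.

```python
def resolve_encoding(encodingfile, find_tex_file):
    """Resolve an encoding-file field (bytes or None) with a kpsewhich-style lookup.

    There is no special case for absolute paths any more: the lookup is
    assumed to return them unchanged.
    """
    if encodingfile is None:
        return None
    return find_tex_file(encodingfile)


def fake_find_tex_file(name):
    """Hypothetical stand-in: absolute paths pass through, relative names
    get a made-up texmf prefix."""
    name = name.decode("utf-8", errors="replace")
    return name if name.startswith("/") else "/texmf/" + name
```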
Hmm, okay, changed to call that always then.
Force-pushed from b447363 to aa8e129.
It should work for absolute paths as well.
I'm approving on the merits of @anntzer's review.
PR Summary
A test has started failing in Fedora Rawhide with TeX Live 2021. While I think there are some issues in the `pdftex.map` there (see my investigation here), I found the parser in Matplotlib to be a bit laxer than it should be. Annoyingly, `dvipdfm` and `pdflatex` appear to use different parsers for this file, but I chose to emulate what `pdflatex` does, as it appears to match what's in the pdfTeX manual, which we claim to follow.

Some more details are available in the commit messages, but the behaviour copied from `pdflatex` includes:

- when several lines share the same `tfmname`, the first one is used and the later ones are ignored with a warning;
- `SlantFont` and `ExtendFont` are only accepted for Type 1 fonts, within ±1 and ±2, respectively;
- an encoding file is required when subsetting a TrueType font.