Handle dvi font names as ASCII bytestrings #6977
Merged
24 commits
705b021 Handle dvi font names as ASCII bytestrings (jkseppan)
dbc8b9e Test that the KeyError is raised when the font is missing (jkseppan)
93fad55 Mention bytestrings in docstring (jkseppan)
4874e4e Add a helpful note when raising KeyError from dviread.PsFonts (jkseppan)
a130ba7 Attempted fix for Python 3.4 compatibility (jkseppan)
0f0e41a More python 3.4 compatibility (jkseppan)
a7b5772 Use numpydoc format for several dviread docstrings (jkseppan)
803a96e Remove useless docstring (jkseppan)
ec5d80e Raise a more useful exception (jkseppan)
fe52808 Remove misleading parentheses from assert (jkseppan)
aa8c4f6 Simplify parsing with regular expressions (jkseppan)
9de07aa Perhaps simplify further with regular expressions (jkseppan)
c87b653 Remove useless assert (jkseppan)
2e19a61 Fix dvi font name handling in pdf backend (jkseppan)
119934a Separate the handling of dvi fonts in the pdf backend (jkseppan)
8fa303f Simplify enc file parsing (jkseppan)
94587b1 Small changes in response to code review (jkseppan)
254e3df Simplify psfonts.map parsing further (jkseppan)
a8674b3 Try to fix the KeyError test (jkseppan)
25a8fed ENH: make texFontMap a property (tacaswell)
92e2c52 Merge pull request #6 from tacaswell/dvi-ascii (jkseppan)
5ba21b0 Use file system encoding for the psfonts file name (jkseppan)
10135bf Document minor API changes (jkseppan)
6de9813 Explain named group ordering (jkseppan)
Small changes in response to code review
Improve a docstring, remove unneeded parens from an assert, open a file as binary instead of encoding each line read from it, don't call six.b on variable strings, simplify string handling, improve the formatting of a matplotlib.verbose.report call.
commit 94587b1b8ea7c93f468675efba2c1c8e5d7709d1
@@ -747,14 +747,10 @@ class Tfm(object):
         Used for verifying against the dvi file.
     design_size : int
         Design size of the font (unknown units)
-    width : dict
-        Width of each character, needs to be scaled by the factor
-        specified in the dvi file. This is a dict because indexing may
+    width, height, depth : dict
+        Dimensions of each character, need to be scaled by the factor
+        specified in the dvi file. These are dicts because indexing may
         not start from 0.
-    height : dict
-        Height of each character.
-    depth : dict
-        Depth of each character.
     """
     __slots__ = ('checksum', 'design_size', 'width', 'height', 'depth')
 
@@ -844,25 +840,25 @@ def __init__(self, filename):
         self._filename = filename
         if six.PY3 and isinstance(filename, bytes):
             self._filename = filename.decode('ascii', errors='replace')
-        with open(filename, 'rt') as file:
+        with open(filename, 'rb') as file:
             self._parse(file)
 
     def __getitem__(self, texname):
-        assert(isinstance(texname, bytes))
+        assert isinstance(texname, bytes)
         try:
             result = self._font[texname]
         except KeyError:
-            matplotlib.verbose.report(textwrap.fill
-                ('A PostScript file for the font whose TeX name is "%s" '
-                 'could not be found in the file "%s". The dviread module '
-                 'can only handle fonts that have an associated PostScript '
-                 'font file. '
-                 'This problem can often be solved by installing '
-                 'a suitable PostScript font package in your (TeX) '
-                 'package manager.' % (texname.decode('ascii'),
-                                       self._filename),
-                 break_on_hyphens=False, break_long_words=False),
-                'helpful')
+            fmt = ('A PostScript file for the font whose TeX name is "{0}" '
+                   'could not be found in the file "{1}". The dviread module '
+                   'can only handle fonts that have an associated PostScript '
+                   'font file. '
+                   'This problem can often be solved by installing '
+                   'a suitable PostScript font package in your (TeX) '
+                   'package manager.')
+            msg = fmt.format(texname.decode('ascii'), self._filename)
+            msg = textwrap.fill(msg, break_on_hyphens=False,
+                                break_long_words=False)
+            matplotlib.verbose.report(msg, 'helpful')
             raise
         fn, enc = result.filename, result.encoding
         if fn is not None and not fn.startswith(b'/'):
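As a rough illustration of the message-building steps in this hunk, the sketch below formats and wraps a shortened version of the message outside matplotlib. The font name and the map file path are hypothetical placeholders, and `width=40` is chosen only to force wrapping; neither comes from the pull request.

```python
import textwrap

# Hypothetical stand-ins for the TeX font name and the map file path.
texname = b"cmr10"
map_filename = "/usr/share/texlive/texmf-dist/fonts/map/pdftex.map"

fmt = ('A PostScript file for the font whose TeX name is "{0}" '
       'could not be found in the file "{1}".')

# Decode first: formatting a bytes object directly on Python 3 would
# insert its repr (b'cmr10') into the message.
msg = fmt.format(texname.decode("ascii"), map_filename)

# With both options disabled, textwrap only breaks at whitespace, so the
# hyphenated path stays in one piece even though it exceeds the width.
wrapped = textwrap.fill(msg, width=40, break_on_hyphens=False,
                        break_long_words=False)
print(wrapped)
```

Keeping the path unbroken matters here because a reader is likely to copy it from the message.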
@@ -873,7 +869,6 @@ def __getitem__(self, texname):
 
     def _parse(self, file):
         for line in file:
-            line = six.b(line)
             line = line.strip()
             if line == b'' or line.startswith(b'%'):
                 continue
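The removal of `six.b(line)` follows from opening the file with `'rb'`: lines read in binary mode are already bytestrings. A minimal sketch of that behavior, using a made-up two-entry map file rather than a real `psfonts.map`:

```python
import os
import tempfile

# Made-up map file contents; real map files carry more fields per entry.
sample = (b"% a comment line\n"
          b"\n"
          b"cmr10 CMR10 <cmr10.pfb\n"
          b"  cmmi10 CMMI10 <cmmi10.pfb  \n")

with tempfile.NamedTemporaryFile(delete=False, suffix=".map") as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    entries = []
    with open(path, "rb") as file:
        # Each line is a bytes object, so it can be compared with
        # bytes literals without any per-line conversion.
        for line in file:
            line = line.strip()
            if line == b"" or line.startswith(b"%"):
                continue
            entries.append(line)
finally:
    os.remove(path)

print(entries)
```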
@@ -979,21 +974,20 @@ def __iter__(self):
     def _parse(self, file):
         result = []
 
-        lines = (line[:line.find(b'%')] if b'%' in line else line.strip()
-                 for line in file)
+        lines = (line.split(b'%', 1)[0].strip() for line in file)
         data = b''.join(lines)
-        match = re.search(six.b(r'\['), data)
-        if not match:
+        beginning = data.find(b'[')
+        if beginning < 0:
             raise ValueError("Cannot locate beginning of encoding in {}"
                              .format(file))
-        data = data[match.span()[1]:]
-        match = re.search(six.b(r'\]'), data)
-        if not match:
+        data = data[beginning:]
+        end = data.find(b']')
+        if end < 0:
             raise ValueError("Cannot locate end of encoding in {}"
                              .format(file))
-        data = data[:match.span()[0]]
+        data = data[:end]
 
-        return re.findall(six.b(r'/([^][{}<>\s]+)'), data)
+        return re.findall(br'/([^][{}<>\s]+)', data)
 
 
 def find_tex_file(filename, format=None):

Review comment on `end = data.find(b']')`: should this be an
Follow-up comment: nevermind 🐑
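To try the byte-oriented parsing from this hunk in isolation, the sketch below copies its logic into a standalone function. The name `parse_enc` and the sample encoding data are hypothetical, and the error messages are shortened; this is not matplotlib's API.

```python
import io
import re


def parse_enc(file):
    """Return glyph names from an encoding file object opened in binary mode.

    Hypothetical standalone copy of the logic shown in the hunk.
    """
    # Discard everything after '%' on each line, then join the lines.
    lines = (line.split(b'%', 1)[0].strip() for line in file)
    data = b''.join(lines)
    beginning = data.find(b'[')
    if beginning < 0:
        raise ValueError("Cannot locate beginning of encoding")
    data = data[beginning:]
    end = data.find(b']')
    if end < 0:
        raise ValueError("Cannot locate end of encoding")
    data = data[:end]
    # Names are introduced by '/' and end at whitespace or a delimiter.
    return re.findall(br'/([^][{}<>\s]+)', data)


sample = io.BytesIO(
    b"% hypothetical encoding file\n"
    b"/MyEncoding [ /.notdef /space /exclam ] def % trailing comment\n"
)
names = parse_enc(sample)
print(names)
```

The sample keeps the whole array on one line because the sketch, like the hunk, joins lines with no separator between them.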
Why ascii instead of utf-8 or the system encoding?
I suppose the system encoding is more correct, but conversions like that make me somewhat wary. It's not really enough to specify UTF-8, you have to know which representation to choose for characters where you have a choice. (For example, the Wikipedia page on HFS+: "File and folder names in HFS Plus are [...] normalized to a form very nearly the same as Unicode Normalization Form D (NFD)". At least at one time the Linux HFS+ implementation didn't follow this.)
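The normalization concern can be illustrated with Python's standard `unicodedata` module. In this sketch the file name is a made-up example; it shows that the same visible name encodes to different UTF-8 bytes depending on the normalization form, so naming the encoding alone does not determine the bytestring.

```python
import unicodedata

# Made-up file name containing a precomposed e-acute (U+00E9).
name = "caf\u00e9"

# NFC keeps the single code point; NFD splits it into 'e' followed by
# a combining acute accent (U+0301).
nfc = unicodedata.normalize("NFC", name).encode("utf-8")
nfd = unicodedata.normalize("NFD", name).encode("utf-8")

print(nfc)         # b'caf\xc3\xa9'
print(nfd)         # b'cafe\xcc\x81'
print(nfc == nfd)  # False
```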
The correct encoding depends on where the bytestring originates. If it's out of a TeX file, I wouldn't be surprised if ASCII were good enough considering the esoteric requirements like fitting in 8 characters.
If it's something the user supplies, then there's really no good default and they really should have done it themselves. If the "user" is us, then we really need to fix that end instead.
I think the only current users in our code are the PDF backend and the text2path code, both of which just pass in the location of "pdftex.map". I initially thought this might need to be made customizable but I've never seen that as a feature request.