Presence of umlauts causes problems when parsing searchindex.js #53

Closed
florian-wagner opened this issue Aug 11, 2015 · 9 comments
@florian-wagner

We have a few umlauts in our project (i.e. in two of the authors' last names). make html-noplot works fine, but sphinxgallery runs into problems:

Traceback (most recent call last):
  File "/usr/lib64/python3.4/site-packages/sphinx/cmdline.py", line 254, in main
    app.build(force_all, filenames)
  File "/usr/lib64/python3.4/site-packages/sphinx/application.py", line 221, in build
    self.emit('build-finished', None)
  File "/usr/lib64/python3.4/site-packages/sphinx/application.py", line 400, in emit
    results.append(callback(self, *args))
  File "/usr/lib64/python3.4/site-packages/sphinxgallery/docs_resolv.py", line 432, in embed_code_links
    _embed_code_links(app, gallery_conf, gallery_dir)
  File "/usr/lib64/python3.4/site-packages/sphinxgallery/docs_resolv.py", line 328, in _embed_code_links
    relative=True)
  File "/usr/lib64/python3.4/site-packages/sphinxgallery/docs_resolv.py", line 212, in __init__
    sindex = get_data(searchindex_url, gallery_dir)
  File "/usr/lib64/python3.4/site-packages/sphinxgallery/docs_resolv.py", line 66, in get_data
    data = _get_data(url)
  File "/usr/lib64/python3.4/site-packages/sphinxgallery/docs_resolv.py", line 49, in _get_data
    data = fid.read()
  File "/usr/lib64/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10117: ordinal not in range(128)

The umlauts appear to cause problems when searchindex.js is parsed. Can this be improved?
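
For readers following the traceback, here is a minimal sketch of why the failing byte is 0xc3: in UTF-8, each German umlaut is encoded as two bytes, the first of which is 0xC3, and the ASCII codec rejects any byte above 0x7F. The surname and the helper `try_decode` are made up for illustration and are not part of sphinxgallery.

```python
# Hypothetical illustration: decode the same UTF-8 bytes with two codecs.

def try_decode(raw, encoding):
    """Return (text, None) on success or (None, error message) on failure."""
    try:
        return raw.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


# "Müller" is an assumed example name; in UTF-8 the "ü" becomes b"\xc3\xbc".
raw = "Müller".encode("utf-8")

ascii_text, ascii_error = try_decode(raw, "ascii")
utf8_text, utf8_error = try_decode(raw, "utf-8")

print(ascii_error)  # the ASCII codec complains about byte 0xc3
print(utf8_text)    # the UTF-8 codec recovers the original name
```

The position reported in the error depends on where the first non-ASCII byte sits in the input, so it will differ from the 10117 in the traceback above.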

@florian-wagner florian-wagner changed the title Presence of umlauts causes problems Presence of umlauts causes problems when parsing searchindex.js Aug 11, 2015
@Titan-C
Member

Titan-C commented Aug 11, 2015

Your issue report does not give me enough information about how you arrive
at the error.
I have nevertheless tested using Unicode in the examples, and it worked. My
name in the license comment also contains non-ASCII characters. Try making
sure your files are Unicode-encoded and carry the encoding comment at the
top of the file.
I remember there was already an issue or PR about Unicode support (#18, #19),
but that was for Python 2, which is not your case.
There was also a discussion about enforcing ASCII characters in code.

searchindex.js comes into play when resolving the links for the modules of
your project. This means you have non-ASCII characters in your modules
and Sphinx is not saving the index in Unicode (which is unexpected, even
more so in Python 3), but that is outside our control.

@carsten-forty2

I am the one who originally ran into this problem.

I'm running sphinx with Python 3.4.1

I have an authors name with the umlaut ü in a readme.rst (utf-8 encoded).

I don't know why, but sphinxgallery/docs_resolv.py:48

with open(url, 'r') as fid:

opens the file searchindex.js as:
<_io.TextIOWrapper name='searchindex.js' mode='r' encoding='ANSI_X3.4-1968'>

With this encoding, the umlaut ü raises the UnicodeDecodeError mentioned above.

If I force sphinxgallery/docs_resolv.py:48 to open it with UTF-8 encoding

with open(url, 'r', encoding='utf-8') as fid:

fid is <_io.TextIOWrapper name='searchindex.js' mode='r' encoding='utf-8'>

and the error disappears.

I cannot judge whether defaulting to UTF-8 here leads to further problems. But if it's safe, it would be nice if you could add this setting.

Cheers
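
A minimal, self-contained sketch of the behaviour described above, for anyone who wants to reproduce it without a Sphinx build. It writes a small UTF-8 file and reads it back twice. Passing `encoding="ascii"` explicitly is an assumption used here to simulate what `open(url, 'r')` does when the locale's default encoding is ASCII; the file content and the helper `read_text` are hypothetical stand-ins, not real sphinxgallery code.

```python
import os
import tempfile


def read_text(path, encoding):
    """Read a whole text file using an explicitly chosen encoding."""
    with open(path, "r", encoding=encoding) as fid:
        return fid.read()


# Made-up stand-in for a search index that contains an umlaut.
content = 'Search.setIndex({"titles": ["Müller"]})'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "searchindex.js")
    with open(path, "w", encoding="utf-8") as fid:
        fid.write(content)

    # Simulates the C/POSIX locale default: decoding as ASCII fails.
    try:
        read_text(path, "ascii")
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # An explicit UTF-8 encoding reads the file regardless of the locale.
    utf8_result = read_text(path, "utf-8")

print("ASCII read failed:", ascii_failed)
print("UTF-8 read matches:", utf8_result == content)
```

Naming the encoding at the call site makes the read independent of the environment the build happens to run in, which is the design choice the one-line change above makes.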

@lesteve
Member

lesteve commented Aug 18, 2015

Could it not be a problem with your locale?

What's the output of locale in your shell?
Also what do you get when doing python -c 'import sys; print(sys.stdout.encoding)'?
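
The same check can be sketched as a short script that also reports the encoding `open()` falls back to in text mode when no `encoding` argument is given (`locale.getpreferredencoding(False)`). The function name `encoding_report` is made up for this example.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python derives from the current environment."""
    return {
        # Default used by open() in text mode when no encoding is passed.
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
        # Encoding of standard output; may be None if stdout was replaced.
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        # Encoding used for file names.
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
    }


report = encoding_report()
for name, value in report.items():
    print("{}: {}".format(name, value))
```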

@carsten-forty2

Locale: LANG=C
python -c 'import sys; print(sys.stdout.encoding)' says ANSI_X3.4-1968

Hmm .. interesting point.
I have to dig into where these encoding settings come from.
Probably a UTF-8 locale will solve this issue.

Thank you.

@lesteve
Member

lesteve commented Oct 27, 2015

@carsten-forty2 did you manage to fix your problem in the end?

@carsten-forty2

@lesteve no .. I have not yet had the time to dig into why my sys.stdout.encoding is not UTF-8

I live with a local patched version of sphinxgallery/docs_resolv.py

@carsten-forty2

My LC_CTYPE was set to 'C', which causes Python to use the ANSI_X3.4-1968 (i.e. ASCII) encoding.

If I set LC_CTYPE to 'de_DE.UTF-8', everything runs fine without the patch mentioned above.

The issue can be closed for me.
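
For later readers, a hedged sketch of how the locale variables feed into Python's default text encoding. It starts a child interpreter with `LC_ALL` set to a chosen value and prints what `locale.getpreferredencoding(False)` reports there. The helper name `preferred_encoding_under` is hypothetical. Python 3.7 and later coerce the C locale and may enable UTF-8 mode on their own (PEP 538, PEP 540), so the sketch switches both off through `PYTHONCOERCECLOCALE=0` and `PYTHONUTF8=0` to approximate the Python 3.4 behaviour in this thread. The result still depends on the platform's C library, and on Windows these variables have little effect.

```python
import os
import subprocess
import sys


def preferred_encoding_under(lc_all):
    """Return the preferred encoding a child Python sees under a given LC_ALL."""
    env = dict(os.environ)
    env["LC_ALL"] = lc_all              # overrides LC_CTYPE and LANG
    env["PYTHONUTF8"] = "0"             # disable UTF-8 mode (Python 3.7+)
    env["PYTHONCOERCECLOCALE"] = "0"    # disable C locale coercion (Python 3.7+)
    code = "import locale; print(locale.getpreferredencoding(False))"
    out = subprocess.check_output([sys.executable, "-c", code], env=env)
    return out.decode("ascii").strip()


c_encoding = preferred_encoding_under("C")
print("Preferred encoding under LC_ALL=C:", c_encoding)
```

Calling the helper with a UTF-8 locale name such as 'de_DE.UTF-8' only gives a meaningful result if that locale is installed on the machine.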

@lesteve
Member

lesteve commented Oct 27, 2015

> I live with a local patched version of sphinxgallery/docs_resolv.py

OK can you just paste your patch in case this is useful for later or for someone else?

@carsten-forty2

sphinxgallery/docs_resolv.py:48

-with open(url, 'r') as fid:
+with open(url, 'r', encoding='utf-8') as fid:

@Titan-C Titan-C closed this as completed Jul 21, 2016