[MRG] DOC, FIX: Support Python 2 and 3 in gen_rst.py #3777

Closed
nmayorov wants to merge 2 commits into scikit-learn:master from nmayorov:doc_explicit_utf8

nmayorov
Contributor

The encoding used by open in Python (if not specified) is platform dependent. On my Windows machine it is cp1251, so I had trouble building the docs with the example gallery (some examples contain non-ASCII characters).

I think setting it explicitly to utf-8 is a good thing.
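
A minimal Python 3 sketch of the point above: the fallback encoding comes from the locale, while an explicit `encoding="utf-8"` makes the round trip independent of the platform. The file name and sample text are made up for illustration.

```python
import locale
import os
import tempfile

# The encoding Python 3's open() falls back to when none is given.
print(locale.getpreferredencoding(False))

# Hypothetical sample text with non-ASCII characters (a name and curly quotes).
text = u"Ga\u00ebl \u201cquoted\u201d"

# With an explicit encoding, writing and reading no longer depend on the locale.
path = os.path.join(tempfile.mkdtemp(), "example.py")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, encoding="utf-8") as f:
    roundtrip = f.read()
```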

@agramfort
Member

on my mac locale.getpreferredencoding(False) is US-ASCII and I can build the doc (unless something changed very recently).

can you point to a file that contains non-ascii characters?

@nmayorov
Contributor Author

There are plenty in comment lines. For example: the letter in the name and weird quotes.

The reason for the confusion is the version of Python. I tried to build the docs in Python 3 and there is indeed a problem with decoding when it tries to access the lines of a file. In Python 2 it never occurs, because it doesn't try to decode anything at this point. (Is my explanation correct?)

I would suggest replacing open with codecs.open for Python 2, so that open(fname, encoding='utf-8') can be used for both Python 3 and Python 2. Do you approve of this approach?

@agramfort
Member

@larsmans our encoding expert what do you think?

@larsmans
Member

Better than codecs.open is io.open, which should behave the same on Py2 and Py3 (it's a backport of Py3's open).
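
A short sketch of what the io.open suggestion looks like in practice. It should run unchanged on Python 2.7 and Python 3 and return text (unicode) lines on both; the file name and content are placeholders.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# io.open accepts the encoding keyword on Python 2 as well as Python 3.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"caf\u00e9\n")

# Lines come back decoded: unicode on Python 2, str on Python 3.
with io.open(path, encoding="utf-8") as f:
    lines = f.readlines()
```

Note that on Python 2 this returns unicode rather than the native str, which matters for code that mixes the two types.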

@larsmans
Member

Your patch, as-is, would break Sphinx on Python 2.

@nmayorov
Contributor Author

@larsmans I realized that. I will change to io.open.

@larsmans
Member

Cool, ping me when you're done.

@nmayorov
Contributor Author

Still struggling to get it working. I came to the conclusion that in Python 2 everything should be str (mixing in unicode there seems like a bad idea). So the initial plan should probably be abandoned. I'll keep looking into that.

@nmayorov
Contributor Author

I just used conditions with six.PY2. It builds on both versions of Python.
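
The conditional approach could look roughly like the sketch below. This is not the actual patch: `read_lines` is a hypothetical helper, and the version check is written with `sys.version_info` (the same test `six.PY2` performs) so the snippet has no third-party dependency.

```python
import os
import sys
import tempfile

PY2 = sys.version_info[0] == 2  # equivalent to six.PY2


def read_lines(path):
    """Return the file's lines as the native str type of the running Python."""
    if PY2:
        # Python 2: plain open() yields byte strings (str); nothing is decoded.
        with open(path) as f:
            return f.readlines()
    # Python 3: decode explicitly so the platform default is never consulted.
    with open(path, encoding="utf-8") as f:
        return f.readlines()


# Hypothetical example file containing a non-ASCII character in a comment.
path = os.path.join(tempfile.mkdtemp(), "plot_example.py")
with open(path, "wb") as f:
    f.write(u"# Ga\u00ebl\n".encode("utf-8"))

lines = read_lines(path)
```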

But there are some problems in Python 2 unrelated to this patch (these errors appear on a clean build from master too.) They look like this (just several of them in total):

Traceback (most recent call last):
  File "c:\scikit-learn-python2.7\doc\sphinxext\gen_rst.py", line 869, in generate_file_rst
    execfile(os.path.basename(src_file), my_globals)
  File "plot_species_distribution_modeling.py", line 207, in <module>
    plot_species_distribution()
  File "plot_species_distribution_modeling.py", line 102, in plot_species_distribution
    data = fetch_species_distributions()
  File "C:\scikit-learn-python2.7\sklearn\datasets\species_distributions.py", line 250, in fetch_species_distributions
    bunch = joblib.load(join(data_home, DATA_ARCHIVE_NAME))
  File "C:\scikit-learn-python2.7\sklearn\externals\joblib\numpy_pickle.py", line 419, in load
    unpickler = ZipNumpyUnpickler(filename, file_handle=file_handle)
  File "C:\scikit-learn-python2.7\sklearn\externals\joblib\numpy_pickle.py", line 308, in __init__
    mmap_mode=None)
  File "C:\scikit-learn-python2.7\sklearn\externals\joblib\numpy_pickle.py", line 266, in __init__
    self.file_handle = self._open_pickle(file_handle)
  File "C:\scikit-learn-python2.7\sklearn\externals\joblib\numpy_pickle.py", line 311, in _open_pickle
    return BytesIO(read_zfile(file_handle))
  File "C:\scikit-learn-python2.7\sklearn\externals\joblib\numpy_pickle.py", line 65, in read_zfile
    length = int(length, 16)
ValueError: invalid literal for int() with base 16: '0x339698f          x'

The problem was caused by Python 2 reusing data files that had been fetched by Python 3.

So everything is all right, I think it can be merged.

Ping @larsmans

@coveralls

Coverage Status

Coverage increased (+0.02%) when pulling 1d19e7b on nmayorov:doc_explicit_utf8 into 8d82d2a on scikit-learn:master.

@nmayorov nmayorov changed the title DOC, FIX: Explicit encoding for opened files in gen_rst.py [MRG] DOC, FIX: Support Python 2 and 3 in gen_rst.py Oct 17, 2014
@nmayorov
Contributor Author

Hey, @larsmans I think it can be merged, please take a look.

@amueller
Member

amueller commented Nov 6, 2014

So the problem with io.open is PEP 263, right?

If a Unicode string with a coding declaration is passed to compile(),
a SyntaxError will be raised

That is slightly annoying.
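
One way to sidestep the PEP 263 restriction quoted above, offered as a sketch rather than as what gen_rst.py does: hand compile() the undecoded bytes, so the compiler reads the coding declaration itself. The source snippet and the `<example>` file name are made up.

```python
# Source bytes that carry their own coding declaration (hypothetical example).
source_bytes = u"# -*- coding: utf-8 -*-\nname = u'Ga\u00ebl'\n".encode("utf-8")

# Given bytes, compile() applies the declared encoding; no manual decoding
# happens, so no unicode string with a coding declaration is ever passed in.
namespace = {}
exec(compile(source_bytes, "<example>", "exec"), namespace)

name = namespace["name"]
```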

@nmayorov
Contributor Author

nmayorov commented Nov 6, 2014

I don't think the link is relevant.

The problem is that the default string types are different in PY2 and PY3 (str <-> bytes, unicode <-> str). In Python 3 a file's lines are read and decoded (so we have to provide an encoding), whereas in Python 2 they are just read as bytes.

The best strategy (as I figured) is to work with default str type in both versions (but they are different types actually), that's why I added conditional opens.

@amueller
Member

amueller commented Nov 6, 2014

Interesting, then you ran into a different error on python 2 than I did. For me the reading worked fine using io.open(fname, encoding='utf-8'), just evaling unicode with a coding declaration gave an error.

@nmayorov
Contributor Author

nmayorov commented Nov 7, 2014

Initially I ran into problems using Python 3. So I modified open(file) to open(file, encoding='utf-8'), but of course it broke Python 2. I tried to use io.open in Python 2, but then unicode and str started conflicting all over the place. So I decided to simply use different open statements in different versions.

@amueller
Member

amueller commented Nov 7, 2014

The current solution is fine with me, but I'm not the expert ;)

@ogrisel
Member

ogrisel commented Nov 21, 2014

@nmayorov what about never decoding and just using byte strings (str under Python 2 and bytes under Python 3) with open(filename, 'rb')? Would that work?

@GaelVaroquaux
Member

@Titan-C, you want to follow that, for
https://github.com/sphinx-gallery/sphinx-gallery

@nmayorov
Contributor Author

@ogrisel:

In theory you could do that, but it's going to cause a lot of conflicts in all the sources interacting with gen_rst.py (for starters you'll have to add a b prefix to every string literal in the code, and so on).

I see it as follows: everything was working well except when encoding gets wrong, so let's fix it and leave the rest intact.
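
To illustrate the trade-off of the bytes-only approach under Python 3, here is a small sketch with a hypothetical example file. Reading with `'rb'` avoids decoding entirely, but every string the lines are compared against must then be a bytes literal.

```python
import os
import tempfile

# Hypothetical example file with a non-ASCII character in its docstring.
path = os.path.join(tempfile.mkdtemp(), "plot_example.py")
with open(path, "wb") as f:
    f.write(u'"""\nTitle by Ga\u00ebl\n"""\n'.encode("utf-8"))

# Binary mode: lines are bytes on Python 3 (str on Python 2), never decoded.
with open(path, "rb") as f:
    lines = f.readlines()

# Comparisons against these lines need a bytes literal.
is_docstring_start = lines[0].startswith(b'"""')

# On Python 3, mixing in a text literal raises TypeError.
try:
    lines[0].startswith('"""')
    mixing_failed = False
except TypeError:
    mixing_failed = True
```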

@nmayorov
Contributor Author

@ogrisel could you consider merging this?

It's a very small patch, I think it's totally fine. Perhaps it doesn't affect many people, but still a bug. And the project definitely doesn't need another forever hanging pull request (there are already too many imho).

@jnothman
Member

I think it's fine to merge, especially given that it's likely to change once we adopt sphinx-gallery.

@amueller amueller added the Bug label Jan 16, 2015
@amueller amueller added this to the 0.16 milestone Jan 16, 2015
@lesteve
Member

lesteve commented Mar 3, 2015

I rebased on master, fixed the minor merge conflict and removed a few trailing spaces in this branch.

I regenerated the doc from scratch locally for both python2 and python3 and checked the examples gallery visually and everything seems to work fine AFAICT.

For completeness here is a quick way to reproduce the original problem (only fails inside a python3 environment):

mv examples{,_bak}
mkdir examples
cp examples_bak/{plot_digits_pipe.py,README.txt} examples
cd doc && make clean
LANG=fr_FR LC_CTYPE=fr_FR LC_ALL=fr_FR make html

Output:

~/dev/scikit-learn/doc $ LANG=fr_FR LC_CTYPE=fr_FR LC_ALL=fr_FR make html
# These two lines make the build a bit more lengthy, and the
# the embedding of images more robust
rm -rf _build/html/_images
#rm -rf _build/doctrees/
sphinx-build -b html -d _build/doctrees   . _build/html/stable
Making output directory...
Running Sphinx v1.2.3
loading pickled environment... failed: [Errno 2] No such file or directory: '/home/lesteve/dev/scikit-learn/doc/_build/doctrees/environment.pickle'

Encoding error:
'ascii' codec can't decode byte 0xc3 in position 422: ordinal not in range(128)
The full traceback has been saved in /tmp/sphinx-err-wrv58ba5.log, if you want to report the issue to the developers.
make: *** [html] Error 1
Command exited with non-zero status 2
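
The same class of error can be reproduced in isolation. In this sketch, byte 0xc3 is the first byte of the UTF-8 encoding of "é", and decoding it with the ASCII codec fails in the same way as in the Sphinx output above (only the reported position differs).

```python
# UTF-8 encoding of U+00E9 is the two bytes 0xc3 0xa9.
data = u"\u00e9".encode("utf-8")

try:
    data.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)

# Decoding with the encoding the file was actually written in succeeds.
decoded = data.decode("utf-8")
```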

@ogrisel
Member

ogrisel commented Mar 3, 2015

Thanks for testing, @lesteve. I will merge this branch with your fix.

@ogrisel
Member

ogrisel commented Mar 3, 2015

Done! Thanks again @nmayorov and @lesteve!

@ogrisel ogrisel closed this Mar 3, 2015
@nmayorov nmayorov deleted the doc_explicit_utf8 branch September 1, 2015 04:46