[MRG+1]: Allow unicode in code and outputs #106

Merged
merged 11 commits into from
Mar 17, 2016

larsoner
Contributor

Previously, our code had unicode in the stdout which gave:

../examples/preprocessing/plot_maxwell_filter.py is not compiling:
Traceback (most recent call last):
  File "/home/larsoner/custombuilds/sphinx-gallery/sphinx_gallery/gen_rst.py", line 474, in execute_script
    my_stdout = my_buffer.getvalue().strip().expandtabs()
  File "/usr/lib/python2.7/StringIO.py", line 271, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 36: ordinal not in range(128)

Modified a test that fails on master and passes on this PR.

Works on my system (™) on Py3k and Python 2.7.

Closes #18.
Closes #19.
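
For readers unfamiliar with this failure mode, here is a minimal sketch (written for Python 3, not taken from the PR) that reproduces the same class of error. The byte 0xce is the first byte of the UTF-8 encoding of μ, and the ASCII codec, which Python 2 falls back to when byte strings and unicode strings are mixed, cannot decode it.

```python
# UTF-8 encodes "μ" (U+03BC) as the two bytes 0xce 0xbc.
encoded = "μ".encode("utf-8")

try:
    # ASCII only covers bytes 0-127, so 0xce cannot be decoded.
    encoded.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decoding with the codec that was actually used for encoding succeeds.
decoded = encoded.decode("utf-8")
```

The position reported in the message differs from the traceback above only because the sample string here is a single character.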

@larsoner larsoner changed the title FIX: Allow unicode in code and outputs WIP: Allow unicode in code and outputs Mar 14, 2016
@larsoner
Contributor Author

With the latest commit I get this:

        Adjusted coil positions by (μ ± σ): 1.2° ± 1.5° (max: 6.7°)
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 882, in emit
    stream.write(fs % msg.encode("UTF-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 36: ordinal not in range(128)
Logged from file maxwell.py, line 1698

And the example continues to run, but I still have some issues :(

@@ -124,7 +125,7 @@ def flush(self):
"""


-CODE_OUTPUT = """.. rst-class:: sphx-glr-script-out
+CODE_OUTPUT = u""".. rst-class:: sphx-glr-script-out

Member

We could start using this in our files, so we don't miss a string:

from __future__ import unicode_literals

Contributor Author

Member

unicode_literals removes the need to prefix every string with u; it makes every string literal unicode. My first guess was to use that instead of going through each string and adding the u prefix. But it is not perfect, and I could not get sphinx-gallery to run my examples with unicode_literals in gen_rst.py.

Contributor Author

Did you try putting it in while on this branch, or on a previous one? It might work okay with this one since the unicode reading is a bit more unified.

Contributor Author

(I put in the __future__ line, removed all u' and u" and it seems to work fine)

@Titan-C
Member

Titan-C commented Mar 14, 2016

I still have an old PR on unicode, #19. What I would rescue from it now is opening files with the Python codecs module instead of loading them as binary:

with codecs.open(filename, 'w', 'utf8') as file_content:
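
As a minimal sketch of the idea in the comment above (not the code from PR #19), the round trip below writes and reads a file through `codecs.open` so that encoding and decoding happen at the file boundary. The file name and the sample text are placeholders chosen for this illustration.

```python
import codecs
import os
import tempfile

text = u"Adjusted coil positions by (μ ± σ)"

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.rst")

# codecs.open encodes on write and decodes on read with the given codec,
# so the rest of the program only ever handles text, never raw bytes.
with codecs.open(path, "w", "utf-8") as file_content:
    file_content.write(text)

with codecs.open(path, "r", "utf-8") as file_content:
    round_tripped = file_content.read()
```

On current Python versions the built-in `open(path, encoding="utf-8")` does the same job; `codecs.open` is used here only because it is what the comment mentions.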

@larsoner
Contributor Author

larsoner commented Mar 14, 2016 via email

@Titan-C
Member

Titan-C commented Mar 14, 2016

I tested this on my personal examples in Python 2. It is actually a pain to get unicode to work every time, because it can fail when docstrings are extracted, while executing, or when writing the rst file. And I'm not entirely sure how to work around this.

I also noticed that it makes a huge difference for the test case whether the file is loaded from disk or the input comes from within the test script, as it does now.

@larsoner
Contributor Author

I'm not entirely sure how to work around this.

I think we need to make sure unicode is used properly everywhere. It's a bit annoying to get right but it should be possible.

I also noticed that it makes a huge difference for the test case whether the file is loaded from disk or the input comes from within the test script, as it does now.

So I should add another test, then, I take it?

@larsoner
Contributor Author

I also noticed that it makes a huge difference for the test case whether the file is loaded from disk or the input comes from within the test script, as it does now.

I don't quite get this actually. The test script writes a file to disk, and tests it. Shouldn't that cover the use case you mention? If not, can you make a small test that fails, and I can work on fixing it?

@larsoner larsoner changed the title WIP: Allow unicode in code and outputs MRG: Allow unicode in code and outputs Mar 14, 2016
@larsoner
Contributor Author

Okay @Titan-C, I switched to using codecs where possible. Tests pass over here on 2.7 and 3.4, and things render properly for our repo. Ready to go from my end.

If you see degenerate cases, could you try to turn them into failing tests, and either post them here or open a PR into my branch? Without seeing those bits of code, it will be hard for me to get it right.

@agramfort
Contributor

one travis build is not happy

@larsoner
Contributor Author

Yeah I saw that, but it looks unrelated...?

@larsoner
Contributor Author

(Some error with Pygments)

@larsoner
Contributor Author

@Titan-C the example renders fine now, see what you think

print('pass')

###############################################################################
# And then:

Contributor Author

@Titan-C if you know something more intelligent to put here let me know :)

Member

something very smart from my notes

# -*- coding: utf-8 -*-
r"""
=================================================
Some Quantum Mechanics, filling an atomic orbital
=================================================

Considering an atomic single orbital and how to fill it by use of the
chemical potential. This system has a four element basis, :math:`B =
\{ \lvert \emptyset \rangle, \lvert \uparrow \rangle, \lvert
\downarrow \rangle, \lvert \uparrow\downarrow \rangle \}`, that is the
empty orbital, one spin up electron, one spin down electron and the
filled orbital.

The environment of the orbital is set up by an energy cost for
occupying the orbital, that is :math:`\epsilon` and when both
electrons meet a contact interaction corresponding to the Coulomb
repulsion :math:`U`. Finally the chemical potential :math:`\mu` is
what allows in the Grand canonical picture, to fill up our atomic
orbital from a reservoir of electrons.

The simple Hamiltonian to model this system is given by:

.. math::
   \mathcal{H} =
        \sum_{\sigma=\uparrow,\downarrow} \epsilon c^\dagger_\sigma c_\sigma
       + Un_\uparrow n_\downarrow - \mu \hat{N}

Here :math:`c^\dagger,c` are the creation and annihilation operators,
:math:`n=c^\dagger c`, and
:math:`\hat{N}=n_\uparrow+n_\downarrow`. This Hamiltonian is diagonal
in the basis of particle number we have chosen earlier, as the basis
elements are also eigenvectors.

.. math::
   \mathcal{H} \lvert \emptyset \rangle &= 0 \\
   \mathcal{H} \lvert \uparrow  \rangle &= (\epsilon - \mu) | \uparrow  \rangle \\
   \mathcal{H} \lvert \downarrow  \rangle &= (\epsilon - \mu) | \downarrow  \rangle \\
   \mathcal{H} \lvert \uparrow\downarrow \rangle &= (2\epsilon - 2\mu +U) \lvert \uparrow\downarrow \rangle

It is easy to see that the system will prefer to be empty if
:math:`\mu \in [0,\epsilon)`, be singly occupied if :math:`\mu \in (\epsilon, \epsilon +U)`,
and doubly occupied if :math:`\mu > \epsilon +U`.

For a more rigorous treatment, the partition function has to be calculated and then
the expected particle number can be found. We introduce a new variable
:math:`\xi = \epsilon - \mu`, and :math:`\beta`, the
inverse temperature of the system.

.. math::
   \mathcal{Z} &= Tr(e^{-\beta \mathcal{H}}) = 1 + 2e^{-\beta\xi} + e^{-\beta(2\xi + U)} \\
   \langle \hat{N} \rangle &= \frac{1}{\beta} \frac{\partial}{\partial \mu} \ln \mathcal{Z}
"""

import matplotlib.pyplot as plt
import numpy as np
mu = np.linspace(0, 3, 800)
for b in [10, 20, 30]:
    n = 2 * (np.exp(b * (mu - 1)) + np.exp(b * (2 * mu - 3))) / \
        (1 + np.exp(b * (mu - 1)) * (2 + np.exp(b * (mu - 2))))
    plt.plot(mu, n, label=r"$\beta={}$".format(b))
plt.xlabel(r'$\mu$ ($\epsilon=1$, $U=1$)')
plt.ylabel(r'$\langle N \rangle=\langle n_\uparrow \rangle+\langle n_\downarrow\rangle$')
plt.legend(loc=0)
plt.show()
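
As a sanity check on the example above (an illustrative sketch, not part of the PR), the closed-form occupation used in the plotting loop can be compared against a numerical derivative of ln Z from the docstring, with ε = 1 and U = 1 as in the axis label. The function names are hypothetical, and only the standard library is used.

```python
import math

EPSILON, U = 1.0, 1.0  # same values as in the plot's x-axis label


def log_partition(mu, beta):
    """ln Z for the single orbital, with xi = epsilon - mu."""
    xi = EPSILON - mu
    return math.log(1 + 2 * math.exp(-beta * xi)
                    + math.exp(-beta * (2 * xi + U)))


def occupation_closed_form(mu, beta):
    """The expression evaluated in the plotting loop of the example."""
    return 2 * (math.exp(beta * (mu - 1)) + math.exp(beta * (2 * mu - 3))) / \
        (1 + math.exp(beta * (mu - 1)) * (2 + math.exp(beta * (mu - 2))))


def occupation_numeric(mu, beta, h=1e-6):
    """(1 / beta) * d(ln Z)/d(mu), by central finite difference."""
    return (log_partition(mu + h, beta)
            - log_partition(mu - h, beta)) / (2 * h * beta)


# Largest disagreement between the two routes over a few sample points.
max_gap = max(abs(occupation_closed_form(mu, 10) - occupation_numeric(mu, 10))
              for mu in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
```

If the two agree to within the finite-difference error, the plotted curve is consistent with the expression for the expected particle number given in the docstring.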

@agramfort
Contributor

thanks heaps @Eric89GXL !

@@ -527,7 +531,7 @@ def execute_script(code_block, example_globals, image_path, fig_count,

# Breaks build on first example error

-if gallery_conf['abort_on_example_error']:
+if gallery_conf.get('abort_on_example_error', True):

Member

The default is False and is set in gen_gallery.py.

Contributor Author

It's not always set -- in our tests, if I had something raise an error at an appropriate time, I got an error here for not having this property

Member

It's not always set -- in our tests

You are right. It is not set in the default dictionary, but in the gallery build configuration. We certainly want tests to fail immediately, but by default the gallery should continue building even if some examples fail. I'll have to keep track of this in #97, where I'm writing a helper function for the tests to set defaults.

One thing I do prefer is to have the defaults all in one place and not scattered in the code. For now I think we can leave it like that.

Contributor Author

Yeah I agree about keeping them in one place. I'll add a comment that it should be unified if possible.
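
To make the trade-off in this thread concrete, here is a small sketch of the "defaults in one place" approach. Only the key name `abort_on_example_error` comes from the diff; `DEFAULT_GALLERY_CONF` and `resolve_conf` are hypothetical names, not the project's actual API.

```python
# Hypothetical defaults dictionary; the key name comes from the diff above,
# everything else here is illustrative only.
DEFAULT_GALLERY_CONF = {"abort_on_example_error": False}


def resolve_conf(user_conf):
    """Return a full configuration: defaults overridden by user settings."""
    conf = dict(DEFAULT_GALLERY_CONF)
    conf.update(user_conf)
    return conf


partial_conf = {}  # e.g. a test that never set the key

# Direct indexing raises KeyError when the key was never set ...
try:
    partial_conf["abort_on_example_error"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# ... whereas resolving against the defaults first always yields a value.
resolved = resolve_conf(partial_conf)
```

Compared with calling `.get(key, default)` at each use site, this keeps a single source of truth for the default, which is the preference expressed above.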

@larsoner
Contributor Author

@Titan-C merged your PR, any other comments or does it work correctly for you now?

@Titan-C
Member

Titan-C commented Mar 17, 2016

It works for my known cases.
+1

@Titan-C Titan-C changed the title MRG: Allow unicode in code and outputs [MRG+1]: Allow unicode in code and outputs Mar 17, 2016
@Titan-C
Member

Titan-C commented Mar 17, 2016

This just came to mind: can you put a line about this in the CHANGES.rst file, please?

@@ -93,6 +97,18 @@ def flush(self):
self.file2.flush()


class MyBytesIO(BytesIO):

Member

Better name for the class ? I don't have any great suggestion I am afraid ...

Contributor Author

I couldn't think of a better one either, but if someone else has an idea I'm happy to change it. I used the My prefix because eventually it goes to a variable named my_buffer.

@@ -291,7 +291,7 @@ def setup(app):
# Do not pop up any mayavi windows while running the
# examples. These are very annoying since they steal the focus.
mlab.options.offscreen = True
except ImportError:

Member

This should probably not be part of this PR. I seem to remember that elsewhere there was an except Exception, since importing some libraries can raise all sorts of exceptions (that's what the comment says).

I can change this directly in master.

Contributor Author

I had to have it in order to test on my system, so yes, please put it in master. I can open another PR, or just modify it here if you don't mind having the orthogonal change here.

@lesteve
Member

lesteve commented Mar 17, 2016

My understanding is that your fix requires the example files to be encoded in utf-8. I am wondering how badly this assumption can backfire ...

My gut feeling is that this PR is a real improvement though.

@larsoner
Contributor Author

My understanding is that your fix requires the example files to be encoded in utf-8. I am wondering how badly this assumption can backfire ...

Well ASCII or UTF-8, yeah. Previously they had to be ASCII only, so it is an improvement even if it doesn't make it universal.
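
The "ASCII or UTF-8" point can be illustrated with a short sketch (the sample strings are arbitrary): every ASCII byte sequence is also valid UTF-8, so ASCII-only example files keep working, while a file saved in another 8-bit encoding such as Latin-1 may not decode.

```python
ascii_bytes = "plot_example".encode("ascii")
# Every ASCII byte sequence is also valid UTF-8 and decodes to the same text.
same_text = ascii_bytes.decode("utf-8")

# A Latin-1 encoded file is a different story: 0xe9 ("é" in Latin-1) is not
# a valid UTF-8 sequence on its own.
latin1_bytes = "café".encode("latin-1")
try:
    latin1_bytes.decode("utf-8")
    latin1_ok = True
except UnicodeDecodeError:
    latin1_ok = False
```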

@@ -39,7 +39,7 @@ install:
if [ "$PYTHON_VERSION" == "2.7" ]; then
conda install --yes --quiet mayavi;
conda upgrade --yes --all;
-conda upgrade --yes pyface;
+pip install --upgrade pyface;

Contributor Author

This is also an orthogonal change, but was necessary to make the CIs happy...

@larsoner
Contributor Author

Comments addressed

@larsoner
Contributor Author

Moved the orthogonal Exception change to #107

@larsoner larsoner mentioned this pull request Mar 17, 2016
super(MyBytesIO, self).write(data)

def getvalue(self):
return super(MyBytesIO, self).getvalue().decode('utf-8')

Member

This class is actually a bit weird since it derives from BytesIO but .getvalue returns a string.

It seems like this is working for me:

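from io import StringIO  # io.StringIO stores unicode text only


class NonUnicodeFriendlyStringIO(StringIO):
    def write(self, data):
        # Python 2: decode byte strings so the buffer only ever holds unicode
        if not isinstance(data, unicode):
            data = data.decode('utf-8')
        super(NonUnicodeFriendlyStringIO, self).write(data)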

and then later:

my_buffer = NonUnicodeFriendlyStringIO()
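
For comparison, here is a Python 3 sketch of the same idea, where `bytes` plays the role that non-unicode `str` plays in Python 2. The class name `DecodingStringIO` is hypothetical, and this is not the code that was merged.

```python
import io


class DecodingStringIO(io.StringIO):
    """Text buffer that also accepts UTF-8 encoded bytes (hypothetical name)."""

    def write(self, data):
        # Decode byte strings up front so the buffer only ever holds text.
        if isinstance(data, bytes):
            data = data.decode("utf-8")
        return super().write(data)


my_buffer = DecodingStringIO()
my_buffer.write("μ ± σ: ".encode("utf-8"))  # bytes, as legacy code might emit
my_buffer.write("1.2° ± 1.5°")              # text
captured = my_buffer.getvalue()
```

Deriving from a text buffer rather than `BytesIO` avoids the oddity noted above, since `getvalue()` then returns text by design.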

larsoner and others added 11 commits March 17, 2016 11:08
The conda virtualenv for Mayavi forces a version of pyface that clashes
with sphinx. The manual update of pyface within conda is no longer
enough to update to a new version that does not clash with sphinx. Thus
the update is forced through pip.

Mayavi is an experimentally supported use case of Sphinx-Gallery
For unicode testing purposes, an example that breaks the build was needed.
LaTeX in raw strings, with the \u from \uparrow, was
a good test.
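
A short sketch of why `\uparrow` makes a good stress test, shown here under Python 3 (the helper name `compiles` is hypothetical). In a non-raw text literal, `\u` starts a four-digit hex escape, so `\uparrow` is a syntax error; a raw literal keeps the backslash. On Python 2 the case is harder still, because raw unicode literals continue to interpret `\u` escapes.

```python
def compiles(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Non-raw text literal: "\u" must be followed by four hex digits, so this fails.
plain_ok = compiles('label = "$\\uparrow$"')

# Raw literal: backslashes are kept verbatim, so the LaTeX command survives.
raw_ok = compiles('label = r"$\\uparrow$"')
```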
@@ -505,7 +527,8 @@ def execute_script(code_block, example_globals, image_path, fig_count,
fig_count += 1 # raise count to avoid overwriting image

# Breaks build on first example error

# XXX This check can break during testing e.g. if you uncomment the
# `raise RuntimeError` by the `my_stdout` call, maybe use `.get()`?

Contributor Author

@lesteve here you go

Member

Thanks for this, we should probably fix it at one point.

@larsoner
Contributor Author

Changed the class and got rid of the .get() with a comment that should help the next dev take a look if they want

@lesteve
Member

lesteve commented Mar 17, 2016

OK LGTM, merging.

@lesteve
Member

lesteve commented Mar 17, 2016

Thanks a lot for the fix!

lesteve added a commit that referenced this pull request Mar 17, 2016
[MRG+1]: Allow unicode in code and outputs
@lesteve lesteve merged commit e2aaf4c into sphinx-gallery:master Mar 17, 2016
@larsoner larsoner deleted the unicode branch March 17, 2016 15:39
@larsoner
Contributor Author

Thanks for the quick reviews

@agramfort
Contributor

🍻 !
