MRG: Only rebuild necessary parts #448

Merged
merged 5 commits into from
Mar 4, 2019

Conversation

larsoner
Contributor

@larsoner larsoner commented Feb 25, 2019

This writes a few files with a .new extension and replaces the old one only if necessary (e.g., .py, .rst files) to avoid Sphinx rebuilding them. Most of the added lines are new tests adapted from #466.

@NicolasHug @jklymak can you see if it works for you?

Closes #449
Closes #446
Closes #400
Closes #395
Closes #394
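
The replace-only-if-changed idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `md5sum_of` and `replace_if_changed` are hypothetical, and it assumes the new content has already been written next to its target with a `.new` suffix.

```python
import hashlib
import os


def md5sum_of(path):
    """Return the hex MD5 digest of a file's bytes (hypothetical helper)."""
    with open(path, 'rb') as fid:
        return hashlib.md5(fid.read()).hexdigest()


def replace_if_changed(fname_new):
    """Move ``fname_new`` over its target only when the content differs.

    Returns True if the target was (re)written, False if it was left alone.
    An untouched target keeps its modification time, so a tool that looks
    at timestamps has no reason to treat it as changed.
    """
    if not fname_new.endswith('.new'):
        raise ValueError('expected a path ending in .new, got %r' % (fname_new,))
    fname_old = os.path.splitext(fname_new)[0]
    if os.path.isfile(fname_old) and md5sum_of(fname_old) == md5sum_of(fname_new):
        os.remove(fname_new)  # identical content: discard the new copy
        return False
    os.replace(fname_new, fname_old)  # new or changed: overwrite the target
    return True
```

Calling `replace_if_changed('index.rst.new')` would then either update `index.rst` or silently drop the `.new` file.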

@jklymak
Contributor

jklymak commented Feb 25, 2019

Does this need conf.py to have a change or does it just work?

@larsoner
Contributor Author

It should just work

@jklymak
Contributor

jklymak commented Feb 25, 2019

On sphinx-gallery:

  • first call: 11.44 s
  • second call: 4.4 s
  • change one tutorial: 4.8 s

Checking matplotlib; update in about 30 minutes 😉

@jklymak
Contributor

jklymak commented Feb 25, 2019

In matplotlib, after running git clean -xdf in the doc directory, I get an exception when I run
time make SPHINXOPTS= html (i.e. on the first run)

Exception occurred:
  File "/Users/jklymak/sphinx-gallery/sphinx_gallery/utils.py", line 98, in get_md5sum
    with open(src_file, 'rb') as src_data:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/jklymak/matplotlib/doc/api/_as_gen/matplotlib.pyplot.Subplot.examples.new'

Note that there are lots of *.new files in that directory, so not entirely clear what didn't work...

@NicolasHug
Contributor

On sklearn sphinx tells me that it's seeing about 60 changed files (instead of hundreds) and the second build runs in a few seconds.
In other words: looks like it works \o/

@jklymak
Contributor

jklymak commented Feb 25, 2019

I wonder if the above issue is something funky about the cases:

ls api/_as_gen/matplotlib.pyplot.Subplot*
matplotlib.pyplot.Subplot.examples              matplotlib.pyplot.subplot_tool.rst
matplotlib.pyplot.subplot.rst                   matplotlib.pyplot.subplots.examples
matplotlib.pyplot.subplot2grid.examples         matplotlib.pyplot.subplots.rst
matplotlib.pyplot.subplot2grid.rst              matplotlib.pyplot.subplots_adjust.examples.new
matplotlib.pyplot.subplot_tool.examples         matplotlib.pyplot.subplots_adjust.rst

Note that Subplot.examples corresponds with subplots.rst.

@codecov-io

codecov-io commented Feb 25, 2019

Codecov Report

Merging #448 into master will increase coverage by 0.06%.
The diff coverage is 97.76%.

@@            Coverage Diff             @@
##           master     #448      +/-   ##
==========================================
+ Coverage   96.15%   96.21%   +0.06%     
==========================================
  Files          29       29              
  Lines        2418     2564     +146     
==========================================
+ Hits         2325     2467     +142     
- Misses         93       97       +4
Impacted Files                          Coverage Δ
sphinx_gallery/binder.py                93.45% <ø> (ø) ⬆️
sphinx_gallery/docs_resolv.py           83.85% <ø> (ø) ⬆️
sphinx_gallery/notebook.py              100% <ø> (ø) ⬆️
sphinx_gallery/scrapers.py              96.22% <ø> (ø) ⬆️
sphinx_gallery/py_source_parser.py      95.18% <100%> (+0.11%) ⬆️
sphinx_gallery/tests/test_sorting.py    100% <100%> (ø) ⬆️
sphinx_gallery/gen_gallery.py           91.37% <100%> (-0.09%) ⬇️
sphinx_gallery/gen_rst.py               98.21% <100%> (-0.02%) ⬇️
sphinx_gallery/utils.py                 96.55% <100%> (+1.42%) ⬆️
sphinx_gallery/sorting.py               100% <100%> (ø) ⬆️
... and 3 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bfc6612...a03f72e.

@larsoner
Contributor Author

larsoner commented Feb 25, 2019

@jklymak I can't replicate that error in building the matplotlib doc. First build I get real 5m7.390s and second real 1m1.422s, no errors. And this is what I see on my file system:

$ ls api/_as_gen/matplotlib.pyplot.Subplot*
api/_as_gen/matplotlib.pyplot.Subplot.examples

I don't know why you see any other files since the case does not match. Are you on Windows? Maybe there is some file-case-insensitivity problem...?

Is it possible you are on some broken matplotlib branch and not clean latest master?

Looking at the code, I'm not sure how it's possible you see this error. Can you paste the full traceback? Is it in finalize_backreferences?

> Note that Subplot.examples corresponds with subplots.rst.

Regardless of where Subplot.examples is included in the RST files, the Subplot.examples file should be first written as an empty file, then with meaningful content as Subplot.examples.new, then renamed (if md5sum differs).

Nonetheless can you try the latest commit?

@larsoner
Contributor Author

Okay I'm thinking this is a file case sensitivity problem. There is plt.subplot and plt.Subplot. I'm not sure there is a clean fix for this, other than to say "don't build on windows" :(

I can make the code not produce an error, but the output will still not be correct.
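
To make the collision concrete, here is a small sketch that flags names differing only by letter case. It is an illustration under assumptions rather than what the pushed commit does: `find_case_collisions` is a hypothetical helper, and the file names are modeled on the listing earlier in this thread.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that differ only by letter case.

    Each returned group would map to a single file on a case-insensitive
    file system, so whichever name is written last wins.
    """
    groups = defaultdict(set)
    for name in names:
        groups[name.lower()].add(name)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


collisions = find_case_collisions([
    'matplotlib.pyplot.Subplot.examples',
    'matplotlib.pyplot.subplot.examples',
    'matplotlib.pyplot.subplots.examples',
])
print(collisions)
```

Only the `Subplot`/`subplot` pair is reported; `subplots` is a distinct name regardless of case handling.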

@larsoner
Contributor Author

Commit pushed to avoid the error and emit a warning.

@larsoner larsoner changed the title from "FIX: Only rebuild necessary parts" to "MRG: Only rebuild necessary parts" Feb 25, 2019
@larsoner
Contributor Author

Ready for review/merge from my end. @choldgraf feel free to have a look if you have time.

@jklymak
Contributor

jklymak commented Feb 25, 2019

I'm on a mac, so there are definite case issues (subplot = Subplot)....

@jklymak
Contributor

jklymak commented Feb 25, 2019

Works now, so long as I don't fail on warning for matplotlib. OTOH:

This PR:

  • 563s for first build,
  • 85s for second build no changes
  • 250s changed one file in tutorials; Most of time is spent in reading sources... api/_as_gen/*

Master:

  • 122s second build no changes; lots of time spent in reading sources...
  • 258s for one file changed in tutorials; Most of time is spent in reading sources... api/_as_gen/*

So, this doesn't particularly speed up incremental building (8s has to be in the noise compared to other stuff I'm doing on my machine)...

@larsoner
Contributor Author

Maybe your sphinx is old / bad at detecting changes? For me I get:

  • 311s for first build
  • 61s for second build no changes (almost all time in writing output)
  • 62s for third build after adding 1 char to the intro paragraph of tutorials/introductory/images.py (which SG reports as being rebuilt, and I can verify to be the case viewing the output)

I can confirm this behavior makes sense by:

$ ls -alt api/_as_gen/ | more
total 9356
-rw-r--r-- 1 larsoner larsoner  69798 Feb 25 16:43 matplotlib.pyplot.figure.examples
-rw-r--r-- 1 larsoner larsoner   8924 Feb 25 16:43 matplotlib.pyplot.colorbar.examples
-rw-r--r-- 1 larsoner larsoner   9462 Feb 25 16:43 matplotlib.pyplot.imshow.examples
-rw-r--r-- 1 larsoner larsoner    890 Feb 25 16:43 matplotlib.image.imread.examples
-rw-r--r-- 1 larsoner larsoner   2076 Feb 25 16:43 matplotlib.pyplot.hist.examples
-rw-r--r-- 1 larsoner larsoner      0 Feb 25 16:32 mpl_toolkits.mplot3d.Axes3D.text.examples
-rw-r--r-- 1 larsoner larsoner      0 Feb 25 16:32 mpl_toolkits.mplot3d.Axes3D.quiver.examples
-rw-r--r-- 1 larsoner larsoner      0 Feb 25 16:32 mpl_toolkits.mplot3d.Axes3D.bar.examples
...

You can clearly see only five .examples were changed (the ones used by that example) when I ran it for that third build.

Moreover, if instead of changing the introductory paragraph or title (which SG turns into a tooltip in the .examples, forcing it to change on the run) I change something in the main body text of the script (e.g., specific argument to specificc argument) there are no .examples files changed at all, so it has even less of an effect. But even changing a few of these .examples files should not be catastrophic.

Did you change some other example? If so, maybe it has worse consequences and we can track down why. Can you see if you can replicate what I see by just changing tutorials/introductory/images.py? If it builds fast for you in that case, I'm curious which example you changed that made it take so much time in the reading sources stage. If it does not build quickly for you, perhaps try updating Sphinx to see if it helps.

@larsoner
Contributor Author

... and as a further diagnostic, you can look at e.g. ls -alt tutorials/introductory/ to see that only a few files have changed (the *codeobj.pickle files will have changed but should not matter since Sphinx does not use these)
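
The same check can be scripted instead of reading `ls -alt` output by eye. A minimal sketch, assuming you record a timestamp just before the rebuild; `files_modified_since` is a hypothetical helper name, not part of sphinx-gallery.

```python
import os


def files_modified_since(directory, timestamp):
    """Return sorted names of files in ``directory`` modified after ``timestamp``.

    ``timestamp`` is seconds since the epoch, e.g. ``time.time()`` taken
    right before starting the build.
    """
    changed = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.stat().st_mtime > timestamp:
            changed.append(entry.name)
    return sorted(changed)
```

Running it on `api/_as_gen/` after an incremental build should list only the handful of files that were actually rewritten.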

@jklymak
Contributor

jklymak commented Feb 25, 2019

EDIT: This is all updated, my apologies if you have been following along....

I just updated to sphinx 1.8.4. This is on a Mac.

OK, that's bizarre:

Just changing tutorials/introductory/images.py was much quicker 67s:

building [mo]: targets for 0 po files that are out of date
building [html]: targets for 2264 source files that are out of date
updating environment: [] 0 added, 41 changed, 0 removed

If I change tutorials/intermediate/artists.py

building [mo]: targets for 0 po files that are out of date
building [html]: targets for 2264 source files that are out of date
updating environment: [config changed] 2264 added, 4 changed, 0 removed

and reading sources... takes forever, and the total build is ~200s.

If I change tutorials/intermediate/gridspec.py

building [mo]: targets for 0 po files that are out of date
building [html]: targets for 2264 source files that are out of date
updating environment: [config changed] 2264 added, 4 changed, 0 removed

and it takes ~200s

tutorials/introductory/pyplot.py: fast....

building [mo]: targets for 0 po files that are out of date
building [html]: targets for 2264 source files that are out of date
updating environment: [] 0 added, 58 changed, 0 removed

examples/lines_bars_and_markers/linestyles.py: fast

building [mo]: targets for 0 po files that are out of date
building [html]: targets for 2264 source files that are out of date
updating environment: [] 0 added, 40 changed, 0 removed

So.... Something funky about the tutorials/intermediate directory in matplotlib...

If I go back to Master:

examples/lines_bars_and_markers/linestyles.py: 105s

building [mo]: targets for 0 po files that are out of date
building [html]: targets for 2264 source files that are out of date
updating environment: [] 0 added, 816 changed, 0 removed

So this PR definitely speeds things up, quite a bit....

@jklymak
Contributor

jklymak commented Feb 25, 2019

The only weird thing about tutorials/intermediate is that it has two png files in it. Could those be triggering the big rebuild somehow?

$ ls -halt ../tutorials/intermediate/
total 140K
-rw-r--r--  1 jklymak staff  11K Feb 25 14:56 gridspec.py
-rw-r--r--  1 jklymak staff  28K Feb 25 14:52 artists.py
-rw-r--r--  1 jklymak staff  11K Feb 25 14:29 CL02.png
-rw-r--r--  1 jklymak staff  11K Feb 25 14:29 CL01.png
-rw-r--r--  1 jklymak staff  30K Feb 25 14:10 constrainedlayout_guide.py
-rw-r--r--  1 jklymak staff  11K Feb 25 11:56 tight_layout_guide.py
drwxr-xr-x 12 jklymak staff  384 Feb 25 11:56 .
-rw-r--r--  1 jklymak staff  11K Feb 25 11:56 legend_guide.py
-rw-r--r--  1 jklymak staff 9.7K Feb 25 11:56 imshow_extent.py
-rw-r--r--  1 jklymak staff 4.0K Feb 25 11:56 color_cycle.py
drwxr-xr-x  9 jklymak staff  288 Sep 25 08:52 ..
-rw-r--r--  1 jklymak staff  213 Sep 25 08:52 README.txt

@jklymak
Contributor

jklymak commented Feb 26, 2019

Confirmed, removing the *.png files

building [mo]: targets for 0 po files that are out of date
building [html]: targets for 2264 source files that are out of date
updating environment: [] 0 added, 43 changed, 0 removed

So, why do those png files trigger rebuilding 2264 source files if anything else in that directory is modified?

@larsoner
Contributor Author

No idea, but it sounds like this PR is at least working then

@jklymak
Contributor

jklymak commented Feb 26, 2019

@larsoner It's certainly much faster 👍 Thanks a lot for coming up with the correct solution!

OTOH, I am still not sure why it takes over 30 s to re-write all the html files. Some of the rest of the 60 seconds or so is taken up w/ cruft Matplotlib has added, but it seems that if the *.rst file doesn't change, then sphinx shouldn't re-generate the html.

I'll note that the sphinx-gallery build does the same thing - regenerates every html file on each make html.

@larsoner
Contributor Author

Yes agreed, but I think that's a Sphinx issue. In principle it's only supposed to write for files that are added or changed.

@jklymak
Contributor

jklymak commented Feb 26, 2019

OK, if I run make build on sphinx's docs the second time I get:

loading pickled environment... done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 0 source files that are out of date
updating environment: 0 added, 0 changed, 0 removed
looking for now-outdated files... none found
no targets are out of date.
build succeeded.

If I run it on sphinx-gallery the second time...

building [mo]: targets for 0 po files that are out of date
building [html]: targets for 46 source files that are out of date
updating environment: 46 added, 0 changed, 0 removed
reading sources... [100%] utils
/Users/jklymak/sphinx-gallery/doc/index.rst:113: WARNING: toctree contains reference to nonexisting document 'auto_mayavi_examples/index'
looking for now-outdated files... none found

and then it re-writes all the html files. Since there are only 46 of them, it's pretty fast, but for matplotlib, there are 4000 of them, so it takes a while.

A jaunt through the sphinx-doc code says that this comes about because somehow the .buildinfo file isn't correct: if self.build_info != buildinfo gets triggered in html.py, and it falls back to regenerating all the html files. I have no idea if this is because of something sphinx-gallery is doing, or one of the modules shared by sphinx-gallery and Matplotlib.

Feel free to open a new issue if this is the incorrect place to discuss this. This PR removes a big performance drop, but there are others that seem fixable...

@jklymak
Contributor

jklymak commented Feb 26, 2019

Removing that check in sphinx drops the matplotlib compile time from 71 s to 24 s. So the question is why does the buildinfo file get out of sync....

@larsoner
Contributor Author

At least in that case, it's probably because you changed the conf.py file. The contents of buildinfo for me are just:

# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: b87786b3dc86aa52b49b20323bb1e240
tags: 645f666f9bcd5a90fca523b33c5a78b7
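
For readers following along, the comparison being discussed can be mimicked with a few lines that read the `config` and `tags` hashes out of such a file. This is only a sketch of the idea and not Sphinx's own implementation; `parse_buildinfo` and `needs_full_rebuild` are hypothetical names.

```python
def parse_buildinfo(text):
    """Parse ``key: value`` lines from .buildinfo text, skipping # comments."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        key, _, value = line.partition(':')
        info[key.strip()] = value.strip()
    return info


def needs_full_rebuild(old_text, new_text):
    """Return True when the stored and freshly computed build infos differ."""
    return parse_buildinfo(old_text) != parse_buildinfo(new_text)
```

If the `config` hash computed for the current run differs from the stored one, every output file is considered stale, which matches the behavior described in this thread.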

@larsoner
Contributor Author

... and if I comment out sphinx_gallery.gen_gallery from the extensions, the config line changes, causing a full rebuild. That build takes 150 sec. A second build takes only 9 sec, though. So indeed something about including sphinx_gallery.gen_gallery makes it rebuild all outputs no matter what.

@jklymak
Contributor

jklymak commented Feb 26, 2019

What do you mean I changed the conf file? I don’t touch that between builds.

When you run make do you get 46 files added? If so then the buildfile is still not registering properly somehow.

Edit: run make on Sphinx-gallery. If you run make on matplotlib it says 4000 odd files.

@larsoner
Contributor Author

Ahh sorry I misunderstood what you did. I'll take a look to see what's causing the rebuild and see if I can address it here. I have a sneaking suspicion it's our "Sphinx hack" clean_gallery_out function. We are deleting all SG images. We should probably only delete ones for examples that we run (i.e., make it something we clean up before running a given example only, not something we always do). But in any case I think you're right that there is probably still something wrong at the SG end.

@larsoner
Contributor Author

larsoner commented Feb 26, 2019

@jklymak I think I have narrowed it down to these two conf.py entries:

    'subsection_order': gallery_order.sectionorder,
    'within_subsection_order': gallery_order.subsectionorder,

I'm guessing that it's because Sphinx can't tell if these config values have changed or not, so decides to re-write everything. With those two lines as they are in mpl master:

  • first run: 328s
    building [html]: targets for 876 source files that are out of date
    updating environment: 2824 added, 0 changed, 0 removed
    Sphinx-gallery successfully executed 521 out of 521 files subselected by:
    
  • second run: 70s
    building [html]: targets for 2824 source files that are out of date
    updating environment: [] 0 added, 35 changed, 0 removed
    Sphinx-gallery successfully executed 0 out of 0 files subselected by:
    
  • change tutorials/intermediate/artists.py: 73s
    building [html]: targets for 2824 source files that are out of date
    updating environment: [] 0 added, 42 changed, 0 removed
    Sphinx-gallery successfully executed 1 out of 1 file subselected by:
    

If I comment out those two lines and do make clean, then:

  • first run: 328s
    building [html]: targets for 876 source files that are out of date
    updating environment: 2824 added, 0 changed, 0 removed
    Sphinx-gallery successfully executed 521 out of 521 files subselected by:
    
  • second run: 19s
    building [html]: targets for 34 source files that are out of date
    updating environment: [] 0 added, 35 changed, 0 removed
    Sphinx-gallery successfully executed 0 out of 0 files subselected by:
    
  • change tutorials/intermediate/artists.py: 24s
    building [html]: targets for 35 source files that are out of date
    updating environment: [] 0 added, 42 changed, 0 removed
    Sphinx-gallery successfully executed 1 out of 1 file subselected by:
    

@jklymak can you confirm?

If so, what we could do is allow these entries to be strings and do some importlib magic with them, rather than allowing them to be classes directly. That should allow sphinx to treat them as static, rather than unknowns. In this case making changes to your classes will not be reflected in the build, but this should be (much) rarer than the case of changing a single example, and also can be worked around with a simple make clean (or removing the quotes from the name) when you want to do this.
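
A rough sketch of the "importlib magic" idea, under the assumption that the config value would be given as a dotted string such as 'gallery_order.sectionorder'. The helper name `resolve_dotted_name` is hypothetical, and this is not necessarily how it would end up being implemented.

```python
import importlib


def resolve_dotted_name(value):
    """Return the object named by a dotted string; pass other values through.

    Accepting both forms keeps existing configs (which pass the object
    directly) working while allowing the plain-string form.
    """
    if not isinstance(value, str):
        return value
    module_name, _, attr = value.rpartition('.')
    if not module_name:
        raise ValueError('expected "module.attribute", got %r' % (value,))
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

The string stored in the config is identical from one run to the next, and the actual object is only looked up when the gallery is generated.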

@jklymak
Contributor

jklymak commented Feb 26, 2019

> @jklymak can you confirm?

I can confirm the above, except for the fact that my computer is slower than yours 😉

@jklymak
Contributor

jklymak commented Feb 26, 2019

> @jklymak can you see if the new config var is okay for you?

This works fine. Set to 'info', and MPL build works out of the box.

OTOH, this seems a bit hacky. I'm still a little confused about what is happening; I'm on APFS, which is case-aware, but probably case insensitive because I migrated from HFS+ on this machine. It seems that the code to make the "*.new" files might be able to figure out that "Subplot" should be "subplot"? It would be a strange library that had case-conflicts in their module names like that.

@larsoner
Contributor Author

> OTOH, this seems a bit hacky.

It seems like the right behavior to warn by default, and the mechanism added here (dict of override values) is pretty standard (see np.errstate and similar functionality).

A sphinx build on a case insensitive file system with a lib that requires case sensitivity in its submodules seems like it must break autosummary already. Do the docs for plt.subplot and plt.Subplot actually look correct on your system? If so, we can look into what autosummary does to deal with this problem, but not in this PR. It should be a pretty rare corner case (libs that do this are not common AFAIK, and devs building on Windows, the most frequent of case-insensitive file systems, are fairly rare, too -- so the intersection shouldn't be huge).

@jklymak
Contributor

jklymak commented Feb 26, 2019

Just to be clear, I'm on a mac... 😉

@jklymak
Contributor

jklymak commented Feb 27, 2019

OK, well maybe this is a different bug:

  • there is no pyplot.Subplot in matplotlib, it's pyplot.subplot.
  • For some reason on my machine, sg makes /Users/jklymak/matplotlib/doc/api/_as_gen/matplotlib.pyplot.Subplot.examples.

I tracked this down to a few cases where matplotlib gallery entries called plt.Subplot instead of plt.subplot. Removing those un-broke the doc build for me. So... Maybe this warning should stay in, because it indicated a bug?

@larsoner
Contributor Author

Yes I think warn is the correct default, but I don't mind starting a log_level config var to control the logging levels. It's only a few lines and people might find it useful.
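
A dict-of-overrides config along these lines could look like the sketch below. The category name `backreference_missing` and the helper `resolve_log_levels` are assumptions made for illustration; only the `log_level` name and the 'info' value appear in this thread.

```python
import logging

# Hypothetical defaults: each known message category starts at WARNING.
DEFAULT_LOG_LEVELS = {'backreference_missing': 'warning'}


def resolve_log_levels(overrides=None):
    """Merge user overrides into the defaults and map names to logging levels."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_LOG_LEVELS)
    if unknown:
        raise ValueError('unknown log_level keys: %s' % sorted(unknown))
    merged = dict(DEFAULT_LOG_LEVELS, **overrides)
    return {key: getattr(logging, name.upper()) for key, name in merged.items()}
```

With no overrides the message stays a warning; passing `{'backreference_missing': 'info'}` downgrades it so that a build which fails on warnings keeps going.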

    assert method in ('move', 'copy')
    if fname_old is None:
        assert fname_new.endswith('.new')
        fname_old = fname_new[:-4]

small nitpick, in my experience this is a bit more dangerous than just doing os.path.splitext(fname_new)[0]...it seems trivial but I've been bitten by it before when I change code in the future
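
The difference only shows up once the suffix stops being exactly four characters long, which is the "change code in the future" case. A small sketch; the function names and the `.fresh` suffix are made up for the example.

```python
import os


def strip_by_slicing(fname_new):
    """Drop the last four characters, assuming the suffix is exactly '.new'."""
    return fname_new[:-4]


def strip_by_splitext(fname_new):
    """Drop the final extension, whatever its length."""
    return os.path.splitext(fname_new)[0]


# Both agree while the suffix is '.new' ...
print(strip_by_slicing('index.rst.new'), strip_by_splitext('index.rst.new'))
# ... but only splitext keeps working if the suffix is later renamed.
print(strip_by_slicing('index.rst.fresh'), strip_by_splitext('index.rst.fresh'))
```

The slicing version silently returns a mangled name after the rename, whereas the `splitext` version still yields `index.rst`.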

@choldgraf
Contributor

this looks really nice to me in general - and to be clear, it sounds like this doesn't require any new behavior on the part of the user, yeah? If that's the case, once we're sure that @jklymak is happy with the current behavior and since tests are passing, then I'm +1 on merge.

(I added a small nitpick comment but it shouldn't block on merging)

@jklymak
Contributor

jklymak commented Mar 3, 2019

Seems great to me!

@choldgraf
Contributor

will merge tomorrow unless anybody speaks up otherwise! :-)

@larsoner
Contributor Author

larsoner commented Mar 4, 2019

> this doesn't require any new behavior on the part of the user, yeah?

Correct.

> in my experience this is a bit more dangerous than just doing ...

Done!

@choldgraf
Contributor

had another appveyor config conflict...I think I resolved it properly, let's see if tests are happy then I'll merge if so!

@choldgraf
Contributor

ok tests are happy so I'm 🚢 ing it

@choldgraf choldgraf merged commit 9d8a1af into sphinx-gallery:master Mar 4, 2019
@jklymak
Contributor

jklymak commented Mar 4, 2019

Thanks a lot @larsoner - this will really speed things up!

@larsoner larsoner deleted the test-rebuild branch March 4, 2019 19:50