
API: Refactor image scraping #313


Merged (10 commits) Aug 13, 2018

Conversation

@larsoner (Contributor) commented Nov 8, 2017

This PR:

Provides potential end-user-derived solutions for:

Closes #316.

@larsoner (author) commented Nov 8, 2017

Looks like CircleCI has broken independently of my changes -- it updated numpy/scipy but apparently not Mayavi (since there is a numpy version number mismatch)

@larsoner (author) commented Nov 9, 2017

@choldgraf (Contributor):

This is a nice refactor! I'll take a closer look in the coming days (teaching a Software Carpentry workshop at SfN tomorrow).

A quick question: this doesn't handle the HTML viz condition I was talking about in #313 right? In that case I was talking about actually grabbing the generated HTML of libraries, to create pages like http://ipyvolume.readthedocs.io/en/latest/bokeh.html

@lesteve (Member) commented Nov 9, 2017

Just glancing at this, it looks quite neat! My main concern is something that I stated elsewhere: I am very reluctant to have captured pngs looking the same as captured figures. Maybe the capturer could output some rst, rather than just saving images to the right location?

@larsoner (author) commented Nov 9, 2017

I am very reluctant to have captured pngs looking the same as captured figures.

Hmm... how should they look different?

@larsoner (author) commented Nov 9, 2017

this doesn't handle the HTML viz condition I was talking about in #313 right?

Right, that would require a bigger refactoring. Maybe we need image_scrapers for Python plotting libs (matplotlib, mayavi, eventually maybe vispy) since these all have a common, simple interface, and object_scrapers where you need to define both the scraping behavior and the RST output since these have more things to define, like what @Titan-C talked about.

@lesteve (Member) commented Nov 9, 2017

Hmm... how should they look different?

Maybe something like this (i.e. one section per capturer, plus explicit filenames for the "saved files" capturer)? Better suggestions more than welcome!

[mockup image]

Having one section per capturer is a bit clearer than what we currently have. At the moment matplotlib and mayavi are mixed, and the order can potentially be confusing because matplotlib figures are always captured before mayavi ones. See the snapshot at the end of this message.

At the very least, for the saved-files capturer the filenames should be visible, to allow easy matching between the script (filenames) and the output (images). Without the filenames, the order of the saved-files images is likely not obvious.

Attaching the mockup .svg as text file, in case it is useful if someone wants to tweak it:
mockup.svg.txt

Snapshot for part of mayavi_examples/plot_3d.py:
[screenshot: mayavi example output order]
The matplotlib figure appears first in the output although it is plotted last in the example.

@larsoner (author) commented Nov 9, 2017

FWIW the current PR makes it so that the order in image_scrapers determines the order they get embedded -- if you do ['mayavi', 'matplotlib'] instead, then they will be mayavi-first, matplotlib-second. It's not ideal (ideally it would follow the order of the script), but I don't know how you'd do that without some very difficult and possibly fragile logic.

What you propose about having different sections is a bit of a backward-compat change (this PR preserves existing behavior). Is that okay to impose, given that it's an aesthetic change (technically it shouldn't break any code, just reorganize things visually)?
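The ordering behavior described here could be configured roughly like this in a project's `conf.py` (a sketch; the scraper names are the two built-ins this PR discusses, and the option name follows the PR's `image_scrapers` parameter):

```python
# conf.py (sketch): the order of entries in ``image_scrapers`` controls
# the order in which each scraper's captured images are embedded.
sphinx_gallery_conf = {
    'image_scrapers': ('mayavi', 'matplotlib'),  # mayavi images first
}
```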

@larsoner (author) commented Nov 9, 2017

Personally, I don't mind not having separate sections for each image type. After all, if people really want separate plots for separate types, they can always make separate code sections...

@GaelVaroquaux (Contributor) commented Nov 9, 2017 via email

@choldgraf (Contributor):

+1 to @larsoner as well. I'm not totally against the idea of different visual cues, but I don't think it needs to be in this PR.

@GaelVaroquaux (Contributor) commented Nov 9, 2017 via email

@larsoner (author) commented Nov 9, 2017

A different CSS class for the div seems like a good idea: it opens the possibility of doing that.

One future-compatible change I could make in this PR is to ensure that each scraper exposes a .name property. This would be 'matplotlib' and 'mayavi' for the existing ones. Then, eventually, in another PR we could give each image a scraper-specific CSS class when populating the RST text.
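The `.name` idea could look something like this (a sketch; the class, the `__call__` signature, and the CSS prefix are illustrative, not real Sphinx-Gallery API):

```python
# Sketch of the proposed ``.name`` property on a scraper callable,
# and how a later PR might use it to build a per-scraper CSS class.
class MatplotlibScraper:
    name = 'matplotlib'

    def __call__(self, block, block_vars, gallery_conf):
        # would save any open matplotlib figures and return how many
        return 0

scraper = MatplotlibScraper()
# fall back gracefully when a scraper does not define ``name``
css_class = 'sphx-glr-' + getattr(scraper, 'name', 'unknown')
```

The `getattr` fallback is what makes the property optional, matching the "if the callable has a .name property, use it; if not, don't" behavior discussed later in the thread.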

@choldgraf (Contributor):

+1 seems like a good compromise

@lesteve (Member) commented Nov 9, 2017

Personally, I don't mind not having separate sections for each image type. After all, if people really want separate plots for separate types, they can always make separate code sections...

Do I understand correctly that everybody agrees it's fine to force users to use notebook-like syntax (aka code cells) and essentially have one image output per cell, in order to avoid possible confusion in the output? It feels quite kludgy to me, I have to say, but if that's the consensus, fine.

I was hoping that the capturer contract would be a bit more than just saving an image in the right location. Returning some rst on top of saving the image was a suggestion, but maybe there's a better way of doing it. This way, if a user wants to implement a PNGScraper that emits some kind of text with the filename, they can do it.

@GaelVaroquaux (Contributor) commented Nov 9, 2017 via email

@larsoner (author):

Do I understand correctly that everybody agrees it's fine to force users to use notebook-like syntax (aka code cells) and essentially have one image output per cell, in order to avoid possible confusion in the output?

This is essentially the current behavior / status quo. I agree it's not ideal, but it's certainly nonetheless useful.

I was hoping that the capturer contract would be a bit more than just saving an image in the right location. Returning some rst on top of saving the image was a suggestion, but maybe there's a better way of doing it. This way, if a user wants to implement a PNGScraper that emits some kind of text with the filename, they can do it.

What if we say the contract in this PR is merely a "minimal contract" change?

  • The master contract is essentially "mpl/mayavi images are embedded automatically in the order [mpl, mayavi]".
  • The current PR says "you can automatically embed 1) mpl/mayavi images or any other custom images you choose, 2) in a user-specified (per-scraper) order".
  • In the future we can add other optional elements, such as a way to set the RST output instead of using the default image-embedding one ("you can embed whatever you want").

In the last case (what @Titan-C and @choldgraf have talked about, I think), maybe we will want to generalize the existing parameter to handle this, or maybe it will make more sense to add an entirely different class type. Either way, I don't think these options are precluded by merging this PR. The current proposal seems pretty future compatible, i.e., it shouldn't lead to overly clunky interfaces once the APIs of these more advanced contracts are decided. In the meantime, it opens up the possibility of fixing two existing problems: including saved images (#206) and including images from other viz libraries (my use case, VisPy).

In this light, even adding a .name property to populate the CSS doesn't need to be done here. When someone hits a use case for it, we can make it so that if the callable has a .name property, we add it to the CSS definitions; if not, we don't (current behavior).

@lesteve (Member) commented Nov 16, 2017

I have created a quick and dirty PR on top of yours to show that supporting rst in scrapers is not so much work on top of the work you have already done. See larsoner#3.

My main worry is that with the interface of the scraper where you only save images, there is no way to have a PNG saved image scraper where the ordering is obvious from just looking at the example HTML. People (like yt) may want to use this feature for capturing saved PNG images, they'll have multiple images in some cells and they'll be confused by the ordering. For an example where the output is confusing in yt, look at http://yt-project.org/docs/dev/cookbook/simple_plots.html#showing-and-hiding-axis-labels-and-colorbars.

Admittedly, this kind of development could be done in a further PR. I feel there is enough momentum behind this refactoring to have scrapers return rst. The risk of doing it in separate PRs is that it may not happen in the near future and then we'll have this non-optimal scraper interface for a while, and then we'll need to think about deprecating it, with all the pain it entails for both maintainers and users.

@larsoner (author):

Feel free to take over

@larsoner (author):

The risk of doing it in separate PRs is that it may not happen in the near future and then we'll have this non-optimal scraper interface for a while, and then we'll need to think about deprecating it, with all the pain it entails for both maintainers and users.

My point above is that I think that such future enhancements (returning the RST) are future-compatible with the current approach. For example, we can make it such that:

  • If the scraper passes back just an int (current PR), it automatically does cell_figure_rst([image_path.format(offset + ii)]). Even if we go your PR's route, we might want this behavior anyway, as it simplifies the user interface (no need for users to deal with RST in this case).
  • If it passes back a tuple, it should be (RST, count).
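The int-vs-tuple contract described in these bullets could be dispatched roughly as follows (a sketch; `embed_scraper_output` and the default image-embedding RST are placeholders from this discussion, not real API):

```python
def embed_scraper_output(result, image_path, offset):
    """Sketch of dispatching on what a scraper returns.

    ``image_path`` is assumed to be a format string for numbered images.
    - int: number of images saved; build the default image-embedding RST.
    - tuple: (rst, count), with user-supplied custom RST.
    """
    if isinstance(result, tuple):
        rst, count = result  # user supplied their own RST
    else:
        count = result  # plain int: generate default ``.. image::`` lines
        paths = [image_path.format(offset + ii) for ii in range(count)]
        rst = '\n'.join('.. image:: %s' % p for p in paths)
    return rst, count
```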

If we also require that people add a **kwargs option to their function, we guarantee some future compatibility too, as we can expand the argument list as much as we want later. We might want this even with your PR's version, actually.

Do you agree that this would be future compatible with your proof-of-concept PR without the need for a deprecation cycle?

supporting rst in scrapers is not so much work on top of the work you have already done

I can see that the proof of concept probably wasn't too much work. However, as I'm sure you know, there is more work involved in getting things fully fleshed out: working properly without any XXXs, fully tested, as backward compatible as possible (or getting people to agree to the changes, e.g., line by line), and supporting the use cases you, @choldgraf, and @Titan-C want. I'd rather someone motivated to work on this take over if possible. But if you don't buy my arguments about this being future compatible, I agree we shouldn't merge this PR, and I can try to find time to complete your proof of concept later (unless someone else wants to take over).

@lesteve (Member) commented Nov 16, 2017

I do get all of your points, of course... Basically, you say the minimal amount of change already enables a lot of use cases that weren't there before. I say: but we are so close to something generic that we should just implement the generic solution.

I am happy for others to jump in and give their opinions.

I am not so fond of a scraper interface that allows many different kinds of outputs, I would say. It makes the code harder to maintain, IMO.

Maybe I'll try to work on my PR-on-your-PR a bit more, tidy it up, and try to convince you that a scraper interface returning rst is within reach without much effort.

@choldgraf (Contributor):

My 2 cents: I think general image-scraper functionality would be a great addition, though I don't think it should block this PR. This effectively takes us from N=2 to N=3 image-production approaches, no? I think that's a valuable contribution in itself, and it shouldn't be impeded just because we want to go to N > 3.

If @larsoner is correct that this PR lays a foundation for general scraping functionality, why not merge this PR after the next sphinx-gallery release and then there will be one development cycle's worth of time to generalize it per @lesteve's suggestions.

@lesteve (Member) commented Nov 17, 2017

If @larsoner is correct that this PR lays a foundation for general scraping functionality, why not merge this PR after the next sphinx-gallery release and then there will be one development cycle's worth of time to generalize it per @lesteve's suggestions.

Thanks for this suggestion, sounds very reasonable to me.

@Titan-C (Member) commented Nov 17, 2017

If @larsoner is correct that this PR lays a foundation for general scraping functionality, why not merge this PR after the next sphinx-gallery release and then there will be one development cycle's worth of time to generalize it per @lesteve's suggestions.

I'm completely in favour of this PR making the matplotlib and mayavi capturers independent, and starting to build the logic for adding new capturers. I do not agree with putting a scraper suggestion in our documentation without mentioning that it is an experimental feature. In general I don't like scrapers, as they target everything; we need something that is specific to the output. As I commented in #208 (comment): what if there are other PNGs in the directory being scraped that are source PNGs to work on? Those will be moved as output.

For a later iteration we can indeed figure out how to capture any object and output their rst representation.

@larsoner (author):

Thanks for this suggestion, sounds very reasonable to me.

In that case I might rename the var to scrapers instead of image_scrapers.

In general I don't like scrapers as they target everything, we need something that is specific to the output.

I thought this at first (we'd probably want separate classes / config vars) but from the discussion about both picking images and embedding RST, I don't think we need to anymore. The current API is hopefully sufficiently extendable / future compatible to allow scraping and embedding other objects, too.

I do not agree putting in our documentation without mention of being an experimental feature a suggestion for a scraper.

I'll add something saying that it's a half-baked stub implementation that should be tailored to suit an actual use case.

what if there are other PNGs in the directory being scraped that are source PNGs to work on? Those will be moved as output.

In cases where people actually want PNGScraper-like functionality, they can devise workable logic based on their particular use cases. For example, one could configure the scraper to only look for *_glr.png, or maybe plot_example_name_py_saved_*.png (we'd need to provide the currently running example name for this), and correspondingly ensure the files are named that way. Or include an "ignores" list in the class; then the set-based checking the stub currently uses should just work, assuming examples only output images they want shown.

Hopefully after this PR is merged, @alexhuth, @ngoldbaum, or someone else who wants saved-image scraping can take the stub and flesh it out into something more generally useful, and we could include it as a proper class in SG.

So to me the todo list is:

  • rename image_scrapers -> scrapers
  • add requirement of **kwargs in scrapers for future compat
  • add current Python filename being parsed as arg to scraper
  • improve doc of PNG scraper stub to say it's incomplete / would need to be adapted to a particular use case
  • wait for release
  • merge
  • subsequent PRs to implement PNG scraping for real and add as SG class
  • subsequent PRs to implement other forms of scraping (e.g., HTML)

@lesteve (Member) commented Nov 17, 2017

add requirement of **kwargs in scrapers for future compat

Since this is a half-baked experimental feature I really don't think we should worry about future compat.

@larsoner (author):

Since this is a half-baked experimental feature I really don't think we should worry about future compat.

My plan is to implement a scraper for VisPy very soon after this PR. It would be nice if I didn't have to update it later. The **kwargs suggestion at least improves the odds of that, at what seems to be very little cost.
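The **kwargs idea amounts to this (a sketch; `vispy_scraper` and the `fname` argument are illustrative, not real API):

```python
# Requiring ``**kwargs`` in scraper signatures means the caller can
# pass new arguments later without breaking already-written scrapers.
def vispy_scraper(block, block_vars, gallery_conf, **kwargs):
    # silently ignore arguments we do not know about yet
    return 0

# today's call and a hypothetical future call both work unchanged:
vispy_scraper({}, {}, {})
vispy_scraper({}, {}, {}, fname='plot_demo.py')
```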

@lesteve (Member) commented Nov 17, 2017

My plan is to implement a scraper for VisPy very soon after this PR. It would be nice if I didn't have to update it later. The **kwargs suggestion at least improves the odds of that, at what seems to be very little cost.

It would be nice, but I don't think that's the kind of guarantee you should expect from half-baked experimental features :(. IMO the **kwargs just adds a quarter-baked promise on top of a half-baked experimental feature.

I think the scraper returning RST is just the way forward. This feels like the approach that will require the least combined effort from everyone involved. I'll try to find some time to make progress in this direction.

@larsoner (author):

that's not the kind of guarantee you should expect from half-baked experimental features

I agree with this statement, but I don't think of it as being half-baked ...

I think the scraper returning RST is just the way forward.

I assume that's why you prefer it. This does make the simple case of embedding images harder (a PNG scraper, a VisPy scraper, etc.), though. Are you opposed to allowing an int output to trigger automatic PNG-RST insertion, with a tuple of (int, str) reserved for custom RST?

@codecov-io commented Aug 6, 2018

Codecov Report

Merging #313 into master will decrease coverage by 0.04%.
The diff coverage is 94.64%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #313      +/-   ##
==========================================
- Coverage   95.38%   95.33%   -0.05%     
==========================================
  Files          27       29       +2     
  Lines        2013     2166     +153     
==========================================
+ Hits         1920     2065     +145     
- Misses         93      101       +8
Impacted Files Coverage Δ
sphinx_gallery/tests/test_gen_rst.py 98.36% <100%> (-0.36%) ⬇️
sphinx_gallery/gen_rst.py 97.21% <100%> (+1.13%) ⬆️
sphinx_gallery/tests/test_scrapers.py 100% <100%> (ø)
sphinx_gallery/gen_gallery.py 87.55% <80%> (-2.39%) ⬇️
sphinx_gallery/utils.py 95.12% <90.9%> (-4.88%) ⬇️
sphinx_gallery/scrapers.py 96.22% <96.22%> (ø)
sphinx_gallery/backreferences.py 97.54% <0%> (+0.81%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1d6015c...dd02d84. Read the comment docs.

@larsoner (author) commented Aug 7, 2018

Okay @lesteve I have refactored the code quite a bit, and updated the top-level description. See if you are satisfied by the description and API contracts described here (for image_scrapers and reset_modules):

https://537-25860190-gh.circle-artifacts.com/0/rtd_html/advanced_configuration.html

If so, then the code is ready for review/merge from my end.

@Titan-C (Member) left a review:

I really like this step toward being able to capture other objects.
Having the scraper output rst is the path forward, as stated earlier in this PR's comments.
I really like the idea of the image_path_iterator, instead of tracking the number of images captured.
Something that does come to mind is that in its current state it assumes we are only capturing PNG files, but that can later be overridden inside the scraper itself.
A bit on the speculative side: I'm also thinking of having a general structure for capturing things. For example, right now we capture STDOUT, but as mentioned earlier, some functions output nicer HTML representations; we might want to make that an option for capturing things. But in another PR.

By default, Sphinx-gallery will only look for :mod:`matplotlib.pyplot` figures
when building. However, extracting figures generated by :mod:`mayavi.mlab` is
also supported. To enable this feature, you can do::
Image scrapers
Member:

I would like to note that this is an experimental feature, both here in the title and in the description. This is something we are trying out in order to capture different output objects.

Contributor Author:

IIRC @lesteve wanted such a warning if we were not satisfied with the API. Do you think that it might need to change?

}

.. note:: The parameter ``find_mayavi_figures`` which can also be used to
extract Mayavi figures is **deprecated** in version 1.13+,
Member:

In our numbering we are at 0.2.0; 1.13+ is way in the future.


.. _reset_modules:

Resetting modules
Member:

As @lesteve always reminds me, we should have different features in different PRs.

Contributor Author:

I see these as a bit related, though, since if you don't want to use the mpl scraper then you probably also do not want to use the mpl resetter.

Contributor Author:

(And if you want to add your own scraper, you might need your own resetter.)

fig.savefig(current_fig, **kwargs)
figure_list.append(current_fig)
image_paths.append(image_path)
fig.savefig(image_paths[-1], **kwargs)
Member:

I somehow prefer to save the image first, then append the filepath to the list. Do you have any reason for this order? In my mind, if saving fails, then we don't have the image listed. In case of failure, of course, I expect an exception, and everything becomes irrelevant. I also see that later on you have a check to scan for the registered images.

Contributor Author:

I don't think the order matters. But since you prefer the other I can switch it.

e = mlab.get_engine()
for scene, image_path in zip(e.scenes, image_path_iterator):
image_paths.append(image_path)
mlab.savefig(image_paths[-1], figure=scene)
Member:

Same comment about the ordering of save and path recording.

Configuration and run time variables
gallery_conf : dict
Contains the configuration of Sphinx-Gallery
base_image_name = os.path.splitext(fname)[0]
Member:

This is an error from the rebase. The content on the left is the correct one

@larsoner
Copy link
Contributor Author

@Titan-C comments addressed. I also broadened the API to have the scrapers take the block and block_vars variables rather than just image_path_iterator, since custom scrapers might want to look for comments like # sg: nocapture or so.

@larsoner
Copy link
Contributor Author

... and I added a note about the API being experimental for custom scrapers.

@Titan-C (Member) left a review:

I'm good with this. In my PR #324, experimenting with Bokeh, I also ran into the need to have the save_figures function take the block and the execution globals dictionary. Taking the block_vars might not be enough as one needs to capture something in the state of the running program. That said, since these are experimental features, with the goal of capturing different objects more like a plugin, I'm good with merging this.

pngs = sorted(glob.glob(os.path.join(os.getcwd(), '*.png')))
image_names = list()
image_path_iterator = block_vars['image_path_iterator']
for png in pngs:
Member:

To avoid confusion, remove the count=0 line and iterate over pngs.

@larsoner (author):

Taking the block_vars might not be enough as one needs to capture something in the state of the running program.

block_vars is just a dict, so it seems like we could add a new entry to it that would allow you to access whatever you need.
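Such an extension could look like this (a sketch; the `example_globals` key and `state_aware_scraper` are hypothetical names, not real Sphinx-Gallery API):

```python
# Sketch: because ``block_vars`` is a plain dict, the example runner
# could add an entry exposing the state of the running example.
block_vars = {
    'image_path_iterator': iter(['img_1.png', 'img_2.png']),
    'example_globals': {'fig': object()},  # globals of the running example
}

def state_aware_scraper(block, block_vars, gallery_conf):
    globs = block_vars.get('example_globals', {})
    # capture one image only if the example actually created ``fig``
    return 1 if 'fig' in globs else 0
```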

@larsoner (author):

@choldgraf do you have time to look, too?

@choldgraf (Contributor) left a review:

I took a relatively quick pass, but unfortunately I don't have a ton of time because I'm headed to Europe to get married this week, and I'll be sorta offline for a few weeks after, as I'll be trying not to work as much as possible.

In general I think this looks good. There's still not a clear path in my mind toward how I would create my own custom scraper (or how I could add a scraper for something HTML-based instead of file-based), but I think we can spot-check this in later PRs. I'm always +1 on refactoring and cleaning things up, and I like that this generalizes these features to (potentially) new image-producing things. So if @Titan-C is +1, then I think we should 🚢 it.

_import_matplotlib()


def matplotlib_scraper(block, block_vars, gallery_conf):
Contributor:

I kinda feel like these scrapers should be in a different module (scrapers.py?). What do you all think? gen_rst is quite generic, and if there's a lot of scraper-specific code to be added, perhaps it's enough to live in its own file.

Contributor Author:

Agreed, I have refactored them into their own module now.

For example, a naive class to scrape any new PNG outputs in the
current directory could do, e.g.::

import glob
Contributor:

I must be honest that this explanation is not super clear to me, but I don't know how the scrapers work, so I'm not sure how I could improve it. However, I don't think it should block this PR, because it's a good start. We should open an issue about improving this documentation once it's merged.

Contributor Author:

Yes let's iterate on the docs, probably after one of the issues listed at the top gets addressed by a user

Member:

I'm not at all happy with the idea of having a scraper use glob. I think I already raised this concern in another issue. For example, what if people have PNG images as data input (scikit-image, maybe)? Then this might accidentally capture those. Or if one day we implement parallel builds, could this lead to problems if some examples save to disk at the same time? But this is an experimental feature, and we will iterate on the docs and inner mechanisms, so I can let this pass.



def clean_modules(gallery_conf, fname):
"""Remove/unload seaborn from the name space
Contributor:

no longer seaborn-specific, right?

does not want to influence in other examples in the gallery.
"""
for reset_module in gallery_conf['reset_modules']:
reset_module(gallery_conf, fname)
Contributor:

I assume this doesn't fail if one of the packages isn't installed, right?

Contributor Author:

Right now matplotlib is a hard requirement. Eventually we can remove this requirement if we want, and this PR at least makes that easier.

@larsoner (author):

@choldgraf comments addressed. scrapers is now a proper submodule:

https://543-25860190-gh.circle-artifacts.com/0/rtd_html/reference.html

@choldgraf (Contributor):

I'm +1 on a merge if AppVeyor becomes happy. @Titan-C, want to do the honors if you're OK with the scrapers module?

@Titan-C (Member) left a review:

I think this is fine. Better to get this in and start experimenting. I'm definitely changing the API of save_figures, as I need to pass an extra variable containing the state of the example being executed. I would also be happier if instead of scrapers we named this object capturers. But that's just me, and we can iterate later.


Successfully merging this pull request may close these issues.

First gallery plot uses .matplotlibrc rather than the matplotlib defaults
6 participants