
Add content-based recommendation-system for the example gallery #1081


Closed
ArturoAmorQ opened this issue Feb 15, 2023 · 11 comments · Fixed by #1125

@ArturoAmorQ
Contributor

In the example gallery of scikit-learn we (more or less) follow a logic of grouping examples by module, e.g. the clustering section groups examples concerning the sklearn.cluster module.

We could try to further divide the example gallery by classes and functions of a given module using subsections to help users focus on a given algorithm, e.g. distinguish between examples using sklearn.cluster.KMeans and, say, examples using sklearn.cluster.MiniBatchKMeans. This would be similar to the already existing structure (as shown by the former links) but would introduce redundancies, as a given example would belong to several subsections from several modules.

Instead we could have a recommender system based on similarity to automatically link to the most relevant related content. This could be introduced at the end of each example (see screenshot below). Maybe a nearest neighbors tf-idf of the symbols could do the job.

[Screenshot: Recommender_system_example]
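To make the "nearest neighbors tf-idf of the symbols" idea concrete, here is a minimal pure-Python sketch. It assumes each example has already been reduced to the list of API symbols it uses; the example file names, the `recommend` helper, and the smoothed idf weighting are illustrative choices, not something proposed in this thread.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each example name to a sparse {symbol: tf-idf weight} dict."""
    n_docs = len(docs)
    # Document frequency: the number of examples in which each symbol appears.
    df = Counter(sym for symbols in docs.values() for sym in set(symbols))
    vectors = {}
    for name, symbols in docs.items():
        tf = Counter(symbols)
        # Smoothed idf: symbols shared by many examples get a lower weight.
        vectors[name] = {
            sym: count * (math.log((1 + n_docs) / (1 + df[sym])) + 1)
            for sym, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(sym, 0.0) for sym, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(docs, n_neighbors=5):
    """Return, for each example, its n_neighbors most similar other examples."""
    vectors = tfidf_vectors(docs)
    result = {}
    for name, vec in vectors.items():
        scored = [
            (cosine(vec, other_vec), other)
            for other, other_vec in vectors.items()
            if other != name
        ]
        # Highest similarity first; ties are broken alphabetically.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[name] = [other for _, other in scored[:n_neighbors]]
    return result


# Hypothetical examples and the symbols each one uses.
examples = {
    "plot_kmeans_digits.py": ["sklearn.cluster.KMeans", "sklearn.decomposition.PCA"],
    "plot_mini_batch_kmeans.py": ["sklearn.cluster.KMeans", "sklearn.cluster.MiniBatchKMeans"],
    "plot_pca_iris.py": ["sklearn.decomposition.PCA", "sklearn.datasets.load_iris"],
    "plot_svm_margin.py": ["sklearn.svm.SVC"],
}
recommendations = recommend(examples, n_neighbors=2)
```

With this toy input, the example sharing `KMeans` ranks first for `plot_mini_batch_kmeans.py`. An example with no shared symbols still gets a list, so a real implementation would probably want a minimum-similarity cutoff.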

I believe that other libraries may benefit from such a feature, such as the matplotlib example gallery. Thoughts on this?

\cc @jklymak @GaelVaroquaux

@larsoner
Contributor

My quick, not-thought-out idea would be to combine tagging / #261 with some automated aggregation/grouping system. That way, when you add a new example/tutorial (or retrospectively want to classify one), you just add the appropriate labels, and sphinx-gallery (or whatever) automatically adds it in the right places.

@GaelVaroquaux
Contributor

GaelVaroquaux commented Feb 15, 2023 via email

@GaelVaroquaux
Contributor

GaelVaroquaux commented Feb 15, 2023 via email

@jklymak
Contributor

jklymak commented Feb 15, 2023

Matplotlib uses https://sphinx-gallery.github.io/dev/configuration.html#add-mini-galleries-for-api-documentation for the API back references. This seems a similar idea, and perhaps the same "see also" list? For our gallery entries we've been back referencing the api (see the bottom of https://matplotlib.org/stable/gallery/images_contours_and_fields/contour_demo.html, for instance), but this is done manually, so is not consistent across the library, though improving slowly.

ping @melissawm and @story645 who are perhaps getting a GSoD mentee to help with tagging the MPL examples.

@story645
Contributor

story645 commented Feb 15, 2023

Yeah I opened melissawm/sphinx-tags#33 on sphinx-tags for auto tagging on API - I think that should be doable on scraping.

@kolibril13 implemented some really dynamic search filtering in
https://github.com/kolibril13/plywood-gallery and I think it'd be really useful to also have that in sphinx gallery (especially if you want to combine w/ recommendations), possibly integrated w/ the tags as the initial auto-fills.

@melissawm
Contributor

I think this is a great idea and would be happy to help see it through!

This would mean an optional dependency on scikit-learn, from what I understand, for the

"Maybe a nearest neighbors tf-idf of the symbols could do the job."

Because the tags are directives, they should be pretty easy to autopopulate once the clusters and classification are identified by the algorithm. Would this be a PR to sphinx-gallery or to sphinx-tags?

@jklymak
Contributor

jklymak commented Feb 16, 2023

Ha, I missed that you were going to try to automate this somehow. That sounds like a research project first.

@melissawm
Contributor

I think these can be two different approaches: have a human classify the gallery if you can (probably the best option imo!), but have an automated clustering option if you prefer.

@ArturoAmorQ
Contributor Author

I do prefer something automated in sphinx-gallery that can be tuned to show, for instance, the 5 most relevant examples (5 nearest neighbors). Human-implemented tags can then be more flexible about the number of examples assigned to a cluster and the criteria for them.

@larsoner
Contributor

"I do prefer something automated on sphinx-gallery and that can be tuned to show, for instance, the 5 most relevant examples (5 nearest neighbors). Then human implemented tags can be more flexible about the number of examples assigned to a cluster and criteria for them."

So I think there are two separate issues:

  1. How examples are labeled in some way or considered similar to one another
  2. Which examples to recommend at the end of each example

I think (2) probably is in scope for SG. Thinking about the manual sphinx-tags case for (1), I think it's straightforward enough to include the N most similar examples in terms of tags or whatever using some suitable distance-based algorithm for (2).
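As a rough illustration of (2) for the manual-tags case, the sketch below ranks examples by Jaccard distance between their tag sets and keeps the N closest. The tag names, example names, and the `most_similar_by_tags` helper are made up for illustration; Jaccard is only one "suitable distance" among many.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; 0.0 means identical tag sets, 1.0 means disjoint."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def most_similar_by_tags(tags, n=3):
    """For each example, return the n other examples with the closest tag sets."""
    result = {}
    for name, own_tags in tags.items():
        others = [other for other in tags if other != name]
        # Smallest distance first; ties are broken alphabetically.
        others.sort(key=lambda other: (jaccard_distance(own_tags, tags[other]), other))
        result[name] = others[:n]
    return result


# Hypothetical examples with hand-assigned tags.
tags = {
    "contour_demo": {"contour", "colormap", "2D"},
    "contourf_demo": {"contour", "colormap", "2D", "filled"},
    "image_demo": {"image", "colormap", "2D"},
    "bar_chart": {"bar", "categorical"},
}
similar = most_similar_by_tags(tags, n=2)
```

Here `contour_demo` is paired first with `contourf_demo` (distance 0.25) and then with `image_demo` (distance 0.5).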

But when I see

"Instead we could have a recommender system based on similarity to automatically link to the most relevant related content"

I, like @jklymak, get a bit worried that what you're talking about implementing in SG is parsing of Python code + output to automatically label or compute "distances" between examples, i.e., solve problem (1) automatically. I think this has to be out of scope for SG because there are potentially a lot of ways to do this, and we don't have the maintenance bandwidth for it and for all the potential modifications people might have in mind down the road.

If you do indeed want to do this sort of "automated tagging", then one approach that could work nicely for division of maintenance between packages is:

  1. In SG we add a Sphinx event that occurs after all examples have been run, that gives four lists, all of length n_examples:

    1. list of input Python example files
    2. list of example labels from sphinx-tags (if used) extracted
    3. list of output RSTs generated
    4. list of lists of selected similar examples that will soon be linked to by adding them to the RST

    Then whatever modifications are made to the list of generated RSTs (e.g., modifying the RST itself) and to the list of lists of selected similar examples will be used to create the final output RST.

  2. In sklearn you write a little sphinx extension that hooks into this event, and modifies that last list in whatever way you want by parsing Python, RST, and sphinx-tags to decide the examples that should be linked

At the end of the day, the end user would need SG and sklearn installed, and add not just 'sphinx_gallery' to their Sphinx extensions but also 'sklearn.sphinxext.automated_sg_tagging' (or whatever), and all options/config/whatever for the automated system could be handled at the sklearn end (or in some other module entirely).
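For the end user, the setup described here might amount to a `conf.py` fragment like the following. Note that Sphinx-Gallery is enabled via the module path `sphinx_gallery.gen_gallery`; the second entry is the hypothetical extension name from the comment above and does not exist in scikit-learn.

```python
# conf.py (fragment)
extensions = [
    "sphinx_gallery.gen_gallery",
    # Hypothetical extension name taken from the comment above.
    "sklearn.sphinxext.automated_sg_tagging",
]
```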

I think this framework is general enough that it allows people to modify the end-of-page linked example lists in whatever way they want. It also makes things like modifying the generated RST easier than with the source-read Sphinx event.

@GaelVaroquaux
Contributor

GaelVaroquaux commented Feb 16, 2023 via email

6 participants