Add content-based recommendation-system for the example gallery #1081
Comments
My quick, not-thought-out idea would be to combine tagging / #261 with some automated aggregation/grouping system. That way when you add a new example/tutorial (or retrospectively want to classify one) you just add the appropriate labels, and sphinx-gallery (or whatever) automatically adds it in the right places |
> combine tagging / #261 with some automated aggregation/grouping system.
Right: listing the examples with the same tags.
That's a good idea, though the "backend" differs. What can be shared is the "frontend" (the HTML + CSS code to display the lists).
|
Also, I forgot to say: I'm super enthusiastic about the idea, as it will enable us to cross-link examples without having lists to maintain.
|
Matplotlib uses https://sphinx-gallery.github.io/dev/configuration.html#add-mini-galleries-for-api-documentation for the API back references. This seems a similar idea, and perhaps the same "see also" list? For our gallery entries we've been back referencing the API (see the bottom of https://matplotlib.org/stable/gallery/images_contours_and_fields/contour_demo.html, for instance), but this is done manually, so it is not consistent across the library, though it is improving slowly. ping @melissawm and @story645 who are perhaps getting a GSoD mentee to help with tagging the MPL examples. |
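For readers unfamiliar with the mini-gallery feature linked above, a minimal `conf.py` sketch follows. The keys `doc_module` and `backreferences_dir` are the sphinx-gallery options described on that configuration page; the paths and the package name `mypackage` are placeholders, not the settings of Matplotlib or any other project.

```python
# conf.py (sketch): all paths and the package name are hypothetical placeholders
extensions = ["sphinx_gallery.gen_gallery"]

sphinx_gallery_conf = {
    # Where the example scripts live and where the rendered gallery is written
    "examples_dirs": "../examples",
    "gallery_dirs": "auto_examples",
    # Modules whose functions and classes should get a mini-gallery
    # listing the examples that use them
    "doc_module": ("mypackage",),
    # Directory that receives the generated back-reference files
    "backreferences_dir": "gen_modules/backreferences",
}
```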
Yeah I opened melissawm/sphinx-tags#33 on sphinx-tags for auto tagging on API - I think that should be doable on scraping. @kolibril13 implemented some really dynamic search filtering in |
I think this is a great idea and would be happy to help see it through! This would mean an optional dependency on scikit-learn, from what I understand, for the
Because the tags are directives, they should be pretty easy to autopopulate once the clusters and classification are identified by the algorithm. Would this be a PR to sphinx-gallery or to sphinx-tags? |
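As a rough illustration of the "autopopulate" step, the sketch below turns labels produced by some clustering step into a tags directive line. It assumes the comma-separated `.. tags::` directive form used by sphinx-tags; the helper `render_tags_directive` and the file names are hypothetical, and nothing here is part of either package.

```python
def render_tags_directive(tags):
    """Return an RST line such as '.. tags:: clustering, kmeans'.

    Returns an empty string when there is nothing to tag.
    """
    # Deduplicate and sort so the generated RST is stable between builds
    unique = sorted({tag.strip() for tag in tags if tag.strip()})
    if not unique:
        return ""
    return ".. tags:: " + ", ".join(unique)


# Hypothetical output of a clustering/classification step
cluster_labels = {
    "plot_kmeans.py": ["kmeans", "clustering", "kmeans"],
    "plot_pca.py": ["decomposition"],
}

directives = {
    name: render_tags_directive(labels)
    for name, labels in cluster_labels.items()
}
```

Sorting and deduplicating keeps the generated line identical across builds, which avoids spurious rebuilds of unchanged pages.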
Ha, I missed that you were going to try to automate this somehow. That sounds like a research project first. |
I think these can be two different approaches: have a human classify the gallery if you can (probably the best option imo!), but have an automated clustering option if you prefer. |
I do prefer something automated in sphinx-gallery that can be tuned to show, for instance, the 5 most relevant examples (5 nearest neighbors). Then human-implemented tags can be more flexible about the number of examples assigned to a cluster and the criteria for them. |
So I think there are two separate issues:
I think (2) probably is in scope for SG. Thinking about the manual sphinx-tags case for (1), I think it's straightforward enough to include the But when I see
I, like @jklymak, get a bit worried that what you're talking about implementing in SG is parsing of Python code + output to automatically label or compute "distances" between examples, i.e., solve problem (1) automatically. I think this has to be out of scope for SG because there are potentially a lot of ways to do this, and we don't have the maintenance bandwidth for it and all potential modifications people might have in mind down the road. If you do indeed want to do this sort of "automated tagging", then one approach that could work nicely for division of maintenance between packages is:
At the end of the day, the end user would need SG and sklearn installed, and add not just I think this framework is general enough that it allows people to modify the end-of-page linked example lists in whatever way they want. It also allows for easily doing stuff like easier modification of generated RST than using the |
> This would mean an optional dependency on scikit-learn, from what I understand, for the
> Maybe a nearest neighbors tf-idf of the symbols could do the job.
No, I was thinking that we could easily implement a basic version of these in pure Python + numpy.
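To give a sense of how small such a basic version could be, here is a sketch using only the standard library (numpy would be the natural way to vectorize it). It assumes each example has already been reduced to a list of symbol names; the function names, the smoothed idf formula, and the file names are illustrative choices, not an agreed design.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each example (a list of symbol names) to a dict of tf-idf weights."""
    n_docs = len(docs)
    # Document frequency: number of examples that contain each symbol
    df = Counter(sym for doc in docs.values() for sym in set(doc))
    vectors = {}
    for name, doc in docs.items():
        counts = Counter(doc)
        # Smoothed idf keeps the weight positive even for ubiquitous symbols
        vectors[name] = {
            sym: count * (math.log((1 + n_docs) / (1 + df[sym])) + 1)
            for sym, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(sym, 0.0) for sym, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(docs, k=5):
    """Return, for each example, up to k other examples ranked by similarity."""
    vectors = tfidf_vectors(docs)
    result = {}
    for name, vec in vectors.items():
        scored = [
            (cosine(vec, other_vec), other_name)
            for other_name, other_vec in vectors.items()
            if other_name != name
        ]
        # Highest similarity first; ties broken by name for determinism
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        # Examples sharing no symbol at all are not recommended
        result[name] = [other for score, other in scored[:k] if score > 0]
    return result


# Hypothetical examples and the symbols they use
docs = {
    "plot_kmeans.py": ["sklearn.cluster.KMeans", "sklearn.datasets.make_blobs"],
    "plot_mini_batch_kmeans.py": [
        "sklearn.cluster.MiniBatchKMeans",
        "sklearn.cluster.KMeans",
        "sklearn.datasets.make_blobs",
    ],
    "plot_pca.py": ["sklearn.decomposition.PCA", "sklearn.datasets.load_iris"],
}
related = recommend(docs, k=5)
```

Here `related["plot_kmeans.py"]` contains only `plot_mini_batch_kmeans.py`, since `plot_pca.py` shares no symbol with it.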
> Because the tags are directives, they should be pretty easy to autopopulate once the clusters and classification are identified by the algorithm. Would this be a PR to sphinx-gallery or to sphinx-tags?
I don't know tags well enough to answer.
In general, I'm happy to go whichever way makes the ecosystem healthier
|
In the example gallery of scikit-learn we (more or less) follow a logic of grouping examples by module, e.g. the clustering section groups examples concerning the sklearn.cluster module.
We could try to further divide the example gallery by classes and functions of a given module using subsections to help users focus on a given algorithm, e.g. distinguish between examples using sklearn.cluster.KMeans and, say, examples using sklearn.cluster.MiniBatchKMeans. This would be similar to the already existing structure (as shown by the former links) but would introduce redundancies, as a given example would belong to several subsections from several modules.
Instead we could have a recommender system based on similarity to automatically link to the most relevant related content. This could be introduced at the end of each example (see screenshot below). Maybe a nearest neighbors tf-idf of the symbols could do the job.
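One possible reading of "the symbols" is the set of names each example imports. The sketch below extracts them with the standard-library `ast` module; treating imports as the symbols is an assumption made for illustration, and `imported_symbols` is a hypothetical helper, not part of sphinx-gallery or scikit-learn.

```python
import ast


def imported_symbols(source):
    """Return the sorted, fully qualified names that a script imports."""
    symbols = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # e.g. "import numpy as np" contributes "numpy"
            symbols.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # e.g. "from sklearn.cluster import KMeans" contributes
            # "sklearn.cluster.KMeans"; relative imports without a module
            # name are skipped
            symbols.update(f"{node.module}.{alias.name}" for alias in node.names)
    return sorted(symbols)


example = """
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans
"""
print(imported_symbols(example))
# ['numpy', 'sklearn.cluster.KMeans', 'sklearn.cluster.MiniBatchKMeans']
```

Lists like this, one per example, could then serve as the "documents" for a tf-idf nearest-neighbors step.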
I believe that other libraries may benefit from such a feature, such as the matplotlib example gallery. Thoughts on this?
cc @jklymak @GaelVaroquaux