Thanks to visit codestin.com
Credit goes to github.com

Skip to content

MRG: Allow sorting subsection files #281

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Aug 6, 2017

Conversation

larsoner
Copy link
Contributor

@larsoner larsoner commented Aug 2, 2017

Closes #275.
Closes #278. (Take over of this PR.)

This adds sort keys for sorting within subsections. I don't need explicit reordering currently (filename-based is good enough for now) so I didn't bother implementing it for now. But it at least sets up the framework for it later.

WIP for now because I'm not updating the doc until #279 is merged and I can rebase to avoid conflicts.

@larsoner larsoner changed the title Sortkeyex WIP: Allow sorting subsection files Aug 2, 2017
@larsoner larsoner mentioned this pull request Aug 2, 2017
2 tasks
@larsoner larsoner changed the title WIP: Allow sorting subsection files MRG: Allow sorting subsection files Aug 4, 2017
@larsoner
Copy link
Contributor Author

larsoner commented Aug 4, 2017

Rebased, docs added, and CIs are happy. Ready for review/merge from my end.

Copy link
Contributor

@choldgraf choldgraf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think this looks great - I have a few little comments and one less-little one about assuming where the example titles are in the .py files...other than this it LGTM

by the amount of code, i.e. :class:`sphinx_gallery.sorting.AmountOfCodeSortKey`
as::

from sphinx_gallery.sorting import AmountOfCodeSortKey
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not NumberOfLines sortkey? I know that's kind of a bikeshed but I think it's more explicit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just number of lines but more number of code characters IIUC

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, number of code lines rather (which you note below)

Sorting gallery examples
========================

Within a given gallery (sub)section, the example files by default are ordered
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any short description in the docs that just explains that ordering in general is handled by "sorting keys" which are a way of defining a custom sorting behavior with the sorted function? Maybe that's too much info but I was confused the first time I saw sort keys being used.

``within_subsection_order``:

- :class:`sphinx_gallery.sorting.FileSizeSortKey` to sort by file size.
- :class:`sphinx_gallery.sorting.FileNameSortKey` to sort by file name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a thought on ExplicitFilenameSortKey (which I know you aren't implementing here): I think it would get really confusing if we implemented this, since right now doing explicit sorting for folders requres that all folders are listed. We couldn't do the same thing for explicit filenames because there would be too many IMO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we'll have to think harder about that one, but I don't think that the current API will make the problem harder necessarily

listdir = [fname for fname in os.listdir(src_dir)
if fname.endswith('.py')]
sorted_listdir = sorted(
listdir, key=gallery_conf['within_subsection_order'](src_dir))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have a check somewhere that sees if one of the allowed keys are given, and if not (e.g., if somebody use some other wonky object) we'd throw an error that says "this must be one of XXX sortkeys"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But people in principle can use their own

@@ -385,29 +387,25 @@ def generate_dir_rst(src_dir, target_dir, gallery_conf, seen_backrefs):
'Generating gallery for %s ' % build_target_dir,
length=len(sorted_listdir))
for fname in iterator:
intro, amount_of_code, time_elapsed = generate_file_rst(
intro, time_elapsed = generate_file_rst(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this is being factored out


def __call__(self, filename):
src_file = os.path.normpath(os.path.join(self.src_dir, filename))
file_conf, script_blocks = split_code_and_text_blocks(src_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah I see that it isn't strictly NumberOfLines per-se, since you're only counting the code lines. I think it's fine to keep it AmountOfCode in that case, but maybe clarify in the docs that this will count only code lines and not all the lines in the .py files

The source directory.
"""

def __call__(self, filename):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will break for many people's sphinx-gallery setups, as there are a few projects (ahem, MNE-python being one of them) that put sphinx tags just above their titles. I feel like I opened an issue about this a while back but the dev team wasn't keen on supporting that use-case...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes I should probably parse the title however it's found and parsed elsewhere (and refactor to DRY)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this already uses split_code_and_text_blocks which calls get_docstring_and_rest, and script_blocks[0] is the docstring, which is already cleaned. I even added this to the code:

        assert script_blocks[0][1].strip().split('\n')[0][0] == '='

And it passed, so we should be okay here.

_check_order(config_app, 'filesize')


@pytest.mark.conf_file(content="""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests feel like magic to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conf stuff? Agreed but I copy pasted (then mostly simplified) from earlier code so I assumed thus was best practice for the repo

@larsoner
Copy link
Contributor Author

larsoner commented Aug 5, 2017

Okay @choldgraf I changed the name to NumberOfCodeLinesSortKey, added a brief bit to get people comfortable with sorted / key, and added some junk comment lines at the top of one the plot_1.py test file to ensure the title-ordering doesn't care about code comments. Updated rendered doc is here:

https://170-25860190-gh.circle-artifacts.com/0/home/ubuntu/sphinx-gallery/rtd_html/advanced_configuration.html#sorting-gallery-examples

@@ -1,7 +1,9 @@
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no no - it's not encoding lines, it's sphinx references. People put stuff like

.. _my_ref:

My title
--------

which you're technically not needed to do since SG auto generates references, but people do it anyway :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh okay, I'll probably have to do something smarter then, I'll look into it more.

}

However, other options exist that can be instantiated for use with
However, multiple convenience classes are provided for use with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going to make edits can you make this "In addition" instead of "However"?

If there aren't any more commits needed then I can edit and then merge (but see my other comment)

@@ -153,12 +153,15 @@ def codestr2rst(codestr, lang='python', lineno=None):
return code_directive + indented_block


def extract_intro(filename, docstring):
def extract_intro_title(filename, docstring):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this naming makes me thing that it extracts just the title of the introduction. could we make it extract_intro_and_title?

""" Extract the first paragraph of module-level docstring. max:95 char"""

# lstrip is just in case docstring has a '\n\n' at the beginning
paragraphs = docstring.lstrip().split('\n\n')
paragraphs = [p for p in paragraphs if not p.startswith('..')]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is going to fix some other bugs in the process 🎉

if len(paragraphs) > 1:
title = paragraphs[0].split('\n')
title = ' '.join(t for t in title if t[0] not in ['=', '-', '^', '~'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this could be a list constant somewhere? RST_TITLE_CHARACTERS?

@@ -172,6 +172,7 @@ def _check_order(config_app, key):
regex = '.*:%s=(.):.*' % key
with codecs.open(index_fname, 'r', 'utf-8') as fid:
for line in fid:
print(line)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just making sure this is intentional and not for debugging

@choldgraf
Copy link
Contributor

In general this looks great - a few small comments from me then LGTM

@larsoner
Copy link
Contributor Author

larsoner commented Aug 5, 2017

@choldgraf done, I made the check for RST title chars better based on their official description and renamed the function.

@choldgraf
Copy link
Contributor

nice - so it sounds like this will work if you either put === just after, or both before and after the title, yeah?

LGTM...I'll give it 24 hrs and merge if nobody else brings anything up. That OK w/ you?

@larsoner
Copy link
Contributor Author

larsoner commented Aug 5, 2017

nice - so it sounds like this will work if you either put === just after, or both before and after the title, yeah?

Yeah and RST supports both so we should be good there.

Ready for merge from my end.

@choldgraf choldgraf merged commit 7263a18 into sphinx-gallery:master Aug 6, 2017
@choldgraf
Copy link
Contributor

🎉 🎉 🎉

@lesteve
Copy link
Member

lesteve commented Aug 7, 2017

Sorry I completely missed this one. I think this is great stuff. Quickly looking at the diff, I have the following comments though:

I think we should have an instance (rather than a class) as the value associated with the within_subsection_order to be consistent with subsection_order. Another strong reason is that some people may want to control the order explicity (and I am hoping ExplicitOrder may be used for this).

Another consistency problem, it would be good to have a consistent naming for sorting keys (i.e. finishing with Order or SortKey).

@choldgraf
Copy link
Contributor

I think we should have an instance (rather than a class) as the value

I think this is a good point. No harm in having people give NumberOfLinesSortKey() and I agree it keeps things more standard. Would we need to change the code at all or would the sorting behavior stay the same regardless of whether its an instance or a class?

it would be good to have a consistent naming for sorting keys

+1, either is fine with me but maybe I have a small preference for Order. So we do ExplicitOrder, NumberOfLinesOrder, AmountOfCodeOrder etc. @Eric89GXL do you have thoughts?

@larsoner
Copy link
Contributor Author

larsoner commented Aug 7, 2017

I think we should have an instance (rather than a class) as the value

What eventually gets called is e.g. instance = FileNameSortKey(src_dir); instance(filename). We could pass an instance, but then __call__ would need to take src_dir as an arg, and return something (a function?) that would then operate on fname. Is this preferred?

This is also why the subsection ordering and within-subsection ordering need to be kept separate -- the former operates on effectively just ExplictOrder(order).__call__(fname) and the other on what would need to be ExplicitExampleOrder(order)(src_dir)(fname). So I'm not sure ExplicitOrder can be used as a key using the existing interface.

I used the name SortKey to make it clear that it will be used as a key argument to the sorted function. Until just now, I actually didn't know if ExplictOrder needed to return ints (i.e., the index in the order), or if it also used as a sort key (i.e., just needed to provide anything that could be ordered by sort) when I looked deeper into the code. I'm fine with *Order, too, but we should document in the code that it's really just used as a sort key under the hood so the next dev knows immediately upon looking at the class.

@lesteve
Copy link
Member

lesteve commented Aug 8, 2017

This is also why the subsection ordering and within-subsection ordering need to be kept separate -- the former operates on effectively just ExplictOrder(order).__call__(fname) and the other on what would need to be ExplicitExampleOrder(order)(src_dir)(fname). So I'm not sure ExplicitOrder can be used as a key using the existing interface.

The hope when we were discussing the design was that they would be exactly the same. Is there a good reason the within_subsection_order sort key not be called on a list of the fully qualified filenames inside each src_dir?

Personally I am fine with SortKey because I know what a sort key is. The argument was that a significant chunk of our users may not have heard of a sort key. In the end I think that's why Order was used.

@larsoner
Copy link
Contributor Author

larsoner commented Aug 8, 2017

The hope when we were discussing the design was that they would be exactly the same. Is there a good reason the within_subsection_order sort key not be called on a list of the fully qualified filenames inside each src_dir?

It is eventually called that way -- any class for within_subsection_order will effectively operate as Class(my_order)(src_dir)(fname). Any class for subsection_order, by contrast, will effectively need to operate as Class(my_order)(fname). How should these be unified?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants