pydoc doesn't find all module docstrings #41872
pydoc.synopsis() attempts to find a module's doc string I've attached a patch against Python 2.4.1 that fixes
PEP-257 recommends: "For consistency, always use """triple Are there a large number of modules written using
I don't know if there are a large number of modules with At best pydoc is inconsistent - the web browser display uses I have attached a revised patch that uses a regex match so FWIW this bug report was motivated by this thread on
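To make the quoting problem concrete, here is a minimal sketch of a regex check that accepts both triple-quote styles. The pattern and the name `starts_docstring` are hypothetical illustrations, not the contents of the attached patch, which is not shown in this thread.

```python
import re

# Hypothetical pattern (an assumption, not the patch's regex): an optional
# r/u prefix followed by either ''' or """.
_DOC_START = re.compile(r"[rRuU]?(?P<q>'''|\"{3})")

def starts_docstring(line):
    """Return the triple-quote delimiter that opens the line, or None."""
    match = _DOC_START.match(line.strip())
    return match.group("q") if match else None
```

A check like this only recognizes how a docstring starts; it still does not handle single-quoted one-line docstrings or escape sequences, which is part of why later comments argue for real tokenization.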
I think you're right that if it works for the module summary I'll look at fixing this soon, but feel free to keep
I started the thread to which Kent referred. I am aware of
Source still has the snippet in patch (didn't test behavior).
The standard library has moved on quite a bit since this patch was written...
(The reason for not using full compilation is that you would then have to either *run* the compiled code or else compile to the AST and interrogate that, which is technically implementation dependent)
Oops, I somehow ended up looking at an old revision of pydoc.py. The current version *is* using tokenize.open and importlib in synopsis(), so those aspects of my comments are incorrect. However, the point that pydoc should probably be using the tokenize module to do the parsing inside source_synopsis remains valid. There's no good reason to continue duplicating a subset of that text processing logic within pydoc.
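As a rough illustration of the suggestion above, the tokenize module can find a leading string literal without any hand-written quote handling. This is a sketch only; `first_string_token` is a hypothetical helper, not code from pydoc.

```python
import io
import tokenize

def first_string_token(source):
    """Return the source text of a leading string literal, or None."""
    readline = io.StringIO(source).readline
    skip = (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE)
    for tok in tokenize.generate_tokens(readline):
        if tok.type in skip:
            continue  # comments and blank lines may precede a docstring
        # The first significant token decides: only a string can be a docstring.
        return tok.string if tok.type == tokenize.STRING else None
    return None
```

The returned text still includes its quotes and prefix, so a caller would need a further step to turn it into the actual string value.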
I've rewritten the source_synopsis function to use the tokenize module. It should now work with triple single quotes and hopefully all the other cases where __doc__ returns a string. Since tokenize.tokenize needs a file object that is opened in binary mode, in the case of a StringIO object, I am reading the whole object and converting it to a BytesIO object. I don't know if that is the right way. Also, the only instance I could find where source_synopsis is called with a StringIO object is in the ModuleScanner.run method. Maybe we could tweak this call to pass a byte-stream object to avoid the overhead of re-conversion? All the current tests pass.
+ except:
I don't understand these try/except blocks. First, "except: pass" must never be used; catch only specific exceptions (e.g. AttributeError). Can you explain why you expect a TypeError? If your patch fixes a bug, you must add a new unit test to test_pydoc to guard against regressions.
I've updated my patch with the review changes and tests. tokenize.detect_encoding throws a TypeError if the file object passed to it is in text mode. However, I've realized that catching this is not necessary, because I now check for TextIOBase rather than only StringIO as before.
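A type check of this kind could look like the sketch below. It is not the patch itself: `token_stream` is a hypothetical name, and instead of converting text to a BytesIO object it uses tokenize.generate_tokens, which accepts a readline that returns str.

```python
import io
import tokenize

def token_stream(file):
    """Return a token iterator for a text-mode or binary-mode file object."""
    if isinstance(file, io.TextIOBase):
        # Text streams already yield str, so no encoding detection is needed.
        return tokenize.generate_tokens(file.readline)
    # Binary streams go through tokenize.tokenize, which detects the encoding.
    return tokenize.tokenize(file.readline)
```

Dispatching on io.TextIOBase covers StringIO as well as files opened in text mode, so no exception handling is needed around the encoding detection.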
Do you have any plans to work on a patch for 2.7?
Added a patch for 2.7. Please review.
I tried pydoc_2.7.patch with the following test file and
# -*- coding: utf-8 -*-
u"""ツ"""
class Spam(object):
    u"""ツ"""
>>> import utf8
>>> utf8.__doc__
u'\u30c4'
>>> print(utf8.__doc__)
ツ
>>> import pydoc
>>> pydoc.source_synopsis(file('utf8.py'))
u'\xe3\x83\x84'
>>> print pydoc.source_synopsis(file('utf8.py'))
�
>>> print pydoc.source_synopsis(file('utf8.py')).encode('latin-1')
ツ
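The session above is consistent with the UTF-8 bytes of the docstring being turned into text one byte per character instead of being decoded with the declared source encoding. The following Python 3 snippet reproduces that effect on its own; it is an illustration of the symptom, not a diagnosis of the 2.7 patch.

```python
# "\u30c4" is the character ツ used in the test file.
raw = "\u30c4".encode("utf-8")      # the bytes stored in utf8.py
assert raw == b"\xe3\x83\x84"

# Mapping each byte to its own code point reproduces the bad result.
mojibake = raw.decode("latin-1")
assert mojibake == "\xe3\x83\x84"

# Decoding with the declared source encoding gives the real docstring.
assert raw.decode("utf-8") == "\u30c4"

# The .encode('latin-1') workaround in the session only undoes the first mistake.
assert mojibake.encode("latin-1").decode("utf-8") == "\u30c4"
```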
Hi Victor, can you give this another look?
This issue is 14 years old, has been inactive for 5 years, and has 3 patches. It is far from being "newcomer friendly", so I am removing the "Easy" label.
@vstinner Can I take a stab at it? Would you accept a patch?
…doc (GH-127520) It now supports docstrings with single quotes, escape sequences, raw string literals, and other Python syntax. Co-authored-by: Éric <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
… in pydoc (pythonGH-127520) It now supports docstrings with single quotes, escape sequences, raw string literals, and other Python syntax. (cherry picked from commit 474e419) Co-authored-by: Srinivas Reddy Thatiparthy (తాటిపర్తి శ్రీనివాస్ రెడ్డి) <[email protected]> Co-authored-by: Éric <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…e in pydoc (GH-127520) (GH-128620) It now supports docstrings with single quotes, escape sequences, raw string literals, and other Python syntax. (cherry picked from commit 474e419) Co-authored-by: Srinivas Reddy Thatiparthy (తాటిపర్తి శ్రీనివాస్ రెడ్డి) <[email protected]> Co-authored-by: Éric <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…e in pydoc (GH-127520) (GH-128621) It now supports docstrings with single quotes, escape sequences, raw string literals, and other Python syntax. (cherry picked from commit 474e419) Co-authored-by: Srinivas Reddy Thatiparthy (తాటిపర్తి శ్రీనివాస్ రెడ్డి) <[email protected]> Co-authored-by: Éric <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
Finally, 20 years later, it has been fixed. Pydoc now uses the tokenizer to extract the first logical line that can be a docstring, and then parses it with An advantage over
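The comment above is truncated, so the parser it names is not visible here. The sketch below shows the general idea under a stated assumption: it collects the string tokens of the first logical line and evaluates them with ast.literal_eval. The name `sketch_synopsis` is hypothetical, and this is not the code merged in GH-127520.

```python
import ast
import io
import tokenize

def sketch_synopsis(source):
    """Return the first line of a leading module docstring, or None."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    pieces = []
    for tok in tokens:
        if tok.type in (tokenize.COMMENT, tokenize.NL):
            continue
        if tok.type == tokenize.STRING:
            pieces.append(tok.string)
            continue
        if tok.type in (tokenize.NEWLINE, tokenize.ENDMARKER) and pieces:
            break        # end of the first logical line
        return None      # any other token means no leading docstring
    else:
        return None
    try:
        # Assumption: literal_eval is used here to resolve prefixes, escapes,
        # and implicit concatenation of adjacent literals.
        value = ast.literal_eval(" ".join(pieces))
    except (SyntaxError, ValueError):
        return None
    if not isinstance(value, str):
        return None      # e.g. a bytes literal is not a docstring
    return value.strip().split("\n")[0].strip()
```

Because the string is evaluated rather than pattern-matched, single quotes, raw strings, and escape sequences all fall out of the same code path.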
Good job, @srinivasreddy!