

TYP: Explicit numpy.__all__ in the stubs #26979


Merged: 5 commits, merged Aug 30, 2024


@jorenham (Member) commented Jul 18, 2024

According to the typing spec:

A type stub should contain an __all__ variable if and only if it also present at runtime. In that case, the contents of __all__ should be identical in the stub and at runtime. If the runtime dynamically adds or removes elements (for example if certain functions are only available on some platforms), include all possible elements in the stubs.

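Furthermore, type checkers such as Pyright / Pylance only consider a .pyi stub's own members as public: an imported name (e.g. via import spam as ham) is not re-exported unless it uses the redundant alias form (import spam as spam, or from eggs import spam as spam) or is listed in __all__.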

For example, the following code is not understood by (at least) Pyright:

import numpy as np

np.linalg.pinv(1)  # ok
np.emath.sqrt(1)  # error: "emath" is not a known attribute of module "numpy"

This particular example could also be solved with e.g. emath = numpy.lib.scimath.
But for numpy._get_promotion_state, that isn't the case: because of its leading underscore, a type checker treats it as private unless it is listed in __all__.
One could argue that, even though it is explicitly exported at runtime in numpy.__all__, it should still be considered private, and therefore not be part of the typing stubs.

And that brings me to the most important advantage: it is explicit.
As import this will tell us:

Explicit is better than implicit.

And this case is a particularly good example of that: once I added the __all__, my IDE (VS Code) noticed that one of the functions, trapezoid, was missing (I plan to open a separate PR for it soon).

Without the __all__, finding missing annotations like this one would take considerably more effort.
The usual approach is to use stub-checking tools such as pyright --verifytypes.
But at the moment, running pyright --ignoreexternal --verifytypes numpy produces >23000 lines of output and reports >8000 errors, of which >4500 originate in .pyi files.

@jorenham force-pushed the typing/numpy-all branch 3 times, most recently from 0a6ecc3 to 117a1ca on July 23, 2024 21:08
@seberg (Member) commented Aug 5, 2024

This looks fine if pyright needs it. Is there a test, or would it be possible to add a test that the two __all__ match (maybe also at least for np.random and np.linalg)?

Otherwise, this is bound to get out of sync quickly. With such a test in place, it might even end up being a better check that our public API doesn't grow accidentally than the test we currently have.

@jorenham (Member, Author) commented Aug 5, 2024

This looks fine if pyright needs it. Is there a test, or would it be possible to add a test that the two __all__ match (maybe also at least for np.random and np.linalg)?

The only test I can think of is something like pyright --verifytypes numpy. As I mentioned before, though, that's a rather brute-force approach, although we might be able to filter its output with sed.
The mypy equivalent is mypy stubtest, but from the looks of it, it also doesn't have a flag to check only the __all__.

But it shouldn't be difficult to write a simple script for this ourselves, e.g. in numpy/typing/tests/.
A simple import numpy; numpy.__all__ will get us the runtime part.
If we assume that the stubs always use a static __all__ (no .append(), +=, or similar), then we could probably get away with a regex. We would then compare the two (sorted) __all__s and write a difflib.context_diff or something similar to stderr if they don't match.
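Here is a minimal sketch of that regex-based check. It assumes the stub's __all__ is a single static list or tuple literal of quoted names. The function names static_all_from_stub and check_all are hypothetical. The regex is deliberately naive: for example, a quoted string inside a comment within the list would also be picked up.

```python
import difflib
import re
import sys

# Matches a static `__all__ = [...]` (optionally annotated, or a tuple) at the start of a line.
# Assumes the stub never modifies __all__ via .append() or +=.
_ALL_RE = re.compile(r"^__all__\s*(?::[^=]+)?=\s*[\[(](.*?)[\])]", re.DOTALL | re.MULTILINE)
_NAME_RE = re.compile(r"""["']([^"']+)["']""")


def static_all_from_stub(stub_source: str) -> list[str]:
    """Extract the quoted names of the first static __all__ literal in the stub text."""
    match = _ALL_RE.search(stub_source)
    if match is None:
        raise ValueError("no static __all__ found in stub")
    return _NAME_RE.findall(match.group(1))


def check_all(runtime_all: list[str], stub_source: str, name: str = "numpy") -> bool:
    """Compare sorted runtime and stub __all__; write a context diff to stderr on mismatch."""
    runtime = sorted(runtime_all)
    stub = sorted(static_all_from_stub(stub_source))
    if runtime == stub:
        return True
    diff = difflib.context_diff(
        [n + "\n" for n in runtime],
        [n + "\n" for n in stub],
        fromfile=f"{name}.__all__ (runtime)",
        tofile=f"{name}.__all__ (stub)",
    )
    sys.stderr.writelines(diff)
    return False
```

With NumPy installed, the inputs would be numpy.__all__ and the text returned by pathlib.Path(numpy.__file__).with_suffix(".pyi").read_text(). The same call should also work for numpy.linalg or numpy.random, provided their stubs define a static __all__ as well.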


Edit:

ChatGPT 4o came up with this (https://chatgpt.com/share/2c3bf138-9aa3-44e2-8258-3938ceaed13c)

import ast
import importlib
import os

def get_dynamic_all(module_name):
    # Import the module at runtime and return its __all__ attribute (if any)
    module = importlib.import_module(module_name)
    if hasattr(module, '__all__'):
        return set(module.__all__)
    return None

def get_static_all_from_stub(module_name):
    # Locate the .pyi stub that sits next to the module's source file
    module = importlib.import_module(module_name)
    module_path = module.__file__
    if module_path is None:
        raise FileNotFoundError(f"{module_name} has no __file__, so its stub cannot be located")

    if module_path.endswith('.py'):
        stub_path = module_path[:-3] + '.pyi'
    elif module_path.endswith('.pyc'):
        stub_path = module_path[:-4] + '.pyi'
    else:
        raise FileNotFoundError(f"Cannot find .pyi stub for {module_name}")

    if not os.path.exists(stub_path):
        raise FileNotFoundError(f".pyi stub file not found for {module_name} at {stub_path}")

    with open(stub_path, encoding='utf-8') as stub_file:
        tree = ast.parse(stub_file.read(), filename=stub_path)

    # Extract a static __all__ = [...] (or annotated __all__: ... = [...]) from the stub.
    # String literals are ast.Constant nodes; ast.Str has been deprecated since Python 3.8.
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            targets, value = node.targets, node.value
        elif isinstance(node, ast.AnnAssign):
            targets, value = [node.target], node.value
        else:
            continue
        for target in targets:
            if isinstance(target, ast.Name) and target.id == '__all__':
                if isinstance(value, (ast.List, ast.Tuple)):
                    return {
                        elt.value
                        for elt in value.elts
                        if isinstance(elt, ast.Constant) and isinstance(elt.value, str)
                    }
    return None

def compare_all(module_name):
    dynamic_all = get_dynamic_all(module_name)
    static_all = get_static_all_from_stub(module_name)

    if dynamic_all is None:
        print(f"No __all__ attribute found dynamically for module {module_name}")
        return False

    if static_all is None:
        print(f"No __all__ attribute found in stub for module {module_name}")
        return False

    if dynamic_all == static_all:
        print(f"__all__ matches for module {module_name}")
        return True

    # Report only the differences, sorted, instead of dumping both full sets
    print(f"__all__ does not match for module {module_name}")
    print(f"Only at runtime: {sorted(dynamic_all - static_all)}")
    print(f"Only in stub: {sorted(static_all - dynamic_all)}")
    return False

# Example usage
if __name__ == "__main__":
    module_name = "numpy"  # Replace with the module to check
    compare_all(module_name)

No idea if it actually works, but it might be a good starting point 🤷🏻

@jorenham (Member, Author) commented Aug 5, 2024

Otherwise, this is bound to get out of sync quickly. With such a test in place, it might even end up being a better check that our public API doesn't grow accidentally than the test we currently have.

Currently, without an explicit __all__, it's even easier to get out of sync.
If functions are missing from the stubs but are present in the __all__ (in the stubs), then your IDE will highlight the missing export (assuming that Pylance isn't the only tool that does this). This is precisely how I found out that numpy.trapezoid was missing, which, as far as I know, hadn't been noticed before.

And in this case, syncing __all__ is as simple as copy-pasting the output of import numpy; print(numpy.__all__) into numpy/__init__.pyi.
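Since print(numpy.__all__) produces a one-line Python list, a tiny helper can render the runtime names as a one-name-per-line literal that is ready to paste into a stub. The name format_stub_all is hypothetical. The helper keeps the runtime order; sorting would be an equally valid choice, since only the contents need to match.

```python
def format_stub_all(names: list[str]) -> str:
    """Render a runtime __all__ as a static, one-name-per-line list literal for a .pyi stub."""
    lines = ["__all__ = ["]
    lines += [f'    "{name}",' for name in names]
    lines.append("]")
    return "\n".join(lines) + "\n"
```

In an environment with NumPy installed, print(format_stub_all(numpy.__all__), end="") would print the block to paste into numpy/__init__.pyi.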

Either way, I do agree that an automatic check for this is an excellent idea.
But without __all__ in the stubs, writing such a test would be a lot more difficult than with it.
So I suppose that having __all__ in the stubs is a win/win scenario.

@charris merged commit c556f8c into numpy:main on Aug 30, 2024 (66 checks passed)
@charris (Member) commented Aug 30, 2024

Thanks @jorenham .
