API: Cleaning numpy/__init__.py and main namespace - Part 5 [NEP 52] #24587

Merged
merged 11 commits into from
Sep 18, 2023

mtsokol
Member

@mtsokol mtsokol commented Aug 30, 2023

Relevant issue: #24306

Scope of the PR:

  • Remove compare_chararrays as it comes from a legacy module. It's still available as np.char.compare_chararrays.
  • Deprecate chararray in the main namespace, as it comes from a legacy module. It's still available as np.char.chararray.
  • Remove format_parser as it's the only function from rec present in the main namespace. It's still available as np.rec.format_parser.
  • Remove recarray. It's still available as np.rec.recarray.

@ngoldbaum
Member

It looks like pandas needs to be updated for the doc build to pass.

It also looks like the repr for recarray needs to be updated to reflect its new canonical location:

File "/tmp/tmpraxm5dc8/doc/source/user/basics.rec.rst", line 699, in doc/source/user/basics.rec.rst
Failed example:
    type(recordarr.bar)
Expected:
    <class 'numpy.recarray'>
Got:
    <class 'numpy.core.records.recarray'>
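
The mismatch in the failed doctest follows from how Python renders a class: the repr is built from the class's `__module__` and `__qualname__`, not from the name used to reach it. A minimal sketch, using a stand-in class rather than NumPy's actual `recarray`; the `set_module` helper below is a hypothetical illustration written for this sketch:

```python
def set_module(module_name):
    """Hypothetical helper: return a decorator that overrides __module__."""
    def decorator(obj):
        obj.__module__ = module_name
        return obj
    return decorator


# Stand-in class (not NumPy's recarray); type() sets __qualname__ to "recarray".
recarray = type("recarray", (), {"__module__": "numpy.core.records"})
repr_before = repr(recarray)

# Point the class at its public location; only the metadata changes.
set_module("numpy.rec")(recarray)
repr_after = repr(recarray)

print(repr_before)  # <class 'numpy.core.records.recarray'>
print(repr_after)   # <class 'numpy.rec.recarray'>
```

Because a class carries a single `__module__`, only one of several aliases can be the location shown in the repr.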

@mtsokol mtsokol force-pushed the overhaul-of-main-namespace-part-5 branch 2 times, most recently from 7dad0c1 to b229e45 Compare September 1, 2023 11:07
tylerjereddy added a commit to tylerjereddy/mdanalysis that referenced this pull request Sep 3, 2023
* based on numpy/numpy#24587:
  * replace the single usage of `np.recarray` I found
    in our codebase--there are references to third-party
    libraries providing us with `recarray` that I haven't
    touched
  * similar PRs have been merged to `pandas` and `scipy`

[skip cirrus]
hmacdope pushed a commit to MDAnalysis/mdanalysis that referenced this pull request Sep 4, 2023
* based on numpy/numpy#24587:
  * replace the single usage of `np.recarray` I found
    in our codebase--there are references to third-party
    libraries providing us with `recarray` that I haven't
    touched
  * similar PRs have been merged to `pandas` and `scipy`

[skip cirrus]
@mtsokol mtsokol force-pushed the overhaul-of-main-namespace-part-5 branch from b229e45 to a622a12 Compare September 6, 2023 09:17
@rgommers
Member

rgommers commented Sep 6, 2023

Can we actually do this? It looks to me like np.recarray and np.chararray have too much usage (~9,600 and ~2,500 hits in code search for me). So this may be more disruption than it's worth.

@mtsokol
Member Author

mtsokol commented Sep 7, 2023

Can we actually do this? It looks to me like np.recarray and np.chararray have too much usage (~9,600 and ~2,500 hits in code search for me). So this may be more disruption than it's worth.

@rgommers These are used, but moving to np.rec.recarray and np.char.chararray is straightforward and backward compatible. Still, I agree it will cause some disruption.

I think we have three options:

  1. Keep the PR in the current form: np.recarray and np.chararray are moved to np.rec and np.char.
  2. Revert the changes so that both names stay available from the np.* namespace as well as from the np.rec.* and np.char.* namespaces, as they are right now in main (Drawback: this will break "each function in exactly one place" rule that we're trying to enforce).
  3. Keep np.recarray and np.chararray but remove them from their np.rec.* and np.char.* namespaces (Does it make sense?) - I see that np.rec.recarray and np.char.chararray are way less popular (a hundred hits in search compared to thousands you mentioned).

What do you think about these options? Which should I go with?

@rgommers
Member

rgommers commented Sep 7, 2023

(Drawback: this will break "each function in exactly one place" rule that we're trying to enforce).

Yes, that'd be a little unfortunate. However, I'm not sure it's really that black-and-white. For all functions/objects we should have one documented/preferred location, but for some other popular functions we've also decided to leave a compatibility shim in place.

For chararray I think @ngoldbaum had a preference to move that away from the main namespace to make space for the better new functionality (NEP 55)? Not sure though. Let's see what he thinks about that one.

For recarray we don't really have a replacement, so I think it's okay to leave it in the main namespace as the preferred location. I'm okay keeping both np.recarray and np.rec.recarray, since it's a single object only, so we could consider this one as a case where practicality beats purity.

@ngoldbaum
Member

ngoldbaum commented Sep 11, 2023

For chararray I think @ngoldbaum had a preference to move that away from the main namespace to make space for the better new functionality (NEP 55)? Not sure though. Let's see what he thinks about that one.

If taking it out of the main namespace would be too disruptive it should stay, practicality beats purity. That said, a user-visible deprecation warning in NumPy 2.0 with a plan to remove it in a few versions also makes sense I think.

@mtsokol mtsokol force-pushed the overhaul-of-main-namespace-part-5 branch 3 times, most recently from bbf20e2 to 668edfa Compare September 12, 2023 10:15
@mtsokol
Member Author

mtsokol commented Sep 12, 2023

Hi @rgommers @ngoldbaum,

I updated this PR according to your feedback:

  • np.recarray is available again (also np.rec.recarray can be used),
  • np.chararray is available again but accessing it yields a DeprecationWarning.

Member

@ngoldbaum ngoldbaum left a comment

While we may not be deprecating np.recarray, I think it would still make sense to change its repr, in this PR I have:

In [3]: np.recarray
Out[3]: numpy.recarray

Whereas I would expect it to be an alias to numpy.rec.recarray now rather than the other way around.

Use ``np.char.compare_chararrays`` instead.

* ``np.chararray`` has been deprecated and it will be removed in the future.
Use ``np.char.chararray`` instead.
Member

I think I would rephrase this slightly as "The chararray in the main namespace has been deprecated. It can be imported without a deprecation warning from np.char.chararray for now, but we are planning to fully deprecate and remove chararray in the future.", since I think the intention is to fully deprecate it and remove it. We just haven't actually done the work yet, so we can't outright remove it now. I proposed doing those deprecations before NumPy 2.0 as part of NEP 55, but that hasn't fully gone through yet. If that happens before NumPy 2.0, I will update this release note.

Member Author

Sure! I updated this change note.

warnings.warn(
    "`np.chararray` is deprecated and will be removed from "
    "the main namespace in the future. Use `np.char.chararray` "
    "instead.", DeprecationWarning, stacklevel=2)
Member

For the migration, I would say "Use an array with a string or bytes dtype instead." While it will still be available as np.char.chararray, that will likely also be deprecated before NumPy 2.0 is released.

Member Author

Sure! Updated.

@ngoldbaum ngoldbaum added the triage review Issue/PR to be discussed at the next triage meeting label Sep 12, 2023
@mtsokol
Member Author

mtsokol commented Sep 13, 2023

While we may not be deprecating np.recarray, I think it would still make sense to change its repr, in this PR I have:

In [3]: np.recarray
Out[3]: numpy.recarray

Whereas I would expect it to be an alias to numpy.rec.recarray now rather than the other way around.

@ngoldbaum I decided to keep repr as numpy.recarray - in my understanding all items present in the main namespace should have numpy.* repr. There is a separate test: test_numpy_namespace with check_dir function, that checks if main namespace functions have correct (numpy.*) __module__ attribute.

Should I change recarray repr and add to the allowlist in the test?
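
For readers unfamiliar with this kind of check, here is a rough sketch of a namespace test with an allowlist. The logic of the real `check_dir` is not shown in this thread, so everything below (the `check_namespace` function, the toy module, the stand-in classes) is an assumption made for illustration:

```python
import types


def check_namespace(module, allowlist=frozenset()):
    """Return public names whose __module__ differs from module.__name__."""
    mismatched = []
    for name in dir(module):
        if name.startswith("_") or name in allowlist:
            continue
        obj = getattr(module, name)
        if isinstance(obj, types.ModuleType):
            continue  # submodules are not checked in this sketch
        obj_module = getattr(obj, "__module__", None)
        if obj_module is not None and obj_module != module.__name__:
            mismatched.append(name)
    return sorted(mismatched)


# Toy namespace: recarray lives canonically in toy.rec but is also
# exposed from the top level; ndarray is canonical at the top level.
toy = types.ModuleType("toy")
toy.rec = types.ModuleType("toy.rec")
toy.recarray = type("recarray", (), {"__module__": "toy.rec"})
toy.rec.recarray = toy.recarray
toy.ndarray = type("ndarray", (), {"__module__": "toy"})

without_allowlist = check_namespace(toy)
with_allowlist = check_namespace(toy, allowlist={"recarray"})

print(without_allowlist)  # ['recarray']
print(with_allowlist)     # []
```

In this sketch, moving a class's canonical location to a submodule while keeping the top-level alias requires an allowlist entry, which is the trade-off the question above is about.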

@ngoldbaum
Member

I personally think changing the canonical location makes sense and is in the spirit of the other changes we're making for the API migration. Feel free to wait for others to weigh in.

@mtsokol mtsokol force-pushed the overhaul-of-main-namespace-part-5 branch from 927f062 to eb6a10a Compare September 17, 2023 17:47
@mtsokol
Member Author

mtsokol commented Sep 17, 2023

I personally think changing the canonical location makes sense and is in the spirit of the other changes we're making for the API migration. Feel free to wait for others to weigh in.

Makes sense to me - I updated the repr to numpy.rec.recarray.

@ngoldbaum
Member

Thanks so much for your diligence on pushing forward refactoring the main namespace @mtsokol!

@ngoldbaum ngoldbaum merged commit 1008227 into numpy:main Sep 18, 2023
@mtsokol mtsokol deleted the overhaul-of-main-namespace-part-5 branch September 18, 2023 16:19
@pllim pllim mentioned this pull request Sep 21, 2023
1 task
Labels
30 - API Numpy 2.0 API Changes triage review Issue/PR to be discussed at the next triage meeting
4 participants