Add same_value casting to np.astype #93

Open
wants to merge 9 commits into
base: main

Conversation

mattip
Owner

@mattip mattip commented May 27, 2025

This is something I have wanted for a while: a way to cast to a "smaller" dtype and raise an error or warning if there is overflow. For now this is a PR to my branch, so we have a place to discuss progress without adding too much noise to the general repo.

@mattip
Owner Author

mattip commented May 27, 2025

Choices for implementing the same_value cast:

  • 1 A dedicated inner loop, which would double the number of casting loops: we already have 4 (aligned/not aligned and strided/contiguous), so that would mean 8 loop functions
    code around the cast in the inner loop and an if flag==SAMEVALUE branch.
  • 2 Adding a flags field to the context. This will impact performance.
  • 3 Some check after each inner loop iteration that the values are the same. But how would that work? We would still need more loops.
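To make option 3 concrete, here is a rough sketch of a separate verification pass for one dtype pair (int64 to int8). The helper name first_changed_value is made up for illustration and is not part of NumPy. It shows why this route still needs an extra loop per dtype pair.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not NumPy code: after an unchecked cast from
 * src (int64) into dst (int8), return the index of the first element
 * whose value changed, or -1 if every value survived the cast.
 */
static ptrdiff_t
first_changed_value(const int64_t *src, const int8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Widening dst back to int64 is exact, so any mismatch means
         * the narrowing cast did not preserve the value. */
        if ((int64_t)dst[i] != src[i]) {
            return (ptrdiff_t)i;
        }
    }
    return -1;
}
```

This reads both buffers a second time, so it costs an extra pass over memory in addition to the extra loop function.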

For option 2, checking only the flag on the fast path

    if (context->flags != 1234) {
        *(_TYPE2 *)dst = _CONVERT_FN(*(_TYPE1 *)src);
    }

makes this timeit test go from 640 ns to 880 ns:

    python -m timeit -s "import numpy as np; a = np.arange(10_000, dtype=np.int64)" "a.astype(np.int8)"

Increasing the size to 10_000_000 and testing with/without the dummy flag makes the time go from 74.5 ms to 69.5 ms. The casting loops already use NPY_GCC_OPT_3.

@mattip
Owner Author

mattip commented May 28, 2025

Update: it looks like the CPU can elide the if flags check out of the loop, so the non-checking loop is as fast as not having a check at all, as long as I read the flag into a local variable before the loop:

uint64_t flags = context->flags;
while... {
    if (flags == NP_CHECK_OVERFLOW) {
        // do the check
    }
    *(_TYPE2 *)dst = _CONVERT_FN(*(_TYPE1 *)src);
}
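As a self-contained illustration of the pattern above, here is a minimal sketch for a single dtype pair (int64 to int8). The cast_context struct, the CAST_* constants and the function name are placeholders invented for this example, not the real NumPy context or loop signature, and the range test is just one possible way to "do the check".

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder names for illustration only; not NumPy's structures. */
enum { CAST_UNCHECKED = 0, CAST_CHECK_SAME_VALUE = 1 };

typedef struct {
    uint64_t flags;
} cast_context;

/*
 * Cast n int64 values to int8.
 * Returns 0 on success, or -1 as soon as checking is enabled and a
 * value does not fit in int8 (elements before it are already written).
 */
static int
cast_int64_to_int8(const cast_context *context,
                   const int64_t *src, int8_t *dst, size_t n)
{
    /* Read the flag once so the loop tests a loop-invariant local
     * instead of reloading context->flags on every iteration. */
    const uint64_t flags = context->flags;

    for (size_t i = 0; i < n; i++) {
        if (flags == CAST_CHECK_SAME_VALUE) {
            /* Range test before converting, so the checked path never
             * relies on the result of an out-of-range conversion. */
            if (src[i] < INT8_MIN || src[i] > INT8_MAX) {
                return -1;
            }
        }
        dst[i] = (int8_t)src[i];
    }
    return 0;
}
```

With the flag in a local, the branch condition is the same on every iteration, which is what lets the unchecked path run at essentially the same speed as a loop with no check.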

@mattip
Owner Author

mattip commented May 28, 2025
