WIP: MAINT: Made clip into an ufunc #7876

Closed
wants to merge 3 commits into from

pimdh
Contributor

@pimdh pimdh commented Jul 28, 2016

In order to solve issue #7633, it was suggested at #7873 to make clip into a ufunc.
This is a partial implementation of that. Left to do:

  • Implement proper typecasting
  • Properly handle None and NaN
  • Add a faster loop to loops.c.src to speed up if min or max are not arrays
  • Update the docs
  • Create benchmarks and test the new implementation's performance
  • Remove remnants of the old implementation from
    • numeric.py
    • fromnumeric.py
    • numpy_api.py
    • multiarray_api.*
    • arraytypes.*.src
    • calculation.*

But before I continue, I'd like to ask the following:

  • Am I on the right track?
  • How do I implement proper type casting? For example, test_clip_with_out_array_int32 currently fails with "TypeError: ufunc 'clip' output (typecode 'd') could not be coerced to provided output parameter (typecode 'i') according to the casting rule ''same_kind''", because not all the arguments are of the same type.
  • In arraytypes.c fastclip is written down very concisely by doing almost all types at once. In loops.c.src, however, all definitions are split into the main categories (ints, floats, etc.). So far, I've followed that expanded approach, but the concise version seems nicer. Is it acceptable to add the concise combined version to the bottom of loops.c.src instead?

@seberg
Member

seberg commented Jul 28, 2016

Nice efforts! As is, I think what might work is if you create a special type resolver for it (there may already be one which does it also).

We could consider creating the special type resolver also, and give a FutureWarning that in the future it will not always cast to a floating value magically, but will also honor integer loops, etc.

I don't have the time right now, but if you can't find it, just make a note or I will probably forget to give more hints.

@seberg
Member

seberg commented Aug 5, 2016

I had a brief look, and it looks like you are on the right track. I would say that you can put it into a single loop if possible. I'm not sure about the error at first sight; it sounds a bit like it does not have an iii->i loop?

@pimdh
Contributor Author

pimdh commented Aug 13, 2016

Thanks for your comments. I can replicate the previous type behaviour by creating a type resolver that forces the casting to be unsafe. I'm not sure how to introduce any warnings, because the type resolver doesn't seem to have access to the target array type, so the error can't be raised there.

I have come across another problem:
The expected behaviour concerning NaN would be to propagate NaN for the first argument, but not for the min or max arguments. This can be easily implemented in the ufunc loops. However, a ufunc does not allow for a variable number of arguments, and neither fmin/fmax nor minimum/maximum allow for this asymmetrical NaN propagation. The only solution I can think of would be to create two more ufuncs that follow this behaviour, but that seems very inelegant.
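
The asymmetry described above can be sketched in plain Python for a single scalar. This is only an illustration of one reading of the intended semantics (a NaN bound is treated as "no bound"), not the ufunc loop from this PR; the name clip_scalar is hypothetical.

```python
import math


def clip_scalar(x, lo, hi):
    """Clip x to [lo, hi] with asymmetric NaN handling (illustrative only).

    Assumption: NaN in x propagates, while a NaN bound is ignored.
    """
    if math.isnan(x):
        return x  # NaN in the data propagates to the output
    if not math.isnan(lo) and x < lo:
        return lo  # below a valid lower bound
    if not math.isnan(hi) and x > hi:
        return hi  # above a valid upper bound
    return x


nan = float("nan")
print(clip_scalar(0.5, 1.0, 2.0))  # 1.0
print(clip_scalar(nan, 1.0, 2.0))  # nan
print(clip_scalar(3.0, nan, 2.0))  # 2.0
```

Checking for NaN before any comparison also means the ordered comparisons never see a NaN operand, which is relevant to the floating point flag discussion further down the thread.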

Does anyone have any ideas how to address these issues? Thanks

@pimdh
Contributor Author

pimdh commented Aug 14, 2016

There seems to be another NaN issue. All Travis builds except number 5 show the error:
RuntimeWarning: invalid value encountered in clip, which I managed to reproduce on Ubuntu 12.04, Python 2.7.3 with:
np.array([np.NaN]).clip(1,2)

However, when I change

NPY_NO_EXPORT void
@TYPE@_clip(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UNUSED(func))
{
    TERNARY_LOOP {
        const @type@ in = *((@type@ *)ip1);
        const @type@ min = *((@type@ *)ip2);
        const @type@ max = *((@type@ *)ip3);

        if (@lt@(in, min)) {
            *((@type@ *)op1) = min;
        }
        else if (@gt@(in, max)) {
            *((@type@ *)op1) = max;
        }
        else {
            *((@type@ *)op1) = in;
        }
    }
}

, which yields

>>> np.array([np.NaN]).clip(1,2)
__console__:1: RuntimeWarning: invalid value encountered in clip
array([ nan])

, to

NPY_NO_EXPORT void
@TYPE@_clip(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UNUSED(func))
{
    TERNARY_LOOP {
        const @type@ in = *((@type@ *)ip1);
        const @type@ min = *((@type@ *)ip2);
        const @type@ max = *((@type@ *)ip3);

        printf(in < min ? "a" : "b");
        printf(in > max ? "c" : "d");
        if (@lt@(in, min)) {
            *((@type@ *)op1) = min;
        }
        else if (@gt@(in, max)) {
            *((@type@ *)op1) = max;
        }
        else {
            *((@type@ *)op1) = in;
        }
    }
}

, it shows

>>> np.array([np.NaN]).clip(1,2)
bdarray([ nan])

Can anyone advise on what causes this? Thanks

@seberg
Member

seberg commented Aug 15, 2016

Hmm, I don't really know floating point flag details. You could probably check for NaN first to avoid the flag being set, or possibly unset the flag (@juliantaylor might know without thinking much).
Things changing with the prints might be because of different vectorization or so; it is nothing unusual I think.

About giving a warning, one thing we did before was introduce a special casting flag like "FORCE_CAST_BUT_WARN". It is a bit of a kludge, but works.

@anntzer
Contributor

anntzer commented Nov 20, 2016

Could #5142 (behavior of clip when amin > amax) be fixed during this rewrite too? Just a suggestion from the peanut gallery :-)
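
To make the amin > amax question concrete, here is a small pure-Python sketch comparing two formulations of clip: the branching form used in the loop quoted earlier in this thread, and a composition of max and min. The function names are hypothetical, and the sketch does not claim which result #5142 asks for; it only shows that the two formulations agree when lo <= hi and can differ when lo > hi.

```python
def clip_branching(x, lo, hi):
    """Same branch order as the loop above: test lo first, then hi."""
    if x < lo:
        return lo
    elif x > hi:
        return hi
    return x


def clip_composed(x, lo, hi):
    """Clip expressed as min(max(x, lo), hi)."""
    return min(max(x, lo), hi)


# Well-ordered bounds: both formulations agree.
print(clip_branching(3, 1, 5), clip_composed(3, 1, 5))  # 3 3

# Swapped bounds (lo > hi): the results differ.
print(clip_branching(3, 5, 1), clip_composed(3, 5, 1))  # 5 1
```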

@homu
Contributor

homu commented Jan 16, 2017

☔ The latest upstream changes (presumably #8475) made this pull request unmergeable. Please resolve the merge conflicts.

@seberg
Member

seberg commented Jan 16, 2017

Since homu mentioned it, looked at it again. In principle I still really like it, and I think the NaN problems can probably be gotten around (mostly) (on linux there is a comparison function which will not set the floating point error flags for example).

One problem I still see is that it seems that the clip function may allow for None to mean that no clipping should be done. Which is something that the ufunc cannot support (except by using the maximum value). Or maybe I am being silly and this is only true in the C-Api though....
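
The workaround hinted at here (standing in an extreme value for a missing bound) might look roughly like the following scalar sketch. It assumes floating point inputs, where infinities are available; for integer types the type's minimum and maximum would be needed instead. The name clip_optional is hypothetical and this is not the NumPy implementation.

```python
import math


def clip_optional(x, lo=None, hi=None):
    """Clip x, treating a bound of None as "do not clip on that side"."""
    # Replace a missing bound with an extreme value so that a fixed
    # three-argument loop can still be used.
    lo = -math.inf if lo is None else lo
    hi = math.inf if hi is None else hi
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


print(clip_optional(5.0, hi=2.0))   # 2.0
print(clip_optional(-5.0, hi=2.0))  # -5.0
print(clip_optional(0.0, lo=1.0))   # 1.0
```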

If you ever pick it up again or decide it's too complex, we can still warm up the old PR to just fix the out problem....

@pimdh
Contributor Author

pimdh commented Jan 16, 2017

I'll take another look at this this weekend, see if I can finish it / make some progress.

@eric-wieser
Member

@drumstok: Think you'll return to this, or do you want someone else to take over?

pimdh added 3 commits June 10, 2017 16:03
In order to solve issue #7633, it was suggested to make clip into a ufunc.
This is a partial implementation of that. Left to do:
- Implement proper typecasting
- Properly handle None and NaN
- Add a faster loop to loops.c.src to speed up if min or max are not arrays
- Update the docs
- Create benchmarks and test the new implementation
- Remove remnants of the old implementation from
- - numeric.py
- - fromnumeric.py
- - numpy_api.py
- - multiarray_api.*
- - arraytypes.*.src
- - calculation.*
In order to solve issue #7633, it was suggested to make clip into a ufunc.
This is a partial implementation of that.

The tests should pass except for test_clip_nan (test_numeric.TestClip),
which tests NaN behaviour if either min or max is missing. Ufuncs don't seem
to allow for missing arguments, so ideal (f)min/max are used. However, the
expected behaviour is asymmetrical: propagate NaN for the first, but not for
the other arguments. None of the existing ufuncs has this behaviour.

Left to do:

- Handle asymmetrical NaN behaviour if either min or max is missing
- Implement warning if unsafe typecasting is used
- Add a faster loop to loops.c.src to speed up if min or max are not arrays
- Update the docs
- Create benchmarks and test the new implementation
- Remove remnants of the old implementation from
- - numeric.py
- - fromnumeric.py
- - numpy_api.py
- - multiarray_api.*
- - arraytypes.*.src
- - calculation.*
@pimdh
Contributor Author

pimdh commented Jun 10, 2017

I have rebased and incorporated #8475 in the ufunc docstrings. However, having looked again at the work that still needs to be done, I am fine with someone taking over. The complications that arise from the NaNs and from allowing missing arguments make this a bit too complicated for me at the moment. ( @eric-wieser )

@eric-wieser
Member

Continued in #12519

@seberg
Member

seberg commented Dec 19, 2018

I guess we can close it for now then in favor of the new approach. The 3 ufunc approach seems more realistic anyway.

Thanks all!

6 participants