MAINT: trapz works on non-monotonic x #10428

Closed
wants to merge 1 commit into from

@andyfaff (Member)

This PR allows trapz to accept non-monotonic x arrays. It does so by sorting x along the integration axis and reordering y to match.
Tests for 1-D and 3-D inputs are added.
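A minimal sketch of the sort-then-integrate idea described above, assuming NumPy 1.15 or later for np.take_along_axis (which was added after this PR was opened). The helper trapz_sorted is hypothetical and is not the code from this PR; it also assumes x and y already have the same shape.

```python
import numpy as np


def trapz_sorted(y, x, axis=-1):
    """Hypothetical helper: trapezoid rule after sorting x along `axis`.

    Assumes x and y already have the same shape.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, axis=axis)               # per-slice sort indices
    xs = np.take_along_axis(x, order, axis=axis)
    ys = np.take_along_axis(y, order, axis=axis)   # reorder y the same way
    dx = np.diff(xs, axis=axis)
    n = xs.shape[axis]
    # Left and right y values of each interval along `axis`
    lo = np.take(ys, np.arange(n - 1), axis=axis)
    hi = np.take(ys, np.arange(1, n), axis=axis)
    return np.sum(0.5 * (lo + hi) * dx, axis=axis)


x = np.array([0.0, 2.0, 1.0, 3.0])
print(trapz_sorted(x, x))  # integral of y = x from 0 to 3: 4.5
```

Sorting makes the result independent of sample order, which is exactly the behaviour change under discussion in the rest of this thread.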

@andyfaff (Member Author)

I think the error occurs because the x array needs to be broadcast against y.
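For a 1-D x and an N-D y, one way to do the broadcasting is to give x length-1 dimensions everywhere except the integration axis, similar to the reshape in the diff excerpt further down. A sketch, with broadcast_x as a hypothetical helper name:

```python
import numpy as np


def broadcast_x(x, y, axis=-1):
    """Hypothetical helper: broadcast a 1-D x against y along `axis`."""
    x = np.asarray(x)
    y = np.asarray(y)
    shape = [1] * y.ndim
    shape[axis] = -1          # keep x's length on the integration axis
    x = x.reshape(shape)      # e.g. (5,) -> (1, 5, 1) for axis=1 of a 3-D y
    return np.broadcast_arrays(x, y)


y = np.zeros((2, 5, 3))
xb, yb = broadcast_x(np.arange(5), y, axis=1)
print(xb.shape)  # (2, 5, 3)
```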

@andyfaff (Member Author)

Only one test is failing now, but I'm not sure how to use broadcast_to with masked arrays without explicitly checking the type of the inputs.
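On the masked-array question: np.broadcast_arrays defaults to subok=False, so a MaskedArray input comes back as a plain ndarray and its mask is silently dropped. One possible workaround, sketched here with a hypothetical broadcast_masked helper, is to broadcast the data and the mask separately and rebuild the masked array; whether this matches the semantics trapz should have for masked input is an assumption.

```python
import numpy as np


def broadcast_masked(m, shape):
    """Hypothetical helper: broadcast a (possibly masked) array to `shape`,
    keeping its mask."""
    data = np.broadcast_to(np.ma.getdata(m), shape)
    mask = np.broadcast_to(np.ma.getmaskarray(m), shape)
    # np.array(...) copies the read-only broadcast views into writable arrays
    return np.ma.array(np.array(data), mask=np.array(mask))


m = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
y = np.zeros((2, 3))

plain, _ = np.broadcast_arrays(m, y)
print(type(plain).__name__)  # ndarray: the mask is gone

mb = broadcast_masked(m, np.broadcast(m, y).shape)
print(mb.mask)
```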

x = np.copy(x).reshape(shape)

x, y = np.broadcast_arrays(x, y)
idxs = _argsort_indices(x, axis)

Member

I'm fairly sure you just want argsort here; your test is too simple to tell the difference: argsort([1, 3, 2]) == _argsort_indices([1, 2, 3]).

Member Author

Plain argsort gives the sort indices along the last axis of a multidimensional array, but you can't use its output directly as an index to sort that array.

Member Author

>>> a = np.arange(9)
>>> np.random.shuffle(a)
>>> a = a.reshape((3, 3))
>>> print(a)
[[7 6 5]
 [4 3 1]
 [0 2 8]]
>>> i = np.argsort(a)
>>> print(i)
[[2 1 0]
 [2 1 0]
 [0 1 2]]
>>> print(a[i])
[[[0 2 8]
  [4 3 1]
  [7 6 5]]

 [[0 2 8]
  [4 3 1]
  [7 6 5]]

 [[7 6 5]
  [4 3 1]
  [0 2 8]]]
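For reference, the usual ways to apply per-row argsort indices are sketched below, using the array printed above written out literally so the example is deterministic. np.take_along_axis was added in NumPy 1.15, after this discussion; the explicit index-grid form also works on older versions.

```python
import numpy as np

a = np.array([[7, 6, 5],
              [4, 3, 1],
              [0, 2, 8]])
i = np.argsort(a, axis=-1)

# a[i] indexes whole *rows* of a with every entry of i
print(a[i].shape)  # (3, 3, 3)

# Option 1 (NumPy >= 1.15): pick values along the sorted axis
s1 = np.take_along_axis(a, i, axis=-1)

# Option 2 (older NumPy): pair each row index with that row's column indices
rows = np.arange(a.shape[0])[:, np.newaxis]
s2 = a[rows, i]

print(s1)
```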

Member

Apologies, I confused this with the permutation PR (#9880), which at one point used a similar name.

@andyfaff (Member Author)

I should also point out that the documentation of np.trapz doesn't describe the behaviour with masked arrays, whereas the tests say:

Testing that masked arrays behave as if the function is 0 where masked

The test example given is:

import numpy as np
from numpy import trapz
from numpy.testing import assert_almost_equal

x = np.arange(5)
y = x * x
mask = x == 2
ym = np.ma.array(y, mask=mask)
print(trapz(ym, x))
r = 13.0  # sum(0.5 * (0 + 1) * 1.0 + 0.5 * (9 + 16))
assert_almost_equal(trapz(ym, x), r)

However, the result is inconsistent with that statement: if the function really were 0 where masked, the answer would be 18.0, not 13.0:

import numpy as np
from numpy import trapz

x = np.arange(5)
y = x * x
y[2] = 0  # treat the masked sample as zero
print(trapz(y, x))  # 18.0
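The two readings can be compared directly. This sketch writes the trapezoid rule out by hand (the helper trap is hypothetical and handles 1-D input only; it is not NumPy's implementation) so it behaves the same across NumPy versions. With a masked y, every interval touching a masked sample becomes masked and is skipped by .sum(), which reproduces the test's 13.0; filling the mask with zeros first gives 18.0.

```python
import numpy as np


def trap(y, x):
    """Hypothetical helper: plain trapezoid rule for 1-D inputs."""
    return 0.5 * ((y[1:] + y[:-1]) * np.diff(x)).sum()


x = np.arange(5)
y = x * x
ym = np.ma.array(y, mask=(x == 2))

dropped = trap(ym, x)           # intervals touching x == 2 are skipped
zeroed = trap(ym.filled(0), x)  # masked sample treated as y == 0
print(dropped, zeroed)  # 13.0 18.0
```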

@eric-wieser (Member) commented Jan 18, 2018

The current behavior is already correct:

>>> x = np.array([0, 2, 1, 3])
>>> y = np.array([0, 2, 3, 5])
>>> np.trapz(y, x)
7.5

You can check that the answer is correct graphically:

>>> from matplotlib import pyplot as plt
>>> fig, ax = plt.subplots()
>>> ax.plot(x, y)
>>> ax.grid()
>>> ax.axis('equal')
>>> plt.show()

[Figure numpy-10428: the (x, y) path from the plotting code above, drawn with a grid and equal axis scaling]

The area to the bottom-right of the line is indeed 7.5 squares.
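The 7.5 can also be checked segment by segment: each trapezoid contributes (x1 - x0) * (y0 + y1) / 2, so the step from x = 2 back to x = 1 has a negative width and subtracts area. A plain-Python sketch with the same four points:

```python
x = [0, 2, 1, 3]
y = [0, 2, 3, 5]

# Signed area of each trapezoid between consecutive samples
pieces = [(x1 - x0) * (y0 + y1) / 2
          for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:])]
print(pieces)       # [2.0, -2.5, 8.0]
print(sum(pieces))  # 7.5
```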

@andyfaff (Member Author)

The rationale for trapz is that it's for sampled x/y points. Obviously, the order matters:

>>> np.random.seed(2)
>>> x = np.arange(6)
>>> np.random.shuffle(x)
>>> print(np.trapz(x,x))
-8.0

There will be a non-negligible population who think the answer should be 0.5 * 5 * 5 = 12.5.
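Integrating y = x against itself shows why the order matters so much here: each trapezoid contributes (x[i+1] + x[i]) * (x[i+1] - x[i]) / 2 = (x[i+1]**2 - x[i]**2) / 2, so the sum telescopes to (x[-1]**2 - x[0]**2) / 2 and depends only on the first and last samples. The permutation below is hand-picked to give -8.0; it is not claimed to be the one np.random.seed(2) produces, and trap is a hypothetical plain-Python helper.

```python
def trap(y, x):
    """Hypothetical helper: plain trapezoid rule over paired samples."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:]))


shuffled = [5, 0, 4, 1, 2, 3]                   # hand-picked permutation of range(6)
print(trap(shuffled, shuffled))                 # -8.0
print((shuffled[-1]**2 - shuffled[0]**2) / 2)   # -8.0, from the endpoints alone

ordered = sorted(shuffled)
print(trap(ordered, ordered))                   # 12.5 == 0.5 * 5 * 5
```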

I don't mind if the PR doesn't progress further, but then the documentation should probably be changed to say that the order of x matters.

@eric-wieser (Member)

To elaborate: the current behavior allows parametric curves that are not monotonic in x and/or y to be integrated in a mathematically meaningful way.

If your data is unordered, then I think it's up to you as the user to put it in the order that makes sense for your function.

Another example of why the existing implementation is useful:

>>> theta = np.linspace(0, 2*np.pi, 1000)
>>> x = np.sin(theta)
>>> y = np.cos(theta)
>>> np.trapz(y, x)
3.1415719413758412   # area of the approximate circle = approximation to pi!
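Why this works: for a closed path, the trapezoid sum of y dx is a discrete line integral of y dx, whose magnitude is the enclosed area. Expanding the sum, the terms (x[i+1]*y[i+1] - x[i]*y[i]) / 2 telescope to zero when the path ends where it starts, leaving the negative of the shoelace formula. The path here runs clockwise (from (0, 1) toward (1, 0)), so the result is +pi rather than -pi. A plain-Python sketch with hypothetical trap and shoelace helpers:

```python
import math


def trap(y, x):
    """Hypothetical helper: plain trapezoid rule over paired samples."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:]))


def shoelace(x, y):
    """Signed polygon area; positive for counter-clockwise vertices."""
    return sum(x0 * y1 - x1 * y0
               for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:])) / 2


n = 1000  # same sampling as np.linspace(0, 2*np.pi, 1000)
theta = [2 * math.pi * k / (n - 1) for k in range(n)]
x = [math.sin(t) for t in theta]
y = [math.cos(t) for t in theta]

area = trap(y, x)
print(area)                                  # close to pi
print(math.isclose(area, -shoelace(x, y)))   # True
```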

It does sound like the documentation could be improved.

@eric-wieser (Member)

> There will be a non-negligible population who think the answer should be 0.5 * 5 * 5 = 12.5.

These people will probably realize that they need to order their data when they try to do a line plot in matplotlib, which also cares about the order of their data.

More worryingly, if you organically have unordered data (random measurements from a process), sorting by one coordinate is a very bad way to produce a fit, and integration under that fit doesn't sound like a meaningful thing to do to me.
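A quick illustration of why sorting by one coordinate is a poor fit: take the same circle samples as above, sort them by x, and integrate. The sorted path zig-zags between the upper and lower halves; since the x-steps are then all non-negative and total at most 2, while |y| <= 1, the result is bounded by 2 in magnitude and cannot approximate the area pi. A plain-Python sketch with a hypothetical trap helper:

```python
import math


def trap(y, x):
    """Hypothetical helper: plain trapezoid rule over paired samples."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:]))


n = 1000
theta = [2 * math.pi * k / (n - 1) for k in range(n)]
pts = [(math.sin(t), math.cos(t)) for t in theta]

in_order = trap([p[1] for p in pts], [p[0] for p in pts])
by_x = sorted(pts)  # sort by x (ties broken by y)
sorted_result = trap([p[1] for p in by_x], [p[0] for p in by_x])

print(in_order)       # about pi
print(sorted_result)  # bounded by 2 in magnitude; not an area
```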

2 participants