@michaelleerilee

Benjamin et al.:
The construction of 'a' does not correctly account for 'geographic' coordinates in 1.4.
On the branch I'm forwarding, I've made some modifications that seem to work for me.
Regards,
Mike
P.S. I have found PyKrige quite useful in jump-starting my experiments in kriging. Nicely done.

@rth
Contributor

rth commented Aug 14, 2018

Thanks for your PR @michaelleerilee !

Do you mean that computing the great circle distances with complex numbers is much faster than doing it with great_circle_distance, which is currently included in core.py?

Maybe @mjziebarth would be able to review this?

# Convert euclidean distances to great circle distances:
bd = core.euclid3_to_great_circle(bd)
# Note: xy_points_c & xy_data_c are in a packed-complex format
# Note: Using packed-complex because cdist won't work
Contributor

Could you please elaborate on this? cdist does work in N-D space, and with N=2 that should be similar to complex, or am I missing something?

Author

I did not see where cdist handles 'geographic' lat-lon distances. It looked like you'd have to pass a lat-lon distance function in. Using NumPy array multiplication syntax seemed an efficient workaround. The bigger issue was that in some places Euclidean distances are used where geographic distances should be used for consistency. In fact, since this pull request, I found another place where Euclidean distances are incorrectly used regardless of the geographic flag. There's nothing wrong with core.euclid3_to_great_circle, per se.
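
As a rough illustration of the two routes mentioned here, the sketch below passes a lat-lon distance function to cdist and compares it with a NumPy broadcasting version. `haversine_deg` and `haversine_matrix_deg` are hypothetical helpers written for this example (they are not PyKrige functions), and returning the distance in degrees is an assumption.

```python
import numpy as np
from scipy.spatial.distance import cdist


def haversine_deg(p, q):
    """Great circle distance in degrees between two (lon, lat) points in degrees.

    Hypothetical helper for illustration, not part of PyKrige.
    """
    lon1, lat1, lon2, lat2 = np.radians([p[0], p[1], q[0], q[1]])
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return np.degrees(2 * np.arcsin(np.sqrt(a)))


def haversine_matrix_deg(lon, lat):
    """Same formula, evaluated for all point pairs at once via broadcasting."""
    lon, lat = np.radians(lon), np.radians(lat)
    dlon = lon[:, np.newaxis] - lon[np.newaxis, :]
    dlat = lat[:, np.newaxis] - lat[np.newaxis, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat[:, np.newaxis]) * np.cos(lat[np.newaxis, :])
         * np.sin(dlon / 2) ** 2)
    return np.degrees(2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0))))


lon = np.array([0.0, 10.0, 20.0])
lat = np.array([60.0, 60.0, 45.0])
pts = np.column_stack((lon, lat))

# cdist accepts a callable metric, but calls it once per pair from Python.
d_callable = cdist(pts, pts, metric=haversine_deg)
# The broadcasting version evaluates the whole matrix in one vectorized pass.
d_broadcast = haversine_matrix_deg(lon, lat)
```

Both routes should give the same matrix; the broadcasting form mainly avoids the per-pair Python call overhead.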

Author

Also at this point, it may be better to reject this pull request, since I've another branch which has gone beyond this, treating the issue in a better way, reflecting a couple of months more study of the code. FWIW, ok.py's _get_kriging_matrix also needs revision consistent with the way geographic coordinates are treated. I've not worked on the problem everywhere, just the flow that goes through the OK vectorized backend, which I needed for my work.

Contributor

This part of the latlon-coordinate handling should be okay in the original code.
In line 706, the lon-lat coordinates have been converted to 3D Euclidean coordinates, hence cdist + conversion.
This was, at the time, the least invasive change I could think of. Surely direct application of great circle distance should be better in the long run. I think the use of the KDTree for n_closest_points may have influenced my decision. If there's a way to use that with spherical coordinates, core.euclid3_to_great_circle could (and should probably) be omitted completely.
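
The "cdist + conversion" idea can be sketched as follows, assuming points on a unit sphere: the Euclidean distance between two such points is a chord of length c, and the corresponding great circle arc is 2·arcsin(c/2). The helper names below are hypothetical, and this is not claimed to be how core.euclid3_to_great_circle is actually written.

```python
import numpy as np
from scipy.spatial.distance import cdist


def lonlat_to_unit_xyz(lon_deg, lat_deg):
    """Map lon/lat in degrees to 3D points on the unit sphere."""
    lon, lat = np.radians(lon_deg), np.radians(lat_deg)
    return np.column_stack((np.cos(lon) * np.cos(lat),
                            np.sin(lon) * np.cos(lat),
                            np.sin(lat)))


def chord_to_arc_deg(chord):
    """Convert unit-sphere chord lengths to great circle arcs in degrees."""
    # Clip guards against chord/2 drifting marginally above 1 by round-off.
    return np.degrees(2.0 * np.arcsin(np.clip(chord / 2.0, 0.0, 1.0)))


# Three points on the equator, a quarter turn apart.
lon = np.array([0.0, 90.0, 180.0])
lat = np.array([0.0, 0.0, 0.0])

xyz = lonlat_to_unit_xyz(lon, lat)
arc = chord_to_arc_deg(cdist(xyz, xyz, "euclidean"))
```

Because the chord length grows monotonically with the arc length, nearest neighbours found in the 3D Euclidean space are also the nearest neighbours on the sphere, which is why a KDTree search can be run on the converted coordinates.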

@mjziebarth
Contributor

mjziebarth commented Aug 15, 2018

Very good spot, I missed the calculation of a when looking over the code. My apologies for that and thanks a lot Mike for pointing it out!
It's been a while since I was working with kriging and I'm not too familiar with the maths, but I guess this can have quite severe consequences, especially if the variogram is complex.

How should we proceed with this? Ideally I'd suggest to have a fix up on PyPI asap. If I'm not mistaken, a single change in _get_kriging_matrix like this

if self.coordinates_type == 'euclidean':
    # Planar case: pairwise Euclidean distances between the data points.
    xy = np.concatenate((self.X_ADJUSTED[:, np.newaxis],
                         self.Y_ADJUSTED[:, np.newaxis]), axis=1)
    d = cdist(xy, xy, 'euclidean')
elif self.coordinates_type == 'geographic':
    # Geographic case: pairwise great circle distances via broadcasting.
    d = great_circle_distance(self.X_ADJUSTED[:, np.newaxis],
                              self.Y_ADJUSTED[:, np.newaxis],
                              self.X_ADJUSTED, self.Y_ADJUSTED)

should be enough to fix the urgent error.
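
To give a sense of how far the two branches can diverge, the sketch below compares a planar distance computed directly on lon/lat degrees with a great circle distance for the same two points. `great_circle_deg` is a hypothetical haversine stand-in written for this example, not PyKrige's great_circle_distance, although it is called with the same broadcasting pattern as the snippet above.

```python
import numpy as np


def great_circle_deg(lon1, lat1, lon2, lat2):
    """Haversine great circle distance in degrees (hypothetical stand-in).

    All inputs are in degrees and may be arrays that broadcast together.
    """
    lon1, lat1, lon2, lat2 = (np.radians(v) for v in (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return np.degrees(2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0))))


# Two points at 60 degrees latitude, 10 degrees apart in longitude.
lon = np.array([0.0, 10.0])
lat = np.array([60.0, 60.0])

# Planar distance on raw degrees, as the 'euclidean' branch would compute it.
d_euclid = np.hypot(lon[:, np.newaxis] - lon, lat[:, np.newaxis] - lat)
# Great circle distance, as the 'geographic' branch should compute it.
d_sphere = great_circle_deg(lon[:, np.newaxis], lat[:, np.newaxis], lon, lat)
```

For this pair the planar value is 10 while the great circle separation is about 5 degrees, so a variogram evaluated on the planar distances would see the points as roughly twice as far apart as they are.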

Since Mike already mentioned the separate branch that makes this PR obsolete (?), maybe it would be best to have a small PR with just the bug fix and then switch to that other branch. But we could also review this one.

PS: Sorry also for the slow response, for some reason I missed the original pull request notification.

Edit 2: I see that the code snippet above is roughly where your development branch is heading, @michaelleerilee. So maybe that's a good way to fix the bug on master until that branch is ready? Also please correct me if there's any other urgent error besides the one in _get_kriging_matrix!

@michaelleerilee
Author

michaelleerilee commented Aug 15, 2018 via email

@mjziebarth
Contributor

I don't want to take the commit-cake so would you want to work a bit more on this branch? There are still some small things I'd like to discuss then. But also if your time is precious, I could prepare a small patch branch.

@mjziebarth
Contributor

@michaelleerilee @rth (also @bsmurphy )
A bump since this has stalled a bit the last month and I think it's important to fix the bug. How shall we proceed?
Regarding this PR from my side:

  • I'm curious about the complex conversion of lon/lat (does it have benefits? In my opinion, it's a little less clear to read than handling lat/lon, but that's preference, I guess.).
  • We should use self.coordinates_type instead of passing coordinates (IIRC I've seen it in your more advanced branch).

In case no more work on this branch is desired, I've prepared a suggestion branch, but as stated previously, this PR should take precedence.

np.sin(lon_p) * np.cos(lat_p),
np.sin(lat_p)), axis=1)

# Packed-complex version.
Contributor

Just curious: Is there any advantage of using the complex version over two real arrays?

Author

Regarding complex packing: I think there was a mapping function or technique that required a signature with a single argument. This may not be an issue any more.
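
A minimal sketch of what packed-complex coordinates could look like, assuming longitude is stored in the real part and latitude in the imaginary part (the convention actually used in the PR is not shown in this thread). `pack_lonlat` and `unpack_lonlat` are hypothetical names.

```python
import numpy as np


def pack_lonlat(lon, lat):
    """Pack two real arrays into one complex array: lon + 1j * lat."""
    return np.asarray(lon, dtype=float) + 1j * np.asarray(lat, dtype=float)


def unpack_lonlat(packed):
    """Recover the (lon, lat) arrays from a packed-complex array."""
    return packed.real, packed.imag


lon = np.array([0.0, 3.0])
lat = np.array([0.0, 4.0])

# A single array can now be handed to any function taking one argument.
packed = pack_lonlat(lon, lat)
lon_back, lat_back = unpack_lonlat(packed)

# Note: the modulus of a complex difference is the planar distance only,
# so it does not replace a great circle calculation.
planar = np.abs(packed[:, np.newaxis] - packed[np.newaxis, :])
```

Packing trades readability for a single-argument signature; two real arrays (or one array of shape (n, 2)) carry the same information.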

print("cR =", self.cR)

def _get_kriging_matrix(self, n):
def _get_kriging_matrix(self, n, xy=None, coordinates='euclidean'):
Contributor

We can use self.coordinates_type instead of passing coordinates

Author

This sounds correct.

@mjziebarth
Contributor

Dear @rth, @bsmurphy,
another bump. I feel like #113 and #114 may be caused by this error. From my side, we could also merge this and clean up some of the points later on. I'd volunteer for cleaning up, but I cannot merge this, so I'd need some help from one of you. I think it's especially urgent since this error may go undetected.
Regards!

@rth
Contributor

rth commented Nov 21, 2018

I'll try to have a more detailed look this evening.

I agree that it is important to fix this, but I'm still not sure that adding a new implementation for the great circle distance is a good idea.

If we do add scikit-learn as an optional dependency (currently it's kind of an optional one to run some examples), it does have a fast C implementation, cf. scikit-learn/scikit-learn#12552 (comment). That one should be more thoroughly tested (and faster) than what we would be able to do here IMO, cf. scikit-learn/scikit-learn#4458 (comment)

@michaelleerilee
Author

michaelleerilee commented Nov 21, 2018 via email

@mjziebarth
Contributor

Thanks for the quick answers!

> I agree that it is important to fix this, but I'm still not sure that adding a new implementation for the great circle distance is a good idea.

> Keeping the existing great circle distance calculation is quite reasonable.
> Being new to the code and working on a deadline, I didn't have time to figure out a better way to calculate and propagate the pairwise distances. Also, I'm not completely happy that there are two great circle distance calculations in the existing code, I'd prefer just one, but in practice it seems they're close enough.

Okay, so I would also suggest staying with one implementation. I added the Euclidean-based distance function mainly to be as unintrusive as possible when I was new to the code base as well. However, by now I would agree that it would be better to use just one great circle distance function. For the n_closest_points case, one could use the great circle distance separately after the KDTree search has been performed. Then, the euclid3_to_great_circle could be removed.
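
A rough sketch of that two-step idea, assuming SciPy's cKDTree is used for the neighbour search on 3D unit-sphere coordinates and a haversine function for the distances afterwards. All helper names are hypothetical, and this is not the code from the branch mentioned in this thread.

```python
import numpy as np
from scipy.spatial import cKDTree


def lonlat_to_unit_xyz(lon_deg, lat_deg):
    """Map lon/lat in degrees to 3D points on the unit sphere."""
    lon, lat = np.radians(lon_deg), np.radians(lat_deg)
    return np.column_stack((np.cos(lon) * np.cos(lat),
                            np.sin(lon) * np.cos(lat),
                            np.sin(lat)))


def great_circle_deg(lon1, lat1, lon2, lat2):
    """Haversine great circle distance in degrees (hypothetical stand-in)."""
    lon1, lat1, lon2, lat2 = (np.radians(v) for v in (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return np.degrees(2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0))))


# Data points and a single query point, all in degrees.
lon = np.array([0.0, 1.0, 5.0, 90.0])
lat = np.array([0.0, 0.0, 0.0, 45.0])
query_lon = np.array([0.25])
query_lat = np.array([0.0])

# Step 1: neighbour search in 3D; the chord distances are discarded.
tree = cKDTree(lonlat_to_unit_xyz(lon, lat))
_, idx = tree.query(lonlat_to_unit_xyz(query_lon, query_lat), k=2)

# Step 2: great circle distances computed directly for the selected points.
d = great_circle_deg(query_lon[:, np.newaxis], query_lat[:, np.newaxis],
                     lon[idx], lat[idx])
```

Here the tree is used only to pick the neighbours, so no chord-to-arc conversion is needed and a single great circle function serves both the full-matrix and the n_closest_points paths.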

@michaelleerilee I had written some code in that branch I mentioned which would be my take on doing just that. If it suits, you're welcome to copy that!

> If we do add scikit-learn as an optional dependency (currently it's kind of an optional one to run some examples), it does have a fast C implementation, cf. scikit-learn/scikit-learn#12552 (comment). That one should be more thoroughly tested (and faster) than what we would be able to do here IMO, cf. scikit-learn/scikit-learn#4458 (comment)

Sounds good to me. I was also thinking about optionally supporting ellipsoid great circle distance, i.e. the method by Karney (2013). I think I've seen a rather fast python wrapper somewhere (although IIRC it's still significantly slower than even the spherical great circle distance we have). Not sure how many use cases there are where that precision gain for real-world data is needed but I guess it would be nice to have. Maybe that's something for a future milestone?

@rth
Contributor

rth commented Nov 22, 2018

@mjziebarth Could you please open a PR with the changes from your branch? That looks good. Unless someone is able to make this work without the second great circle calculation function, I think it might be preferable to merge that instead of this PR. We will find a way to acknowledge @michaelleerilee's contribution in any case, which is very appreciated.

@mjziebarth mjziebarth mentioned this pull request Nov 23, 2018
@MuellerSeb
Member

This can be closed, right?

@michaelleerilee
Author

michaelleerilee commented Jan 24, 2020 via email

@MuellerSeb MuellerSeb closed this Jan 26, 2020