Ok get kriging matrix coordinates fix #99
|
Thanks for your PR, @michaelleerilee! Do you mean that computing the great circle distance in complex form is much faster than doing that with [...]? Maybe @mjziebarth would be able to review this?
| # Convert euclidean distances to great circle distances: | ||
| bd = core.euclid3_to_great_circle(bd) | ||
| # Note: xy_points_c & xy_data_c are in a packed-complex format | ||
| # Note: Using packed-complex because cdist won't work |
Could you please elaborate on this? cdist does work in ND space with N=2 that should be similar to complex, or am I missing something?
I did not see where cdist handles 'geographic' lat-lon distances. It looked like you'd have to pass a lat-lon distance function in. Using Numpy array multiplication syntax seemed an efficient workaround. The bigger issue was that in some places Euclidean distances are used where geographic distances should be used for consistency. In fact, since this pull request, I found another place where Euclidean distances are incorrectly used regardless of the geographic flag. There's nothing wrong with core.euclid3_to_great_circle, per se.
Also at this point, it may be better to reject this pull request, since I've another branch which has gone beyond this, treating the issue in a better way, reflecting a couple of months more study of the code. FWIW, ok.py's _get_krige_matrix also needs revision consistent with the way geographic coordinates are treated. I've not worked on the problem everywhere, just the flow that goes through the OK vectorized backend, which I needed for my work.
This part of the latlon-coordinate handling should be okay in the original code.
In line 706, the lon-lat-coordinates have been converted to 3d Euclidean coordinates, hence cdist + conversion.
This was, at the time, the least invasive change I could think of. Surely direct application of great circle distance should be better in the long run. I think the use of the KDTree for n_closest_points may have influenced my decision. If there's a way to use that with spherical coordinates, core.euclid3_to_great_circle could (and should probably) be omitted completely.
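The cdist + conversion route described above can be sketched as follows. This is only an illustration, not PyKrige's code: the helper names `lonlat_to_unit_xyz` and `chord_to_great_circle_deg` are hypothetical, and the sketch assumes a unit sphere with angles in degrees. A chord of length c on the unit sphere subtends a central angle of 2·arcsin(c/2), which is the conversion applied after `cdist`.

```python
import numpy as np
from scipy.spatial.distance import cdist


def lonlat_to_unit_xyz(lon_deg, lat_deg):
    """Map lon/lat in degrees to 3D Euclidean points on the unit sphere."""
    lon = np.deg2rad(lon_deg)
    lat = np.deg2rad(lat_deg)
    return np.stack((np.cos(lon) * np.cos(lat),
                     np.sin(lon) * np.cos(lat),
                     np.sin(lat)), axis=1)


def chord_to_great_circle_deg(chord):
    """Convert unit-sphere chord lengths to great circle angles in degrees."""
    # Clip guards against round-off pushing chord/2 slightly above 1.
    return np.rad2deg(2.0 * np.arcsin(np.clip(chord / 2.0, 0.0, 1.0)))


# Three points on the equator, 90 degrees apart in longitude.
lon = np.array([0.0, 90.0, 180.0])
lat = np.array([0.0, 0.0, 0.0])

xyz = lonlat_to_unit_xyz(lon, lat)
d_deg = chord_to_great_circle_deg(cdist(xyz, xyz, "euclidean"))
```

Because the 3D points are ordinary Euclidean coordinates, the same `xyz` array could also be fed to a KDTree for nearest-neighbour queries, which is the reason given above for choosing this route.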
|
Very good spot, I missed the calculation of [...]. How should we proceed with this? Ideally I'd suggest having a fix up on PyPI asap. If I'm not mistaken, a single change in [...]:
    if self.coordinates_type == 'euclidean':
        xy = np.concatenate((self.X_ADJUSTED[:, np.newaxis],
                             self.Y_ADJUSTED[:, np.newaxis]), axis=1)
        d = cdist(xy, xy, 'euclidean')
    elif self.coordinates_type == 'geographic':
        d = great_circle_distance(self.X_ADJUSTED[:, np.newaxis],
                                  self.Y_ADJUSTED[:, np.newaxis],
                                  self.X_ADJUSTED, self.Y_ADJUSTED)
should be enough to fix the urgent error. Since Mike already mentioned the separate branch that makes this PR obsolete (?), maybe it would be best to have a small PR with just the bug fix and then switch to that other branch. But we could also review this one.
PS: Sorry also for the slow response, for some reason I missed the original pull request notification.
Edit 2: I see that the code snippet above is about what your development branch heads to, @michaelleerilee. So maybe that's a good way to fix the bug on master until that branch is ready? Also please correct me if there's any other urgent error besides the one in [...].
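For readers unfamiliar with the broadcasting pattern in the geographic branch, here is a minimal standalone sketch. The function `great_circle_deg` is a hypothetical stand-in written with the haversine formula; it is assumed to take and return degrees and is not the library's own `great_circle_distance`. Passing column vectors for the first point and row vectors for the second yields the full N×N pairwise matrix.

```python
import numpy as np


def great_circle_deg(lon1, lat1, lon2, lat2):
    """Haversine great circle distance; degrees in, degrees out; broadcasts."""
    lon1, lat1, lon2, lat2 = (np.deg2rad(np.asarray(v, dtype=float))
                              for v in (lon1, lat1, lon2, lat2))
    h = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    # Clip guards against round-off pushing h slightly outside [0, 1].
    return np.rad2deg(2.0 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0))))


# Two points on the equator and the north pole.
lon = np.array([0.0, 90.0, 0.0])
lat = np.array([0.0, 0.0, 90.0])

# (N, 1) against (N,) broadcasts to an (N, N) matrix of pairwise distances.
d = great_circle_deg(lon[:, np.newaxis], lat[:, np.newaxis], lon, lat)
```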
|
I agree a focussed patch and closing the PR is the way to go, though using two different methods to calculate (the same) distance in the same system raises a caution flag for me.
My development branch needs some work before the next step.
Off topic: One important point to consider is replacing the separate matrix inversion with an integrated matrix solver, which is more stable and should lead to better results.
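To illustrate the off-topic point about solvers, the sketch below sets up a tiny ordinary kriging system and compares an explicit inverse with a direct solve. It assumes a textbook sign convention and a linear variogram γ(h) = |h|, both chosen for the example; `linear_variogram` is a hypothetical helper and none of this mirrors PyKrige's internals.

```python
import numpy as np


def linear_variogram(h, slope=1.0):
    """Hypothetical linear semivariogram without nugget."""
    return slope * np.abs(h)


x = np.array([0.0, 1.0, 3.0])   # 1D data locations
x0 = 1.0                        # prediction location (coincides with x[1])
n = x.size

# Ordinary kriging system: [[Gamma, 1], [1^T, 0]] @ [w, mu] = [gamma0, 1]
a = np.zeros((n + 1, n + 1))
a[:n, :n] = linear_variogram(x[:, np.newaxis] - x[np.newaxis, :])
a[:n, n] = 1.0
a[n, :n] = 1.0

b = np.ones(n + 1)
b[:n] = linear_variogram(x - x0)

# Direct solve: one factorization, no explicit inverse formed.
w_solve = np.linalg.solve(a, b)

# Separate inversion followed by a matrix-vector product.
w_inv = np.linalg.inv(a) @ b
```

On a well-conditioned toy system both routes agree; the stability argument matters for the larger, nearly singular systems that arise when data points lie close together.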
|
|
I don't want to take the commit-cake so would you want to work a bit more on this branch? There are still some small things I'd like to discuss then. But also if your time is precious, I could prepare a small patch branch. |
|
@michaelleerilee @rth (also @bsmurphy )
In case no more work on this branch is desired, I've prepared a suggestion branch, but as stated previously, this PR should take precedence. |
| np.sin(lon_p) * np.cos(lat_p), | ||
| np.sin(lat_p)), axis=1) | ||
|
|
||
| # Packed-complex version. |
Just curious: Is there any advantage of using the complex version over two real arrays?
Regarding complex packing: I think there was a mapping function or technique that required a signature with a single argument. This may not be an issue any more.
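As a rough illustration of the question, the sketch below compares a packed-complex representation with two real arrays for plain Euclidean pairwise distances. The packing `x + 1j*y` is an assumption made for this example; the PR's actual packing is not shown in this thread. The complex form lets a single array carry both coordinates, which fits a single-argument signature, while the numerical result is the same.

```python
import numpy as np

x = np.array([0.0, 3.0, 0.0])
y = np.array([0.0, 0.0, 4.0])

# Packed-complex: one array holds both coordinates (assumed layout x + 1j*y).
z = x + 1j * y
d_complex = np.abs(z[:, np.newaxis] - z[np.newaxis, :])

# Two real arrays: same pairwise distances via broadcasting.
d_real = np.hypot(x[:, np.newaxis] - x[np.newaxis, :],
                  y[:, np.newaxis] - y[np.newaxis, :])
```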
| print("cR =", self.cR) | ||
|
|
||
| def _get_kriging_matrix(self, n): | ||
| def _get_kriging_matrix(self, n, xy=None, coordinates='euclidean'): |
We can use self.coordinates_type instead of passing coordinates
This sounds correct.
|
Dear @rth, @bsmurphy, |
|
I'll try to have a more detailed look this evening. I agree that it is important to fix this, but I'm still not sure that adding a new implementation for the great circle distance is a good idea. If we do add scikit-learn as an optional dependency (currently it's kind of an optional one to run some examples), it does have a fast C implementation, cf. scikit-learn/scikit-learn#12552 (comment). That one should be more thoroughly tested (and faster) than what we would be able to do here IMO, cf. scikit-learn/scikit-learn#4458 (comment)
|
Keeping the existing great circle distance calculation is quite reasonable.
Being new to the code and working on a deadline, I didn't have time to figure out a better way to calculate and propagate the pairwise distances. Also, I'm not completely happy that there are two great circle distance calculations in the existing code, I'd prefer just one, but in practice it seems they're close enough.
|
|
Thanks for the quick answers!
Okay, so I would also suggest staying with one implementation. I added the Euclidean-based distance function mainly to be as unintrusive as possible when I was new to the code base as well. However, by now I would agree that it would be better to use just one great circle distance function. For the [...] @michaelleerilee I had written some code in that branch I mentioned which would be my take on doing just that. If it suits, you're welcome to copy that!
Sounds good to me. I was also thinking about optionally supporting ellipsoid great circle distance, i.e. the method by Karney (2013). I think I've seen a rather fast python wrapper somewhere (although IIRC it's still significantly slower than even the spherical great circle distance we have). Not sure how many use cases there are where that precision gain for real-world data is needed but I guess it would be nice to have. Maybe that's something for a future milestone? |
|
@mjziebarth Could you please open a PR with the changes from your branch? That looks good. Unless someone is able to make this work without the second great circle calculation function, I think it might be preferable to merge that instead of this PR. We will find a way to acknowledge @michaelleerilee's contribution in any case, which is very much appreciated.
|
This can be closed, right? |
|
I'm okay with closing it.
|
Benjamin et al.:
The construction of 'a' does not correctly account for 'geographic' coordinates in 1.4.
On the branch I'm forwarding, I've made some modifications that seem to work for me.
Regards,
Mike
P.S. I have found PyKrige quite useful in jump-starting my experiments in kriging. Nicely done.