ENH: prediction based on nearest model only or a custom bandwidth #52

martinfleis merged 3 commits into main
Codecov Report
❌ Patch coverage is
```diff
@@           Coverage Diff           @@
##             main      #52   +/-   ##
=======================================
  Coverage        ?   86.17%
=======================================
  Files           ?        6
  Lines           ?      752
  Branches        ?        0
=======================================
  Hits            ?      648
  Misses          ?      104
  Partials        ?        0
```
Pull request overview
This PR implements prediction based on nearest local model as an alternative to the existing ensemble approach, addressing issue #42. The enhancement provides users with a faster prediction option when working with large bandwidths, while maintaining the option to use the more computationally expensive ensemble method.
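To make the difference between the two options concrete, here is a minimal stdlib-only sketch. It is not gwlearn's implementation: the helper `predict_at`, the representation of local models as plain callables, and the bisquare kernel used for the ensemble weights are all assumptions for illustration.

```python
import math
from typing import Callable, Sequence

# Hypothetical setup: one fitted local model per training location,
# represented here as a plain callable mapping a feature value to a prediction.
LocalModel = Callable[[float], float]


def _distance(a: tuple[float, float], b: tuple[float, float]) -> float:
    # Euclidean distance between two planar coordinates.
    return math.hypot(a[0] - b[0], a[1] - b[1])


def predict_at(
    x: float,
    location: tuple[float, float],
    models: Sequence[LocalModel],
    train_locations: Sequence[tuple[float, float]],
    method: str = "nearest",
    bandwidth: float = 1.0,
) -> float:
    """Predict for one new observation from a set of local models (sketch)."""
    dists = [_distance(location, loc) for loc in train_locations]
    if method == "nearest":
        # Use only the local model fitted at the closest training location.
        nearest = min(range(len(dists)), key=dists.__getitem__)
        return models[nearest](x)
    if method == "ensemble":
        # Kernel-weighted average of every local model within the bandwidth
        # (bisquare weights are an assumption of this sketch).
        num = den = 0.0
        for model, d in zip(models, dists):
            if d < bandwidth:
                w = (1 - (d / bandwidth) ** 2) ** 2
                num += w * model(x)
                den += w
        # No local model within the bandwidth: no prediction possible.
        return num / den if den > 0 else math.nan
    raise ValueError(f"unknown method: {method!r}")
```

With `method="nearest"` each prediction evaluates a single local model, whereas `"ensemble"` evaluates every local model within the bandwidth, which is why the nearest option scales better as the bandwidth grows. The sketch scans all distances linearly; a real implementation would query a spatial index instead.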
Key changes:
- Added a `method` parameter to the `predict()` and `predict_proba()` methods, with options `"nearest"` (default) or `"ensemble"`
- Implemented `_prepare_prediction_nearest()` to find the nearest training location for prediction
- Fixed an indexing bug in `_prepare_prediction_neighborhoods()` so that spatial indices are properly mapped to model IDs
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `gwlearn/base.py` | Adds a `method` parameter to prediction methods, implements nearest-neighbor prediction logic, refactors ensemble prediction into `_predict_local_ensemble()`, and fixes a spatial index mapping bug |
| `gwlearn/tests/test_base.py` | Parametrizes existing prediction tests to cover both `"nearest"` and `"ensemble"` methods, adjusts assertions to handle potential NA values from the nearest method, and adds method-specific test cases for value validation |
Comments suppressed due to low confidence (1)
gwlearn/tests/test_base.py:630
- The test should differentiate between the `"nearest"` and `"ensemble"` methods, similar to how `test_regressor_predict_comparison_with_focal_pred` does. With the `"nearest"` method and predicting on the training data, the predictions should exactly match the focal probabilities (`proba_`), not just be similar within a tolerance. Consider adding a conditional check: use an exact comparison for `"nearest"` and keep the current tolerance-based comparison for `"ensemble"`.
```python
import pandas as pd

# Excerpt from a test parametrized over `method`; `clf`, `X` and `geometry`
# are defined earlier in the test.
# Get predictions for the same data used for training
predicted_proba = clf.predict_proba(X, geometry, method=method)
# Compare with proba_ (should be very similar but not identical
# because proba_ is calculated during training without using the focal point)
pd.testing.assert_series_equal(
    predicted_proba.loc[2],
    clf.proba_.loc[2],
    check_exact=False,
    atol=0.05,  # Allow some tolerance because they're not identical
)
```
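The conditional check suggested in the review could look roughly like this. It is a stdlib-only sketch using plain dicts in place of pandas Series, and `assert_predictions_match` is a hypothetical helper. Note that the test's own comment says `proba_` is computed without the focal point, so whether exact equality holds for `"nearest"` depends on how `proba_` is derived; the exact branch simply mirrors the review suggestion.

```python
import math
from typing import Mapping


def assert_predictions_match(
    predicted: Mapping[str, float],
    focal: Mapping[str, float],
    method: str,
    atol: float = 0.05,
) -> None:
    """Compare predicted class probabilities with focal ones, per `method`."""
    assert predicted.keys() == focal.keys(), "class labels differ"
    for label, p in predicted.items():
        if method == "nearest":
            # Nearest-model prediction on a training point: expect an exact
            # match (the assumption behind the review suggestion).
            assert p == focal[label], f"{label}: {p} != {focal[label]}"
        else:
            # Ensemble prediction blends several local models: allow a tolerance.
            assert math.isclose(p, focal[label], rel_tol=0.0, abs_tol=atol), (
                f"{label}: {p} not within {atol} of {focal[label]}"
            )
```

In the actual test, the same branch could instead toggle `check_exact` (and `atol`) on `pd.testing.assert_series_equal`.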
xref #42