[ENH] capability:non_contiguous_X tag and skip non-contiguous X tests for affected estimators #9091
Conversation
Thanks for your contribution, much appreciated. Can you please elaborate on how you are identifying which forecasters support non-contiguous X and which do not? For example, if you check #8740. So, how did you determine which forecasters face this issue? Is it just from test failures, did you verify by checking them one by one, or some other way?

They were failing the test. For the example you have mentioned, I believe it doesn't fail because 1:3 (what the estimator managed to predict) was already contiguous, with no gaps; it couldn't predict 4 and 5 because it doesn't have X for them. I think this might mean that the estimator's behavior, when passed fh and X that don't align in shape, is to take the min, and it won't fail as long as what it is trying to predict is contiguous.
# CI and test flags
# -----------------
"tests:skip_by_name": ["test_predict_time_index_with_X"],
# known failure in case of non-contiguous X, see issue #8787
Just a minor nitpick regarding documentation: remove the "CI and test flags" section, since this is now a capability tag.
sktime/registry/_tags.py
Outdated
``fh``, the forecaster may receive exogenous data ``X`` that corresponds
only to the specific time points in ``fh``.
If the forecasting horizon is non-contiguous (e.g., ``fh=[2, 5]``),
I think this is not true. The full X will be passed? Can you kindly check?
I have checked, and it's true that only the X values corresponding to the specific time points in fh are passed. This is due to passing fh instead of test_size to temporal_train_test_split in the test_predict_time_index_with_X function.
In the ForecastingHorizonSplitter at sktime/split/fh.py we can find a note saying:
Users should note that, for non-contiguous forecasting horizons,
the union of training and test sets will not cover the entire time series.
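The note above can be illustrated with a minimal pure-Python sketch (this is not the sktime splitter itself; the split convention below, train up to the cutoff and test at cutoff plus each fh step, is assumed from the discussion):

```python
# Minimal illustration of the splitter note: splitting by a non-contiguous
# fh leaves time points that belong to neither the train nor the test set.
y_index = list(range(10))        # full series index: 0..9
fh = [2, 5]                      # steps ahead, relative to the cutoff

cutoff = max(y_index) - max(fh)  # 4: last index kept in the training set
train = [t for t in y_index if t <= cutoff]  # [0, 1, 2, 3, 4]
test = [cutoff + step for step in fh]        # [6, 9]

covered = set(train) | set(test)
uncovered = [t for t in y_index if t not in covered]
print(uncovered)  # [5, 7, 8] -- gaps covered by neither train nor test
```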
sktime/registry/_tags.py
Outdated
* Require contiguous data for their recursive prediction algorithms
If a forecaster has this tag set to ``False`` and receives non-contiguous
exogenous data, it will raise an error during prediction.
Can you maybe also add a paragraph that references ForecastX, which can be used to make forecasts for the missing X-indices in case the tag is False?
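As a toy sketch of the ForecastX idea mentioned above (pure Python, not the sktime API; all function names and the carry-forward rule are made up for illustration): first forecast the missing X indices, then forecast y from the completed X.

```python
# Toy sketch of the ForecastX idea: fill the missing exogenous values by
# forecasting X first, then forecast y from the gap-free X. Names and
# forecasting rules are illustrative, not sktime's implementation.
def forecast_X(x_known: dict, needed: list) -> dict:
    """Naive X-forecaster: carry the last known X value forward."""
    last = x_known[max(x_known)]
    return {t: x_known.get(t, last) for t in needed}


def forecast_y(x_full: dict, horizon: list) -> dict:
    """Toy y-forecaster that needs X at every step up to max(horizon)."""
    assert all(t in x_full for t in range(1, max(horizon) + 1)), "gap in X"
    return {t: 2.0 * x_full[t] for t in horizon}  # arbitrary toy rule


x_test = {2: 10.0, 5: 13.0}  # X only at the fh points: gaps at 1, 3, 4
x_full = forecast_X(x_test, needed=list(range(1, 6)))
print(forecast_y(x_full, horizon=[2, 5]))  # {2: 20.0, 5: 26.0}
```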
sktime/forecasting/base/_fh.py
Outdated
    return None


def _is_contiguous_fh(fh):
I would add it to ForecastingHorizon as a method, and call it _is_contiguous
sktime/forecasting/base/_fh.py
Outdated
if not isinstance(fh, ForecastingHorizon):
    fh = ForecastingHorizon(fh)

try:
we should avoid using try/except in checks, since it masks actual bugs or failures. Instead, check for the exact condition that you want to check.
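To illustrate the review point with a hypothetical helper (the function name and precondition are made up): test the exact precondition explicitly rather than wrapping the computation in a broad try/except, which would also swallow genuine bugs.

```python
# Illustration of "check the exact condition" vs. a broad try/except.
# Hypothetical helper; not sktime code.
def is_contiguous_checked(values) -> bool:
    # Anti-pattern (commented out): a broad except masks real failures.
    # try:
    #     return all(b - a == 1 for a, b in zip(values, values[1:]))
    # except Exception:
    #     return False

    # Preferred: verify the precondition explicitly, then compute.
    if not all(isinstance(v, int) for v in values):
        return False
    vals = sorted(values)
    return all(b - a == 1 for a, b in zip(vals, vals[1:]))


print(is_contiguous_checked([3, 4, 5]))   # True
print(is_contiguous_checked([2, 5]))      # False
print(is_contiguous_checked(["a", "b"]))  # False, caught by the check
```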
fkiraly
left a comment
Great contribution, this is really useful!
I have left some comments above.
@yarnabrina @fkiraly Regarding passing horizon
note: the syntax I am using is arbitrary
Just now I have tested, but what is even weirder is that I have tested @fkiraly, could you please check if I am missing something? I have also checked with the exact
@EmanAbdelhaleem, sorry, can you be precise about what exactly you tested, and with what? You keep saying "I have tested"

Just a normal case, like the code I provided above; this is what I mean by "I tested": I initialize a model, fit it, then try to predict with various fh values. For what?

Sorry to ask again, I have difficulty following you. Can you please be very precise about the conditions under which you tested what, and what is failing?
Setup:
Test Results: by manually testing different fh values I got:
Testing single-element horizons:
Testing multi-element horizons:
Initial Confusion: I initially thought the issue was about non-contiguous horizons (e.g.,
Given that, the first hypothesis, that the issue is about non-contiguous horizons, is contradicted and therefore rejected.
I've conducted an investigation into this behavior by examining the ARDL implementation in both
The key finding is that the contiguous nature of predictions is not determined by whether the forecasting horizon is contiguous or non-contiguous. Instead, the
After ruling out the non-contiguous horizon hypothesis, and knowing that the error I got from reproducing the exact failing test from #8787 was: I checked the
Why This Happens: The test uses:
y_train, _, X_train, X_test = temporal_train_test_split(y, X, fh=fh)
When
So with
Summary: The test fails because it's passing incomplete X.
I used some LLM help to make the text clearer, but if anything is still unclear, please point out the exact part or tell me how you would test it, and I'll follow your steps. I realize I may be using the wrong terms, as this is my first PR; my apologies for that.
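The failure mode described above can be sketched in pure Python (a toy model, not sktime code: the recursion rule and names are invented): a recursive forecaster needs X at every step up to max(fh), but the fh-based split hands it X only at the fh points themselves.

```python
# Toy recursive forecaster: it must visit every step 1..max(fh), so it
# needs X at all of them, while the fh-based split supplies X only at
# the fh points. Names and the update rule are illustrative.
def recursive_predict(x_test: dict, fh: list) -> dict:
    preds = {}
    for step in range(1, max(fh) + 1):    # recursion visits every step
        if step not in x_test:
            raise KeyError(f"missing X at step {step}")
        preds[step] = 1.0 + x_test[step]  # arbitrary toy update
    return {s: preds[s] for s in fh}


complete_X = {s: float(s) for s in range(1, 6)}  # X at steps 1..5
print(recursive_predict(complete_X, fh=[2, 5]))  # {2: 3.0, 5: 6.0}

sparse_X = {2: 2.0, 5: 5.0}  # X only at the fh points, as in the test
try:
    recursive_predict(sparse_X, fh=[2, 5])
except KeyError as e:
    print(e)  # 'missing X at step 1'
```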
I see, thanks. So we are probably dealing with two problems:
I would do as follows:
@fkiraly I have made some changes. Kindly review them.

Sorry, can you explain what you mean?
fkiraly
left a comment
Great contribution!
@fkiraly Never mind, I think I misunderstood things; I thought you meant to remove
I see that there was a trailing whitespace and you already added a commit to remove it. We can run the checks again, right?

Wow, this is my first merged PR, I'm very happy about it. Thank you so much for the review and guidance, Dr. @fkiraly!
Reference Issues/PRs
Fixes #8787
What does this implement/fix? Explain your changes.
This PR addresses a test failure affecting several estimators that require contiguous exogenous data (X) when making predictions at non-contiguous forecasting horizons (e.g., fh=[2, 5]).

Changes implemented:
- Added a new capability:non_contiguous_X tag:
  - True by default (most forecasters can handle non-contiguous X)
  - False for forecasters that require contiguous X
  - Registered in sktime/registry/_tags.py with full documentation
- Added a _is_contiguous_fh() helper function in sktime/forecasting/base/_fh.py
- Updated test_predict_time_index_with_X in sktime/forecasting/tests/test_all_forecasters.py to skip non-contiguous cases when capability:non_contiguous_X=False
- Removed the "tests:skip_by_name": ["test_predict_time_index_with_X"] tag, and set capability:non_contiguous_X=False for the affected forecasters:
  - ARDL (sktime/forecasting/ardl.py)
  - DynamicFactor (sktime/forecasting/dynamic_factor.py)
  - SkforecastRecursive (sktime/forecasting/compose/_skforecast_reduce.py)
  - StatsForecastAutoARIMA (sktime/forecasting/statsforecast.py)
  - StatsModelsARIMA (sktime/forecasting/arima/_statsmodels.py)
  - UnobservedComponents (sktime/forecasting/structural.py)

Does your contribution introduce a new dependency? If yes, which one?
No
Did you add any tests for the change?
The existing test_predict_time_index_with_X test now properly handles estimators with this limitation by skipping non-contiguous test cases.

PR checklist