[ENH] major speedup for _predict_out_of_sample for RecursiveReductionForeca… #7380
For information: the The implementation however seems buggy at the moment, this is a different known issue.
Nice!
I think we need to enable tests for this estimator to ensure that nothing is broken.
What I will do: I will enable the tests and add a skip for the known failure in the "global" case. Then you can merge from main and we see what happens.
Further, do you have any profiling results that you can share? Pre/post?
Attached: code I used to do the profiling and the profiling outputs. "Pre" = before my changes, "Post" = after my changes. The profiling was done only for the generation of the out-of-sample forecasts. Let me know if you have any questions.
That's ... a factor of 1000??? Marvelous! Did you also understand where the time was lost? It looks like unnecessary copies of objects were made, and too many function calls?
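A pre/post comparison like the one discussed above could be set up roughly as follows. This is only a minimal sketch using the standard-library `cProfile` and `pstats` modules; the helper name `profile_call` and the toy workload are hypothetical and are not the profiling script attached to this PR.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Hypothetical helper: run func under cProfile and summarize the result."""
    profiler = cProfile.Profile()
    # runcall enables the profiler, calls func, and disables it again.
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the ten entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    # total_calls is the number of function calls recorded by the profiler.
    return result, stats.total_calls, stream.getvalue()


def toy_workload(n):
    # Stand-in for the out-of-sample prediction call being profiled.
    return sum(i * i for i in range(n))


result, n_calls, report = profile_call(toy_workload, 1000)
print(f"function calls recorded: {n_calls}")
print(report)
```

Running the same helper on the old and the new implementation and comparing the recorded call counts and cumulative times gives a "Pre" and "Post" pair of the kind described here.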
Re: activating the tests, I got stuck there with a failed test run to identify the failing tests - I have now restarted.
Btw, I would invite you to check the discord - another contributor (Julian) also wants to fix the issues with the We could try to meet in one of the Friday 13 UTC meetups.
Ok, I switched on the tests!
Questions:
Too many function calls is an understatement - 250 million function calls! The routine is supposed to take a vector of 12 numbers and do essentially a few matrix multiplies against this vector, yielding a scalar. This gets repeated 36 times × 20 forecasters, i.e. fewer than 1,000 calculations. It should be essentially instantaneous. Almost everywhere one looks there are unnecessary actions being done. Why the copies in the first place? All the info from the fitted model is only being read, not written to. And is there really a need to check whether imputation is required? The model has already been fitted at this point. Also, why grab the entire original time series when only the last window is needed? And on it goes. [Also, even if a copy or a check for imputation is needed, it should be needed only once, before the first pass through the recursive loop.]
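To illustrate the amount of work described above, here is a minimal sketch of a recursive forecast loop that touches only the last window and copies it once, before the loop. It assumes a plain linear model with known coefficients; the function name `recursive_forecast` and its parameters are hypothetical and do not reflect the actual `_predict_out_of_sample` code.

```python
import numpy as np


def recursive_forecast(y, coefs, intercept, horizon):
    """Hypothetical sketch: recursive multi-step forecast from a linear model."""
    coefs = np.asarray(coefs, dtype=float)
    window_length = len(coefs)
    # Only the last window is needed; it is copied once, before the loop.
    window = np.array(y[-window_length:], dtype=float)
    preds = np.empty(horizon)
    for step in range(horizon):
        # One dot product per step: lagged values times fitted coefficients.
        preds[step] = intercept + window @ coefs
        # Slide the window: drop the oldest value, append the new prediction.
        window = np.roll(window, -1)
        window[-1] = preds[step]
    return preds


# Window of 12 lags, 36 steps ahead, as in the numbers quoted above.
y = np.arange(20.0)
coefs = np.r_[np.zeros(11), 1.0]  # "next value = last value + intercept"
print(recursive_forecast(y, coefs, intercept=1.0, horizon=36)[:3])
```

With a window of 12 and a horizon of 36, each forecaster performs 36 short dot products, which is why the whole computation is expected to be near-instantaneous.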
Excellent!
Could you kindly add the test cases we discussed in a file forecasting.compose.tests.test_reduce_2nd? With asserts optimally where differences were before the fix.
Note that these tests do not include the additional tests that I did which confirmed that the actual numerical results - forecasts and fitted values - were unchanged by the faster implementation. (It is not possible to automate this test as it requires different versions of the same file, or something equivalent.)
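For a manual check of this kind, one option is to keep both versions side by side as separate functions and compare their outputs. The sketch below does this for two toy implementations; `forecast_slow` and `forecast_fast` are hypothetical stand-ins for the old and new code paths, not the actual sktime methods, and the data are synthetic.

```python
import numpy as np


def forecast_slow(y, coefs, horizon):
    # Stand-in for the old path: re-copies the full, growing series each step.
    series = list(y)
    for _ in range(horizon):
        history = np.array(series, dtype=float)  # full copy on every iteration
        series.append(float(history[-len(coefs):] @ coefs))
    return np.array(series[len(y):])


def forecast_fast(y, coefs, horizon):
    # Stand-in for the new path: works on the last window only.
    window = np.array(y[-len(coefs):], dtype=float)
    preds = np.empty(horizon)
    for step in range(horizon):
        preds[step] = window @ coefs
        window = np.roll(window, -1)
        window[-1] = preds[step]
    return preds


rng = np.random.default_rng(0)
y = rng.normal(size=100)
coefs = rng.normal(scale=0.1, size=12)

# Raises AssertionError if the two implementations disagree numerically.
np.testing.assert_allclose(forecast_fast(y, coefs, 36), forecast_slow(y, coefs, 36))
```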
Understood, makes sense - thanks!
What remains to be done here? Can it be completed?
… (does not seem to do that, though)
…RegressionForecaster; fix some index issues for _predict_multiple
…tiindex - all related to DirectTabularRegressionForecaster
…regular time series (hence no freq)
Speedup _predict_out_of_sample in Issue #3224 - see @fkiraly comment there from Jul 2, 2023
I rewrote the method _predict_out_of_sample in class RecursiveReductionForecaster. The rewrite involved copying a similar method from a different class and modifying it to work with this class.
This PR is a bit hacky: