Contributor

@ericjb ericjb commented Nov 11, 2024

Speed up `_predict_out_of_sample`, as raised in Issue #3224; see @fkiraly's comment there from Jul 2, 2023.

I rewrote the method _predict_out_of_sample in class RecursiveReductionForecaster. The rewrite involved copying a similar method from a different class and modifying it to work with this class.

This PR is a bit hacky:

  • I left in much of the original code for reference, surrounded by `if False:`
  • I did not address the pooling = "global" case (I don't really know what that is)
  • I carefully checked that the numbers are identical pre/post, but only on one example.

Collaborator

fkiraly commented Nov 12, 2024

For information: the pooling="global" case covers the case of fitting the regression model on multiple time series, i.e., on Panel typed containers.

The implementation, however, seems buggy at the moment; this is a separate known issue.

Collaborator

@fkiraly fkiraly left a comment

Nice!

I think we need to enable tests for this estimator to ensure that nothing is broken.

What I will do: I will enable the tests and add a skip for the known failure in the "global" case. Then you can merge from main and we will see what happens.

Further, do you have any profiling results that you can share? Pre/post?

@fkiraly fkiraly added the labels module:forecasting (forecasting module: forecasting, incl probabilistic and hierarchical forecasting) and enhancement (Adding new functionality) Nov 12, 2024
Contributor Author

ericjb commented Nov 12, 2024

Attached: code I used to do the profiling and the profiling outputs. "Pre" = before my changes, "Post" = after my changes. The profiling was done only for the generation of the out-of-sample forecasts. Let me know if you have any questions.
example_PR_7380.py.gz
profile_7380_post.txt
profile_7380_pre.txt
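
For readers who want to reproduce this kind of pre/post comparison, the following is a minimal sketch using the standard-library `cProfile` and `pstats` modules. It is not the attached script: `predict_pre`, `predict_post`, and `profile_call` are hypothetical stand-ins, and the toy model is an assumption chosen only to show how function-call counts can be compared.

```python
import cProfile
import copy
import io
import pstats


def predict_pre(model, window, steps):
    """Hypothetical 'pre' stand-in: deep-copies the fitted model on every step."""
    window = list(window)
    out = []
    for _ in range(steps):
        local = copy.deepcopy(model)  # needless: the model is only read
        y = sum(c * x for c, x in zip(local["coefs"], window))
        window = window[1:] + [y]
        out.append(y)
    return out


def predict_post(model, window, steps):
    """Hypothetical 'post' stand-in: reads the fitted model in place."""
    window = list(window)
    coefs = model["coefs"]
    out = []
    for _ in range(steps):
        y = sum(c * x for c, x in zip(coefs, window))
        window = window[1:] + [y]
        out.append(y)
    return out


def profile_call(func, *args):
    """Run func under cProfile; return (result, total call count, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stats.total_calls, buffer.getvalue()


# Toy fitted model and starting window (illustrative values only).
model = {"coefs": [1.0 / 12] * 12}
window = [float(i) for i in range(12)]

pre_result, pre_calls, pre_report = profile_call(predict_pre, model, window, 36)
post_result, post_calls, post_report = profile_call(predict_post, model, window, 36)
print(f"pre: {pre_calls} function calls, post: {post_calls} function calls")
```

Comparing the total call counts alongside the cumulative-time report makes it easier to see whether a slowdown comes from a few expensive calls or from a very large number of cheap ones.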

Collaborator

fkiraly commented Nov 12, 2024

That's ... a factor of 1000???

Marvelous! Did you also understand where the time was lost? It looks like unnecessary copies of objects were made, and too many function calls?

Collaborator

fkiraly commented Nov 12, 2024

Re activating the tests: I got stuck because the test run meant to identify the failing tests itself failed. I have now restarted it.

Collaborator

fkiraly commented Nov 12, 2024

Btw, I would invite you to check the discord - another contributor (Julian) also wants to fix the issues with the global setting!

We could try to meet in one of the Friday 13 UTC meetups.

Collaborator

fkiraly commented Nov 12, 2024

Ok, I switched on the tests!

Contributor Author

ericjb commented Nov 14, 2024

> Btw, I would invite you to check the discord - another contributor (Julian) also wants to fix the issues with the global setting!
>
> We could try to meet in one of the Friday 13 UTC meetups.

Questions:

  1. What does it mean to "check the discord"? (esp. with reference to Julian)
  2. How does one join the Friday meetups? I think I tried in the past and failed. Also, Friday 13 UTC is generally a very awkward time for me (with some exceptions) due to prior commitments. Frustrating, as I'd definitely like to be a regular attendee.

Contributor Author

ericjb commented Nov 14, 2024

> That's ... a factor of 1000???
>
> Marvelous! Did you also understand where the time was lost? It looks like unnecessary copies of objects were made, and too many function calls?

Too many function calls is an understatement: 250 million function calls! The routine is supposed to take a vector of 12 numbers and do essentially a few matrix multiplies against this vector, yielding a scalar. This gets repeated 36 times × 20 forecasters, i.e. 720 calculations, fewer than 1,000. It should be essentially instantaneous. Almost everywhere one looks, unnecessary work is being done. Why the copies in the first place? All the info from the fitted model is only being read, not written to. And is there really a need to check whether imputation is needed? The model has already been fitted at this point. Also, why grab the entire original time series when only the last window is needed? And on it goes. [Also, even if a copy or a check for imputation is needed, it should be needed only once, before the first pass through the recursive loop.]
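
The arithmetic described above can be sketched in a few lines. This is an illustration only, not the code in this PR: it assumes a plain linear model over a window of 12 lags, and `recursive_forecast`, `coefs`, and `intercept` are hypothetical names. It validates the window once before the loop, reads the fitted parameters without copying them, and carries only the last window forward.

```python
from collections import deque


def recursive_forecast(last_window, coefs, intercept, steps):
    """Recursive one-step-ahead forecasting from the last window only.

    Assumes a linear model: y_next = intercept + sum(coefs[i] * window[i]),
    with window[0] the oldest value. All names here are hypothetical.
    """
    # One-off validation, done before the loop rather than on every step.
    if len(last_window) != len(coefs):
        raise ValueError("window length must match number of coefficients")
    if any(x != x for x in last_window):  # NaN is the only value with x != x
        raise ValueError("window contains missing values")

    window = deque(last_window, maxlen=len(coefs))
    forecasts = []
    for _ in range(steps):
        y_next = intercept + sum(c * x for c, x in zip(coefs, window))
        forecasts.append(y_next)
        window.append(y_next)  # maxlen drops the oldest value automatically
    return forecasts


# 20 hypothetical forecasters, each predicting 36 steps from a 12-value window.
n_forecasters, horizon, window_length = 20, 36, 12
last_window = [float(i) for i in range(1, window_length + 1)]
coefs = [1.0 / window_length] * window_length
all_forecasts = [
    recursive_forecast(last_window, coefs, 0.0, horizon)
    for _ in range(n_forecasters)
]
n_predictions = sum(len(f) for f in all_forecasts)
print(n_predictions)  # 20 * 36 = 720 one-step predictions
```

Using a `deque` with `maxlen` keeps the per-step work proportional to the window length, independent of how long the original series is.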

Collaborator

@fkiraly fkiraly left a comment

Excellent!

Could you kindly add the test cases we discussed in a file `forecasting.compose.tests.test_reduce_2nd`? Ideally with asserts at the places where there were differences before the fix.

Contributor Author

ericjb commented Nov 26, 2024

Note that these tests do not include the additional tests that I did which confirmed that the actual numerical results - forecasts and fitted values - were unchanged by the faster implementation. (It is not possible to automate this test as it requires different versions of the same file, or something equivalent.)
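
One possible workaround, sketched here as an assumption rather than something done in this PR, is to save the forecasts produced by the old implementation to a file once and then compare the new implementation's output against that stored reference. `save_reference` and `matches_reference` are hypothetical helpers, and the numbers are made up for illustration.

```python
import json
import math
import os
import tempfile


def save_reference(values, path):
    """Store forecasts from the old implementation as a JSON reference file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(list(values), fh)


def matches_reference(values, path, rel_tol=1e-12, abs_tol=0.0):
    """Return True if values agree elementwise with the stored reference."""
    with open(path, encoding="utf-8") as fh:
        reference = json.load(fh)
    if len(reference) != len(values):
        return False
    return all(
        math.isclose(new, old, rel_tol=rel_tol, abs_tol=abs_tol)
        for new, old in zip(values, reference)
    )


# Illustrative numbers only, not results from this PR.
old_forecasts = [101.5, 102.25, 103.0]
new_forecasts = [101.5, 102.25, 103.0]

with tempfile.TemporaryDirectory() as tmp:
    ref_path = os.path.join(tmp, "reference_forecasts.json")
    save_reference(old_forecasts, ref_path)  # run once, before the change
    same = matches_reference(new_forecasts, ref_path)  # run after the change
    changed = matches_reference([101.5, 102.25, 103.5], ref_path)

print(same, changed)
```

The trade-off is that the reference file has to be regenerated whenever the expected numbers legitimately change, so it is only a partial substitute for comparing two versions of the code side by side.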

Collaborator

fkiraly commented Nov 26, 2024

Understood, makes sense - thanks!

Contributor Author

ericjb commented Nov 30, 2024

> Understood, makes sense - thanks!

What remains to be done here? Can it be completed?

…RegressionForecaster; fix some index issues for _predict_multiple
…tiindex - all related to DirectTabularRegressionForecaster