BUG: implement fit_regularized for HurdleCountModel#9746
Open
wcwatson wants to merge 4 commits into statsmodels:main
Conversation
This PR implements `HurdleCountModel.fit_regularized(...)` so that it handles the two "component models" of the mixture; the version inherited from its superclasses raises a `ValueError`. See the original bug report for additional details. Changes made in support of that implementation include:

- Implemented `score_obs`, `score`, and `hessian` for `HurdleCountModel`. These could also be used to refactor `HurdleCountModel.fit(...)` for greater parallelism with other classes, but I have left that as out-of-scope for this PR.
- Added `L1HurdleCountResults` to avoid a diamond inheritance problem. Defining `L1HurdleCountResults(L1CountResults, HurdleCountResults)` causes either `df_model` or `df_resid` to be incorrectly defined, depending on the order of the superclasses, and adding passthrough `**kwargs` in relevant places does not solve the issue. The workaround is to drop `L1CountResults` as a superclass and just duplicate the post-`super().__init__()` portion of the L1 results class in `L1HurdleCountResults`. This doesn't feel ideal and there might be a clever mixin solution, but that's out-of-scope for this PR.
- `TestRegularizedHurdleSimulated` runs the `CheckHurdlePredict` suite of tests to verify that the results object has its attributes correctly populated and that predicted values are close to expected values. In order to match expected values, regularization in the "zero model" must be very weak.
- `TestHurdleL1` runs the `CheckLikelihoodModelL1` suite of tests to verify that results match "external" benchmarks. Since the implementation in R's `pscl` library does not fit regularized hurdle models, I simply ran a regularized model on the `docvis` dataset and recorded the results.
- `TestHurdleL1Compatibility` runs the `CheckL1Compatibility` suite of tests to verify that (i) weak/zero regularization yields coefficients equal to those of an unregularized model, and (ii) extremely strong regularization zeroes out coefficients.

In addition to these changes, I've left some inline comments documenting some odd behavior I observed in right-censored negative binomial models (used in `HurdleCountModel` when `zerodist="negbin"`). Specifically, `start_params` needs to be nonzero for regularized fitting to converge. This seems possibly related to #9156, and resolving it is out-of-scope for this PR.

Since this PR only implements functionality that the existing documentation already covers and claims to exist, I have not made any documentation changes.
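The diamond-inheritance problem and the workaround can be sketched with toy classes. This is a minimal, self-contained illustration: the class names mirror the statsmodels ones, but the bodies and the specific `df_model` values are hypothetical stand-ins, not the real implementations.

```python
# Toy sketch (hypothetical values) of the diamond-inheritance problem:
# both subclasses adjust the same attribute after calling super().__init__(),
# so under cooperative multiple inheritance the adjustment that runs last
# in MRO order clobbers the other one.

class CountResults:
    """Shared base: sets the plain, unregularized value."""
    def __init__(self):
        self.df_model = 3


class L1CountResults(CountResults):
    """Adjusts df_model for coefficients zeroed by the L1 penalty."""
    def __init__(self):
        super().__init__()
        self.df_model = 2       # post-super() L1 adjustment (toy value)


class HurdleCountResults(CountResults):
    """Recomputes df_model by combining the two component models."""
    def __init__(self):
        super().__init__()
        self.df_model = 6       # post-super() hurdle adjustment (toy value)


class Diamond(L1CountResults, HurdleCountResults):
    """Diamond subclass: the hurdle adjustment runs first, then is clobbered."""


print(Diamond().df_model)       # 2 -> the hurdle adjustment is lost

# Workaround sketched here: drop L1CountResults as a superclass and
# duplicate its post-super().__init__() logic directly.
class L1HurdleCountResults(HurdleCountResults):
    def __init__(self):
        super().__init__()      # hurdle adjustment: df_model = 6
        self.df_model -= 1      # duplicated L1 adjustment (toy version)


print(L1HurdleCountResults().df_model)   # 5 -> both adjustments applied
```

Swapping the superclass order in `Diamond` simply reverses which adjustment survives, which is the `df_model`/`df_resid` symptom described above.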
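The compatibility checks rest on a general property of L1 penalties, which a NumPy-only coordinate-descent lasso toy can illustrate (illustrative only; this is not how statsmodels fits these models): a vanishing penalty recovers the unregularized fit, and a sufficiently large penalty shrinks every coefficient exactly to zero.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=200):
    """Tiny coordinate-descent lasso with soft-thresholding (toy)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding feature j
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            # soft-threshold: exactly zero once |rho| <= alpha
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_ss[j]
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

ols = np.linalg.lstsq(X, y, rcond=None)[0]
weak = lasso_cd(X, y, alpha=1e-8)    # (i) weak penalty ~ unregularized fit
strong = lasso_cd(X, y, alpha=1e6)   # (ii) strong penalty zeroes everything

print(np.allclose(weak, ols, atol=1e-3))  # True
print(np.all(strong == 0))                # True
```

This is the same pair of behaviors `CheckL1Compatibility` verifies for the hurdle model's two components.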
The Azure pipeline is failing because of an error that arises only in the `python_314t_parallel` instance when testing functionality in `statsmodels.nonparametric....`. The new tests included in this PR pass.