feat(Refacto Sample Process): Refacto sampling module for stochastic processes (RenewalProcess and NHPP) #46
base: develop/2.2.X
change release version
…into develop/2.2.X
wilgri left a comment
Thanks a lot for your proposals! :) They raise several questions for me. Not all of them need to be addressed right now.
I also think this PR should include pytest tests that check the fitting of the models on these sampled data.
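A minimal sketch of such a round-trip test (sample, fit, check that the known parameter is recovered), written for pytest. To stay self-contained it swaps the ReLife models for a closed-form exponential MLE under right censoring; in the real tests, sampling would go through the stochastic process and fitting through the model's own fit method. `fit_exponential_rate` and the test name are hypothetical.

```python
import numpy as np


def fit_exponential_rate(durations, events):
    """Closed-form MLE of an exponential failure rate under right censoring:
    number of observed failures divided by total time on test."""
    return events.sum() / durations.sum()


def test_fit_recovers_rate_from_sampled_data():
    rng = np.random.default_rng(0)
    true_rate = 0.5
    # numpy's exponential is parametrised by the scale, i.e. 1 / rate
    lifetimes = rng.exponential(1.0 / true_rate, size=20_000)
    # Right-censor every observation at a fixed horizon
    horizon = 4.0
    durations = np.minimum(lifetimes, horizon)
    events = lifetimes <= horizon
    # Fit on the sampled data and check the true parameter is recovered
    estimated = fit_exponential_rate(durations, events)
    assert abs(estimated - true_rate) / true_rate < 0.05
```

A fixed seed keeps the test deterministic, and the relative tolerance is loose compared with the estimator's standard error at this sample size.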
    - class RenewalRewardProcessIterator(RenewalProcessIterator):
    + class RenewalRewardProcessIterator(StochasticDataIterator):
sample_step is missing, and every StochasticDataIterator requires it. Maybe inheriting from RenewalProcessIterator would be better?
Definitely, a ctrl+C, ctrl+V error :)
If possible, remove type hints. I'll type the code in stub files in the near future. For now, less verbose code is easier to read (I think).
Don't you want to keep them so that the future stub files are easier to write?
    class CountDataIterator(Iterator[NDArray[np.void]], ABC):
    def get_args_shape(lifetime_model) -> int:
ReLife has a utils module. Instead, consider using get_args_nb_assets from utils._array_api. This function is exposed by utils, so you can import it from there directly. If other array utility routines are needed, group them there if you think they can be used elsewhere in the code. Otherwise, leave them in the current module but hide them (keep them private to the module, so they are not imported elsewhere by mistake).
It seems get_args_nb_assets doesn't behave the same way; I get different results.
      @dataclass
    - class CountDataSample:
    + class StochasticDataSample:
          t0: float
t0 is only used to reconstruct the timeline in the derived functions, and tf is not used at all. Could the timeline be generated directly with the correct values, so that we can remove this dependency on t0?
Not sure how to do that; let's discuss it.
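One possible direction for the t0 question: record event times in absolute time when they are generated, so that derived functions never need to add t0 back. A minimal sketch, assuming a process that starts at time 0 with i.i.d. inter-arrival times (a renewal process); `sample_window_timeline` and its arguments are hypothetical, not ReLife's API. For a renewal process the trajectory still has to be simulated from 0 rather than from t0, because the time elapsed since the last renewal at t0 matters; t0 is only needed while filtering, not afterwards.

```python
import numpy as np


def sample_window_timeline(inter_arrival, time_window, rng):
    """Simulate one renewal trajectory from time 0 and keep only the events
    in the observation window (t0, tf], stored directly as absolute times.

    inter_arrival: hypothetical callable rng -> positive float.
    """
    t0, tf = time_window
    t = 0.0
    events = []
    while True:
        t += inter_arrival(rng)
        if t > tf:
            break
        if t > t0:  # events before t0 are simulated but not recorded
            events.append(t)
    return np.asarray(events)


rng = np.random.default_rng(0)
# Weibull inter-arrival times (shape 2, scale 10), observed on (5, 50]
timeline = sample_window_timeline(lambda r: 10.0 * r.weibull(2.0), (5.0, 50.0), rng)
```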
    def __iter__(self) -> StochasticDataIterator: ...
    def age_of_renewal_process_sampler(
It is not used, right? Should we remove it and keep the idea for a future release?
It is not used for now. Should we delete it, or keep the code somewhere else in case we decide to implement it?
    args = getattr(process.lifetime_model, "args", None)
    if args:
        broadcasted_args = list(np.repeat(arg, nb_samples, axis=0) for arg in args)
        broadcasted_model = process.lifetime_model.unfreeze().freeze(
args can be set directly on a FrozenParametricModel. Using the setter here would be simpler than unfreezing and freezing again, no?
Okay, I didn't know about the setter! Definitely a better option, then.
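A minimal sketch of the setter pattern, using a hypothetical stand-in class: `FrozenModelSketch` is not ReLife's FrozenParametricModel, whose setter may validate or store arguments differently. It reproduces the np.repeat broadcasting from the diff and rebinds the arguments on the frozen object instead of unfreezing and re-freezing the model.

```python
import numpy as np


class FrozenModelSketch:
    """Hypothetical frozen model: fixed arguments exposed through a property
    with a setter (a stand-in, NOT ReLife's FrozenParametricModel)."""

    def __init__(self, *args):
        self._args = tuple(np.asarray(a) for a in args)

    @property
    def args(self):
        return self._args

    @args.setter
    def args(self, value):
        # Rebind the arguments in place: no unfreeze()/freeze() round trip
        self._args = tuple(np.asarray(a) for a in value)


frozen = FrozenModelSketch(np.array([[1.0], [2.0]]))  # 2 assets, 1 value each
nb_samples = 3
# Repeat each argument nb_samples times along the asset axis, as in the diff
frozen.args = [np.repeat(arg, nb_samples, axis=0) for arg in frozen.args]
```

Setting args mutates the object, so if the frozen model is shared with the process, taking a copy first keeps the original untouched. In this sketch a shallow `copy.copy` is enough, because the setter rebinds `_args` rather than modifying the arrays.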
    self.timeline.shape[1], nb_assets=self.timeline.shape[0], seed=self.seed
    args = getattr(self.process.lifetime_model, "args", ())
    unfrozen_model = (
        self.process.lifetime_model.unfreeze()
There is an is_frozen routine in utils. Using it would be more explicit than testing for the presence of args.
Cool! I'll use that too, then. Thanks!
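Why an explicit check is safer than `getattr(model, "args", None)` followed by `if args:`: assuming a frozen model may carry an empty args tuple, the truthiness test cannot tell it apart from an unfrozen model. A minimal sketch; `FrozenSketch`, `UnfrozenSketch` and `is_frozen_sketch` are hypothetical stand-ins, and the real is_frozen in utils may be implemented differently.

```python
class UnfrozenSketch:
    """Hypothetical model with no stored arguments."""


class FrozenSketch:
    """Hypothetical frozen model; its args tuple may legitimately be empty."""

    def __init__(self, *args):
        self.args = args


def is_frozen_sketch(model):
    # Explicit check on the model's type, in the spirit of utils.is_frozen
    return isinstance(model, FrozenSketch)


def has_args(model):
    # The pattern from the diff: truthiness of getattr(model, "args", None)
    return bool(getattr(model, "args", None))


frozen_without_args = FrozenSketch()
print(has_args(frozen_without_args), is_frozen_sketch(frozen_without_args))  # False True
```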
    self.hpp_timeline, *getattr(self.process, "args", ())
    ages = self.asset_ages.copy().reshape(-1, 1)
    truncated_lifetime_model = LeftTruncatedModel(unfrozen_model).freeze(
Actually, you can construct a conditional model from a frozen model instance. It would avoid this unfreeze/freeze logic.
What happens when you freeze a LeftTruncatedModel of a frozen model with an a0?
I'm working on it... but your work makes me realize that I may have made a wrong choice in the way np.ndarray arguments are handled. My intuition comes from the trick you've used to pass …

Example to illustrate:

In terms of shape and broadcasting, we should have:

This way follows numpy logic. But currently, we have:

It breaks numpy logic. With this idea, this must be a valid computation:

I think I was misled by the idea of staying close to the way lifetime data are represented in industry databases. This led me to require that a0 be either 0d, (m,) or (m, 1). In a sense, this violates the dependency inversion principle: abstractions (the shape rules in ReLife) must not depend on details (the real-life encoding of a0, at most 1d); details must depend on abstractions. As a consequence, the code tends to rely on tricks for special cases we did not think about, and in the long term it will only get worse. What you did with NHPP is symptomatic. More generally, it also affects policy and a bunch of other code in this module, where I have to flatten or reshape...

There is a remaining question about …

Then:

In the second example, there is an error because g(covar) would give a shape of (3, 3), which is not broadcastable to a0 of shape (3, 2). A more direct way of encoding covar could be (k,), (k, m), (k, m, n), etc., where the number of coefficients is always held by the first dimension.
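The proposed covar encoding can be checked with plain numpy broadcasting. A minimal sketch, assuming a proportional-hazards-style effect g(covar) = exp(beta · covar) contracted over the first axis of covar; `g` and `beta` are illustrative, not ReLife's implementation. With k on the first axis, g(covar) simply has shape covar.shape[1:] and then broadcasts with a0 like any other array (`np.broadcast_shapes` needs NumPy 1.20 or later).

```python
import numpy as np


def g(beta, covar):
    """Covariate effect exp(sum_k beta_k * covar_k). The k coefficients are
    held by the FIRST axis of covar, so the result has shape covar.shape[1:]."""
    return np.exp(np.tensordot(beta, covar, axes=1))


beta = np.array([0.1, -0.2, 0.3])  # k = 3 coefficients
a0 = np.full((3, 2), 1.5)  # m = 3 assets, n = 2 values per asset

print(g(beta, np.ones(3)).shape)  # () for covar of shape (k,)
print(g(beta, np.ones((3, 4))).shape)  # (4,) for covar of shape (k, m) with m = 4
effect = g(beta, np.ones((3, 3, 2)))  # covar of shape (k, m, n)
print(np.broadcast_shapes(effect.shape, a0.shape))  # (3, 2)

# A g(covar) of shape (3, 3), as in the second example, cannot broadcast with a0
try:
    np.broadcast_shapes((3, 3), a0.shape)
except ValueError:
    print("(3, 3) and (3, 2) are not broadcastable")
```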
Let's give this idea time to mature and be checked. If we agree, I propose to leave these modifications for another release: I'll commit my changes on top of your work, and we'll incorporate your work on sampling in the next release.
- time_window: I propose a small refactoring of the interface. With t0 optional and defaulting to 0, its position in the arguments is not intuitive for the user. Passing a tuple (t0, tf) reduces the number of arguments and makes requests more explicit.
- Important: format the code with black and isort.
- Remove useless keyword arguments: I prefer not to use keyword arguments when the number of arguments is limited, because it encourages writing function calls with the arguments in different orders, which makes the code less consistent. This is only a personal opinion. Here, I think the number of arguments is small enough to use positional arguments instead of keyword arguments.
- Refactoring of get_lifetime_model_nb_assets: this belongs to utils. To avoid import errors, import inside the function scope. I've also renamed the function to get_model_nb_assets, as it is not specific to lifetime models, and made it more generic so that it can be applied to any model.
- Clean imports: don't write from relife.lifetime_model.conditional_model import ...; prefer from relife.lifetime_model import ..., as the public interface is exposed by lifetime_model. I think it might be better to hide the lifetime_model modules to avoid this kind of import.
- Remove unused type hints.
- Add the missing __all__ to specify the exposed API.
- Hide methods that are not meant to be part of the public object interface and only serve internal functionality.
- Hide the iterator: it is not exposed publicly; only iterables are used.
- Remove the step method: it is ambiguous with sample_step. I've also renamed sample_step to sample_time_event_entry, as its purpose is to sample time, event and entry. It is more explicit.
- Refactoring of get_rvs_size: it is not specific to every StochasticDataIterator, only to those using a lifetime model. I've removed the method from the parent interface and propose using specific properties in each derived class (if needed).
I've made some changes and proposed a small refactoring. You can check commit 39cf2aa.
The tests you've written in stochastic_process pass. Check that everything is OK on your side. Also, why is there no … I did not take time on …
Thank you :) Here are a few things I've noticed that you may want to change:
About …: its purpose is described in PEP 557 (Data Classes). From what I understand, it is just a container type whose methods are implemented by default.
The generated methods are controlled by the usage of the …

Here, I think your purpose is more specific: you want to provide a convenient interface to a structured array whose layout must remain hidden from users. I think the dataclass functionalities make this purpose less obvious. For instance, you have repeated procedures to create the …

Other alternatives:

To avoid these repeated getter procedures, a first idea would be to create a property method for each attribute (cached_property could be considered, but I think most of the computation is already done in …).

Another idea is to group these getters in one …

A last idea, maybe the best one, is to consider using Mapping, which requires …
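For the Mapping idea: collections.abc.Mapping only asks for `__getitem__`, `__iter__` and `__len__`; `keys()`, `items()`, `values()`, `get()` and `in` then come for free. A minimal sketch over a structured array, assuming hypothetical field names (timeline, event, entry); `SampleView` is not an existing ReLife class.

```python
from collections.abc import Mapping

import numpy as np


class SampleView(Mapping):
    """Dict-like view over a structured array: fields are accessed by name,
    and the dtype stays an implementation detail."""

    def __init__(self, struct_array):
        self._data = struct_array

    def __getitem__(self, name):
        # Explicit check so that unknown names raise KeyError, as Mapping expects
        if name not in self._data.dtype.names:
            raise KeyError(name)
        return self._data[name]

    def __iter__(self):
        return iter(self._data.dtype.names)

    def __len__(self):
        return len(self._data.dtype.names)


dtype = np.dtype([("timeline", np.float64), ("event", np.bool_), ("entry", np.float64)])
data = np.zeros(3, dtype=dtype)
data["timeline"] = [1.0, 2.5, 4.0]
sample = SampleView(data)
```

No item assignment is defined, but the arrays returned for each field are views on the underlying data; returning copies would make the interface fully read-only at the cost of extra allocations.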