Conversation

@Nathan-Herzhaft (Collaborator)

No description provided.

@wilgri wilgri self-assigned this Nov 14, 2025
@wilgri wilgri marked this pull request as draft November 14, 2025 08:30
@wilgri left a comment:

Thanks a lot for your proposals! :) They lead me to several questions; not all of them need to be addressed right now.

I also think this PR should include pytest tests that exercise the model-fitting process on these sampled data.



class RenewalRewardProcessIterator(RenewalProcessIterator):
class RenewalRewardProcessIterator(StochasticDataIterator):
Collaborator: sample_step is missing and is required by any StochasticDataIterator. Maybe inheriting from RenewalProcessIterator would be better?

Author: Definitely, a Ctrl+C/Ctrl+V error :)

Collaborator: If possible, remove the type hints. I'll type the code in stub files in the near future. For the moment, less verbose code is easier to read (I think).

Author: Don't you want to keep them so that future stub files are easier to write?




class CountDataIterator(Iterator[NDArray[np.void]], ABC):
def get_args_shape(lifetime_model) -> int:
Collaborator: ReLife has a utils module. Consider using get_args_nb_assets from utils._array_api instead; it is exposed from utils, so you can import it from there directly. If other array utility routines are needed, group them there if you think they can be reused elsewhere in the code. Otherwise, leave them in the current module but hide them (so they are not imported elsewhere by mistake).

Author: It seems get_args_nb_assets doesn't work the same way; I get different results.

@dataclass
class CountDataSample:
class StochasticDataSample:
t0: float
Collaborator: t0 is only used to reconstruct the timeline in the derived functions, and tf is not used. Can the timeline be generated directly with the correct values, so that we can remove this dependency on t0?

Author: Not sure how to do that; to be discussed.

def __iter__(self) -> StochasticDataIterator: ...


def age_of_renewal_process_sampler(
Collaborator: It is not used, right? Remove it and keep the idea for a future release?

Author: Not used for now. Should we delete it, or keep the code somewhere else in case we decide to implement it?

args = getattr(process.lifetime_model, "args", None)
if args:
broadcasted_args = list(np.repeat(arg, nb_samples, axis=0) for arg in args)
broadcasted_model = process.lifetime_model.unfreeze().freeze(
Collaborator: args data can be set on FrozenParametricModel. Using the setter here would be easier than unfreeze/freeze, no?

Author: Okay, I didn't know about the setter! Definitely a better option, then.

self.timeline.shape[1], nb_assets=self.timeline.shape[0], seed=self.seed
args = getattr(self.process.lifetime_model, "args", ())
unfrozen_model = (
self.process.lifetime_model.unfreeze()
Collaborator: There is an is_frozen routine in utils. Using it would be more explicit than testing for the presence of args.

Author: Cool! I'll use that too, thanks.

self.hpp_timeline, *getattr(self.process, "args", ())

ages = self.asset_ages.copy().reshape(-1, 1)
truncated_lifetime_model = LeftTruncatedModel(unfrozen_model).freeze(
Collaborator: Actually, you can construct a conditional model from a frozen model instance. That would avoid this unfreeze/freeze logic.

Author: What happens when you freeze a LeftTruncatedModel of a frozen model that has an a0?

@wilgri commented Nov 21, 2025:

I'm working on it... but actually, your work makes me realize I may have made a wrong choice in the way np.ndarray arguments are handled. My intuition comes from the trick you used to pass a0 in NHPP sampling. We should not have to repeat values the way you did; the only reason you did it is that a0 can currently only have shape (m, 1) or (m,). But I think there is no reason a0 could not be (m, n) or even higher-dimensional (though more dimensions would make no sense in real life).

Example to illustrate: suppose we have

>>> model = LeftTruncatedModel(Weibull())

In terms of shape and broadcasting, we should have:

model.sf(time : (), a0 : (5,)) -> (5,)
model.sf(time : (4, 1), a0 : (2,)) -> (4, 2)
model.sf(time : (3,), a0 : (2,))  # ERROR because (3,) cannot be broadcast with (2,)

This follows numpy logic. But currently, we have:

model.sf(time : (), a0 : (5,)) -> (5, 1)

which breaks numpy logic.

With this idea, this must be a valid computation:

model.sf(time : (3, 1), a0 : (3, 2)) -> (3, 2)
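The intended shape rules above are exactly NumPy's standard broadcasting rules, which can be checked directly (a minimal sketch; model.sf itself is not reproduced here):

```python
import numpy as np

# The proposed shape rules match NumPy's broadcasting rules exactly.
print(np.broadcast_shapes((), (5,)))      # () with (5,)     -> (5,)
print(np.broadcast_shapes((4, 1), (2,)))  # (4, 1) with (2,) -> (4, 2)

try:
    np.broadcast_shapes((3,), (2,))       # (3,) vs (2,) is not broadcastable
except ValueError as err:
    print("ERROR:", err)
```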

I think I was misled by the idea of staying close to the representation of lifetime data in industry databases. That led me to the point where a0 must be either 0d, (m,), or (m, 1). But in a sense, this violates the dependency inversion principle: abstractions (the shape rules in ReLife) must not depend on details (the real-life encoding of a0 as at most 1d); details must depend on abstractions. As a consequence, the code tends to resort to tricks in special cases we did not think about, and in the long term the code will get worse and worse. What you did in the NHPP is symptomatic.

More generally, this also has consequences on policy and a bunch of other code in this module where I have to flatten or reshape...

There is a remaining question about covar in LifetimeRegression. Currently, if the model has k coefficients, covar can be (k,) or (m, k). I think the right way to encode covar, so that it follows numpy abstractions (rather than numpy following relife abstractions), would be as a sequence of k numpy arrays.

# model has 3 covar
>>> model = LeftTruncatedModel(ProportionalHazard(Weibull()))

Then :

model.sf(time : (3, 1), a0 : (3, 2), covar : [(3, 1), (3, 1), ()]) -> (3, 2)
model.sf(time : (3, 1), a0 : (3, 2), covar : [(3,), (3, 1), ()]) # ERROR

In the second example, there is an error because g(covar) would give a shape of (3, 3), which is not broadcastable with a0 of shape (3, 2). A more direct way of encoding covar could be (k,), (k, m), (k, m, n), etc., where the number of coefficients is always held by the first dimension.
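The "coefficients on the first dimension" encoding from the last sentence can be sketched in plain NumPy (an illustration, not ReLife's actual API; coefficient values are made up):

```python
import numpy as np

# k coefficients always held by the first dimension of covar:
# covar may be (k,), (k, m), (k, m, n), ...
coef = np.array([0.1, 0.2, 0.3])   # k = 3 coefficients
covar = np.ones((3, 5))            # (k, m): 3 covariates for 5 assets

# linear predictor sum_i coef_i * covar_i, broadcasting over trailing dims
g = np.einsum("k,k...->...", coef, covar)
print(g.shape)  # (5,)

covar3d = np.ones((3, 5, 2))       # (k, m, n) works with the same code
print(np.einsum("k,k...->...", coef, covar3d).shape)  # (5, 2)
```

The point of this encoding is that the reduction over k is always on axis 0, so the remaining dimensions follow ordinary NumPy broadcasting without special cases.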

@wilgri commented Nov 21, 2025:

Let this idea mature and be checked.

If we agree, I propose to leave these modifications for another release. I'll commit my changes on top of your work, and we'll incorporate your work on sample in the next release.

- time_window: I propose a small refactoring of the interface. With t0 optional and defaulting to 0, its position in the arguments is not intuitive for the user. Passing a (t0, tf) tuple reduces the number of arguments and makes the request more explicit.
- important: use the black and isort formatters.
- remove useless keyword arguments: I prefer not to use keyword arguments when the number of arguments is limited, because they encourage function calls with differing argument orders, which makes the code less consistent. This is only a personal opinion; in this case, I think the number of arguments is small enough to use positional arguments instead of keyword arguments.
- refactoring of get_lifetime_model_nb_assets: this belongs in utils. To avoid import errors, import within the function scope. I've also renamed the function to get_model_nb_assets, as it is not specific to lifetime models, and made it more generic so that it can be applied to any model.
- clean imports: don't write from relife.lifetime_model.conditional_model import ...; prefer from relife.lifetime_model import ..., as the public interface is exposed by lifetime_model. I think it might be better to hide the lifetime_model submodules to avoid this kind of import.
- remove unused type hints.
- add the missing __all__ to specify the exposed API.
- hide methods that are not supposed to be part of the public object interface and only belong to internal functionality.
- hide iterators: they are not exposed publicly; only iterables are used.
- remove the step method: it is ambiguous with sample_step. I've also renamed sample_step to sample_time_event_entry, since its purpose is to sample time, event, and entry; it is more explicit.
- refactoring of get_rvs_size: it is not specific to every StochasticDataIterator, only to those using a lifetime model. I've removed the method from the parent interface and propose using specific properties in each derived class (if needed).
@wilgri commented Nov 24, 2025:

I've made some changes and proposed a small refactoring. You can check this commit: 39cf2aa
In summary:

  • I've changed the t0, tf arguments to time_window. It is clearer than having an optional t0 placed after tf...
  • I've isolated a get_model_nb_assets function in relife.utils and simplified the init process of the iterators.
  • I've removed broadcasted_process in the NHPP iterator and replaced it with an "expanded" lifetime model; it is more straightforward.
  • Small renamings of functions to make them more explicit, black and isort reformatting, and hiding things that are not supposed to be public.

The tests you've written in stochastic_process have passed. Check that everything is OK on your side.
I've noticed that test_non_homogeneous_poisson_process.py::TestAgeReplacementDistribution is not stable; sometimes it crashes. Can you investigate?

Also, why is there no test_non_homogeneous_poisson_process.py::TestAgeReplacementRegression?

I did not take time on _sample/_data.py, but at first I was confused by the existence of both get_sample_ids and sample_id. What's the difference?

@wilgri commented Dec 2, 2025:

Thank you :)

Things I've noticed that you may want to change:

  • a syntax error in __post_init__
  • It seems that StochasticRewardDataSample is never used. Is that a mistake?
  • I'm not sure I see the point of using dataclass (see the discussion below).
  • _select_from_struct is only used by test functions and is hidden. Is it a required functionality, or is it only for testing? Its name is also quite ambiguous with select.
  • The asset_id and sample_id type hints are not appropriate; they are too specific. You may consider using a type alias.

About dataclass:

Its purpose is described in PEP 557. From what I understand, it is just a container type that implements default methods: the decorator adds generated methods to the class and returns the same class it was given. The generated methods are controlled through the field function and are mainly dunder methods (such as __init__, __repr__, and __eq__) that construct and compare two dataclass objects.
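A minimal illustration of what the decorator generates (standard-library behavior, not ReLife code; the Sample class and its fields are made up):

```python
from dataclasses import dataclass, field

@dataclass
class Sample:
    t0: float
    tf: float
    tags: list = field(default_factory=list)  # field() controls generation

# __init__, __repr__, and __eq__ are generated automatically:
a = Sample(0.0, 10.0)
b = Sample(0.0, 10.0)
print(a == b)  # True: the generated __eq__ compares field by field
```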

Here, I think your purpose is more specific. You want to provide a convenient interface for interacting with a struct array whose structure must remain unknown to users. I think dataclass functionality makes this purpose less obvious. For instance, you have repeated procedures to create the events, preventive_renewals, or rewards attributes: you do the same array assignments and return an array. By using dataclass, you are encouraged to write these procedures in the attribute assignments and repeat them. In addition, replace in select could be replaced by a call to the class constructor itself.

Other alternatives:

To avoid these repeated getter procedures, a first idea would be to create a property method for each attribute (cached_property could be considered, but I think most of the computation is already done in np.unique). Each property would call a hidden method that does the array assignment. It makes the interface clearer.

Another idea is to group these getters in a single __getattr__. The docstring would specify which attribute calls are allowed, and an AttributeError would be raised for unknown ones. It also removes the inheritance you used to add reward: there would be a single interface that looks up the struct array fields and raises an error if a field does not exist.
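A sketch of the __getattr__ idea, with assumed field names ("time", "event"); this is not the PR's actual code:

```python
import numpy as np

class StructAccess:
    """Single look-up interface into a hidden struct array (sketch)."""

    def __init__(self, struct):
        self._struct = struct

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name == "_struct":  # guard against recursion before __init__ runs
            raise AttributeError(name)
        if name in (self._struct.dtype.names or ()):
            return self._struct[name]
        raise AttributeError(f"unknown field: {name!r}")

data = np.array([(1.0, True)], dtype=[("time", float), ("event", bool)])
view = StructAccess(data)
print(view.time)  # field look-up delegated to __getattr__
```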

A last idea, maybe the best one, is to use Mapping, which requires the __getitem__, __len__, and __iter__ methods. You can delegate the procedures that construct events, preventive_renewals, and rewards to __getitem__. It makes the interface easier to understand, and you get the benefit of the other mapping functionality!

  1. I have a container object (more precisely a Mapping) that encapsulates data in a struct array (which must stay hidden).
  2. I can access the values of the struct array thanks to __getitem__, and I directly see in the code how the struct array is processed on each __getitem__ call.
  3. I can iterate through my mapping (same as a dict); the __iter__ method is there for that. I think it just has to yield the keys of the mapping (I'm not sure, so it must be verified).
  4. The mapping is immutable, but I can select a subpart of it with select, which returns a new mapping.
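The four points above could look roughly like this (a sketch with assumed field names; in the real code, the field-construction procedures for events, preventive_renewals, and rewards would live inside __getitem__):

```python
from collections.abc import Mapping
import numpy as np

class StochasticSample(Mapping):
    """Immutable mapping view over a hidden struct array (sketch only)."""

    def __init__(self, struct):
        self._struct = struct  # the struct array stays encapsulated

    def __getitem__(self, key):
        # Each field-construction procedure lives here, in one place.
        if key not in (self._struct.dtype.names or ()):
            raise KeyError(key)
        return self._struct[key]

    def __iter__(self):
        # Yields the keys, like a dict.
        return iter(self._struct.dtype.names or ())

    def __len__(self):
        return len(self._struct.dtype.names or ())

    def select(self, mask):
        # Returns a new mapping over a subpart of the data.
        return type(self)(self._struct[mask])

data = np.array([(1.0, True), (2.5, False)],
                dtype=[("time", float), ("event", bool)])
sample = StochasticSample(data)
print(list(sample))      # the mapping keys
print(sample["time"])    # field access through __getitem__
sub = sample.select(np.array([True, False]))
print(len(sub["time"]))  # 1
```

Because Mapping supplies keys(), items(), values(), __contains__, and __eq__ for free once the three required methods exist, this design gets dict-like ergonomics without exposing the struct array itself.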
