Conversation

@Nathan-Herzhaft (Collaborator)

No description provided.

@wilgri wilgri self-assigned this Nov 14, 2025
@wilgri wilgri marked this pull request as draft November 14, 2025 08:30
@wilgri left a comment:

Thanks a lot for your proposals! :) They lead me to several questions; not all of them need to be addressed right now.

I also think this PR should include pytest tests that exercise the model-fitting process on these sampled data.



class RenewalRewardProcessIterator(RenewalProcessIterator):
class RenewalRewardProcessIterator(StochasticDataIterator):
Collaborator: sample_step is missing and is required by any StochasticDataIterator. Maybe inheriting from RenewalProcessIterator would be better?

Author: Definitely, a Ctrl+C/Ctrl+V error :)

Collaborator: If possible, remove the type hints. I'll type the code in stub files in the near future. For the moment, less verbose code is easier to read (I think).

Author: Don't you want to keep them so that future stub files are easier to write?




class CountDataIterator(Iterator[NDArray[np.void]], ABC):
def get_args_shape(lifetime_model) -> int:
Collaborator: ReLife has a utils module. Consider using get_args_nb_assets from utils._array_api instead; it is exposed from utils, so you can import it from there directly. If other array utility routines are needed, group them there if you think they can be reused elsewhere in the code. Otherwise, leave them in the current module but hide them (so they are not imported elsewhere by mistake).

Author: It seems get_args_nb_assets doesn't work the same way; I get different results.

@dataclass
class CountDataSample:
class StochasticDataSample:
t0: float
Collaborator: t0 is only used to reconstruct the timeline in the derived functions, and tf is not used. Can the timeline be generated directly with the correct values, so that we can remove this dependency on t0?

Author: Not sure how to do that; to be discussed.

def __iter__(self) -> StochasticDataIterator: ...


def age_of_renewal_process_sampler(
Collaborator: It is not used, right? Remove it and keep the idea for a future release?

Author: Not used for now. Should we delete it, or keep the code somewhere else in case we decide to implement it?

args = getattr(process.lifetime_model, "args", None)
if args:
broadcasted_args = list(np.repeat(arg, nb_samples, axis=0) for arg in args)
broadcasted_model = process.lifetime_model.unfreeze().freeze(
Collaborator: args data can be set on FrozenParametricModel. Using the setter here would be easier than unfreeze/freeze, no?

Author: Okay, I didn't know about the setter! Definitely a better option, then.

self.timeline.shape[1], nb_assets=self.timeline.shape[0], seed=self.seed
args = getattr(self.process.lifetime_model, "args", ())
unfrozen_model = (
self.process.lifetime_model.unfreeze()
Collaborator: There is an is_frozen routine in utils. Using it would be more explicit than testing for the presence of args.

Author: Cool! I'll use that too, thanks.

self.hpp_timeline, *getattr(self.process, "args", ())

ages = self.asset_ages.copy().reshape(-1, 1)
truncated_lifetime_model = LeftTruncatedModel(unfrozen_model).freeze(
Collaborator: Actually, you can construct a conditional model from a frozen model instance. That would avoid this unfreeze/freeze logic.

Author: What happens when you freeze a LeftTruncatedModel of a frozen model that has an a0?

@wilgri commented Nov 21, 2025:

I'm working on it... but actually, your work makes me realize I may have made a wrong choice in the way np.ndarray arguments are handled. My intuition comes from the trick you used to pass a0 in NHPP sampling. We should not have to repeat values the way you did; the only reason you did it is that a0 can currently only have shape (m, 1) or (m,). But I think there is no reason a0 could not be (m, n) or even higher-dimensional (though more dimensions would make no sense in real life).

Example to illustrate: suppose we have

>>> model = LeftTruncatedModel(Weibull())

In terms of shape and broadcasting, we should have:

model.sf(time : (), a0 : (5,)) -> (5,)
model.sf(time : (4, 1), a0 : (2,)) -> (4, 2)
model.sf(time : (3,), a0 : (2,))  # ERROR because (3,) cannot be broadcast with (2,)

This follows numpy logic. But currently, we have:

model.sf(time : (), a0 : (5,)) -> (5, 1)

which breaks numpy logic.

With this idea, this must be a valid computation:

model.sf(time : (3, 1), a0 : (3, 2)) -> (3, 2)
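The intended shape rules above are exactly NumPy's standard broadcasting rules, which can be checked directly (a minimal sketch; model.sf itself is not reproduced here):

```python
import numpy as np

# The proposed shape rules match NumPy's broadcasting rules exactly.
print(np.broadcast_shapes((), (5,)))      # () with (5,)     -> (5,)
print(np.broadcast_shapes((4, 1), (2,)))  # (4, 1) with (2,) -> (4, 2)

try:
    np.broadcast_shapes((3,), (2,))       # (3,) vs (2,) is not broadcastable
except ValueError as err:
    print("ERROR:", err)
```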

I think I was misled by the idea of staying close to the representation of lifetime data in industry databases. That led me to the point where a0 must be either 0d, (m,), or (m, 1). But in a sense, this violates the dependency inversion principle: abstractions (the shape rules in ReLife) must not depend on details (the real-life encoding of a0 as at most 1d); details must depend on abstractions. As a consequence, the code tends to resort to tricks in special cases we did not think about, and in the long term the code will get worse and worse. What you did in the NHPP is symptomatic.

More generally, this also has consequences on policy and a bunch of other code in this module where I have to flatten or reshape...

There is a remaining question about covar in LifetimeRegression. Currently, if the model has k coefficients, covar can be (k,) or (m, k). I think the right way to encode covar, so that it follows numpy abstractions (rather than numpy following relife abstractions), would be as a sequence of k numpy arrays.

# model has 3 covar
>>> model = LeftTruncatedModel(ProportionalHazard(Weibull()))

Then :

model.sf(time : (3, 1), a0 : (3, 2), covar : [(3, 1), (3, 1), ()]) -> (3, 2)
model.sf(time : (3, 1), a0 : (3, 2), covar : [(3,), (3, 1), ()]) # ERROR

In the second example, there is an error because g(covar) would give a shape of (3, 3), which is not broadcastable with a0 of shape (3, 2). A more direct way of encoding covar could be (k,), (k, m), (k, m, n), etc., where the number of coefficients is always held by the first dimension.
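The "coefficients on the first dimension" encoding from the last sentence can be sketched in plain NumPy (an illustration, not ReLife's actual API; coefficient values are made up):

```python
import numpy as np

# k coefficients always held by the first dimension of covar:
# covar may be (k,), (k, m), (k, m, n), ...
coef = np.array([0.1, 0.2, 0.3])   # k = 3 coefficients
covar = np.ones((3, 5))            # (k, m): 3 covariates for 5 assets

# linear predictor sum_i coef_i * covar_i, broadcasting over trailing dims
g = np.einsum("k,k...->...", coef, covar)
print(g.shape)  # (5,)

covar3d = np.ones((3, 5, 2))       # (k, m, n) works with the same code
print(np.einsum("k,k...->...", coef, covar3d).shape)  # (5, 2)
```

The point of this encoding is that the reduction over k is always on axis 0, so the remaining dimensions follow ordinary NumPy broadcasting without special cases.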

@wilgri commented Nov 21, 2025:

Let this idea mature and be checked.

If we agree, I propose to leave these modifications for another release. I'll commit my changes on top of your work, and we'll incorporate your work on sample in the next release.

- time_window: I propose a small refactoring of the interface. With t0 optional and defaulting to 0, its position in the arguments is not intuitive for the user. Passing a (t0, tf) tuple reduces the number of arguments and makes the request more explicit.
- important: use the black and isort formatters.
- remove useless keyword arguments: I prefer not to use keyword arguments when the number of arguments is limited, because they encourage function calls with differing argument orders, which makes the code less consistent. This is only a personal opinion; in this case, I think the number of arguments is small enough to use positional arguments instead of keyword arguments.
- refactoring of get_lifetime_model_nb_assets: this belongs in utils. To avoid import errors, import within the function scope. I've also renamed the function to get_model_nb_assets, as it is not specific to lifetime models, and made it more generic so that it can be applied to any model.
- clean imports: don't write from relife.lifetime_model.conditional_model import ...; prefer from relife.lifetime_model import ..., as the public interface is exposed by lifetime_model. I think it might be better to hide the lifetime_model submodules to avoid this kind of import.
- remove unused type hints.
- add the missing __all__ to specify the exposed API.
- hide methods that are not supposed to be part of the public object interface and only belong to internal functionality.
- hide iterators: they are not exposed publicly; only iterables are used.
- remove the step method: it is ambiguous with sample_step. I've also renamed sample_step to sample_time_event_entry, since its purpose is to sample time, event, and entry; it is more explicit.
- refactoring of get_rvs_size: it is not specific to every StochasticDataIterator, only to those using a lifetime model. I've removed the method from the parent interface and propose using specific properties in each derived class (if needed).
@wilgri commented Nov 24, 2025:

I've made some changes and proposed a small refactoring. You can check this commit: 39cf2aa
In summary:

  • I've changed the t0, tf arguments to time_window. It is clearer than having an optional t0 placed after tf...
  • I've isolated a get_model_nb_assets function in relife.utils and simplified the init process of the iterators.
  • I've removed broadcasted_process in the NHPP iterator and replaced it with an "expanded" lifetime model; it is more straightforward.
  • Small renamings of functions to make them more explicit, black and isort reformatting, and hiding things that are not supposed to be public.

The tests you've written in stochastic_process have passed. Check that everything is OK on your side.
I've noticed that test_non_homogeneous_poisson_process.py::TestAgeReplacementDistribution is not stable; sometimes it crashes. Can you investigate?

Also, why is there no test_non_homogeneous_poisson_process.py::TestAgeReplacementRegression?

I did not take time on _sample/_data.py, but at first I was confused by the existence of both get_sample_ids and sample_id. What's the difference?

@wilgri commented Dec 2, 2025:

Thank you :)

Things I've noticed that you may want to change:

  • a syntax error in __post_init__
  • It seems that StochasticRewardDataSample is never used. Is that a mistake?
  • I'm not sure I see the point of using dataclass (see the discussion below).
  • _select_from_struct is only used by test functions and is hidden. Is it a required functionality, or is it only for testing? Its name is also quite ambiguous with select.
  • The asset_id and sample_id type hints are not appropriate; they are too specific. You may consider using a type alias.

About dataclass:

Its purpose is described in PEP 557. From what I understand, it is just a container type that implements default methods: the decorator adds generated methods to the class and returns the same class it was given. The generated methods are controlled through the field function and are mainly dunder methods (such as __init__, __repr__, and __eq__) that construct and compare two dataclass objects.
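A minimal illustration of what the decorator generates (standard-library behavior, not ReLife code; the Sample class and its fields are made up):

```python
from dataclasses import dataclass, field

@dataclass
class Sample:
    t0: float
    tf: float
    tags: list = field(default_factory=list)  # field() controls generation

# __init__, __repr__, and __eq__ are generated automatically:
a = Sample(0.0, 10.0)
b = Sample(0.0, 10.0)
print(a == b)  # True: the generated __eq__ compares field by field
```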

Here, I think your purpose is more specific. You want to provide a convenient interface for interacting with a struct array whose structure must remain unknown to users. I think dataclass functionality makes this purpose less obvious. For instance, you have repeated procedures to create the events, preventive_renewals, or rewards attributes: you do the same array assignments and return an array. By using dataclass, you are encouraged to write these procedures in the attribute assignments and repeat them. In addition, replace in select could be replaced by a call to the class constructor itself.

Other alternatives:

To avoid these repeated getter procedures, a first idea would be to create a property method for each attribute (cached_property could be considered, but I think most of the computation is already done in np.unique). Each property would call a hidden method that does the array assignment. It makes the interface clearer.

Another idea is to group these getters in a single __getattr__. The docstring would specify which attribute calls are allowed, and an AttributeError would be raised for unknown ones. It also removes the inheritance you used to add reward: there would be a single interface that looks up the struct array fields and raises an error if a field does not exist.
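A sketch of the __getattr__ idea, with assumed field names ("time", "event"); this is not the PR's actual code:

```python
import numpy as np

class StructAccess:
    """Single look-up interface into a hidden struct array (sketch)."""

    def __init__(self, struct):
        self._struct = struct

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name == "_struct":  # guard against recursion before __init__ runs
            raise AttributeError(name)
        if name in (self._struct.dtype.names or ()):
            return self._struct[name]
        raise AttributeError(f"unknown field: {name!r}")

data = np.array([(1.0, True)], dtype=[("time", float), ("event", bool)])
view = StructAccess(data)
print(view.time)  # field look-up delegated to __getattr__
```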

A last idea, maybe the best one, is to use Mapping, which requires the __getitem__, __len__, and __iter__ methods. You can delegate the procedures that construct events, preventive_renewals, and rewards to __getitem__. It makes the interface easier to understand, and you get the benefit of the other mapping functionality!

  1. I have a container object (more precisely a Mapping) that encapsulates data in a struct array (which must stay hidden).
  2. I can access the values of the struct array thanks to __getitem__, and I directly see in the code how the struct array is processed on each __getitem__ call.
  3. I can iterate through my mapping (same as a dict); the __iter__ method is there for that. I think it just has to yield the keys of the mapping (I'm not sure, so it must be verified).
  4. The mapping is immutable, but I can select a subpart of it with select, which returns a new mapping.
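The four points above could look roughly like this (a sketch with assumed field names; in the real code, the field-construction procedures for events, preventive_renewals, and rewards would live inside __getitem__):

```python
from collections.abc import Mapping
import numpy as np

class StochasticSample(Mapping):
    """Immutable mapping view over a hidden struct array (sketch only)."""

    def __init__(self, struct):
        self._struct = struct  # the struct array stays encapsulated

    def __getitem__(self, key):
        # Each field-construction procedure lives here, in one place.
        if key not in (self._struct.dtype.names or ()):
            raise KeyError(key)
        return self._struct[key]

    def __iter__(self):
        # Yields the keys, like a dict.
        return iter(self._struct.dtype.names or ())

    def __len__(self):
        return len(self._struct.dtype.names or ())

    def select(self, mask):
        # Returns a new mapping over a subpart of the data.
        return type(self)(self._struct[mask])

data = np.array([(1.0, True), (2.5, False)],
                dtype=[("time", float), ("event", bool)])
sample = StochasticSample(data)
print(list(sample))      # the mapping keys
print(sample["time"])    # field access through __getitem__
sub = sample.select(np.array([True, False]))
print(len(sub["time"]))  # 1
```

Because Mapping supplies keys(), items(), values(), __contains__, and __eq__ for free once the three required methods exist, this design gets dict-like ergonomics without exposing the struct array itself.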
