Hello there,
I came across the TimeCycle paper and really liked the idea. It seems like a very elegant approach, so thanks a lot for sharing it!
While working with the package on some test data, I noticed that in some cases the p-values returned were slightly above 1. After looking into the code, I think I may have identified a possible reason for this behaviour, and I wanted to flag it in case it's helpful.
From what I understand, TimeCycle computes a persistence score for the observed time series and compares it to a null distribution generated via resampling (as defined by the resamplings parameter). The p-value is then calculated by ranking the observed score among the null scores and dividing by the number of resamplings.
The issue seems to arise when the observed score ties with the maximum of the null distribution or exceeds every null score. In such cases, the rank() function assigns it a value of resamplings + 1, which can lead to a p-value slightly greater than 1 (e.g., 11/10 = 1.1 when resamplings = 10). Of course, this discrepancy becomes negligible with larger, more useful numbers of resamplings, but a p-value should never exceed 1.
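To make this concrete, here is a minimal sketch in base R. It is not the actual TimeCycle code: the variable names, the toy scores, and the assumption that the observed score is ranked within a pooled vector of observed plus null scores are all mine.

```r
# Minimal sketch of the suspected behaviour; NOT taken from the TimeCycle source.
resamplings <- 10

# Hypothetical null persistence scores (one per resampling).
null_scores <- seq_len(resamplings)   # 1, 2, ..., 10

# Hypothetical observed score that lands at the top of the pooled ranking.
observed <- 11

# Rank the observed score within the pooled vector (observed + null scores).
# The pooled vector has resamplings + 1 entries, so the top rank is resamplings + 1.
obs_rank <- rank(c(observed, null_scores))[1]

# Dividing by resamplings alone lets the result exceed 1.
p_current <- obs_rank / resamplings
print(p_current)
```

With these toy values the observed score gets rank 11, so the division by 10 gives 1.1.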
If this interpretation is correct, I think adjusting line 37 of the main TimeCycle() function from
pVals <- unlist(pVals) / as.numeric(resamplings)
to
pVals <- unlist(pVals) / (as.numeric(resamplings) + 1)
would ensure that the maximum p-value never exceeds 1.
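As a quick check of the bound, the sketch below compares both denominators over every rank the observed score could take. It assumes, as I understand the code, that the rank is computed over resamplings + 1 pooled values; the variable names are hypothetical.

```r
# Sketch comparing the two denominators; assumes ranks run from 1 to resamplings + 1.
resamplings <- 10

# Every rank the observed score could take in a pooled vector of
# resamplings + 1 values (tied ranks also fall inside this range).
possible_ranks <- seq_len(resamplings + 1)

p_current  <- possible_ranks / as.numeric(resamplings)        # current denominator
p_proposed <- possible_ranks / (as.numeric(resamplings) + 1)  # proposed denominator

print(max(p_current))   # exceeds 1 at the top rank
print(max(p_proposed))  # capped at exactly 1
print(min(p_proposed))  # smallest attainable value is 1 / (resamplings + 1)
```

One side effect of the proposed denominator is that the smallest attainable p-value becomes 1 / (resamplings + 1) instead of 1 / resamplings.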
Please feel free to correct me if I’ve misunderstood the implementation. Thanks again for publishing the method. I wish every package had such extensive and well-written documentation!
Best,
Pascal