
Conversation

@JosiahParry
Copy link
Contributor

This PR drafts a function calculate_significance() to provide a consistent way to calculate pseudo-p values from a reference distribution.

It is based on the discussion in #199.

@codecov
Copy link

codecov bot commented Feb 15, 2024

Codecov Report

❌ Patch coverage is 93.42105% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.0%. Comparing base (300f8e8) to head (8dc453d).
⚠️ Report is 25 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| esda/moran.py | 75.0% | 2 Missing ⚠️ |
| esda/significance.py | 95.7% | 2 Missing ⚠️ |
| esda/crand.py | 95.5% | 1 Missing ⚠️ |
Additional details and impacted files


@@          Coverage Diff          @@
##            main    #281   +/-   ##
=====================================
  Coverage   82.0%   82.0%           
=====================================
  Files         24      25    +1     
  Lines       3489    3538   +49     
=====================================
+ Hits        2861    2902   +41     
- Misses       628     636    +8     
| Files with missing lines | Coverage Δ |
|---|---|
| esda/crand.py | 93.7% <95.5%> (-0.7%) ⬇️ |
| esda/moran.py | 84.9% <75.0%> (-0.1%) ⬇️ |
| esda/significance.py | 95.7% <95.7%> (ø) |

... and 1 file with indirect coverage changes


@ljwolf
Copy link
Member

ljwolf commented Feb 15, 2024

OK, done here on the logic & implementation. Thank you @JosiahParry for getting the ball rolling here 😄 Very much appreciated!

I've re-implemented the percentile-based two-sided test from scratch using scipy.stats.scoreatpercentile. This approach finds the percentile of the test statistic in the reference distribution and counts how many replicates fall outside of (p, 1-p). Across simulations, these counts are always 2*directed. Second, I modified your folding approach to fold around the mean of the replicates, rather than zero (since the expected value of local stats generally isn't zero), and kept it as a folded option for testing.
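For concreteness, here is a minimal numpy sketch of the two variants described above (function names are hypothetical, not the PR's API, and np.percentile stands in for scipy.stats.scoreatpercentile; both use the same linear interpolation by default):

```python
import numpy as np

def two_sided_percentile(test_stat, reference):
    # percentile of the observed statistic within the reference distribution
    p = (reference < test_stat).mean()
    lo, hi = np.percentile(reference, sorted([100 * (1 - p), 100 * p]))
    # count replicates outside (p, 1-p), plus the test statistic itself
    n_outside = ((reference <= lo) | (reference >= hi)).sum()
    return (n_outside + 1) / (len(reference) + 1)

def folded(test_stat, reference):
    # fold around the mean of the replicates, not around zero
    center = reference.mean()
    n_extreme = (np.abs(reference - center) >= np.abs(test_stat - center)).sum()
    return (n_extreme + 1) / (len(reference) + 1)
```

For a symmetric reference (e.g. 1..99 with test_stat = 95.5), both return the same value.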

I don't think we should expose the folded variant to the user in the classes, since the power in each direction is dependent on the symmetry of the distribution. For example, in the illustration below, the smallest replicate, when folded, is not "extreme," but this is accounted for in the percentile-based method.

[image: sketch of a skewed reference distribution, showing a small replicate that is not "extreme" when folded but is captured by the percentile-based method]

The percentile version will always equal the folded version for symmetric distributions, but the folded version becomes a directed test as skew increases. I think that (over+under)/all is also the intended estimand of the directed approach, after re-reading the Hope paper referred to in #199.

If other maintainers approve these four options (greater, lesser, two-sided, and directed) for the user classes and a folded option for this function only (for replication/testing purposes) I can start propagating this across the user classes.

Copy link
Member

@knaaptime knaaptime left a comment


Sweet. I presume the stuff in main gets moved to a test file or something, but this is great.

> If other maintainers approve these four options (greater, lesser, two-sided, and directed) for the user classes and a folded option for this function only (for replication/testing purposes) I can start propagating this across the user classes.

+1

percentile = (reference_distribution < test_stat).mean(axis=1)
bounds = np.column_stack((1 - percentile, percentile)) * 100
bounds.sort(axis=1)
lows, highs = np.row_stack(
Copy link
Member


This may be vectorisable, but I couldn't quickly figure it out; the following did not generate the same results as below:

stats.scoreatpercentile(reference_distribution, bounds, axis=1)
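One non-vectorised fallback is a per-row loop, since scoreatpercentile can't take a different pair of percentiles for each row. A sketch with hypothetical shapes (reference_distribution is (n, permutations), test_stat is (n, 1); np.percentile stands in for scoreatpercentile):

```python
import numpy as np

def row_bounds(reference_distribution, test_stat):
    # percentile of each observed statistic within its own reference row
    percentile = (reference_distribution < test_stat).mean(axis=1)
    bounds = np.column_stack((1 - percentile, percentile)) * 100
    bounds.sort(axis=1)
    lows = np.empty(len(bounds))
    highs = np.empty(len(bounds))
    # one percentile call per row, each with its own (low, high) pair
    for i, (lo_q, hi_q) in enumerate(bounds):
        lows[i], highs[i] = np.percentile(reference_distribution[i], [lo_q, hi_q])
    return lows, highs
```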

@ljwolf
Copy link
Member

ljwolf commented Mar 5, 2024

I am still working on this, but I recall now why the implementation of an "alternative" argument was a bit trickier than I expected... because we allow for the user to "discard" the random replicates, rather than store them, we have to push the significance logic all the way down to the conditional randomization numba function. This may have significant performance implications, since we're currently only counting the number of larger random stats in each innermost loop.

It seems clear to me that

  1. if the test statistic is as large as k realizations,
  2. then there are always 2k simulations outside of the (k/n, 1 - k/n) interval, plus the test statistic itself.
  3. So, the proper p-value for the two-sided test is (2*n_greater+1)/(n_samples + 1),
  4. which is off from two times our current p-value by 1/(n_samples+1).

If 1-4 are correct, this means we don't need to change any of the numba code. The correction can be calculated as 2*directed - (1/(n_samples+1)) after the numba calculation. Do I have this right @sjsrey @knaaptime @martinfleis @jGaboardi?

So, if we implement our current test for local stats without flipping (as greater), generate 1-p_sim (as lesser), and implement the above correction for the two-sided test (2*p_sim - (1/(n_samples+1))), none of the numba code needs to change.
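If 1-4 hold, the post-hoc logic is just arithmetic on the count the numba loop already produces. A sketch (hypothetical helper name, not the PR's API):

```python
def pseudo_p(n_greater, n_samples, alternative="greater"):
    # n_greater: replicates at least as large as the observed statistic,
    # which is exactly what the numba inner loop currently counts
    p_sim = (n_greater + 1) / (n_samples + 1)
    if alternative == "greater":
        return p_sim
    if alternative == "lesser":
        return 1 - p_sim
    if alternative == "two-sided":
        directed = min(p_sim, 1 - p_sim)
        # two times the directed p-value, minus the once-over-counted test stat
        return min(2 * directed - 1 / (n_samples + 1), 1.0)
    raise ValueError(alternative)
```

With 99 replicates and 4 of them at least as large as the statistic, greater gives 0.05 and two-sided gives 0.09 rather than 0.10.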

Is that OK w/ other maintainers?

@martinfleis
Copy link
Member

This is too much stats for me to say anything useful.

@jGaboardi
Copy link
Member

> This is too much stats for me to say anything useful.

Same for me.

Copy link
Member

@jGaboardi jGaboardi left a comment


Since the topic is over my head, my approval is based on a general review.

@ljwolf
Copy link
Member

ljwolf commented Mar 6, 2024

One further wrinkle: some global Moran tests already support directed testing as a binary option. Notably, if two_tailed=False, they pick the test direction based on whether the global I is positive or negative. It's also worth noting that this means we currently pick the smallest one-tailed p-value and, if the test is two-tailed, multiply it by two.

For us to roll out the testing across all the classes, we need to consider whether this option should be deprecated in favor of an explicit "alternative" option. Right now, there's no way to force a direction on these tests.

@sjsrey
Copy link
Member

sjsrey commented Mar 7, 2024

> I am still working on this, but I recall now why the implementation of an "alternative" argument was a bit trickier than I expected... because we allow for the user to "discard" the random replicates, rather than store them, we have to push the significance logic all the way down to the conditional randomization numba function. This may have significant performance implications, since we're currently only counting the number of larger random stats in each innermost loop.
>
> It seems clear to me that
>
>   1. if the test statistic is as large as k realizations,
>   2. then there are always 2k simulations outside of the (k/n, 1 - k/n) interval, plus the test statistic itself.
>   3. So, the proper p-value for the two-sided test is (2*n_greater+1)/(n_samples + 1),
>   4. which is off from two times our current p-value by 1/(n_samples+1).
>
> If 1-4 are correct, this means we don't need to change any of the numba code. The correction can be calculated as 2*directed - (1/(n_samples+1)) after the numba calculation. Do I have this right @sjsrey @knaaptime @martinfleis @jGaboardi?
>
> So, if we implement our current test for local stats without flipping (as greater), generate 1-p_sim (as lesser), and implement the above correction for the two-sided test (2*p_sim - (1/(n_samples+1))), none of the numba code needs to change.
>
> Is that OK w/ other maintainers?

I think this is OK.

One thing to check is whether:

>   3. So, the proper p-value for the two-sided test is (2*n_greater+1)/(n_samples + 1),

should instead be 2*(n_greater+1)/(n_samples+1).

@ljwolf
Copy link
Member

ljwolf commented Mar 7, 2024

> One thing to check is if:

Sure, that is what I initially thought & what @JosiahParry suggested.

The reason I'm thinking it's actually 2*p_sim - (1/(n_permutations + 1)) is that using 2*p_sim amounts to counting the test stat twice: 2*p_directed = 2*(n_greater + 1)/(n+1) = (2*n_greater + 2)/(n+1). The difference will be vanishingly small as the number of permutations increases, but it's the principle...

Thinking of it another way: in the percentile-based version of the test, you compute the percentile p for the test statistic, count how many null statistics are outside of (p, 1-p), and add one, since the test stat is always at least at its own percentile. This p-value is smaller than 2*p_sim by 1/(n_samples+1), which is exactly the gap you would introduce in the percentile test by counting the test statistic twice.

The simulation code at the end of esda/simulation.py should illustrate this.
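A quick self-contained check of the double-counting argument (illustrative only; assumes a tie-free, continuous reference distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
reference = rng.normal(size=999)
srt = np.sort(reference)
test_stat = (srt[-25] + srt[-24]) / 2  # sits between the 24th and 25th largest

# directed (greater) pseudo-p, as currently counted
n_greater = int((reference >= test_stat).sum())
p_sim = (n_greater + 1) / (len(reference) + 1)

# percentile-based two-sided count: replicates outside (p, 1-p), plus the stat
p = (reference < test_stat).mean()
lo, hi = np.percentile(reference, sorted([100 * (1 - p), 100 * p]))
n_outside = int(((reference <= lo) | (reference >= hi)).sum())
p_two = (n_outside + 1) / (len(reference) + 1)

# p_two equals 2*p_sim - 1/(n_samples + 1): the stat is counted once, not twice
```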

@weikang9009
Copy link
Member

@ljwolf I was looking at the discussions in this PR and the other related issue. The correction for the two-sided test 2*p_sim - (1/(n_samples+1)) looks correct to me.

@JosiahParry
Copy link
Contributor Author

Thank you for the explanation @ljwolf. I think I'm almost there/onboard!

It's worth calling out explicitly that this formula can result in a p-value > 1.0, which should also be handled, e.g.

p_sim = 0.65
nsim = 999

(p_corrected = (2*p_sim - (1/(nsim + 1))))
#> [1] 1.299

if (p_corrected > 1) {
  1.0
} else {
  p_corrected
}
#> [1] 1

Additionally, would you mind elaborating on why it is - (1/(nsim + 1)) as opposed to + (1/(nsim + 1))? To me, it makes more sense to penalize smaller numbers of simulations rather than larger ones: subtracting the second term yields smaller p-values for smaller numbers of simulations and larger ones for larger numbers of simulations.

calc_p_sim <- function(p_sim, nsim) {
  (p_corrected = (2*p_sim - (1/(nsim + 1))))

  if (p_corrected > 1) {
    1.0
  } else {
    p_corrected
  }

}

calc_p_sim(0.05, 49)
#> [1] 0.08
calc_p_sim(0.05, 99)
#> [1] 0.09
calc_p_sim(0.05, 999)
#> [1] 0.099
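In numpy the same guard is a one-liner (helper name hypothetical):

```python
import numpy as np

def corrected_two_sided(p_sim, nsim):
    # cap the corrected two-sided p-value at 1.0, since 2*p_sim - 1/(nsim+1)
    # exceeds 1 whenever p_sim is large enough (e.g. p_sim = 0.65, nsim = 999)
    return np.clip(2 * p_sim - 1 / (nsim + 1), 0.0, 1.0)
```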

Comment on lines 28 to 29
the directed p-value is half of the two-sided p-value, and corresponds to running the
lesser and greater tests, then picking the smaller significance value. This is not advised.
Copy link
Contributor Author


Note that this will be untrue if the adjustment is added.

@ljwolf
Copy link
Member

ljwolf commented May 1, 2025

Hi! Back again :) This has not been forgotten; it is the highest priority for me when I have development time.

I needed to move to a pure numpy version of the two-sided percentile test in order to push it down into the numba.njit() inner loop of esda.crand().

I will push that code up shortly, bandwidth permitting. My intention is then to write a numba.njit()-compatible esda.significance._permutation_test() function, and send both esda.crand.parallel_crand() and esda.significance.permutation_test() to that when calculating the permutation test. Done.

We now need to do some profiling, but hopefully there is not much gain from doing it inline vs. calling another jitted function on singly-typed input.
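The counting form of the two-sided test needs only loops and scalar arithmetic, so it should compile under numba.njit unchanged. A pure-Python sketch (not necessarily the code being pushed):

```python
def two_sided_count(test_stat, reference):
    # only loops and scalars: njit-friendly, no percentile interpolation needed
    n = len(reference)
    k = 0
    for r in reference:
        if r >= test_stat:
            k += 1
    smaller_tail = k if k < n - k else n - k
    # equivalent to counting replicates outside (p, 1-p) plus the statistic:
    # (2*smaller_tail + 1)/(n + 1) == 2*p_sim - 1/(n + 1)
    return (2 * smaller_tail + 1) / (n + 1)
```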

@ljwolf
Copy link
Member

ljwolf commented May 8, 2025

@weikang9009 notes correctly that we will also need to update the notebooks where local/global statistics are used before merging this.

@ljwolf ljwolf added this to the next release milestone May 8, 2025
@ljwolf
Copy link
Member

ljwolf commented Sep 10, 2025

the affected notebook has been fixed and #376 has been addressed!

@ljwolf ljwolf changed the title [Draft] pseudo-p significance calculation pseudo-p significance calculation Sep 10, 2025
@ljwolf
Copy link
Member

ljwolf commented Sep 10, 2025

I think this is ready to merge.

@JosiahParry
Copy link
Contributor Author

Great work, @ljwolf !

@ljwolf
Copy link
Member

ljwolf commented Sep 10, 2025

Thanks! It seems there's a broadcasting issue that's numba-version dependent. I will squash this issue, and then it's ready.

@ljwolf
Copy link
Member

ljwolf commented Sep 16, 2025

OK, all tests are passing except on Windows, which looks like a build issue. Could we merge this? Or can I get some help identifying the issue with the Windows build? This touches numba code, but no code is specific to Windows, and all tests on Linux/macOS pass.

@ljwolf
Copy link
Member

ljwolf commented Sep 16, 2025

Thanks @martinfleis! @sjsrey can you merge?

@sjsrey sjsrey merged commit 051c715 into pysal:main Sep 17, 2025
29 of 30 checks passed
