fix concatenation for random statistics #147
Merged
There were two issues in the code that concatenates the local statistics
for the parallel conditional permutation routine. The original code is
below:
The first, bigger issue is that rlocals is a numpy array containing the
random simulation realizations for the observations within each
processing job. This code, however, flattens it into a one-dimensional
array of length chunk_size * p_permutations, which is certainly not
correct.
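A minimal demonstration of that flattening (the names and sizes here are illustrative, not the actual PR code): passing a single 2-D array to numpy.hstack makes it iterate over the rows and concatenate them end to end.

```python
import numpy as np

chunk_size, p_permutations = 4, 3
# hypothetical per-job result: one row of permuted statistics per observation
rlocals = np.arange(chunk_size * p_permutations).reshape(chunk_size, p_permutations)

# hstack treats the 2-D array as a sequence of 1-D rows and
# concatenates them, destroying the 2-D structure
flat = np.hstack(rlocals)
print(flat.shape)  # (12,), i.e. chunk_size * p_permutations
```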
The second (slightly smaller) issue is that hstack fails when its input
chunks have different numbers of rows, even when you only want to stack
them row-wise. For example:
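The failing example is not reproduced above; a minimal sketch, assuming two chunks that share p_permutations columns but differ in row count:

```python
import numpy as np

p_permutations = 5
a = np.zeros((4, p_permutations))  # a full-sized chunk
b = np.zeros((2, p_permutations))  # a smaller trailing chunk

try:
    np.hstack((a, b))  # for 2-D inputs this concatenates along axis 1,
                       # so the row counts must agree
except ValueError as err:
    print("hstack failed:", err)
```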
That last numpy.hstack line fails with a ValueError, since "all the
input array dimensions except for the concatenation axis must match
exactly." That means that a and b have to have the same number of rows
to be hstacked. But, since they're 2-dimensional, one can row_stack them
just fine:
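A sketch of the working alternative, using the same illustrative arrays (numpy.row_stack is an alias for numpy.vstack):

```python
import numpy as np

a = np.zeros((4, 5))
b = np.ones((2, 5))

c = np.vstack((a, b))  # same as np.row_stack((a, b)); stacks along axis 0
print(c.shape)  # (6, 5)
```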
c contains the first 4 rows from a, and then the last 2 rows are b.
This is what we need to do in the rlocals case: each chunk has shape
(chunk_size, p_permutations), where chunk_size can vary slightly from
job to job but p_permutations is always the same.
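Applied to rlocals, a hedged sketch (the chunk sizes and names are illustrative): stacking the per-job chunks row-wise preserves the two-dimensional layout even when the final chunk is short.

```python
import numpy as np

p_permutations = 3
# per-job chunks: row counts may differ, the column count never does
chunks = [np.random.random((n, p_permutations)) for n in (4, 4, 2)]

rlocals = np.vstack(chunks)  # row-wise concatenation keeps the 2-D shape
print(rlocals.shape)  # (10, 3)
```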