fix concatenation for random statistics #147
Merged
There were two issues in the code that concatenates the local statistics
for the parallel conditional permutation routine. The original code is
below:
The first, bigger issue is that rlocals is a numpy array containing the
random simulation realizations for the observations within each
processing job. This code, however, flattens it into a one-dimensional
array of length chunk_size * p_permutations, which is certainly not
correct.
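A minimal demonstration of that flattening (the names and sizes here are illustrative, not the actual PR code): passing a single 2-D array to numpy.hstack makes it iterate over the rows and concatenate them end to end.

```python
import numpy as np

chunk_size, p_permutations = 4, 3
# hypothetical per-job result: one row of permuted statistics per observation
rlocals = np.arange(chunk_size * p_permutations).reshape(chunk_size, p_permutations)

# hstack treats the 2-D array as a sequence of 1-D rows and
# concatenates them, destroying the 2-D structure
flat = np.hstack(rlocals)
print(flat.shape)  # (12,), i.e. chunk_size * p_permutations
```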
The second (slightly smaller) issue is that hstack fails when its input
chunks have different numbers of rows, even when you only want to stack
them row-wise. For example:
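The failing example is not reproduced above; a minimal sketch, assuming two chunks that share p_permutations columns but differ in row count:

```python
import numpy as np

p_permutations = 5
a = np.zeros((4, p_permutations))  # a full-sized chunk
b = np.zeros((2, p_permutations))  # a smaller trailing chunk

try:
    np.hstack((a, b))  # for 2-D inputs this concatenates along axis 1,
                       # so the row counts must agree
except ValueError as err:
    print("hstack failed:", err)
```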
That last numpy.hstack line fails with a ValueError, since "all the
input array dimensions except for the concatenation axis must match
exactly." That means that a and b have to have the same number of rows
to be hstacked. But, since they're 2-dimensional, one can row_stack them
just fine:
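A sketch of the working alternative, using the same illustrative arrays (numpy.row_stack is an alias for numpy.vstack):

```python
import numpy as np

a = np.zeros((4, 5))
b = np.ones((2, 5))

c = np.vstack((a, b))  # same as np.row_stack((a, b)); stacks along axis 0
print(c.shape)  # (6, 5)
```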
c contains the first 4 rows from a, and then the last 2 rows are b.
This is what we need to do in the rlocals case: each chunk has shape
(chunk_size, p_permutations), where chunk_size can vary slightly from
job to job but p_permutations is always the same.
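Applied to rlocals, a hedged sketch (the chunk sizes and names are illustrative): stacking the per-job chunks row-wise preserves the two-dimensional layout even when the final chunk is short.

```python
import numpy as np

p_permutations = 3
# per-job chunks: row counts may differ, the column count never does
chunks = [np.random.random((n, p_permutations)) for n in (4, 4, 2)]

rlocals = np.vstack(chunks)  # row-wise concatenation keeps the 2-D shape
print(rlocals.shape)  # (10, 3)
```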