@ljwolf ljwolf commented Aug 24, 2020

There were two issues in the code that concatenates the local statistics
in the parallel conditional permutation routine. The original code is
below:

rlocals = numpy.hstack(rlocals).flatten()

The first big issue is that rlocals contains the random realizations of
the simulations for the observations within each processing job, one row
per observation. This code, however, flattens that out into a
one-dimensional array of length chunk_size*p_permutations, which is
certainly not correct.
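
A minimal sketch of this first issue, assuming two equal-sized chunks. The chunk sizes and values here are made up for illustration and are not taken from esda:

```python
import numpy

# Two hypothetical chunks, each shaped (chunk_size, p_permutations) = (3, 4).
chunk_a = numpy.zeros((3, 4))
chunk_b = numpy.ones((3, 4))
rlocals = [chunk_a, chunk_b]

# hstack joins the chunks side by side into shape (3, 8); flatten then
# discards the two-dimensional layout entirely.
flattened = numpy.hstack(rlocals).flatten()
print(flattened.shape)
```

The result is one-dimensional, so the one-row-per-observation layout of the simulations is lost.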

The second (slightly smaller) issue is that hstack fails when its chunks
are heterogeneously sized, even when you just want to stack over the
rows. For example:

>>> a = numpy.arange(20).reshape(4,5)
>>> b = numpy.arange(10).reshape(2,5)
>>> c = numpy.hstack((a,b)) # Fails!

That last numpy.hstack line fails with a ValueError, since "all of the
input array dimensions for the concatenation axis must match exactly."
That means that a and b have to have the same number of rows to be
hstacked. But, since they're 2-dimensional, one can row_stack them
just fine:

>>> c = numpy.row_stack((a,b)) # Success!

c contains the 4 rows of a, followed by the 2 rows of b:

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9]])

This is what we need to do in the rlocals case, since the chunks might
differ slightly in their number of rows, but each one will always have
shape (chunk_size, p_permutations).
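
As a rough sketch of that row-wise stacking, assuming three hypothetical chunks of unequal size (the sizes, seed, and variable names are illustrative only). numpy.vstack is used because row_stack is an alias for it, and recent NumPy releases deprecate the alias:

```python
import numpy

p_permutations = 5
chunk_sizes = [4, 4, 2]  # hypothetical: the last job gets fewer observations

rng = numpy.random.default_rng(0)
# Each chunk is shaped (chunk_size, p_permutations).
chunks = [rng.random((size, p_permutations)) for size in chunk_sizes]

# Stack over the rows; only the number of columns has to match.
rlocals = numpy.vstack(chunks)
print(rlocals.shape)
```

Each observation keeps its own row of simulated values, and the chunks appear in the order they were stacked.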


codecov-commenter commented Aug 24, 2020

Codecov Report

Merging #147 into master will not change coverage.
The diff coverage is 0.00%.

@@           Coverage Diff           @@
##           master     #147   +/-   ##
=======================================
  Coverage   77.61%   77.61%           
=======================================
  Files          25       25           
  Lines        3274     3274           
=======================================
  Hits         2541     2541           
  Misses        733      733           

Impacted Files    Coverage Δ
esda/crand.py     57.24% <0.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update db0af3a...70268fd.


ljwolf commented Aug 24, 2020

This resolves #146. To verify, run the snippet below: it is broken on master, but works here.

I knew I had addressed this before in the getisord rewrite, but that never made it into a release?

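import time

import esda
import libpysal
import numpy

time0 = time.time()

# Build a random surface on a size-by-size regular lattice.
size = 100
patch_eg = numpy.random.rand(size, size).flatten()
w = libpysal.weights.lat2W(size, size)

# Run in parallel and keep the simulations, which exercises the concatenation code.
lm = esda.Moran_Local(patch_eg, w, transformation='r', keep_simulations=True,
                      permutations=1000, n_jobs=8)

# Report the elapsed wall-clock time in seconds.
print(time.time() - time0)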

@ljwolf ljwolf requested review from darribas and sjsrey November 23, 2020 13:38
@ljwolf ljwolf added this to the next release milestone Nov 23, 2020
@ljwolf ljwolf added the bug label Nov 23, 2020
@sjsrey sjsrey merged commit c5cca10 into pysal:master Dec 29, 2020