guvectorize run does not reproduce Python output #2364

@thoth291

Here is the code I'm trying to speed up:

import numpy as np

def resp_all(beta, dist):
    # dist has shape (N, K) and is overwritten in place with the result
    N, K = dist.shape
    min_dist = np.min(dist, axis=1)
    max_dist = np.max(dist, axis=1)
    # per-row shift, capped at min_dist + 700*(2/beta)
    dist_corr = (min_dist + max_dist) / 2
    dist_corr = np.minimum(dist_corr, (min_dist + 700 * (2 / beta)))
    for n in range(0, N):
        dist[n, :] = dist[n, :] - dist_corr[n]
        dist[n, :] = dist[n, :] * (-beta / 2)
    np.exp(dist, out=dist)
    # normalize each row by its sum
    Rsum = np.sum(dist, axis=1)
    for n in range(0, N):
        dist[n, :] = dist[n, :] / Rsum[n]

The pure Python runtime is 944 ms.
With jit(), the code runs in 749 ms.
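
For reference, the two Python loops in resp_all can also be written with broadcasting alone. The sketch below is an illustration and not part of the original report: the name resp_all_broadcast is hypothetical, it returns a new array instead of overwriting dist, and it assumes beta is either a scalar or a length-N vector with one value per row.

```python
import numpy as np

def resp_all_broadcast(beta, dist):
    """Loop-free variant of resp_all (hypothetical name); returns a new (N, K) array."""
    # Assumption: beta is a scalar or has one entry per row of dist.
    beta = np.asarray(beta, dtype=float).reshape(-1, 1)   # shape (1, 1) or (N, 1)
    min_dist = dist.min(axis=1, keepdims=True)            # shape (N, 1)
    max_dist = dist.max(axis=1, keepdims=True)            # shape (N, 1)
    # Same per-row shift as in resp_all, with the same cap.
    dist_corr = np.minimum((min_dist + max_dist) / 2, min_dist + 700 * (2 / beta))
    resp = np.exp((dist - dist_corr) * (-beta / 2))
    # Normalize each row by its sum.
    resp /= resp.sum(axis=1, keepdims=True)
    return resp

# Small usage example with made-up data.
rng = np.random.default_rng(0)
demo_dist = rng.random((4, 3))
demo_resp = resp_all_broadcast(0.5, demo_dist)
print(demo_resp.sum(axis=1))  # each row sums to 1 up to rounding
```

Keeping the reductions with keepdims=True lets every intermediate broadcast against the (N, K) array, so no explicit row loop is needed. Whether this is faster than the jit() version would have to be measured on the real data.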

So I thought I could guvectorize this code by unwrapping the NumPy vectorization into explicit loops.
This is the result:

import numpy as np
from numba import guvectorize

def loop(dist, beta, min_dist, max_dist, dist_corr, res):
    #res is the same as dist
    K = dist.shape[0]
    # min and max over the row
    min_dist[0] = dist[0]
    max_dist[0] = dist[0]
    for k in range(K):
        min_dist[0] = min(min_dist[0], dist[k])
        max_dist[0] = max(max_dist[0], dist[k])
    dist_corr[0] = min((min_dist[0] + max_dist[0]) / 2.0, (min_dist[0] + 700 * (2 / beta[0])))
    # max_dist[0] is reused from here on as the running sum for normalization
    max_dist[0] = 0.0
    for k in range(K):
        res[k] = np.exp(-beta[0] * (dist[k] - dist_corr[0]) / 2.0)
        max_dist[0] = max_dist[0] + dist[k]
    for k in range(K):
        res[k] = res[k] / max_dist[0]


guloop = guvectorize(['void(float64[:], float64[:], float64[:], float64[:], float64[:], float64[:])'],
                     '(n),(),(),(),()->(n)', target='parallel', nopython=True)(loop)

def resp_all_dummy2(beta,min_dist,max_dist,dist_corr,dist):
    N,K = dist.shape
    for n in range(N):
        loop(dist[n],[beta[n]],[min_dist[n]],[max_dist[n]],[dist_corr[n]],dist[n])

In pure Python, resp_all_dummy2 runs in 3 min 37 sec and produces the correct output.
But when I run the guvectorized code

res=guloop(dist5,beta,min_dist,max_dist,dist_corr)

it runs faster (355 ms) but produces the wrong output.

Could you please help me figure out what is going on?
Here is the notebook, which explains this issue in detail.
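
One plausible cause, not confirmed anywhere in this report, is that loop relies on res and dist being the same array. resp_all_dummy2 passes dist[n] for both, so the running sum of dist[k] picks up the freshly written exponentials. A guvectorize call that is not handed an output array allocates a separate one, so the same line would sum the raw distances instead. The sketch below reproduces that difference in plain NumPy, without numba. The name row_kernel, the sum_from_res flag and the sample data are assumptions made for illustration.

```python
import numpy as np

def row_kernel(dist, beta, res, sum_from_res=False):
    """Same arithmetic as `loop`, with the scratch arrays replaced by locals."""
    lo = dist.min()
    hi = dist.max()
    corr = min((lo + hi) / 2.0, lo + 700.0 * (2.0 / beta))
    total = 0.0
    for k in range(dist.shape[0]):
        res[k] = np.exp(-beta * (dist[k] - corr) / 2.0)
        # The report accumulates dist[k]; that equals res[k] only if res IS dist.
        total += res[k] if sum_from_res else dist[k]
    for k in range(dist.shape[0]):
        res[k] /= total

row = np.array([1.0, 2.0, 4.0])   # made-up sample row
beta = 0.5                        # made-up sample value

# Case 1: res aliases dist, as in resp_all_dummy2.
aliased = row.copy()
row_kernel(aliased, beta, aliased)

# Case 2: res is a separate buffer, as when the output is freshly allocated.
separate = np.empty_like(row)
row_kernel(row, beta, separate)

# Case 3: separate buffer, but the sum is taken over res.
fixed = np.empty_like(row)
row_kernel(row, beta, fixed, sum_from_res=True)

print(aliased.sum())    # 1 up to rounding
print(separate.sum())   # well below 1 for this sample row
print(np.allclose(fixed, aliased))
```

If this is indeed what happens, accumulating res[k] instead of dist[k] in the second loop of loop would make the kernel independent of whether the two buffers alias.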

Another question: is there any way to use parallel guvectorization for this code without unwrapping the NumPy vectorization?
