Here is the code that I'm trying to speed up:

```python
def resp_all(beta, dist):
    N, K = dist.shape
    min_dist = np.min(dist, axis=1)
    max_dist = np.max(dist, axis=1)
    dist_corr = (min_dist + max_dist) / 2
    dist_corr = np.minimum(dist_corr, min_dist + 700 * (2 / beta))
    for n in range(N):
        dist[n, :] = dist[n, :] - dist_corr[n]
        dist[n, :] = dist[n, :] * (-beta / 2)
    np.exp(dist, out=dist)
    Rsum = np.sum(dist, axis=1)
    for n in range(N):
        dist[n, :] = dist[n, :] / Rsum[n]
```

The pure Python runtime is 944 ms.
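As an aside, the two explicit row loops in `resp_all` can also be collapsed with NumPy broadcasting. A sketch (the name `resp_all_vectorized` is mine; it assumes a scalar `beta` and keeps the in-place update of `dist`, like the original):

```python
import numpy as np

def resp_all_vectorized(beta, dist):
    # Shift each row so the exponentials stay in range, then normalize.
    min_dist = dist.min(axis=1)
    max_dist = dist.max(axis=1)
    dist_corr = np.minimum((min_dist + max_dist) / 2,
                           min_dist + 700 * (2 / beta))
    dist -= dist_corr[:, None]           # broadcast the per-row shift
    dist *= -beta / 2
    np.exp(dist, out=dist)
    dist /= dist.sum(axis=1)[:, None]    # normalize each row to sum to 1
    return dist
```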
With jit() the code runs in 749 ms.
So I thought I could guvectorize this code by unrolling the NumPy vectorization into explicit loops.
This is the result:
```python
def loop(dist, beta, min_dist, max_dist, dist_corr, res):
    # res is the same as dist
    K = dist.shape[0]
    min_dist[0] = dist[0]
    max_dist[0] = dist[0]
    for k in range(K):
        min_dist[0] = min(min_dist[0], dist[k])
        max_dist[0] = max(max_dist[0], dist[k])
    dist_corr[0] = min((min_dist[0] + max_dist[0]) / 2.0,
                       min_dist[0] + 700 * (2 / beta[0]))
    max_dist[0] = 0.0
    for k in range(K):
        res[k] = np.exp(-beta[0] * (dist[k] - dist_corr[0]) / 2.0)
        max_dist[0] = max_dist[0] + dist[k]
    for k in range(K):
        res[k] = res[k] / max_dist[0]

guloop = guvectorize(['void(float64[:], float64[:], float64[:], float64[:], float64[:], float64[:])'],
                     '(n),(),(),(),()->(n)', target='parallel', nopython=True)(loop)

def resp_all_dummy2(beta, min_dist, max_dist, dist_corr, dist):
    N, K = dist.shape
    for n in range(N):
        loop(dist[n], [beta[n]], [min_dist[n]], [max_dist[n]], [dist_corr[n]], dist[n])
```

In pure Python, resp_all_dummy2 runs in 3 min 37 s and produces the correct output.
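One detail worth keeping in mind when comparing the two call paths (a minimal illustration only, not a diagnosis): the pure-Python driver passes `dist[n]` as both `dist` and `res`, so they are the same buffer, whereas guvectorize allocates a fresh output array. Reading `dist[k]` after writing `res[k]` behaves differently in those two cases:

```python
import numpy as np

src = np.array([1.0, 2.0, 3.0])

# Aliased: writes to res are immediately visible through dist.
dist = src.copy()
res = dist                     # same buffer, as in the pure-Python call
res[0] = -1.0
aliased_read = dist[0]         # -1.0

# Separate: dist keeps its original values.
dist = src.copy()
res = np.empty_like(dist)      # a fresh output buffer, as guvectorize allocates
res[0] = -1.0
separate_read = dist[0]        # 1.0

print(aliased_read, separate_read)
```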
But when I run the guvectorized code

```python
res = guloop(dist5, beta, min_dist, max_dist, dist_corr)
```

it runs much faster, 355 ms, but produces the wrong output.
Could you please help me to figure out what is going on?
Here is the notebook, which explains this issue in detail.
Another question: is there any way to use parallel guvectorization for this code without unrolling the NumPy vectorization into explicit loops?