Here is the code that I'm trying to speed up:

```python
def resp_all(beta, dist):
    N, K = dist.shape
    min_dist = np.min(dist, axis=1)
    max_dist = np.max(dist, axis=1)
    dist_corr = (min_dist + max_dist) / 2
    dist_corr = np.minimum(dist_corr, min_dist + 700 * (2 / beta))
    for n in range(N):
        dist[n, :] = dist[n, :] - dist_corr[n]
        dist[n, :] = dist[n, :] * (-beta / 2)
    np.exp(dist, out=dist)
    Rsum = np.sum(dist, axis=1)
    for n in range(N):
        dist[n, :] = dist[n, :] / Rsum[n]
```

The pure Python runtime is 944 ms.
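As an aside, the two explicit row loops in `resp_all` can also be collapsed with NumPy broadcasting. A sketch (the name `resp_all_vectorized` is mine; it assumes a scalar `beta` and keeps the in-place update of `dist`, like the original):

```python
import numpy as np

def resp_all_vectorized(beta, dist):
    # Shift each row so the exponentials stay in range, then normalize.
    min_dist = dist.min(axis=1)
    max_dist = dist.max(axis=1)
    dist_corr = np.minimum((min_dist + max_dist) / 2,
                           min_dist + 700 * (2 / beta))
    dist -= dist_corr[:, None]           # broadcast the per-row shift
    dist *= -beta / 2
    np.exp(dist, out=dist)
    dist /= dist.sum(axis=1)[:, None]    # normalize each row to sum to 1
    return dist
```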
With jit() the code runs in 749 ms.
So I thought I could guvectorize this code by unrolling the NumPy vectorization into explicit loops.
This is the result:
```python
def loop(dist, beta, min_dist, max_dist, dist_corr, res):
    # res is the same as dist
    K = dist.shape[0]
    min_dist[0] = dist[0]
    max_dist[0] = dist[0]
    for k in range(K):
        min_dist[0] = min(min_dist[0], dist[k])
        max_dist[0] = max(max_dist[0], dist[k])
    dist_corr[0] = min((min_dist[0] + max_dist[0]) / 2.0,
                       min_dist[0] + 700 * (2 / beta[0]))
    max_dist[0] = 0.0
    for k in range(K):
        res[k] = np.exp(-beta[0] * (dist[k] - dist_corr[0]) / 2.0)
        max_dist[0] = max_dist[0] + dist[k]
    for k in range(K):
        res[k] = res[k] / max_dist[0]

guloop = guvectorize(['void(float64[:], float64[:], float64[:], float64[:], float64[:], float64[:])'],
                     '(n),(),(),(),()->(n)', target='parallel', nopython=True)(loop)

def resp_all_dummy2(beta, min_dist, max_dist, dist_corr, dist):
    N, K = dist.shape
    for n in range(N):
        loop(dist[n], [beta[n]], [min_dist[n]], [max_dist[n]], [dist_corr[n]], dist[n])
```

In pure Python, resp_all_dummy2 runs in 3 min 37 s and produces the correct output.
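One detail worth keeping in mind when comparing the two call paths (a minimal illustration only, not a diagnosis): the pure-Python driver passes `dist[n]` as both `dist` and `res`, so they are the same buffer, whereas guvectorize allocates a fresh output array. Reading `dist[k]` after writing `res[k]` behaves differently in those two cases:

```python
import numpy as np

src = np.array([1.0, 2.0, 3.0])

# Aliased: writes to res are immediately visible through dist.
dist = src.copy()
res = dist                     # same buffer, as in the pure-Python call
res[0] = -1.0
aliased_read = dist[0]         # -1.0

# Separate: dist keeps its original values.
dist = src.copy()
res = np.empty_like(dist)      # a fresh output buffer, as guvectorize allocates
res[0] = -1.0
separate_read = dist[0]        # 1.0

print(aliased_read, separate_read)
```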
But when I run the guvectorized code

```python
res = guloop(dist5, beta, min_dist, max_dist, dist_corr)
```

it runs much faster, 355 ms, but produces the wrong output.
Could you please help me to figure out what is going on?
Here is the notebook, which explains this issue in detail.
Another question: is there any way to use parallel guvectorization for this code without unrolling the NumPy vectorization into explicit loops?