Conversation

@ehsantn (Contributor) commented Mar 23, 2017:

This adds optimization and automatic parallelization passes that can be turned on by passing the parallel=True flag to njit. The Blackscholes (examples/blackscholes/blackscholes_pa.py) and Logistic Regression (examples/logistic-regression/logistic_regression.py) benchmarks demonstrate the new feature's functionality.
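For reference, a minimal usage sketch of the new flag (the function body and data here are illustrative, not taken from the benchmarks):

import numpy as np
from numba import njit

@njit(parallel=True)
def axpy(a, x, y):
    # element-wise array expressions like this have parallel semantics and
    # are candidates for fusion and automatic parallelization
    return a * x + y

x = np.arange(1000000, dtype=np.float64)
y = np.ones_like(x)
print(axpy(2.0, x, y)[:5])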

@ehsantn force-pushed the master branch 2 times, most recently from 94b63e1 to 8a31bb1, on April 3, 2017 16:08
sklam pushed a commit to sklam/numba that referenced this pull request Apr 3, 2017
sklam pushed a commit to sklam/numba that referenced this pull request Apr 3, 2017
@sklam mentioned this pull request Apr 3, 2017
print("outer_sig = ", outer_sig.args, outer_sig.return_type, outer_sig.recvr, outer_sig.pysig)

# Build the wrapper for GUFunc
ufunc = ParallelGUFuncBuilder(cres.entry_point, gu_signature)
@sklam (Member) commented Apr 4, 2017:

It seems that this ufunc object and the following operation on it are unnecessary.

Reply:

You are right, this is dead code.

self.signature = signature
self.sin, self.sout = parse_signature(signature)
# allow internal use to pass in data structure that represents signature
if isinstance(signature, tuple):
Member:

The only place this is used is in parfor.py.
As commented in https://github.com/numba/numba/pull/2318/files#r109723364, its usage is not needed; thus, this also seems unnecessary.

Reply:

You are right. Both the above and this one are fixed in commit cab52c6. Thanks for spotting them!

sklam pushed a commit to sklam/numba that referenced this pull request Apr 4, 2017
@stuartarchibald (Contributor) left a comment:

Thanks for the patches, this is great. I've made a start on a review of them. My comments are largely minor, except for a couple suggesting a class or library of helper functions to remove the repetition involved in building IR nodes. I think this would reduce code size and make testing easier.

Thanks again.

@@ -0,0 +1,55 @@
import numba
import numpy as np
# from scipy.special import erf
Contributor:

Can probably remove this?

Contributor Author:

Done. For future reference, this blackscholes code would be simpler if scipy.special.erf or scipy.stats.norm.cdf were supported by Numba.

Contributor:

Noted in #2338, thanks.


@numba.vectorize(nopython=True)
def cndf2(inp):
out = 0.5 + 0.5 * math.erf(0.707106781 * inp)
Contributor:

low precision for const sqrt(2)/2

Contributor:

Fixed.
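For reference, one way the precision issue can be addressed is to compute the constant instead of hard-coding a truncated literal (a sketch, not necessarily the exact change committed):

import math
import numba

SQRT2_INV = math.sqrt(2.0) / 2.0  # full double-precision sqrt(2)/2

@numba.vectorize(nopython=True)
def cndf2(inp):
    return 0.5 + 0.5 * math.erf(SQRT2_INV * inp)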


@numba.njit(parallel=run_parallel)
def blackscholes(sptprice, strike, rate, volatility, timev):
logterm = np.log10(sptprice / strike)
Contributor:

Think that should be natural log?

Contributor:

Fixed.

@@ -0,0 +1,52 @@
import numba
import numpy as np
# from scipy.special import erf
Contributor:

Can probably remove this?

Contributor Author:

Done.


@numba.vectorize(nopython=True)
def cndf2(inp):
out = 0.5 + 0.5 * math.erf(0.707106781 * inp)
Contributor:

low precision for const sqrt(2)/2

Contributor:

Fixed.

numba/ir.py Outdated
return '%s %s %s' % (self.lhs, self.fn, self.rhs)
else:
args = ('%s=%s' % (k, v) for k, v in self._kws.items())
args = ('%s=%s' % (k, v) for k, v in (self._kws.items() if config.DIFF_IR == 0 else sorted(self._kws.items())))
Contributor:

Very long line. Also, don't think the == 0 is needed?

Contributor:

I think the == 0 is needed. If NUMBA_DIFF_IR is 0, the user doesn't care about ordering, so _kws.items() is used, which isn't in a deterministic order. If DIFF_IR is non-zero, the sorted version is used. I presume you want the predominant case to occur before the "if" and the rare case in the else clause.

Shortened the line.
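For what it's worth, the shortened logic can be written without losing that behavior (names taken from the diff above, so this is a fragment rather than standalone code):

kws = self._kws.items()
if config.DIFF_IR != 0:
    # deterministic ordering only when IR diffing is requested
    kws = sorted(kws)
args = ('%s=%s' % (k, v) for k, v in kws)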

return out

def get_np_ufunc_typ(func):
"""get type variable for np.empty() from builtin registry"""
Contributor:

Think this docstring is incorrect? Same goes for the error message below?

Contributor:

Fixed.
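One plausible form of the corrected docstring (the wording actually committed may differ):

def get_np_ufunc_typ(func):
    """get the type of the given NumPy ufunc from the builtin registry"""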

"""make a block that initializes loop range and iteration variables.
target label in jump needs to be set.
"""
# g_range_var = Global(range)
Contributor:

Same comment as earlier about making these commonly used sequences into a class/lib.

Contributor:

Same response as above. We agree it would be nice to define some APIs but we're not sure by how much it would shrink the PR.
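Illustrative only: the kind of small helper being discussed for cutting down the repeated IR-building sequences. The constructor signatures follow numba.ir as of this PR and should be treated as an assumption:

from numba import ir

def mk_fresh_assign(scope, name, value, loc):
    """Create a new variable `name` in `scope` and assign `value` to it."""
    target = ir.Var(scope, name, loc)
    return target, ir.Assign(value=value, target=target, loc=loc)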

in the context of the current function.
2) The body of the parfor is transformed into a gufunc function.
3) Code is inserted into the main function that calls do_scheduling
to divide the iteration space for each thread, allocatees
Contributor:

s/allocatees/allocates/

Contributor:

Fixed.

3) Code is inserted into the main function that calls do_scheduling
to divide the iteration space for each thread, allocatees
reduction arrays, calls the gufunc function, and then invokes
the reduction function acorss the reduction arrays to produce
Contributor:

s/acorss/across/

Contributor:

Fixed.

seibert added a commit that referenced this pull request Apr 12, 2017
@codecov-io commented Apr 15, 2017:

Codecov Report

Merging #2318 into master will decrease coverage by 0.45%.
The diff coverage is 68.16%.

@@            Coverage Diff             @@
##           master    #2318      +/-   ##
==========================================
- Coverage   86.66%   86.21%   -0.46%     
==========================================
  Files         317      313       -4     
  Lines       58437    59199     +762     
  Branches     6032     6384     +352     
==========================================
+ Hits        50645    51038     +393     
- Misses       6830     7124     +294     
- Partials      962     1037      +75

@sklam (Member) commented Apr 24, 2017:

the example blackscholes_pa.py is failing randomly for me.

When it fails, I get the traceback:

  File "/Users/siu/miniconda3/envs/dev-numba-py35/lib/python3.5/site-packages/numba/compiler.py", line 840, in native_lowering_stage
    lower.create_cpython_wrapper(flags.release_gil)
  File "/Users/siu/miniconda3/envs/dev-numba-py35/lib/python3.5/site-packages/numba/lowering.py", line 223, in create_cpython_wrapper
    release_gil=release_gil)
  File "/Users/siu/miniconda3/envs/dev-numba-py35/lib/python3.5/site-packages/numba/targets/cpu.py", line 142, in create_cpython_wrapper
    builder.build()
  File "/Users/siu/miniconda3/envs/dev-numba-py35/lib/python3.5/site-packages/numba/callwrapper.py", line 122, in build
    self.build_wrapper(api, builder, closure, args, kws)
  File "/Users/siu/miniconda3/envs/dev-numba-py35/lib/python3.5/site-packages/numba/callwrapper.py", line 155, in build_wrapper
    val = cleanup_manager.add_arg(builder.load(obj), ty)
  File "/Users/siu/miniconda3/envs/dev-numba-py35/lib/python3.5/site-packages/numba/callwrapper.py", line 32, in add_arg
    native = self.api.to_native_value(ty, obj)
  File "/Users/siu/miniconda3/envs/dev-numba-py35/lib/python3.5/site-packages/numba/pythonapi.py", line 1316, in to_native_value
    raise NotImplementedError("cannot convert %s to native value" % (typ,))
numba.errors.LoweringError: Failed at nopython (nopython mode backend)
Failed at nopython (nopython mode backend)
cannot convert Function(<numba._DUFunc 'cndf2'>) to native value
File "blackscholes_pa.py", line 16
[1] During: lowering "[LoopNest(index_variable=parfor_index.15, range_variable=strikesize0.5, correlation=10)]{1: <ir.Block at blackscholes_pa.py (16)>}Var(parfor_index.15, blackscholes_pa.py (16))" at blackscholes_pa.py (16)


output = axy(A,X,Y)
expected = A*X+Y
np.testing.assert_array_equal(expected, output)
Member:

Can you add a test that ensures the transformation is performed?

For example, checking @do_scheduling is in the LLVM:

self.assertIn('@do_scheduling', axy.inspect_llvm(axy.signatures[0]))

Contributor Author:

Thanks for the suggestion. Done. Moving forward, we need similar functionality for the Numba IR level (inspect_IR?).
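A sketch of how such a check can look, built around the axy example and the inspect_llvm call suggested above (the standalone check function name is made up for illustration):

import numpy as np
from numba import njit

@njit(parallel=True)
def axy(A, X, Y):
    return A * X + Y

def check_axy_parallelized():
    A = 2.0
    X = np.arange(10.0)
    Y = np.ones(10)
    np.testing.assert_array_equal(A * X + Y, axy(A, X, Y))
    # The presence of @do_scheduling in the generated LLVM shows that the
    # parallel transformation actually ran for this signature.
    assert '@do_scheduling' in axy.inspect_llvm(axy.signatures[0])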

numba/parfor.py Outdated
for var_list in parfor.array_analysis.array_size_vars.values():
loop_vars |= {v.name for v in var_list if isinstance(v, ir.Var)}
dead_set = set()
# TODO: handle escaping deads
Member:

Are escaping deads handled already? If not, can you explain what they are and the consequences of not handling them?

Contributor Author:

I just added escaping dead handling.

from numba import ir, types, typing, config, analysis
from numba.typing.templates import signature
import numpy
from numba.analysis import *
Member:

Can you avoid import *? It prevents linters from finding errors.

Contributor Author:

Done.
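For example, the wildcard can be replaced by the specific names the module actually uses (which names those are depends on parfor.py; these are just an illustration):

from numba.analysis import compute_cfg_from_blocks, compute_use_defs, compute_live_map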

import sys

from .. import compiler, ir, types, six, cgutils, sigutils
from numba.ir_utils import *
Member:

Can you avoid import *?

Contributor Author:

Done.

numba/parfor.py Outdated
from numba import ir, ir_utils, types, rewrites, config, analysis
from numba import array_analysis, postproc
from numba.ir_utils import *
from numba.analysis import *
Member:

Can you avoid import *?

Contributor Author:

Done.

@@ -0,0 +1,1246 @@
from __future__ import print_function, division, absolute_import
Member:

Can you add a high-level description or a literature reference for the transformations happening in this file?

Contributor Author:

Done.

@ehsantn (Contributor Author) commented Apr 24, 2017:

@sklam I can't reproduce your error. Could you give more details about your environment?

Ehsan Totoni added 6 commits April 24, 2017 14:33
@sklam (Member) commented Apr 25, 2017:

@ehsantn, the error is likely due to randomized hashing of sets or dicts. I am testing with Python 3.5 on your branch. I can consistently reproduce the error with PYTHONHASHSEED=2 python blackscholes_pa.py

@ehsantn (Contributor Author) commented Apr 25, 2017:

Thanks. I can reproduce it now. We will investigate.

@ehsantn (Contributor Author) commented Apr 27, 2017:

@sklam 48bf27b fixes the issue. Let us know if you find any other problem.

@ninegua commented May 5, 2017:

Some user-facing documentation has been added as of commit 656e946; please check whether it is sufficient.

@stuartarchibald (Contributor) left a comment:

Thanks for the docs. I've read through them; they look good and offer a decent explanation of what this feature does. I've made a few comments; all but one (the one about declaring what happens in np.dot) are minor edits. Thanks again.

* :ref:`Numpy ufuncs <supported_ufuncs>` that are supported in :term:`nopython mode`.
* User defined :class:`~numba.DUFunc` through :func:`~numba.vectorize`.

2. Numpy reduction function ``sum`` and ``prod``. Note that they have to be
Contributor:

s/function/functions/

Reply:

done
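A small illustration of the documented constraint (as the docs note further down, the reduction has to be written as a call to numpy.sum rather than the method form):

import numpy as np
from numba import njit

@njit(parallel=True)
def total(a):
    # np.sum(a) is recognized and parallelized; the method form a.sum()
    # is not picked up by the parallel pass
    return np.sum(a)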


3. Numpy ``dot`` function between a matrix and a vector. When both inputs
are matrices, instead of parallelizing the matrix multiply, we choose to
leave it as a library call to Numpy's native implementation.
Contributor:

I presume this means the Numba equivalent implementation for Numpy's native mat-mat?

Reply:

You are right. So I just put in a sentence saying that for all other cases, the default Numba implementation is used.
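A sketch of the documented case, matrix-times-vector being the form of np.dot that gets parallelized:

import numpy as np
from numba import njit

@njit(parallel=True)
def matvec(M, v):
    # matrix @ vector is parallelized; matrix @ matrix falls back to the
    # default implementation, as described above
    return np.dot(M, v)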

many such operations and while each operation could be parallelized
individually, such an approach often has lackluster performance due to poor
cache behavior. Instead, with auto-parallelization, Numba attempts to
identify such operations in a user program, fuse adjacent ones together
Contributor:

s/, /, and/ and s/together/together,/ ?

Reply:

done


* unary operators: ``+`` ``-`` ``~``
* binary operators: ``+`` ``-`` ``*`` ``/`` ``/?`` ``%`` ``|`` ``>>`` ``^`` ``<<`` ``&`` ``**`` ``//``
* compare operators: ``==`` ``!=`` ``<`` ``<=`` ``>`` ``>=``
Contributor:

s/compare/comparison/

Reply:

done

written as ``numpy.sum(a)`` instead of ``a.sum()``.

3. Numpy ``dot`` function between a matrix and a vector. When both inputs
are matrices, instead of parallelizing the matrix multiply, we choose to
Contributor:

What about the scalar and vector input combinations to np.dot, like np.dot(scalar, matrix) or np.dot(vector, vector)? I seem to remember seeing in the code that they weren't supported. If that's the case, perhaps just say that in all cases other than the supported one the Numba implementation will be called directly?

Reply:

done.

In this section, we give an example of how this feature helps
parallelize Logistic Regression::

@numba.jit(nopython=True, parallel=True)
Contributor:

Perhaps make this example PEP8 compliant to achieve a more consistent rendering?

Reply:

done

We will not discuss details of the algorithm, but instead focus on how
this program behaves with auto-parallelization:

1. Input ``X`` is an ``N x D`` matrix. Input ``Y`` is a vector of size ``N``,
Contributor:

Perhaps discuss these in order of appearance in the signature, or reorder the signature?

Reply:

done
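For context, a sketch of the kind of kernel the documentation example describes (the exact body in the committed docs may differ slightly):

import numpy as np
import numba

@numba.jit(nopython=True, parallel=True)
def logistic_regression(Y, X, w, iterations):
    # Y: vector of size N, X: N x D matrix, w: vector of size D
    for i in range(iterations):
        w -= np.dot(((1.0 / (1.0 + np.exp(-Y * np.dot(X, w))) - 1.0) * Y), X)
    return w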

t1 = time.time()
w = logistic_regression(labels, points, w, iterations)
compiletime = time.time()-t1
print("SELFPRIMED ", compiletime)
Contributor:

s/PRIMED/TIMED/

Reply:

There are two calls to logistic_regression in the main function. The first one is supposed to warm up the compilation pipeline, and hence is called with simple values; its time is called "PRIMED". The second call gives it real arguments, and its time is called "TIMED".

Contributor:

Thanks, this seems correct; it is timing the initial compilation effort. My mistake.
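A short sketch of the two-call pattern being described; the first call includes JIT compilation ("SELFPRIMED") while the second measures execution only ("SELFTIMED"). Variable names mirror the snippet above:

t1 = time.time()
w = logistic_regression(labels, points, w, iterations)  # triggers compilation
print("SELFPRIMED ", time.time() - t1)

t1 = time.time()
w = logistic_regression(labels, points, w, iterations)  # already compiled
print("SELFTIMED ", time.time() - t1)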

Enables an experimental feature that automatically parallelizes (and
performs other optimizations for) those operations in the function known to
have parallel semantics. For a list of supported operations, see the
following. This feature is enabled by passing ``parallel=True`` and
Contributor:

Perhaps remove the following and add a link to the parallel.rst?

Reply:

done

following. This feature is enabled by passing ``parallel=True`` and
must be used in conjunction with ``nopython=True``::

@jit(nopython=True,parallel=True)
Contributor:

space after comma

Reply:

done

@stuartarchibald (Contributor) commented:
@ninegua Thanks for the patches to the documentation. Looks good.

@sklam (Member) commented May 9, 2017:

Merging


7 participants