Transfer ParallelAccelerator technology to Numba (first PR) #2318
Conversation
Force-pushed 94b63e1 to 8a31bb1 (Compare)
            
          
numba/npyufunc/parfor.py (Outdated)
        
    print("outer_sig = ", outer_sig.args, outer_sig.return_type, outer_sig.recvr, outer_sig.pysig)

    # Build the wrapper for GUFunc
    ufunc = ParallelGUFuncBuilder(cres.entry_point, gu_signature)
It seems that this ufunc object and the following operation on it is unnecessary.
You are right, this is dead code.
        
          
numba/npyufunc/ufuncbuilder.py (Outdated)
          
        
    self.signature = signature
    self.sin, self.sout = parse_signature(signature)
    # allow internal use to pass in data structure that represents signature
    if isinstance(signature, tuple):
The only place this is used is in parfor.py.
As commented in https://github.com/numba/numba/pull/2318/files#r109723364, its usage is not needed; thus, this also seems unnecessary.
You are right. Both the above and this one are fixed in commit cab52c6. Thanks for spotting them!
Thanks for the patches, this is great. I've made a start on a review of them. My comments are largely just minor things, with the exception of a couple referring to having some class or library of functions to take away the repetition involved in building IR nodes. I think this would reduce code size and make testing easier.
Thanks again.
    @@ -0,0 +1,55 @@
    import numba
    import numpy as np
    # from scipy.special import erf
Can probably remove this?
Done. For future reference, this blackscholes code would be simpler if scipy.special.erf or scipy.stats.norm.cdf was supported by Numba.
Noted in #2338, thanks.
    @numba.vectorize(nopython=True)
    def cndf2(inp):
        out = 0.5 + 0.5 * math.erf(0.707106781 * inp)
low precision for const sqrt(2)/2
Fixed.
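The fix can be sketched in plain Python (a sketch without the Numba decorator; `cndf` and `SQRT2_OVER_2` are stand-in names, not the PR's actual identifiers):

```python
import math

# Full double-precision constant instead of the truncated literal 0.707106781
SQRT2_OVER_2 = math.sqrt(2.0) / 2.0

def cndf(x):
    # Cumulative normal distribution via the error function,
    # mirroring the cndf2 kernel quoted above
    return 0.5 + 0.5 * math.erf(SQRT2_OVER_2 * x)
```

The truncated literal agrees with sqrt(2)/2 only to about nine digits, so computing the constant at module load keeps full double precision at no runtime cost.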
    @numba.njit(parallel=run_parallel)
    def blackscholes(sptprice, strike, rate, volatility, timev):
        logterm = np.log10(sptprice / strike)
Think that should be natural log?
Fixed.
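The reviewer's point can be verified in plain Python: the Black-Scholes log term needs the natural logarithm of the price ratio, and `log10` differs from it by a constant factor of ln(10) (a stdlib sketch; `np.log` is NumPy's natural log):

```python
import math

sptprice, strike = 110.0, 100.0
wrong = math.log10(sptprice / strike)   # base-10 log, as in the quoted code
right = math.log(sptprice / strike)     # natural log, as Black-Scholes requires

# log10(x) == ln(x) / ln(10), so the base-10 version is off by a constant factor
scaled = wrong * math.log(10.0)
```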
    @@ -0,0 +1,52 @@
    import numba
    import numpy as np
    # from scipy.special import erf
Can probably remove this?
Done.
    @numba.vectorize(nopython=True)
    def cndf2(inp):
        out = 0.5 + 0.5 * math.erf(0.707106781 * inp)
low precision for const sqrt(2)/2
Fixed.
        
          
numba/ir.py (Outdated)
          
        
        return '%s %s %s' % (self.lhs, self.fn, self.rhs)
    else:
-       args = ('%s=%s' % (k, v) for k, v in self._kws.items())
+       args = ('%s=%s' % (k, v) for k, v in (self._kws.items() if config.DIFF_IR == 0 else sorted(self._kws.items())))
Very long line. Also, don't think the == 0 is needed?
I think the == 0 is needed. If NUMBA_DIFF_IR is 0, the user doesn't care about ordering, so _kws.items() is used, which isn't in deterministic order. If DIFF_IR is non-zero, then the sorted version is used. I presume that you want the predominant case to occur before the "if" and the rare case in the else clause.
Shortened the line.
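The ordering logic under discussion can be sketched with plain dictionaries (`DIFF_IR` here is a stand-in for Numba's `config.DIFF_IR` flag; on the Python versions current at the time of this PR, `dict.items()` order was not deterministic across runs):

```python
DIFF_IR = 1  # stand-in for numba's config.DIFF_IR environment flag

def format_kws(kws):
    # Sorted iteration gives a deterministic repr when IR diffing is enabled;
    # otherwise the dict's native (possibly hash-dependent) order is used.
    items = kws.items() if DIFF_IR == 0 else sorted(kws.items())
    return ', '.join('%s=%s' % (k, v) for k, v in items)
```

With `DIFF_IR` non-zero, `format_kws({'b': 2, 'a': 1})` always yields `'a=1, b=2'` regardless of hash seed, which is what makes textual IR diffs stable.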
        
          
numba/ir_utils.py (Outdated)
          
        
    return out

    def get_np_ufunc_typ(func):
        """get type variable for np.empty() from builtin registry"""
Think this docstring is incorrect? Same goes for the error message below?
Fixed.
    """make a block that initializes loop range and iteration variables.
    target label in jump needs to be set.
    """
    # g_range_var = Global(range)
Same comment as earlier about making these commonly used sequences into a class/lib.
Same response as above. We agree it would be nice to define some APIs but we're not sure by how much it would shrink the PR.
        
          
numba/npyufunc/parfor.py (Outdated)
          
        
    in the context of the current function.
    2) The body of the parfor is transformed into a gufunc function.
    3) Code is inserted into the main function that calls do_scheduling
    to divide the iteration space for each thread, allocatees
s/allocatees/allocates/
Fixed.
        
          
numba/npyufunc/parfor.py (Outdated)
          
        
    3) Code is inserted into the main function that calls do_scheduling
    to divide the iteration space for each thread, allocatees
    reduction arrays, calls the gufunc function, and then invokes
    the reduction function acorss the reduction arrays to produce
s/acorss/across/
Fixed.
Codecov Report

    @@            Coverage Diff             @@
    ##           master    #2318      +/-   ##
    ==========================================
    - Coverage   86.66%   86.21%   -0.46%
    ==========================================
      Files         317      313       -4
      Lines       58437    59199     +762
      Branches     6032     6384     +352
    ==========================================
    + Hits        50645    51038     +393
    - Misses       6830     7124     +294
    - Partials      962     1037      +75
    
the example. When it fails, I get the traceback:
    output = axy(A,X,Y)
    expected = A*X+Y
    np.testing.assert_array_equal(expected, output)
Can you add a test that ensures the transformation is performed?
For example, checking @do_scheduling is in the LLVM:

    self.assertIn('@do_scheduling', axy.inspect_llvm(axy.signatures[0]))
Thanks for the suggestion. Done. Moving forward, we need similar functionality for the Numba IR level (inspect_IR?).
        
          
numba/parfor.py (Outdated)
          
        
    for var_list in parfor.array_analysis.array_size_vars.values():
        loop_vars |= {v.name for v in var_list if isinstance(v, ir.Var)}
    dead_set = set()
    # TODO: handle escaping deads
Is the escaping deads handled already? If not, can you explain what it is and the consequences of not handling them?
I just added escaping dead handling.
        
          
numba/ir_utils.py (Outdated)
          
        
    from numba import ir, types, typing, config, analysis
    from numba.typing.templates import signature
    import numpy
    from numba.analysis import *
Can you avoid import *?  It prevents linters from finding errors.
Done.
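The style the reviewer asks for, illustrated with a standard-library module (a sketch of the convention, not the PR's actual change):

```python
# Explicit imports keep every name visible to linters and readers;
# `from math import *` would hide which names this module relies on.
from math import erf, sqrt

half_sqrt2 = sqrt(2.0) / 2.0
zero_erf = erf(0.0)
```

With explicit imports, tools like pyflakes can flag an undefined or unused name at the import site instead of silently resolving it through a star import.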
        
          
numba/npyufunc/parfor.py (Outdated)
          
        
    import sys

    from .. import compiler, ir, types, six, cgutils, sigutils
    from numba.ir_utils import *
Can you avoid import *?
Done.
        
          
numba/parfor.py (Outdated)
          
        
    from numba import ir, ir_utils, types, rewrites, config, analysis
    from numba import array_analysis, postproc
    from numba.ir_utils import *
    from numba.analysis import *
Can you avoid import *?
Done.
    @@ -0,0 +1,1246 @@
    from __future__ import print_function, division, absolute_import
Can you add a high-level description or literature reference for the transformations happening in this file?
Done.
@sklam I can't reproduce your error. Could you give more details about your environment?
Lowerer splits the containing block and flattens parfor to sequential loop. Refactor to top level to avoid circular import dependency.
@ehsantn, the error is likely due to randomized hashing from set or dict. I am testing on python3.5 on your branch. I can consistently reproduce the error with
Thanks. I can reproduce it now. We will investigate.
…uld each fully fuse to one parfor. Assert that translate IR contains exactly one parfor.
…ions and with link to the main page about the feature.
Some user-facing documentation has been added as of commit 656e946; please take a look and see if it is sufficient.
Thanks for the docs. I've read through them, they look good and offer a decent explanation of what this feature does. I've made a few comments, all but one (the one about declaring what happens in np.dot) are minor edits. Thanks again.
        
          
docs/source/user/parallel.rst (Outdated)
          
        
    * :ref:`Numpy ufuncs <supported_ufuncs>` that are supported in :term:`nopython mode`.
    * User defined :class:`~numba.DUFunc` through :func:`~numba.vectorize`.

    2. Numpy reduction function ``sum`` and ``prod``. Note that they have to be
s/function/functions/
done
        
          
docs/source/user/parallel.rst (Outdated)
          
        
    3. Numpy ``dot`` function between a matrix and a vector. When both inputs
    are matrices, instead of parallelizing the matrix multiply, we choose to
    leave it as a library call to Numpy's native implementation.
I presume this means the Numba equivalent implementation for Numpy's native mat-mat?
You are right. So I put in a sentence saying that in all other cases, the default Numba implementation is used.
        
          
docs/source/user/parallel.rst (Outdated)
          
        
    many such operations and while each operation could be parallelized
    individually, such an approach often has lackluster performance due to poor
    cache behavior. Instead, with auto-parallelization, Numba attempts to
    identify such operations in a user program, fuse adjacent ones together
s/, /, and/ and s/together/together,/ ?
done
        
          
docs/source/user/parallel.rst (Outdated)
          
        
    * unary operators: ``+`` ``-`` ``~``
    * binary operators: ``+`` ``-`` ``*`` ``/`` ``/?`` ``%`` ``|`` ``>>`` ``^`` ``<<`` ``&`` ``**`` ``//``
    * compare operators: ``==`` ``!=`` ``<`` ``<=`` ``>`` ``>=``
s/compare/comparison/
done
        
          
docs/source/user/parallel.rst (Outdated)
          
        
    written as ``numpy.sum(a)`` instead of ``a.sum()``.

    3. Numpy ``dot`` function between a matrix and a vector. When both inputs
    are matrices, instead of parallelizing the matrix multiply, we choose to
What about the scalar and vector input combinations to np.dot, like np.dot(scalar, matrix) or np.dot(vector, vector)? I seem to remember seeing in the code that they weren't supported. If that's the case, perhaps just say that in all cases other than the one supported, the Numba implementation will be called directly?
done.
    In this section, we give an example of how this feature helps
    parallelize Logistic Regression::

    @numba.jit(nopython=True, parallel=True)
Perhaps make this example PEP8 compliant to achieve a more consistent rendering?
done
        
          
docs/source/user/parallel.rst (Outdated)
          
        
    We will not discuss details of the algorithm, but instead focus on how
    this program behaves with auto-parallelization:

    1. Input ``X`` is an ``N x D`` matrix. Input ``Y`` is a vector of size ``N``,
Perhaps discuss these in order of appearance in the signature, or reorder the signature?
done
    t1 = time.time()
    w = logistic_regression(labels, points, w, iterations)
    compiletime = time.time()-t1
    print("SELFPRIMED ", compiletime)
s/PRIMED/TIMED/
There are two calls to logistic_regression in the main function. The first is supposed to warm up the compilation pipeline, and hence is called with simple values, and its time is labeled "PRIMED". The second call gives it real arguments, and its time is labeled "TIMED".
Thanks, this seems correct, it is timing the initial compilation effort, my mistake.
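The warm-up-then-measure pattern described above can be sketched without Numba (`work` is a hypothetical stand-in for the jitted `logistic_regression` call; with a real jitted function the first call additionally triggers compilation):

```python
import time

def work(n):
    # stand-in for the jitted logistic_regression call
    return sum(i * i for i in range(n))

t1 = time.time()
work(10)                       # first call with simple values: warm-up ("SELFPRIMED")
primed = time.time() - t1
print("SELFPRIMED", primed)

t1 = time.time()
result = work(100000)          # second call with real arguments: steady state ("SELFTIMED")
timed = time.time() - t1
print("SELFTIMED", timed)
```

Separating the two timings keeps one-time compilation cost out of the reported kernel time.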
        
          
docs/source/user/jit.rst (Outdated)
          
        
    Enables an experimental feature that automatically parallelizes (and
    performs other optimizations for) those operations in the function known to
    have parallel semantics. For a list of supported operations, see the
    following. This feature is enabled by passing ``parallel=True`` and
Perhaps remove the following and add a link to the parallel.rst?
done
        
          
docs/source/user/jit.rst (Outdated)
          
        
    following. This feature is enabled by passing ``parallel=True`` and
    must be used in conjunction with ``nopython=True``::

    @jit(nopython=True,parallel=True)
space after comma
done
@ninegua Thanks for the patches to the documentation. Looks good.
Merging
This adds optimization and automatic parallelization passes that can be turned on using the ``parallel=True`` flag to ``njit``. The Blackscholes (examples/blackscholes/blackscholes_pa.py) and Logistic Regression (examples/logistic-regression/logistic_regression.py) benchmarks demonstrate this new feature's functionality.
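As a rough illustration of the kernel the Blackscholes benchmark parallelizes, here is a scalar pure-Python sketch (names follow the excerpts quoted earlier in this thread; `blackscholes_call` is a stand-in, not the benchmark code itself, which applies the computation elementwise over NumPy arrays under `@njit(parallel=True)`):

```python
import math

def cndf(x):
    # Cumulative normal distribution via the error function
    return 0.5 + 0.5 * math.erf(x * math.sqrt(2.0) / 2.0)

def blackscholes_call(sptprice, strike, rate, volatility, timev):
    # European call option price (scalar form of the benchmark kernel)
    logterm = math.log(sptprice / strike)          # natural log, per the review fix
    powterm = 0.5 * volatility * volatility
    den = volatility * math.sqrt(timev)
    d1 = ((rate + powterm) * timev + logterm) / den
    d2 = d1 - den
    futurevalue = strike * math.exp(-rate * timev)
    return sptprice * cndf(d1) - futurevalue * cndf(d2)
```

Because every array element is priced independently, the loop over elements is embarrassingly parallel, which is what makes it a natural showcase for the new passes.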