Transfer ParallelAccelerator technology to Numba (first PR) #2318
Conversation
Force-pushed 94b63e1 to 8a31bb1 (Compare)
            
          
numba/npyufunc/parfor.py (Outdated)
        
    print("outer_sig = ", outer_sig.args, outer_sig.return_type, outer_sig.recvr, outer_sig.pysig)

    # Build the wrapper for GUFunc
    ufunc = ParallelGUFuncBuilder(cres.entry_point, gu_signature)
It seems that this ufunc object and the following operation on it is unnecessary.
You are right, this is dead code.
        
          
numba/npyufunc/ufuncbuilder.py (Outdated)
          
        
    self.signature = signature
    self.sin, self.sout = parse_signature(signature)
    # allow internal use to pass in data structure that represents signature
    if isinstance(signature, tuple):
The only place this is used is in parfor.py.
As commented in https://github.com/numba/numba/pull/2318/files#r109723364, its usage is not needed; thus, this also seems unnecessary.
You are right. Both the above and this one are fixed in commit cab52c6. Thanks for spotting them!
Thanks for the patches, this is great. I've made a start on a review of them. My comments are largely just minor things, with the exception of a couple referring to having some class or library of functions to take away the repetition involved in building IR nodes. I think this would reduce code size and make testing easier.
Thanks again.
    @@ -0,0 +1,55 @@
    import numba
    import numpy as np
    # from scipy.special import erf
Can probably remove this?
Done. For future reference, this blackscholes code would be simpler if scipy.special.erf or scipy.stats.norm.cdf was supported by Numba.
Noted in #2338, thanks.
    @numba.vectorize(nopython=True)
    def cndf2(inp):
        out = 0.5 + 0.5 * math.erf(0.707106781 * inp)
low precision for const sqrt(2)/2
Fixed.
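The fix can be sketched in plain Python (a sketch without the Numba decorator; `cndf` and `SQRT2_OVER_2` are stand-in names, not the PR's actual identifiers):

```python
import math

# Full double-precision constant instead of the truncated literal 0.707106781
SQRT2_OVER_2 = math.sqrt(2.0) / 2.0

def cndf(x):
    # Cumulative normal distribution via the error function,
    # mirroring the cndf2 kernel quoted above
    return 0.5 + 0.5 * math.erf(SQRT2_OVER_2 * x)
```

The truncated literal agrees with sqrt(2)/2 only to about nine digits, so computing the constant at module load keeps full double precision at no runtime cost.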
    @numba.njit(parallel=run_parallel)
    def blackscholes(sptprice, strike, rate, volatility, timev):
        logterm = np.log10(sptprice / strike)
Think that should be natural log?
Fixed.
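The reviewer's point can be verified in plain Python: the Black-Scholes log term needs the natural logarithm of the price ratio, and `log10` differs from it by a constant factor of ln(10) (a stdlib sketch; `np.log` is NumPy's natural log):

```python
import math

sptprice, strike = 110.0, 100.0
wrong = math.log10(sptprice / strike)   # base-10 log, as in the quoted code
right = math.log(sptprice / strike)     # natural log, as Black-Scholes requires

# log10(x) == ln(x) / ln(10), so the base-10 version is off by a constant factor
scaled = wrong * math.log(10.0)
```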
    @@ -0,0 +1,52 @@
    import numba
    import numpy as np
    # from scipy.special import erf
Can probably remove this?
Done.
    @numba.vectorize(nopython=True)
    def cndf2(inp):
        out = 0.5 + 0.5 * math.erf(0.707106781 * inp)
low precision for const sqrt(2)/2
Fixed.
        
          
numba/ir.py (Outdated)
          
        
        return '%s %s %s' % (self.lhs, self.fn, self.rhs)
    else:
-       args = ('%s=%s' % (k, v) for k, v in self._kws.items())
+       args = ('%s=%s' % (k, v) for k, v in (self._kws.items() if config.DIFF_IR == 0 else sorted(self._kws.items())))
Very long line. Also, don't think the == 0 is needed?
I think the == 0 is needed. If NUMBA_DIFF_IR is 0, the user doesn't care about ordering, so _kws.items() is used, which isn't in deterministic order. If DIFF_IR is non-zero, then the sorted version is used. I presume that you want the predominant case to occur before the "if" and the rare case in the else clause.
Shortened the line.
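The ordering logic under discussion can be sketched with plain dictionaries (`DIFF_IR` here is a stand-in for Numba's `config.DIFF_IR` flag; on the Python versions current at the time of this PR, `dict.items()` order was not deterministic across runs):

```python
DIFF_IR = 1  # stand-in for numba's config.DIFF_IR environment flag

def format_kws(kws):
    # Sorted iteration gives a deterministic repr when IR diffing is enabled;
    # otherwise the dict's native (possibly hash-dependent) order is used.
    items = kws.items() if DIFF_IR == 0 else sorted(kws.items())
    return ', '.join('%s=%s' % (k, v) for k, v in items)
```

With `DIFF_IR` non-zero, `format_kws({'b': 2, 'a': 1})` always yields `'a=1, b=2'` regardless of hash seed, which is what makes textual IR diffs stable.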
        
          
numba/ir_utils.py (Outdated)
          
        
    return out

    def get_np_ufunc_typ(func):
        """get type variable for np.empty() from builtin registry"""
Think this docstring is incorrect? Same goes for the error message below?
Fixed.
    """make a block that initializes loop range and iteration variables.
    target label in jump needs to be set.
    """
    # g_range_var = Global(range)
Same comment as earlier about making these commonly used sequences into a class/lib.
Same response as above. We agree it would be nice to define some APIs but we're not sure by how much it would shrink the PR.
        
          
numba/npyufunc/parfor.py (Outdated)
          
        
    in the context of the current function.
    2) The body of the parfor is transformed into a gufunc function.
    3) Code is inserted into the main function that calls do_scheduling
    to divide the iteration space for each thread, allocatees
s/allocatees/allocates/
Fixed.
        
          
numba/npyufunc/parfor.py (Outdated)
          
        
    3) Code is inserted into the main function that calls do_scheduling
    to divide the iteration space for each thread, allocatees
    reduction arrays, calls the gufunc function, and then invokes
    the reduction function acorss the reduction arrays to produce
s/acorss/across/
Fixed.
Codecov Report

    @@            Coverage Diff             @@
    ##           master    #2318      +/-   ##
    ==========================================
    - Coverage   86.66%   86.21%   -0.46%
    ==========================================
      Files         317      313       -4
      Lines       58437    59199     +762
      Branches     6032     6384     +352
    ==========================================
    + Hits        50645    51038     +393
    - Misses       6830     7124     +294
    - Partials      962     1037      +75
    
the example. When it fails, I get the traceback:
    output = axy(A,X,Y)
    expected = A*X+Y
    np.testing.assert_array_equal(expected, output)
Can you add a test that ensures the transformation is performed?
For example, checking @do_scheduling is in the LLVM:

    self.assertIn('@do_scheduling', axy.inspect_llvm(axy.signatures[0]))
Thanks for the suggestion. Done. Moving forward, we need similar functionality for the Numba IR level (inspect_IR?).
        
          
numba/parfor.py (Outdated)
          
        
    for var_list in parfor.array_analysis.array_size_vars.values():
        loop_vars |= {v.name for v in var_list if isinstance(v, ir.Var)}
    dead_set = set()
    # TODO: handle escaping deads
Is the escaping deads handled already? If not, can you explain what it is and the consequences of not handling them?
I just added escaping dead handling.
        
          
numba/ir_utils.py (Outdated)
          
        
    from numba import ir, types, typing, config, analysis
    from numba.typing.templates import signature
    import numpy
    from numba.analysis import *
Can you avoid import *?  It prevents linters from finding errors.
Done.
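The style the reviewer asks for, illustrated with a standard-library module (a sketch of the convention, not the PR's actual change):

```python
# Explicit imports keep every name visible to linters and readers;
# `from math import *` would hide which names this module relies on.
from math import erf, sqrt

half_sqrt2 = sqrt(2.0) / 2.0
zero_erf = erf(0.0)
```

With explicit imports, tools like pyflakes can flag an undefined or unused name at the import site instead of silently resolving it through a star import.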
        
          
numba/npyufunc/parfor.py (Outdated)
          
        
    import sys

    from .. import compiler, ir, types, six, cgutils, sigutils
    from numba.ir_utils import *
Can you avoid import *?
Done.
        
          
numba/parfor.py (Outdated)
          
        
    from numba import ir, ir_utils, types, rewrites, config, analysis
    from numba import array_analysis, postproc
    from numba.ir_utils import *
    from numba.analysis import *
Can you avoid import *?
Done.
    @@ -0,0 +1,1246 @@
    from __future__ import print_function, division, absolute_import
Can you add a high-level description or literature reference for the transformations happening in this file?
Done.
@sklam I can't reproduce your error. Could you give more details about your environment?
Lowerer splits the containing block and flattens parfor to sequential loop. Refactor to top level to avoid circular import dependency.
@ehsantn, the error is likely due to randomized hashing from set or dict. I am testing on python3.5 on your branch. I can consistently reproduce the error with
Thanks. I can reproduce it now. We will investigate.
…uld each fully fuse to one parfor. Assert that translate IR contains exactly one parfor.
…ions and with link to the main page about the feature.
Some user-facing documentation has been added as of commit 656e946; please take a look and see if it is sufficient.
Thanks for the docs. I've read through them, they look good and offer a decent explanation of what this feature does. I've made a few comments, all but one (the one about declaring what happens in np.dot) are minor edits. Thanks again.
        
          
docs/source/user/parallel.rst (Outdated)
          
        
    * :ref:`Numpy ufuncs <supported_ufuncs>` that are supported in :term:`nopython mode`.
    * User defined :class:`~numba.DUFunc` through :func:`~numba.vectorize`.

    2. Numpy reduction function ``sum`` and ``prod``. Note that they have to be
s/function/functions/
done
        
          
docs/source/user/parallel.rst (Outdated)
          
        
    3. Numpy ``dot`` function between a matrix and a vector. When both inputs
    are matrices, instead of parallelizing the matrix multiply, we choose to
    leave it as a library call to Numpy's native implementation.
I presume this means the Numba equivalent implementation for Numpy's native mat-mat?
You are right. So I put in a sentence saying that in all other cases, the default Numba implementation is used.
        
          
docs/source/user/parallel.rst (Outdated)
          
        
    many such operations and while each operation could be parallelized
    individually, such an approach often has lackluster performance due to poor
    cache behavior. Instead, with auto-parallelization, Numba attempts to
    identify such operations in a user program, fuse adjacent ones together
s/, /, and/ and s/together/together,/ ?
done
        
          
docs/source/user/parallel.rst (Outdated)
          
        
    * unary operators: ``+`` ``-`` ``~``
    * binary operators: ``+`` ``-`` ``*`` ``/`` ``/?`` ``%`` ``|`` ``>>`` ``^`` ``<<`` ``&`` ``**`` ``//``
    * compare operators: ``==`` ``!=`` ``<`` ``<=`` ``>`` ``>=``
s/compare/comparison/
done
        
          
docs/source/user/parallel.rst (Outdated)
          
        
    written as ``numpy.sum(a)`` instead of ``a.sum()``.

    3. Numpy ``dot`` function between a matrix and a vector. When both inputs
    are matrices, instead of parallelizing the matrix multiply, we choose to
What about the scalar and vector input combinations to np.dot, like np.dot(scalar, matrix) or np.dot(vector, vector)? I seem to remember seeing in the code that they weren't supported. If that's the case, perhaps just say that in all cases other than the one supported, the Numba implementation will be called directly?
done.
    In this section, we give an example of how this feature helps
    parallelize Logistic Regression::

    @numba.jit(nopython=True, parallel=True)
Perhaps make this example PEP8 compliant to achieve a more consistent rendering?
done
        
          
docs/source/user/parallel.rst (Outdated)
          
        
    We will not discuss details of the algorithm, but instead focus on how
    this program behaves with auto-parallelization:

    1. Input ``X`` is an ``N x D`` matrix. Input ``Y`` is a vector of size ``N``,
Perhaps discuss these in order of appearance in the signature, or reorder the signature?
done
    t1 = time.time()
    w = logistic_regression(labels, points, w, iterations)
    compiletime = time.time()-t1
    print("SELFPRIMED ", compiletime)
s/PRIMED/TIMED/
There are two calls to logistic_regression in the main function. The first is supposed to warm up the compilation pipeline, and hence is called with simple values, and its time is labeled "PRIMED". The second call gives it real arguments, and its time is labeled "TIMED".
Thanks, this seems correct, it is timing the initial compilation effort, my mistake.
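The warm-up-then-measure pattern described above can be sketched without Numba (`work` is a hypothetical stand-in for the jitted `logistic_regression` call; with a real jitted function the first call additionally triggers compilation):

```python
import time

def work(n):
    # stand-in for the jitted logistic_regression call
    return sum(i * i for i in range(n))

t1 = time.time()
work(10)                       # first call with simple values: warm-up ("SELFPRIMED")
primed = time.time() - t1
print("SELFPRIMED", primed)

t1 = time.time()
result = work(100000)          # second call with real arguments: steady state ("SELFTIMED")
timed = time.time() - t1
print("SELFTIMED", timed)
```

Separating the two timings keeps one-time compilation cost out of the reported kernel time.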
        
          
docs/source/user/jit.rst (Outdated)
          
        
    Enables an experimental feature that automatically parallelizes (and
    performs other optimizations for) those operations in the function known to
    have parallel semantics. For a list of supported operations, see the
    following. This feature is enabled by passing ``parallel=True`` and
Perhaps remove the following and add a link to the parallel.rst?
done
        
          
docs/source/user/jit.rst (Outdated)
          
        
    following. This feature is enabled by passing ``parallel=True`` and
    must be used in conjunction with ``nopython=True``::

    @jit(nopython=True,parallel=True)
space after comma
done
@ninegua Thanks for the patches to the documentation. Looks good.
Merging
This adds optimization and automatic parallelization passes that can be turned on using the ``parallel=True`` flag to ``njit``. The Blackscholes (examples/blackscholes/blackscholes_pa.py) and Logistic Regression (examples/logistic-regression/logistic_regression.py) benchmarks demonstrate this new feature's functionality.
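As a rough illustration of the kernel the Blackscholes benchmark parallelizes, here is a scalar pure-Python sketch (names follow the excerpts quoted earlier in this thread; `blackscholes_call` is a stand-in, not the benchmark code itself, which applies the computation elementwise over NumPy arrays under `@njit(parallel=True)`):

```python
import math

def cndf(x):
    # Cumulative normal distribution via the error function
    return 0.5 + 0.5 * math.erf(x * math.sqrt(2.0) / 2.0)

def blackscholes_call(sptprice, strike, rate, volatility, timev):
    # European call option price (scalar form of the benchmark kernel)
    logterm = math.log(sptprice / strike)          # natural log, per the review fix
    powterm = 0.5 * volatility * volatility
    den = volatility * math.sqrt(timev)
    d1 = ((rate + powterm) * timev + logterm) / den
    d2 = d1 - den
    futurevalue = strike * math.exp(-rate * timev)
    return sptprice * cndf(d1) - futurevalue * cndf(d2)
```

Because every array element is priced independently, the loop over elements is embarrassingly parallel, which is what makes it a natural showcase for the new passes.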