Compyle allows users to execute a restricted subset of Python (almost similar to C) on a variety of HPC platforms. Currently we support multi-core CPU execution using Cython, and for GPU devices we use OpenCL or CUDA.
Users start with code implemented in a very restricted Python syntax, this code is then automatically transpiled, compiled and executed to run on either one CPU core, or multiple CPU cores (via OpenMP) or on a GPU. Compyle offers source-to-source transpilation, making it a very convenient tool for writing HPC libraries.
Some simple yet powerful parallel utilities are provided which can allow you to solve a remarkably large number of interesting HPC problems. Compyle also features JIT transpilation making it easy to use.
Documentation and learning material is also available in the form of:
- Documentation at: https://compyle.readthedocs.io
- An introduction to compyle in the context of writing a parallel molecular dynamics simulator is in our SciPy 2020 paper.
- Compyle poster presentation
- You may also try Compyle online for free on a Google Colab notebook.
While Compyle seems simple it is not a toy and is used heavily by the PySPH project where Compyle has its origins.
Compyle is itself largely pure Python but depends on numpy and requires either Cython or PyOpenCL or PyCUDA along with the respective backends of a C/C++ compiler, OpenCL and CUDA. If you are only going to execute code on a CPU then all you need is Cython.
You should be able to install Compyle by doing:
$ pip install compyle
Here is a very simple example:
from compyle.api import Elementwise, annotate, wrap, get_config
import numpy as np
@annotate
def axpb(i, x, y, a, b):
    y[i] = a*sin(x[i]) + b
x = np.linspace(0, 1, 10000)
y = np.zeros_like(x)
a, b = 2.0, 3.0
backend = 'cython'
get_config().use_openmp = True
x, y = wrap(x, y, backend=backend)
e = Elementwise(axpb, backend=backend)
e(x, y, a, b)
This will execute the elementwise operation in parallel using OpenMP with Cython. The code is auto-generated, compiled and called for you transparently. The first time this runs, it will take a bit of time to compile everything but the next time, this is cached and will run much faster.
If you just change the backend = 'opencl', the same exact code will be
executed using PyOpenCL and if you change the backend to 'cuda', it will
execute via CUDA without any other changes to your code. This is obviously a
very trivial example, there are more complex examples available as well.
Some simple examples and benchmarks are available in the examples directory.
You may also run these examples on the Google Colab notebook