TensorIR is a Scala library that lets you train a neural network in relatively few lines of code. It automatically generates efficient C++ code, optimizes it (currently the only optimization is memory planning), compiles it, and runs it.
- `src`: contains the Scala code responsible for generating C++ code.
  - `src/scala/tensor/ir` contains frontend code that creates IR nodes.
  - `src/scala/tensor/ir/CPUTensorOps` defines basic tensor operations: Plus/Sub/Multiply/Divide, convolution, batchnorm, etc.
  - `src/scala/tensor/ir/CPUTensorDiff` defines auto-diff versions of the same operations.
  - `src/scala/tensor/ir/ResNet` contains a small example neural network built with the current IR. It currently runs on the CPU backend; to use the GPU backend, change `val dslDriver = new CPUTensorDiffDriverC[String,Unit]` to `val dslDriver = new GPUTensorDiffDriverC[String,Unit]`. Simply switching the driver is sufficient.
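To make the Ops/Diff split concrete, here is a hypothetical sketch of an elementwise multiply and its backward pass, roughly the kind of C++ kernel such a backend might emit. The function names and the flat `std::vector<float>` tensor layout are illustrative assumptions, not the actual generated code:

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: elementwise Multiply, as CPUTensorOps might lower it.
void mul_forward(const std::vector<float>& a, const std::vector<float>& b,
                 std::vector<float>& out) {
  for (std::size_t i = 0; i < out.size(); ++i) out[i] = a[i] * b[i];
}

// Auto-diff counterpart: given the upstream gradient dOut, accumulate the
// input gradients (d(a*b)/da = b, d(a*b)/db = a).
void mul_backward(const std::vector<float>& a, const std::vector<float>& b,
                  const std::vector<float>& dOut,
                  std::vector<float>& dA, std::vector<float>& dB) {
  for (std::size_t i = 0; i < dOut.size(); ++i) {
    dA[i] += dOut[i] * b[i];
    dB[i] += dOut[i] * a[i];
  }
}
```

The auto-diff version accumulates into (rather than overwrites) the gradient buffers, since a tensor may feed several downstream operations.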
- `src/scala/tensor/ir/backend` contains backend code that generates C++ (or CUDA) code from the IR nodes created by the frontend.
  - `src/scala/tensor/ir/backend/MemoryAnalysis.scala` is responsible for extracting tensor lifetime information (when a tensor is allocated and when it can be freed). It returns a `Map[Int, MemoryEvent]`, where the integer represents an arbitrary timestamp and `MemoryEvent` is an event that signals either the beginning or the end of a tensor's lifetime.
  - `src/scala/tensor/ir/StagedMemoryAllocstor` is responsible for taking in tensor lifetime information and emitting a feasible memory plan. It uses a simple best-fit strategy. `MemorySolver` in the same directory uses Z3, but it is too slow.
  - `src/scala/tensor/ir/backend/CPUMemoryPlanningTransformer` is responsible for taking in a memory plan (emitted by `StagedMemoryAllocstor` or `MemorySolver`) and an IR graph, and returning a modified IR graph with the specified memory plan deployed.
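As a rough illustration of the best-fit idea, the sketch below replays lifetime events in timestamp order and assigns each tensor a byte offset, reusing the smallest freed gap that fits. The `Event` struct and `BestFitPlanner` class are invented for illustration and do not match the real `MemoryEvent` or allocator interfaces:

```cpp
#include <cstddef>
#include <limits>
#include <map>

// Illustrative stand-in for MemoryEvent: a tensor's lifetime begins or ends.
struct Event {
  bool begin;        // true = allocation, false = last use
  int tensor;        // tensor id
  std::size_t size;  // bytes (only meaningful when begin == true)
};

class BestFitPlanner {
  std::map<std::size_t, std::size_t> live_;  // offset -> size of live block
  std::map<int, std::size_t> plan_;          // tensor id -> assigned offset
 public:
  // Best fit: scan the gaps between live blocks (sorted by offset) and pick
  // the smallest gap that fits; otherwise place at the high-water mark.
  void alloc(int tensor, std::size_t size) {
    const std::size_t none = std::numeric_limits<std::size_t>::max();
    std::size_t cursor = 0, best = none, bestGap = none;
    for (const auto& [off, sz] : live_) {
      std::size_t gap = off - cursor;
      if (gap >= size && gap < bestGap) { bestGap = gap; best = cursor; }
      cursor = off + sz;
    }
    std::size_t offset = (best != none) ? best : cursor;
    live_[offset] = size;
    plan_[tensor] = offset;
  }
  void free(int tensor) { live_.erase(plan_.at(tensor)); }
  std::size_t offsetOf(int tensor) const { return plan_.at(tensor); }

  // Replay events keyed by timestamp to produce the full static plan.
  void run(const std::map<int, Event>& events) {
    for (const auto& [t, e] : events) {
      if (e.begin) alloc(e.tensor, e.size); else free(e.tensor);
    }
  }
};
```

Because all lifetimes are known before the program runs, the resulting plan can be baked into the generated code as fixed offsets into one arena, with no allocator calls at runtime.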
- `gen`: contains build definition files for generated C++ (or CUDA) code, as well as runtime libraries for the generated code. Currently, CMake is used to build the generated code.
- `lms-clean`: a submodule of the Lightweight Modular Staging framework. TensorIR uses a fork of the LMS framework. This fork has two important modifications:
  - Prevent inlining of some tensor operations, to preserve lifetime information of tensors.
  - Use CMake to build generated source code, instead of manually synthesizing compile commands.
- `test`: contains a few unit test cases for the CPU backend.
The CPU backend relies on Intel's MKL-DNN (installable via `brew install mkl-dnn` on macOS); the GPU backend relies on CUDA, cuDNN, and Thrust.