Train it. Quantize it. Synthesize and simulate it — in hardware. All in Python.
pyrtlnet is a self-contained example of a quantized neural network that runs
end-to-end in Python. From model training, to software inference, to hardware
generation, all the way to simulating that custom inference hardware at the logic-gate
level — you can do it all right from the Python REPL. We hope you will find pyrtlnet
(rhymes with turtle-net) a complete and understandable walkthrough that goes from
TensorFlow training to bit-accurate hardware simulation,
with the PyRTL hardware description language.
Main features include:
- Quantized neural network training with TensorFlow. The resulting inference
  network is fully quantized, so all inference calculations are done with
  integers.
- Four different quantized inference implementations operating at different
  levels of abstraction. All four implementations produce the same output in
  the same format and, in doing so, provide a useful framework to extend either
  from the top down or the bottom up.
  - A reference quantized inference implementation, using the standard LiteRT
    Interpreter.
  - A software implementation of quantized inference, using NumPy and fxpmath,
    to verify the math performed by the reference implementation.
  - A PyRTL hardware implementation of quantized inference that is simulated
    right at the logic gate level.
  - A deployment of the PyRTL hardware design to a Pynq Z2 FPGA.
- A new PyRTL linear algebra library, including a composable `WireMatrix2D`
  matrix abstraction and an output-stationary systolic array for matrix
  multiplication.
- An extensive suite of unit tests, and continuous integration testing.
- Understandable and documented code! pyrtlnet is designed to be, first and
  foremost, understandable and readable (even when that comes at the expense of
  performance). Reference documentation is extracted from docstrings with
  Sphinx.
- Install git.
- Clone this repository, and `cd` to the repository's root directory.

  ```
  $ git clone https://github.com/UCSBarchlab/pyrtlnet.git
  $ cd pyrtlnet
  ```

- Install uv.
- (optional) Install Verilator if you want to export the inference hardware to
  Verilog, and simulate the Verilog version of the hardware.
- Run `uv run tensorflow_training.py` in this repository's root directory. This
  trains a quantized neural network with TensorFlow, on the MNIST data set, and
  produces a quantized `.tflite` saved model file, named `quantized.tflite`.

  ```
  $ uv run tensorflow_training.py
  Training unquantized model.
  Epoch 1/10
  1875/1875 [==============================] - 1s 350us/step - loss: 0.6532 - accuracy: 0.8202
  Epoch 2/10
  1875/1875 [==============================] - 1s 346us/step - loss: 0.3304 - accuracy: 0.9039
  Epoch 3/10
  1875/1875 [==============================] - 1s 347us/step - loss: 0.2944 - accuracy: 0.9145
  Epoch 4/10
  1875/1875 [==============================] - 1s 350us/step - loss: 0.2719 - accuracy: 0.9205
  Epoch 5/10
  1875/1875 [==============================] - 1s 352us/step - loss: 0.2551 - accuracy: 0.9245
  Epoch 6/10
  1875/1875 [==============================] - 1s 348us/step - loss: 0.2403 - accuracy: 0.9288
  Epoch 7/10
  1875/1875 [==============================] - 1s 350us/step - loss: 0.2280 - accuracy: 0.9330
  Epoch 8/10
  1875/1875 [==============================] - 1s 346us/step - loss: 0.2178 - accuracy: 0.9358
  Epoch 9/10
  1875/1875 [==============================] - 1s 348us/step - loss: 0.2092 - accuracy: 0.9378
  Epoch 10/10
  1875/1875 [==============================] - 1s 350us/step - loss: 0.2023 - accuracy: 0.9403
  Evaluating unquantized model.
  313/313 [==============================] - 0s 235us/step - loss: 0.1994 - accuracy: 0.9414
  Training quantized model and writing quantized.tflite and quantized.npz.
  Epoch 1/2
  1875/1875 [==============================] - 1s 410us/step - loss: 0.1963 - accuracy: 0.9426
  Epoch 2/2
  1875/1875 [==============================] - 1s 408us/step - loss: 0.1936 - accuracy: 0.9423
  ...
  Evaluating quantized model.
  313/313 [==============================] - 0s 286us/step - loss: 0.1996 - accuracy: 0.9413
  Writing mnist_test_data.npz.
  ```
  The script's output shows that the unquantized model achieved `0.9414`
  accuracy on the test data set, while the quantized model achieved `0.9413`
  accuracy on the test data set.

  This script produces `quantized.tflite` and `quantized.npz` files, which
  include all the model's weights, biases, and quantization parameters.
  `quantized.tflite` is a standard `.tflite` saved model file that can be read
  by tools like the Model Explorer. `quantized.npz` stores the weights, biases,
  and quantization parameters as NumPy saved arrays. `quantized.npz` is read by
  all the provided inference implementations.
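  For a quick look at what the training script saved, you can open
  `quantized.npz` from the Python REPL with NumPy. The exact array names inside
  the file depend on how `tensorflow_training.py` saves them, so treat the loop
  below as a way to discover them rather than a list of known keys:

  ```python
  import numpy as np

  # Open the arrays saved by tensorflow_training.py.
  arrays = np.load("quantized.npz")

  # Print every saved array's name, shape, and dtype. The key names are
  # whatever the training script chose; this loop just discovers them.
  for name in arrays.files:
      print(name, arrays[name].shape, arrays[name].dtype)
  ```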
- Run `uv run litert_inference.py` in this repository's root directory. This
  runs one test image through the reference LiteRT inference implementation.

  The script outputs many useful pieces of information:

  - A display of the input image, in this case a picture of the digit `7`.
    This display requires a terminal that supports 24-bit color, like
    gnome-terminal or iTerm2. This is the first image in the test data set
    (`#0`).
  - The input shape, `(12, 12)`, and dtype `float32`.
  - The output from the first layer of the network, with shape `(1, 18)` and
    dtype `int8`.
  - The output from the second layer of the network, with shape `(1, 10)` and
    dtype `int8`.
  - A bar chart displaying the network's final output, which is the inferred
    likelihood that the image contains each digit. The network only has two
    layers, so this is the same data as the `layer 1 output` line, reformatted
    into a bar chart.

  In this case, the digit `7` is the most likely, with a score of `93`,
  followed by the digit `3` with a score of `58`. The digit `7` is labeled as
  `actual` because it is the actual prediction generated by the neural network.
  It is also labeled as `expected` because the labeled test data confirms that
  the image actually depicts the digit `7`.
  The `litert_inference.py` script also supports a `--start_image` command line
  flag, to run inference on other images from the test data set. There is also
  a `--num_images` flag, which will run several images from the test data set,
  one at a time, and print an accuracy score. All of the provided inference
  scripts accept these command line flags. For example:

  ```
  $ uv run litert_inference.py --start_image=7 --num_images=10
  ...
  9/10 correct predictions, 90% accuracy
  ```
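  Under the hood, the reference implementation drives the LiteRT interpreter
  over `quantized.tflite`. A minimal, self-contained sketch of that flow is
  shown below. It uses `tf.lite.Interpreter` (which exposes the same
  interpreter interface), a zero-filled placeholder image, and none of
  pyrtlnet's display code, so it illustrates the idea rather than reproducing
  `litert_inference.py`:

  ```python
  import numpy as np
  import tensorflow as tf

  # Load the quantized model written by tensorflow_training.py.
  interpreter = tf.lite.Interpreter(model_path="quantized.tflite")
  interpreter.allocate_tensors()

  input_details = interpreter.get_input_details()[0]
  output_details = interpreter.get_output_details()[0]

  # Placeholder input with the model's expected shape and dtype. The real
  # script feeds an MNIST test image from mnist_test_data.npz instead.
  image = np.zeros(input_details["shape"], dtype=input_details["dtype"])

  interpreter.set_tensor(input_details["index"], image)
  interpreter.invoke()

  scores = interpreter.get_tensor(output_details["index"])
  print("predicted digit:", np.argmax(scores))
  ```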
- Run `uv run numpy_inference.py` in this repository's root directory. This
  runs one test image through the software NumPy and fxpmath inference
  implementation. This implements inference for the quantized neural network as
  a series of NumPy calls, using the fxpmath fixed-point math library.

  The tensors output by this script should exactly match the tensors output by
  `litert_inference.py`, except that each layer's outputs are transposed.
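  For intuition about the integer math involved, a single quantized dense layer
  can be written with plain NumPy as below. The function name and parameters
  are made up for this sketch, and the rescaling is done in floating point for
  clarity; `numpy_inference.py` instead uses fxpmath fixed-point arithmetic and
  the parameters stored in `quantized.npz`:

  ```python
  import numpy as np

  def quantized_dense(q_weights, q_input, bias, input_scale, input_zero_point,
                      weight_scale, output_scale, output_zero_point):
      """One int8 dense layer using the usual affine quantization scheme.

      Weights are assumed to be symmetrically quantized (zero point 0), as
      TensorFlow's int8 quantization does for weight tensors.
      """
      # Accumulate in int32 so the int8 products can't overflow.
      acc = q_weights.astype(np.int32) @ (q_input.astype(np.int32) - input_zero_point)
      acc += bias  # int32 bias, pre-scaled to input_scale * weight_scale

      # Rescale the int32 accumulator into the int8 output range.
      multiplier = (input_scale * weight_scale) / output_scale
      q_output = np.round(acc * multiplier) + output_zero_point
      return np.clip(q_output, -128, 127).astype(np.int8)
  ```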
- Run `uv run pyrtl_inference.py --verilog` in this repository's root
  directory. This runs one test image through the hardware PyRTL inference
  implementation. This implementation converts the quantized neural network
  into hardware logic, and simulates the hardware with a PyRTL `Simulation`.

  The tensors output by this script should exactly match the tensors output by
  `numpy_inference.py`.

  The `--verilog` flag makes `pyrtl_inference.py` generate a Verilog version of
  the hardware, which is written to `pyrtl_inference.v`, and a testbench,
  written to `pyrtl_inference_test.v`. The next step will use these generated
  Verilog files.
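  The general PyRTL workflow behind this step looks roughly like the toy
  example below: build a design, simulate it cycle by cycle in Python, then
  export the same design to Verilog. The design here is just a small
  accumulator, not the inference hardware, so this is standard PyRTL usage
  rather than an excerpt from `pyrtl_inference.py`:

  ```python
  import pyrtl

  pyrtl.reset_working_block()

  # A tiny design: accumulate an 8-bit input into a register each cycle.
  data_in = pyrtl.Input(bitwidth=8, name="data_in")
  total = pyrtl.Register(bitwidth=16, name="total")
  total.next <<= total + data_in
  result = pyrtl.Output(bitwidth=16, name="result")
  result <<= total

  # Simulate the design cycle by cycle, entirely in Python.
  sim = pyrtl.Simulation()
  for value in [1, 2, 3, 4]:
      sim.step({"data_in": value})
  print("final total:", sim.inspect("total"))

  # Export the same design to Verilog, as pyrtl_inference.py --verilog does
  # for the full inference hardware.
  with open("toy_accumulator.v", "w") as f:
      pyrtl.output_to_verilog(f)
  ```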
- If `verilator` is installed, run
  `verilator --trace -j 0 --binary pyrtl_inference_test.v`:

  ```
  $ verilator --trace -j 0 --binary pyrtl_inference_test.v
  ...
  - V e r i l a t i o n   R e p o r t: Verilator 5.032 2025-01-01 rev (Debian 5.032-1)
  - Verilator: Built from 0.227 MB sources in 3 modules, into 13.052 MB in 18 C++ files needing 0.022 MB
  - Verilator: Walltime 3.576 s (elab=0.014, cvt=0.598, bld=2.857); cpu 0.847 s on 32 threads; alloced 164.512 MB
  ```

  This converts the generated Verilog files to generated C++ code, and compiles
  the generated C++ code. The outputs of this process can be found in the
  `obj_dir` directory.

- If `verilator` is installed, run `obj_dir/Vpyrtl_inference_test`:

  ```
  $ obj_dir/Vpyrtl_inference_test
  ...
  time 1930 layer1 output (transposed):
  [[ 33 -48 29 58 -50 31 -87 93 9 49]]
  argmax: 7
  - pyrtl_inference_test.v:858: Verilog $finish
  - S i m u l a t i o n   R e p o r t: Verilator 5.032 2025-01-01
  - Verilator: $finish at 2ns; walltime 0.005 s; speed 329.491 ns/s
  - Verilator: cpu 0.006 s on 1 threads; alloced 249 MB
  ```

  The final `layer1 output` printed by the Verilator simulation should exactly
  match the `layer1 output` tensors output by `pyrtl_inference.py`.
See
fpga/README.md
for instructions on running pyrtlnet inference on a
Pynq Z2 FPGA.
The reference documentation has more information on how these scripts work and their main interfaces.
Try the `pyrtl_matrix.py` demo script, with `uv run pyrtl_matrix.py`, to see
how the PyRTL systolic array multiplies matrices. Also see the documentation
for `make_systolic_array`.

`pyrtl_matrix.py` also supports the `--verilog` flag, so this systolic array
simulation can be repeated with Verilator.
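For intuition about what the demo is computing, here is a small, purely
behavioral NumPy model of an output-stationary systolic array: each processing
element owns one output element and accumulates one partial product per cycle
as skewed rows of `A` and columns of `B` stream past it. This models the
dataflow only; it is not the PyRTL implementation in `pyrtl_matrix.py`:

```python
import numpy as np

def output_stationary_matmul(A, B):
    """Behavioral model of an output-stationary systolic array computing A @ B."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    acc = np.zeros((m, n), dtype=np.int64)  # one accumulator per processing element

    # Row i of A enters i cycles late from the left, and column j of B enters
    # j cycles late from the top, so PE (i, j) sees matching operands.
    for cycle in range(m + n + k - 2):
        for i in range(m):
            for j in range(n):
                t = cycle - i - j  # which element of the dot product arrives now
                if 0 <= t < k:
                    acc[i, j] += int(A[i, t]) * int(B[t, j])
    return acc

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)
assert (output_stationary_matmul(A, B) == A @ B).all()
```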
Many TODOs are scattered throughout this code base. If one speaks to you, try
addressing it! Some notable TODOs:

- Support input batching, so the various inference systems can process more
  than one image at a time.
- Extend `WireMatrix2D` to support an arbitrary number of dimensions, not just
  two. Extend the systolic array to support multiplying matrices with more
  dimensions. This is needed to support convolutional neural networks, for
  example.
- Add support for block matrix multiplications, so all neural network layers
  can share one systolic array that processes uniformly-sized blocks of inputs
  at a time. Currently, each layer creates its own systolic array that's large
  enough to process all of its input data, which is not very realistic. A small
  sketch of the idea appears after this list.
- Support arbitrary neural network architectures. The current implementation
  assumes a model with exactly two layers. Instead, we should discover the
  number of layers, and how they are connected, by analyzing the saved model.
- Add an `inference_util` to collect image input data directly from the user.
  It would be cool to draw a digit with a mouse or touch screen, and see the
  prediction generated by one of the inference implementations.
- Support more advanced neural network architectures, like convolutional neural
  networks or transformers.
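To make the block matrix multiplication TODO concrete, here is a NumPy sketch
of tiling one matrix multiplication into fixed-size blocks, so that a single
shared, fixed-size systolic array could process one block product per pass. The
block size and zero-padding strategy are arbitrary choices for illustration,
not a description of how pyrtlnet would implement it:

```python
import numpy as np

def blocked_matmul(A, B, block=4):
    """Compute A @ B by accumulating products of fixed (block x block) tiles."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2

    # Zero-pad every dimension up to a multiple of the block size.
    pad = lambda size: block * ((size + block - 1) // block)
    Ap = np.zeros((pad(m), pad(k)), dtype=np.int64)
    Bp = np.zeros((pad(k), pad(n)), dtype=np.int64)
    Ap[:m, :k] = A
    Bp[:k, :n] = B
    C = np.zeros((pad(m), pad(n)), dtype=np.int64)

    for i in range(0, pad(m), block):
        for j in range(0, pad(n), block):
            for t in range(0, pad(k), block):
                # Each tile product is one fixed-size job for a shared systolic array.
                C[i:i + block, j:j + block] += (
                    Ap[i:i + block, t:t + block] @ Bp[t:t + block, j:j + block]
                )
    return C[:m, :n]

A = np.arange(30).reshape(5, 6)
B = np.arange(42).reshape(6, 7)
assert (blocked_matmul(A, B) == A @ B).all()
```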
Contributions are welcome! Please check a few things before sending a pull request:
- Before attempting a large change, please discuss your plan with maintainers.
  Open an issue or start a discussion and describe your proposed change.
- Ensure that all tests pass, and that new features are tested. Tests are run
  with `pytest`:

  ```
  $ uv run pytest
  ============================ test session starts ============================
  ...
  collected 20 items

  tests/litert_inference_test.py .                                       [  5%]
  tests/numpy_inference_test.py .                                        [ 10%]
  tests/pyrtl_inference_test.py .                                        [ 15%]
  tests/pyrtl_matrix_test.py ..........                                  [ 65%]
  tests/tensorflow_training_test.py ..                                   [ 75%]
  tests/wire_matrix_2d_test.py .....                                     [100%]

  ============================ 20 passed in 15.75s ============================
  ```

  `pytest-xdist` is also installed, so testing can be accelerated by running
  the tests in parallel with `pytest -n auto`.

- Ensure that `ruff` lint checks pass:

  ```
  $ uv run ruff check
  All checks passed!
  ```

- Apply `ruff` automatic code formatting:

  ```
  $ uv run ruff format
  22 files left unchanged
  ```
uv pins all pip dependencies to specific versions for reproducible behavior.
These pinned dependencies must be manually updated with `uv lock --upgrade`.

When a new minor version of Python is released, update the pinned Python
version with `uv python pin $VERSION`, and the Python version in
`.readthedocs.yaml`.