NeuraLUT-Assemble (FCCM'25) extends our prior work by assembling multiple NeuraLUT neurons into tree structures with larger fan-in.
- The hardware-aware assembling strategy groups connections at the input of these tree structures, guided by our hardware-aware pruning method.
- This design achieves better trade-offs in LUT utilization, latency, and accuracy compared to the original NeuraLUT framework.
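The assembled-tree idea can be sketched in plain Python. This is an illustrative toy, not the repository's implementation: the majority function, the fan-in of 3, and the two-level depth are arbitrary choices made for the example.

```python
# Toy sketch: assembling small F-input "neurons" into a two-level tree so the
# root sees F*F inputs while every node remains one F-input truth table.

def make_lut(fan_in, fn):
    """Enumerate fn over all 2^fan_in binary input patterns -> truth table."""
    table = {}
    for i in range(2 ** fan_in):
        bits = tuple((i >> b) & 1 for b in range(fan_in))
        table[bits] = fn(bits)
    return table

def majority(bits):
    return int(sum(bits) > len(bits) / 2)

FAN_IN = 3
leaf = make_lut(FAN_IN, majority)  # one 3-input LUT
root = make_lut(FAN_IN, majority)  # combines the three leaf outputs

def tree_eval(bits9):
    """Evaluate the assembled tree on 9 inputs using only 3-input LUTs."""
    leaf_outs = tuple(leaf[tuple(bits9[i * FAN_IN:(i + 1) * FAN_IN])]
                      for i in range(FAN_IN))
    return root[leaf_outs]

# Effective fan-in of 9 realized with four small tables instead of one
# monolithic 2^9-entry table.
print(tree_eval((1, 1, 1, 0, 0, 0, 1, 0, 1)))  # -> 1
```

In hardware terms, each node's truth table maps directly onto physical LUT resources, so widening the effective fan-in this way grows cost linearly in the number of tree nodes rather than exponentially in total fan-in.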
| NeuraLUT release v1.0.0 | PolyLUT - Hardware-aware Structured Pruning |
|---|---|
We include demo notebooks in each subfolder of the datasets/ directory to help you get started quickly and to serve as exercises.
Pretrained checkpoints are also provided in the test_demo/ folder so you can skip training.
These checkpoints are not the exact ones used in the paper; they are provided for convenience and practice.
- Quantized training with sub-networks synthesized into truth tables.
- Skip connections within LUTs for better gradient flow and performance.
- Easy FPGA integration using Vivado and Verilator.
- Experiment tracking with Weights & Biases.
- Supports MNIST and Jet Substructure Tagging.
- Integration with Brevitas for quantization-aware training.
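To make the first bullet concrete, here is a minimal sketch of exhaustively enumerating a quantized sub-network into a truth table. The hand-written weights and thresholds stand in for a trained Brevitas model; they are not taken from the repository.

```python
# Because inputs are quantized to a few bits, a trained sub-network can be
# exhaustively evaluated over every input pattern and replaced by a lookup
# table. Toy 4-input sub-network with 1-bit activations and a skip path.
from itertools import product

def subnet(x):
    h0 = int(x[0] + x[1] - x[2] > 0)
    h1 = int(x[2] + x[3] - x[0] > 0)
    return int(h0 + h1 + x[3] > 1)  # skip connection from x[3]

# Exhaustive enumeration over all 2^4 quantized inputs -> LUT contents.
truth_table = {bits: subnet(bits) for bits in product((0, 1), repeat=4)}

# The table is now a drop-in replacement for the sub-network.
assert all(truth_table[b] == subnet(b) for b in truth_table)
print(len(truth_table))  # -> 16 entries = one 4-input LUT
```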
Requires Miniconda.

```shell
conda create -n neuralut python=3.12.4
conda activate neuralut
pip install torch==2.4.0 torchvision==0.19.0
```

For CUDA-specific instructions, refer to the PyTorch installation guide.

```shell
conda install -y packaging pyparsing
conda install -y docrep -c conda-forge
pip install --no-cache-dir git+https://github.com/Xilinx/brevitas.git@67be9b58c1c63d3923cac430ade2552d0db67ba5
pip install -r requirements.txt
cd NeuraLUT
pip install .
```

For experiment tracking, install Weights & Biases and log in (e.g. with `wandb.login()` from Python):

```shell
pip install wandb
```

Download and install Vivado from Xilinx. Version used in our experiments: Vivado 2020.1.

Verilator and oh-my-xilinx setup:

```shell
nix-store --realise /nix/store/q12yxbndfwibfs5jbqwcl83xsa5b0dh8-verilator-4.110
git clone https://github.com/ollycassidy13/oh-my-xilinx.git /path/to/local/dir
export OHMYXILINX=/path/to/local/dir
```

We released a dedicated ReducedLUT branch which demonstrates the L-LUT compression pipeline described in our ReducedLUT paper. This includes:
arXiv | ACM DL | Zenodo
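As a rough illustration of the compression idea (an assumed simplification, not the ReducedLUT tool itself; the chunk size and table values are made up): split an L-LUT's contents into fixed-size sub-tables and deduplicate them, letting don't-care entries match anything.

```python
# Don't cares (None) mark input patterns the training set never produces;
# they widen the matches, so more sub-tables can be merged and reused.

def compatible(a, b):
    return all(x is None or y is None or x == y for x, y in zip(a, b))

def merge(a, b):
    return [y if x is None else x for x, y in zip(a, b)]

def compress(table, chunk):
    """Deduplicate chunk-sized sub-tables; return stored subs + index map."""
    subs, index = [], []
    for i in range(0, len(table), chunk):
        part = table[i:i + chunk]
        for j, s in enumerate(subs):
            if compatible(part, s):
                subs[j] = merge(s, part)
                index.append(j)
                break
        else:
            index.append(len(subs))
            subs.append(list(part))
    return subs, index

# 16-entry table split into four chunks of four entries each.
table = [0, 1, None, 1, 0, 1, 1, 1, 0, None, 1, None, 1, 0, 0, 1]
subs, index = compress(table, 4)
print(len(subs), index)  # fewer stored sub-tables than the 4 original chunks
```

Here the first three chunks collapse into a single stored sub-table because their don't cares make them mutually compatible, so only two sub-tables are kept plus a small index.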
@inproceedings{andronic2025neuralut-assemble,
author = "Andronic, Marta and Constantinides, George A.",
title = "{NeuraLUT-Assemble: Hardware-Aware Assembling of Sub-Neural Networks for Efficient LUT Inference}",
booktitle = "{2025 IEEE 33rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)}",
pages = "208-216",
publisher = "IEEE",
year = 2025,
note = "doi: 10.1109/FCCM62733.2025.00077"
}

@inproceedings{andronic2024neuralut,
author = "Andronic, Marta and Constantinides, George A.",
title = "{NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions}",
booktitle = "{2024 34th International Conference on Field-Programmable Logic and Applications (FPL)}",
pages = "140-148",
publisher = "IEEE",
year = 2024,
note = "doi: 10.1109/FPL64840.2024.00028"
}

@article{andronic2025polylut,
author = "Andronic, Marta and Constantinides, George A.",
title = "{PolyLUT: Ultra-Low Latency Polynomial Inference With Hardware-Aware Structured Pruning}",
journal = "{IEEE Transactions on Computers}",
pages = "3181-3194",
publisher = "IEEE",
year = 2025,
note = "doi: 10.1109/TC.2025.3586311"
}

@inproceedings{reducedlut,
author = {Cassidy, Oliver and Andronic, Marta and Coward, Samuel and Constantinides, George A.},
title = "{ReducedLUT: Table Decomposition with ``Don't Care'' Conditions}",
year = {2025},
isbn = {9798400713965},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
note = "doi: 10.1145/3706628.3708823",
booktitle = {Proceedings of the 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays},
pages = {36-42},
location = {Monterey, CA, USA},
}

NeuraLUT builds on foundational work from LogicNets (Apache 2.0).
Special thanks to the open-source hardware ML community for their inspiration and contributions.