pith. machine review for the scientific record

arxiv: 2507.04535 · v2 · submitted 2025-07-06 · 💻 cs.AR · cs.LG · hep-ex

Recognition: unknown

da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs

Authors on Pith: no claims yet
classification 💻 cs.AR · cs.LG · hep-ex
keywords: neural networks · algorithm · FPGAs · latency · area · arithmetic · CMVM
Original abstract

Neural networks with a latency requirement on the order of microseconds, like the ones used at the CERN Large Hadron Collider, are typically deployed on FPGAs fully unrolled and pipelined. A bottleneck for the deployment of such neural networks is area utilization, which is directly related to the required constant matrix-vector multiplication (CMVM) operations. In this work, we propose an efficient algorithm for implementing CMVM operations with distributed arithmetic on FPGAs that simultaneously optimizes for area consumption and latency. The algorithm achieves resource reduction similar to state-of-the-art algorithms while being significantly faster to compute. The proposed algorithm is open-sourced and integrated into the hls4ml library, a free and open-source library for running real-time neural network inference on FPGAs. We show that the proposed algorithm can reduce on-chip resources by up to a third for realistic, highly quantized neural networks while simultaneously reducing latency, enabling the implementation of previously infeasible networks.
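
To give a rough sense of what a CMVM with distributed-arithmetic-style sharing means, the toy Python sketch below contrasts a naive constant matrix-vector multiply with an equivalent version built only from shifts and adds that reuses a common partial sum. The matrix, function names, and decomposition are hypothetical illustrations, not the da4ml algorithm or the hls4ml API.

    # Toy constant matrix-vector multiply (CMVM): y = C @ x with C = [[5, 3], [5, 7]].
    # Illustrative only; constants and decomposition are hypothetical.

    def cmvm_naive(x):
        # Four constant multiplications, one per matrix entry.
        return [5 * x[0] + 3 * x[1],
                5 * x[0] + 7 * x[1]]

    def cmvm_shared(x):
        # Same result using only shifts and adds.
        # 5*x0 is computed once and reused by both outputs, the kind of
        # common-subexpression sharing that distributed arithmetic exploits
        # to cut adder (area) cost in a fully unrolled design.
        five_x0  = (x[0] << 2) + x[0]   # 5 * x[0], shared term
        three_x1 = (x[1] << 1) + x[1]   # 3 * x[1]
        seven_x1 = (x[1] << 3) - x[1]   # 7 * x[1]
        return [five_x0 + three_x1, five_x0 + seven_x1]

    assert cmvm_naive([3, 4]) == cmvm_shared([3, 4])

In a fully unrolled, pipelined FPGA implementation, each such shared term corresponds to hardware that is instantiated once instead of per output, which is why reducing the number of distinct adders translates directly into lower area and, with a shallower adder tree, lower latency.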

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. HGQ-LUT: Fast LUT-Aware Training and Efficient Architectures for DNN Inference

cs.AR · 2026-04 · unverdicted · novelty 6.0

    HGQ-LUT delivers a practical LUT-aware training framework with new tensor-based layers, heterogeneous quantization, and a resource surrogate that automates accuracy-efficiency trade-offs for FPGA DNN inference.