pith. sign in

arxiv: 2303.02495 · v3 · submitted 2023-03-04 · 💻 cs.DC

scaleTRIM: Scalable TRuncation-Based Integer Approximate Multiplier with Linearization and Compensation

Pith reviewed 2026-05-24 09:58 UTC · model grok-4.3

classification 💻 cs.DC
keywords approximate multipliertruncationlinearizationerror compensationinteger arithmeticDNN accelerationpower delay producthardware efficiency
0
0 comments X

The pith

scaleTRIM approximates multiplication by fitting linear functions to truncated operands and adding piecewise error compensation from segment averages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes scaleTRIM, an approximate integer multiplier that first truncates each operand to h bits according to leading-one position, then replaces the full product with a linear function obtained by curve fitting, and finally applies a correction term formed by averaging the approximation error inside each of M segments of the input space. These steps convert the operation into additions and bit shifts plus a small lookup table. The design supports multiple truncation depths and compensation levels to span different accuracy-efficiency points. When compared with prior approximate multipliers, it reports a 15.2 percent reduction in mean relative error distance under an efficiency constraint and a 22.8 percent reduction in power-delay product under joint accuracy and efficiency constraints. The same circuits, when placed inside deep neural networks for image classification, produce a better overall accuracy-efficiency operating point than the reference designs.

Core claim

Multiplication of two integers can be replaced by a linear function fitted to their h-bit truncated versions together with an M-segment piecewise-constant correction obtained by averaging the residual error inside each segment; the resulting circuit uses only additions, shifts, and a lookup table and yields lower mean relative error distance and lower power-delay product than existing approximate multipliers while preserving usability inside DNN inference workloads.

What carries the argument

Truncation of operands to h bits followed by curve-fitted linearization of the product term and an M-segment piecewise-average error compensation unit realized with a lookup table.

If this is right

  • Designers can select any truncation depth and any number of compensation segments to obtain a desired accuracy-efficiency operating point without redesigning the core arithmetic.
  • The multiplier satisfies both the accuracy constraint and the efficiency constraint while improving mean relative error distance relative to prior art.
  • The multiplier satisfies both the accuracy constraint and the efficiency constraint while improving power-delay product relative to prior art.
  • When substituted into deep neural networks for image classification, the design produces a measurably better accuracy-efficiency trade-off than the compared state-of-the-art approximate multipliers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same truncation-plus-linear-fit pattern could be retargeted to other fixed-point arithmetic operations such as squaring or division by deriving new fitting coefficients and segment tables.
  • An adaptive version that chooses the number of segments or the truncation depth at run time according to operand statistics might further improve the average trade-off without changing the hardware template.
  • Because the method is parameter-free once the linear coefficients and segment table are stored, it could be instantiated for many different integer widths simply by regenerating the small lookup tables.

Load-bearing premise

The linear functions obtained by curve fitting on the truncated operands and the piecewise averages used for compensation will continue to produce the reported accuracy-efficiency numbers for the bit widths, input distributions, and DNN workloads examined in the evaluation.

What would settle it

Measuring mean relative error distance and power-delay product on the same bit widths but with input distributions or neural-network models drawn from a source materially different from those used in the paper and finding that the claimed 15.2 percent and 22.8 percent gains disappear or reverse.

Figures

Figures reproduced from arXiv: 2303.02495 by Ali Mahani, Behnam Ghavami, Ebrahim Farahmand, Hassan Ghasemzadeh, Mohammad Javad Askarizadeh, Muhammad Abdullah Hanif, Muhammad Shafique.

Figure 2
Figure 2. Figure 2: Overview of our novel design of scaleTRIM ( [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 1
Figure 1. Figure 1: Absolute Relative Error (ARED) of state-of-the-art works (a-c) [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 4
Figure 4. Figure 4: An Example of Error Value (EV) of proposed approximate [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: The hardware design of scaleTRIM constants without the use of memory. Furthermore, the value for compensating the error is selected based on added Xh and Yh value by M × 1 multiplexer. In the last step, the result of added compensated unit and arithmetic unit is bit-wise shifted with the amount resulting from adding the LODs of the inputs operand. Error-configurability: The proposed multiplier is error con… view at source ↗
Figure 6
Figure 6. Figure 6: Accuracy and Efficiency of 8-bit Approximate Multipliers [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Design space of comparison the 8-bit scaleTRIM with the state [PITH_FULL_IMAGE:figures/full_fig_p006_7.png] view at source ↗
read the original abstract

In this paper, we propose a scalable approximate multiplier design, scaleTRIM, that approximates the multiplication operation using fitted linear functions, also referred to as linearization. We show that multiplication operations can be completely replaced by low-cost addition and bit-wise shift operations by exploiting linearization. Moreover, our proposed design utilizes a lookup table (LUT)-based compensation unit as a novel error-reduction method. In essence, input operands are truncated to a reduced bit-width representation (i.e., h bits) based on their leading-one positions. Then, a curve-fitting method is employed to map the product term to a linear function. Additionally, a piecewise constant error-correction term is used to reduce the approximation error. To compute the piecewise constant, we divide the function space into M segments and average the errors within each segment. In particular, our multiplier supports various degrees of truncation and error compensation to offer a range of accuracy-efficiency trade-offs. The proposed multiplier improves the Mean Relative Error Distance (MRED) by about 15.2% while satisfying the efficiency constraint and improves the Power Delay Product (PDP) by about 22.8% while satisfying the accuracy and efficiency constraints compared to different state-of-the-art approximate multipliers. From a usability perspective, our evaluation of the proposed design for image classification using Deep Neural Networks (DNNs) demonstrates that scaleTRIM offers a better accuracy-efficiency trade-off than state-of-the-art approximate multiplier designs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes scaleTRIM, a scalable approximate integer multiplier that truncates operands to h bits based on leading-one position, replaces multiplication with low-cost linear functions (additions and shifts) obtained via curve-fitting, and applies a LUT-based piecewise constant compensation (M segments, segment-wise error averages) for error reduction. It reports ~15.2% MRED improvement under efficiency constraints and ~22.8% PDP improvement under accuracy/efficiency constraints versus prior approximate multipliers, plus improved accuracy-efficiency trade-offs when used in DNN image classification.

Significance. If the accuracy and efficiency gains prove robust to input distributions and bit-widths beyond the fitting data, the design contributes a tunable truncation-plus-linearization methodology that converts multiplication to add/shift operations with explicit compensation, offering a new point in the approximate-computing design space for energy-constrained accelerators.

major comments (3)
  1. [Abstract / design methodology] Abstract and design section: the linear functions are obtained by curve-fitting directly to the product of the truncated operands being approximated, and the piecewise constants are segment-wise averages computed on the identical error surface; this makes the reported MRED/PDP gains specific to the fitting distribution and segment boundaries rather than intrinsic, undermining the claim of general scalability across bit-widths and DNN workloads without per-application retuning.
  2. [Evaluation / results] Evaluation section: the headline 15.2% MRED and 22.8% PDP improvements are stated without error bars, statistical tests, or explicit description of how the curve-fitting was validated (e.g., hold-out sets, cross-distribution testing); the central empirical claim therefore rests on point estimates whose stability is unquantified.
  3. [DNN evaluation] DNN evaluation: while the paper shows better trade-offs on image classification, it does not report whether the same (h, M) parameters fitted on generic operands were used or whether retuning was performed per network; this directly affects the usability claim.
minor comments (2)
  1. [Design] Notation for the linear function coefficients and the exact definition of the truncation width h should be introduced with an equation in the design section for reproducibility.
  2. [Abstract / results tables] The abstract states concrete percentage improvements; the corresponding tables or figures should explicitly list the exact baseline designs and bit-widths used for each comparison.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We provide point-by-point responses to the major comments below, indicating where we agree and will make revisions.

read point-by-point responses
  1. Referee: [Abstract / design methodology] Abstract and design section: the linear functions are obtained by curve-fitting directly to the product of the truncated operands being approximated, and the piecewise constants are segment-wise averages computed on the identical error surface; this makes the reported MRED/PDP gains specific to the fitting distribution and segment boundaries rather than intrinsic, undermining the claim of general scalability across bit-widths and DNN workloads without per-application retuning.

    Authors: The curve-fitting and compensation calculations are performed exhaustively over all possible combinations of the h-bit truncated operands, which constitute the complete function space rather than samples from any particular distribution. This makes the approximation intrinsic to the truncated multiplication operation. The scalability arises from the parametric nature of h and M, allowing the same methodology to be applied to different bit-widths. We will revise the design section to clarify that the fitting uses exhaustive enumeration of the truncated operand space. revision: partial

  2. Referee: [Evaluation / results] Evaluation section: the headline 15.2% MRED and 22.8% PDP improvements are stated without error bars, statistical tests, or explicit description of how the curve-fitting was validated (e.g., hold-out sets, cross-distribution testing); the central empirical claim therefore rests on point estimates whose stability is unquantified.

    Authors: We agree that more details on the validation of the curve-fitting would strengthen the paper. Since the fitting is deterministic and based on the exact mathematical products, hold-out validation is not relevant. We will add an explicit description of the curve-fitting and compensation computation process in the revised manuscript. We will also consider including results across multiple input distributions to quantify stability. revision: yes

  3. Referee: [DNN evaluation] DNN evaluation: while the paper shows better trade-offs on image classification, it does not report whether the same (h, M) parameters fitted on generic operands were used or whether retuning was performed per network; this directly affects the usability claim.

    Authors: We will update the DNN evaluation section to clarify that the (h, M) parameters were selected from the general accuracy-efficiency characterization of the multiplier and used consistently across all DNN experiments without any per-network retuning. This supports the claim of usability as a general-purpose approximate multiplier. revision: yes

Circularity Check

0 steps flagged

No significant circularity; design uses explicit curve-fitting as stated method

full rationale

The paper explicitly describes its core technique as employing curve-fitting to obtain linear functions for truncated operands and computing piecewise averages for compensation. These are presented as design choices for an approximate multiplier, not as a first-principles derivation or prediction. Performance claims (MRED/PDP improvements) are empirical comparisons to other designs and exact multiplication, with no reduction of the central result to a self-referential fit or self-citation chain. The approach is self-contained engineering work; the fitting process is the approximation mechanism itself rather than a hidden circular step.

Axiom & Free-Parameter Ledger

3 free parameters · 1 axioms · 0 invented entities

The central claim rests on fitted linear coefficients obtained via curve-fitting and on the choice of M segments for averaging errors; these are free parameters introduced to make the approximation work. The assumption that truncation based on leading-one position preserves enough information for the linear map is a domain assumption.

free parameters (3)
  • linear function coefficients
    Obtained by curve-fitting the product term after truncation; directly determine the approximation.
  • M (number of segments)
    Controls how the error space is divided for piecewise constant compensation; chosen to reduce error.
  • h (truncation width)
    Determines how many bits are kept after leading-one detection; trades accuracy for cost.
axioms (1)
  • domain assumption Multiplication of truncated operands can be usefully approximated by a linear function of the inputs.
    Invoked when the paper states that multiplication operations can be completely replaced by low-cost addition and bit-wise shift operations via linearization.

pith-pipeline@v0.9.0 · 5822 in / 1480 out tokens · 23286 ms · 2026-05-24T09:58:50.139568+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

  1. [2]

    An Improved Logarithmic Multiplier for Energy-Efficient Neural Computing

    Mohammad Saeed Ansari, Bruce F. Cockburn, and Jie Han. “An Improved Logarithmic Multiplier for Energy-Efficient Neural Computing”. In: IEEE Trans- actions on Computers 70.4 (2021), pp. 614–625. DOI: 10.1109/TC.2020.2992113

  2. [3]

    AdaPT: Fast Emulation of Approximate DNN Accelerators in PyTorch

    Dimitrios Danopoulos et al. “AdaPT: Fast Emulation of Approximate DNN Accelerators in PyTorch”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2022), pp. 1–1. DOI: 10 . 1109 / TCAD . 2022 . 3212645

  3. [4]

    On the use of approximate adders in carry-save multiplier- accumulators

    Darjn Esposito et al. “On the use of approximate adders in carry-save multiplier- accumulators”. In: 2017 IEEE International Symposium on Circuits and Systems (ISCAS). 2017, pp. 1–4. DOI: 10.1109/ISCAS.2017.8050437

  4. [5]

    Approximate multipliers based on a novel unbiased approximate 4-2 compressor

    Bao Fang et al. “Approximate multipliers based on a novel unbiased approximate 4-2 compressor”. In: Integration 81 (2021), pp. 17–24

  5. [6]

    High performance and optimal configuration of accurate heterogeneous block-based approximate adder

    Ebrahim Farahmand et al. “High performance and optimal configuration of accurate heterogeneous block-based approximate adder”. In: arXiv preprint arXiv:2106.08800 (2021)

  6. [7]

    Error resilience analysis for systematically employing approximate computing in con- volutional neural networks

    Muhammad Abdullah Hanif, Rehan Hafiz, and Muhammad Shafique. “Error resilience analysis for systematically employing approximate computing in con- volutional neural networks”. In: 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE) . 2018, pp. 913–916. DOI: 10 . 23919 / DATE . 2018.8342139

  7. [8]

    DRUM: A Dynamic Range Unbiased Multiplier for approximate applications

    Soheil Hashemi, R. Iris Bahar, and Sherief Reda. “DRUM: A Dynamic Range Unbiased Multiplier for approximate applications”. In: 2015 IEEE/ACM Interna- tional Conference on Computer-Aided Design (ICCAD). 2015, pp. 418–425. DOI: 10.1109/ICCAD.2015.7372600

  8. [9]

    ApproxLP: Approximate Multiplication with Linearization and Iterative Error Control

    Mohsen Imani et al. “ApproxLP: Approximate Multiplication with Linearization and Iterative Error Control”. In: 2019 56th ACM/IEEE Design Automation Conference (DAC). 2019, pp. 1–6

  9. [10]

    A review, classification, and comparative evaluation of approximate arithmetic circuits

    Honglan Jiang et al. “A review, classification, and comparative evaluation of approximate arithmetic circuits”. In: ACM Journal on Emerging Technologies in Computing Systems (JETC) 13.4 (2017), pp. 1–34

  10. [11]

    Leading one detectors and leading one position detectors - An evolutionary design methodology

    K. Kunaraj and R. Seshasayanan. “Leading one detectors and leading one position detectors - An evolutionary design methodology”. In: Canadian Journal of Electrical and Computer Engineering 36.3 (2013), pp. 103–110. DOI: 10.1109/ CJECE.2013.6704691

  11. [12]

    Computer Multiplication and Division Using Binary Loga- rithms

    John N. Mitchell. “Computer Multiplication and Division Using Binary Loga- rithms”. In: IRE Transactions on Electronic Computers EC-11.4 (1962), pp. 512–

  12. [13]

    DOI: 10.1109/TEC.1962.5219391

  13. [14]

    Energy-Efficient Approximate Multiplication for Digital Signal Processing and Classification Applications

    Srinivasan Narayanamoorthy et al. “Energy-Efficient Approximate Multiplication for Digital Signal Processing and Classification Applications”. In: IEEE Transac- tions on Very Large Scale Integration (VLSI) Systems23.6 (2015), pp. 1180–1184. DOI: 10.1109/TVLSI.2014.2333366

  14. [15]

    Architectural-space exploration of approximate multipli- ers

    Semeen Rehman et al. “Architectural-space exploration of approximate multipli- ers”. In: 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE. 2016, pp. 1–8

  15. [16]

    Minimally Biased Multipliers for Approximate Integer and Floating-Point Multiplication

    Hassaan Saadat, Haseeb Bokhari, and Sri Parameswaran. “Minimally Biased Multipliers for Approximate Integer and Floating-Point Multiplication”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems37.11 (2018), pp. 2623–2635. DOI: 10.1109/TCAD.2018.2857262

  16. [17]

    REALM: Reduced-Error Approximate Log-based Integer Multiplier

    Hassaan Saadat et al. “REALM: Reduced-Error Approximate Log-based Integer Multiplier”. In: 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE) . 2020, pp. 1366–1371. DOI: 10 . 23919 / DATE48585 . 2020 . 9116315

  17. [18]

    Cross-layer approximate computing: From logic to architectures

    Muhammad Shafique et al. “Cross-layer approximate computing: From logic to architectures”. In: Proceedings of the 53rd Annual Design Automation Confer- ence. 2016, pp. 1–6

  18. [19]

    High-performance accurate and approximate multipliers for fpga-based hardware accelerators

    Salim Ullah et al. “High-performance accurate and approximate multipliers for fpga-based hardware accelerators”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41.2 (2021), pp. 211–224

  19. [20]

    LETAM: A low energy truncation-based approximate multiplier

    Shaghayegh Vahdat et al. “LETAM: A low energy truncation-based approximate multiplier”. In: Computers & Electrical Engineering 63 (2017), pp. 1–17. ISSN : 0045-7906. DOI: https://doi.org/10.1016/j.compeleceng.2017.08.019

  20. [21]

    TOSAM: An Energy-Efficient Truncation- and Rounding-Based Scalable Approximate Multiplier

    Shaghayegh Vahdat et al. “TOSAM: An Energy-Efficient Truncation- and Rounding-Based Scalable Approximate Multiplier”. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems 27.5 (2019), pp. 1161–1173. DOI: 10.1109/TVLSI.2018.2890712

  21. [22]

    Approximate computing and the quest for comput- ing efficiency

    Swagath Venkataramani et al. “Approximate computing and the quest for comput- ing efficiency”. In: 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC). IEEE. 2015, pp. 1–6

  22. [23]

    RoBA Multiplier: A Rounding-Based Approximate Multiplier for High-Speed yet Energy-Efficient Digital Signal Processing

    Reza Zendegani et al. “RoBA Multiplier: A Rounding-Based Approximate Multiplier for High-Speed yet Energy-Efficient Digital Signal Processing”. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25.2 (2017), pp. 393–401. DOI: 10.1109/TVLSI.2016.2587696

  23. [24]

    Design-Efficient Approximate Multiplication Circuits Through Partial Product Perforation

    Georgios Zervakis et al. “Design-Efficient Approximate Multiplication Circuits Through Partial Product Perforation”. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems 24.10 (2016), pp. 3105–3117. DOI: 10.1109/TVLSI. 2016.2535398