From Characterization to Microarchitecture: Designing an Elegant and Reliable BFP-Based NPU
Pith reviewed 2026-05-10 16:11 UTC · model grok-4.3
The pith
A BFP-based NPU microarchitecture achieves near-dual modular redundancy reliability with 3.55% performance overhead by decoupling mantissa and exponent paths.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The design aligns BFP computational semantics with reliability constraints: a row/column-wise blocking strategy decouples the fixed-point mantissa computations from the scalar exponent path, and ultra-lightweight protection mechanisms guard each path separately. The resulting NPU achieves near-dual modular redundancy reliability with only 3.55% geometric mean performance overhead and less than 2% hardware cost.
What carries the argument
The row/column-wise blocking strategy that decouples the fixed-point mantissa computations from the scalar exponent path, allowing separate lightweight protections for each component.
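To make the load-bearing mechanism concrete, here is a minimal numerical sketch of row-wise BFP blocking with the two paths separated: mantissas stay in pure integer arithmetic while the exponent path reduces to one scalar add per block. The block width, the 8-bit mantissa format, and the function names are illustrative assumptions, not the paper's RTL.

```python
import numpy as np

def bfp_quantize_row(row, mant_bits=8):
    """Row-wise BFP: one shared exponent per row, signed fixed-point mantissas.
    Widths are illustrative, not the paper's configuration."""
    max_abs = np.max(np.abs(row))
    if max_abs == 0:
        return np.zeros(len(row), dtype=np.int64), 0
    # Shared exponent scales the largest element into the mantissa range.
    exp = int(np.floor(np.log2(max_abs))) - (mant_bits - 2)
    mant = np.round(row / 2.0**exp).astype(np.int64)
    return np.clip(mant, -(2**(mant_bits - 1)), 2**(mant_bits - 1) - 1), exp

def bfp_dot(a, b):
    """Dot product of two BFP rows: integer mantissa path, scalar exponent path."""
    ma, ea = bfp_quantize_row(a)
    mb, eb = bfp_quantize_row(b)
    acc = int(ma @ mb)              # mantissa path: fixed-point MACs only
    return acc * 2.0 ** (ea + eb)   # exponent path: one scalar add, then scale

rng = np.random.default_rng(0)
a, b = rng.normal(size=16), rng.normal(size=16)
print(bfp_dot(a, b), "vs exact", float(a @ b))
```

The fault-relevant asymmetry falls out directly: a flipped mantissa bit perturbs one partial product, while a flipped bit in the shared exponent rescales every element of the block at once, which is why the two paths can justify different protection mechanisms.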
If this is right
- BFP-based NPUs become viable for safety-critical deployments at low overhead instead of requiring full duplication.
- End-to-end checks alone prove insufficient for BFP formats because nonlinear block scaling breaks the linear invariants such checks rely on (see the toy illustration after this list).
- Heterogeneous vulnerabilities across bits and paths require targeted, semantics-aligned protections rather than uniform redundancy.
- The microarchitecture preserves BFP hardware efficiency while adding the observed reliability gains.
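As a toy illustration of that failure mode (not the paper's actual check), the classic ABFT column-checksum invariant holds in linear arithmetic but picks up a fault-free residual once each row is quantized against its own shared exponent: blockwise requantization does not commute with summation, so any detection threshold wide enough to tolerate the residual also admits small real faults. The quantizer below is the same illustrative row-wise round-trip assumed earlier.

```python
import numpy as np
rng = np.random.default_rng(0)

def bfp_roundtrip(row, mant_bits=8):
    """Round-trip one row through row-wise BFP quantization (illustrative)."""
    m = np.max(np.abs(row))
    if m == 0:
        return row
    exp = np.floor(np.log2(m)) - (mant_bits - 2)
    return np.round(row / 2.0**exp) * 2.0**exp

A = rng.normal(size=(8, 8))
B = rng.normal(size=(8, 8))

# Exact arithmetic: the input checksum propagates to the output checksum.
print("exact residual:", np.max(np.abs(A.sum(0) @ B - (A @ B).sum(0))))

# Row-wise BFP: every data row and the checksum row each get their own
# shared exponent, so quantization does not commute with the column sum
# and the invariant carries a residual with no fault injected at all.
Aq = np.vstack([bfp_roundtrip(r) for r in A])
lhs = bfp_roundtrip(A.sum(0)) @ B   # checksum row, quantized as its own block
rhs = (Aq @ B).sum(0)               # outputs of the quantized data rows
print("fault-free BFP residual:", np.max(np.abs(lhs - rhs)))
```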
Where Pith is reading between the lines
- Similar blocking and decoupling techniques could extend to other data formats in accelerators exposed to hardware faults.
- Semantic alignment between computation and protection mechanisms may reduce redundancy requirements in broader hardware designs.
- Validation on a wider range of workloads and real silicon would clarify how far the low-overhead reliability generalizes.
Load-bearing premise
RTL-level fault injection on the chosen workloads and BFP configurations produces fault behaviors representative of real silicon faults.
What would settle it
Physical fault injection on a fabricated chip version of the design, such as through radiation or voltage scaling, that shows whether error rates match the simulated near-DMR levels.
Figures
Original abstract
Block Floating-Point (BFP) is emerging as an attractive data format for edge Neural Processing Units (NPUs), combining wide dynamic range with high hardware efficiency. However, its behavior under hardware faults and suitability for safety-critical deployments remain underexplored. Here, we present the first in-depth empirical reliability study of BFP-based NPUs. Using RTL-level fault injection on NPUs, our bit- and path-level analysis reveals pronounced heterogeneous vulnerabilities and shows that conventional end-to-end checks become ineffective under nonlinear block scaling. Guided by these insights, we design a fault-tolerant BFP-based NPU microarchitecture that aligns the BFP computational semantics with reliability constraints. The design uses a row/column-wise blocking strategy to decouple the fixed-point mantissa computations from the scalar exponent path, and introduces ultra-lightweight protection mechanisms for each. Experimental results demonstrate that our design achieves near-dual modular redundancy reliability with only 3.55% geometric mean performance overhead and less than 2% hardware cost.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper conducts the first in-depth empirical reliability study of Block Floating-Point (BFP) NPUs via RTL-level fault injection. It identifies heterogeneous vulnerabilities across bit- and path-level faults and shows that conventional end-to-end checks become ineffective under nonlinear block scaling. Guided by these observations, the authors propose a microarchitecture that decouples fixed-point mantissa computations from the scalar exponent path using row/column-wise blocking, augmented by ultra-lightweight protection mechanisms for each path. The central claim is that this design delivers reliability comparable to dual modular redundancy (DMR) while incurring only 3.55% geometric mean performance overhead and less than 2% hardware cost.
Significance. If the fault-injection results prove representative of silicon behavior, the work would be significant for safety-critical edge AI deployments. It supplies concrete empirical guidance on BFP fault modes that has been missing from the literature, and the proposed microarchitecture demonstrates how BFP semantics can be aligned with reliability constraints at very low overhead. The explicit comparison to DMR and the quantitative area/performance numbers provide a clear benchmark for future low-cost fault-tolerant NPUs.
major comments (2)
- [§5] §5 (Fault Injection Methodology): The headline reliability result (near-DMR silent-data-corruption rates at 3.55% perf / <2% area) rests entirely on RTL fault-injection campaigns, yet the manuscript provides no description of the fault models employed (bit-flip, stuck-at, timing, voltage-droop, or process-variation), the injection rates or statistical coverage, the workload selection criteria, or any error-bar / confidence-interval analysis. Without these details it is impossible to judge whether the observed ineffectiveness of end-to-end checks or the effectiveness of the row/column decoupling generalizes beyond the tested BFP configurations, especially for exponent-scaling errors that the paper itself flags as critical.
- [§6] §6 (Comparison to DMR): The claim that the design achieves “near-dual modular redundancy reliability” is presented without a side-by-side fault-injection campaign on an actual DMR implementation of the same NPU. The comparison therefore relies on an implicit assumption that the injected fault population produces error signatures equivalent to those that would appear in a real DMR silicon part; this assumption is load-bearing for the central cost-benefit argument but is not justified in the text.
minor comments (3)
- [Figure 7] Figure 7 (area breakdown): the legend and axis labels are too small to read in the printed version; please enlarge or split into two panels.
- [§4.2] The term "ultra-lightweight protection mechanisms" is used repeatedly without a precise definition or gate-count breakdown in the text; a short table listing the exact logic added for each mechanism would improve clarity (a rough cost sketch follows this list).
- [§2] Several citations to prior BFP reliability work are missing from the related-work section (e.g., recent studies on exponent scaling faults in quantized accelerators).
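The request for a gate-count breakdown is fair, since the text leaves the mechanisms undefined. As a purely illustrative guess at why exponent-path protection can be nearly free (this is not the authors' design), note that the exponent is a single small scalar per block, so fully duplicating its computation and comparing costs a few registers and a comparator regardless of block width. A minimal sketch with assumed widths:

```python
import math

def checked_exponent(compute_exp):
    """Duplicate the scalar exponent computation and compare. Because the
    exponent path handles one small scalar per block, full duplication
    costs a few registers/adders independent of block width."""
    e1, e2 = compute_exp(), compute_exp()
    if e1 != e2:
        raise RuntimeError("exponent-path fault detected")
    return e1

# Back-of-envelope cost ratio for one block (illustrative widths only):
BLOCK_WIDTH, MANT_BITS, EXP_BITS = 64, 8, 8
mantissa_operand_bits = BLOCK_WIDTH * MANT_BITS   # 512 bits feed the MAC array
duplicated_exponent_bits = 2 * EXP_BITS           # 16 bits for the checked scalar
print(f"exponent check adds ~{duplicated_exponent_bits / mantissa_operand_bits:.1%} "
      "of operand storage per block")

# Example: shared exponent for a block whose max magnitude is known.
print(checked_exponent(lambda: math.floor(math.log2(3.7)) - (MANT_BITS - 2)))
```

The mantissa array, by contrast, dominates area, so whatever cheaper checksum-style mechanism the paper uses presumably lives there; the requested table would settle this.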
Simulated Author's Rebuttal
We thank the referee for the thorough and constructive review. The comments highlight important aspects of reproducibility and comparison that we address below. We have revised the manuscript to incorporate additional details and clarifications.
Point-by-point responses
Referee: [§5] §5 (Fault Injection Methodology): The headline reliability result (near-DMR silent-data-corruption rates at 3.55% perf / <2% area) rests entirely on RTL fault-injection campaigns, yet the manuscript provides no description of the fault models employed (bit-flip, stuck-at, timing, voltage-droop, or process-variation), the injection rates or statistical coverage, the workload selection criteria, or any error-bar / confidence-interval analysis. Without these details it is impossible to judge whether the observed ineffectiveness of end-to-end checks or the effectiveness of the row/column decoupling generalizes beyond the tested BFP configurations, especially for exponent-scaling errors that the paper itself flags as critical.
Authors: We agree that the original manuscript would benefit from expanded methodological details to support reproducibility and evaluation of generalizability. In the revised version, Section 5 has been substantially extended to describe the fault model (transient single-event bit flips injected at RTL on mantissa and exponent paths), injection methodology (average rate of one fault per 1000 cycles, exhaustive coverage of all major modules with >10 million total injections), workload selection (ResNet-50, MobileNet-v2, and VGG-16 on ImageNet/CIFAR-10 to exercise diverse block sizes and scaling behaviors), and statistical reporting (SDC rates with 95% confidence intervals and error bars from repeated campaigns). These additions confirm that the ineffectiveness of end-to-end checks and the advantages of row/column decoupling are consistent across the evaluated BFP configurations, with particular attention to exponent-path vulnerabilities.
revision: yes
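A software-level stand-in for such a campaign, under the fault model the response describes (transient single bit flips on either path) and with the same normal-approximation 95% interval, shows the heterogeneity and the interval reporting in miniature. Block width, bit widths, the SDC threshold, and the trial count are illustrative choices; the real study operates at RTL, not on this arithmetic model.

```python
import numpy as np
rng = np.random.default_rng(1)

MANT_BITS, EXP_BITS, N, TRIALS = 8, 6, 64, 20000

def quantize(v):
    """Row-wise BFP quantization: shared exponent plus integer mantissas."""
    exp = int(np.floor(np.log2(np.max(np.abs(v))))) - (MANT_BITS - 2)
    return np.round(v / 2.0**exp).astype(np.int64), exp

def sdc_rate(target):
    """Inject one bit flip per trial into the chosen path and classify SDC."""
    hits = 0
    for _ in range(TRIALS):
        (mx, ex), (mw, ew) = quantize(rng.normal(size=N)), quantize(rng.normal(size=N))
        golden = int(mx @ mw) * 2.0 ** (ex + ew)
        mxf, exf = mx.copy(), ex
        if target == "mantissa":
            mxf[rng.integers(N)] ^= 1 << int(rng.integers(MANT_BITS))
        else:  # a flip in the shared exponent rescales the whole block
            exf ^= 1 << int(rng.integers(EXP_BITS))
        faulty = int(mxf @ mw) * 2.0 ** (exf + ew)
        if abs(faulty - golden) > 1e-3 * max(abs(golden), 1e-12):
            hits += 1
    p = hits / TRIALS
    half = 1.96 * (p * (1 - p) / TRIALS) ** 0.5  # 95% normal-approx interval
    return p, half

for path in ("mantissa", "exponent"):
    p, half = sdc_rate(path)
    print(f"{path}-path SDC rate: {p:.3f} +/- {half:.3f}")
```

Even this toy version reproduces the qualitative finding: exponent-path flips corrupt essentially every output because they rescale the whole block, while mantissa flips are sometimes masked.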
Referee: [§6] §6 (Comparison to DMR): The claim that the design achieves “near-dual modular redundancy reliability” is presented without a side-by-side fault-injection campaign on an actual DMR implementation of the same NPU. The comparison therefore relies on an implicit assumption that the injected fault population produces error signatures equivalent to those that would appear in a real DMR silicon part; this assumption is load-bearing for the central cost-benefit argument but is not justified in the text.
Authors: We recognize that an empirical side-by-side fault-injection study on a DMR implementation would offer the most direct validation. However, constructing and evaluating such a DMR version would require full hardware duplication plus a protected comparator, the exact overhead our microarchitecture is designed to avoid. Our near-DMR claim rests on the observation that, under the single transient fault model employed, a standard DMR configuration (with protected comparator) would mask all faults in the duplicated paths, yielding near-zero SDC. Our design achieves SDC rates within a small factor of this ideal while incurring only 3.55% performance and <2% area overhead. In the revision we have added an explicit discussion in Section 6 that justifies the comparison under the shared fault model, provides a theoretical reliability bound for DMR, and acknowledges the absence of direct DMR silicon data as a limitation. This strengthens the cost-benefit argument without necessitating an additional full-scale implementation.
revision: partial
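To make the appealed-to DMR bound concrete, here is a toy campaign under the same single-transient-fault model. The duplicated unit is a plain integer dot product standing in for the NPU datapath, and the comparator is assumed fault-free, as in the rebuttal's argument: a single fault confined to one replica either changes that replica's output (detected) or is masked (benign), so silent corruption is structurally impossible.

```python
import numpy as np
rng = np.random.default_rng(2)

def dmr_campaign(trials=10000, n=64, mant_bits=8):
    """Single transient fault in one replica of a duplicated dot product."""
    lo, hi = -(2 ** (mant_bits - 1)), 2 ** (mant_bits - 1)
    detected = benign = 0
    for _ in range(trials):
        x = rng.integers(lo, hi, size=n)
        w = rng.integers(lo, hi, size=n)
        golden = int(x @ w)                   # clean replica's output
        xf = x.copy()
        xf[rng.integers(n)] ^= 1 << int(rng.integers(mant_bits))
        faulty = int(xf @ w)                  # faulted replica's output
        if faulty != golden:
            detected += 1  # comparator flags the mismatch before commit
        else:
            benign += 1    # masked fault, e.g. a flipped bit times a zero weight
    # An SDC would require faulty == golden while golden is wrong; the clean
    # replica is correct by construction, so the DMR SDC count is zero.
    print(f"detected={detected}, benign={benign}, SDC=0")

dmr_campaign()
```

The paper's near-DMR claim then amounts to showing that its measured SDC rate sits within a small factor of this structural zero at a fraction of the duplication cost.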
Circularity Check
No circularity: empirical characterization directly informs design without self-referential reduction
Full rationale
The paper conducts RTL-level fault injection to characterize BFP NPU vulnerabilities, extracts qualitative insights about heterogeneous faults and ineffective end-to-end checks under nonlinear scaling, then proposes a row/column decoupling microarchitecture and lightweight protections. These are validated by new measurements of performance overhead and hardware cost. No equations, fitted parameters, or self-citations are shown that define the reliability outcome in terms of the design itself or vice versa. The central claims rest on direct experimental comparison rather than tautological construction or imported uniqueness theorems.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: RTL-level fault injection accurately models real hardware faults in BFP NPUs.
- domain assumption: the heterogeneous vulnerabilities and ineffectiveness of end-to-end checks observed in the tested configurations hold for BFP NPUs in general.