ITHICA: Intra-Thread Instruction Checking Approach for Defect-Induced Silent Data Corruptions
Pith reviewed 2026-05-19 19:49 UTC · model grok-4.3
pith:ILNMSBVX
The pith
Intra-thread instruction duplication detects 39% more defective servers than native checks by catching inconsistent, defect-induced errors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ITHICA transforms arbitrary programs into tests for defect-induced silent data corruptions by inserting intra-thread, instruction-level error checks that primarily use instruction duplication and output comparison. The central insight is that the most pernicious defects cause inconsistent errors: two executions of the same instruction within the same thread, given the same inputs, can produce different architectural outputs depending on the execution context in which they run. This enables identification of affected instructions upon error detection. When applied to industrial hyperscaler test programs, datacenter workloads, and common libraries and run on over 3,000 CPU servers, the ITHICA error checks detect 39% more defective servers than the native checks within the tests derived from the baseline programs.
What carries the argument
Intra-thread instruction duplication and output comparison that exploits inconsistent errors to turn programs into tests and flag affected instructions.
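A minimal Python sketch of the duplicate-and-compare idea under a toy fault model. SimulatedCore, DupChecker, and the bit-flip defect are hypothetical illustrations, not ITHICA's implementation: "execution context" is reduced to a running instruction counter, and ITHICA's insertion of checks into real programs is not modeled. The sketch shows why the premise matters: an inconsistent defect makes the two copies disagree and names the affected instruction, while a consistent defect corrupts both copies identically and slips through.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


# Hypothetical toy model: "execution context" is just an instruction counter.
@dataclass
class SimulatedCore:
    defective_op: str = ""    # opcode affected by the simulated defect
    consistent: bool = False  # True: corrupt every execution; False: only some contexts
    executed: int = 0         # instructions executed so far

    def execute(self, name: str, fn: Callable[..., int], *args: int) -> int:
        self.executed += 1
        result = fn(*args)
        # Inconsistent defect: flip bit 0 only on even-numbered executions.
        # Consistent defect: flip bit 0 on every execution.
        if name == self.defective_op and (self.consistent or self.executed % 2 == 0):
            result ^= 1
        return result


@dataclass
class DupChecker:
    core: SimulatedCore
    mismatches: List[Tuple[str, Tuple[int, ...], int, int]] = field(default_factory=list)

    def run(self, name: str, fn: Callable[..., int], *args: int) -> int:
        primary = self.core.execute(name, fn, *args)
        shadow = self.core.execute(name, fn, *args)  # same inputs, different context
        if primary != shadow:
            # Record the affected instruction, its inputs, and both outputs.
            self.mismatches.append((name, args, primary, shadow))
        return primary


def add(a: int, b: int) -> int:
    return a + b


def mul(a: int, b: int) -> int:
    return a * b


# Inconsistent defect: duplication catches it and names the affected opcode.
chk = DupChecker(SimulatedCore(defective_op="mul"))
x = chk.run("add", add, 3, 4)
y = chk.run("mul", mul, x, 6)
print(chk.mismatches)  # [('mul', (7, 6), 42, 43)]

# Consistent defect: both copies are wrong the same way, so no mismatch.
chk2 = DupChecker(SimulatedCore(defective_op="mul", consistent=True))
z = chk2.run("mul", mul, 7, 6)
print(chk2.mismatches, z)  # [] 43
```

The duplicate runs immediately after the primary here only for simplicity; any scheme that places the two copies in different contexts would serve the same illustrative purpose.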
If this is right
- In tests derived from the baseline industrial programs, ITHICA checks detect 39% more defective servers than the tests' native checks.
- Datacenter workloads and common libraries can be turned into functional tests for defect-induced errors.
- Affected instructions are identified when an error is detected during test execution.
- New observations about defect behavior emerge that differ from conclusions in prior hyperscaler fleet studies.
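One plausible reading of "39% more defective servers" is a ratio of counts: (servers flagged by ITHICA checks - servers flagged by native checks) / servers flagged by native checks. The sketch below assumes that reading; the server IDs and counts are hypothetical, not data from the paper. It also shows that the same net figure is compatible with different overlaps between the two sets, which the headline number alone does not reveal.

```python
def detection_gain(ithica: set[str], native: set[str]) -> float:
    """Relative increase in servers flagged by ITHICA checks over native checks."""
    if not native:
        raise ValueError("native checks flagged no servers; gain is undefined")
    return (len(ithica) - len(native)) / len(native)


# Hypothetical server IDs, not data from the paper.
native = {f"srv{i}" for i in range(100)}      # 100 servers flagged by native checks
ithica = {f"srv{i}" for i in range(20, 159)}  # 139 servers flagged by ITHICA checks

print(round(detection_gain(ithica, native), 2))    # 0.39
print(len(ithica - native), len(native - ithica))  # 59 20
```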
Where Pith is reading between the lines
- The same duplication idea could be tried on GPUs or other accelerators where execution context might also trigger inconsistent faults.
- Automated pipelines that insert these checks could become routine for screening new hardware batches before deployment.
- Defect models used in reliability analysis may need to treat error behavior as context-dependent rather than fixed.
Load-bearing premise
The most pernicious defects cause inconsistent errors such that two executions of the same instruction within the same thread, given the same inputs, can produce different architectural outputs depending on the execution context.
What would settle it
Running duplicated instructions from an ITHICA test on a server already known to produce silent data corruptions and checking whether the two executions with identical inputs always yield identical outputs; consistent outputs would undermine the inconsistent-error premise.
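The proposed experiment can be sketched as a repeated-trial loop: execute the duplicated pair many times on the suspect server and measure how often the two outputs disagree. run_pair, healthy_pair, suspect_pair, and the 30% corruption probability are hypothetical stand-ins; on real hardware, run_pair would execute the duplicated instructions from an ITHICA test.

```python
import random
from typing import Callable, Tuple


def inconsistency_rate(run_pair: Callable[[], Tuple[int, int]], trials: int) -> float:
    """Fraction of trials in which the two duplicated executions disagreed."""
    mismatches = 0
    for _ in range(trials):
        primary, shadow = run_pair()
        if primary != shadow:
            mismatches += 1
    return mismatches / trials


def healthy_pair() -> Tuple[int, int]:
    r = 7 * 6
    return r, r


rng = random.Random(0)  # fixed seed so the simulation is repeatable


def suspect_pair() -> Tuple[int, int]:
    # Hypothetical defect: the shadow copy is corrupted in about 30% of contexts.
    r = 7 * 6
    shadow = r ^ 1 if rng.random() < 0.3 else r
    return r, shadow


print(inconsistency_rate(healthy_pair, 1000))  # 0.0
rate = inconsistency_rate(suspect_pair, 1000)
print(rate)
```

A rate of exactly zero across many trials on a server known to produce SDCs would count against the inconsistent-error premise for that server. A nonzero rate is consistent with the premise but, as the referee report points out, does not by itself rule out other sources of nondeterminism.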
Original abstract
Hyperscaler reports of silent data corruptions (SDCs), presumed to be caused by silicon manufacturing defects, have motivated the development of functional tests for detecting defective CPUs. We present ITHICA, an approach for automatically generating functional tests for defect-induced errors from arbitrary programs by inserting intra-thread, instruction-level error checks, primarily leveraging instruction duplication and output comparison. Our key insight is that the most pernicious defects cause inconsistent errors: two executions of the same instruction within the same thread, given the same inputs, can produce different architectural outputs depending on the execution context in which they run. By exploiting this insight, ITHICA enables arbitrary programs to serve as tests and identifies affected instructions upon error detections. We use ITHICA to transform industrial hyperscaler test programs (our baseline), datacenter workloads, and common libraries into functional tests, and evaluate them on over 3,000 CPU servers. ITHICA error checks detect 39% more defective servers than native checks within the ITHICA tests derived from our baseline programs, and enable novel findings on defect behavior that challenge conclusions drawn by prior hyperscaler fleet studies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ITHICA, an approach to automatically convert arbitrary programs into functional tests for defect-induced silent data corruptions (SDCs) by inserting intra-thread instruction-level checks, primarily via duplication of instructions and comparison of their architectural outputs. The core insight is that the most pernicious manufacturing defects produce inconsistent errors, such that the same instruction executed twice within the same thread on identical inputs can yield different outputs depending on execution context. The method is applied to industrial baseline test programs, datacenter workloads, and common libraries; these transformed tests are run on over 3,000 CPU servers. The evaluation reports that ITHICA checks detect 39% more defective servers than the native checks already present in the baseline-derived tests and yields new observations on defect behavior that challenge prior hyperscaler fleet studies.
Significance. If the attribution of observed inconsistencies to manufacturing defects is substantiated, the work would offer a practical, low-overhead way to leverage existing production programs for defect screening at hyperscale, potentially improving SDC mitigation and prompting re-examination of earlier fleet-study conclusions. The scale of the real-hardware deployment (thousands of servers) constitutes a concrete strength and supports reproducibility of the detection-rate measurements.
major comments (2)
- [Abstract] Abstract and evaluation description: the 39% improvement in defective-server detections is presented as a central quantitative result, yet the manuscript provides no independent ground truth (physical failure analysis, controlled fault injection, or orthogonal detection method) to confirm that the additional inconsistencies are caused by permanent manufacturing defects rather than transient faults, environmental variation, or microarchitectural nondeterminism. This attribution is load-bearing for both the percentage claim and the challenge to prior studies.
- [Evaluation] Methods and evaluation sections: the assumption that defects produce context-dependent inconsistent outputs for identical instructions and inputs is used to justify turning arbitrary programs into tests via duplication/comparison, but no validation experiments or controls are described that would rule out other sources of intra-thread output variation. Without such evidence the extra detections cannot be unambiguously credited to defects.
minor comments (2)
- [Abstract] Abstract: the phrase 'over 3,000 CPU servers' should be replaced by the exact count and a brief statement of selection criteria.
- [Throughout] Notation: ensure consistent use of 'SDC' after its first definition and clarify whether 'native checks' refers to existing hardware mechanisms or to the baseline program's own assertions.
Simulated Author's Rebuttal
We thank the referee for their constructive comments and the opportunity to clarify aspects of our work. We address each major comment below and have revised the manuscript where feasible to strengthen the attribution of results to manufacturing defects.
- Referee: [Abstract] Abstract and evaluation description: the 39% improvement in defective-server detections is presented as a central quantitative result, yet the manuscript provides no independent ground truth (physical failure analysis, controlled fault injection, or orthogonal detection method) to confirm that the additional inconsistencies are caused by permanent manufacturing defects rather than transient faults, environmental variation, or microarchitectural nondeterminism. This attribution is load-bearing for both the percentage claim and the challenge to prior studies.
Authors: We acknowledge that direct ground truth such as physical failure analysis would provide stronger confirmation. However, such analysis is impractical at the scale of over 3,000 production servers due to cost, time, and the need to maintain fleet availability. We instead rely on the repeatability of inconsistencies across repeated executions on the same servers and the context-dependent error pattern, which aligns with known defect behaviors cited in the paper. Transient faults and environmental factors are mitigated by our experimental design of multiple runs per test. We will add a dedicated paragraph in the evaluation section discussing alternative explanations and why the observed patterns are most consistent with permanent defects. revision: partial
- Referee: [Evaluation] Methods and evaluation sections: the assumption that defects produce context-dependent inconsistent outputs for identical instructions and inputs is used to justify turning arbitrary programs into tests via duplication/comparison, but no validation experiments or controls are described that would rule out other sources of intra-thread output variation. Without such evidence the extra detections cannot be unambiguously credited to defects.
Authors: The assumption draws from established CPU defect literature on intermittent and context-sensitive errors, which we reference. To strengthen this, we will include new control experiments in the revised evaluation: running the duplicated instruction sequences on a set of known-good servers to quantify baseline variation from microarchitectural sources, and reporting that detected inconsistencies are persistent rather than sporadic. This supports crediting the additional detections to defects while acknowledging that complete isolation of all nondeterministic sources remains challenging without hardware-level instrumentation. revision: yes
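The promised control can be sketched in two steps: classify each server by how many of several independent runs showed a duplication mismatch, then measure how often known-good servers would be flagged as persistent. The run logs, server names, and the MIN_RUNS threshold are hypothetical; the rebuttal does not specify how "persistent" is defined.

```python
from typing import Dict, List


def classify_server(mismatch_per_run: List[bool], min_runs: int) -> str:
    """Label a server by how many independent runs showed a duplication mismatch."""
    hits = sum(mismatch_per_run)
    if hits == 0:
        return "clean"
    return "persistent" if hits >= min_runs else "sporadic"


def control_flag_rate(known_good: Dict[str, List[bool]], min_runs: int) -> float:
    """Fraction of known-good servers that would (wrongly) be flagged as persistent."""
    flagged = [s for s, runs in known_good.items()
               if classify_server(runs, min_runs) == "persistent"]
    return len(flagged) / len(known_good)


# Hypothetical per-run logs (True = at least one mismatch during that run).
known_good = {
    "good-1": [False] * 5,
    "good-2": [False, True, False, False, False],
}
suspect = {"sus-1": [True, True, False, True, True]}
MIN_RUNS = 3  # hypothetical persistence threshold

print(control_flag_rate(known_good, MIN_RUNS))  # 0.0
print({s: classify_server(r, MIN_RUNS) for s, r in {**known_good, **suspect}.items()})
# {'good-1': 'clean', 'good-2': 'sporadic', 'sus-1': 'persistent'}
```

A nonzero control rate would estimate how many "persistent" detections on suspect servers could come from sources other than defects.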
- Not resolved by the rebuttal: independent physical failure analysis or controlled fault injection at hyperscale to provide definitive ground truth for all detected servers.
Circularity Check
No significant circularity; empirical hardware results independent of self-referential inputs
The paper presents ITHICA as a method to generate tests from arbitrary programs by exploiting an assumed key insight about defect-induced inconsistent errors. The central quantitative claim (39% more detections) is obtained by executing the generated tests on over 3,000 real CPU servers and comparing against native checks within the same tests. No equations, fitted parameters, or derived predictions are described that would reduce the reported detection improvement to a quantity defined by the paper's own inputs or by prior self-citations. The evaluation relies on external hardware measurements, which satisfies the criterion for a self-contained result. The minor circularity score of 2 reflects the presence of an unverified modeling assumption, with no reduction of outputs to inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the most pernicious defects cause inconsistent errors, where the same instruction with the same inputs produces different outputs depending on execution context.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: reality_from_one_distinction
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: "Our key insight... the most pernicious defects cause inconsistent errors: two executions of the same instruction within the same thread, given the same inputs, can produce different architectural outputs depending on the execution context"
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: "ITHICA error checks detect 39% more defective servers than native checks"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.