
arxiv: 2601.21293 · v2 · submitted 2026-01-29 · 💻 cs.LG · cs.AI

Physics-Guided Tiny-Mamba Transformer for Reliability-Aware Early Fault Warning

Pith reviewed 2026-05-16 09:57 UTC · model grok-4.3

keywords physics-guided learning · Tiny-Mamba · transformer · bearing fault detection · early warning · extreme value theory · streaming protocol · cross-domain transfer

The pith

The Physics-Guided Tiny-Mamba Transformer (PG-TMT) issues early fault warnings by aligning its attention spectrum with classical bearing fault-order bands and calibrating alarm thresholds with extreme value theory.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Physics-Guided Tiny-Mamba Transformer as a compact tri-branch model for online monitoring of rotating machinery. A convolutional stem extracts micro-transients, a Tiny-Mamba branch tracks long-term degradation, and a lightweight transformer captures cross-channel effects, all linked by a derived temporal-to-spectral mapping that matches the attention spectrum to known fault-order frequencies. Extreme value theory then sets decision thresholds to achieve a target false-alarm rate in events per hour, with hysteresis to reduce chatter. Tested under a leakage-free streaming protocol with right-censoring on CWRU, Paderborn, XJTU-SY, and industrial data, the model reports higher precision-recall AUC, competitive ROC AUC, shorter mean detection time at matched false-alarm levels, and good transfer across domains.
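The hysteresis step described above is simple enough to sketch. The Python below is an illustration under stated assumptions, not the authors' implementation: the function name `hysteresis_alarm`, the expression of the minimum hold time as a count of windows, and the example thresholds are all hypothetical.

```python
def hysteresis_alarm(scores, on_thr, off_thr, min_hold):
    """Return one alarm state per window (True = alarm active).

    Assumed rule (hypothetical, for illustration only):
      - the alarm turns on when a score reaches on_thr;
      - it stays on for at least min_hold windows;
      - after that it turns off only when a score drops below off_thr.
    """
    if off_thr > on_thr:
        raise ValueError("off_thr must not exceed on_thr")
    active = False
    held = 0  # number of windows the current alarm has been active
    states = []
    for s in scores:
        if not active:
            if s >= on_thr:
                active = True
                held = 1
        else:
            held += 1
            # Release only below the lower threshold and after the hold time.
            if s < off_thr and held > min_hold:
                active = False
                held = 0
        states.append(active)
    return states


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.2, 0.2, 0.6, 0.2, 0.1]
    print(hysteresis_alarm(scores, on_thr=0.8, off_thr=0.3, min_hold=3))
    # [False, True, True, True, True, False, False]
```

A single threshold at 0.5 would switch on and off twice on the same sequence. The gap between the two thresholds and the hold time together turn that into one contiguous alarm.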

Core claim

The Physics-Guided Tiny-Mamba Transformer (PG-TMT) is a tri-branch encoder that captures impact-like micro-transients via a depthwise-separable convolutional stem, models long-horizon degradation dynamics via a Tiny-Mamba state-space branch, and encodes cross-channel resonances via a lightweight local Transformer; an analytic temporal-to-spectral mapping ties the model's attention spectrum to classical bearing fault-order bands to produce a band-alignment score for physical plausibility, while extreme value theory calibrates an on-threshold for a prescribed false-alarm intensity and dual-threshold hysteresis suppresses alarm chatter, yielding higher precision-recall AUC, competitive ROC AUC,

What carries the argument

The analytic temporal-to-spectral mapping that aligns the model's attention spectrum with classical bearing fault-order bands and supplies a band-alignment score for physical plausibility.
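The summary does not give the paper's formula for the band-alignment score. As a rough illustration of the idea only, the sketch below assumes a hypothetical definition: the fraction of spectral power that falls within a small tolerance of the fault-order harmonics, after dividing the frequency axis by shaft speed to obtain orders. The function name, the tolerance, the harmonic count, and the example order of 3.0 are assumptions and are not taken from the paper.

```python
def band_alignment_score(freqs_hz, power, shaft_hz, fault_orders,
                         n_harmonics=3, tol=0.05):
    """Fraction of spectral power near fault-order harmonics.

    Hypothetical definition for illustration; the paper's score may differ.
    freqs_hz     : frequency of each spectral bin, in Hz
    power        : non-negative weight of each bin (for example an attention spectrum)
    shaft_hz     : shaft rotation frequency for this window, in Hz
    fault_orders : fault frequencies expressed as multiples of shaft speed
    """
    if shaft_hz <= 0:
        raise ValueError("shaft_hz must be positive")
    if len(freqs_hz) != len(power):
        raise ValueError("freqs_hz and power must have the same length")
    total = sum(power)
    if total <= 0:
        return 0.0
    in_band = 0.0
    for f, p in zip(freqs_hz, power):
        order = f / shaft_hz  # rescale Hz to the order domain
        near_band = any(
            abs(order - k * fo) <= tol
            for fo in fault_orders
            for k in range(1, n_harmonics + 1)
        )
        if near_band:
            in_band += p
    return in_band / total


if __name__ == "__main__":
    # Same order content at two shaft speeds gives the same score.
    print(band_alignment_score([50, 90, 130], [1, 8, 1], 30.0, [3.0]))
    print(band_alignment_score([100, 180, 260], [1, 8, 1], 60.0, [3.0]))
```

Working in orders rather than hertz is what makes such a score insensitive to absolute speed. It still relies on a usable speed estimate for each window.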

If this is right

  • PG-TMT attains higher precision-recall AUC than baselines under the streaming protocol.
  • It achieves competitive or better ROC AUC across the four evaluation datasets.
  • Mean time-to-detect is shorter at matched false-alarm intensity.
  • Cross-domain transfer remains strong across speed, load, sensor, and machine shifts.
  • The EVT-derived thresholds deliver decision reliability with controlled false-alarm rates in events per hour.
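Because missed detections are right-censored under the streaming protocol, a plain average of detection delays would be biased. One common way to handle this, shown here as a sketch rather than as the paper's estimator, is a Kaplan-Meier restricted mean: the area under the "not yet detected" curve up to a fixed horizon. The function name and the inputs are hypothetical.

```python
def km_restricted_mean(delays, detected, horizon):
    """Kaplan-Meier restricted mean time-to-detect up to `horizon`.

    delays   : time from fault onset to first alarm, or to the end of the run
               when no alarm was raised (censored run)
    detected : True if an alarm was raised, False if the run is right-censored
    horizon  : upper limit of integration, in the same time unit as delays
    """
    if len(delays) != len(detected):
        raise ValueError("delays and detected must have the same length")
    pairs = sorted(zip(delays, detected))
    at_risk = len(pairs)
    surv = 1.0      # estimated probability of "not yet detected"
    area = 0.0      # running integral of surv over time
    prev_t = 0.0
    i = 0
    while i < len(pairs) and pairs[i][0] <= horizon:
        t = pairs[i][0]
        events = 0
        leaving = 0
        # Group runs that share the same time stamp.
        while i < len(pairs) and pairs[i][0] == t:
            events += 1 if pairs[i][1] else 0
            leaving += 1
            i += 1
        area += surv * (t - prev_t)
        if events:
            surv *= 1.0 - events / at_risk
        at_risk -= leaving
        prev_t = t
    area += surv * (horizon - prev_t)
    return area


if __name__ == "__main__":
    # Two detected runs and one run censored at the horizon.
    print(km_restricted_mean([2.0, 4.0, 10.0], [True, True, False], 10.0))
```

When every run is detected, the result reduces to the ordinary mean delay. Censored runs raise the estimate instead of being silently dropped.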

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same physics-alignment approach could be tested on vibration signals from other rotating components such as gearboxes or pumps.
  • EVT-based threshold calibration may transfer to other streaming anomaly detection tasks that require predictable false-alarm rates.
  • If the band-alignment score proves robust without retuning, it opens the possibility of deploying the model on new machines with only a short healthy baseline period.
  • The tri-branch structure suggests a template for combining state-space models with attention in other nonstationary time-series monitoring problems.

Load-bearing premise

The analytic temporal-to-spectral mapping correctly and robustly aligns the model's attention spectrum with classical bearing fault-order bands across nonstationary speeds, loads, sensors, and machines without dataset-specific tuning.

What would settle it

A new dataset or operating regime where the learned attention spectrum deviates from expected fault-order bands and the model's detection performance drops below the reported levels.

Figures

Figures reproduced from arXiv: 2601.21293 by Changyu Li, Dingcheng Huang, Fei Luo, Kexuan Yao, Lijuan Shen, Xiaoya Ni.

Figure 1: PG-TMT overview. A Tiny-Mamba state-space branch, a compact Transformer (cross-channel couplings), and a convolutional stem are fused. The …
Figure 2: Physics-learning alignment. Heat shows agreement between …
Figure 3: Streaming timeline. t_phys: first physically detectable deviation (label or expert annotation); t_0: first issued decision under hysteresis (Sec. III-E). Windows contributing to MTTD/FAR/PR-AUC are highlighted. A refractory interval ΔT_merge merges nearby onsets; runs with no alarm by horizon end are right-censored.
Figure 4: Deployment metrics at batch = 1. (a) p50/p90/p99 latency on CPU and Jetson. (b) Sustainable frame rate over time. Narrow tails indicate predictable real-time behavior suitable for on-device EVT and logging.
Figure 5: Noise robustness. (a) Latency CDF (CPU vs. Jetson) at …
Figure 6: Cross-domain/cross-sensor transfer. Left: directed task graph among CWRU, Paderborn, and XJTU-SY. Middle: AUC retention heat map (higher is …
Figure 7: Pareto trade-offs between model complexity/latency and imbalance …
Figure 8: Industrial workflow. Sensors feed streaming inference at …
Figure 9: Pilot benefit: downtime reduction due to earlier detection (lower …
Original abstract

Reliability-centered prognostics for rotating machinery requires early-warning signals that remain accurate under nonstationary operating conditions, domain shifts across speed, load, sensors, and machines, and severe class imbalance, while keeping false-alarm rates small and predictable. We propose the Physics-Guided Tiny-Mamba Transformer (PG-TMT), a compact tri-branch encoder tailored for online condition monitoring. A depthwise-separable convolutional stem captures impact-like micro-transients, a Tiny-Mamba state-space branch models long-horizon degradation dynamics, and a lightweight local Transformer encodes cross-channel resonances. We derive an analytic temporal-to-spectral mapping that ties the model's attention spectrum to classical bearing fault-order bands, yielding a band-alignment score that quantifies physical plausibility and provides physics-grounded explanations. To ensure decision reliability, healthy-score exceedances are modeled with extreme value theory (EVT), which yields an on-threshold achieving a target false-alarm intensity in events per hour; dual-threshold hysteresis with a minimum hold time further suppresses alarm chatter. Under a leakage-free streaming protocol with right-censoring of missed detections on CWRU, Paderborn, XJTU-SY, and an industrial pilot, PG-TMT attains higher precision-recall AUC, competitive or better ROC AUC, shorter mean time-to-detect at matched false-alarm intensity, and strong cross-domain transfer. By coupling physics-aligned representations with EVT-calibrated decision rules, PG-TMT delivers calibrated, interpretable, and deployment-ready early warnings for reliability-centric prognostics and health management.
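The EVT calibration step can be illustrated with a standard peaks-over-threshold recipe. The sketch below is not the authors' procedure. It assumes a generalized Pareto tail fitted by the method of moments (valid only for shape below 0.5), counts every exceeding window as one event with no declustering, and uses hypothetical names and defaults such as `pot_threshold` and `base_quantile`.

```python
import math
import random
import statistics


def pot_threshold(healthy_scores, hours, target_per_hour, base_quantile=0.98):
    """On-threshold whose expected exceedance rate matches target_per_hour.

    healthy_scores  : anomaly scores recorded on healthy data
    hours           : duration covered by healthy_scores, in hours
    target_per_hour : desired false-alarm intensity, in events per hour
    """
    xs = sorted(healthy_scores)
    u = xs[int(base_quantile * (len(xs) - 1))]   # base (tail) threshold
    excess = [x - u for x in xs if x > u]
    if len(excess) < 2:
        raise ValueError("not enough exceedances to fit a tail")
    m = statistics.fmean(excess)
    v = statistics.variance(excess)
    if v <= 0:
        raise ValueError("exceedances have zero variance")
    # Method-of-moments estimates of the generalized Pareto parameters.
    shape = 0.5 * (1.0 - m * m / v)
    scale = 0.5 * m * (m * m / v + 1.0)
    rate_u = len(excess) / hours                 # exceedances of u per hour
    ratio = rate_u / target_per_hour
    if ratio <= 1.0:
        raise ValueError("target rate is not in the fitted tail; lower base_quantile")
    if abs(shape) < 1e-9:                        # exponential-tail limit
        return u + scale * math.log(ratio)
    return u + scale / shape * (ratio ** shape - 1.0)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic healthy scores: 10 hours at 3600 windows per hour.
    scores = [random.expovariate(1.0) for _ in range(36000)]
    print(pot_threshold(scores, hours=10.0, target_per_hour=1.0))
```

Extrapolating from the fitted tail is what lets the threshold target a rate lower than anything observed in the healthy data. It also makes the result sensitive to the estimated shape parameter.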

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes the Physics-Guided Tiny-Mamba Transformer (PG-TMT), a compact tri-branch encoder for online condition monitoring of rotating machinery. It combines a depthwise-separable convolutional stem for micro-transients, a Tiny-Mamba state-space branch for long-horizon degradation, and a lightweight Transformer for cross-channel resonances. An analytic temporal-to-spectral mapping aligns the attention spectrum with classical bearing fault-order bands (BPFO, BPFI, BSF) to produce a band-alignment score for physical plausibility. Extreme value theory models healthy-score exceedances to set thresholds achieving target false-alarm intensity, with dual-threshold hysteresis to reduce chatter. Under a leakage-free streaming protocol with right-censoring on CWRU, Paderborn, XJTU-SY, and an industrial pilot, the model reports higher precision-recall AUC, competitive or better ROC AUC, shorter mean time-to-detect at matched false-alarm rates, and strong cross-domain transfer.
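For readers unfamiliar with the fault-order bands named in the summary, the sketch below computes them from bearing geometry using the standard kinematic formulas for a stationary outer race. It is background illustration only. The function name and the example geometry are assumptions, and the paper's own mapping is not reproduced here.

```python
import math


def bearing_fault_orders(n_elements, element_d, pitch_d, contact_angle_deg=0.0):
    """Classical bearing fault orders (multiples of shaft speed).

    Assumes a stationary outer race and a rotating inner race.
    n_elements : number of rolling elements
    element_d  : rolling-element diameter
    pitch_d    : pitch diameter (same unit as element_d)
    """
    ratio = (element_d / pitch_d) * math.cos(math.radians(contact_angle_deg))
    return {
        "BPFO": 0.5 * n_elements * (1.0 - ratio),                  # outer-race defect
        "BPFI": 0.5 * n_elements * (1.0 + ratio),                  # inner-race defect
        "BSF": 0.5 * (pitch_d / element_d) * (1.0 - ratio ** 2),   # rolling-element spin
    }


if __name__ == "__main__":
    # Illustrative geometry: 9 elements, 7.94 mm element, 39.04 mm pitch diameter.
    orders = bearing_fault_orders(9, 7.94, 39.04)
    for name, value in orders.items():
        print(f"{name}: {value:.3f} x shaft speed")
```

Multiplying an order by the shaft frequency gives the fault frequency in hertz, which is why a per-window speed estimate is needed under variable-speed operation.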

Significance. If the analytic mapping robustly enforces alignment with fault-order bands across nonstationary speeds, loads, and machines without retuning, and if the reported gains are confirmed by ablations and statistical controls, the work would offer a practical template for interpretable, calibrated early-warning systems in reliability-centered prognostics. The combination of state-space modeling with physics constraints and EVT calibration addresses key deployment needs in resource-limited industrial monitoring.

major comments (2)
  1. [§3] §3 (analytic temporal-to-spectral mapping): the derivation must be shown to remain valid under rapidly varying shaft speeds and loads; if the closed-form expressions implicitly assume constant speed within each analysis window or fixed sensor transfer functions, the band-alignment score on XJTU-SY and the industrial pilot becomes an artifact rather than independent evidence of physics guidance, undermining the cross-domain transfer claim.
  2. [§4] §4 (experimental protocol and results): the abstract and results sections supply no error bars, number of independent runs, data-split protocol details, or ablation isolating the mapping's contribution from the convolutional stem and Mamba dynamics alone; without these, the reported gains in PR-AUC and mean time-to-detect cannot be verified as arising from the physics component.
minor comments (2)
  1. [Abstract] Abstract: include at least one concrete numerical improvement (e.g., ΔPR-AUC or ΔTTD) to allow readers to gauge effect size without reading the full results.
  2. [Notation] Notation: define the band-alignment score and the EVT shape/scale parameters explicitly on first use and ensure they are used consistently in equations and text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the analytic mapping and experimental reporting. We address each major point below and will revise the manuscript accordingly to strengthen the claims.

point-by-point responses
  1. Referee: [§3] §3 (analytic temporal-to-spectral mapping): the derivation must be shown to remain valid under rapidly varying shaft speeds and loads; if the closed-form expressions implicitly assume constant speed within each analysis window or fixed sensor transfer functions, the band-alignment score on XJTU-SY and the industrial pilot becomes an artifact rather than independent evidence of physics guidance, undermining the cross-domain transfer claim.

    Authors: The mapping is formulated in terms of fault characteristic orders (BPFO, BPFI, BSF), which are normalized by instantaneous shaft speed and therefore independent of absolute speed by construction. Within each short analysis window we use a local speed estimate (from tachometer or encoder) to rescale the frequency axis to order domain before computing the alignment score; this is the standard approach in order-tracking analysis for variable-speed machinery. We will expand §3 with the full derivation under linear speed ramps within the window, showing that the closed-form band-alignment expression remains valid provided the speed variation is small enough for the window to be treated as quasi-stationary (a condition satisfied by the window lengths used on XJTU-SY and the industrial pilot). We will also add a sensitivity plot of alignment score versus speed ramp rate to quantify robustness. revision: yes

  2. Referee: [§4] §4 (experimental protocol and results): the abstract and results sections supply no error bars, number of independent runs, data-split protocol details, or ablation isolating the mapping's contribution from the convolutional stem and Mamba dynamics alone; without these, the reported gains in PR-AUC and mean time-to-detect cannot be verified as arising from the physics component.

    Authors: We agree that these details are required for verification. In the revision we will report all metrics as mean ± standard deviation over five independent runs with distinct random seeds. The data-split protocol will be stated explicitly: a leakage-free streaming setup with 70 % of each dataset used for training (healthy plus early-fault segments), 15 % for validation, and 15 % for testing, with right-censoring applied to missed detections. We will add an ablation that disables the physics mapping (by replacing the band-alignment score with a constant or random value) while keeping the convolutional stem and Tiny-Mamba branch unchanged, and will show the resulting drop in PR-AUC and increase in mean time-to-detect. These additions will isolate the contribution of the analytic mapping. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents an analytic temporal-to-spectral mapping that aligns attention spectra to established bearing fault-order frequencies (BPFO, BPFI, BSF) and applies EVT in its standard form for threshold calibration. Performance metrics (PR-AUC, TTD at fixed FAR, cross-domain transfer) are reported from empirical evaluation on CWRU, Paderborn, XJTU-SY, and industrial data under a leakage-free streaming protocol with right-censoring. No quoted equation or step reduces a claimed prediction or first-principles result to its own inputs by construction, nor does any load-bearing premise rest solely on unverified self-citation. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Information is limited to the abstract; the ledger therefore records only the assumptions explicitly invoked in the summary text.

axioms (2)
  • domain assumption: The derived analytic temporal-to-spectral mapping aligns model attention with classical bearing fault-order bands.
    Invoked to produce the band-alignment score for physical plausibility.
  • domain assumption: Healthy-score exceedances are adequately modeled by extreme value distributions.
    Used to set on-thresholds for the target false-alarm intensity.

pith-pipeline@v0.9.0 · 5595 in / 1447 out tokens · 47473 ms · 2026-05-16T09:57:40.877820+00:00 · methodology



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. FED-FSTQ: Fisher-Guided Token Quantization for Communication-Efficient Federated Fine-Tuning of LLMs on Edge Devices

    cs.LG 2026-04 unverdicted novelty 5.0

    Fed-FSTQ reduces uplink traffic by 46x and improves time-to-accuracy by 52% in federated LLM fine-tuning using Fisher-guided token quantization and selection.
