arxiv: 2605.00868 · v2 · submitted 2026-04-22 · ⚛️ physics.app-ph · cond-mat.mtrl-sci · cs.LG · cs.SY · eess.SY

Recognition: unknown

Autonomous Reliability Qualification of Ga₂O₃-based Hydrogen and Temperature Sensors via Safe Active Learning

Davi Febba , William A. Callahan , Anna Sacchi , Andriy Zakutayev


Pith reviewed 2026-05-09 22:05 UTC · model grok-4.3

classification ⚛️ physics.app-ph · cond-mat.mtrl-sci · cs.LG · cs.SY · eess.SY

keywords safe active learning · Ga2O3 sensors · reliability qualification · Gaussian process · hydrogen stress · temperature sensors · degradation modeling · autonomous experimentation


The pith

Safe Active Learning autonomously characterizes Ga2O3 sensor reliability under thermal and hydrogen stress by modeling rectification as a safety observable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a Safe Active Learning framework for autonomous reliability testing of rectifying Ga2O3-based devices exposed to coupled thermal and hydrogen stress. It uses a Gaussian-process surrogate to model how rectification evolves with time, temperature, and H2 concentration, then applies safety constraints including lower-confidence-bound checks, adaptive time windows, and a trust region around previously verified safe points. A two-phase strategy begins with conservative exploration and later relaxes targets as degradation occurs. The method is shown in simulation to expand the tested space safely and in real automated probe-station experiments to incur only one unsafe measurement in the first phase while collecting data that supports later long-horizon degradation forecasts.
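The adaptive completion-time window is only named here. The sketch below assumes one possible rule: the window runs from the current elapsed time to that time plus the longest of the last few run durations, scaled by a safety margin. The rule and the parameters `k`, `margin`, and `min_width` are hypothetical.

```python
def completion_time_window(t_now, recent_durations, k=5, margin=1.2, min_width=1.0):
    """Return (t_start, t_end) bracketing when a new run could finish.

    Hypothetical rule: start at the current elapsed time t_now and extend by the
    longest of the last k run durations, inflated by `margin`, so a
    slower-than-usual run is still covered by the safety check.
    """
    if recent_durations:
        width = max(min_width, margin * max(recent_durations[-k:]))
    else:
        width = min_width  # no history yet: fall back to a fixed width
    return t_now, t_now + width


def window_grid(t_start, t_end, n=11):
    """Evenly spaced elapsed times at which a lower confidence bound would be checked."""
    if n < 2:
        return [t_start]
    step = (t_end - t_start) / (n - 1)
    return [t_start + i * step for i in range(n)]


# Usage with illustrative durations (arbitrary time units, not from the paper).
start, end = completion_time_window(10.0, [2.0, 3.0, 2.5])
times = window_grid(start, end, n=5)
```

Taking the maximum rather than the mean of recent durations is a deliberately conservative choice, because the safety check then covers the slowest recent run.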

Core claim

SAL treats rectification as a device-physics-motivated in situ safety observable and models its evolution over elapsed time, temperature, and H2 concentration with a Gaussian-process surrogate. Safety is enforced through an adaptive completion-time window, time-window lower-confidence-bound checks, a trust region anchored to verified safe conditions, and a two-phase strategy that shifts from conservative to relaxed rectification targets as the device degrades. This produces a curated dataset that enables offline long-horizon forecasting of saturating degradation trends via a structured Gaussian-process model with a condition-dependent Kohlrausch-Williams-Watts mean and residual covariance.
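The Kohlrausch–Williams–Watts form is a stretched exponential. The sketch below covers the mean function only. It assumes relaxation from an initial value y0 toward a saturation value y_inf, with time constant τ and stretching exponent β (0 < β ≤ 1). The log-linear dependence of these parameters on temperature T and H₂ level G in `condition_dependent_kww` is a hypothetical placeholder, and the residual covariance kernel is omitted.

```python
import math


def kww_mean(t, y0, y_inf, tau, beta):
    """Stretched-exponential (KWW) relaxation from y0 at t = 0 toward y_inf."""
    if t < 0:
        raise ValueError("elapsed time must be non-negative")
    return y_inf + (y0 - y_inf) * math.exp(-((t / tau) ** beta))


def condition_dependent_kww(t, T, G, params):
    """Hypothetical parameterization: tau, beta, and y_inf vary with T (K) and G.

    log(tau) is linear in 1000/T and G; beta is clamped to (0.05, 1];
    the saturation level y_inf shifts linearly with G.
    """
    log_tau = params["a0"] + params["a1"] * (1000.0 / T) + params["a2"] * G
    tau = math.exp(log_tau)
    beta = min(1.0, max(0.05, params["b0"] + params["b1"] * G))
    y_inf = params["c0"] + params["c1"] * G
    return kww_mean(t, params["y0"], y_inf, tau, beta)


# Usage with illustrative (not fitted) parameters.
p = {"a0": 1.0, "a1": 0.5, "a2": -0.2, "b0": 0.6, "b1": 0.1, "c0": 1.5, "c1": -0.3, "y0": 4.0}
curve = [condition_dependent_kww(t, 573.0, 1.0, p) for t in (0.0, 1.0, 10.0, 100.0)]
```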

What carries the argument

Safe Active Learning framework that combines a Gaussian-process surrogate for rectification evolution, adaptive completion-time windows, lower-confidence-bound safety checks, a trust region around verified safe conditions, and a two-phase conservative-to-relaxed exploration strategy.
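The safety filter summarized above can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The following choices are all hypothetical:

- a squared-exponential kernel with fixed hyperparameters;
- comparing the window lower bound against log h, because the GP is fit on log R;
- a trust-region radius measured in lengthscale units;
- max-uncertainty selection standing in for the paper's weighted acquisition.

The function names are illustrative.

```python
import numpy as np


def rbf(A, B, lengthscales, variance=1.0):
    """Squared-exponential kernel between rows of A and B; columns are (t, T, G)."""
    diff = (A[:, None, :] - B[None, :, :]) / lengthscales
    return variance * np.exp(-0.5 * np.sum(diff**2, axis=-1))


def gp_posterior(X, y, Xs, lengthscales, noise=1e-2):
    """GP posterior mean and std at Xs, after subtracting the sample mean of y."""
    y_mean = y.mean()
    K = rbf(X, X, lengthscales) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))
    Ks = rbf(X, Xs, lengthscales)
    mu = y_mean + Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)  # prior variance is 1.0
    return mu, np.sqrt(var)


def safe_candidates(X, logR, candidates, t_window, h, beta, safe_pts, radius, lengthscales):
    """Keep (T, G) candidates passing the time-window LCB check and the trust region."""
    kept = []
    for T, G in candidates:
        Xs = np.array([[t, T, G] for t in t_window])
        mu, sd = gp_posterior(X, logR, Xs, lengthscales)
        lcb_win = np.min(mu - beta * sd)  # worst case over the completion-time window
        dist = np.min(np.linalg.norm((safe_pts - np.array([T, G])) / lengthscales[1:], axis=1))
        if lcb_win >= np.log(h) and dist <= radius:
            kept.append(((T, G), float(np.max(sd))))
    return kept


def select_next(safe):
    """Return the safe (T, G) with the largest predictive std, or None if empty."""
    if not safe:
        return None  # in SAL this is where beta would be relaxed or rescue invoked
    return max(safe, key=lambda item: item[1])[0]


# Synthetic data in normalized units (illustrative only, not from the paper).
X = np.array([[0.0, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.2], [0.5, 0.2, 0.2]])
logR = 3.0 - 2.0 * X[:, 1] - X[:, 2] - 0.5 * X[:, 0]
ls = np.array([1.0, 0.5, 0.5])
safe_pts = X[:, 1:]  # every measured (T, G) is treated as verified safe here
safe = safe_candidates(X, logR, [(0.25, 0.25), (1.0, 1.0)], [0.5, 0.6, 0.7],
                       h=2.0, beta=2.0, safe_pts=safe_pts, radius=0.5, lengthscales=ls)
print(select_next(safe))
```

Taking the minimum of μ − βσ over the whole window, rather than checking only the expected completion time, is what makes the check robust to runs that finish later than planned.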

If this is right

The approach safely enlarges the explored stress space compared with purely conservative manual testing.
The collected data directly supports structured Gaussian-process models that forecast long-time saturating degradation.
Only one unsafe measurement occurred in the initial conservative phase of the reported campaign.
The same safety-observable strategy applies to other devices where an in situ measurable proxy for safe operation can be defined.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method could be extended to incorporate multiple safety observables simultaneously for devices with several failure modes.
Offline forecasting accuracy might improve further by feeding the SAL-acquired data into physics-informed mean functions beyond the Kohlrausch-Williams-Watts form.
Similar frameworks could reduce manual oversight in high-risk characterization campaigns such as high-voltage or radiation testing.

Load-bearing premise

Rectification can be treated as a reliable in situ safety observable whose evolution under thermal and hydrogen stress is accurately captured by the Gaussian-process surrogate so that unsafe conditions are avoided.

What would settle it

A follow-up experiment in which the SAL policy selects a new stress point that produces an unpredicted rectification collapse below the safety threshold, causing device damage or multiple unsafe measurements.

Original abstract

We present a Safe Active Learning (SAL) framework for autonomous reliability characterization of rectifying Ga₂O₃-based devices under coupled thermal and hydrogen stress. SAL treats rectification as a device-physics-motivated safety observable and models its evolution over elapsed time, temperature, and H₂ concentration using a Gaussian-process surrogate. To handle condition-dependent and uncertain experiment durations, the method combines an adaptive completion-time window, time-window lower-confidence-bound safety checks, a trust region anchored to previously verified safe conditions, and a two-phase strategy that transitions from conservative safe exploration to progressively relaxed rectification targets as the device degrades. We first evaluate SAL in simulation, where it safely expands the explored region while learning the evolving rectification surface. We then demonstrate SAL experimentally on an automated high-temperature probe-station platform using a Pt/Cr₂O₃:Mg/β-Ga₂O₃ device. In the reported campaign, phase 1 incurred only one unsafe measurement associated with spurious current-voltage sweeps, while phase 2 intentionally probed lower-rectification regimes. Finally, we use the curated SAL dataset for offline long-horizon forecasting of device response at a target voltage using a structured Gaussian-process model with a condition-dependent Kohlrausch–Williams–Watts mean and a residual covariance kernel. The model captures long-time, saturating degradation trends in an auxiliary validation dataset, illustrating how safety-aware autonomous experimentation enables both conservative characterization and subsequent degradation modeling. Although demonstrated here for a rectifying Ga₂O₃ device, SAL is applicable to other systems where a measurable in situ safety observable can be defined.
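The abstract treats rectification as the safety observable but does not say how it is extracted from a sweep. A common definition is the ratio of forward to reverse current magnitude at a fixed read voltage. The sketch below assumes that definition and assumes positive bias is the forward direction. Its validity checks are hypothetical stand-ins for the paper's handling of spurious current-voltage sweeps.

```python
import math


def rectification_ratio(voltages, currents, v_read=1.0):
    """|I(+v_read)| / |I(-v_read)| from one I-V sweep, or None if the sweep looks invalid.

    Assumes positive bias is forward; currents are linearly interpolated.
    """
    pts = sorted(zip(voltages, currents))
    if len(pts) < 3 or any(not (math.isfinite(v) and math.isfinite(i)) for v, i in pts):
        return None  # too few points or non-finite readings
    if pts[0][0] > -v_read or pts[-1][0] < v_read:
        return None  # sweep does not span +/- v_read

    def interp(x):
        for (v0, i0), (v1, i1) in zip(pts, pts[1:]):
            if v0 <= x <= v1:
                return i0 if v1 == v0 else i0 + (i1 - i0) * (x - v0) / (v1 - v0)
        return None

    i_fwd, i_rev = interp(v_read), interp(-v_read)
    if i_rev == 0:
        return None  # reverse current of exactly zero: ratio undefined
    return abs(i_fwd) / abs(i_rev)


# Usage: an illustrative diode-like sweep (amps), not measured data.
R = rectification_ratio([-1.0, -0.5, 0.0, 0.5, 1.0], [-1e-6, -5e-7, 0.0, 1e-5, 1e-3])
log_R = math.log(R)  # the surrogate in the pseudocode is fit on log R
```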

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

The paper's Safe Active Learning method for Ga2O3 reliability testing keeps most runs safe in experiment but leaves the Gaussian process safety model's calibration unproven.


The core contribution is a Safe Active Learning framework that treats device rectification as an in-situ safety signal and uses it to guide exploration of temperature and hydrogen conditions on Ga2O3 rectifiers. The method adds an adaptive completion-time window, time-window lower-confidence-bound checks, a trust region around verified points, and a two-phase shift from conservative to relaxed targets as degradation proceeds. In simulation it expands the explored region while staying safe, and the single experimental campaign on an automated probe station recorded only one unsafe measurement in phase 1, which the authors attribute to a spurious current-voltage sweep. The collected data then feeds an offline structured Gaussian-process model with a Kohlrausch-Williams-Watts mean function that captures saturating long-term trends on a validation set. That combination of tailored safety pieces and a concrete hardware demonstration is what is new here. The experimental platform description and the forecasting step are the parts that feel most useful. The main weakness is that the safety claims rest on the Gaussian-process surrogate producing well-calibrated uncertainties for rectification under coupled stress, yet the paper supplies no coverage diagnostics, kernel sensitivity checks, or tests against accelerating degradation. Without those, it is difficult to judge how reliably the lower-confidence-bound filter would prevent unsafe conditions on other devices or longer horizons. Baselines and quantitative error metrics are also thin, so the performance edge over simpler sampling strategies is not fully established. This work is aimed at groups building autonomous reliability testbeds for sensors or power electronics who already have an in-situ safety observable they can measure. A reader looking for practical examples of safe exploration on real hardware would find value in the platform details and the two-phase strategy. 
The experimental demonstration is concrete enough that the paper deserves a serious referee, even though the validation of the surrogate uncertainties needs strengthening. I would send it for review and ask specifically for uncertainty calibration results and at least one baseline comparison.

Referee Report

3 major / 2 minor

Summary. The paper presents a Safe Active Learning (SAL) framework for autonomous reliability qualification of Ga₂O₃-based rectifying devices under coupled thermal and hydrogen stress. Rectification is treated as an in-situ safety observable and modeled via a Gaussian-process surrogate over time, temperature, and H₂ concentration. The method integrates an adaptive completion-time window, time-window lower-confidence-bound (LCB) safety checks, a trust region anchored to verified safe points, and a two-phase strategy that relaxes rectification targets as degradation proceeds. SAL is first tested in simulation for safe region expansion, then demonstrated experimentally on an automated high-temperature probe station with a Pt/Cr₂O₃:Mg/β-Ga₂O₃ device (one unsafe measurement in phase 1), and finally used to curate data for offline long-horizon forecasting of saturating degradation trends via a structured GP with condition-dependent Kohlrausch–Williams–Watts mean function.

Significance. If the safety mechanism proves robust, the work offers a practical advance for autonomous experimentation in harsh-environment device characterization, reducing reliance on manual oversight while enabling subsequent predictive modeling of degradation. The two-phase relaxation and integration of safety-aware data collection with structured forecasting are notable strengths. The framework's generality to other systems with definable in-situ safety observables is a positive aspect.

major comments (3)

[SAL framework description (methods) and experimental campaign] The central safety claim rests on the GP surrogate accurately modeling rectification evolution so that LCB checks and the trust region prevent unsafe conditions. However, no uncertainty calibration diagnostics (e.g., coverage probabilities, posterior predictive checks, or kernel appropriateness under accelerating degradation) are reported for the rectification GP. This is load-bearing for the LCB safety filter and the assertion that only one unsafe measurement occurred in phase 1.
[Experimental demonstration and abstract] The experimental results report only one unsafe measurement in phase 1 but provide no quantitative context: total number of measurements, comparison against non-SAL baselines (e.g., random sampling or standard active learning), error bars on any performance metrics, or full validation statistics for the surrogate. Without these, the claim that SAL 'safely expands the explored region' cannot be rigorously assessed.
[Forecasting section] The offline long-horizon forecasting uses a condition-dependent KWW-structured mean on the curated SAL dataset, yet no details are given on how the two-phase relaxation or trust-region constraints affect the training distribution, nor are cross-validation or hold-out metrics reported for the auxiliary validation dataset. This weakens support for the forecasting utility as a direct outcome of the SAL procedure.
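The referee requests coverage diagnostics, and the rebuttal proposes hold-out metrics (predictive RMSE and log-likelihood). All of these can be computed from any surrogate's predictive means and standard deviations. A minimal sketch, assuming Gaussian predictive distributions; `calibration_report` is a hypothetical helper, not part of the paper.

```python
import math
from statistics import NormalDist


def calibration_report(y_true, mu, sd, level=0.95):
    """Empirical coverage of central intervals, RMSE, and mean Gaussian NLL."""
    n = len(y_true)
    if n == 0 or not (len(mu) == len(sd) == n):
        raise ValueError("inputs must be non-empty and equal length")
    if any(s <= 0 for s in sd):
        raise ValueError("predictive standard deviations must be positive")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    inside = sum(abs(y - m) <= z * s for y, m, s in zip(y_true, mu, sd))
    rmse = math.sqrt(sum((y - m) ** 2 for y, m in zip(y_true, mu)) / n)
    nll = sum(0.5 * math.log(2 * math.pi * s * s) + (y - m) ** 2 / (2 * s * s)
              for y, m, s in zip(y_true, mu, sd)) / n
    return {"coverage": inside / n, "nominal": level, "rmse": rmse, "nll": nll}
```

For a well-calibrated surrogate, `coverage` should sit near `nominal`. Coverage well below nominal on held-out conditions would mean the lower-confidence-bound filter is overconfident.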

minor comments (2)

[Abstract] The abstract introduces 'phase 1' and 'phase 2' without a concise definition; a one-sentence clarification would improve readability for readers outside the immediate subfield.
[Methods] Notation for the trust region and adaptive completion-time window should be introduced with explicit symbols or a small diagram to avoid ambiguity when the LCB checks are described.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which highlights important aspects for strengthening the safety and validation claims in our work on Safe Active Learning for Ga2O3 sensor reliability qualification. We address each major comment point by point below, proposing targeted revisions where appropriate while defending the core contributions based on the presented framework, simulations, and experiments.


Referee: [SAL framework description (methods) and experimental campaign] The central safety claim rests on the GP surrogate accurately modeling rectification evolution so that LCB checks and the trust region prevent unsafe conditions. However, no uncertainty calibration diagnostics (e.g., coverage probabilities, posterior predictive checks, or kernel appropriateness under accelerating degradation) are reported for the rectification GP. This is load-bearing for the LCB safety filter and the assertion that only one unsafe measurement occurred in phase 1.

Authors: We agree that uncertainty calibration diagnostics would provide stronger quantitative support for the LCB safety mechanism. The manuscript introduces the rectification GP surrogate and its role in the adaptive safety checks but does not report explicit calibration metrics such as coverage probabilities or posterior predictive checks. In the revised manuscript, we will add these diagnostics in a dedicated methods subsection or appendix, including calibration plots, assessment of kernel suitability under time-dependent degradation, and coverage statistics over the experimental conditions. This will directly bolster the claim regarding the single unsafe measurement in phase 1. revision: yes
Referee: [Experimental demonstration and abstract] The experimental results report only one unsafe measurement in phase 1 but provide no quantitative context: total number of measurements, comparison against non-SAL baselines (e.g., random sampling or standard active learning), error bars on any performance metrics, or full validation statistics for the surrogate. Without these, the claim that SAL 'safely expands the explored region' cannot be rigorously assessed.

Authors: We acknowledge the need for more quantitative context in the experimental demonstration. The manuscript reports the single unsafe measurement in phase 1 but omits the total measurement count, error bars, and full surrogate validation details. We will revise the experimental section and abstract to include the total number of measurements, performance metrics with uncertainty estimates, and expanded surrogate validation statistics. Regarding non-SAL baselines, direct experimental comparisons on the same device are not feasible due to irreversible degradation; however, the paper already includes simulation-based evaluations of safe region expansion versus standard active learning, which we will expand and more prominently feature to rigorously support the safety claims. revision: partial
Referee: [Forecasting section] The offline long-horizon forecasting uses a condition-dependent KWW-structured mean on the curated SAL dataset, yet no details are given on how the two-phase relaxation or trust-region constraints affect the training distribution, nor are cross-validation or hold-out metrics reported for the auxiliary validation dataset. This weakens support for the forecasting utility as a direct outcome of the SAL procedure.

Authors: We agree that explicit linkage between the SAL curation process and forecasting performance would strengthen the narrative. The manuscript demonstrates long-horizon forecasting on the SAL-curated data using the structured GP but does not analyze the effects of two-phase relaxation and trust-region constraints on the training distribution or report cross-validation/hold-out metrics. In revision, we will add a discussion of the resulting data distribution characteristics and include quantitative hold-out validation metrics (such as predictive RMSE and log-likelihood) on the auxiliary dataset to better establish the forecasting as an outcome of the SAL procedure. revision: yes

Circularity Check

0 steps flagged

No circularity: SAL method and forecasting model defined independently of results


The paper introduces the Safe Active Learning framework by defining rectification as an in-situ safety observable, modeling its evolution via an independent Gaussian-process surrogate, and specifying safety mechanisms (adaptive time window, LCB checks, trust region, two-phase relaxation) as separate algorithmic choices. These components are not defined in terms of the target outcomes. Simulation and experimental runs apply the framework to collect data; the offline forecasting step fits a distinct structured GP (KWW mean plus residual kernel) to the collected dataset and validates on an auxiliary set, without any reduction of predictions to fitted inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked in a load-bearing way within the described derivation chain. The overall structure remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, the work relies on standard Gaussian process modeling and device-physics concepts without explicit new free parameters or invented entities; the safety observable is motivated by existing device physics rather than postulated anew.

pith-pipeline@v0.9.0 · 5630 in / 1281 out tokens · 45582 ms · 2026-05-09T22:05:01.148806+00:00 · methodology



Fig. 2 High-level pseudocode of the Safe Active Learning (SAL) algorithm. Phase 1 and Phase 2 operate under fixed iteration budgets N1 and N2, respectively.

Run experiments at seed (T, G) points
Compute rectification R from I-V data; ban invalid conditions
Initialize dataset D with (t, T, G, R)
Phase 1: Safe exploration at fixed threshold h (for i = 1 to N1)
    Fit GP model on log R(t, T, G)
    Construct adaptive completion-time window from recent durations
    Compute time-window lower bound L_win(T, G)
    Build safe set S_safe = {(T, G) : L_win ≥ h}
    Intersect with trust region around measured-safe points
    If empty: relax β; if still empty, invoke rescue
    Select (T, G) maximizing weighted acquisition inside safe set
    Execute experiment, update dataset and durations, ban invalid points
Phase 1 Rescue (if safe set collapses)
    Re-measure most recent safe condition
    Classify outcome as modeling artifact, boundary behavior, or failure
    Resume Phase 1 (remaining budget), transition to Phase 2, or terminate
Phase 2: Threshold relaxation (for j = 1 to N2)
    Update target τ_k via exponential decay
    If τ_k > 1 + ε:
        Repeat Phase 1 logic using threshold τ_k
        If safe set empty: switch to trust-region uncertainty fallback
    If τ_k ≈ 1: drop safety gating and maximize uncertainty globally
    Execute experiment and update model

...accounts for uncertain experiment durations, a trust region in (T, G) anchored to previously observed safe conditions, and a two-phase sampling schedule that start...
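The Phase 2 branch of the SAL pseudocode decays the rectification target τ_k toward 1 and drops safety gating once τ_k ≈ 1. The paper's decay form and rate are not given here. The sketch below assumes τ_k = 1 + (τ_0 − 1)·exp(−λk), with k counted from 0, a hypothetical rate λ (`decay`), and a hypothetical tolerance ε (`eps`).

```python
import math


def phase2_schedule(tau0, n_steps, decay=0.3, eps=0.05):
    """Hypothetical Phase 2 targets: tau_k decays exponentially from tau0 toward 1.

    Returns (k, tau_k, mode) tuples. 'gated' means repeat the Phase 1 safe-set
    logic with threshold tau_k; 'ungated' means drop safety gating and
    maximize uncertainty globally.
    """
    steps = []
    for k in range(n_steps):
        tau_k = 1.0 + (tau0 - 1.0) * math.exp(-decay * k)
        mode = "gated" if tau_k > 1.0 + eps else "ungated"
        steps.append((k, tau_k, mode))
    return steps


# Usage: start from an illustrative Phase 1 threshold of 10 (not from the paper).
schedule = phase2_schedule(10.0, 30)
first_ungated = next(k for k, _, mode in schedule if mode == "ungated")
```

Because the target approaches 1 only asymptotically, the tolerance ε is what decides when Phase 2 stops gating. A larger ε hands control to pure uncertainty sampling sooner.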