pith. machine review for the scientific record

stat.AP

Applications

Biology, Education, Epidemiology, Engineering, Environmental Sciences, Medical, Physical Sciences, Quality Control, Social Sciences

stat.AP 2026-05-13 1 theorem

Ensemble models forecast daily tree water use from weather data

An ensemble prediction method for forecasting sap flux density and water-use in temperate trees

Additive model ensemble tested across nine species and three seasons enables irrigation planning under climate stress.

Abstract:
Efficient irrigation management is crucial to agriculture, forestry and horticulture, especially under climate change. Developments in novel sensors and Internet of Things technology provide an opportunity to carry out real-time monitoring of tree sap flux density, which, when coupled with advanced modelling techniques, enables online prediction of tree water-use suitable for irrigation planning. This manuscript proposes one such pipeline that integrates tree sap flow sensors, weather station sensors, and statistical models to predict tree daily water-use. In particular, an ensemble prediction approach based on additive models has been developed, using weather data as the main predictors of sap flux density. The method simultaneously considers the non-linear relationships and interactions between sap flux density and its environmental drivers, as well as the variability among individual trees over different growing seasons. Using field data collected on nine species of trees over the 2022, 2023 and 2024 growing seasons, this manuscript demonstrates the ability of the proposed ensemble prediction method to produce reliable daily water-use forecasts. The challenge of predicting tree water-use under climate stress, such as heatwaves, and the impact of tree sizes on prediction have also been discussed. Despite the complexity of the problem, the proposed method provides a general framework which can be used in a variety of settings, from commercial tree growers to conservation work. The model can be integrated into an online monitoring platform, assisting real-time decision making on irrigation management.
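To make the modelling step concrete, here is a minimal sketch of the kind of additive-model ensemble the abstract describes: one spline-additive model per tree-season, with predictions averaged across members. All data, covariates (radiation, vapour-pressure deficit, temperature) and tuning choices below are hypothetical illustrations, not the authors' implementation.

```python
# Illustrative sketch only (not the authors' code): one additive spline model per
# tree-season, averaged into an ensemble forecast of sap flux density.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def fit_additive_model(X, y):
    # Cubic-spline basis per weather covariate, so the fitted surface is additive.
    return make_pipeline(SplineTransformer(degree=3, n_knots=6), Ridge(alpha=1.0)).fit(X, y)

# Hypothetical data: rows are days, columns are weather drivers
# (e.g. solar radiation, vapour-pressure deficit, air temperature).
datasets = []  # one (X, y) pair per tree and growing season
for _ in range(9):
    X = rng.normal(size=(120, 3))
    y = 2.0 + np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=120)
    datasets.append((X, y))

models = [fit_additive_model(X, y) for X, y in datasets]

X_new = rng.normal(size=(7, 3))                      # next week's forecast weather
member_preds = np.stack([m.predict(X_new) for m in models])
ensemble_forecast = member_preds.mean(axis=0)        # point forecast of daily water-use driver
spread = member_preds.std(axis=0)                    # between-tree / between-season variability
print(ensemble_forecast, spread)
```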
stat.AP 2026-05-13 Recognition

Corrected audits flag discrimination in every Illinois insurer

Fairness Testing for Algorithmic Pricing

Standard errors fail for deterministic pricing; new variance formulas show minority zip codes pay $34-$158 more than comparable-risk white zip codes.

Abstract:
Algorithmic systems now set prices across auto insurance, credit, and lending markets, and regulators increasingly require firms to demonstrate that these systems do not discriminate against protected groups. The standard audit regresses pricing output on a protected attribute and legitimate rating factors, then tests the resulting coefficient using ordinary least squares standard errors. We show that this approach is structurally invalid. Pricing algorithms are usually deterministic, so residuals reflect approximation error rather than sampling variability, rendering classical standard errors invalid in both direction and magnitude. We derive correct asymptotic variance estimators for OLS and GLM audit regressions and the correct cross-covariance formula for proxy discrimination testing. Applied to quoted premiums from 34 Illinois auto insurers, every insurer fails the conditional demographic parity test, with minority zip codes paying $34-$158 more per year than comparable-risk white zip codes. The standard proxy discrimination formula flags zero insurers. However, our corrected formula identifies all 34 as statistically significant, of which 16 exceed the substantive threshold. Our framework provides statistically valid audit tools for any deterministic algorithmic system subject to regression-based fairness testing.
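For readers unfamiliar with the audit being criticized, a minimal sketch of the standard regression audit is below. All variable names and the pricing rule are hypothetical, and the paper's corrected variance estimators are deliberately not reproduced; the sketch only shows the regression whose classical OLS standard errors the authors argue are invalid when the pricing output is deterministic.

```python
# Sketch of the standard audit regression described in the abstract (illustrative only).
# The paper's corrected asymptotic variance estimators are NOT implemented here.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5_000
risk_factors = rng.normal(size=(n, 4))            # legitimate rating factors
minority_share = rng.uniform(0, 1, size=n)        # protected-attribute proxy (e.g. zip-code share)

# A deterministic pricing rule: premium is an exact function of its inputs, and it is
# not exactly linear, so audit residuals are approximation error rather than sampling noise.
premium = (800 + risk_factors @ np.array([120, 60, -40, 25])
           + 15 * risk_factors[:, 0] ** 2 + 90 * minority_share)

X = sm.add_constant(np.column_stack([minority_share, risk_factors]))
fit = sm.OLS(premium, X).fit()
# Coefficient on the protected attribute and its *classical* OLS standard error --
# the quantity whose validity the paper challenges for deterministic algorithms.
print(fit.params[1], fit.bse[1])
```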
stat.AP 2026-05-13 Recognition

Balanced designs give exact ANOVA estimators for dose-response precision

Statistical evaluation of measurement precision in linear dose-response relationships via interlaboratory studies

Closed-form decomposition separates repeatability from between-lab variance and tests whether labs differ in baseline or in slope.

Abstract:
This paper proposes a framework for evaluating the statistical precision of measurement methods from interlaboratory studies where the outcome is a dose-response relationship summarized by a regression line. For such measurement methods, where a linear mixed-effects model is applied that allows laboratories to differ in both baseline level and dose-response slope, we define precision evaluation metrics specified in ISO 5725, repeatability and between-laboratory variances. These are method-level precision metrics, and the latter are constructed as design-averaged dose-specific between-laboratory variances over the dose levels and the participating laboratories. For fully balanced designs with common dose levels and equal replication, we obtain an exact decomposition of the total sum of squares, closed-form analysis of variance (ANOVA) estimators of the precision variances, and three associated $F$-tests targeting (i) the overall dose-response trend, (ii) homogeneity of intercepts, and (iii) homogeneity of slopes across laboratories. This formulation enables precision to be quantified and estimated directly and supports an evaluation of whether between-laboratory discrepancies are caused primarily by baseline shifts or by differences in sensitivity, in contrast to fixed-effect comparisons that only detect the presence of differences. Furthermore, we analyze data obtained from an interlaboratory study on observations in bronchoalveolar lavage fluid from experiments involving the intratracheal administration of nanomaterials to rats, using the proposed method as a case study.
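In generic notation (not necessarily the paper's, and assuming independent random effects for simplicity), the random-intercept, random-slope model described above can be written as

$$ y_{ijk} = (\beta_0 + a_i) + (\beta_1 + b_i)\,x_j + \varepsilon_{ijk}, \qquad a_i \sim N(0,\sigma_a^2), \quad b_i \sim N(0,\sigma_b^2), \quad \varepsilon_{ijk} \sim N(0,\sigma_r^2), $$

where i indexes laboratories, j dose levels and k replicates; $\sigma_r^2$ is the repeatability variance, and the dose-specific between-laboratory variance $\sigma_a^2 + x_j^2\,\sigma_b^2$ is averaged over the design's dose levels to give the method-level between-laboratory metric described in the abstract.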
stat.AP 2026-05-12 2 theorems

Prediction markets lag behind statistical models for flu and measles

Prediction Markets Underperform Simple Baselines For Infectious Disease Forecasting

Market prices for US hospitalizations and case counts are beaten by expert ensembles and basic baselines, with no gain from combining them.

Abstract:
Prediction markets (e.g., Polymarket, Kalshi) allow participants to bet on future events, producing real-time forecasts based on collective judgment. In domains such as elections and finance, markets have been effective at aggregating information, often rivaling or outperforming expert forecasters or polls. Whether this performance extends to infectious disease dynamics is unclear. Participants are self-selected and typically lack epidemiological expertise. However, markets can respond in real time to emerging news and unstructured signals in ways that standard forecasting pipelines cannot. Also, substantial financial stakes encourage participants to make an effort to be accurate. We evaluate Polymarket forecasts during 2025 and 2026 for two settings: weekly cumulative influenza hospitalizations in the US, which have an established expert-curated forecasting ensemble (CDC FluSight), and monthly measles cases, which do not. Across both settings, prediction markets fail to outperform standard benchmarks. For influenza, markets are competitive with low-performing individual FluSight models but are dominated by the FluSight ensemble: even when we combine market forecasts with the ensemble, the best combination puts zero weight on the markets. For measles, markets are outperformed by simple statistical baselines. We diagnose two sources of market inefficiency: placement of probability mass on impossible outcomes (e.g., decreasing values in cumulative forecasts) and low trading volume. These results suggest that current prediction markets are not reliable forecasters of infectious disease dynamics on their own or useful as complementary features for existing forecasting systems.
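The "zero weight on the markets" finding concerns an optimal convex combination of forecasts. A toy sketch of such a weight search is below, using simulated binary events and a log-score criterion; the paper's actual targets, horizons and scoring rules differ.

```python
# Toy sketch: find the convex-combination weight on market forecasts that minimizes
# log loss against realized outcomes (binary events for simplicity; illustrative only).
import numpy as np

rng = np.random.default_rng(2)
n = 200
truth = rng.integers(0, 2, size=n)                                  # realized outcomes
ensemble_p = np.clip(truth * 0.7 + 0.15 + rng.normal(0, 0.05, n), 0.01, 0.99)
market_p = np.clip(0.5 + rng.normal(0, 0.2, n), 0.01, 0.99)         # noisier market prices

def log_loss(p):
    return -np.mean(truth * np.log(p) + (1 - truth) * np.log(1 - p))

weights = np.linspace(0, 1, 101)
scores = [log_loss(w * market_p + (1 - w) * ensemble_p) for w in weights]
best_w = weights[int(np.argmin(scores))]
print(best_w)   # near 0 when the market adds no information beyond the ensemble
```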
stat.AP 2026-05-12 1 theorem

Consensus SEIR trajectories obtained via constrained Fréchet mean

Estimating Consensus Epidemic Trajectories via a Constrained Power Fréchet Mean with Functional Registration

Exposed-infectious curve pairs are averaged under differential-equation constraints so that full dynamics and parameters can be recovered.

Abstract:
We propose a method for summarizing multiple solutions to SEIR-type compartmental models on a functional space by computing a constrained power Fr\'echet mean with functional registration to obtain consensus epidemic trajectories with partial mechanistic interpretability. In our method, we regard the pairs of exposed and infectious compartments as objects in a Hilbert space, and the consensus trajectory is defined as the solution to a constrained optimization problem. Differential equation constraints and population constraints are incorporated in the optimization to preserve a partially mechanistic interpretation regarding the infectious compartment. The full dynamics with additional susceptible and removed compartments can then be recovered from the estimated trajectories and parameters. We develop an efficient block-optimization algorithm based on functional data analysis and illustrate the method using simulated and literature-derived epidemiological parameters for COVID-19 in the early phase of the pandemic that began in 2020. The proposed approach provides a generalized trajectory-summarization framework that includes mean- and median-type estimators on a functional space and holds potential for model averaging and ensemble forecasting in infectious disease modeling.
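In generic notation, a power Fréchet mean of trajectories $f_1, \dots, f_n$ in a metric space $(H, d)$ is

$$ \hat f \;=\; \underset{f \in \mathcal{C}}{\arg\min} \; \sum_{i=1}^{n} w_i \, d(f, f_i)^{p}, $$

with $p = 2$ giving a mean-type and $p = 1$ a median-type summary; here $\mathcal{C}$ stands for the differential-equation and population constraints mentioned in the abstract, and registration aligns the trajectories before this optimization. The paper's formulation is richer than this schematic.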
stat.AP 2026-05-11 1 theorem

Functional regression maps when physical activity interventions work

Quantifying Time-Varying Physical Activity Intervention Effects via Functional Regression

Treating daily step counts as full curves shows distinct timing and durability for each of three incentive strategies in a real trial.

Abstract:
Physical activity (PA) intervention studies often collect repeated intensity measurements over long observation periods. Quantifying the variation in intervention effects over the study period is critical to evaluating and improving intervention strategies, yet many analyses reduce PA data into scalar summary measures, resulting in limited insights. We propose a functional regression framework, which captures time-varying intervention effects by modeling the entire PA trajectory as a functional observation. From both methodological and practical perspectives, we demonstrate the advantages of function-on-scalar regression (FoSR) over the traditional two-step approach of applying functional principal components analysis (FPCA) followed by regressing scores on covariates. The FoSR is further extended to a function-on-function regression (FoFR) for studying the association of PA across time periods. Methods are applied to daily step counts from the Social incentives to Encourage Physical Activity and Understand Predictors (STEP UP) study, revealing distinct and highly interpretable time-varying effects of three intervention strategies on PA and differences in their sustainability. Our case study highlights the feasibility of functional data analysis techniques for uncovering novel insights in intervention studies with high-dimensional endpoints.
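A bare-bones sketch of what function-on-scalar regression estimates follows. It is hedged: real FoSR implementations smooth across time jointly, whereas this sketch just fits pointwise regressions on simulated step-count curves and smooths the coefficient curves afterwards; arm labels and effect shapes are hypothetical, not the STEP UP data.

```python
# Pointwise sketch of function-on-scalar regression: regress each day's step count
# on intervention indicators, then smooth the coefficient curves (illustrative only).
import numpy as np

rng = np.random.default_rng(3)
n, T = 300, 180                                     # participants x study days
arm = rng.integers(0, 4, size=n)                    # 0 = control, 1-3 = incentive arms
Z = np.column_stack([np.ones(n)] + [(arm == a).astype(float) for a in (1, 2, 3)])
effect = np.where(np.arange(T) < 90, 800.0, 200.0)  # hypothetical effect that fades after day 90
Y = 6000 + np.outer((arm == 1), effect) + rng.normal(0, 1500, size=(n, T))

beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)        # shape (4, T): one coefficient curve per covariate
kernel = np.ones(14) / 14.0
beta_smooth = np.vstack([np.convolve(b, kernel, mode="same") for b in beta])
print(beta_smooth[1, [30, 150]])                    # arm-1 effect early vs late in the study
```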
stat.AP 2026-05-11 2 theorems

Bayesian inference directly yields optimal wildlife management actions

Bayesian decision theory for wildlife management under uncertainty: from inference to action

In wolf and muskrat examples, expected utility over posterior simulations balances population risk against control benefits without stopping at inference.

Abstract:
Ecologists are increasingly expected to inform management decisions under uncertainty, yet most analytical workflows stop at statistical inference. This disconnect limits the practical impact of ecological modelling, particularly in high-stakes contexts such as wildlife management, where decisions must balance ecological, economic and social objectives. Bayesian decision theory provides a coherent framework to bridge this gap. It propagates uncertainty from posterior distributions to quantify the consequences of alternative actions through utility functions. Despite its strong theoretical foundations, it remains underused in ecology. Here, we present a practical workflow for implementing Bayesian decision theory using standard Bayesian tools. We illustrate the approach with two case studies. First, wolf management in France, where the decision consists of selecting the number of wolves that can be removed under uncertainty about population dynamics. Second, invasive muskrat management in the Netherlands, where the decision involves allocating a fixed control effort across space. In both cases, expected utility is computed from posterior simulations, explicitly accounting for uncertainty and trade-offs. Results show that optimal decisions emerge as a compromise between competing objectives. In the wolf case, optimal harvest balances removal benefits and population risk. In the muskrat case, optimal effort increases with the importance of population reduction and is unevenly allocated across provinces. These examples show that Bayesian decision theory can be implemented as a direct extension of standard inference. By making trade-offs explicit, it enhances transparency, reproducibility, and relevance for management. More broadly, it provides a flexible basis for integrating ecological modelling with decision-making.
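The core computation is a few lines once posterior draws are available. The sketch below uses a made-up harvest utility; the actual utilities in the wolf and muskrat case studies are more elaborate.

```python
# Sketch: expected utility of candidate actions averaged over posterior draws (illustrative).
import numpy as np

rng = np.random.default_rng(4)
posterior_growth = rng.normal(1.10, 0.08, size=4000)   # posterior draws of population growth rate
current_pop = 1000
actions = np.arange(0, 301, 25)                         # number of animals removed

def utility(pop_next, removed, threshold=900, benefit=1.0, penalty=5.0):
    # Made-up trade-off: credit for removals, heavy penalty if the population
    # is pushed below a viability threshold.
    return benefit * removed - penalty * np.maximum(threshold - pop_next, 0)

expected_utility = []
for a in actions:
    pop_next = posterior_growth * (current_pop - a)     # uncertainty propagated from the posterior
    expected_utility.append(utility(pop_next, a).mean())

best = actions[int(np.argmax(expected_utility))]
print(best)     # optimal removal balances removal benefit against population risk
```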
stat.AP 2026-05-11 Recognition

Transfer learning improves abundance estimates from CPUE data

Accounting for variable detection functions in temporal abundance modeling via transfer learning

Detection functions learned from capture-recapture data are transferred to adjust for variable detection probabilities in catch-per-unit-effort models.

Abstract:
Relative abundance, measured as the number of animals caught per unit of sampling effort (CPUE), is commonly used to monitor fish and wildlife populations, largely because sampling methods are cost-effective to implement. Modeling relative abundance, however, requires the assumption that the detection probability is constant across sampling events. This assumption is likely not valid, as the probability of detection often varies as a function of several factors, including the characteristics of individual animals and environmental conditions at the time of sampling. In contrast, methods to estimate absolute abundance, such as capture-recapture (CR), account for variable detection, but are often infeasible to implement across large spatiotemporal scales. Despite this, CR data are sometimes available for species of interest, albeit at smaller spatiotemporal extents. Leveraging information on detection probabilities from CR data to help inform estimates of widely available CPUE data could strengthen inferences about the status of fish and wildlife populations. We propose an approach to (i) learn the effect of environmental covariates on detection probabilities from CR data and (ii) transfer these detection functions to CPUE models for improved inference. Shown empirically through a simulation study, this approach improves estimates of abundance and the ability to detect temporal trends. We apply our transfer learning method using CR and CPUE data to recreationally important smallmouth bass (\textit{Micropterus dolomieu}) fisheries in Pennsylvania, USA rivers.
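One simple way to realize steps (i) and (ii) — hedged, since the paper's actual models are richer — is to fit a detection model on capture-recapture style detection records and carry the fitted probabilities into the CPUE model as an offset. Everything below is simulated and illustrative.

```python
# Sketch of the two-step transfer: (i) learn a detection function from CR-style data,
# (ii) use it as an offset in a Poisson CPUE model (illustrative only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

# (i) Detection model: detected (0/1) vs environmental covariates (e.g. flow, temperature).
n_cr = 2000
env_cr = rng.normal(size=(n_cr, 2))
p_true = 1 / (1 + np.exp(-(-0.5 + env_cr @ np.array([0.8, -0.4]))))
detected = rng.binomial(1, p_true)
det_fit = sm.GLM(detected, sm.add_constant(env_cr), family=sm.families.Binomial()).fit()

# (ii) CPUE data: catch counts whose expectation is abundance x detection x effort.
n_cpue = 500
env_cpue = rng.normal(size=(n_cpue, 2))
effort = rng.uniform(1, 5, size=n_cpue)
year = rng.integers(0, 10, size=n_cpue).astype(float)
p_cpue = 1 / (1 + np.exp(-(-0.5 + env_cpue @ np.array([0.8, -0.4]))))
abundance = np.exp(4.0 + 0.03 * year)                    # hypothetical temporal trend
catch = rng.poisson(abundance * p_cpue * effort)

# Transfer: plug the CR-estimated detection probabilities into the CPUE model's offset.
p_hat = det_fit.predict(sm.add_constant(env_cpue))
cpue_fit = sm.GLM(catch, sm.add_constant(year), family=sm.families.Poisson(),
                  offset=np.log(p_hat * effort)).fit()
print(cpue_fit.params[1])    # trend on the log scale, adjusted for variable detection
```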
stat.AP 2026-05-11 Recognition

Time-warping adapts RNN fuel moisture model across lag classes

Transfer Learning for Dead Fuel Moisture Prediction Using Time-Warping Recurrent Neural Networks

Pretrained on 10h data, the network predicts 1h, 100h, and 1000h moisture by rescaling dynamics to match each class's response time.

Abstract:
This paper proposes a time-warping transfer learning method, a technique for temporally rescaling the learned dynamics of a recurrent neural network (RNN) with a Long Short-Term Memory (LSTM) layer to enable task transfer across fuel moisture classes. Fuel moisture content (FMC) is divided into idealized classes based on characteristic lag time. Large quantities of real-time data are available for 10h fuels from sensors on weather stations, but observations of other fuel classes are sparse in space and time. We use transfer learning to adapt an RNN pretrained on 10h FMC to predict FMC for 1h, 100h, and 1000h fuels. We validate this method using data from a landmark field study conducted in Oklahoma that was used to calibrate the state-of-the-art Nelson fuel moisture model.
stat.AP 2026-05-11 Recognition

AI catches technical errors in POMP reviews that humans miss

Jagged AI in Scientific Peer Review: Evidence from POMP Data Analysis

But underperforms on narrative coherence and domain critique, with the uneven profile consistent across different instructions on 72 student projects.

Abstract:
Despite their growing use in academic writing and statistical analysis, the performance of artificial intelligence (AI) tools in scientific peer review remains a largely unexplored area. A key challenge is jagged AI, a phenomenon where AI exhibits strong ability spikes in some domains while remaining deficient in others. To study this jaggedness in a practical data science context, we considered the task of reviewing partially observed Markov process (POMP) data analyses. POMP models, also known as state-space models or hidden Markov models, are used to fit mechanistic dynamic models to time series data in diverse applications including disease transmission, ecological dynamics, and financial risk assessment. Quality peer review in this area entails assessment of scientific context, identification of errors in implementing complex algorithms, and decisions concerning methodological best practices. We studied 72 POMP projects from four semesters of a University of Michigan graduate time series course for which the project reports, the source code, and student peer reviews are anonymized and open-access. We compared the human reviews with four AI reviewing agents, using Claude Code with differing instructions implemented as skill files. We found that AI reviewers exhibited a jagged capability profile, proficiently catching human-overlooked technical errors and invalid inference methodology, while failing to match human standards in checking interpretive errors, narrative coherence, and domain-informed model critique. The jaggedness was found to be similar for all agents, consistent with it being primarily a property of the underlying AI model rather than the specific instructions. Skill file configuration shifted which weaknesses agents emphasized, without removing the jaggedness.
stat.AP 2026-05-11 Recognition

Statistics alone convict nurses of patient deaths

There to care; not to kill: medical settings, statistics and wrongful convictions

When direct evidence like DNA or confessions is missing, clusters of incidents tied to one nurse's shifts become the main case for guilt.

Abstract:
This paper discusses wrongful convictions in a medical setting, focusing on nurses. Common features are lack of strong direct evidence: the nurse was never seen doing anything wrong. There is no DNA evidence of tampering of apparatus or medications by the nurse. There is no CCTV footage showing suspicious actions. Analysis of medical records at the time led coroners to issue certificates of natural deaths, and most events were not, at the time, thought suspicious by hospital staff. There is no confession and the nurse consistently asserts they are completely innocent. There is no evidence of earlier psychopathic behaviour. Instead, private writings (e.g., in a diary) are interpreted by the prosecution as a confession; mundane behaviour is given a sinister interpretation. Motive remains speculation. The main evidence is statistical: a spike in deaths or collapses and a statistical association with a particular nurse. There is forensic evidence which suggests one or two patients might have been harmed by administration of medication much used in the hospital, and even legitimately used earlier in the care of the alleged victims. Police investigations are driven by the hospital consultants who were clinically responsible for the patients allegedly killed or harmed by the nurse.
stat.AP 2026-05-11 2 theorems

Spatial mean modeling improves wind-speed volatility forecasts

Spatiotemporal dynamics of wind-speed volatility

GARCH models with distance weights show mean specification drives residual quality and out-of-sample performance, with persistence rising with height.

Abstract:
Wind-speed processes exhibit substantial temporal variability and spatial dependence, yet volatility dynamics across monitoring networks remain relatively unexplored. This study investigates the spatiotemporal behaviour of wind-speed volatility using daily observations from 141 stations in Northern Italy over 2016--2021, with measurements at 10 m and 100 m enabling the analysis of spatial and vertical dependence. We adopt a parsimonious spatiotemporal volatility framework based on GARCH-type dynamics, in which conditional variance depends on past local shocks and spatially aggregated information from neighbouring stations. The approach combines a spatial mean specification with structured volatility models using distance-based and directionally informed weight matrices. Results show that properly modelling spatial dependence in the mean is essential for well-behaved residuals and reliable inference. Forecast performance is strongly driven by the mean specification: flexible structures perform better when residual spatial dependence remains, while parsimonious distance-based models yield robust out-of-sample forecasts once spatial interactions are captured. Persistence increases with height, and a multivariate extension reveals cross-height dependence.
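In generic notation, one spatiotemporal GARCH(1,1)-type recursion consistent with this description (the paper's exact specification may differ) is

$$ h_{i,t} = \omega + \alpha\,\varepsilon_{i,t-1}^{2} + \lambda \sum_{j \ne i} w_{ij}\,\varepsilon_{j,t-1}^{2} + \beta\, h_{i,t-1}, $$

where $h_{i,t}$ is the conditional variance of the mean-adjusted wind-speed shock $\varepsilon_{i,t}$ at station $i$, the $w_{ij}$ are distance-based or directionally informed spatial weights, and the spatial mean model supplies the $\varepsilon_{i,t}$ as residuals.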
stat.AP 2026-05-11 Recognition

New method isolates uncertainty from incomplete asset data in risk models

Quantifying Exposure Information Uncertainty in Regional Risk Assessment

It decomposes total risk uncertainty to show how missing exposure details contribute bias alongside hazard and damage factors

Abstract:
Exposure characterization in regional risk assessment aims to assign physical properties to the assets of interest so they can be associated with damage and loss functions. While this process has benefited from the growing availability of public infrastructure inventories, these datasets often lack the detailed attributes required for high-resolution risk assessment. Missing attributes are commonly inferred using predictive models or engineering-based rulesets. However, these imputations are inherently imperfect and can introduce bias and additional uncertainty in regional risk estimates. This study proposes a methodology to quantify the bias and uncertainty in regional risk assessment that arises from probabilistic exposure characterization. By integrating analytical and simulation-based approaches, the methodology decomposes the total uncertainty into contributions from incomplete exposure information as well as other sources, including hazard and damage characterization. This decomposition clarifies how bias and uncertainty associated with missing exposure information are generated and propagated through the risk assessment pipeline. The methodology is applied to both bridge-specific and regional risk assessments. A high-resolution bridge exposure inventory is developed using a data augmentation framework that combines publicly available information with machine learning and engineering-based imputation methods.
stat.AP 2026-05-08 1 theorem

K-means finds stable clusters in continuous data with no groups

Drawing Lines in Psychological Space: What K-means Clustering Reveals in Simulated and Real Psychometric Data

Simulations and psychometric survey responses show the algorithm still partitions space into coherent regions around centroids.

Abstract:
K-means clustering is widely used in psychological and psychometric research to identify profiles, subgroups, and potential typologies, yet its classical formulation does not test whether such groups exist as latent psychological categories. Instead, K-means partitions multidimensional space into regions around centroids, favoring compact, approximately spherical clusters defined by geometric distance. In this paper, we examine this limitation through a sequence of controlled simulated datasets. We then extend the analysis to the SMARVUS dataset, a large international psychometric dataset comprising survey responses from university students across 35 countries, to evaluate whether similar geometric partitioning patterns emerge in empirical psychological data. By contrasting simulated and empirical data, this paper argues that K-means can produce stable and visually coherent clustering solutions even in continuous Gaussian latent spaces without true subgroup structure.
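The core demonstration fits in a few lines. The sketch below uses simulated data (not the SMARVUS analysis): a single Gaussian cloud with no true subgroups still yields a positive silhouette and a highly stable k-means partition across restarts.

```python
# Sketch: k-means carves a single Gaussian cloud (no true subgroups) into stable,
# compact "clusters" (illustrative of the paper's point, not its exact setup).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, adjusted_rand_score

rng = np.random.default_rng(6)
X = rng.normal(size=(2000, 6))          # one continuous latent population, no categories

labels_a = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
labels_b = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)

print(silhouette_score(X, labels_a))             # modest but positive: geometrically coherent regions
print(adjusted_rand_score(labels_a, labels_b))   # typically near 1: the partition is highly stable
```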
stat.AP 2026-05-08 2 theorems

Small capability errors amplify into large defect risks

Nonlinear Amplification of Finite-Sample Uncertainty in Capability-Based Decisions

Index values that appear stable in index space show wide PPM variability near thresholds due to tail curvature

Abstract:
This paper studies the propagation of finite-sample uncertainty under nonlinear transformations commonly used in statistical decision systems. In particular, we consider process capability indices, which are widely used in manufacturing practice but are estimated from finite samples, rendering the resulting approval decisions inherently uncertain. We show that such uncertainty cannot be fully explained by estimator variability alone, but is substantially influenced by a nonlinear amplification mechanism through which capability uncertainty is transformed into defect-risk metrics. While capability estimators vary approximately linearly with process dispersion, defect probabilities depend on tail curvature, causing small estimation errors to be disproportionately amplified in measures such as defect probability and parts-per-million (PPM) rates. Consequently, capability assessments that appear stable in index space may exhibit substantial variability in defect-risk space, particularly near decision thresholds. This insight provides a unified explanation of finite-sample decision instability, motivates reliability-aware decision formulations, and links sample-size requirements directly to decision reliability. Monte Carlo simulations and industrial data analyses validate the proposed mechanism and demonstrate its practical implications, including the impact of distributional assumptions on defect-risk estimation.
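A worked example of the amplification mechanism, using the textbook relation between a centered normal process's capability index and its defect rate (illustrative of the curvature effect, not the paper's exact analysis):

```python
# Worked example: a small change in an estimated capability index produces a
# disproportionately large relative change in PPM. Uses the textbook relation for
# a centered normal process: p_defect = 2 * Phi(-3 * Cp).
from scipy.stats import norm

def ppm(cp):
    return 2 * norm.cdf(-3 * cp) * 1e6

for cp_hat in (1.33, 1.30, 1.20):
    print(cp_hat, round(ppm(cp_hat), 1))
# Cp = 1.33 -> ~66 PPM, Cp = 1.30 -> ~96 PPM, Cp = 1.20 -> ~318 PPM:
# a ~2% drop in the index inflates the defect rate by roughly 45%.
```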
stat.AP 2026-05-08

Two-level model improves route predictions for mobility apps

A Two-Level Plackett-Luce Model for preference modeling in smart mobility platforms

It structures user preferences in stages to support personalized recommendations, carpool coordination, and incentive design.

Abstract:
The Plackett-Luce model is widely used to deal with probabilities in discrete choice settings. This paper introduces a novel two-level Plackett-Luce model combined with a multinomial logistic scheme that provides the basis for the route choice module in a smart mobility platform. For this, we develop Bayesian inference and prediction mechanisms to capture consumers' preferences for personalized route recommendations. The model is empirically tested, allowing for refinements and discussion of its applicability. We also illustrate its practical relevance through several use cases, including relevant route selection, coordinated car pooling, incentive design and synthetic data generation.
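For reference, the single-level Plackett-Luce ranking probability can be sketched in a few lines; the paper's two-level extension and multinomial-logistic component are not reproduced here, and the route utilities below are hypothetical.

```python
# Sketch of the (single-level) Plackett-Luce ranking probability (illustrative only).
import numpy as np

def plackett_luce_prob(ranking, utilities):
    """P(ranking) = prod_k exp(u[pi_k]) / sum over items not yet ranked of exp(u[.])."""
    w = np.exp(np.asarray(utilities, dtype=float))
    prob = 1.0
    remaining = list(ranking)
    for item in ranking:
        prob *= w[item] / w[remaining].sum()
        remaining.remove(item)
    return prob

# Hypothetical route utilities for four candidate routes; the ranking lists the most
# preferred route first.
utilities = [1.2, 0.3, -0.5, 0.0]
print(plackett_luce_prob([0, 3, 1, 2], utilities))
```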
stat.AP 2026-05-08

Organ donation counts recover to pre-pandemic baselines after COVID break

Scalable model selection for count time series with structural breaks: application to solid-organ transplantation during and after COVID-19 in the USA and Italy

BIC-selected count models for US and Italy data show deceased donors returning faster than living donors, with COVID burden variables adding little.

Abstract:
Weekly healthcare activity data are typically non-negative counts with temporal dependence and occasional system-wide disruptions, settings in which Gaussian time-series models may be inadequate. Solid organ transplant (SOT) activity provides a representative case study of a count process affected by a large external shock. We analyse weekly SOT counts in the USA and Italy from 2014 to October 2024, stratified by donor type (deceased vs living) and organ (kidney and liver). We fit Poisson and negative-binomial count time-series models incorporating short-term dynamics, calendar effects (holiday weeks), and pre-specified pandemic-period level and/or slope indicators. Candidate specifications are screened within a pre-defined portfolio and selected using BIC within each training window. Forecasting performance is evaluated with an expanding-window design at horizons $h\in\{4,8,12\}$ weeks. Alongside RMSE, we report empirical coverage of nominal $95\%$ predictive intervals and interval widths to summarise calibration and forecast uncertainty. Across strata, selected models capture substantial pandemic-period deviations and varying post-period trajectories. Deceased-donor series are broadly consistent with a return towards pre-pandemic baselines in both countries, whereas the US living-donor series shows a more gradual convergence in this application. Within the explored model class and validation protocol, auxiliary covariates representing COVID burden and mortality add limited incremental predictive contribution beyond autoregressive and calendar components. Our analysis shows that donation time series represent an unconditional phenomenon, with auxiliary variables having a statistically negligible impact on donations, thus allowing a focus on more practical aspects related to ongoing challenges in the post-pandemic era, such as hospital overloads and changes in public perception.
stat.AP 2026-05-08

Counterfactual model fixes bias from uneven disease testing

Correcting heterogeneous diagnostic bias when developing clinical prediction models using causal hidden Markov models

Estimating diagnosis rates as if everyone were tested at the reference frequency brings observed-to-expected ratios near one for under-diagnosed groups.

Abstract:
In routine care, individuals identified a priori as high-risk are usually tested for conditions more frequently. Protected attributes, such as sex or ethnicity may also determine testing frequency. Such heterogeneous detection rates across a population induce label error. This causes systematic model error for specific groups and biases performance metrics during validation. This paper proposes a method to correct for such bias in prediction models due to differential diagnostic delay. We use a causal inference framework to define our target estimand: an individual's diagnosis probability in a counterfactual scenario where their diagnosis rate matches that of a reference group. We model the longitudinal process as a hidden Markov model, in which confirmatory test results are emissions from a latent progressive disease stage. We validate our approach in simulated data and apply it to a case study of chronic kidney disease prediction using electronic health records. In simulations, our method reduces prediction bias and improves calibration-in-the-large, correcting the Observed:Expected ratio in the underdiagnosed group from 1.34 (standard deviation: 0.09) in a model developed without any correction for underdiagnosis bias to 1.02 (0.09). Violations of assumptions in the simulation affected the estimation of model parameters, but the proposed approach nonetheless remained better calibrated than the standard model. In the clinical case study, we identify diabetes as the main driver of observability, with an odds ratio of 10.36 (95% confidence interval, 9.80 - 11.02) in 6-month urine albumin-creatinine ratio testing rate. Using our approach to predict the counterfactual diagnostic rate in patients without diabetes, we improved the Observed:Expected ratio of a developed clinical prediction model from 1.55 (1.51 - 1.59) to 1.01 (0.98 - 1.04).
stat.AP 2026-05-08

Causal estimate trims BP benefit on heart disease to 3.4%

Causal Inference of Blood Pressure Reduction and Coronary Heart Disease Risk in the Framingham Study

Observational analysis overstates absolute risk reduction by 22 percent in Framingham data, with implications for risk calculators.

Abstract:
Standard cardiovascular risk calculators, including the Framingham Risk Score and the ACC/AHA Pooled Cohort Equations, estimate the conditional probability P(CHD | SysBP = s) rather than the interventional quantity P(CHD | do(SysBP = s)). When confounding is present, this distinction has direct clinical consequences: observational estimates may systematically overstate the absolute benefit of antihypertensive treatment. We applied Pearl's do-calculus to the Framingham Heart Study Offspring Cohort (n = 4,240; primary analysis on 3,776 complete cases; 574 ten-year coronary heart disease events). A structurally corrected directed acyclic graph (DAG) was specified and evaluated using conditional independence testing. The average causal effect (ACE) of a 20 mmHg systolic blood pressure reduction was estimated by g-computation with bootstrap confidence intervals, corroborated by propensity score matching and inverse probability weighting. G-computation yielded an ACE of 3.40 percent absolute risk reduction (95 percent CI: 2.64 to 4.14), compared with a naive observational estimate of 4.14 percent, corresponding to an approximate 21.8 percent relative overestimation. Conditional average treatment effects were estimated using R-Learner and T-Learner metalearners. These findings suggest that observational cardiovascular risk tools may overestimate the absolute benefit of blood pressure reduction, with implications for clinical risk stratification and prescribing thresholds.
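A hedged sketch of g-computation for this kind of interventional contrast is below; the confounders, outcome model and data are hypothetical simplifications, not the Framingham analysis.

```python
# Sketch of g-computation for the effect of a 20 mmHg systolic reduction (illustrative only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 4000
age = rng.normal(55, 8, size=n)
bmi = rng.normal(27, 4, size=n)
sysbp = 90 + 0.8 * age + 0.9 * bmi + rng.normal(0, 10, size=n)       # confounded exposure
logit = -9 + 0.02 * sysbp + 0.06 * age + 0.03 * bmi
chd = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([sysbp, age, bmi]))
fit = sm.GLM(chd, X, family=sm.families.Binomial()).fit()

# Predict everyone's risk under observed SysBP and under do(SysBP - 20), then average.
X_do = X.copy()
X_do[:, 1] = X_do[:, 1] - 20
ace = fit.predict(X).mean() - fit.predict(X_do).mean()
print(ace)   # absolute risk reduction attributable to the intervention
# Bootstrapping the whole procedure gives confidence intervals, as in the paper.
```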
stat.AP 2026-05-07

BISG sampling matches Pew Jewish survey at fraction of cost

Improving Minority Population Sampling with BISG Probabilities: Evidence from a Survey of Jewish Americans

Probabilities from names and addresses in a stratified Poisson design reproduce Pew estimates of religious denominations and activities.

Abstract:
Sampling geographically dispersed minority populations poses substantial challenges when individual group membership cannot be directly observed. Although stratified sampling can offer efficiency gains, these gains are typically modest unless the minority population is highly concentrated within a small number of strata. In this paper, we propose using Bayesian Improved Surname Geocoding (BISG) to enhance the efficiency of minority population sampling. BISG generates individual-level probabilities of minority group membership based on names and residential addresses. We incorporate these probabilities into a stratified Poisson probability sampling design. Applying the proposed approach to a national survey of Jewish Americans, we find that our estimates closely align with those from a large-scale Pew Research Center survey of the same population, which relied on a substantially more expensive sampling strategy involving geographic stratification and screening. At a fraction of the cost, our survey reproduces nearly identical patterns observed by Pew, including estimates of religious denominations and participation in specific religious activities.
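One simple way to use BISG probabilities in a Poisson sampling design is sketched below, with a ratio-type Horvitz-Thompson estimator for the target-group mean. The frame, probabilities and outcome are simulated, and the paper's stratification details are not reproduced.

```python
# Sketch of BISG-weighted Poisson sampling with Horvitz-Thompson weighting (illustrative only).
import numpy as np

rng = np.random.default_rng(8)
n_frame = 100_000
bisg_p = rng.beta(0.1, 4.0, size=n_frame)           # hypothetical P(member of target group)
is_member = rng.binomial(1, bisg_p)                  # unknown in practice; simulated here
y = rng.binomial(1, 0.35, size=n_frame)              # a survey outcome of interest

budget = 3000
pi = np.minimum(1.0, budget * bisg_p / bisg_p.sum()) # inclusion prob. proportional to BISG prob.
sampled = rng.random(n_frame) < pi                   # Poisson sampling: independent Bernoulli draws

# Ratio-type Horvitz-Thompson estimate of the outcome mean in the target minority group.
w = 1.0 / pi[sampled]
m = is_member[sampled].astype(bool)
est = np.sum(w[m] * y[sampled][m]) / np.sum(w[m])
print(sampled.sum(), est)
```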
stat.AP 2026-05-07

Randompack produces identical random sequences across languages and machines

Randompack: Cross-Platform Reproducible Random Number Generation and Distribution Sampling

The C library separates engines from distributions to achieve both faster sampling and bit-for-bit compatibility from the same seeds.

Abstract:
A C library for random number generation, Randompack, is presented. The library implements several modern random number generators (engines), including xoshiro256, PCG64, Philox, ranlux++, and sfc64; 14 continuous distributions including uniform, normal, exponential, gamma, beta, and multivariate normal; raw bit streams, bounded integers, permutations, and sampling without replacement. The engine and the distribution layers are separated so any engine can be used with any distribution. Benchmarks show that Randompack is faster overall than competing libraries, with speedup factors ranging from about 1 to 15 depending on engine, distribution, interface, and platform. A distinguishing feature is reproducibility: with the same seeds Randompack gives compatible results across programming languages, computers, CPU architectures, and compilers. The library includes comprehensive support for parallel simulation. It is accompanied by a comprehensive test suite, benchmarking programs, and example programs. Interfaces to Fortran, Python, Julia, and R have been implemented; their benchmark results are included, although their design and implementation are otherwise outside the scope of the article. Unlike other available C libraries with comparable scope, Randompack is permissively licensed under the MIT license, and it is open source and publicly available through GitHub and conda-forge.
stat.AP 2026-05-06

DroughtFormer forecasts African droughts to 90 days

Prediction of Drought and Flash Drought in Africa at the Seasonal-to-Subseasonal Scale using the Community Research Earth Digital Intelligence Twin Framework

The ML model matches climatology for soil moisture and vegetation health, aiding agricultural planning in data-sparse regions.

Abstract:
Droughts and flash droughts (rapidly developing droughts; FDs) remain impactful events that are known to desiccate landscape and destroy crops. In particular, droughts in Africa are often more impactful than in other locations, such as the United States or Europe, due to many regions in Africa heavily depending on local agriculture for sustenance. In recent years, large machine learning (ML) models, such as GraphCast and AIFS, have emerged as effective tools for global weather prediction. However, sparse data observations and few ML studies in Africa have left it unclear if these ML models retain their skill when focused on Africa. As such, this project seeks to examine the predictability of drought and FD in Africa using a CrossFormer model based on the Community Research Earth Digital Intelligence Twin (CREDIT) framework developed by NSF NCAR. Our CrossFormer model, termed DroughtFormer, incorporates variables from the ERA5 and GLDAS2 reanalyses and the IMERG and MODIS satellite observations, and employs dry air mass and moisture conservation, to predict soil moisture, vegetation health, and other drought-related surface variables. While DroughtFormer displayed lower accuracy in predicting precipitation and FD indices, it showed significant skill in predicting the remaining variables, delivering stable and skillful forecasts out to 90-day lead times (either beating out or having comparable skill to climatology). In particular, DroughtFormer skillfully represented climate anomalies for key variables, such as soil moisture (though it struggled with the magnitude of the anomalies). Thus, DroughtFormer showed significant promise in representing and predicting agricultural level drought in a region that is heavily impacted by drought events.
stat.AP 2026-05-05

Gait model yields misleading likelihood ratios in under 10% of comparisons

Evaluating the probative value of forensic gait analysis evidence using empirical data

Likelihood ratios from binary features and PCA stay reliable when personal variation is modeled correctly, guiding experts without replacing them.

Abstract:
Forensic gait analysis can aid the investigation of crimes through comparing features of gait captured in video footage. Modelling the probative value of gait evidence requires an understanding of the variation of features of gait between individuals in the population and within the same individuals. We address this question using a previously described population dataset and newly collected datasets with repeated observations of the same individuals on separate occasions. In addition to exploring the level of variability, correlation between features of gait, and the effect of demographic factors, we developed a likelihood ratio model through recoding features of gait as dichotomous variables and dimension reduction using PCA. High correlations between some features were observed, confirming that they should not contribute independently to the weight of evidence. The likelihood ratio model produced misleading likelihood ratios in less than 10% of the comparisons using the first four principal components. However, the risk increases when within-individual variability is mis-specified. Therefore, while the current model provides assistance to the judgement of gait experts, human expertise is indispensable to decide whether or not the difference in walking and/or recording conditions between the reference and questioned footage could have caused any observed differences in the features of gait. We discuss future directions in understanding the sources of the variability, improving statistical modelling and note the need to consider carefully how to select the relevant population for model fitting.
stat.AP 2026-05-05

SAFE framework controls error rates in trial safety data

Synergy Area with FDR-controlled Evaluation (SAFE) to robustly assess safety profile in clinical trials

Two layers evaluate synergy areas on compelling evidence, then apply FDR control across them to screen extremes and support robust conclusions.

Abstract:
Safety assessment plays a fundamental role in developing a new drug via clinical trials for ethical considerations. Due to complexity, manual review is typically conducted on the totality of data to draw safety conclusions. There are some existing quantitative methods to facilitate or tailor further medical review, with a controlled error rate and integration of clinical knowledge. In addition to those two key aspects, we emphasize the importance of relying on substantial evidence to draw robust conclusions on safety. Motivated by these three important properties, we propose a two-layer Synergy Area with FDR-controlled Evaluation (SAFE) structural framework to robustly assess the safety profile in clinical trials. In the first layer of SAFE, we investigate each clinically meaningful Synergy Area (SA) based on compelling evidence. In the next layer, the false discovery rate (FDR) is controlled for potential findings across all SAs. Simulation studies show that SAFE properly controls error rates within and across SAs at the nominal level. We further apply the proposed approach to two case studies based on real data from the Historical Trial Data (HTD) Sharing Initiative of the DataCelerate platform. As compared to some direct methods, SAFE demonstrates an appealing feature of screening out extreme data and reaching solid safety conclusions. It can act as either a building block in another framework, or a platform to incorporate additional components.
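As background, the Benjamini-Hochberg step-up rule is one standard way to control the FDR across a family of tests; the sketch below is illustrative and may differ from the exact procedure used in SAFE, and the per-synergy-area p-values are hypothetical.

```python
# Sketch: Benjamini-Hochberg step-up procedure for FDR control across synergy-area
# p-values (one standard FDR method; the SAFE paper's exact procedure may differ).
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    p = np.asarray(p_values, dtype=float)
    order = np.argsort(p)
    m = len(p)
    thresholds = q * (np.arange(1, m + 1) / m)
    below = p[order] <= thresholds
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0   # largest i with p_(i) <= q*i/m
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True                                   # reject the k smallest p-values
    return rejected

p_per_sa = [0.001, 0.004, 0.03, 0.20, 0.55, 0.78]   # hypothetical per-synergy-area p-values
print(benjamini_hochberg(p_per_sa))
```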
stat.AP 2026-05-05

Semi-Markov regimes capture epidemic waves with random durations

Semi-Markov Models with Particle-Based Bayesian Inference for Epidemics

The model represents sustained transmission phases parsimoniously and improves UK COVID-19 estimates when cases and deaths are used together

Abstract:
The COVID-19 pandemic has been characterised by multiple waves of transmission driven by interventions and emerging variants, challenging epidemic models that assume gradually evolving transmission dynamics. We propose a class of state-space models in which the transmission rate evolves through persistent regimes of random duration, governed by a semi-Markov process. This formulation yields an interpretable representation of sustained transmission phases and retains a parsimonious parameterisation. Particle-based Bayesian methods are well established for standard state-space models, but their use in semi-Markov settings has received comparatively limited attention. In epidemic applications, inference is further complicated by differential equation-driven latent dynamics and observation models defined through functionals of the latent process. We develop an inferential framework that accommodates these features, combining particle-based state updates with gradient-based parameter updates and enabling batch and sequential inference via particle and sequential Monte Carlo. We apply the proposed methodology to COVID-19 data from the United Kingdom and show that combining reported cases and deaths leads to more precise and stable inference compared to using deaths alone. These results illustrate the practical value of semi-Markov transmission models for epidemic analysis under complex observation schemes.
stat.AP 2026-05-05

NICU music studies shift from physiology to family bonds

Research trends in music-based interventions in neonatal intensive care units: a text mining and topic modeling study

Text mining of 83 abstracts shows growth in live music and parent involvement, moving beyond immediate stabilization to developmental and relational goals.

Abstract:
Background: Music-based interventions are increasingly used in neonatal intensive care units (NICUs), but the literature remains heterogeneous in intervention type, provider role, and research focus. This study examined research trends in NICU music-based intervention studies using text mining. Methods: We analyzed 83 abstracts from peer-reviewed studies published between 1998 and 2025. Methods included preprocessing, RAKE-based keyphrase extraction, keyword frequency analysis, temporal trend analysis, intervention-type comparison, and latent Dirichlet allocation topic modeling. The optimal number of topics was determined using the CaoJuan2009, Arun2010, and Deveaud2014 metrics. Results: Study volume increased steadily over time, with nearly half (38/83) published from 2020 onward. Early studies focused on passive music listening and short-term physiological outcomes, whereas recent studies increasingly examined singing, live music, and parent-involved interventions. Keyword analysis showed a shift from physiological stability and behavioral responses toward neurodevelopmental outcomes, parental emotional well-being, and parent-infant interaction. Music medicine studies emphasized passive auditory stimulation and immediate physiological outcomes, whereas music therapy studies addressed broader developmental, relational, and psychosocial topics. Topic modeling identified four major themes, with parent-involved physiological regulation and stress reduction the most frequent dominant topic. Conclusions: NICU music-based intervention research is becoming more interdisciplinary. The field has expanded from immediate physiological stabilization to broader developmental, relational, and psychosocial goals. Future work should clarify the distinction between music therapy and music medicine and promote interdisciplinary collaboration in NICU care.
stat.AP 2026-05-05 2 theorems

Scale invariance determines analogue ensemble volumes and dispersion

Structural and Lagrangian properties of analogue ensembles to characterize multifractality of stochastic processes

In Takens-reconstructed spaces, nearest-neighbor group sizes and their time spread directly reflect the scaling of fractional Brownian motion and multifractal random walks.

Abstract:
We present a framework for the scale-invariance characterization of stochastic processes in reconstructed finite-dimensional phase spaces. This framework analyses the structural and dynamical properties of the phase space and is based on a Takens embedding reconstruction followed by the definition of ensembles of analogue states. We define the analogues of a target state as its nearest neighbors. Then, we specify a collection of target states densely sampling the full phase space. For each target state, we search for the ensemble of its k-best analogues and we analyze its volume and dynamics. First, we study the probability distribution of the volumes and relate its mean and variance to the scale-invariance properties of the stochastic process. Second, we study the Lagrangian properties of the analogues by characterizing how they disperse in time. More particularly, we study the volume occupied by the analogues' successors as a function of time and of their initial volume. We link these dynamical properties to the scale-invariance properties of the process. We analyze two types of stationary and dissipative 1-dimensional scale-invariant processes: regularized fractional Brownian motion and regularized multifractal random walk. For both processes, the structure and dynamics of the phase space are determined by their scale-invariant properties.
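A minimal sketch of the construction the abstract describes — delay embedding, nearest-neighbour analogues, and the dispersion of their successors — is below. The signal, embedding dimension, lag and horizon are all hypothetical choices, not the paper's settings.

```python
# Sketch: Takens delay embedding followed by k-nearest-neighbour "analogue" search
# and a crude look at how the analogues' successors disperse (illustrative only).
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(9)
x = np.cumsum(rng.normal(size=5000))          # stand-in 1-d signal (e.g. a random walk)

def delay_embed(series, dim=5, lag=2):
    n = len(series) - (dim - 1) * lag
    return np.column_stack([series[i * lag : i * lag + n] for i in range(dim)])

states = delay_embed(x)                                        # reconstructed phase space
nn = NearestNeighbors(n_neighbors=21).fit(states[:-10])        # keep room for 10-step successors
dist, idx = nn.kneighbors(states[1000:1001])                   # 20 analogues of one target state
analogues = idx[0][1:]                                         # drop the target itself
volume_proxy = dist[0][1:].max()                               # size of the analogue ensemble
successors = states[analogues + 10]                            # the analogues 10 steps later
dispersion = np.linalg.norm(successors - successors.mean(axis=0), axis=1).mean()
print(volume_proxy, dispersion)
```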
stat.AP 2026-05-05

Intraday risk curves boost asset selection

Large-Scale Asset Selection via Metric Dependence with Enriched High Frequency Information

Metric dependence screening on point-curve objects outperforms scalar and return-based methods on 2,938 Chinese stocks.

Abstract:
Large-scale portfolio choice is highly sensitive to estimation error, making the preliminary asset selection essential in empirical implementation. Existing selection rules typically rely on scalar returns or low dimensional high frequency summaries, and thus discard intraday risk dynamics that may be relevant for risk adjusted allocation. We propose Metric Dependence Screening (MDS), an asset selection procedure that incorporates high frequency information as object valued data. Each asset day observation is represented as a point-curve object combining daily return with an intraday risk state curve, equipped with a weighted product metric that preserves both reward information and within day risk dynamics. MDS ranks assets by a Fr\'echet variation based dependence score, measuring how much a risk adjusted target explains the metric dispersion of the asset representations. This yields a simple two stage portfolio procedure: MDS first reduces the investable universe, and standard mean-variance or minimum variance allocation is then applied. We develop a target slicing estimator and establish concentration, sure selection, and rank consistency guarantees under $\alpha$-mixing time series dependence and ultrahigh dimensionality. Simulations show that MDS performs well across both Euclidean and non-Euclidean settings. Using high frequency data for $2938$ Chinese A-share stocks from July 2023 to December 2025, we demonstrate that MDS improves out of sample portfolio performance over return based and scalar dependence based benchmarks, highlighting the value of preserving intraday risk dynamics.
stat.AP 2026-05-04

Influence scores select compatible external controls to improve RCT precision

Adaptive Influence-Based Borrowing Framework for Improving Treatment Effect Estimation in RCTs Using External Controls

Nested subsets are formed from patients whose addition barely perturbs the RCT outcome model, and the subset minimizing mean squared error is borrowed.

Abstract:
Randomized controlled trials (RCTs) often suffer from limited sample sizes due to high costs and lengthy recruitment periods, compromising precision in treatment effect estimation. External real-world control data offer a valuable opportunity for augmentation, but na\"ive integration may introduce bias without careful compatibility assessment. This paper presents a practical tutorial on the adaptive influence-based borrowing framework~\citep{Yang-etal2026}, which addresses this challenge through a principled, individual-level borrowing strategy. The core intuition is straightforward: rather than indiscriminately pooling all external controls (ECs), the framework first asks how much each external patient would perturb the outcome model fitted using RCT controls. External patients whose inclusion barely changes this model are deemed comparable and prioritized for borrowing, whereas those who substantially shift it are flagged as potentially incompatible. This individual-level compatibility metric, based on the influence score, is then used to construct a sequence of nested candidate subsets of ECs, from which the optimal subset is selected by minimizing the mean squared error of the treatment effect estimator, balancing the competing risks of bias from over-borrowing and imprecision from under-borrowing. When systematic differences between ECs and RCT controls are substantial, an optional outcome calibration step can align the two groups before influence-based selection proceeds. We provide a clear, step-by-step workflow with emphasis on methodological intuition, practical considerations, and visualization, thereby offering a principled, transparent, and practical method for leveraging ECs when RCTs alone are underpowered. Implementation is supported by an accompanying \texttt{R} package InfluenceBorrowing.
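A simplified sketch of the influence idea described above follows: each external control is scored by how much adding it perturbs the model fit to the RCT controls, and the least influential patients form nested candidate subsets. This is only an illustration — the influence score, the MSE-based choice among the nested subsets, and the calibration step are implemented in the accompanying InfluenceBorrowing package, not here, and all data below are simulated.

```python
# Simplified sketch of influence-based screening of external controls (illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(10)
X_rct, y_rct = rng.normal(size=(80, 3)), rng.normal(size=80)            # RCT control arm
X_ec, y_ec = rng.normal(size=(200, 3)), rng.normal(loc=0.3, size=200)   # external controls

base = LinearRegression().fit(X_rct, y_rct)

def influence(x_new, y_new):
    # Refit with one external patient added; measure the shift in fitted coefficients
    # (a crude proxy for the paper's influence score).
    X_aug = np.vstack([X_rct, x_new[None, :]])
    y_aug = np.append(y_rct, y_new)
    refit = LinearRegression().fit(X_aug, y_aug)
    return np.linalg.norm(np.append(refit.coef_, refit.intercept_)
                          - np.append(base.coef_, base.intercept_))

scores = np.array([influence(x, y) for x, y in zip(X_ec, y_ec)])
order = np.argsort(scores)                         # most compatible external controls first
nested_subsets = [order[:k] for k in (25, 50, 100, 200)]
print(scores[order[:5]])                           # the five least perturbing external patients
```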
stat.AP 2026-05-04

TETRIS recovers multiple respiratory signals directly from PPG

Data-driven time-frequency tessellation for signals with oscillatory amplitude envelopes and instantaneous frequency, with application to photoplethysmography

Partitioning the time-frequency plane along cardiac instantaneous frequency captures breathing modulations in both amplitude and frequency.

Abstract:
Biomedical signals often comprise multiple non-sinusoidal oscillatory components whose amplitude modulation (AM) and instantaneous frequency (IF) may themselves be governed by additional (second-order) oscillatory dynamics with time-varying amplitude and frequency. We introduce a novel time-frequency (TF) analysis framework, {\em Tessellation-based Ensembled Time-Frequency Representation via Integrated Shifting} (TETRIS), designed based on the proposed generalized adaptive non-harmonic model to leverage second-order oscillatory information in this class of signals. We present the model and algorithm using the photoplethysmogram (PPG) as a canonical example, whose cardiac component is known to encode respiratory information in both AM and IF, and demonstrate how respiratory signals can be recovered from PPG. The central idea of TETRIS is to partition the TF plane along the estimated IF of the cardiac component and to process each partition adaptively to enhance representation quality. This tessellation enables a refined time-frequency representation (TFR), allowing more effective recovery of the respiratory modulation governing the AM of the cardiac component. We provide theoretical justification for the proposed method and validate its performance on semi-synthetic signals. Finally, we demonstrate that TETRIS enables improved reconstruction of multiple surrogate respiratory signals directly from PPG data. While the model and algorithm are developed with a focus on PPG, the framework is flexible and has potential to be applied to other signals.
0
0
stat.AP 2026-05-04

Factor structure fixes identifiability in multivariate OU models

Factor State Space Modelling of the Ornstein-Uhlenbeck Process with Measurement Error and its Application

New state-space extension handles measurement error while recovering latent mean-reverting dynamics in biological and environmental series

Figure from the paper full image
abstract click to expand
Standard Ornstein-Uhlenbeck (OU) models often yield biased parameter estimates when measurement error is ignored. While the Ornstein-Uhlenbeck State Space Model (OUSSM) addresses this in univariate settings, multidimensional extensions remain limited. This paper introduces the factor OUSSM to model multi-dimensional, mean-reverting systems with observational noise. We resolve critical identifiability challenges in parameter estimation by establishing necessary constraints and validating the method through extensive simulations. We demonstrate the model's versatility by analyzing human gut microbiome dynamics and North Atlantic Sea Surface Temperature (SST) data. The results reveal distinct latent temporal structures in both biological and environmental systems, establishing the factor OUSSM as a robust framework for multivariate time series analysis.
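For orientation, a minimal univariate version of an OU state-space likelihood evaluated with a Kalman filter; the paper's factor model is multivariate with identifiability constraints, which this sketch does not attempt.

import numpy as np

def ou_ssm_loglik(y, dt, theta, mu, sigma, tau):
    """Kalman-filter log-likelihood for y_t = x_t + N(0, tau^2), where x_t is an
    OU process dx = theta*(mu - x)dt + sigma dW observed at spacing dt."""
    phi = np.exp(-theta * dt)                     # AR(1) coefficient of the exact discretisation
    q = sigma**2 / (2 * theta) * (1 - phi**2)     # process noise variance per step
    m, p = mu, sigma**2 / (2 * theta)             # start from the stationary distribution
    ll = 0.0
    for obs in y:
        # predict one step ahead
        m = mu + phi * (m - mu)
        p = phi**2 * p + q
        # update with the noisy observation
        s = p + tau**2
        ll += -0.5 * (np.log(2 * np.pi * s) + (obs - m)**2 / s)
        k = p / s
        m += k * (obs - m)
        p *= (1 - k)
    return ll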
0
0
stat.AP 2026-05-04

Data bound how often physicians beat trial averages

Trust Me, I'm a Doctor?

Nested randomized and observational studies let researchers bound the share of doctors whose strategies match or exceed the best trial arm.

abstract click to expand
Clinical trials usually target average treatment effects, but treatment decisions are made for individuals. This tension motivates a common criticism of evidence-based medicine: a treatment that is beneficial on average may be inappropriate for a particular patient, and skilled physicians may outperform rigid adherence to the strategy that performed best in a randomized trial. We consider how randomized and observational data from the same target population can be used to assess that possibility. Specifically, we study settings in which a randomized trial is nested within an observational cohort, so that outcomes are observed under treatment, control, and usual care. We ask what the observed data can reveal about how often physicians outperform the strategy suggested by the trial. We define a gain score to formalize this comparison and derive sharp bounds on the proportion of physicians whose personal strategies perform at least as well as, or better than, always choosing the better performing treatment from the trial. These results shed light on when clinical data support relying on physician discretion over the trial-average recommendation and when stronger justification is required.
0
0
stat.AP 2026-05-04 2 theorems

Bounds show how often doctors beat trial recommendations

Trust Me, I'm a Doctor?

In nested trial-observational designs, sharp bounds quantify the share of physicians whose strategies match or exceed the trial's better arm

abstract click to expand
Clinical trials usually target average treatment effects, but treatment decisions are made for individuals. This tension motivates a common criticism of evidence-based medicine: a treatment that is beneficial on average may be inappropriate for a particular patient, and skilled physicians may outperform rigid adherence to the strategy that performed best in a randomized trial. We consider how randomized and observational data from the same target population can be used to assess that possibility. Specifically, we study settings in which a randomized trial is nested within an observational cohort, so that outcomes are observed under treatment, control, and usual care. We ask what the observed data can reveal about how often physicians outperform the strategy suggested by the trial. We define a gain score to formalize this comparison and derive sharp bounds on the proportion of physicians whose personal strategies perform at least as well as, or better than, always choosing the better performing treatment from the trial. These results shed light on when clinical data support relying on physician discretion over the trial-average recommendation and when stronger justification is required.
0
0
stat.AP 2026-05-01

Paper demonstrates linked micromaps on U.S. Bureau of Labor Statistics data

Using Linked Micromaps to Explore Complex Structures in Official Statistics

Linked micromaps applied to Bureau of Labor Statistics data illustrate how visual linking of maps and charts can reveal spatial, temporal, and subpopulation patterns.

abstract click to expand
Over the past decade, researchers have focused increasing levels of attention on the use of survey and non-survey data to inform decision-making by multiple stakeholders. Work with such data generally requires extensive exploration before a statistics practitioner focuses on specific steps in model building and inference. For many of the resulting initial exploratory analyses, crucial issues center on the extent to which empirical results may vary over geography and subpopulations. Such information is usually presented in tabular form, which can be difficult for stakeholders and decision makers to understand and to utilize. To address these issues, this paper uses data from the U.S. Bureau of Labor Statistics to illustrate a suite of tools known as linked micromaps. These applications show how linked micromaps can help stakeholders better understand and view descriptive statistics for populations and subpopulations, explore multivariate relationships and ordinal structure, and discover patterns of heterogeneity across time and space. In addition, this paper comments briefly on the prospective use of linked micromaps in model-building and analysis of multiple components of uncertainty.
0
0
stat.AP 2026-05-01

Method finds optimal trial split across zones for hundreds of crop genotypes

Optimal allocation of trials to sub-regions in crop variety testing with multiple years and correlated genotype effects

Kinship-based mixed models plus optimization distribute a fixed budget to improve local performance predictions.

abstract click to expand
Plant breeding and variety trials are usually conducted in multiple environments sampled from a defined target population of environments in order to characterize the performance of breeding lines or varieties. When the population is large and heterogeneous, it may be sub-divided into sub-regions or zones according to administrative and agro-ecological criteria. Analysis then focuses on prediction of performance in the individual sub-regions. By modelling the genotype effect in each sub-region as random, information can be borrowed across sub-regions using best linear unbiased prediction based on a suitable variance-covariance matrix for the genotype-zone effects. Here, we consider the important case where kinship or pedigree information is available for the genotypes under test. This information can be integrated into the variance-covariance matrix for genotype-zone effects. The objective we pursue here is to determine the optimal allocation of a fixed budget of trials to sub-regions. This design problem is solved using a combination of theory and explicit equations on one hand and numerical optimization on the other hand. Our proposed novel approach allows obtaining the optimal allocation when the number of genotypes is in the hundreds, a common setting in large plant breeding programs as well as in variety testing for economically important crops.
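The paper derives its optimum from explicit design equations plus numerical optimization; the sketch below only conveys the flavour of the budget-allocation problem, using a generic greedy rule and a made-up per-zone prediction-variance function.

import numpy as np

def greedy_allocation(var_fun, n_zones, budget):
    """Assign `budget` trials to zones one at a time, each time to the zone
    whose prediction variance drops the most (a generic heuristic, not the
    paper's closed-form/numerical optimum)."""
    n = np.ones(n_zones, dtype=int)          # at least one trial per zone
    for _ in range(budget - n_zones):
        gains = [var_fun(z, n[z]) - var_fun(z, n[z] + 1) for z in range(n_zones)]
        n[np.argmax(gains)] += 1
    return n

# Hypothetical per-zone variance: zones with noisier trials need more of them.
var_fun = lambda z, k: (1.0 + 0.3 * z) / k
print(greedy_allocation(var_fun, n_zones=5, budget=40))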
0
0
stat.AP 2026-05-01

GCC reserving method gains explicit MSEP formula

A Note on the Generalized Cape Cod Reserving Method

Embedding the generalized Cape Cod method in a stochastic model produces a closed-form expression for its mean squared error of prediction, a result previously available only for the chain-ladder and Bornhuetter-Ferguson methods.

Figure from the paper full image
abstract click to expand
Claims reserving is one of the most important actuarial tasks in non-life insurance modeling. There are several popular methods to perform claims reserving such as the chain-ladder (CL), the Bornhuetter--Ferguson (BF) or the generalized Cape Cod (GCC) methods. These methods have originally been introduced as deterministic algorithms, and only in a later step, they have been lifted to stochastic models allowing for analyzing claims prediction uncertainty. This holds true for the CL and the BF methods, but not for the GCC method. The purpose of this article is to close this gap and derive an analytical formula for the mean squared error of prediction (MSEP) of the GCC method.
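For readers unfamiliar with the deterministic algorithm being lifted to a stochastic model, a textbook-style generalized Cape Cod calculation with decay-weighted expected loss ratios is sketched below; the decay constant and inputs are illustrative, and the paper's MSEP formula is not reproduced here.

import numpy as np

def generalized_cape_cod(premium, reported, pct_reported, decay=0.75):
    """Decay-weighted Cape Cod: each accident year gets its own expected
    loss ratio, weighted toward nearby years by decay**|i-j|."""
    n = len(premium)
    used_premium = premium * pct_reported
    ultimates = np.empty(n)
    for i in range(n):
        w = decay ** np.abs(np.arange(n) - i)
        elr_i = np.sum(w * reported) / np.sum(w * used_premium)
        ultimates[i] = reported[i] + premium[i] * elr_i * (1 - pct_reported[i])
    return ultimates

premium = np.array([1000., 1100., 1200., 1300.])
reported = np.array([650., 600., 480., 250.])
pct_reported = np.array([0.95, 0.85, 0.65, 0.35])   # 1 / chain-ladder development factor
print(generalized_cape_cod(premium, reported, pct_reported))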
0
0
stat.AP 2026-05-01

GPS tracks quantify HIV exposure beyond home

Estimating Population Viral Load Contextual Exposure Using GPS-Derived Activity Spaces in Rural South Africa

By combining local viral load maps with movement data, the method tracks how risk shifts as activity spaces expand for young adults.

Figure from the paper full image
abstract click to expand
This article introduces novel methodologies for estimating contextual exposure to HIV population viral load using GPS data. We propose a comprehensive analytical framework comprising (i) local (grid-cell level) estimation of HIV population viral load, (ii) derivation of individual activity spaces from GPS trajectories, and (iii) quantification of contextual exposure to HIV within these activity spaces. We integrate HIV surveillance and sociodemographic survey data with GPS-based mobility data collected in rural KwaZulu-Natal, South Africa, to characterize mobility patterns among young adults aged 20-30 years. Using derived measures of mobility and contextual exposure, we assess whether participants' sex and age systematically influence the magnitude, configuration, and heterogeneity of their mobility patterns. Furthermore, we describe analytical approaches to examine how contextual exposure to HIV evolves as activity spaces extend beyond static residential locations, outlining procedures to identify GPS-tracked participants at elevated risk of HIV acquisition. KEYWORDS: Population viral load exposure; GPS-based mobility analysis; Activity space
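A deliberately simplified version of the exposure quantity in step (iii): a time-weighted average of grid-cell viral load over the cells a participant visits. The paper's activity-space construction is richer, and all values below are hypothetical.

import numpy as np

def contextual_exposure(visited_cells, dwell_times, cell_viral_load):
    """Time-weighted mean of grid-cell viral load over a person's activity space."""
    w = np.asarray(dwell_times, dtype=float)
    w /= w.sum()
    return float(np.sum(w * cell_viral_load[np.asarray(visited_cells)]))

cell_viral_load = np.array([120., 300., 80., 500.])   # hypothetical per-cell estimates
print(contextual_exposure([0, 1, 3], [10.0, 4.0, 1.0], cell_viral_load))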
0
0
stat.AP 2026-04-30

Preference uncertainty induces a distribution over optimal designs

Estimating Decision Uncertainty from Preference Uncertainty: Application to Ground Vehicle Design

A framework samples random preferences to map variability onto the Pareto front and measure decision stability in engineering problems.

Figure from the paper full image
abstract click to expand
Engineering design problems are often modeled as multi-objective optimization tasks in which a scalarized utility function selects an optimal design from the Pareto set. In practice, preferences are imperfectly known, so uncertainty in the preference model leads to uncertainty in the resulting optimal design. This paper proposes a probabilistic framework that treats preference parameters as random variables and examines how preference uncertainty propagates to decision uncertainty. A random preference vector induces a probability distribution over optimal designs, allowing us to identify which regions of the Pareto front are most likely to be selected and to assess recommendation stability under preference variability. To explain the sources of this variability, we apply variance-based global sensitivity analysis to the induced optimal solutions, using Sobol' indices and Shapley values to quantify the contributions of individual design variables and their dependencies. We further summarize the overall dispersion of the optimal-design distribution using the Fr\'echet variance, which provides a scalar measure of decision stability under a given preference model. Two vehicle design case studies demonstrate how problem structure can lead to discrete versus continuous decision distributions and show how the proposed quantities support preference-aware design analysis.
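A small sketch of the propagation step, assuming a finite Pareto set, linear scalarization, and Dirichlet-distributed preference weights (all illustrative choices): random preferences induce a distribution over chosen designs, summarized here by selection probabilities and a Fréchet variance restricted to the Pareto points.

import numpy as np
rng = np.random.default_rng(0)

# Hypothetical Pareto-optimal designs in a 2-objective space (higher is better).
pareto = np.array([[0.9, 0.2], [0.7, 0.5], [0.5, 0.7], [0.2, 0.9]])

# Uncertain preferences: random weight vectors instead of a single fixed one.
weights = rng.dirichlet([4.0, 2.0], size=5000)
chosen = np.argmax(weights @ pareto.T, axis=1)          # utility-maximising design per draw

# Induced distribution over optimal designs ...
probs = np.bincount(chosen, minlength=len(pareto)) / len(chosen)

# ... and its Frechet variance: minimal mean squared distance to a candidate point,
# restricting candidate Frechet means to the Pareto points for simplicity.
sel = pareto[chosen]
frechet_var = min(np.mean(np.sum((sel - c) ** 2, axis=1)) for c in pareto)
print(probs, frechet_var)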
0
0
stat.AP 2026-04-30 1 theorem

Satellite embeddings raise prediction accuracy for malaria and child infections

AlphaEarth Satellite Embeddings for Modelling Climate Sensitive Diseases Towards Global Health Resilience

64-dimensional representations of the Earth's surface increase R-squared values in models across multiple countries and health outcomes.

Figure from the paper full image
abstract click to expand
Malaria, childhood acute respiratory infection, and child undernutrition together account for over two million deaths annually in children under five, with the burden concentrated in low and middle-income countries where climate variability modulates transmission, exposure, and nutritional outcomes. Routine health surveillance in these settings remains sparse and reactive. Satellite-derived representations of the Earth's surface offer a scalable, low-cost complement to traditional covariates, yet their utility as predictors of population health outcomes is poorly characterised. We summarise findings from three studies evaluating AlphaEarth Foundations 64-dimensional satellite embeddings as predictors of population health outcomes, focusing on vulnerable populations. The studies span infectious disease (malaria, respiratory infection) and stunting. In each study, embeddings provide predictive value at sufficient spatial granularity: (i) malaria prediction across Nigeria shows consistent per-region R^2 gains; (ii) childhood acute respiratory infection prediction across 11 DHS countries increases pooled R^2 from 0.157 to 0.206 across three tree-based estimators; (iii) stunting prediction across 35 countries is neutral at country level due to collinearity with fixed effects. The stunting case is currently limited by lack of DHS cluster-level coordinates, which is the next key experiment.
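The evaluation pattern behind these studies can be mimicked in a few lines: compare cross-validated R^2 with and without the embedding block appended to baseline covariates. Everything below is synthetic stand-in data, with a random forest in place of the papers' tree-based estimators.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n = 500
baseline_X = rng.normal(size=(n, 6))        # stand-in climate/socioeconomic covariates
embeddings = rng.normal(size=(n, 64))       # stand-in 64-dimensional satellite embeddings
y = baseline_X[:, 0] + 0.5 * embeddings[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=n)

for X in (baseline_X, np.hstack([baseline_X, embeddings])):
    r2 = cross_val_score(RandomForestRegressor(n_estimators=200), X, y, cv=5, scoring="r2").mean()
    print(X.shape[1], "features, mean R^2 =", round(r2, 3))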
0
0
stat.AP 2026-04-30

Elevated amyloid shortens remaining dementia-free quantiles

Bayesian Nonparametric Causal Inference for Quantile Residual Life: An Application to Alzheimer's Disease

Bayesian nonparametric model applied to ADNI data finds shorter residual survival times for elevated baseline amyloid groups at clinically relevant landmark times.

Figure from the paper full image
abstract click to expand
In Alzheimer's disease research, for individuals who remain dementia-free through a given follow-up time, an important clinical question is how much longer they are likely to remain dementia-free. Quantiles of this remaining time provide clinically interpretable prognostic milestones and can help characterize prognostic heterogeneity across baseline groups. We address this question in the Alzheimer's Disease Neuroimaging Initiative (ADNI), focusing on baseline amyloid status as the exposure. Estimation is challenging because amyloid status is observed rather than randomized, requiring adjustment for confounding, and because time to dementia onset is heterogeneous and heavily right-censored. We estimate causal contrasts in quantile residual life using a Bayesian nonparametric enriched Dirichlet process mixture model for the joint distribution of event times, exposure, and baseline covariates, with inference via Bayesian g-computation. The approach accommodates ignorable missing baseline covariates through data augmentation, supports inference across clinically relevant landmark times, and allows sensitivity analysis for residual unmeasured confounding. Simulation studies show good performance under complex heterogeneity and heavy censoring. In ADNI, elevated baseline amyloid was associated with shorter quantiles of remaining dementia-free time than non-elevated baseline amyloid among individuals who remained dementia-free through relevant landmark times, overall and within baseline diagnostic subgroups.
0
0
stat.AP 2026-04-30

Markov chain fixes rainfall sequencing in bias correction

Improving Bias Correction Methods for Daily Rainfall Using a Markov Chain Approach

State-dependent thresholds and adjustments better match observed wet and dry spell lengths while preserving frequency and intensity accuracy

abstract click to expand
Accurate, localised rainfall information is essential for applications such as agricultural planning, climate risk assessment, and water resources management. Gridded climate products provide rainfall information over large areas but can lack the accuracy needed at local scales, often requiring bias correction before use in local impact studies. Bias correction of daily rainfall is particularly challenging due to its complex characteristics. Local intensity scaling (LOCI) and quantile mapping (QM) are two widely used bias correction methods which adjust both rainfall frequency and intensity, but do not account for the temporal structure of daily rainfall. This can lead to biases in the representation of wet and dry spells. This study proposes integrating a two-state first-order Markov chain directly into existing bias correction methods through state-dependent rain day thresholds and rainfall adjustments, aimed at improving the temporal structure of rainfall. Two implementations of this framework are presented: Markov chain local intensity scaling (MC LOCI) and Markov chain quantile mapping (MC QM). The proposed methods were applied to AgERA5 reanalysis data with rainfall data from five stations in Zimbabwe. Results showed that the Markov chain methods outperformed LOCI and QM by improving the representation of rainfall persistence, onset, and wet and dry spell characteristics, while maintaining improvements in rain day frequency and overall rainfall statistics. These results demonstrate that the proposed methods could be beneficial for applications such as crop simulation, hydrological modelling and other applications which rely on accurate representation of rainfall sequencing.
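A simplified sketch of the MC QM idea: estimate separate quantile-mapping transfer functions for days preceded by a wet day and by a dry day, then apply the one matching the model's previous-day state. Threshold calibration and the LOCI variant are omitted, and paired calibration-period arrays are assumed.

import numpy as np

def quantile_map(x, model_ref, obs_ref):
    """Map model values onto the observed distribution via empirical quantiles."""
    q = np.interp(x, np.sort(model_ref), np.linspace(0, 1, len(model_ref)))
    return np.quantile(obs_ref, q)

def mc_quantile_map(model, model_ref, obs_ref, wet_threshold=1.0):
    """State-dependent QM: days preceded by a wet day and by a dry day are
    corrected with separate transfer functions (a simplified sketch of MC QM)."""
    prev_wet_ref_m = np.concatenate([[False], model_ref[:-1] >= wet_threshold])
    prev_wet_ref_o = np.concatenate([[False], obs_ref[:-1] >= wet_threshold])
    prev_wet = np.concatenate([[False], model[:-1] >= wet_threshold])
    out = np.empty(len(model), dtype=float)
    for state in (True, False):
        ref_m = model_ref[prev_wet_ref_m == state]
        ref_o = obs_ref[prev_wet_ref_o == state]
        idx = prev_wet == state
        out[idx] = quantile_map(model[idx], ref_m, ref_o)
    return out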
0
0
stat.AP 2026-04-30

New model attributes heatwaves as space-time events

A spatio-temporal statistical framework for heatwave attribution under climate change

By separating warming trends from extreme clustering, the framework estimates how human influence changes heatwave probability and duration.

abstract click to expand
We develop a unified statistical framework for attributing heatwaves as spatio-temporal phenomena under climate change. We quantify the impact of anthropogenic forcing on the probability and persistence of heatwaves not captured by standard marginal extreme-value approaches. Our methodology constructs a generative model for daily temperature fields that separates marginal nonstationarity from spatio-temporal dependence. We combine three components: a Bayesian spatial quantile regression model for the bulk of the data; a nonstationary spatial generalized extreme value model for tail behavior; and a copula-based model capturing both asymptotic dependence and independence in the extremes. The framework is applied to the CMIP6 MRI-ESM2 climate model, contrasting factual and counterfactual scenarios for probabilistic attribution. Our results show that the approach captures key heatwave characteristics inaccessible to traditional methods, enabling direct estimation of event-level attribution metrics. Overall, it provides a flexible basis for analyzing and attributing complex climate extremes as space-time objects.
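The full framework combines bulk, tail, and dependence models; the sketch below shows only the simplest marginal attribution quantity it feeds into, a probability ratio for exceeding a threshold computed from GEV fits to factual and counterfactual annual maxima (toy data).

import numpy as np
from scipy.stats import genextreme

def probability_ratio(maxima_factual, maxima_counterfactual, threshold):
    """PR = P_factual(annual max > threshold) / P_counterfactual(annual max > threshold)."""
    p1 = genextreme.sf(threshold, *genextreme.fit(maxima_factual))
    p0 = genextreme.sf(threshold, *genextreme.fit(maxima_counterfactual))
    return p1 / p0

rng = np.random.default_rng(1)
fact = rng.gumbel(loc=36.0, scale=1.5, size=60)      # toy annual maxima (deg C)
cfact = rng.gumbel(loc=34.5, scale=1.5, size=60)
print(probability_ratio(fact, cfact, threshold=38.0))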
0
0
stat.AP 2026-04-30

Binary verdicts fail to separate replicable from non-replicable science

The Difference Between "Replicable" and "Not replicable" is not Itself Scientifically Replicable

Heterogeneity in non-exact replications creates unidentifiable uncertainty, so standard data cannot tell high-replicability sequences from low-replicability ones.

Figure from the paper full image
abstract click to expand
Replication studies estimate the replicability rate of scientific results by aggregating binary verdicts of experiments. Exact replications are rarely attainable, so most replication sequences are non-exact. Experiments differ in ways that matter and do not share a single data-generating process. We formalize two statistical interpretations of non-exactness. In a shared latent rate (benchmark) model, experiments are exchangeable and depend on a common random replicability rate. In a conditionally independent rates (operational) model, each experiment has its own replicability rate drawn from a population distribution. Under the benchmark model, even small variability among replicability rates induces an irreducible variance floor on the estimated mean replicability rate that no amount of replication can eliminate. Under the operational model, the degree of non-exactness is not identifiable from standard replication data, because one binary verdict per experiment carries no information about between-experiment heterogeneity. Researchers cannot tell which precision regime they are in or whether high- and low-replicability sequences can be distinguished in principle. The usual data structure cannot support reliable demarcation between "replicable" and "not replicable" results and systematically understates uncertainty, making high- and low-replicability sequences appear discriminable when they are not. We show how common sources of heterogeneity amplify these problems and demonstrate practical consequences in a reanalysis of Many Labs 4. Aggregating replicability rates across heterogeneous literatures produces averages that conflate incommensurable regimes and lack a stable interpretation. Replicability rate is not a reliable demarcation criterion. The replication crisis, if there is one, cannot be established by the methods used to declare it.
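The benchmark-model variance floor is easy to see in simulation: with one binary verdict per experiment and a shared latent rate, the variance of the estimated mean replicability rate stops shrinking at Var(R) no matter how many replications are added. The Beta parameters below are arbitrary.

import numpy as np
rng = np.random.default_rng(2)

a, b = 8.0, 2.0                                   # latent replicability rate R ~ Beta(8, 2)
var_R = a * b / ((a + b) ** 2 * (a + b + 1))      # the irreducible floor

for n in (5, 20, 100, 1000):
    R = rng.beta(a, b, size=20000)                # one shared rate per replication sequence
    mean_verdict = rng.binomial(n, R) / n         # mean of n binary verdicts per sequence
    print(n, round(mean_verdict.var(), 4), ">= floor", round(var_R, 4))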
0
0
stat.AP 2026-04-30

Four macro factors price G20 equity returns better than CAPM

Pricing Global Macroeconomic Risk in Equity Markets: Evidence from Selected G20 Economies

Dynamic factor model on inflation, activity, and policy variables yields significant loadings and higher explanatory power for ten countries

abstract click to expand
This study investigates whether international equity markets systematically price global macroeconomic risks. The empirical analysis is conducted using monthly excess returns for ten G20 countries over the period 2000-2024. A Dynamic Factor Model (DFM) is employed to extract latent global factors from a set of macroeconomic variables capturing global inflation, real activity, monetary policy, term structure, exchange rates, volatility, and oil prices. The model selection criteria of the dynamic factor framework support a parsimonious 3-factor specification. Fama-MacBeth regressions demonstrate the low explanatory power of the 3-factor model. In contrast, a 4-factor specification results in economically large and statistically significant factor loadings, a clear rise in explanatory power, and a significant improvement in model performance. The results indicate that the four-factor specification provides the best balance between explanatory power and model stability, significantly improving the ability to explain cross-sectional variation in excess returns, with all factors statistically significant. The Capital Asset Pricing Model, while offering a parsimonious and stable benchmark with consistently significant market betas, exhibits limited explanatory power due to its single-factor structure. Overall, the findings suggest that macro-driven latent factors extracted through the DFM provide a more comprehensive and empirically robust framework for international asset pricing than the CAPM, highlighting the importance of incorporating multiple sources of systematic risk in explaining cross-country equity returns.
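For reference, the second-pass machinery named in the abstract: a plain two-pass Fama-MacBeth estimator (time-series betas, then per-period cross-sectional regressions). Factor construction via the DFM is not shown, and the inputs are assumed to be plain NumPy arrays.

import numpy as np

def fama_macbeth(excess_returns, factors):
    """Two-pass Fama-MacBeth: estimate factor betas per country from time-series
    regressions, then regress returns on betas each period; report mean lambdas
    (intercept plus K risk premia) and their Fama-MacBeth standard errors."""
    T, N = excess_returns.shape
    X = np.column_stack([np.ones(T), factors])
    betas = np.linalg.lstsq(X, excess_returns, rcond=None)[0][1:].T   # N x K loadings
    Z = np.column_stack([np.ones(N), betas])
    lambdas = np.array([np.linalg.lstsq(Z, excess_returns[t], rcond=None)[0]
                        for t in range(T)])
    return lambdas.mean(axis=0), lambdas.std(axis=0) / np.sqrt(T)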
0
0
stat.AP 2026-04-29

Adaptive networks let one MCMC sampler handle many similar structural updates

Adaptive Meta-Learning Stochastic Gradient Hamiltonian Monte Carlo Simulation for Bayesian Updating of Structural Dynamic Models

The method trains once and transfers to new Bayesian model updating tasks on the same class of structures without retraining.

abstract click to expand
In the last few decades, Markov chain Monte Carlo (MCMC) methods have been widely applied to Bayesian updating of structural dynamic models in the field of structural health monitoring. Recently, several MCMC algorithms have been developed that incorporate neural networks to enhance their performance for specific Bayesian model updating problems. However, a common challenge with these approaches lies in the fact that the embedded neural networks often necessitate retraining when faced with new tasks, a process that is time-consuming and significantly undermines the competitiveness of these methods. This paper introduces a newly developed adaptive meta-learning stochastic gradient Hamiltonian Monte Carlo (AM-SGHMC) algorithm. The idea behind AM-SGHMC is to optimize the sampling strategy by training adaptive neural networks, and due to the adaptive design of the network inputs and outputs, the trained sampler can be directly applied to various Bayesian updating problems of the same type of structure without further training, thereby achieving meta-learning. Additionally, practical issues for the feasibility of the AM-SGHMC algorithm for structural dynamic model updating are addressed, and two examples involving Bayesian updating of multi-story building models with different model fidelity are used to demonstrate the effectiveness and generalization ability of the proposed method.
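For context, the plain stochastic-gradient HMC kernel that the meta-learned sampler builds on, in the usual friction-plus-injected-noise discretization; the adaptive neural-network components of AM-SGHMC are not represented.

import numpy as np

def sghmc_step(theta, v, grad_U, lr=1e-3, friction=0.05, rng=np.random.default_rng()):
    """One stochastic-gradient HMC update: momentum damped by friction,
    driven by a (possibly noisy) gradient of the potential, plus injected noise."""
    noise = rng.normal(0.0, np.sqrt(2.0 * friction * lr), size=np.shape(theta))
    v = v - lr * grad_U(theta) - friction * v + noise
    theta = theta + v
    return theta, v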
0
0
stat.AP 2026-04-29

Satellite ammonia data sharpens farm carbon estimates in Po Valley

On the use of satellite information to estimate agricultural carbon footprint in a small area framework

Model integrates Earth observation to improve precision at agrarian subregion scale while reducing need for bulky auxiliary datasets.

Figure from the paper full image
abstract click to expand
The agricultural sector is undergoing rapid change due to climate pressures, demographic shifts, and uneven economic development, increasing the demand for reliable environmental indicators at fine spatial scales. However, limited data availability often constrains subregional analyses. This study develops a model-based framework for producing reliable small-area estimates for assessing the agricultural carbon footprint in the Po Valley (Northern Italy), a region characterized by intensive livestock farming and high environmental pressure. We integrate survey, census, and satellite-derived emission data into a unified framework and produce estimates at the level of Agrarian Subregions, defined as agriculturally homogeneous municipalities by the Italian National Institute of Statistics. Satellite-based ammonia emission data are incorporated as auxiliary covariates to improve precision and spatial coherence. A key methodological contribution is the treatment of spatial misalignment between gridded satellite data and administrative boundaries. This issue is addressed through a geostatistical upscaling procedure combined with a parametric bootstrap that propagates uncertainty from the covariate construction stage to the final small-area estimates. The results show that satellite-derived information substantially improves the accuracy and stability of carbon footprint estimates while reducing reliance on large, heterogeneous auxiliary datasets, illustrating the potential of Earth observation data in model-based environmental statistics.
0
0
stat.AP 2026-04-28

Modified tempering explores multimodal HMM posteriors in whale data

Bayesian inference for hidden Markov models under genuine multimodality with application to ecological time series

New priors and algorithm tweaks allow full sampling when inferring blue whale dive behaviors under sound stimuli.

Figure from the paper full image
abstract click to expand
Bayesian inference in hidden Markov models (HMMs) can be challenging due to the presence of multimodality in the likelihood function, and consequently in the joint posterior distribution, even after correcting for label switching. The parallel tempering (PT) algorithm, a state-space augmentation method, is a widely used approach for dealing with multimodal distributions. Nevertheless, standard implementation of the PT algorithm may not always be sufficient to effectively explore the high-dimensional, complex multimodal posterior distributions that arise in HMMs. In this work, we demonstrate common pitfalls when implementing the PT algorithm for HMMs, approaches to remedy them, and introduce new non-informative prior distributions that facilitate effective posterior distribution exploration. We analyse time series of blue whale dive data with two 3-state HMMs in a Bayesian framework, one of which includes a categorical covariate in the transition probability matrix to account for the effect of sound stimuli on the whale's behavior. We demonstrate how effective implementation of the modified PT algorithm for Bayesian inference leads to effective exploration of the resultant multimodal posterior distribution and how that affects inference for the underlying movement patterns of the blue whales.
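The paper's contribution lies in the priors and the modifications to the tempering scheme; the sketch below shows only the standard between-chain swap move that any parallel-tempering implementation for an HMM posterior would include.

import numpy as np

def pt_swap(states, log_posts, betas, rng=np.random.default_rng()):
    """Propose swaps between adjacent temperatures; accept with the usual
    Metropolis ratio exp((beta_i - beta_j) * (logpi_j - logpi_i))."""
    for i in range(len(betas) - 1):
        j = i + 1
        log_alpha = (betas[i] - betas[j]) * (log_posts[j] - log_posts[i])
        if np.log(rng.uniform()) < log_alpha:
            states[i], states[j] = states[j], states[i]
            log_posts[i], log_posts[j] = log_posts[j], log_posts[i]
    return states, log_posts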
0
0
stat.AP 2026-04-28

Hawkes model with pauses matches LOB volatility slopes

Extended State-dependent Hawkes Process for Limit Order Books: Mathematical Foundation and the Reproduction of Volatility Signature Plots

Physical geometry enforcement lets the process reproduce upward slopes in signature plots by modeling local super-criticality triggered by marketable limit orders.

Figure from the paper full image
abstract click to expand
This paper proposes an Extended State-Dependent Hawkes Process (ExsdHawkes) to model the intricate dynamics of Limit Order Books (LOBs). Our theoretical contribution lies in relaxing traditional constraints by allowing for state disappearances -- a phenomenon frequently observed in high-frequency trading. We mathematically prove, using Karush--Kuhn--Tucker (KKT) conditions, that the maximum likelihood estimation remains separable, justifying an efficient two-step procedure. In the empirical section, we apply our model to three months of high-frequency tick data of Mitsubishi UFJ Financial Group (8306). We demonstrate that ExsdHawkes uniquely reproduces the volatility signature plot's characteristic upward slope by capturing the "local super-criticality" triggered during disequilibrium states. Crucially, we identify Marketable Limit Orders (MLO) as the primary catalyst that forces the LOB into these unstable states. Comparative analysis reveals that models lacking physical constraints (e.g., standard SD-Hawkes) suffer from explosive branching ratios and fail to maintain simulation stability. Our findings suggest that physical consistency is not merely a mathematical nicety, but a prerequisite for accurately modeling macro-level volatility. By enforcing the physical geometry to `pause' the residual accumulation during inadmissible periods, ExsdHawkes uniquely maintains statistical integrity where unconstrained models succumb to structural bias.
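The state dependence and the pause mechanism are beyond a short sketch; shown here is only the base exponential-kernel Hawkes intensity and the branching ratio whose explosion the paper diagnoses in unconstrained models.

import numpy as np

def hawkes_intensity(t, event_times, mu, alpha, beta):
    """lambda(t) = mu + sum over past events of alpha * exp(-beta * (t - t_i))."""
    past = event_times[event_times < t]
    return mu + np.sum(alpha * np.exp(-beta * (t - past)))

branching_ratio = lambda alpha, beta: alpha / beta   # >= 1 signals an explosive (super-critical) process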
0
0
stat.AP 2026-04-27

Kernel model links plaque size to neighboring cell genes in Alzheimer tissue

Sparse Reduced-rank Regression Methods for Spatially Misaligned Data with Application to Spatial Transcriptomics

The method weights spatial transcriptomics by proximity, selects genes automatically, and borrows strength across cell types while handling spatially misaligned data.

Figure from the paper full image
abstract click to expand
Understanding the spatiotemporal dynamics of disease progression in relation to transcriptomic profiles provides key insights into complex conditions such as Alzheimer disease. To enable such investigations, STARmap PLUS technology offers joint profiling of high-resolution spatial transcriptomics and protein detection within the same tissue section. Motivated by data from Zeng et al. (2023), we develop a novel kernel-weighted regression framework that models plaque size as a collective effect of the spatial transcriptomics of neighboring cells, automatically integrating across cell types and tissue samples from different disease states. To further strengthen interpretability and efficiency, we incorporate a sparse low-rank factorization that enables gene selection while borrowing strength across genes, cell types, and time points. The proposed approach is implemented in a fully automated manner with data-driven specification of key model components. Through simulation studies, we demonstrate the robustness of the proposed method and its superiority across a range of specification scenarios. Applied to Alzheimer disease data, the proposed framework uncovers biologically meaningful associations, highlighting its potential for advancing the understanding of disease mechanisms.
0
0
stat.AP 2026-04-27

Basic temperature model predicts spring timing changes exactly

How temperature regimes near the equinox synchronize spring biological events

Stopped random walk theory shows why events lose coordination as they move later under warming, matching lilac data without extra factors.

Figure from the paper full image
abstract click to expand
Many biological processes, including plant leafout and flowering, occur once cumulative temperatures reach a threshold (the thermal-sum model). In this way, temperatures are thought to coordinate the timing of biological events. But growing evidence suggests that as climates warm, both the advancement of spring has slowed (declining sensitivity) and the variance in the timing of spring events has increased (declining synchrony), raising questions about the resilience of temperature-based coordination to anthropogenic climate change. To answer these questions, researchers have complicated the thermal-sum model, introducing additional factors and mechanisms. We consider whether such complexity is necessary. Using results from the theory of stopped random walks, we show that sensitivity and synchrony are exactly as predicted by the basic thermal-sum model. The theory suggests a nonlinear relationship between temperatures and both the timing and synchrony of biological events. In particular, it predicts that as temperatures increase and springtime events shift from the equinox toward the solstice, the events themselves become less coordinated and more variable. We verify these predictions using experimental and real-world data, including 10,000 observations of common lilacs (United States, 1956-2025). We conclude that the theory provides a powerful tool for understanding the thermal-sum model, particularly when considering additional complexity.
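A toy stopped-random-walk simulation of the thermal-sum model: the event occurs on the first day the cumulative daily thermal contribution crosses a threshold. A flat temperature regime and made-up parameters are used for simplicity; the paper's predictions concern how the mean and spread of this crossing day behave under realistic seasonal temperature profiles.

import numpy as np
rng = np.random.default_rng(3)

def event_day(mean_temp, threshold=200.0, sd=3.0, n_sims=5000):
    """Day on which the cumulative daily thermal contribution first exceeds the thermal sum."""
    days = np.empty(n_sims)
    for s in range(n_sims):
        total, d = 0.0, 0
        while total < threshold:
            d += 1
            total += max(rng.normal(mean_temp, sd), 0.0)
        days[s] = d
    return days

for temp in (8.0, 10.0, 12.0):                 # hypothetical mean spring temperatures
    d = event_day(temp)
    print(temp, round(d.mean(), 1), round(d.std(), 2))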
0
0
stat.AP 2026-04-27

Latent model quantifies human influence on US heat extremes

Estimating Causal Attribution of Anthropogenic Forcing on High-Temperature Extremes Using a Latent Gaussian Spatial Model

Comparing factual and counterfactual climate runs yields maps of credible causal effects on annual temperature maxima.

Figure from the paper full image
abstract click to expand
Climate change has become a significant global concern due to its capacity to cause substantial disruption to daily life by increasing the frequency and intensity of extreme weather events. Given the rising trend of human interventions in the climate system over recent decades, this study aims to quantify the relative contribution of anthropogenic forcing to the increasing likelihood of climate extremes, with a particular emphasis on high-temperature extremes. Our analysis focuses on annual temperature maxima from the IPSL-CM6A model in the CMIP6 experiment. We propose a novel causal inference framework that focuses on differences in return levels derived from annual temperature maxima between the factual and counterfactual worlds. While jointly modeling the annual maxima from the two worlds using a bivariate generalized extreme value distribution, we model the spatially-varying coefficients using a latent Gaussian framework. Specifically, given that the data are available over a $1^\circ \times 1^\circ$ grid, we employ the multivariate intrinsic conditional autoregressive model for the latent layer in the proposed hierarchical model, ensuring proper posterior distributions. We implement a recently developed highly-efficient approximate Bayesian inference technique, `Max-and-smooth', that uses a Laplace approximation of the likelihood and then performs Gibbs sampling based on the approximate posterior. The results include posterior estimates of the causal effect of anthropogenic forcing on high-temperature extremes, along with the trends in this effect, over the factual world. Furthermore, we estimate credible regions for a significant causal effect to facilitate hotspot detection across the mainland United States.
0
0
stat.AP 2026-04-27

Adaptive rotations make MCMC samplers work on any structural model

MCMC with Adaptive Principal-Component Transformation: Rotation-Invariant Universal Samplers for Bayesian Structural System Identification

The method unifies translation, scale and rotation invariances so training on simple tasks transfers zero-shot to new models.

abstract click to expand
Over decades, Markov chain Monte Carlo (MCMC) methods have been widely studied, with a typical application being the quantification of posterior uncertainties in Bayesian system identification of structural dynamic models. To address the issue of excessively low sampling efficiency in generic MCMC methods when applied to specific problems, researchers developed several MCMC algorithms that integrate trainable neural networks to replace and enhance their critical components. Later, meta-learning MCMC methods emerged to reduce training time. However, they require considerable similarity between test and training tasks, while their sampling efficiency is constrained by trade-off-simplified network designs. This paper proposes the Adaptive Principal-Component (PC) Meta-learning Stochastic Gradient Hamiltonian Monte Carlo (APM-SGHMC) algorithm. It adaptively rotates coordinate axes in the parameter space to align with the PC directions of the current posterior samples, ensuring rotation-invariance of sampling performance with respect to the posterior distribution. By incorporating translation-invariance, scale-invariance, and rotation-invariance in a unified framework, APM-SGHMC enables universal samplers to acquire generalizable knowledge across diverse Bayesian system identification tasks using minimalistic tasks while eliminating the constraints imposed by network design trade-offs on sampling efficiency. Practical feasibility issues are also addressed. Two Bayesian system identification case studies demonstrate its effectiveness and universality: our method overcomes the case-by-case limitations of traditional data-driven approaches, achieving zero-shot generalization across structurally distinct models without retraining and maintaining consistent superior performance across all scenarios.
0
0
stat.AP 2026-04-27

Embeddings show how Beatles songwriting evolved

Come Together: Analyzing Popular Songs Through Statistical Embeddings

Logistic PCA converts chords, notes and contours into vectors that cluster by album and track style shifts from 1962 to 1966

Figure from the paper full image
abstract click to expand
Statistical modeling of popular music presents a unique challenge due to the complexity of song structures, which cannot be easily analyzed using conventional statistical tools. However, recent advances in data science have shown that converting non-standard data objects into real vector-valued embeddings enables meaningful statistical analysis. In this work, we demonstrate an approach based on logistic principal component analysis to construct embeddings from global song features, allowing for standard multivariate analysis. We apply this method to a corpus of Lennon and McCartney songs from 1962-1966, using embeddings derived from chords, melodic notes, chord and pitch transitions, and melodic contours. Our analysis explores how these song embeddings cluster by Beatles album, how songwriting styles evolved over time, and whether Lennon and McCartney's compositions exhibited convergence or divergence. This embedding-based approach offers a powerful framework for statistically examining musical structure and stylistic development in popular music.
0
0
stat.AP 2026-04-27

Bilinear model improves forecasts of extreme events in aircraft production

Multi-output Extreme Spatial Model for Complex Aircraft Production Systems

By modeling control variables and measurement locations as separate spatial domains, it captures tail dependencies that standard methods miss.

abstract click to expand
Problem definition: Data-driven models in machine learning have enabled efficient management of production systems. However, a majority of machine learning models are devoted to modeling the mean response or average pattern, which is inappropriate for studying abnormal extreme events that are often of primary interest in aircraft manufacturing. Since extreme events from heavy-tailed distributions give rise to prohibitive expenditures in system management, sophisticated extreme models are urgently needed to analyze complex extreme risks. Engineering applications of extreme models usually focus on individual extreme events, which is insufficient for complex systems with correlations. Methodology/results: We introduce an extreme spatial model for multi-output response control systems that efficiently captures the dynamics using a bilinear function on two spatial domains for control variables and measurement locations. Marginal parameter modeling and extremal dependence have been investigated. In addition, an efficient graph-assisted composite likelihood estimation and corresponding computational algorithms are developed to cope with high-dimensional outputs. The application to composite aircraft production shows that the proposed model enables comprehensive analyses with superior predictive performance on extreme events compared to canonical methods. Managerial implications: Our method shows how to use an extreme spatial model for predicting extreme events and managing extreme risks in complex production systems such as aircraft. This can help achieve better quality management and operation safety in aircraft production systems and beyond.
0
0
stat.AP 2026-04-27

Dual-thresholding boosts short CNA detection accuracy in noisy data

Tail-Greedy Unbalanced Haar Wavelet Segmentation for Copy Number Alteration Data

Tail-greedy unbalanced Haar method achieves higher true positives and lower false positives than CBS, HaarSeg, and FDRSeg for copy number alteration detection.

Figure from the paper full image
abstract click to expand
Detecting copy number alterations (CNAs) from next-generation sequencing data remains challenging, particularly for short segments under noisy conditions. Existing segmentation methods often suffer from high false positive rates or fail to reliably detect short aberrations, especially in low-coverage data. In this study, we propose a modified tail-greedy unbalanced Haar (TGUHm) method that introduces a dual-thresholding strategy to improve segmentation accuracy. The proposed approach effectively suppresses spurious spikes while preserving sensitivity to both short and long CNA segments. Extensive simulation studies under Gaussian and heavy-tailed noise demonstrate that TGUHm consistently achieves higher true positive rates and lower false positive rates compared to state-of-the-art methods, including CBS, HaarSeg, and FDRSeg. In particular, the proposed method improves detection accuracy for short segments while maintaining competitive overall performance. Application to real cancer genomic data further confirms the practical utility of the method, revealing biologically meaningful CNAs associated with known cancer-related genes. These results suggest that TGUHm provides a robust and effective framework for CNA detection in challenging sequencing settings.
0
0
stat.AP 2026-04-27

Latent-space method speeds Bayesian updating of building FE models

Finite element model updating of building structures under seismic excitation: A parallelized latent space-based Bayesian framework

Projects seismic response data to low dimensions for fast GPU-parallel sampling and reliable uncertainty estimates even with sparse observations and complex posteriors.

Figure from the paper full image
abstract click to expand
Enhancing seismic fragility and risk assessment of nuclear power plants relies on accurate prediction of reactor building responses to seismic hazards, which can be further improved through dynamic analysis of high-fidelity finite element (FE) models. However, FE models often exhibit non-negligible discrepancies from actual structures due to various sources of uncertainty, necessitating FE model updating with rigorous quantification of associated uncertainties. This paper presents a GPU-accelerated latent space--based Bayesian framework for FE model updating of building structures. In the proposed framework, high-dimensional structural response data (e.g., time histories or frequency response functions) are projected into a low-dimensional latent space using a multimodal variational autoencoder (MVAE), thereby enabling efficient and tractable likelihood evaluation without explicit modeling in the original observation space. Once trained, the surrogate enables amortized inference, allowing posterior sampling to be performed without additional simulator evaluations. We specifically employ a sequential Monte Carlo (SMC) sampler, whose population-based formulation allows parallel evaluation of the approximate likelihood on GPUs, resulting in computational efficiency and robustness against multimodal and complex posterior distributions. The proposed framework is validated through both numerical benchmarking and experimental data from a shaking table test of a reinforced concrete building structure. The results demonstrate that the method accurately estimates structural parameters with well-quantified uncertainties, while achieving fast and efficient inference through GPU-based parallelization, and enabling robust inference even in the presence of sparse observations that induce multimodal and highly complex posterior distributions.
0
0
stat.AP 2026-04-27

Specific-source feature LR systems top performance but are hardest to implement

From specific-source feature-based to common-source score-based likelihood-ratio systems: ranking the stars

Common-source score-based systems are the easiest to build yet the lowest performing; the common-source feature-based class balances performance and practical feasibility well.

abstract click to expand
This paper studies expected performance and practical feasibility of the most commonly used classes of source-level likelihood-ratio (LR) systems when applied to a trace-reference comparison problem. The paper compares performance of these classes of LR systems (used to update prior odds) to each other and to the use of prior odds only, using strictly proper scoring rules as performance measures. It also explores practical feasibility of the classes of LR systems. The present analysis allows for a ranking of these classes of LR systems: from specific-source feature-based to common-source anchored or non-anchored score-based. A trade-off between performance and practical feasibility is observed, meaning that the best performing class of LR systems is the hardest to realise in practice, while the least performing class is the easiest to realise in practice. The other classes of LR systems are in between the two extremes. The one positive exception is a common-source feature-based LR system, with good performance and relatively low experimental demands. The paper also argues against the claim that some classes of LR systems should not be used, by showing that all systems have merit (when updating prior odds) over just using the prior odds (i.e. not using the LR system).
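The bookkeeping common to all the compared classes is small: update the prior odds with a system's LR and score the resulting posterior probabilities with a strictly proper rule (Brier score here, with toy numbers); the paper's ranking comes from doing this carefully across realistic validation designs.

import numpy as np

def posterior_prob(prior_odds, lr):
    """Posterior probability of the same-source hypothesis after an LR update."""
    post_odds = prior_odds * lr
    return post_odds / (1.0 + post_odds)

def brier_score(probs, truths):
    """Strictly proper scoring rule: mean squared difference from the truth (lower is better)."""
    return np.mean((np.asarray(probs) - np.asarray(truths)) ** 2)

# Toy comparison: an informative LR system versus using the prior odds only.
truths = np.array([1, 1, 0, 1, 0, 0])
lrs = np.array([8.0, 3.0, 0.2, 15.0, 0.5, 0.1])
prior_odds = 1.0
print(brier_score(posterior_prob(prior_odds, lrs), truths))               # with the LR system
print(brier_score(np.full(6, posterior_prob(prior_odds, 1.0)), truths))   # prior odds only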
0
0
stat.AP 2026-04-24

Activity pattern changes modeled as deformations track physical function

Modeling Physical Activity Change as Smooth Transformations: Temporal and Amplitude Patterns Associated with Physical Function in Older Women

Riemannian analysis of accelerometer curves finds a leading mode of daily activity increase tied to better function in older women, beyond standard summary metrics.

abstract click to expand
Background: Minute-level accelerometer data capture rich diurnal physical activity (PA) patterns, but conventional summary metrics obscure clinically meaningful changes accumulated across a day. Building on a Riemannian framework, we integrate multivariate functional principal component analysis (MFPCA) to identify main modes of PA change in older women and examine associations with physical function (PF). Method: A subset of participants from OPACH at baseline and two WHISH follow-ups (W1, W2) yielded 3 accelerometer measurements; each participant's diurnal PA at each visit was represented as a smooth curve. Change between consecutive visits (defined as periods: baseline-W1, W1-W2) was modeled as a Riemannian deformation (RD) jointly capturing changes in PA timing and magnitude. Deformations were parameterized by initial momenta and summarized using MFPCA; participant-level changes were characterized by principal component (PC) scores and deformation energy (DE), a metric of overall pattern change. Associations with PF were assessed using linear mixed models. Results: Mean deformation in both periods showed overall downward shifts in PA magnitude with temporal redistribution between 10am and 7pm. The top 15 PCs explained >= 90% of variability in both periods; PC1 represented a pattern of PA increase/decrease throughout the day, explaining 22.4% (baseline-W1) and 20.8% (W1-W2). Among participants with complete data (N=1157), an increase in PA in the mode of PC1 was positively associated with PF (p < 0.0001). The interaction between DE and period was significantly associated with PF (p=0.003). Conclusions: Modeling longitudinal PA change as RDs and summarizing variability via MFPCA produced clinically interpretable phenotypes of diurnal PA change beyond standard metrics. The leading deformation mode was significantly associated with PF, and DE showed a stronger association with PF in the later period.
0
0
stat.AP 2026-04-24

Bayesian model detects shared neural responses in fMRI with uncertainty

Bayesian Sparsity Modeling of Shared Neural Response in Functional Magnetic Resonance Imaging Data

Sparse Gaussian process estimation plus horseshoe prior yields better activation maps and response estimates than intersubject correlation.

Figure from the paper full image
abstract click to expand
Detecting shared neural activity from functional magnetic resonance imaging (fMRI) across individuals exposed to the same stimulus can reveal synchronous brain responses, functional roles of regions, and potential clinical biomarkers. Intersubject correlation (ISC) is the main method for identifying voxelwise shared responses and per-subject variability, but it relies on heavy data summarization and thousands of regional tests, leading to poor uncertainty quantification and multiple testing issues. ISC also does not directly estimate a shared neural response (SNR) function. We propose a model-based alternative applicable to both task-based and naturalistic fMRI that simultaneously identifies spatial regions of shared activity and estimates the SNR function. The model combines sparse Gaussian process estimation of the response function with a Bayesian sparsity prior inspired by the horseshoe prior to detect voxel activation. A spatially structured extension encourages neighboring voxels to exhibit similar activation patterns. We examine the model's properties, evaluate performance via simulations, and analyze two real-world fMRI datasets, including one task-based and one naturalistic dataset. The Bayesian framework provides principled uncertainty quantification for the shared response function and shows improved activation detection and response estimation compared to standard approaches. Model fits demonstrate comparable or superior performance relative to ISC, while the framework opens avenues for clinical applications.
0
0
stat.AP 2026-04-24

Concurrent floods and droughts become more likely in the Upper Danube by 2100

Exploring climate change effects on concurrent floods and concurrent droughts via statistical deep learning

Statistical deep learning shows the rise stems largely from stronger links between extremes across catchments.

Figure from the paper full image
abstract click to expand
Concurrent floods and concurrent droughts in nearby catchments pose challenges to risk assessment and water management. Climate change is affecting extremely high and low discharge, but the complex interplay between changes in individual catchments and in the dependence across catchments make it difficult to provide accurate assessments of the occurrence probabilities of concurrent extremes. In this work, we use a contemporary statistical deep learning model (the deep SPAR framework) to capture concurrent river floods and droughts in four catchments in the Upper Danube basin, based on discharge simulated by a hydrological model driven with large ensemble climate model output. The statistical model is able to accurately capture the multivariate extremes of the simulated discharge, which we assess by making use of the large available sample size. We subsequently use our statistical model to study changes in joint tail behaviour of discharge over time, finding that both compound flooding and drought-like conditions are becoming increasingly likely towards the end of the 21st century under a high-emission scenario. In particular, our results highlight that changes in the dependence structure of extremes strongly contribute to the detected changes, an aspect that would be difficult to capture with traditional approaches. This work paves the way for highly flexible, general inference on compound extremes in hydrological applications, and demonstrates key advantages of using statistical deep learning in this setting.
0
0
stat.AP 2026-04-24

Boundary conditions fix optimal basis risk weight for expectile insurance

Optimal basis risk weighting in expectile-based parametric insurance

Utility maximization yields existence and uniqueness rules, plus simulation results on premium loading and risk aversion for hurricane cover

Figure from the paper full image
abstract click to expand
Parametric insurance contracts translate index measurements to compensation for policyholders' losses using predefined payment schemes. These need to be designed carefully to keep basis risk, i.e. the disparity between payouts and true damages, small. Previous research has motivated the use of conditional expectiles as payment schemes, whose compensation is impacted by the policyholder's potentially unknown attitude towards basis risk. To alleviate this model uncertainty and to investigate the impact of (hidden) influencing factors, we characterize existence and uniqueness of the optimal basis risk weighting in a utility-maximization framework through a set of boundary conditions. In the absence of an optimal solution, we provide comparisons to the utility of no insurance and full indemnity coverage. We establish a link between location-scale distributions and separability of conditional expectiles' derivatives, thus improving the understanding of these statistical functionals. A simulation study on parametric hurricane insurance visualizes our results, investigates the influence of premium loading and risk aversion on the optimal weighting, and comments on the challenge of (spatial) loss dependence.
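The statistical building block is the conditional expectile; a sample version can be computed by asymmetric least squares as below. The asymmetry level tau is illustrative and is not itself the paper's basis-risk weighting parameter.

import numpy as np

def expectile(y, tau, tol=1e-10, max_iter=200):
    """tau-expectile of a sample via asymmetric least squares:
    minimises sum_i |tau - 1{y_i <= e}| * (y_i - e)^2."""
    e = np.mean(y)
    for _ in range(max_iter):
        w = np.where(y > e, tau, 1.0 - tau)
        e_new = np.sum(w * y) / np.sum(w)
        if abs(e_new - e) < tol:
            break
        e = e_new
    return e

losses = np.random.default_rng(4).lognormal(mean=1.0, sigma=0.8, size=5000)
print(expectile(losses, 0.5), expectile(losses, 0.9))   # the 0.5-expectile equals the mean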
0
0
stat.AP 2026-04-24

Few attorneys drive most Philadelphia evictions

Legal Infrastructure Organizes Eviction: Evidence from Philadelphia

755,000 court records show repeat filers, repeated addresses, and durable plaintiff-attorney ties organize the process.

Figure from the paper full image
abstract click to expand
The filing-side legal infrastructure of eviction is studied using the Philadelphia Municipal Court docket. Using 755,004 landlord--tenant records filed from 1969 to 2022, with 747,125 residential filings, it shows that eviction is organized upstream by a concentrated plaintiff-side bar, durable plaintiff--attorney dependence, repeated use of the same properties, and repeat exposure of tenants to the court system. In 1983--2022, the ten most active plaintiff attorneys handled an average of 82.0% of represented plaintiff-side cases, compared with 14.7% for the ten most active plaintiffs. Large plaintiffs are also highly dependent on dominant counsel: among plaintiffs with at least 101 cases, the mean top-1 attorney share is 78.3%. Repeated filing is likewise central. Across the residential docket, 50.6% of cases occur at addresses with a prior filing in the preceding year, and 24.6% occur at addresses with six or more prior filings. Those repeated addresses are usually same-plaintiff repeats and are processed through a more default-heavy, agreement-light pathway. A narrower mechanism is also examined: plaintiff adoption of specialist plaintiff-side counsel. Filing-margin event studies show adoption-linked reorganization rather than clean throughput effects, while within-plaintiff and within-plaintiff--property comparisons show the most stable changes in judgment by agreement, fee share, and lockout-trigger language. The contribution is an upstream account of eviction as plaintiff-side legal infrastructure: a layer of concentrated counsel, repeated places, and recurring tenants through which filings are produced before any courtroom bargaining occurs.
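The headline concentration figures are simple docket summaries; a sketch of the computation with hypothetical column names is below (the paper reports it separately for plaintiff-side attorneys and for plaintiffs).

import pandas as pd

def top_k_share(docket, actor_col, k=10):
    """Share of cases handled by the k most active actors in a docket."""
    counts = docket[actor_col].value_counts()
    return counts.head(k).sum() / counts.sum()

# Toy example with a hypothetical column name.
docket = pd.DataFrame({"plaintiff_attorney": ["A", "A", "B", "A", "C", "B"]})
print(top_k_share(docket, "plaintiff_attorney", k=2))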
0
0
stat.AP 2026-04-24 Recognition

Ten attorneys handle 82% of Philadelphia eviction cases

Legal Infrastructure Organizes Eviction: Evidence from Philadelphia

Docket analysis shows concentrated counsel, landlord dependence on specialists, and repeat addresses structure filings upstream of court.

Figure from the paper full image
abstract click to expand
The filing-side legal infrastructure of eviction is studied using the Philadelphia Municipal Court docket. Using 755,004 landlord--tenant records filed from 1969 to 2022, with 747,125 residential filings, it shows that eviction is organized upstream by a concentrated plaintiff-side bar, durable plaintiff--attorney dependence, repeated use of the same properties, and repeat exposure of tenants to the court system. In 1983--2022, the ten most active plaintiff attorneys handled an average of 82.0% of represented plaintiff-side cases, compared with 14.7% for the ten most active plaintiffs. Large plaintiffs are also highly dependent on dominant counsel: among plaintiffs with at least 101 cases, the mean top-1 attorney share is 78.3%. Repeated filing is likewise central. Across the residential docket, 50.6% of cases occur at addresses with a prior filing in the preceding year, and 24.6% occur at addresses with six or more prior filings. Those repeated addresses are usually same-plaintiff repeats and are processed through a more default-heavy, agreement-light pathway. A narrower mechanism is also examined: plaintiff adoption of specialist plaintiff-side counsel. Filing-margin event studies show adoption-linked reorganization rather than clean throughput effects, while within-plaintiff and within-plaintiff--property comparisons show the most stable changes in judgment by agreement, fee share, and lockout-trigger language. The contribution is an upstream account of eviction as plaintiff-side legal infrastructure: a layer of concentrated counsel, repeated places, and recurring tenants through which filings are produced before any courtroom bargaining occurs.
0
0
stat.AP 2026-04-23

Expected Threat error follows log-normal distribution

Model quality in football: Quantifying the quality of an Expected Threat model

Simulations and expert input set the error level at which player evaluations lose reliability for scouting use.

abstract click to expand
The recent growth in data availability in football has increased the risk of incorrect use of data-driven models, making guidelines on their validation and application necessary. The Expected Threat (xT) model is an accessible option for football organizations that start building in-house methods, yet little is known about how to assess its quality. The aim of this study is twofold: to examine how the model error depends on the number of game states and the number of training points, and to translate these results into guidelines for constructing and applying the model. Using the Markov chain underlying the model, we perform theoretical analyses and simulations to study the model error. These show that the model error is approximately log-normally distributed for a specified number of training points and game states. Additionally, we combine the simulations with expert consultation to establish the model error beyond which player evaluations based on the Expected Threat model become unreliable for scouting applications. From this, we derive rules of thumb to ensure the quality of an Expected Threat model before application, and we illustrate through an example how a validated model can be applied in practice. Because the approach generalizes to Expected Possession Value models, this paper illustrates a framework to systematically quantify model quality, despite the ground truth being unobservable in football analytics.
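A toy sketch of the Markov-chain value iteration underlying an Expected Threat model, with made-up shot, goal, and transition probabilities; the paper's error analysis would repeat such fits across training sets of varying size, which is not shown here.

```python
# Toy Expected Threat (xT) on a small set of game states.
import numpy as np

rng = np.random.default_rng(1)
n_states = 12                                        # toy number of pitch zones
p_shot = rng.uniform(0.02, 0.3, n_states)            # P(shot | state)
p_goal = rng.uniform(0.01, 0.2, n_states)            # P(goal | shot from state)
p_move = 1.0 - p_shot                                # otherwise the ball is moved
T = rng.dirichlet(np.ones(n_states), size=n_states)  # move transition matrix

xt = np.zeros(n_states)
for _ in range(100):                                 # fixed-point iteration
    xt = p_shot * p_goal + p_move * (T @ xt)

print(np.round(xt, 3))
```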
0
0
stat.AP 2026-04-23

3D pattern matching boosts conflict fatality forecasts

The geometry of conflict: 3D Spatio-temporal patterns in fatalities prediction

Historical diffusion shapes matched via Earth Mover's Distance outperform the leading VIEWS ensemble model

Figure from the paper full image
abstract click to expand
Understanding how conflict events spread over time and space is crucial for predicting and mitigating future violence. However, progress in this area has been limited by the lack of methods capable of capturing the intricate, dynamic patterns of conflict diffusion. Untangling these complex trends requires flexible models. This study addresses this gap by analyzing spatio-temporal conflict fatality data using an innovative approach that transforms the data into three-dimensional patterns at the PRIO-GRID level. In this paper, a shape-based model called ShapeFinder is adapted. By applying the Earth Mover's Distance (EMD) algorithm, we detect and classify these patterns, allowing us to compare and match patterns with high adaptive capacity in all dimensions. Using similar historical patterns, we generate predictions of conflict fatalities and compare these with forecasts from the VIEWS ensemble model, a leading benchmark. Our findings demonstrate that recognizing and analyzing conflict diffusion patterns significantly improves predictive accuracy, outperforming the benchmark model. This research contributes to the study of conflict dynamics by introducing a novel pattern recognition framework that enhances the analysis of spatio-temporal data and offers practical applications for early warning systems.
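A heavily simplified sketch of distance-based pattern matching for forecasting, assuming each episode is reduced to a 1D temporal fatality profile compared with the Wasserstein (Earth Mover's) distance; the paper matches full 3D space-time shapes, so this conveys only the flavour of the approach, and all data are synthetic.

```python
# Nearest-pattern forecasting with a 1D Earth Mover's distance (toy version).
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(2)
history = [rng.poisson(lam=rng.uniform(1, 10), size=12) for _ in range(50)]  # past 12-week profiles
futures = [rng.poisson(lam=5, size=4) for _ in range(50)]                    # what followed each profile
current = rng.poisson(lam=6, size=12)                                        # pattern to forecast from

def emd_1d(a, b):
    """Wasserstein distance between two count profiles, treated as distributions over weeks."""
    weeks = np.arange(len(a))
    return wasserstein_distance(weeks, weeks, u_weights=a + 1e-9, v_weights=b + 1e-9)

dists = np.array([emd_1d(current, h) for h in history])
nearest = np.argsort(dists)[:5]                     # most similar historical shapes
forecast = np.mean([futures[i] for i in nearest], axis=0)
print("forecast fatalities for next 4 weeks:", np.round(forecast, 1))
```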
0
0
stat.AP 2026-04-23

Bayesian models recover missing counts in diagnostic tables

Bayesian Inference for Incomplete 2x2 Diagnostic Tables

Hierarchical priors yield posterior estimates and uncertainty for sensitivity and specificity when only partial 2x2 table information is available.

abstract click to expand
Incomplete reporting of diagnostic accuracy data remains a persistent problem in medical research. In many studies, only part of the 2x2 diagnostic table is reported, leaving denominators for diseased and non-diseased groups unknown and preventing direct calculation of sensitivity, specificity, predictive values, and related operating characteristics. To address this limitation, we develop hierarchical Bayesian models for reconstructing incomplete 2x2 diagnostic tables from such partial information. Two motivating scenarios are considered: one in which only a single test-outcome row is observed, and another in which true positives, false positives, and the total sample size are reported but the remaining cells are missing. The proposed models are illustrated on a benchmark breast MRI study with complete counts, treated as partially observed in order to assess reconstruction performance under controlled missingness. The framework yields posterior inference for the missing cell counts and associated diagnostic measures, together with uncertainty quantification in weakly identified settings.
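A minimal sketch, not the authors' model, of reconstructing the missing cells when only true positives, false positives, and the total sample size are reported: Beta priors on prevalence, sensitivity, and specificity, with importance sampling from the prior; all counts and prior parameters are illustrative.

```python
# Importance-sampling reconstruction of an incomplete 2x2 diagnostic table.
import numpy as np
from scipy.stats import binom, beta

rng = np.random.default_rng(3)
TP, FP, N = 45, 12, 200            # observed cells and total sample size
S = 200_000                        # prior draws

prev = beta.rvs(2, 5, size=S, random_state=rng)      # disease prevalence
se = beta.rvs(8, 2, size=S, random_state=rng)        # sensitivity
sp = beta.rvs(8, 2, size=S, random_state=rng)        # specificity
n_d = binom.rvs(N, prev, random_state=rng)           # latent number of diseased subjects

# Importance weights: likelihood of the observed cells under each prior draw.
w = binom.pmf(TP, n_d, se) * binom.pmf(FP, N - n_d, 1.0 - sp)
w /= w.sum()

post_fn = np.sum(w * (n_d - TP))                      # posterior mean of false negatives
post_se = np.sum(w * se)
print(f"posterior mean FN ~ {post_fn:.1f}, sensitivity ~ {post_se:.3f}")
```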
0
0
stat.AP 2026-04-23

Weather shifts flip GB winter shortfall severity

Assessing the Shortfall Risk of GB Electricity Grid using Shifts in Winter Weather Conditions

Day-of-week and holiday alignment can make the same winter data the most or least critical for electricity supply security.

Figure from the paper full image
abstract click to expand
Extreme weather events during peak winter periods drive resource adequacy risk in Great Britain (GB), with weather sensitivity of the supply-demand balance increasing through additional electric heating and wind generation. This work develops an approach of time-shifting weather within the peak season, through adjustment of the relevant terms in a statistical model for demand. This allows more complete consideration of the security of supply consequences of a weather series, as there will be relevant conditions where demand is suppressed due to weather occurring at a weekend or during the Christmas holiday. Results on a GB example show that consideration of this counterfactual is indeed important, and specifically that winter 2010-11 can either be the most severe in the dataset, or insignificant within the resource adequacy model, depending on the alignment of day-of-week with the weather series. Statistical interpretation of the shift model is discussed, which is straightforward for alignment of day-of-week with weather assuming that all seven alignments are equiprobable; but is more subtle for shifting weather in and out of Christmas, as there is no natural maximum on the realistic length of shift, but too large a shift may be physically unrealistic. It is likely that in all systems, assessment of a weather year's severity is incomplete without such consideration of the day-of-week effect; however, whether longer shifts of weather with respect to date need to be considered will depend on the presence of a major holiday (such as Christmas in GB) in the peak season.
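A stylized sketch of the time-shifting idea: hold the calendar (and its day-of-week demand effect) fixed while sliding the weather series by k days and re-evaluating modelled peak demand; the toy linear demand model and all numbers below are illustrative, not the paper's statistical model.

```python
# Re-evaluating modelled peak demand under day-of-week shifts of the weather series.
import numpy as np
import pandas as pd

dates = pd.date_range("2010-11-01", "2011-02-28", freq="D")
rng = np.random.default_rng(4)
temperature = 5 + 5 * np.sin(np.arange(len(dates)) / 9.0) + rng.normal(0, 2, len(dates))

def modelled_demand(dates, temp):
    weekend = dates.dayofweek >= 5
    base = 40 - 1.5 * temp           # colder weather -> more electric heating demand
    return base - 6 * weekend        # suppressed weekend demand

for k in range(-3, 4):                               # shift weather by k days against the calendar
    shifted = np.roll(temperature, k)
    peak = modelled_demand(dates, shifted).max()
    print(f"shift {k:+d} days: modelled peak demand = {peak:.1f} GW")
```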
0
0
stat.AP 2026-04-23

FCS imputations complete SHARELIFE life histories

SHARELIFE Imputations

The filled values match observed patterns, inverse-propensity estimates, and external benchmarks from regular SHARE waves.

Figure from the paper full image
abstract click to expand
This report describes the SHARELIFE-MI project, which aims to generate multiple imputations for missing values in the life-course data collected in SHARELIFE Waves 3 and 7. The SHARELIFE study reconstructs individual life histories through retrospective questions covering key biographical domains such as partnerships, fertility, employment, and residence. As in the regular SHARE waves, item nonresponse represents an important source of nonsampling error - particularly for monetary variables, which require conversions across multiple currencies and long time periods. We document the preliminary data recoding and harmonization steps, as well as the design, specification, and implementation of an imputation model based on the fully conditional specification approach. Finally, we assess the internal and external validity of the resulting imputations through comparisons with the observed data, alternative nonresponse adjustments based on inverse propensity weighting, and external benchmarks from the regular SHARE waves.
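A minimal fully-conditional-specification sketch using scikit-learn's IterativeImputer as a stand-in for the project's imputation model; the simulated variables, missingness mechanism, and number of imputations are illustrative only.

```python
# Chained-equations (FCS-style) multiple imputation on a toy life-course dataset.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(5)
n = 500
age_first_job = rng.normal(22, 3, n)
years_worked = 40 - 0.5 * (age_first_job - 22) + rng.normal(0, 4, n)
income = 20_000 + 800 * years_worked + rng.normal(0, 5_000, n)
X = np.column_stack([age_first_job, years_worked, income])

# Introduce item nonresponse in the monetary variable.
missing = rng.random(n) < 0.25
X_obs = X.copy()
X_obs[missing, 2] = np.nan

# m multiple imputations, each from a differently seeded chained-equations run.
imputations = [IterativeImputer(sample_posterior=True, random_state=m).fit_transform(X_obs)
               for m in range(5)]
print("imputed income means:", [round(imp[:, 2].mean()) for imp in imputations])
```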
0
0
stat.AP 2026-04-23

Smartwatch data shows V-shaped football fever in fans

Time-dependent structural equation modeling of fans' football fever using activity tracking data during the 2025 DFB Cup final

Time-dependent models of heart rate and stress during a cup final reveal arousal that starts high, drops, then rebounds, with large fan-to-fan heterogeneity.

Figure from the paper full image
abstract click to expand
Football fans frequently exhibit pronounced emotional and physiological reactions during high-stakes matches. However, the temporal dynamics of this football fever are rarely modeled as a latent process. Using intensive longitudinal data from Arminia Bielefeld supporters who wore smartwatches during the 2025 German Football Association (DFB) Cup final, we investigate how football fever unfolds. The devices recorded heart rate, stress level, and related indicators in short intervals, allowing us to construct a latent variable for football fever and model its dynamics. We specify a time-dependent structural equation model with latent growth components and autoregressive effects to capture both overall trends and short-term carry-over effects in fans' physiological responses. Results are aggregated across multiple imputations of missing measurements. Model fit is evaluated using adjustments for the high data dimensionality. The results show that football fever follows a V-shaped trajectory: high at kick-off, followed by a steady decline until the renewed arousal in the second half, with substantial between-fan heterogeneity in both baseline level and temporal dynamics. Our findings demonstrate that football fever can be adequately represented as a latent variable using structural equation modeling and reflected by wearable technology data. This highlights the importance of accounting for temporal dependence when studying dynamic emotional phenomena, e.g., in sports spectatorship.
0
0
stat.AP 2026-04-23

Viral load data sharpens estimates of household infection times

Bayesian inference for disease transmission models informed by viral dynamics

A cut Bayesian model links individual viral trajectories to stochastic household spread and recovers parameters without bias at high sampling frequency.

Figure from the paper full image
abstract click to expand
Infectious disease dynamics operate across multiple biological scales, with within-host viral dynamics being a key driver of between-host transmission. However, while models that explicitly link these scales exist, none have been developed with statistical inference as a primary goal. In this paper we propose a multiscale model that jointly captures heterogeneous individual-level viral load trajectories and stochastic household transmission, and develop efficient inference methods to fit it to data. Since full joint inference is computationally difficult, we employ a cut approach that passes information from the within-host to the between-host model but not vice versa. This enables the data on viral loads to inform the transmission parameters such as the infection times and symptom onset thresholds. We evaluate the framework on simulated household outbreak data, assessing parameter recovery, computational efficiency, and the effect of viral load sampling frequency on inference quality. Parameter recovery is unbiased when the sampling frequency of the viral loads is high enough. When sampling is sparse, some bias is introduced, but incorporating external viral load data can mitigate this.
0
0
stat.AP 2026-04-22

Latent Gaussian field models EV charging demand forecasts

Spatio-temporal modelling of electric vehicle charging demand

Scotland dataset and INLA inference add spatial-temporal structure plus calibrated uncertainty for grid and charger planning.

Figure from the paper full image
abstract click to expand
Accurate forecasting of electric vehicle (EV) charging demand is critical for grid management and infrastructure planning. Yet the field continues to rely on legacy benchmarks, such as the Palo Alto (2020) dataset, that fail to reflect the scale and behavioral diversity of modern charging networks. To address this, we introduce a novel large-scale longitudinal dataset collected across Scotland (2022--2025), which we release as an open benchmark for the community. Building on this dataset, we formulate EV charging demand as a spatio-temporal latent Gaussian field and perform approximate Bayesian inference via Integrated Nested Laplace Approximation (INLA). The resulting model jointly captures spatial dependence, temporal dynamics, and covariate effects within a unified probabilistic framework. On station-level forecasting tasks, our approach achieves competitive predictive accuracy against machine learning baselines, while additionally providing principled uncertainty quantification and interpretable spatial and temporal decompositions, properties that are essential for risk-aware infrastructure planning.
0
0
stat.AP 2026-04-22

Bayesian priors cut early misclassification of at-risk students by up to 42%

Early Prediction of Student Performance Using Bayesian Updating with Informative Priors Across Cohorts

Updating models each week with data from the prior cohort improves accuracy when current-semester information is still limited.

Figure from the paper full image
abstract click to expand
Early identification of at-risk students in higher education depends on predictive models that maintain accuracy across successive cohorts -- a requirement that single-cohort modeling approaches fail to meet. This study evaluates Bayesian updating with informative priors from a previous cohort to improve cross-cohort prediction robustness using digital trace data. We fit weekly Bayesian linear, logistic, and ordinal regression models with either uninformative default priors or informative priors derived from posterior distributions of a preceding cohort. Models were applied to six weekly self-regulated learning (SRL)-aligned engagement indicators from two consecutive cohorts of students in a blended first-year mathematics course (N1 = 307; N2 = 323). Outcomes were exam points, final grades, and a binary at-risk indicator. The models were evaluated weekly based on accuracy, sensitivity, and RMSE. In the source cohort, performance was already substantial by week 6. In the target cohort, informative priors improved early classification: logistic models with priors reduced misclassification by 22% and false negatives by 38% in week 3 relative to the uninformative default. Ordinal models with priors similarly showed the strongest improvements in early weeks, reducing misclassification by 42% in week 2 and reaching an accuracy of .77 by week 4. Linear models showed little benefit from prior information. These findings demonstrate that Bayesian updating is a viable method for improving early classification performance across cohorts, with gains concentrated in the early weeks of the semester when current-cohort data are scarce.
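A conjugate sketch of the cross-cohort updating idea for the linear case: the posterior over regression weights from cohort 1 becomes the prior for cohort 2, which matters most when early-semester data are scarce; the simulated engagement indicators and known noise variance are simplifying assumptions, not the study's models.

```python
# Bayesian linear regression with an informative prior carried over from a prior cohort.
import numpy as np

def bayes_linreg(X, y, prior_mean, prior_cov, noise_var=1.0):
    """Posterior mean/cov of weights for y = Xw + e, e ~ N(0, noise_var)."""
    prec = np.linalg.inv(prior_cov) + X.T @ X / noise_var
    cov = np.linalg.inv(prec)
    mean = cov @ (np.linalg.inv(prior_cov) @ prior_mean + X.T @ y / noise_var)
    return mean, cov

rng = np.random.default_rng(6)
w_true = np.array([1.0, 0.5, -0.3])

def cohort(n):
    X = rng.normal(size=(n, 3))               # weekly engagement indicators (simulated)
    return X, X @ w_true + rng.normal(0, 1, n)

# Cohort 1: plenty of data, vague prior.
X1, y1 = cohort(300)
m1, S1 = bayes_linreg(X1, y1, np.zeros(3), np.eye(3) * 10.0)

# Cohort 2, early in the semester: few observations.
X2, y2 = cohort(25)
m_flat, _ = bayes_linreg(X2, y2, np.zeros(3), np.eye(3) * 10.0)   # uninformative prior
m_info, _ = bayes_linreg(X2, y2, m1, S1)                          # informative prior from cohort 1

print("true:", w_true, "\nflat prior:", m_flat.round(2), "\ninformative:", m_info.round(2))
```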
0
0
stat.AP 2026-04-22

Statistics weigh who wrote And Quiet Flows the Don

And Quiet Does Not Flow the Don: Statistical Analysis of a Quarrel Between Nobel Prize Laureates

Textual statistics test the novel against Sholokhov and Kriukov to weigh the Solzhenitsyn-supported plagiarism claim.

Figure from the paper full image
abstract click to expand
The Nobel Prize in Literature 1965 was awarded to Mikhail Sholokhov (1905-1984) for the epic novel Tikhij Don about Cossack life and the birth of a new Soviet society (And Quiet Flows the Don, or The Quiet Don, in different translations). Sholokhov has been compared to Tolstoy and was, one or two generations ago, called `the greatest of our writers' in the Soviet Union. In Russia alone his books have been published in more than a thousand editions, selling in total more than sixty million copies. He was an elected member of the USSR Supreme Soviet, the USSR Academy of Sciences, and of the CPSU Central Committee. But in the autumn of 1974 an article was published in Paris, Stremya `Tihogo Dona' (Zagadki romana) (`The Rapids of Quiet Don: The Enigmas of the Novel'), by the author and critic D$^*$. He claimed that Tikhij Don was not Sholokhov's work at all, but rather that it was written by Fiodor Kriukov, a more obscure author who fought against bolshevism and died in 1920. The article was given credibility and prestige by none other than Aleksandr Solzhenitsyn (a Nobel prize winner five years after Sholokhov), who wrote a preface giving full support to D$^*$'s conclusion. Scandals followed, also touching the upper echelons of Soviet society, and Sholokhov's reputation was faltering abroad (see e.g. Doris Lessing's (1997) comments: `vibrations of dislike instantly flowed between us'). Are we in fact faced with one of the most flagrant cases of plagiarism in the history of literature?
0
0
stat.AP 2026-04-22

Deep learning predicts PM2.5 pollution at any location without grids

Ground-Level Near Real-Time Modeling for PM2.5 Pollution Prediction

Sparse EPA data blended with terrain, weather and land-use inputs yields fast point-specific forecasts for health decisions.

abstract click to expand
Air pollution is a worldwide public health threat that can cause or exacerbate many illnesses, including respiratory disease, cardiovascular disease, and some cancers. However, epidemiological studies and public health decision-making are stymied by the inability to assess pollution exposure impacts in near real time. To address this, developing accurate digital twins of environmental pollutants will enable timely data-driven analytics - a crucial step in modernizing health policy and decision-making. Although other models predict and analyze fine particulate matter exposure, they often rely on modeled input data sources and data streams that are not regularly updated. Another challenge stems from current models relying on predefined grids. In contrast, our deep-learning approach interpolates surface level PM2.5 concentrations between sparsely distributed US EPA monitoring stations in a grid-free manner. By incorporating additional, readily available datasets - including topographic, meteorological, and land-use data - we improve its ability to predict pollutant concentrations with high spatial and temporal resolution. This enables model querying at any spatial location for rapid predictions without computing over the entire grid. To ensure robustness, we randomize spatial sampling during training to enable our model to perform well in both dense and sparse monitored regions. This model is well suited for near real-time deployment because its lightweight architecture allows for fast updates in response to streaming data. Moreover, model flexibility and scalability allow it to be adapted to various geographical contexts and scales, making it a practical tool for delivering accurate and timely air quality assessments. Its capacity to rapidly evaluate multiple scenarios can be especially valuable for decision-making during public health crises.
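A toy grid-free interpolation sketch in the spirit described above: a small neural network trained on scattered station records (coordinates plus covariates) that can then be queried at any location; the features, architecture, and synthetic data are illustrative, not the authors' model.

```python
# Grid-free PM2.5 interpolation: train on scattered points, query arbitrary locations.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(7)
n = 2000
lat, lon = rng.uniform(30, 45, n), rng.uniform(-120, -75, n)
elevation = rng.uniform(0, 2000, n)
wind = rng.uniform(0, 10, n)
pm25 = 12 + 3 * np.sin(lat / 3) - 0.002 * elevation - 0.4 * wind + rng.normal(0, 1, n)

X = np.column_stack([lat, lon, elevation, wind])
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0).fit(X, pm25)

# Query any point directly -- no prediction grid is ever materialized.
query = np.array([[37.8, -97.2, 400.0, 3.5]])
print("predicted PM2.5 at query point:", model.predict(query).round(2))
```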
0
0
stat.AP 2026-04-21

Review maps statistical tools for detecting and quantifying drug safety signals

A Review of Statistical Methods for Spontaneous Reporting System Data Mining: Signal Detection and Beyond

Guidance on building contingency tables from public aggregated counts supports nuanced analysis beyond simple yes-or-no decisions.

Figure from the paper full image
abstract click to expand
Postmarketing safety surveillance relies on data from spontaneous reporting systems (SRS) such as FAERS, EudraVigilance and VigiBase, and commonly uses SRS data mining methods to assess the associations between drugs and adverse events (AEs). Traditionally, these analyses have focused on signal detection framed as a binary decision problem, whereas more recent work has emphasized more nuanced inference involving signal strength estimation and uncertainty quantification. In this paper, we review contemporary SRS data mining approaches and their statistical underpinnings for safety assessment using data from major pharmacovigilance databases worldwide. In addition to methodological review, we provide practical guidance on data preprocessing for such analysis, including construction of SRS contingency tables using only aggregated AE-drug counts, as are publicly available from databases such as VigiBase and EudraVigilance. We illustrate the guidance via opioid-related datasets obtained from FAERS and VigiBase, coupled with subsequent downstream SRS data analyses.
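A minimal disproportionality sketch from a single aggregated drug-event 2x2 table of the kind the review discusses: the proportional reporting ratio (PRR) and reporting odds ratio (ROR) with a Wald interval; the counts are made up.

```python
# PRR and ROR from an aggregated AE-drug contingency table.
import numpy as np

a = 120     # reports with the drug and the adverse event
b = 880     # reports with the drug, other events
c = 450     # reports with other drugs and the event
d = 98_550  # reports with other drugs, other events

prr = (a / (a + b)) / (c / (c + d))
ror = (a * d) / (b * c)
se_log_ror = np.sqrt(1/a + 1/b + 1/c + 1/d)
ci = np.exp(np.log(ror) + np.array([-1.96, 1.96]) * se_log_ror)

print(f"PRR = {prr:.2f}, ROR = {ror:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```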
0
0
stat.AP 2026-04-21

New mixture model makes large-scale spatial extreme analysis feasible

Spatial Extremes at Scale: A Case Study of Surface Skin Temperature and Heat Risk in the United States

A random scale mixture process with amortized learning enables Bayesian modeling of extreme temperatures and heat risks across many sites.

Figure from the paper full image
abstract click to expand
Understanding and mapping extreme heat is critical for risk management and public health planning, particularly in regions with complex terrain and heterogeneous climate. We present a case study of extreme heat in the Four Corners region of the United States, using high-resolution surface skin temperature data from the North American Land Data Assimilation System to characterize spatially heterogeneous and seasonally varying extremes across complex terrain, and to assess their implications for heat-related public health risks. Spatial extremes exhibit complex dependencies across geographic regions, which require sophisticated statistical models to capture. While recent advances in spatial extreme value modeling provide flexible representations of joint tail dependencies, statistical inference remains computationally demanding, especially for datasets with a large number of locations. To address this, we propose a random scale mixture process that facilitates Bayesian inference of spatial extremes, and develop scalable inference strategies that leverage advances in spatial modeling and amortized learning. We evaluate the proposed inference methods through large-scale simulation studies, representing the first such extensive study in spatial extremes, and a high-resolution surface skin temperature application in the Four Corners region. Surface skin temperature is particularly useful as a predictor for air temperature, for studying heatwaves and related environmental phenomena, and to calculate heat indices reflecting downstream health risks at any location. Our findings provide insights into efficient, data-driven approaches for modeling spatial extremes, and serve as guidelines for practitioners in the fields of climate science, environmental risk assessment, and beyond.
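An illustrative simulation of a random scale mixture of a Gaussian process, X(s) = R * W(s), where a shared heavy-tailed scale R induces dependence in the joint upper tail across sites; the kernel, parameters, and site layout are made up and only loosely mirror the model class described above.

```python
# Simulating a random scale mixture and checking joint tail exceedances.
import numpy as np

rng = np.random.default_rng(8)
sites = rng.uniform(0, 10, size=(50, 2))
d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
cov = np.exp(-d / 2.0)                                  # exponential correlation for W(s)
L = np.linalg.cholesky(cov + 1e-8 * np.eye(len(sites)))

n_rep = 5000
W = (L @ rng.standard_normal((len(sites), n_rep))).T    # Gaussian replicates
R = rng.pareto(a=3.0, size=(n_rep, 1)) + 1.0            # heavy-tailed common scale
X = R * W

# Empirical joint exceedance probability at two sites vs the independence benchmark.
u = np.quantile(X, 0.98, axis=0)
joint = np.mean((X[:, 0] > u[0]) & (X[:, 1] > u[1]))
print(f"joint exceedance: {joint:.4f} vs {0.02**2:.4f} under independence")
```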
0
0
stat.AP 2026-04-21

Neural method transfers non-stationary air patterns from models to stations

A Non-stationary, Amortized, Transfer Learning Approach for Modeling Italian Air Quality

Daily fine-grid predictions with uncertainty outperform stationary geostatistical baselines by capturing complex geography.

Figure from the paper full image
abstract click to expand
Air quality monitoring in Italy relies on sparse, irregular, ground-based stations that provide high-quality but incomplete measurements of pollution. Chemical transport models (CTMs) offer full spatial and temporal coverage but smooth over local variability. We develop a spatial transfer-learning framework that integrates these two data sources to produce daily, fine-grid predictions of nitrogen dioxide (NO$_2$) concentrations across Italy for 2023, with uncertainty quantification. The resulting maps provide a resource for decision making in downstream applications such as epidemiology and environmental policy. Our approach builds on the geostatistical LatticeKrig framework, which uses compactly supported basis functions and coefficients governed by a sparse precision matrix. We learn a nonstationary, anisotropic correlation structure from the gridded CTM outputs using an image-to-image neural architecture that estimates millions of spatially varying parameters in a matter of seconds. The basis-function representation enables this covariance structure to be transferred to the point-level station data and projected onto a finer prediction grid, a key extension for handling the change of support between data sources. A likelihood-based refinement step then adjusts the correlation range to recover fine-scale variability smoothed out by the gridded data. The proposed methodology results in a flexible, non-stationary, and anisotropic representation of the spatial process, better accommodating the complex geography of Italy. Performance is assessed through experiments on both gridded CTM outputs and point-level station measurements, demonstrating improvements over the stationary formulation.
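A bare-bones sketch of the basis-function idea: compactly supported radial basis functions on a coarse lattice, coefficients fit by penalized least squares on gridded data, then evaluated on a finer grid to handle the change of support. The neural parameter estimation and nonstationary covariance steps are omitted; everything here is illustrative, not the LatticeKrig implementation.

```python
# Compactly supported basis regression with change of support to a finer grid.
import numpy as np

def wendland(r):
    """Compactly supported radial basis (Wendland-type), zero for r >= 1."""
    r = np.clip(r, 0, 1)
    return (1 - r) ** 4 * (4 * r + 1)

rng = np.random.default_rng(9)
obs = rng.uniform(0, 1, size=(300, 2))                       # gridded CTM cells / stations
field = np.sin(4 * obs[:, 0]) * np.cos(3 * obs[:, 1]) + rng.normal(0, 0.05, 300)

knots = np.array([[i, j] for i in np.linspace(0, 1, 8) for j in np.linspace(0, 1, 8)])
def basis(points, scale=0.3):
    d = np.linalg.norm(points[:, None, :] - knots[None, :, :], axis=-1)
    return wendland(d / scale)

Phi = basis(obs)
lam = 1e-3                                                   # ridge penalty on coefficients
coef = np.linalg.solve(Phi.T @ Phi + lam * np.eye(len(knots)), Phi.T @ field)

fine = np.array([[x, y] for x in np.linspace(0, 1, 50) for y in np.linspace(0, 1, 50)])
pred = basis(fine) @ coef                                    # evaluate on a finer prediction grid
print("prediction range on fine grid:", pred.min().round(2), pred.max().round(2))
```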
0
0
stat.AP 2026-04-21

Joint Bayesian model improves spatial gene detection accuracy

JASPER: Joint Bayesian Analysis of Spatial Expression via Regression

By jointly analyzing multiple genes with basis-function regression, it reduces the errors that arise from ignoring inter-gene correlations and from fixed spatial kernels.

Figure from the paper full image
abstract click to expand
Spatially resolved transcriptomics is a fast-developing set of technologies that enables the measurement of localized gene expression across spatial locations in a sample. Detecting spatially varying genes is critical for analyzing such data, yet existing methods often fail to account for inter-gene correlations, leading to inflated false positive and false negative rates. Additionally, most prominent methods rely on predefined spatial covariance kernels, making them sensitive to the complexity of spatial expression patterns. Motivated by a human breast cancer dataset, we address these limitations in existing literature through JASPER (Joint Bayesian Analysis of SPatial Expression via Regression), a Bayesian framework that jointly models spatial expression patterns across multiple genes using a spatial basis function regression approach. We demonstrate the superior performance of JASPER compared to existing methods in several real-world spatial transcriptomic datasets and supporting simulation experiments. JASPER identifies genes with stronger spatial correlation and greater biological relevance, as validated by overlap comparison, enrichment analysis, and pathway analysis using independent biological databases. Our results highlight the ability of JASPER to improve the statistical and biological interpretability of spatial transcriptomics data, making it a powerful tool for uncovering spatial gene expression patterns in complex biological systems.
0
0
stat.AP 2026-04-21

Pulse oximetry error creates three separate equity problems

Data (in)equities in data science: Dissecting systemic and systematic biases in pulse oximetry

One upstream measurement bias produces distinct violations depending on whether it is examined at data collection, model output, or clinical decision-making.

Figure from the paper full image
abstract click to expand
Data equity is an emerging framework for responsible data science. However, its core concepts, including fairness, representativeness, and information bias, remain largely abstract and general, lacking the mathematical specificity needed for practical implementation. In this paper, we demonstrate how statisticians can operationalize data equity by translating its tenets into precise, testable formulations tailored to a given problem. Using the well-documented case of differential measurement error across racial groups in pulse oximetry, we first adopt an oracle approach, tracing how a single upstream violation of information bias compounds through the analytic pipeline into treatment disparities, fairness violations, and adverse health outcomes. We then demonstrate the inverse: starting from an observed outcome disparity, the data equity framework provides a principled structure for systematically identifying its statistical sources. Our exposition reveals that data equity, prediction equity, and decision equity are distinct requirements with distinct evaluation and policy needs--a nuance that highlights both the unique role of statisticians in the era of artificial intelligence as well as the necessity of interdisciplinary collaboration.
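A small simulation of the compounding described above: identical true oxygen saturation, a hypothetical group-specific bias in the pulse-oximeter reading, and a fixed treatment threshold yield unequal rates of missed hypoxemia; all numbers are illustrative, not clinical estimates.

```python
# Propagating a group-specific measurement bias into a threshold-based decision.
import numpy as np

rng = np.random.default_rng(10)
n = 100_000
true_sao2 = rng.normal(93, 3, n)
group = rng.integers(0, 2, n)                  # 0 = unbiased readings, 1 = overestimated readings
bias = np.where(group == 1, 1.5, 0.0)          # hypothetical upward bias for group 1
spo2 = true_sao2 + bias + rng.normal(0, 1, n)  # observed pulse-oximeter reading

treat = spo2 < 92                              # threshold-based treatment decision
need = true_sao2 < 92                          # who actually needs treatment

for g in (0, 1):
    miss = np.mean(~treat[(group == g) & need])
    print(f"group {g}: share of truly hypoxemic patients missed = {miss:.3f}")
```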
0

browse all of stat.AP → full archive · search · sub-categories