pith. machine review for the scientific record.

stat.OT

Other Statistics

Work in statistics that does not fit into the other stat classifications

stat.OT 2026-05-11 2 theorems

Detectors for subspace signals in nonzero-mean clutter lose one DOF

Adaptive Subspace Signal Detection and Performance Analysis in Nonzero-Mean Clutter

GLRT, Rao, and Wald versions retain the same structure as in zero-mean clutter but incur explicit DOF and SCR losses shown by closed-form PD and PFA expressions.

To solve the problem of detecting subspace signals in nonzero-mean clutter, we propose adaptive detectors based on the strategies of the generalized likelihood ratio test (GLRT), Rao test, Wald test, gradient test, and Durbin test. The results show that the detectors based on GLRT, Rao, and Wald are structurally consistent with the subspace detectors in zero-mean clutter. Analytic expressions for the probability of detection (PD) and probability of false alarm (PFA) of each detector are derived, and two major performance differences in the nonzero-mean clutter scenario are revealed. One is a loss of degrees of freedom (DOF), which are reduced by 1 compared with the zero-mean clutter scenario. The other is a loss in signal-to-clutter ratio (SCR). Simulation and measured data verify the effectiveness of the proposed detectors and demonstrate their practical value in real-world radar systems.
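A minimal Monte Carlo sketch of the setting, assuming Gaussian clutter and a generic GLRT-style subspace statistic (an ASD/ACE-type ratio) rather than the paper's exact detectors; the clutter mean is estimated from secondary data, which is the step that costs one degree of freedom. All dimensions and parameters below are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch (not the paper's detectors): an adaptive subspace
# statistic with the clutter mean estimated from secondary data.
N, K, p = 8, 32, 2               # data dimension, secondary samples, subspace dimension
H = rng.standard_normal((N, p))  # known signal subspace (columns span the subspace)
mu = 0.5 * np.ones(N)            # nonzero clutter mean
R = np.eye(N)                    # clutter covariance (identity for simplicity)

def subspace_statistic(x, secondary):
    """GLRT-style ratio: energy captured by the whitened subspace projection,
    after removing the mean estimated from the secondary data."""
    m_hat = secondary.mean(axis=1, keepdims=True)   # estimating the mean costs one DOF
    Z = secondary - m_hat
    S = Z @ Z.T / (secondary.shape[1] - 1)          # sample clutter covariance
    Si = np.linalg.inv(S)
    xc = x - m_hat.ravel()
    G = Si @ H
    P = H @ np.linalg.solve(H.T @ G, G.T)           # projector onto the subspace (whitened metric)
    return (xc @ Si @ P @ xc) / (xc @ Si @ xc)

theta = np.array([1.0, -1.0])                       # signal coordinates in the subspace
t0, t1 = [], []
for _ in range(2000):
    sec = mu[:, None] + rng.multivariate_normal(np.zeros(N), R, K).T
    x0 = mu + rng.multivariate_normal(np.zeros(N), R)      # clutter only (H0)
    t0.append(subspace_statistic(x0, sec))
    t1.append(subspace_statistic(x0 + H @ theta, sec))     # signal present (H1)

thr = np.quantile(t0, 0.99)                          # empirical threshold for PFA = 0.01
print("empirical PD at PFA = 0.01:", np.mean(np.array(t1) > thr))
```

Sweeping the signal amplitude or the number of secondary samples in this sketch is a quick way to see the kind of DOF and SCR effects the paper quantifies in closed form.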
stat.OT 2026-05-08

Reform calls for p-value testing mix useful ideas with overlooked limits

Statistical Significance Revisited

Reviewing changes to the 0.05 threshold, prepublication studies, binary decisions, and shifts to intervals or Bayes reveals specific pros as well as cons.

Since its introduction by Fisher, the method of hypothesis testing that relies on computing error probabilities has witnessed several developments. Perhaps the most significant were the seminal contributions of Neyman and Pearson, who brought in the concept of the alternative hypothesis with its corresponding error of the second kind. Significance tests have played a major role in various scientific and technological developments, but not without controversies. Although significance tests were originally cast as frequentist procedures, Bayesian ideas have been incorporated into them, widening access to them. The quantities central to computations of error probabilities are the sampling distributions, which can be computed even without thresholds or alternative hypotheses. Even though Fisher used the significance threshold of 0.05 in his calculations, he cautioned against prescribing any specific threshold. Recently, there have been calls for reform in practice with regard to the almost standard use of the 0.05 significance threshold, prepublication confirmatory studies, the dichotomous consideration of the null and alternative hypotheses, and abandoning significance tests altogether in favour of other approaches such as confidence intervals and Bayesian decision theory. In this paper, we examine these calls for reform and unearth their strengths and shortcomings.
stat.OT 2026-05-08

Bayesian networks cut tail risks in express delivery design

Bayesian Multi-Topology Express Transportation Network Design under Posterior Predictive Demand, Sorting-Efficiency and Delivery-Time Uncertainty

Learning uncertainty from data allows trading small cost increases for much lower delivery delays and better hub performance in multiple topologies.

Express transportation network design is uncertain because origin-destination demand, travel time, operating cost, hub congestion, and realized sorting productivity vary over time. Existing multi-topology express network models usually optimize cost and maximum arrival time under fixed input data, which may produce designs that are nominally efficient but fragile under demand surges, route disruptions, and hub productivity losses. This paper develops a Bayesian posterior-predictive framework for multi-topology express transportation network design. The model learns demand, travel-time, cost, and hub-reliability uncertainties from historical or benchmark-calibrated data and propagates them through posterior predictive scenarios. For fully connected, hub-and-spoke, restricted-allocation, and direct-link hybrid topologies, candidate designs are evaluated using posterior expected cost, conditional value-at-risk of maximum arrival time, service reliability, hub hold-time reliability, and emission-aware penalties. A Bayesian multi-structure design methodology is proposed using posterior simulation, sample-average approximation, topology-wise optimization, and Bayes-risk selection. Theoretical results establish existence of a Bayes-optimal design, convergence of posterior scenario risks, and stability of topology selection. Simulation and CAB benchmark experiments show that the Bayesian design can trade modest additional cost for substantial reductions in tail delivery risk and improved hub reliability.
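A toy sketch of the posterior-predictive, sample-average-approximation selection loop the abstract outlines, assuming a conjugate Gamma-Poisson demand model, made-up per-topology cost and arrival-time functions, and a CVaR penalty on maximum arrival time; none of these modelling choices are taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Posterior for the mean OD demand under a conjugate Gamma-Poisson model
# (prior Gamma(shape=2, rate=0.1)), fitted to hypothetical historical counts.
demand_history = rng.poisson(lam=40, size=60)
a_post = 2.0 + demand_history.sum()
b_post = 0.1 + demand_history.size

def posterior_predictive_demand(n_scen):
    lam = rng.gamma(a_post, 1.0 / b_post, size=n_scen)   # posterior draws of the demand rate
    return rng.poisson(lam)                              # posterior predictive scenarios

def cvar(x, alpha=0.9):
    """Conditional value-at-risk: mean of the worst (1 - alpha) share of outcomes."""
    q = np.quantile(x, alpha)
    return x[x >= q].mean()

# Toy per-scenario (cost, max arrival time) models for three candidate topologies.
topologies = {
    "fully_connected": lambda d: (5.0 * d,        4.0 + 0.01 * d),
    "hub_and_spoke":   lambda d: (3.0 * d + 60.0, 6.0 + 0.05 * d),
    "hybrid":          lambda d: (4.0 * d + 30.0, 5.0 + 0.02 * d),
}

scenarios = posterior_predictive_demand(5000).astype(float)
risk_weight = 2.0
scores = {}
for name, model in topologies.items():
    cost, arrival = model(scenarios)
    scores[name] = cost.mean() + risk_weight * cvar(arrival)   # Bayes risk with CVaR penalty

best = min(scores, key=scores.get)
print({k: round(v, 1) for k, v in scores.items()}, "-> selected:", best)
```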
stat.OT 2026-05-08

Metaverse framework proposed for immersive statistical education

Welcome to the Statverse: A Metaverse for Data Science

Statverse blends physical and digital realms to let students interact with complex statistical concepts in virtual space.

This paper introduces the Statverse, a Metaverse framework designed to revolutionize statistical education in the digital age. Our key goal is to report our progress and encourage others to integrate similar strategies into their programs. The proposed framework seamlessly integrates the physical and digital realms to provide an immersive environment for the nuanced representation of complex statistical concepts. Finally, we discuss the potential impact of Statverse on advancing Statistical Education, offering a transformative approach to teaching and learning in the digital age. Statverse is the outcome of an academic partnership between Universidad Técnica Federico Santa María (UTFSM) and the University of Edinburgh (UoE).
stat.OT 2026-05-04

Functional Liu estimator selects shrinkage by direct MSE minimization

Functional Liu Regression for Scalar-on-Functional Models in High-Dimensional Settings

In high-dimensional scalar-on-function settings, a plug-in rule from the risk decomposition replaces uninformative cross-validation criteria

This study develops a functional Liu-type shrinkage estimator (fLiu) for scalar-on-function regression in the presence of strong multicollinearity and high-dimensional functional predictors. The approach extends the classical Liu estimator to the functional setting by combining directional shrinkage with smoothness regularization, providing flexible control over the bias-variance trade-off. Theoretical analysis is used to examine the behavior of the estimator and the associated parameter selection problem. In particular, an explicit mean squared error (MSE) decomposition is derived, characterizing the risk of the estimator in terms of variance reduction and shrinkage bias. This further yields an explicit optimal choice of the shrinkage parameter of the fLiu estimator through a one-dimensional convex risk minimization problem, leading to a practical plug-in tuning rule. Moreover, it is shown that in high-dimensional (underdetermined) settings, commonly used criteria such as GCV (and equivalently PRESS/LOO-CV) become constant with respect to the parameter d, and thus uninformative for tuning. This provides a theoretical explanation for the predominant focus on the overdetermined regime in existing Liu-type methods. Numerical results demonstrate that the estimator achieves competitive predictive accuracy relative to existing methods. Implementation is carried out in R using the fda package, and in Python via the fLiu.py package developed for this study.
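For orientation, a sketch of the classical finite-dimensional (overdetermined) Liu estimator together with the standard MSE-minimizing plug-in choice of the shrinkage parameter $d$; this is the idea the fLiu estimator extends to the functional, high-dimensional case. The synthetic data and the plug-in formula below come from the classical Liu literature, not from the paper's functional version.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic overdetermined regression with one strongly collinear column.
n, p = 200, 6
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.standard_normal(n)         # induce strong collinearity
beta_true = np.array([2.0, -1.0, 0.5, 0.0, 1.0, -0.5])
y = X @ beta_true + rng.standard_normal(n)

XtX = X.T @ X
lam, Q = np.linalg.eigh(XtX)                               # spectral decomposition of X'X
beta_ols = np.linalg.solve(XtX, X.T @ y)
sigma2 = np.sum((y - X @ beta_ols) ** 2) / (n - p)         # residual variance estimate
alpha = Q.T @ beta_ols                                     # OLS coefficients in the eigenbasis

# Plug-in d that minimizes the estimated MSE of the Liu estimator.
num = np.sum((alpha**2 - sigma2) / (lam + 1) ** 2)
den = np.sum((sigma2 / lam + alpha**2) / (lam + 1) ** 2)
d_opt = num / den

# Liu estimator: (X'X + I)^{-1} (X'y + d * beta_OLS)
beta_liu = np.linalg.solve(XtX + np.eye(p), X.T @ y + d_opt * beta_ols)
print("plug-in d:", round(d_opt, 3))
print("OLS error:", np.linalg.norm(beta_ols - beta_true).round(3))
print("Liu error:", np.linalg.norm(beta_liu - beta_true).round(3))
```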
stat.OT 2026-05-04

The paper models networked systems that switch regimes and carry memory of past states

Quenched Amplification and Tail Shaping in Networked Systems with Memory and Regime Switching

Quenched amplification arises generically in linear regime-switching networks with Volterra memory, producing power-law burst tails whose exponent is set by dwell-time rates and an operator-defined growth parameter.

Networked systems operating under intermittent adverse conditions and long memory can remain stable on average while exhibiting rare but extreme trajectory-level excursions. We study linear regime-switching network dynamics with Volterra-type memory, formulated through a finite-dimensional lifted ordinary differential equation embedding. Despite finite-horizon annealed boundedness, we show that quenched amplification emerges generically from the interaction of regime persistence, memory accumulation, and non-normal lifted operator geometry. A lower bound on burst-size distributions reveals power-law tails whose exponent is determined by the ratio between unfavorable dwell-time rates and an operator-defined instantaneous growth parameter. This parameter is computable online via the Euclidean logarithmic norm of the lifted operator, yielding a practical early-warning indicator. Building on this structure, we introduce a dynamic data-driven intervention strategy that enforces contraction on demand along rare amplification channels, thereby shaping or truncating tail risk without altering exogenous regime statistics or typical system behavior. The results provide a geometrically grounded and operationally actionable framework for understanding and mitigating extreme events in memory-driven regime-switching systems.
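The "computable online" early-warning quantity referred to above is the Euclidean logarithmic norm of the lifted operator, i.e. the largest eigenvalue of its symmetric part. A short sketch of that computation on an arbitrary non-normal matrix (the paper's lifted regime/memory operator is not reproduced here):

```python
import numpy as np

def log_norm_2(A):
    """Euclidean logarithmic norm: largest eigenvalue of the symmetric part of A."""
    return np.linalg.eigvalsh((A + A.T) / 2).max()

# A non-normal matrix can be asymptotically stable (all eigenvalues in the left
# half-plane) while having a positive logarithmic norm, signalling possible
# transient amplification and burst growth.
A = np.array([[-0.1, 5.0],
              [ 0.0, -0.2]])
print("eigenvalues:     ", np.linalg.eigvals(A))   # both negative -> stable
print("logarithmic norm:", log_norm_2(A))          # positive -> transient growth possible
```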
stat.OT 2026-04-29

Markov chain solves Sudoku by sampling favored grids

Sudoku Solving and Finding Magic Squares by Probability Models and Markov Chains

A probability model on 9x9 matrices weights promising attempts higher so the chain reaches valid solutions, and the same approach produces magic 8x8 and 10x10 squares.

Sudoku puzzles have a long history, with variations going back more than a hundred years, but their current and perhaps surprising world-wide prominence goes back to certain initiatives and then puzzle-generating computer programmes from just after 2000. To solve a sudoku puzzle, a statistician can put up a probability model on the enormous space of $9\times9$ matrix possibilities, constructed to favour 'good attempts', and then engineer a Markov chain to sample a long enough chain of sudoku table realisations from that model, until the solution is found. The methods work also for other types of puzzles, like constructing 'magic squares' with wished-for properties (sums of rows, columns, diagonals equal, etc.), as is also illustrated in this article; via magic models and equally magic Markov chains I find impressively magic $8\times8$ and $10\times10$ squares.
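A compact sketch of one common Markov-chain formulation of this idea, assuming within-block swap proposals, a row/column conflict count as the score, and a simulated-annealing acceptance rule; the puzzle instance, proposal scheme, and cooling schedule are illustration choices and not necessarily the author's.

```python
import numpy as np

rng = np.random.default_rng(3)

# An arbitrary easy Sudoku instance; 0 marks blank cells.
puzzle = np.array([
    [5,3,0, 0,7,0, 0,0,0],
    [6,0,0, 1,9,5, 0,0,0],
    [0,9,8, 0,0,0, 0,6,0],
    [8,0,0, 0,6,0, 0,0,3],
    [4,0,0, 8,0,3, 0,0,1],
    [7,0,0, 0,2,0, 0,0,6],
    [0,6,0, 0,0,0, 2,8,0],
    [0,0,0, 4,1,9, 0,0,5],
    [0,0,0, 0,8,0, 0,7,9]])
fixed = puzzle > 0

def block_cells(b):
    r0, c0 = 3 * (b // 3), 3 * (b % 3)
    return [(r0 + i, c0 + j) for i in range(3) for j in range(3)]

def initial_fill(grid):
    """Fill each 3x3 block with its missing digits so blocks are always valid."""
    g = grid.copy()
    for b in range(9):
        cells = block_cells(b)
        missing = list(set(range(1, 10)) - {g[r, c] for r, c in cells if fixed[r, c]})
        rng.shuffle(missing)
        for (r, c) in cells:
            if not fixed[r, c]:
                g[r, c] = missing.pop()
    return g

def cost(g):
    """Number of missing digits over all rows and columns (0 means solved)."""
    return sum(9 - len(set(g[i, :])) for i in range(9)) + \
           sum(9 - len(set(g[:, j])) for j in range(9))

g = initial_fill(puzzle)
c = cost(g)
T = 1.0
for step in range(200_000):
    b = rng.integers(9)
    free = [rc for rc in block_cells(b) if not fixed[rc]]
    if len(free) < 2:
        continue
    (r1, c1), (r2, c2) = [free[k] for k in rng.choice(len(free), 2, replace=False)]
    g[r1, c1], g[r2, c2] = g[r2, c2], g[r1, c1]          # propose a within-block swap
    c_new = cost(g)
    if c_new <= c or rng.random() < np.exp((c - c_new) / T):
        c = c_new                                        # accept the proposal
    else:
        g[r1, c1], g[r2, c2] = g[r2, c2], g[r1, c1]      # reject: undo the swap
    T = max(0.05, T * 0.99999)                           # slow cooling
    if c == 0:
        print("solved after", step, "proposals")
        break
print(g)
```

Because the blocks stay valid by construction, the chain only has to resolve row and column conflicts, which is what makes this simple proposal distribution workable.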
stat.OT 2026-04-27

Students use GenAI heavily but doubt output accuracy

Perceptions and Utilization of GenAI Tools among Data Science Students and Faculty

One-university survey shows coding and writing aid as top uses, yet low confidence in outputs and limited faculty integration, calling for training and guidance.

This study investigates perceptions and use of generative artificial intelligence (GenAI) tools among students and faculty in statistics and data science at a historically Black college or university. Survey data from 119 valid student responses and 14 faculty responses were used to examine familiarity, usage patterns, perceived benefits, awareness of limitations, and instructional support needs. Students reported substantial use of GenAI, with ChatGPT as the dominant tool, primarily for coding assistance and writing support. Although student perceptions of AI in data science workflows and careers were generally positive, confidence in interpreting AI-generated outputs was limited, and concerns about accuracy, reliability, and over-reliance were common. Faculty also viewed GenAI favorably, but self-rated proficiency and the frequency of classroom integration remained limited. Comparisons across student subgroups suggested that familiarity with GenAI and awareness of its limitations varied more by academic level than by gender. These findings highlight a gap between AI adoption and AI literacy and underscore the need for structured training, validation practices, and clearer institutional guidance for responsible AI integration in data science education.
stat.OT 2026-04-16

AI functions as a health determinant like air pollution

The Epidemiology of Artificial Intelligence

A framework from environmental epidemiology separates ambient algorithmic influences from personal AI tool use to study population health.

Artificial intelligence (AI) systems increasingly shape how people access health information, make medical decisions, and receive care -- yet epidemiology lacks frameworks for measuring AI exposure or studying its health effects at the population level. Here we argue that AI now functions as a determinant of health and propose a conceptual framework, borrowed from environmental epidemiology, for studying it. We distinguish ambient AI exposure -- algorithmic curation and AI-mediated institutional decisions that affect populations regardless of individual choice -- from personal AI exposure -- direct, volitional use of AI tools. We characterize AI's possible causal roles in epidemiological models, show that existing experimental approaches are inadequate for capturing chronic, population-level effects, and illustrate these ideas with nationally representative US survey data. We discuss implications for study design, health equity, and AI governance.
stat.OT 2026-04-13

Bivariate Zenga surfaces track inequality across paired variables

On Some Multivariate Extensions to Zenga Curve: Properties and Applications

Quantile functions turn the classic one-variable Zenga measure into surfaces that show how two inequalities interact, demonstrated on global digital inequality indicators.

Measures of inequality are often limited in their ability to capture multidimensional aspects that arise from the joint distribution of multiple socio-economic variables. In this paper, we develop bivariate extensions of the Zenga inequality measure using bivariate quantile functions. We propose new bivariate Zenga surfaces and study their theoretical properties. A vector-valued bivariate Zenga curve is also introduced to provide a more detailed characterization of inequality. A non-parametric estimator is proposed and methods are evaluated through simulation studies and applied to the analysis of digital inequality across countries using indicators such as broadband penetration and digital literacy. The results highlight the effectiveness of the proposed framework in capturing multidimensional inequality.
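For context, a nonparametric estimator of the classical univariate Zenga curve that the proposed surfaces extend: at each level $p$ it compares the mean of the poorest $100p\%$ with the mean of the richest $100(1-p)\%$. The bivariate-quantile construction itself is the paper's contribution and is not sketched here; the income data below are synthetic.

```python
import numpy as np

def zenga_curve(x, grid):
    """Classical Zenga curve Z(p) = 1 - (lower-group mean) / (upper-group mean)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    csum = np.cumsum(x)
    total = csum[-1]
    z = []
    for p in grid:
        k = max(1, min(n - 1, int(np.ceil(p * n))))      # size of the poorest group
        lower_mean = csum[k - 1] / k
        upper_mean = (total - csum[k - 1]) / (n - k)
        z.append(1.0 - lower_mean / upper_mean)
    return np.array(z)

# Example on synthetic lognormal "incomes" (arbitrary parameters).
rng = np.random.default_rng(4)
income = rng.lognormal(mean=0.0, sigma=0.8, size=5000)
grid = np.linspace(0.05, 0.95, 19)
print(np.round(zenga_curve(income, grid), 3))
```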
stat.OT 2026-04-06 Recognition

Recommendations fail to increase regularization use

Why is Regularization Underused? An Empirical Study on Trust and Adoption of Statistical Methods

Survey of 606 analysts links adoption instead to ease of implementation, bias control benefits, and peer norms.

Statistical practice does not automatically follow methodological innovation. Regularization methods, widely advocated to reduce overfitting and stabilize inference, are readily available in modern software, but are not consistently used by data analysts. We investigate this implementation gap in a large-scale empirical study of trust in, and acceptance of, regularization techniques, based on $N = 606$ data analysts. Drawing on measurement frameworks from technology acceptance research, we survey practitioners and embed a randomized experiment to test whether written recommendation of regularization methods increases trust or intended use. We find no evidence of such an effect. Instead, adoption intentions are strongly associated with analysts' perceptions of ease of implementation and practical benefit, such as improved bias control or interpretability. Perceived social norms also emerge as a central driver. These results indicate that uptake of statistical methodology depends less on formal recommendations than on usability, perceived utility, and community practice.
stat.OT 2026-04-02 1 theorem

Debiased LASSO restores valid inference in high-dimensional regression

Debiased Estimators in High-Dimensional Regression: A Review and Replication of Javanmard and Montanari (2014)

Replication confirms reliable coverage and Type I error control, with power trade-offs versus projection estimators in low-signal cases

High-dimensional statistical settings ($p \gg n$) pose fundamental challenges for classical inference, largely due to bias introduced by regularized estimators such as the LASSO. To address this, Javanmard and Montanari (2014) propose a debiased estimator that enables valid hypothesis testing and confidence interval construction. This report examines their debiased LASSO framework, which yields asymptotically normal estimators in high-dimensional settings. The key theoretical results underlying this approach are presented, specifically the construction of an optimized debiased estimator that restores asymptotic normality and enables the computation of valid confidence intervals and $p$-values. To evaluate the claims of Javanmard and Montanari, a subset of the original simulation study and the real-data analysis is presented. The original empirical analysis is extended to the desparsified LASSO, which is referenced but not implemented in the original study. The results demonstrate that while the debiased LASSO achieves reliable coverage and controls Type I error, the LASSO projection estimator can offer improved power in idealized low-signal settings without compromising error rates. The results reveal a trade-off: the LASSO projection estimator performs well in low-signal settings, while Javanmard and Montanari's method is more robust to complex correlations, improving precision and signal detection in real data.
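A minimal sketch of the debiasing formula under discussion, $\hat\beta^{d} = \hat\beta^{\mathrm{lasso}} + \frac{1}{n} M X^\top (y - X\hat\beta^{\mathrm{lasso}})$. Javanmard and Montanari obtain the decorrelation matrix $M$ from a convex program (or nodewise regressions); here a ridge-regularized inverse of the sample covariance is used as a crude stand-in, so this illustrates the formula rather than their exact procedure.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)

# Synthetic sparse high-dimensional regression (p >> n).
n, p, s = 200, 400, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:s] = 1.0
sigma = 1.0
y = X @ beta + sigma * rng.standard_normal(n)

# LASSO fit with a standard theory-motivated penalty level.
lam = 2.0 * sigma * np.sqrt(np.log(p) / n)
beta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_

Sigma_hat = X.T @ X / n
M = np.linalg.inv(Sigma_hat + 0.5 * np.eye(p))             # crude stand-in for the J-M program
beta_deb = beta_hat + M @ X.T @ (y - X @ beta_hat) / n      # debiased estimator

# Approximate 95% confidence intervals from the asymptotic normal limit.
resid = y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - np.count_nonzero(beta_hat)))
se = sigma_hat * np.sqrt(np.diag(M @ Sigma_hat @ M.T) / n)
lo, hi = beta_deb - 1.96 * se, beta_deb + 1.96 * se
coverage = np.mean((beta >= lo) & (beta <= hi))
print("empirical coverage over all coordinates:", round(coverage, 3))
```

Coverage here is only indicative: the quality of the intervals depends entirely on how $M$ is constructed, which is exactly what the original convex-program construction addresses.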
stat.OT 2026-04-01 2 theorems

Soft numbers assign infinitesimal probabilities to points

Hilbert's Sixth Problem and Soft Logic

This refines classical probability to help derive macro laws from micro mechanics and represent Hilbert's sixth problem with a soft-numbered Mobius strip.

Hilbert's sixth problem calls for the axiomatization of physics, particularly the derivation of macroscopic statistical laws from microscopic mechanical principles. A conceptual difficulty arises in classical probability theory: in continuous spaces every individual microstate has probability zero. In this paper, we introduce a probabilistic framework based on Soft Logic and Soft Numbers in which point events possess infinitesimal Soft probabilities rather than the classical zero. We show that Soft probability can be interpreted as an infinitesimal refinement of classical probability and discuss its implications for statistical mechanics and Hilbert's sixth problem. In addition, we show rigorously how to construct a Mobius strip, based on the soft numbers, and we discuss how this Mobius strip representation with soft numbers allows for a deeper understanding of the nature and character of Hilbert's sixth problem.

browse all of stat.OT → full archive · search · sub-categories