pith. machine review for the scientific record.

physics.data-an

Data Analysis, Statistics and Probability

Methods, software and hardware for physics data analysis: data processing and storage; measurement methodology; statistical and mathematical aspects such as parametrization and uncertainties.

physics.data-an 2026-05-04

Workbench links weather embeddings to physical data for event retrieval

Toward a Scientific Discovery Engine for Weather and Climate Data: A Visual Analytics Workbench for Embedding-Based Exploration

Researchers characterize known phenomena in labeled sets, then search large unlabeled archives for matching storms and other weather events.

Abstract
Earth system science is producing increasingly large, high-dimensional datasets from physics-based Earth system models to AI-based weather and climate models. Embedding-based representations can make these data searchable through similarity search and analog retrieval, but nearest neighbors in latent space are not automatically scientifically meaningful: proximity may reflect real weather structure, or it may reflect preprocessing, geography, or model bias. Researchers therefore need ways to inspect how embeddings organize meteorological data, compare representation models, develop retrieval strategies, and verify results against physical evidence. We present an open-source visual analytics workbench for each of these steps. The system links embedding experiments to source data, metadata, spatial context, and model configurations, so latent-space results can be traced back to the physics. Users can explore latent spaces for different models, issue global or localized queries, and inspect analogs through familiar meteorological views. This enables a discovery workflow in which scientists characterize a phenomenon of interest in a well-understood dataset, identify its signature in latent space, and then use that signature to probe larger, less-labeled archives or ensembles for similar events. We demonstrate the workbench through tropical-cyclone retrieval using ERA5-derived embeddings and IBTrACS metadata, and evaluate its out-of-core retrieval backend to show that large embedding collections can be searched beyond in-memory limits on commodity workstation hardware.
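A minimal sketch of the kind of out-of-core analog retrieval the abstract describes: embeddings are streamed from disk in chunks and ranked by cosine similarity against a query embedding. The file layout, array shapes, and similarity measure here are illustrative assumptions, not the workbench's actual backend.

```python
# Chunked nearest-neighbor search over an embedding archive stored on disk.
# All names and parameters below are assumptions for illustration.
import numpy as np

def top_k_analogs(query, emb_path, n_items, dim, k=10, chunk=100_000):
    """Return indices and scores of the k most similar embeddings to `query`."""
    archive = np.memmap(emb_path, dtype=np.float32, mode="r", shape=(n_items, dim))
    q = query / np.linalg.norm(query)
    best_idx = np.empty(0, dtype=np.int64)
    best_score = np.empty(0, dtype=np.float32)
    for start in range(0, n_items, chunk):           # stream the archive chunk by chunk
        block = np.asarray(archive[start:start + chunk], dtype=np.float32)
        block /= np.linalg.norm(block, axis=1, keepdims=True) + 1e-12
        scores = block @ q                            # cosine similarity to the query
        idx = np.argsort(scores)[::-1][:k] + start
        best_idx = np.concatenate([best_idx, idx])
        best_score = np.concatenate([best_score, scores[idx - start]])
        order = np.argsort(best_score)[::-1][:k]      # keep only the running global top-k
        best_idx, best_score = best_idx[order], best_score[order]
    return best_idx, best_score
```

Because only one chunk is resident in memory at a time, the search scales to archives far larger than RAM, which is the property the paper's evaluation targets.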
physics.data-an 2026-05-01

FitED is a Python desktop application that provides an interactive GUI and numerical…

FitED: A User-Centric, Extensible Software Environment for Robust Peak-Profile and General Functional Data Fitting

FitED is a new software package offering interactive and automated nonlinear fitting of conventional peak shapes and custom analytical…

Abstract
Reliable parameter extraction from experimental data is central to quantitative analysis in spectroscopy, diffraction, photoluminescence, chromatography, microscopy, and time-resolved measurements. We present FitED, a Python-based desktop application for interactive and automated nonlinear fitting of one-dimensional scientific data. FitED combines an accessible graphical workflow with a numerical backend capable of fitting both conventional peak profiles and arbitrary user-defined analytical functions. The software supports Gaussian, Lorentzian, Pseudo-Voigt, and exact area-normalized Voigt profiles, together with custom functions such as exponential decays, stretched exponentials, saturation curves, and spectroscopy-specific response functions. It integrates robust text-file import, region-of-interest selection, background modeling, parameter bounds, weighting strategies, automated pre-fit search, iterative peak refinement, residual visualization, session persistence, and structured export of fitted curves, components, reports, and metadata. By combining mathematical transparency with practical usability, FitED aims to make nonlinear fitting more reproducible and accessible while preserving the parameter-level control required by experienced experimental researchers.
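A generic illustration of the kind of peak-profile fit FitED automates: a Gaussian peak on a linear background fitted with bounded nonlinear least squares. This is not FitED's API; the function names, synthetic data, and bounds are assumptions for the example.

```python
# Gaussian-plus-background fit with parameter bounds and 1-sigma uncertainties.
import numpy as np
from scipy.optimize import curve_fit

def gaussian_with_background(x, amp, cen, sigma, b0, b1):
    return amp * np.exp(-0.5 * ((x - cen) / sigma) ** 2) + b0 + b1 * x

rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 400)
y = gaussian_with_background(x, 3.0, 0.5, 0.8, 0.2, 0.05) + rng.normal(0, 0.05, x.size)

p0 = [2.0, 0.0, 1.0, 0.0, 0.0]                       # initial guesses
bounds = ([0, -5, 1e-3, -np.inf, -np.inf], [np.inf, 5, 5, np.inf, np.inf])
popt, pcov = curve_fit(gaussian_with_background, x, y, p0=p0, bounds=bounds)
perr = np.sqrt(np.diag(pcov))                        # 1-sigma parameter uncertainties
print(dict(zip(["amp", "cen", "sigma", "b0", "b1"], popt)))
```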
physics.data-an 2026-04-29

Astrocytic gains create self-attention in memory networks

Emergent Self-Attention from Astrocyte-Gated Associative Memory Dynamics

Entropy-regularized replicator dynamics on gains lead to softmax routing at fixed points and better retrieval under interference.

Abstract
We introduce a Hopfield-type associative memory in which effective connectivity is multiplicatively modulated by astrocytic gains evolving under an entropy-regularized replicator equation. The coupled neuron-astrocyte dynamics admit a Lyapunov function, ensuring global convergence. At fixed points, astrocytic gains implement a softmax-normalized allocation over pattern similarity scores, yielding a mechanistic realization of self-attention as emergent routing on the gain simplex. In regimes of high memory load and interference, the model significantly improves retrieval accuracy relative to classical Hopfield dynamics and recent neuron-astrocyte baselines. These results establish a dynamical systems framework linking glial modulation, competitive resource allocation, and attention-like computation.
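A small numerical sketch (not the paper's code) of the central fixed-point claim: entropy-regularized replicator dynamics on a gain vector over the simplex converge to a softmax of the similarity scores. The scores, temperature, and Euler step size are illustrative choices.

```python
# Entropy-regularized replicator dynamics on the gain simplex.
# At the fixed point the gains equal softmax(scores / temperature).
import numpy as np

def replicator_fixed_point(scores, temperature=1.0, dt=0.01, steps=20_000):
    g = np.full_like(scores, 1.0 / scores.size)       # start at the simplex centre
    for _ in range(steps):
        fitness = scores - temperature * np.log(g)    # similarity minus entropy penalty
        g = g + dt * g * (fitness - g @ fitness)      # replicator update
        g = np.clip(g, 1e-12, None)
        g /= g.sum()                                  # stay on the simplex
    return g

scores = np.array([1.0, 2.0, 0.5, 3.0])
g_star = replicator_fixed_point(scores)
softmax = np.exp(scores) / np.exp(scores).sum()
print(np.allclose(g_star, softmax, atol=1e-4))        # fixed point matches softmax
```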
physics.data-an 2026-04-28

DySIB recovers pendulum phase space from video data

Information bottleneck for learning the phase space of dynamics from high-dimensional experimental data

By maximizing predictive mutual information between past and future latent windows, it yields coordinates aligned with angle and angular velocity.

Abstract
Identifying the dynamical state variables of a system from high-dimensional observations is a central problem across physical sciences. The challenge is that the state variables are not directly observable and must be inferred from raw high-dimensional data without supervision. Here we introduce DySIB (Dynamical Symmetric Information Bottleneck) as a method to learn low-dimensional representations of time-series data by maximizing predictive mutual information between past and future observation windows while penalizing representation complexity. This objective operates entirely in latent space and avoids reconstruction of the observations. We apply DySIB to an experimental video dataset of a physical pendulum, where the underlying state space is known. The method, with hyperparameters of the learning architecture set self-consistently by the data, recovers a two-dimensional representation that matches the dimensionality, topology, and geometry of the pendulum phase space, with the learned coordinates aligning smoothly with the canonical angle and angular velocity. These results demonstrate, on a well-characterized experimental system, that predictive information in latent space can be used to recover interpretable dynamical coordinates directly from high-dimensional data.
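A schematic sketch, assumed for illustration rather than taken from DySIB's released code, of a predictive-information objective computed purely in latent space: an InfoNCE-style lower bound on the mutual information between encoded past windows and encoded future windows.

```python
# InfoNCE-style bound on I(past; future) from matched latent pairs.
import numpy as np

def infonce_predictive_bound(z_past, z_future, temperature=0.1):
    """Lower bound on predictive mutual information; rows of the inputs are matched pairs."""
    zp = z_past / np.linalg.norm(z_past, axis=1, keepdims=True)
    zf = z_future / np.linalg.norm(z_future, axis=1, keepdims=True)
    logits = zp @ zf.T / temperature                   # similarity of every past-future pair
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    positives = np.diag(log_softmax)                   # matched pairs sit on the diagonal
    return positives.mean() + np.log(z_past.shape[0])  # bound in nats

rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))
print(infonce_predictive_bound(z + 0.1 * rng.normal(size=z.shape), z))  # strongly correlated pair
```

Maximizing such a bound over an encoder, while penalizing latent complexity, is one common way to realize the past-future objective the abstract describes without reconstructing the observations.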
physics.data-an 2026-04-23

MCMC maintains 95 percent coverage in constrained gamma unmixing

Bayesian approach for uncertainty quantification of hybrid spectral unmixing in γ-ray spectrometry

Laplace approximation loses accuracy when spectral deformation constraints activate or background dominates, but MCMC does not.

Abstract
Identifying and quantifying $\gamma$-emitting radionuclides, considering spectral deformation from $\gamma$-interactions in radioactive source surroundings, present a significant challenge in $\gamma$-ray spectrometry. In that context, a hybrid machine learning method has been previously proposed to jointly estimate the counting and spectral signatures of $\gamma$-emitters under conditions of spectral variability. This paper addresses the uncertainty quantification of the estimators (i.e., the counting and the variable $\lambda$, which characterizes the spectral signatures) obtained by this spectral unmixing algorithm. The focus is on the coverage interval, as defined by the GUM, which corresponds closely to a credible interval in the Bayesian framework. Given the inverse problem and the constraints associated with spectral deformation, two Bayesian methods, Laplace approximation and Markov Chain Monte Carlo, have been developed for uncertainty quantification to ensure robust decision-making. The Laplace approximation technique approximates the posterior distribution by a Gaussian distribution, while the Markov Chain Monte Carlo technique samples the posterior distribution. This study evaluates these two methods in terms of the precision of the coverage interval, based on repeated Monte Carlo samples and the long-run success rate. Numerical experiments show that both methods yield similar results close to the expected success rate of 95.4$\%$ when constraints related to spectral signature deformation and counting are inactive. However, when constraints are active or the background counting significantly dominates other radionuclides, the Laplace approximation method deviates from the expected long-run success rate due to the non-Gaussian posterior distribution. In such cases, the Markov Chain Monte Carlo method still provides robust results.
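A generic sketch of the two uncertainty-quantification routes the paper compares, on a deliberately simple one-dimensional stand-in problem: a Laplace (Gaussian) approximation around the MAP estimate versus direct posterior sampling. The toy log-posterior and step sizes are assumptions for illustration only.

```python
# Laplace approximation vs Metropolis sampling for a skewed, positivity-constrained posterior.
import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(theta):
    theta = theta[0]
    if theta <= 0:                                    # positivity constraint
        return np.inf
    return theta - 10.0 * np.log(theta)               # ~ Gamma(11, 1) up to a constant

# --- Laplace approximation: Gaussian centred on the MAP with curvature-based width
res = minimize(neg_log_posterior, x0=[5.0], method="BFGS")
map_est = res.x[0]
sigma = np.sqrt(res.hess_inv[0, 0])                   # approximate posterior std. dev.
laplace_interval = (map_est - 2 * sigma, map_est + 2 * sigma)

# --- Metropolis sampling of the same posterior (no Gaussian assumption)
rng = np.random.default_rng(1)
samples, theta = [], map_est
for _ in range(50_000):
    prop = theta + rng.normal(0, 1.0)
    if np.log(rng.uniform()) < neg_log_posterior([theta]) - neg_log_posterior([prop]):
        theta = prop
    samples.append(theta)
mcmc_interval = np.percentile(samples[5_000:], [2.275, 97.725])   # ~95.4% credible interval
print(laplace_interval, mcmc_interval)
```

When the posterior is strongly non-Gaussian, for example when a constraint is active, the symmetric Laplace interval and the sampled interval disagree, which is the failure mode the paper quantifies.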
physics.data-an 2026-04-20

One foundation model unifies three DIRC detector tasks

Application of a Mixture of Experts-based Foundation Model to the GlueX DIRC Detector

A shared transformer backbone with expert routing performs simulation, identification and hit filtering of Cherenkov photons without task-by-task pipelines.

Abstract
We present a Mixture-of-Experts-based foundation model applied to the GlueX DIRC detector at Jefferson Lab, demonstrating its utility as a unified framework for fast simulation, particle identification, and hit-level noise filtering of Cherenkov photons. By leveraging a single shared transformer backbone across all tasks, the approach eliminates the fragmentation of task-specific pipelines while maintaining competitive, and in several cases superior, performance relative to established methods. The model operates directly on low-level detector inputs, performing hit-by-hit autoregressive generation over split spatial and temporal vocabularies with continuous kinematic conditioning, and supports class-conditional generation of pions and kaons through its Mixture-of-Experts architecture. We benchmark against the standard geometrical reconstruction and prior deep learning methods across the full kinematic phase space of the GlueX DIRC, demonstrating that the foundation model framework transfers effectively to this detector without architectural modification. This work positions the foundation model as a practical and scalable alternative to the suite of task-specific models currently proposed for GlueX DIRC analysis.
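A toy sketch of the routing idea behind a Mixture-of-Experts layer: a learned gate scores each token and sends its hidden state to the highest-scoring expert. The dimensions, top-1 routing rule, and random weights are illustrative assumptions, not the GlueX model's configuration.

```python
# Top-1 Mixture-of-Experts routing over a batch of token embeddings.
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, n_experts = 16, 32, 4

tokens = rng.normal(size=(n_tokens, d_model))
gate_w = rng.normal(size=(d_model, n_experts)) * 0.1          # router weights
expert_w = rng.normal(size=(n_experts, d_model, d_model)) * 0.1

gate_logits = tokens @ gate_w
gate_probs = np.exp(gate_logits) / np.exp(gate_logits).sum(axis=1, keepdims=True)
chosen = gate_probs.argmax(axis=1)                             # top-1 expert per token

output = np.zeros_like(tokens)
for e in range(n_experts):
    mask = chosen == e
    if mask.any():                                             # only run experts that received tokens
        output[mask] = np.tanh(tokens[mask] @ expert_w[e]) * gate_probs[mask, e:e + 1]
print(output.shape)
```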
physics.data-an 2026-04-17

The paper builds a prototype that uses open-weight LLMs to read high-energy physics papers

Development of an LLM-Based System for Automatic Code Generation from HEP Publications

A two-stage LLM system extracts structured analysis selections from HEP papers and references, then generates and validates executable code…

Abstract
Ensuring the reproducibility of physics results is one of the crucial challenges in high-energy physics (HEP). In this study, we develop a proof-of-concept system that uses large language models (LLMs) to extract analysis procedures from HEP publications and generate executable analysis code for reproducing published results. Our method consists of two stages. In the first stage, open-weight LLMs extract event selection criteria, object definitions, and other relevant analysis information from a target paper and, when necessary, from its referenced publications, and then produce a structured selection list. In the second stage, the structured selection list is used to generate analysis code, which is then executed and validated iteratively. As a benchmark, we use the ATLAS $H \to ZZ^{*} \to 4\ell$ analysis based on proton-proton collision data recorded in 2015 and 2016 and released as ATLAS Open Data. This benchmark allows direct comparison between the generated results and the published analysis, as well as comparison with a manually developed baseline implementation. We separately evaluate selection extraction and code generation in order to clarify the current capabilities and limitations of open-weight LLMs for HEP analysis reproduction. Our initial results show that recent open-weight models can recover many documented selection criteria from papers and references, and that in some runs they can generate event selections fully matching a baseline implementation at the event level. At the same time, stochasticity, hallucination, and execution failure remain significant challenges. These results suggest that LLMs are already promising as human-in-the-loop tools for reproducibility support, although they are not yet reliable as fully autonomous HEP analysis agents. In this paper, we report the design of the prototype system and its initial performance evaluation.
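A schematic skeleton of the two-stage workflow the abstract describes: extract a structured selection list, then generate, execute, and iteratively repair analysis code. The helper functions below are hypothetical stubs standing in for LLM calls and a sandboxed runner; none of them belong to the paper's released tooling.

```python
# Two-stage extract-then-generate loop with execution feedback (all stubs are hypothetical).
import json
from dataclasses import dataclass

@dataclass
class RunResult:
    ok: bool
    error_log: str = ""

def llm_extract_selections(paper_text, reference_texts):
    # hypothetical stub: a real system would prompt an open-weight LLM here
    return {"object_definitions": [], "event_selection": []}

def llm_generate_code(selection_json, feedback):
    # hypothetical stub: a real system would ask the LLM for executable analysis code
    return "print('analysis placeholder')"

def run_analysis(code):
    # hypothetical stub for a sandboxed execution-and-validation step
    try:
        exec(code, {})
        return RunResult(ok=True)
    except Exception as err:
        return RunResult(ok=False, error_log=str(err))

def reproduce_analysis(paper_text, reference_texts, max_iterations=5):
    selections = llm_extract_selections(paper_text, reference_texts)    # stage 1
    feedback = ""
    for _ in range(max_iterations):                                     # stage 2
        code = llm_generate_code(json.dumps(selections), feedback)
        result = run_analysis(code)
        if result.ok:
            return code, result
        feedback = result.error_log          # feed execution errors back to the model
    raise RuntimeError("no validated analysis code after max_iterations")
```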
physics.data-an 2026-04-10

GAPE lifts PROSPECT neutrino signal-to-background ratio by nearly 2.8 times

New Deep Learning Data Analysis Method for PROSPECT using GAPE: Genetic Algorithm Powered Evolution

Genetic evolution selects deep learning models that outperform traditional analysis on identical input data for a cleaner reactor antineutrino signal.

Abstract
We propose a genetic algorithm powered evolution (GAPE) method to create deep learning solutions for energy and position estimation for reactor antineutrino interactions in the Precision Reactor Oscillation and Spectrum Experiment (PROSPECT) at the highly enriched High Flux Isotope Reactor (HFIR) at Oak Ridge National Laboratory. We also apply GAPE to create classification models to distinguish signatures of inverse beta decay (IBD) interactions of reactor antineutrinos from common background types. The GAPE method can also be adopted for optimization of other types of problems that utilize machine learning (ML) models for particle physics applications. When applied in the PROSPECT context, we find that the models selected by GAPE can, in some cases, outperform the traditional models previously used for PROSPECT data analysis. In particular, when benchmarked against conventional PROSPECT neutrino identification pathways using the same underlying information, the classifier offers the promise of improving the signal-to-background ratio by nearly 2.8 times. Performance biases uncovered during initial IBD classifier validation were primarily caused by differences in time-dependent response between background and signal training datasets. Biases were effectively mitigated through a data-period-specific training regimen, offering a pathway towards realizing an unbiased IBD signal classifier for future reactor neutrino datasets.
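A minimal sketch of the genetic-algorithm idea behind GAPE: evolve a population of model configurations by selection, crossover, and mutation of a fitness score. The genome fields and the toy fitness function are assumptions made for illustration; in the real method the fitness would come from training and scoring a deep learning model.

```python
# Evolving model configurations with selection, crossover, and mutation.
import numpy as np

rng = np.random.default_rng(0)

def random_genome():
    return {"layers": rng.integers(1, 6), "width": rng.integers(16, 257),
            "lr": 10 ** rng.uniform(-4, -1)}

def fitness(g):
    # stand-in for "train the model and score signal/background separation"
    return -abs(g["layers"] - 3) - abs(g["width"] - 128) / 128 - abs(np.log10(g["lr"]) + 3)

def crossover(a, b):
    return {k: (a if rng.random() < 0.5 else b)[k] for k in a}

def mutate(g):
    g = dict(g)
    if rng.random() < 0.3:
        g["width"] = int(np.clip(g["width"] + rng.integers(-32, 33), 16, 256))
    return g

population = [random_genome() for _ in range(20)]
for generation in range(30):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                          # keep the fittest configurations
    children = [mutate(crossover(parents[rng.integers(len(parents))],
                                 parents[rng.integers(len(parents))]))
                for _ in range(15)]
    population = parents + children
print(max(population, key=fitness))
```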
physics.data-an 2026-04-09 2 theorems

Orthogonal polynomials make Savitzky-Golay filters faster and far more accurate

Fast and accurate noise removal by curve fitting using orthogonal polynomials

Recursive algorithms cut memory use and raise numerical precision by orders of magnitude for repeated local polynomial fits.

Abstract
Local polynomial smoothing is a widespread technique in data analysis, and Savitzky-Golay (SG) filters are one of its most well-known realizations. In real settings, the effectiveness of SG filtering depends critically on proper tuning of its parameters, constrained in turn by repeated polynomial fitting over large data windows and for varying polynomial degrees. Standard implementations based on monomial bases and Vandermonde matrix formulations are known to suffer from ill-conditioning and unfavorable scaling as the problem size increases. In this work, we present a fast and numerically stable method for computing polynomial fitting and differentiation matrices by reformulating the problem in terms of discrete orthogonal (Chebyshev) polynomials. Exploiting their recursive structure and the intrinsic symmetry properties of the resulting matrices, we derive two algorithms designed to reduce computational overhead. Both methods significantly reduce memory usage and improve scalability with respect to the polynomial degree and window length. A discussion of the performance demonstrates that the proposed algorithms achieve orders-of-magnitude improvements in numerical accuracy compared to standard matrix multiplication, while also providing potential gains in execution time for large-scale problems. These features make the approach particularly well suited for applications requiring repeated local polynomial fits, such as the optimization of SG filters in high-resolution spectral analyses, including axion dark matter searches and the ALPHA haloscope.
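A sketch of the underlying idea (my own illustration, not the paper's algorithms): build a local polynomial smoothing matrix from discrete orthogonal polynomials generated by a three-term recurrence, instead of solving an ill-conditioned monomial Vandermonde system. The window length and degree below are arbitrary example values.

```python
# Savitzky-Golay-style smoothing weights from a discrete orthonormal polynomial basis.
import numpy as np

def discrete_orthonormal_basis(x, degree):
    """Orthonormal polynomial basis on the points x via the Stieltjes recurrence."""
    n = x.size
    P = np.zeros((degree + 1, n))
    P[0] = 1.0 / np.sqrt(n)
    prev = np.zeros(n)
    for k in range(degree):
        alpha = np.dot(x * P[k], P[k])                # recurrence coefficients from
        beta = np.dot(x * P[k], prev)                 # inner products on the grid
        q = x * P[k] - alpha * P[k] - beta * prev
        prev = P[k]
        P[k + 1] = q / np.linalg.norm(q)
    return P

window, degree = 21, 4
x = (np.arange(window) - window // 2).astype(float)   # symmetric window positions
P = discrete_orthonormal_basis(x, degree)
smoother = P.T @ P                                    # projection onto degree-4 polynomials
weights = smoother[window // 2]                       # central smoothing weights
print(weights.sum())                                  # ≈ 1, as expected for a smoother
```

Because the basis is orthonormal on the window, the projection is obtained without inverting a Vandermonde Gram matrix, which is where the conditioning and scaling problems of the standard formulation come from.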
physics.data-an 2026-04-03 2 theorems

Neural method estimates battery parameters in milliseconds

Neural posterior estimation for scalable and accurate inverse parameter inference in Li-ion batteries

Matches Bayesian accuracy on experimental voltage data and enables real-time diagnostics for Li-ion cells

Abstract
Diagnosing the internal state of Li-ion batteries is critical for battery research, operation of real-world systems, and prognostic evaluation of remaining lifetime. By using physics-based models to perform probabilistic parameter estimation via Bayesian calibration, diagnostics can account for the uncertainty due to model fitness, data noise, and the observability of any given parameter. However, Bayesian calibration in Li-ion batteries using electrochemical data is computationally intensive even when using a fast surrogate in place of physics-based models, requiring many thousands of model evaluations. A fully amortized alternative is neural posterior estimation (NPE). NPE shifts the computational burden from the parameter estimation step to data generation and model training, reducing the parameter estimation time from minutes to milliseconds, enabling real-time applications. The present work shows that NPE calibrates parameters equally or more accurately than Bayesian calibration, and we demonstrate that the higher computational costs for data generation are tractable even in high-dimensional cases (ranging from 6 to 27 estimated parameters), but the NPE method can lead to higher voltage prediction errors. The NPE method also offers several interpretability advantages over Bayesian calibration, such as local parameter sensitivity to specific regions of the voltage curve. The NPE method is demonstrated using an experimental fast charge dataset, with parameter estimates validated against measurements of loss of lithium inventory and loss of active material. The implementation is made available in a companion repository (https://github.com/NatLabRockies/BatFIT).
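A conceptual sketch of amortization: the expensive work (simulating data and fitting a conditional model) happens once, and each new observation then gets a posterior in milliseconds. Here a linear-Gaussian model stands in for the neural density estimator, and the toy forward model is an assumption for illustration only.

```python
# Amortized inference: fit a conditional model on simulated (theta, x) pairs once,
# then produce near-instant parameter estimates for new observations.
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta):
    """Toy forward model: two parameters -> a noisy 50-point 'voltage curve'."""
    t = np.linspace(0, 1, 50)
    return theta[0] * np.exp(-3 * t) + theta[1] * t + rng.normal(0, 0.01, t.size)

# --- Training phase: simulate many pairs and fit the conditional model
thetas = rng.uniform(0.5, 2.0, size=(5_000, 2))
xs = np.stack([simulator(th) for th in thetas])
X = np.hstack([xs, np.ones((xs.shape[0], 1))])        # add a bias column
W, *_ = np.linalg.lstsq(X, thetas, rcond=None)        # amortized posterior-mean map
resid = thetas - X @ W
posterior_cov = np.cov(resid.T)                       # fixed Gaussian posterior width

# --- Inference phase: instant estimate for a new observation
x_obs = simulator(np.array([1.3, 0.8]))
posterior_mean = np.append(x_obs, 1.0) @ W
print(posterior_mean, np.sqrt(np.diag(posterior_cov)))
```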
physics.data-an 2010-07-12 2 theorems

Asymptotic formulae give distributions for new physics tests

Asymptotic formulae for likelihood-based tests of new physics

They supply closed-form behaviour for likelihood statistics with systematics and use an Asimov dataset to find median experimental reach.

Abstract
We describe likelihood-based statistical tests for use in high energy physics for the discovery of new phenomena and for construction of confidence intervals on model parameters. We focus on the properties of the test procedures that allow one to account for systematic uncertainties. Explicit formulae for the asymptotic distributions of test statistics are derived using results of Wilks and Wald. We motivate and justify the use of a representative data set, called the "Asimov data set", which provides a simple method to obtain the median experimental sensitivity of a search or measurement as well as fluctuations about this expectation.
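A worked example of the Asimov recipe from this paper for the simplest case, a single-bin counting experiment with known background: the median discovery significance is obtained by evaluating the discovery test statistic q0 on the Asimov dataset n = s + b, and agrees with the closed-form expression sqrt(2((s+b)ln(1+s/b) - s)). The signal and background yields below are arbitrary illustrative numbers.

```python
# Median discovery significance from the Asimov dataset (single Poisson bin, known background).
import numpy as np

def q0_statistic(n, b):
    """Discovery test statistic q0 for observed count n and known background b."""
    if n <= b:
        return 0.0
    mu_hat_s = n - b                                   # best-fit signal yield
    return 2 * (n * np.log(n / b) - mu_hat_s)          # -2 ln [L(0)/L(s_hat)] for Poisson counts

s, b = 10.0, 100.0
n_asimov = s + b                                       # Asimov dataset: data set to expectation
median_Z = np.sqrt(q0_statistic(n_asimov, b))          # median significance in sigma
closed_form = np.sqrt(2 * ((s + b) * np.log(1 + s / b) - s))
print(median_Z, closed_form)                           # both ≈ 0.98 sigma for this s and b
```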
physics.data-an 2000-04-25 3 theorems

Bottleneck code extracts relevant info from signals

The information bottleneck method

Compressing X while maximizing information about Y yields self-consistent equations solvable by re-estimation.

Abstract
We define the relevant information in a signal $x\in X$ as being the information that this signal provides about another signal $y\in Y$. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. Understanding the signal $x$ requires more than just predicting $y$; it also requires specifying which features of $X$ play a role in the prediction. We formalize this problem as that of finding a short code for $X$ that preserves the maximum information about $Y$. That is, we squeeze the information that $X$ provides about $Y$ through a `bottleneck' formed by a limited set of codewords $\tilde{X}$. This constrained optimization problem can be seen as a generalization of rate distortion theory in which the distortion measure $d(x,\tilde{x})$ emerges from the joint statistics of $X$ and $Y$. This approach yields an exact set of self-consistent equations for the coding rules $X \to \tilde{X}$ and $\tilde{X} \to Y$. Solutions to these equations can be found by a convergent re-estimation method that generalizes the Blahut-Arimoto algorithm. Our variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere.
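A minimal sketch of the self-consistent re-estimation the abstract describes: iterate p(t|x) ∝ p(t) exp(-β KL[p(y|x) || p(y|t)]) together with the induced p(t) and p(y|t). The toy joint distribution, number of codewords, and β value are arbitrary choices for illustration.

```python
# Iterative (Blahut-Arimoto-style) solution of the information bottleneck equations.
import numpy as np

rng = np.random.default_rng(0)

def iterative_ib(p_xy, n_codewords, beta=5.0, iters=200):
    p_x = p_xy.sum(axis=1)
    p_y_given_x = p_xy / p_x[:, None]
    p_t_given_x = rng.dirichlet(np.ones(n_codewords), size=p_xy.shape[0])   # random encoder init
    for _ in range(iters):
        p_t = p_x @ p_t_given_x                                             # p(t) = sum_x p(x) p(t|x)
        p_y_given_t = (p_t_given_x * p_x[:, None]).T @ p_y_given_x / p_t[:, None]
        # KL divergence between p(y|x) and each candidate decoder p(y|t)
        kl = np.array([[np.sum(p_y_given_x[i] *
                               np.log(p_y_given_x[i] / p_y_given_t[t] + 1e-30))
                        for t in range(n_codewords)] for i in range(p_xy.shape[0])])
        logits = np.log(p_t[None, :] + 1e-30) - beta * kl
        logits -= logits.max(axis=1, keepdims=True)
        p_t_given_x = np.exp(logits)
        p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)               # normalize per x
    return p_t_given_x, p_y_given_t

p_xy = rng.dirichlet(np.ones(12), size=1).reshape(4, 3)                     # toy joint over 4x3 states
encoder, decoder = iterative_ib(p_xy, n_codewords=2)
print(encoder.round(3))
```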
