ProactBench measures LLM conversational proactivity in three phases using 198 multi-agent dialogues and finds recovery behavior hard to predict from existing benchmarks.
Statist.] 6, 461, doi:10.1214/aos/1176344136
Tool reference. 83% of classified Pith citations use this work as a method, library, or software dependency, not as a substantive claim.
representative citing papers
Zero-noise extrapolation has a finite-shot help-harm boundary: below it, the variance penalty outweighs the bias reduction, so extrapolation increases local mean-squared error.
JudgeSense benchmark shows LLM judge consistency does not reliably improve with model scale, with coherence most sensitive to prompt changes and factuality more stable.
Jensen-Shannon regularized analogues of KL-based direct-correlation measures are introduced, taking values in [0,1] and accompanied by alphabet-size-dependent upper bounds under the observed marginal p(x,z).
Causal Process Models reframe dynamic causal graph discovery as multi-agent reinforcement learning to build sparse time-varying graphs only at active interactions, outperforming dense baselines on physical prediction.
vsOED uses a variational one-point reward and RL policy optimization to provide a lower bound on expected information gain for sequential experimental design, supporting nuisance parameters, implicit likelihoods, and multiple design goals.
RankElastor mitigates embedding collapse via spectrum-robust token mixing and GLU-based P-FFNs, yielding better performance and scaling on industrial recommendation datasets.
COO co-optimizes orbitals with TrimCI to absorb many-body correlations into the basis, cutting determinant count by orders of magnitude for iron-sulfur clusters versus localized bases or DMRG.
Proposes adaptive multiple importance sampling for robust Bayesian model evidence estimation under parameter non-identifiability, shown to outperform deterministic methods on ecological case studies while being cheaper than MCMC.
A Bayesian model for multi-feature contact matrices that uses tensor structures and contingency table theory to satisfy structural constraints and impute missing contact features, validated on simulations and US/German survey data.
Bayesian procedures are derived to compute the posterior probability that a recoverable process is currently in control or that a drifting latent parameter lies in an acceptable region.
A semi-supervised kernel two-sample test integrates unlabeled covariate data to achieve asymptotic normality under the null, higher power than standard kernel tests, and consistency against fixed and local alternatives.
Task-aligned supervised geometric stability predicts linear steerability with high accuracy while unsupervised stability detects representational drift earlier and with lower false alarms than CKA or Procrustes.
42% of significant turn-level associations in LLM conversation analysis are spurious due to unaccounted autocorrelation, with a validated two-stage correction framework improving replication.
Bio-PINNs with a near-to-far curriculum and deformation-uncertainty proxy recover cell-induced densified phases and tether morphologies more reliably than standard adaptive PINN baselines in single-cell and multicellular settings.
A new decay-adjusted spatio-temporal model improves estimation of neglected tropical disease prevalence by explicitly accounting for the waning impact of mass drug administration in sparse survey data.
A hybrid INLA-RF framework integrates Bayesian spatio-temporal modeling with random forests through two iterative algorithms to improve predictions and uncertainty quantification for environmental data.
RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.
PDE-STRIDE applies stability-based model selection to sparse regression for robust, parameter-free recovery of PDEs from noisy data.
Four new FRBs discovered commensally during Parkes PTA pulsar observations, including one with record S/N and unusual spectrum; all highly polarized.
dynesty is an open-source Python package for dynamic nested sampling that improves efficiency in Bayesian posterior and evidence estimation compared to MCMC on certain problems.
A formula approximating degrees of freedom for tree-structured varying coefficient models is proposed to improve BIC model selection over naive parameter counting.
An unsupervised method detects domain shifts via localized density anomaly search in feature space, attributes the shift to a minimal subspace, and extracts balanced subsets from two unlabeled datasets.
Bayesian-ARGOS is a hybrid frequentist-Bayesian method that discovers equations from limited noisy observations more efficiently than SINDy or bootstrap-ARGOS while adding uncertainty quantification.