{"total":11,"items":[{"citing_arxiv_id":"2606.31282","ref_index":122,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Revisiting the Volume Hypothesis","primary_cat":"cs.LG","submitted_at":"2026-06-30T07:58:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The generalization advantage of SGD over random sampling diminishes with growing training set size in binary networks, as measured by joint density of states over train and test accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21108","ref_index":3,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Efficient Learning of Deep State Space Models via Importance Smoothing","primary_cat":"cs.LG","submitted_at":"2026-05-20T12:41:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PVMC is a new parallel training algorithm for deep state space models that achieves 10x faster training than prior SMC methods while matching or exceeding benchmark performance for both generative and discriminative tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18530","ref_index":5,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Continuous Diffusion Scales Competitively with Discrete Diffusion for Language","primary_cat":"cs.CL","submitted_at":"2026-05-18T15:15:24+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RePlaid achieves a 20x compute gap to autoregressive models, new SOTA PPL of 22.1 among continuous DLMs on OpenWebText, and competitive scaling laws by aligning architecture with modern discrete 
DLMs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06315","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"End-to-End Identifiable and Consistent Recurrent Switching Dynamical Systems","primary_cat":"stat.ML","submitted_at":"2026-05-07T14:14:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Identifiability is proven for recurrent nonlinear switching dynamical systems under flexible assumptions, and ΩSDS is introduced as a flow-based estimator that improves disentanglement and forecasting over VAE-based methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05493","ref_index":72,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A renormalization-group inspired lattice-based framework for piecewise generalized linear models","primary_cat":"stat.ME","submitted_at":"2026-05-06T22:27:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RG-inspired lattice models for piecewise GLMs provide explicit interpretable partitions and a replica-analysis-derived scaling law for regularization that allows increasing complexity without expected rise in generalization loss.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.03413","ref_index":205,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Learning to Theorize the World from Observation","primary_cat":"cs.LG","submitted_at":"2026-05-05T06:39:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"NEO is a probabilistic neural model that induces compositional programs as a learned Language of Thought from 
non-textual observations and executes them via a shared transition model to enable explanation-driven generalization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01862","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL","primary_cat":"cs.LG","submitted_at":"2026-05-03T13:11:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"|Π|<∞(can be relaxed to finite covering number). 2. For all(a, s, g, h, Q)∈ A × S × G ×[H]×[0,1]andπ∈Π:|logπ(a|s, g, h, Q)| ≤c. 3.min π∈Π L(π)≤δ approx, whereL(π) :=E (s,g,Q)∼Pβ[DKL(Pβ(· |s, g, h, Q)∥π(· |s, g, h, Q))]. Assumption B.4(Q-Value Coverage).For each(s, a, g, h)in the support ofβ, define: TD(s, a, g, h) :={k∈[N] : (s k h, ak h) = (s, a)}.(21) For trajectory k, let Qk h(g) :=Q β(τ k, g, h) be the empirical goal-reaching probability computed via hindsight relabeling. There exists˜c∈(0,1]such that: |{k∈ T D(s, a, g, h) :Q k h(g) =Q ⋆(s, a, g, h)}| |TD(s, a, g, h)| ≥˜c.(22) Interpretation: At least ˜c-fraction of trajectories through (s, a) achieve the optimal Q-value. 
Under Assumption B.1, Q⋆ is"},{"citing_arxiv_id":"2602.08167","ref_index":95,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning","primary_cat":"cs.RO","submitted_at":"2026-02-09T00:10:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"R&B-EnCoRe uses self-supervised importance-weighted variational inference to distill action-predictive reasoning datasets that improve VLA performance on manipulation, navigation, and driving tasks without external verifiers.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"$\\log p(A) = \\log \\mathbb{E}\\left[\\frac{1}{K}\\sum_{k=1}^{K} w_k\\right] \\ge \\mathbb{E}\\left[\\log \\frac{1}{K}\\sum_{k=1}^{K} w_k\\right] = \\mathcal{L}_K$, where $w_k = \\frac{p(Z_k, A)}{q(Z_k \\mid A)}$ are importance weights. For $K > 1$, [94] shows $\\mathcal{L}_{K+1} \\ge \\mathcal{L}_K \\ge \\mathcal{L}_1 = \\mathrm{ELBO}_{\\mathrm{VAE}}$, so by increasing sample count, IWAE theoretically improves evidence estimate. In the next section, we propose our algorithm which is based on a sampling-importance-resampling technique introduced in [95] that improves upon IWAE by estimating the posterior distribution using a categorical distribution of the importance weights. More details on the theory of this technique can be found in Appendix C. 
Algorithm 1R&B-EnCoRe: Warmstarting Require:DatasetD={(C i, Ai)}N i=1, Reasoning Primitives R, Dropout rated, VLMM, Foundation Model FM 1:D warm ← ∅"},{"citing_arxiv_id":"2512.06695","ref_index":5,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Mitigating Barren Plateaus in Quantum Denoising Diffusion Probabilistic Model","primary_cat":"cs.LG","submitted_at":"2025-12-07T07:01:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Quantum diffusion models develop a distinct barren plateau beyond small qubit counts; an architectural enhancement and conditional formulation restore trainability for Hamiltonian-parameterized ground-state generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2406.09250","ref_index":10,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"MirrorCheck: Efficient Adversarial Defense for Vision-Language Models","primary_cat":"cs.CV","submitted_at":"2024-06-13T15:55:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MirrorCheck detects adversarial attacks on VLMs via T2I regeneration for semantic consistency checks, using stochastic model selection and one-time perturbations for robustness against adaptive attacks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1605.08803","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Density estimation using Real NVP","primary_cat":"cs.LG","submitted_at":"2016-05-27T21:24:32+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference 
without approximations.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"arXiv preprint arXiv:1312.6002, 2013. [8] Samuel R Bowman, Luke Vilnis, Oriol Vinyals, Andrew M Dai, Rafal Jozefowicz, and Samy Bengio. Generating sentences from a continuous space. arXiv preprint arXiv:1511.06349, 2015. [9] Joan Bruna, Pablo Sprechmann, and Yann LeCun. Super-resolution with deep convolutional sufficient statistics. arXiv preprint arXiv:1511.05666, 2015. [10] Yuri Burda, Roger Grosse, and Ruslan Salakhutdinov. Importance weighted autoencoders. arXiv preprint arXiv:1509.00519, 2015. [11] Scott Shaobing Chen and Ramesh A Gopinath. Gaussianization. In Advances in Neural Information Processing Systems, 2000. [12] Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron C Courville, and Yoshua Bengio."}],"limit":50,"offset":0}