{"total":19,"items":[{"citing_arxiv_id":"2606.13451","ref_index":26,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Uncertainty Estimation for Molecular Diffusion Models","primary_cat":"cs.LG","submitted_at":"2026-06-11T15:11:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Post-hoc uncertainty estimation via Laplace approximation on the denoising network in molecular diffusion models correlates negatively with sample quality and enables filtering to improve performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.04342","ref_index":29,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Expectations vs. Realities: The Cost of MSE-Optimal Forecasting Under Conditional Uncertainty","primary_cat":"cs.LG","submitted_at":"2026-06-03T01:50:32+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MSE-optimal multi-step forecasters cannot match the marginal distribution of realizations under nonzero conditional uncertainty, creating a quantifiable accuracy-realism Pareto frontier across benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28986","ref_index":39,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Comparing Classical Simulation and Sample-Based Learning of Quantum Systems: Learning the Hardness of Quantum Systems from Samples","primary_cat":"quant-ph","submitted_at":"2026-05-27T18:44:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Empirical study finds neural-network learning difficulty (via Hessian eigenvalue and random subspace optimization) correlates with classical simulation hardness parameterized by MPS bond 
dimension and T-gate count.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16486","ref_index":26,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"StAD: Stein Amortized Divergence for Fast Likelihoods with Diffusion and Flow","primary_cat":"stat.ML","submitted_at":"2026-05-15T18:00:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"StAD distills divergence of PF-ODEs via the Langevin-Stein operator for faster, lower-variance likelihood estimation in generative models without Jacobian costs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13225","ref_index":24,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Mix, Don't Tune: Bilingual Pre-Training Outperforms Hyperparameter Search in Data-Constrained Settings","primary_cat":"cs.LG","submitted_at":"2026-05-13T09:17:51+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Mixing auxiliary high-resource language data outperforms hyperparameter tuning in data-constrained bilingual pre-training, with gains equivalent to 2-13 times more unique target data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08698","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Supersampling Stable Diffusion and Beyond: A Seamless, Training-Free Approach for Scaling Neural Networks Using Common Interpolation Methods","primary_cat":"cs.CV","submitted_at":"2026-05-09T05:13:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Kernel interpolation with a constant multiplier scales convolution and fully-connected layers in neural 
networks to higher resolutions or dimensions without training, producing competitive results on Stable Diffusion and other models.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"suggested by [24]. A PREPRINT - MAY 12, 2026 4.2 Experimental Metrics To conduct our experiments, we choose variants of the FID and KID metric [10], [11] as done by [23], [24]. The description of the metrics we used is specified in this section. 4.3 Fréchet Inception Distance (FID) Comparing performance across generative models is non-trivial [9]. A good alternative to human judgment in such cases is the Inception Score [8], where the Inception Model (that was trained on ImageNet) is applied to every image of the generated distribution. The softmax probabilities of the generated images (p(y|x)) containing ImageNet objects will have low entropy. And as the generated samples are varied, this will result in a high-entropy over the marginal"},{"citing_arxiv_id":"2605.07193","ref_index":68,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Coupling Models for One-Step Discrete Generation","primary_cat":"cs.LG","submitted_at":"2026-05-08T03:40:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Coupling Models enable single-step discrete sequence generation via learned couplings to Gaussian latents and outperform prior one-step baselines on text perplexity, biological FBD, and image FID metrics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05520","ref_index":70,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Bayesian Rain Field Reconstruction using Commercial Microwave Links and Diffusion Model 
Priors","primary_cat":"cs.LG","submitted_at":"2026-05-06T23:36:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Bayesian inverse problem with diffusion model priors for CML-based rain field reconstruction outperforms baselines by preserving rainfall statistics better than Gaussian processes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.03413","ref_index":178,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Learning to Theorize the World from Observation","primary_cat":"cs.LG","submitted_at":"2026-05-05T06:39:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"NEO is a probabilistic neural model that induces compositional programs as a learned Language of Thought from non-textual observations and executes them via a shared transition model to enable explanation-driven generalization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.11653","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"GazeVaLM: A Multi-Observer Eye-Tracking Benchmark for Evaluating Clinical Realism in AI-Generated X-Rays","primary_cat":"cs.CV","submitted_at":"2026-04-13T16:05:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GazeVaLM provides 960 gaze recordings from 16 radiologists on 60 chest X-rays (half synthetic) plus LLM predictions for diagnostic accuracy and real-fake detection under matched conditions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09091","ref_index":65,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Synthesizing real-world distributions from 
high-dimensional Gaussian Noise with Fully Connected Neural Network","primary_cat":"cs.LG","submitted_at":"2026-04-10T08:20:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Fully connected neural network with randomized loss synthesizes real-world tabular data distributions from Gaussian noise faster than state-of-the-art deep generative models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.02718","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Generative Frontiers: Why Evaluation Matters for Diffusion Language Models","primary_cat":"cs.LG","submitted_at":"2026-04-03T04:21:20+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Generative perplexity and entropy are shown to be the two additive components of KL divergence to a reference distribution, motivating generative frontiers as a principled evaluation method for diffusion language models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.13419","ref_index":58,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Diffusion Models Memorize in Training -- and Generalize in Inference","primary_cat":"cs.LG","submitted_at":"2026-03-12T21:02:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Diffusion models overfit denoising loss at intermediate noise but generalize in inference as model error smooths the flow field and sampling paths avoid memorized noisy training data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.11080","ref_index":20,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AGAN: Towards 
Automated Design of Generative Adversarial Networks","primary_cat":"cs.LG","submitted_at":"2019-06-25T10:12:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"AGAN is the first neural architecture search method for GANs that discovers architectures outperforming state-of-the-art on CIFAR-10 unsupervised image generation and competitive on supervised tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1809.11096","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Large Scale GAN Training for High Fidelity Natural Image Synthesis","primary_cat":"cs.LG","submitted_at":"2018-09-28T15:38:49+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"BigGANs achieve state-of-the-art class-conditional synthesis on ImageNet 128x128 with Inception Score 166.5 and FID 7.4 by scaling GANs and applying orthogonal regularization plus truncation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1801.01401","ref_index":58,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Demystifying MMD GANs","primary_cat":"stat.ML","submitted_at":"2018-01-04T15:25:26+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MMD GANs have unbiased critic gradients but biased generator gradients from sample-based learning, and the Kernel Inception Distance provides a practical new measure for GAN convergence and dynamic learning rate adaptation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1605.08803","ref_index":62,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Density estimation using Real 
NVP","primary_cat":"cs.LG","submitted_at":"2016-05-27T21:24:32+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"and real data. Rather than using an intractable log-likelihood, this discriminator network provides the training signal in an adversarial fashion. Successfully trained GAN models [21, 15, 47] can consistently generate sharp and realistically looking samples [38]. However, metrics that measure the diversity in the generated samples are currently intractable [62, 22, 30]. Additionally, instability in their training process [47] requires careful hyperparameter tuning to avoid diverging behavior. Training such a generative network $g$ that maps latent variable $z \\sim p_Z$ to a sample $x \\sim p_X$ does not in theory require a discriminator network as in GANs, or approximate inference as in variational autoencoders. 
Indeed, if $g$ is bijective, it can be trained through maximum likelihood using the change"},{"citing_arxiv_id":"1511.06434","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks","primary_cat":"cs.LG","submitted_at":"2015-11-19T22:50:32+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"DCGANs with architectural constraints learn a hierarchy of representations from object parts to scenes in both generator and discriminator across image datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1503.03585","ref_index":51,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Deep Unsupervised Learning using Nonequilibrium Thermodynamics","primary_cat":"cs.LG","submitted_at":"2015-03-12T04:51:37+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"A forward diffusion process adds noise iteratively to data until it is unstructured, and a neural network learns the reverse process to generate new samples from the original distribution.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"$X^{(t-1)} \\mid X^{(0)})] - H_p(X^{(T)})$ (49) $= \\sum_{t=2}^{T} \\int dx^{(0 \\cdots T)}\\, q(x^{(0 \\cdots T)}) \\log \\left[ \\frac{p(x^{(t-1)} \\mid x^{(t)})}{q(x^{(t-1)} \\mid x^{(t)}, x^{(0)})} \\right] + H_q(X^{(T)} \\mid X^{(0)}) - H_q(X^{(1)} \\mid X^{(0)}) - H_p(X^{(T)})$. (50) Finally we transform the log ratio of probability distributions into a KL divergence, $K = -\\sum_{t=2}^{T} \\int dx^{(0)}\\, dx^{(t)}\\, q(x^{(0)}, x^{(t)})\\, D_{KL}\\big(q(x^{(t-1)} \\mid x^{(t)}, x^{(0)}) \\,\\|\\, p(x^{(t-1)} \\mid x^{(t)})\\big) + H_q(X^{(T)} \\mid X^{(0)}) - H_q(X^{(1)} \\mid X^{(0)}) - H_p(X^{(T)})$. (51) Note that the entropies can be analytically computed, and the KL divergence can be analytically computed given $x^{(0)}$ and $x^{(t)}$. 
(Gaussian | Binomial) Well-behaved (analytically tractable) distribution: $\\pi(x^{(T)}) = \\mathcal{N}(x^{(T)}; 0, I)$ | $\\mathcal{B}(x^{(T)}; 0.5)$; forward diffusion kernel: $q(x^{(t)} \\mid x^{(t-1)}) = \\mathcal{N}(x^{(t)}; x^{(t-1)}\\sqrt{1-\\beta_t}, I\\beta_t$"}],"limit":50,"offset":0}