{"total":11,"items":[{"citing_arxiv_id":"2605.22644","ref_index":21,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics","primary_cat":"cs.LG","submitted_at":"2026-05-21T15:50:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SGD is reformulated via a master equation from discrete updates, producing a discrete Fokker-Planck equation that predicts non-stationary variance growth proportional to learning rate in flat Hessian directions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Langevin-inspired surrogate dynamics and show that the two descriptions differ already at order $\\eta^2$. A.1 Exact recursion from the master equation Our starting point is the master equation, which relates the parameter distribution at two successive iterations (Eq. 3): $p_{n+1}(w) = \\int p_n(v)\\, \\mathbb{E}_L[\\delta(w - v + \\eta \\nabla L(v))]\\, dv$ (20). We employ the representation of the delta function via its Fourier transform: $\\delta(a) = \\int \\frac{dp}{(2\\pi)^d} e^{i p^\\top a}, \\quad p \\in \\mathbb{R}^d$ (21). This yields: $p_{n+1}(w) = \\int dv\\, dp\\, \\frac{1}{(2\\pi)^d} p_n(v) e^{i p^\\top (w - v)} \\mathbb{E}_L\\left[e^{i \\eta p^\\top \\nabla L(v)}\\right]$ (22). We now expand the exponential in $\\eta$: $p_{n+1}(w) = \\sum_{m=0}^{\\infty} \\frac{(i\\eta)^m}{m!} \\sum_{i_1,\\dots,i_m=1}^{d} \\int dv\\, dp\\, \\frac{1}{(2\\pi)^d} p_n(v) e^{i p^\\top (w - v)} p_{i_1} \\cdots p_{i_m} \\mathbb{E}_L[\\nabla_{i_1} L(v) \\cdots \\nabla_{i_m} L(v)]$ (23). Using $\\int \\frac{dp}{(2\\pi)^d} e^{i p^\\top (w - v)} p_{i_1} \\cdots p_{i_m} = (-i)^m \\nabla_{i_1} \\cdots \\nabla_{i_m} \\delta(w - v)$ (24), we obtain the recursion: $p_{n+1}(w) = \\sum_{m=0}^{\\infty} \\frac{\\eta^m}{m!}$"},{"citing_arxiv_id":"2605.19510","ref_index":54,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Return of Frustratingly Easy Unsupervised Video Domain Adaptation","primary_cat":"cs.CV","submitted_at":"2026-05-19T08:07:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"MetaTrans improves unsupervised video domain adaptation performance by separating and subtracting spatial and temporal divergences via a dedicated module and a minimal two-term loss objective.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19392","ref_index":245,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Understanding Dynamics of Adam in Zero-Sum Games: An ODE Approach","primary_cat":"cs.LG","submitted_at":"2026-05-19T05:38:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Derives ODE limits of Adam-DA showing that first- and second-order momentum parameters reverse their convergence roles in zero-sum games compared to minimization, validated on GAN experiments.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10792","ref_index":151,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Fixed-Point Neural Optimal Transport without Implicit Differentiation","primary_cat":"math.OC","submitted_at":"2026-05-11T16:22:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A single-network fixed-point formulation for neural optimal transport eliminates adversarial min-max optimization and implicit differentiation while enforcing dual feasibility exactly.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01765","ref_index":21,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Distributional Causal Mediation via Conditional Generative Modeling","primary_cat":"stat.ML","submitted_at":"2026-05-03T07:57:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"UNKNOWN","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DCMA uses conditional generative models to recover and simulate interventional outcome distributions for distributional causal mediation effects, with derived error bounds.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20711","ref_index":4,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Participatory provenance as representational auditing for AI-mediated public consultation","primary_cat":"cs.AI","submitted_at":"2026-04-22T15:54:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Participatory provenance auditing of Canada's AI strategy consultation shows official AI summaries exclude 15-17% of participants more than random baselines, with 33-88% exclusion for dissent clusters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20430","ref_index":78,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"A discrete-time overdetermined problem for the heat equation","primary_cat":"math.AP","submitted_at":"2026-04-22T10:51:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A discrete-time constant flux condition on the heat equation forces the domain to be a ball under suitable regularity.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20115","ref_index":4,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"On the Stability and Generalization of First-order Bilevel Minimax Optimization","primary_cat":"cs.LG","submitted_at":"2026-04-22T02:27:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Provides the first systematic generalization analysis via algorithmic stability for single-timescale and two-timescale stochastic gradient descent-ascent in bilevel minimax problems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.07529","ref_index":126,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Training Deep Learning Models with Norm-Constrained LMOs","primary_cat":"cs.LG","submitted_at":"2025-02-11T13:10:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Scion is a new stochastic LMO-based optimizer family that unifies existing methods, supports unconstrained problems, and delivers hyperparameter transferability plus speedups on nanoGPT training.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2209.14687","ref_index":18,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Diffusion Posterior Sampling for General Noisy Inverse Problems","primary_cat":"stat.ML","submitted_at":"2022-09-29T11:12:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Diffusion models solve noisy (non)linear inverse problems via approximated posterior sampling that blends diffusion steps with manifold gradients without strict consistency projection.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"The likelihood function for the Poisson measurements under the i.i.d. assumption is given as $p(y|x_0) = \\prod_{j=1}^{n} \\frac{[A(x_0)]_j^{y_j} \\exp\\left[-[A(x_0)]_j\\right]}{y_j!}$, (17) where $j$ indexes the measurement bin. In most cases where the measured values are not too small, the model can be approximated by a Gaussian distribution with very high accuracy. Namely, $p(y|x_0) \\rightarrow \\prod_{j=1}^{n} \\frac{1}{\\sqrt{2\\pi [A(x_0)]_j}} \\exp\\left(-\\frac{(y_j - [A(x_0)]_j)^2}{2[A(x_0)]_j}\\right)$ (18) $\\simeq \\prod_{j=1}^{n} \\frac{1}{\\sqrt{2\\pi y_j}} \\exp\\left(-\\frac{(y_j - [A(x_0)]_j)^2}{2 y_j}\\right)$, (19) where we have used the standard approximation for the shot noise model $[A(x_0)]_j \\simeq y_j$ to arrive at the last equation (Kingston, 2013). Then, similar to the Gaussian case, by differentiation and the use of Theorem 1, we have that $\\nabla_{x_t} \\log p(y|x_t) \\simeq -\\rho \\nabla_{x_t} \\lVert y - A(x_0) \\rVert_\\Lambda^2, \\quad [\\Lambda]_{ii} \\triangleq 1/(2y_j)$, (20) where $\\lVert a \\rVert^2$"},{"citing_arxiv_id":"2209.14577","ref_index":82,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Rectified Flow: A Marginal Preserving Approach to Optimal Transport","primary_cat":"stat.ML","submitted_at":"2022-09-29T06:37:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A single-objective rectified flow variant uses neural ODEs trained by regression to monotonically decrease a fixed convex transport cost while preserving marginal distributions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}