citation dossier
Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257.
why this work matters in Pith
Pith has found this work in 2 reviewed papers. Its strongest current cluster is cs.LG (1 paper). The largest review-status bucket among citing papers is UNVERDICTED (2 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
years
2026: 2

verdicts
UNVERDICTED: 2

representative citing papers
- Learning stochastic multiscale models through normalizing flows
- Training Transformers for KV Cache Compressibility
citing papers explorer
- Learning stochastic multiscale models through normalizing flows
  A framework learns effective multiscale stochastic dynamics from single slow-variable paths by parameterizing the invariant distribution of the fast process with a normalizing flow, trained end-to-end via a penalized likelihood derived from stochastic averaging (see the first sketch after this list).
- Training Transformers for KV Cache Compressibility
  Training transformers with KV sparsification during continued pretraining produces representations that admit better post-hoc KV cache compression, improving quality under memory budgets for long-context tasks (see the second sketch after this list).
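
The first explorer entry describes an end-to-end loop: a conditional normalizing flow stands in for the invariant density p_x(y) of the fast process, the averaged slow drift is a Monte Carlo expectation under that flow, and the objective is a path likelihood plus a penalty tying the flow to the fast dynamics. Below is a minimal PyTorch sketch of that loop; the microscale model f, g, the toy affine flow CondFlow, the stationarity penalty, and all constants are illustrative assumptions, not the paper's construction.

```python
import torch
import torch.nn as nn

# Assumed toy microscale system (not from the paper):
#   slow: dX = f(X, Y) dt + sigma dW,  fast: dY = g(X, Y)/eps dt + (s/sqrt(eps)) dB.
# Stochastic averaging replaces f(x, y) with fbar(x) = E_{y ~ p_x}[f(x, y)],
# where p_x is the invariant density of the fast process at frozen x.

def f(x, y):                 # slow drift, coupled to the fast variable
    return -x + y

def g(x, y):                 # fast drift at frozen slow state x
    return -(y - x)

S_FAST = 0.5                 # fast diffusion coefficient (assumed known)

class CondFlow(nn.Module):
    """Toy conditional flow for p_x(y): y = mu(x) + exp(log_s(x)) * z."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 2))

    def sample(self, x, n):  # x: (T, 1) -> samples: (n, T, 1)
        mu, log_s = self.net(x).chunk(2, dim=-1)
        z = torch.randn(n, *mu.shape)
        return mu + log_s.exp() * z   # reparameterized, so gradients flow

flow = CondFlow()
log_sigma_bar = torch.zeros(1, requires_grad=True)   # effective slow diffusion
opt = torch.optim.Adam(list(flow.parameters()) + [log_sigma_bar], lr=1e-3)

# Synthetic stand-in for the observed slow-variable path.
dt, T = 0.01, 400
xs = torch.zeros(T + 1, 1)
for t in range(T):
    xs[t + 1] = xs[t] - xs[t] * dt + 0.2 * dt**0.5 * torch.randn(1)

for step in range(200):
    x, dx = xs[:-1], xs[1:] - xs[:-1]
    y = flow.sample(x, n=64)                       # fast samples per slow state

    # Averaged drift fbar(x) as a Monte Carlo mean over flow samples.
    fbar = f(x.unsqueeze(0), y).mean(0)

    # Euler-Maruyama Gaussian likelihood of the observed slow increments.
    var = log_sigma_bar.exp() ** 2 * dt
    nll = (0.5 * (dx - fbar * dt) ** 2 / var + 0.5 * var.log()).mean()

    # Penalty: stationarity of the flow under the fast generator L, i.e.
    # E_{p_x}[L phi(y)] = 0 for test functions phi(y) = y and y^2.
    xb = x.unsqueeze(0)
    r1 = g(xb, y).mean(0)                          # L y   = g
    r2 = (2 * y * g(xb, y) + S_FAST**2).mean(0)    # L y^2 = 2 y g + s^2
    penalty = (r1**2 + r2**2).mean()

    loss = nll + 10.0 * penalty
    opt.zero_grad(); loss.backward(); opt.step()
```

The stationarity residuals r1 and r2 are one plausible way to penalize a flow toward the fast process's invariant law; the paper's actual penalized likelihood may differ.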
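
The second entry's mechanism can likewise be sketched. Assuming the sparsification is per-query top-k masking of attention scores (the function name sparsified_attention, the keep_ratio parameter, and the top-k rule are illustrative guesses, not the paper's recipe), the causal attention variant below drops low-scoring keys during continued pretraining so the model learns representations that tolerate a compressed KV cache at inference.

```python
import torch
import torch.nn.functional as F

def sparsified_attention(q, k, v, keep_ratio=0.25):
    """Causal attention that keeps only the top-scoring fraction of keys
    per query during training. q, k, v: (B, heads, T, head_dim)."""
    B, H, T, D = q.shape
    scores = q @ k.transpose(-1, -2) / D**0.5             # (B, H, T, T)
    causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))

    # KV sparsification: per query, keep the k_keep highest scores and
    # drop the rest, so training matches the post-hoc-compressed cache.
    k_keep = max(1, int(keep_ratio * T))
    kth = scores.topk(k_keep, dim=-1).values[..., -1:]    # k-th best score
    scores = scores.masked_fill(scores < kth, float("-inf"))

    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 4, 16, 8)
out = sparsified_attention(q, k, v)                        # (2, 4, 16, 8)
```

Training under such a mask is what the abstract credits with making later, post-hoc KV eviction cheaper in quality terms; the exact scoring and eviction policy is not specified on this page.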