Okawa, Maya, Lubana, Ekdeep Singh, Dick, Robert P, and Tanaka, Hidenori · arXiv 2310.09336

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Grokking of Diffusion Models: Case Study on Modular Addition

cs.LG · 2026-04-20 · unverdicted · novelty 7.0

Diffusion models show grokking on modular addition by composing periodic operand representations in simple data regimes or by separating arithmetic computation from visual denoising across timesteps in varied regimes.

The two clocks and the innovation window: When and how generative models learn rules

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Generative models learn rules before memorizing data, creating an innovation window whose width depends on dataset size and rule complexity, observed in both diffusion and autoregressive architectures.

citing papers explorer

Showing 2 of 2 citing papers.

Grokking of Diffusion Models: Case Study on Modular Addition · cs.LG · 2026-04-20 · unverdicted · none · ref 16
The two clocks and the innovation window: When and how generative models learn rules · cs.LG · 2026-05-11 · unverdicted · none · ref 56
