Gaussian error linear units (

Dan Hendrycks, Kevin Gimpel

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Floating-Point Networks with Automatic Differentiation Can Represent Almost All Floating-Point Functions and Their Gradients

cs.LG · 2026-05-03 · unverdicted · novelty 8.0

Floating-point neural networks with automatic differentiation can represent arbitrary floating-point functions and their gradients under mild conditions.

Stochastic Transition-Map Distillation for Fast Probabilistic Inference

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

STMD distills the full transition map of diffusion sampling SDEs into a conditional Mean Flow model to enable fast one- or few-step stochastic sampling without teacher models or bi-level optimization.

Demystifying CLIP Data

cs.CV · 2023-09-28 · accept · novelty 6.0

MetaCLIP curates balanced 400M-pair subsets from CommonCrawl that outperform CLIP data, reaching 70.8% zero-shot ImageNet accuracy on ViT-B versus CLIP's 68.3%.

Hypothesis-driven construction of mesoscopic dynamics

cs.LG · 2026-05-15 · unverdicted · novelty 5.0

A constrained hypothesis-class framework for identifying mesoscopic dynamics from data, backed by uniform well-posedness and stability guarantees derived from a generalized Onsager principle.

citing papers explorer

Showing 4 of 4 citing papers.

Floating-Point Networks with Automatic Differentiation Can Represent Almost All Floating-Point Functions and Their Gradients
cs.LG · 2026-05-03 · unverdicted · none · ref 32
Stochastic Transition-Map Distillation for Fast Probabilistic Inference
cs.LG · 2026-05-08 · unverdicted · none · ref 107
Demystifying CLIP Data
cs.CV · 2023-09-28 · accept · none · ref 97
Hypothesis-driven construction of mesoscopic dynamics
cs.LG · 2026-05-15 · unverdicted · none · ref 42
