Floating-point neural networks with automatic differentiation can represent arbitrary floating-point functions and their gradients under mild conditions.
Gaussian error linear units (
4 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
STMD distills the full transition map of diffusion sampling SDEs into a conditional Mean Flow model to enable fast one- or few-step stochastic sampling without teacher models or bi-level optimization.
MetaCLIP curates balanced 400M-pair subsets from CommonCrawl that outperform CLIP data, reaching 70.8% zero-shot ImageNet accuracy on ViT-B versus CLIP's 68.3%.
A constrained hypothesis-class framework for identifying mesoscopic dynamics from data, backed by uniform well-posedness and stability guarantees derived from a generalized Onsager principle.
citing papers explorer
-
Floating-Point Networks with Automatic Differentiation Can Represent Almost All Floating-Point Functions and Their Gradients
Floating-point neural networks with automatic differentiation can represent arbitrary floating-point functions and their gradients under mild conditions.
-
Stochastic Transition-Map Distillation for Fast Probabilistic Inference
STMD distills the full transition map of diffusion sampling SDEs into a conditional Mean Flow model to enable fast one- or few-step stochastic sampling without teacher models or bi-level optimization.
-
Demystifying CLIP Data
MetaCLIP curates balanced 400M-pair subsets from CommonCrawl that outperform CLIP data, reaching 70.8% zero-shot ImageNet accuracy on ViT-B versus CLIP's 68.3%.
-
Hypothesis-driven construction of mesoscopic dynamics
A constrained hypothesis-class framework for identifying mesoscopic dynamics from data, backed by uniform well-posedness and stability guarantees derived from a generalized Onsager principle.