On the Variance of the Adaptive Learning Rate and Beyond
11 citing papers are indexed for this work.
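This is the RAdam paper, whose core contribution is a rectification term that compensates for the large variance of the adaptive learning rate early in training. A minimal sketch of that term, following the paper's formulation (`beta2` is Adam's second-moment decay rate; when the approximated degrees of freedom `rho_t` fall at or below 4, the rectified update is unavailable and the method falls back to SGD with momentum):

```python
import math

def radam_rectification(beta2: float, t: int):
    """Compute RAdam's variance-rectification term r_t at step t.

    Returns (rho_t, r_t); r_t is None while rho_t <= 4, i.e. while
    the variance of the adaptive learning rate is intractable and
    plain momentum updates are used instead.
    """
    # Maximum length of the approximated simple moving average.
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    # Its value at step t.
    rho_t = rho_inf - 2.0 * t * beta2**t / (1.0 - beta2**t)
    if rho_t <= 4.0:
        return rho_t, None
    # Rectification factor applied to the adaptive step size.
    r_t = math.sqrt(
        (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )
    return rho_t, r_t
```

For the common setting `beta2 = 0.999`, the warmup phase lasts a handful of steps, after which `r_t` rises toward 1 and the update converges to standard Adam.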
Representative citing papers
-
Consistency Models
Consistency models achieve fast one-step generation with SOTA FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 by directly mapping noise to data, outperforming prior distillation techniques.
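The one-step generation described above follows from the model's self-consistency property: a trained network f(x, sigma) maps any point on a diffusion trajectory back to its origin, so sampling reduces to one evaluation at the largest noise level. A toy sketch of that sampling step, with a stand-in linear shrinkage in place of a trained network (the `consistency_fn` and `SIGMA_MAX` value here are illustrative assumptions, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA_MAX = 80.0  # largest noise level; 80 is illustrative

def consistency_fn(x_t, sigma):
    """Stand-in for a trained consistency model f(x, sigma).

    Any such network must satisfy the boundary condition
    f(x, sigma_min) ~= x; here a toy shrinkage toward zero
    plays the role of denoising.
    """
    return x_t / (1.0 + sigma)

# One-step generation: draw pure noise at sigma_max
# and map it directly to a sample.
z = rng.standard_normal((4, 3)) * SIGMA_MAX
x0 = consistency_fn(z, SIGMA_MAX)
```

Multi-step sampling, which trades compute for quality, alternates this map with partial re-noising at decreasing noise levels.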
-
Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
Manifold steering along activation geometry induces behavioral trajectories matching the natural manifold of outputs, while linear steering produces off-manifold unnatural behaviors.
-
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Latent Consistency Models enable high-fidelity text-to-image generation in 2-4 steps by directly predicting solutions to the probability flow ODE in latent space, distilled from pre-trained LDMs.
-
Anon: Extrapolating Adaptivity Beyond SGD and Adam
Anon optimizer uses tunable adaptivity and incremental delay update to achieve convergence guarantees and outperform existing methods on image classification, diffusion, and language modeling tasks.
-
Particle transformers for identifying Lorentz-boosted Higgs bosons decaying to a pair of W bosons
PaRT achieves >50% tagging efficiency for boosted H->WW jets at 1% background efficiency, decorrelated from jet mass, with data-to-simulation scale factors of 0.9-1.0 on 138 fb^{-1} of 13 TeV collisions.
-
Delve into the Applicability of Advanced Optimizers for Multi-Task Learning
APT augments multi-task learning by adapting advanced optimizers via momentum balancing and light direction preservation, delivering performance gains on four standard MTL datasets.
-
Deep neural networks with Fisher vector encoding for medical image classification
Fisher vector encoding integrated into CNN-ViT hybrids outperforms benchmarks on MedMNIST datasets and matches literature results on other medical image sets.
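For context, a sketch of the standard (Perronnin-style) first-order Fisher vector encoding that such hybrids build on: local descriptors are soft-assigned to a diagonal GMM and the normalized gradients with respect to the component means are concatenated. This is a generic sketch of the classic encoding only; the paper's CNN-ViT integration is not reproduced here.

```python
import numpy as np

def fisher_vector_means(X, weights, means, sigmas):
    """First-order Fisher vector: gradients w.r.t. GMM means.

    X: (n, d) local descriptors; weights (k,), means (k, d),
    sigmas (k, d) parameterize a diagonal-covariance GMM.
    Returns the concatenated (k*d,) encoding.
    """
    n, d = X.shape
    k = weights.shape[0]
    # Posterior responsibilities gamma_{ik} under the diagonal GMM.
    log_p = np.stack([
        np.log(weights[j])
        - 0.5 * np.sum(((X - means[j]) / sigmas[j]) ** 2
                       + np.log(2 * np.pi * sigmas[j] ** 2), axis=1)
        for j in range(k)
    ], axis=1)
    gamma = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Normalized gradient w.r.t. each component mean.
    fv = np.stack([
        (gamma[:, j:j + 1] * (X - means[j]) / sigmas[j]).sum(axis=0)
        / (n * np.sqrt(weights[j]))
        for j in range(k)
    ])
    return fv.ravel()
```

Second-order (variance) components and power/L2 normalization are usually appended in the same fashion.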
-
AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments
AdaMeZO adapts Adam moment estimates to zeroth-order LLM fine-tuning without extra memory storage, outperforming MeZO with up to 70% fewer forward passes.
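The memory trick underlying MeZO-style methods is worth making concrete: the gradient is estimated from two forward passes (SPSA), and the random perturbation is regenerated from a seed instead of being stored, so memory stays at inference level. A sketch of that estimator under those assumptions (the moment-free Adam adaptation that AdaMeZO adds on top is not shown):

```python
import numpy as np

def spsa_grad_estimate(loss_fn, theta, eps=1e-3, seed=0):
    """MeZO-style zeroth-order gradient estimate via SPSA.

    Only two scalar losses and the RNG seed need to be kept;
    the perturbation z is regenerated on demand from the seed.
    """
    z = np.random.default_rng(seed).standard_normal(theta.shape)
    loss_plus = loss_fn(theta + eps * z)
    loss_minus = loss_fn(theta - eps * z)
    # Projected gradient: directional derivative along z.
    proj = (loss_plus - loss_minus) / (2.0 * eps)
    # Regenerate z from the seed (it was never stored) to form the estimate.
    z_again = np.random.default_rng(seed).standard_normal(theta.shape)
    return proj * z_again
```

On a quadratic loss the estimate is exactly `(theta . z) * z`, whose expectation over z recovers the true gradient.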
-
Neural Network-Based Virtual Wheel-Speed Sensor for Enhanced Low-Velocity State Estimation
A neural network fuses wheel and motor speed signals to cut wheel-speed estimation error by up to 85% versus the production sensor on real Volkswagen ID.7 data.
-
Video-guided Machine Translation with Global Video Context
A video-guided multimodal translation framework retrieves semantically related video segments from a vector database and applies attention over global video context to improve subtitle translation accuracy in long videos.
-
Neural Network Optimization Reimagined: Decoupled Techniques for Scratch and Fine-Tuning
DualOpt decouples optimization, using real-time layer-wise weight decay for training from scratch and weight rollback for fine-tuning, to improve convergence and generalization and to reduce knowledge forgetting.