Attention and LoRA regression losses induce Poincaré inequalities under mild regularization, so SGD-mimicking SDEs converge to minimizers with no assumptions on data or model size.
hub
Proceedings of the IEEE international conference on computer vision , pages=
11 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 1polarities
unclear 1representative citing papers
Events trigger on-the-fly LoRA module generation via hypernetworks over a shared team policy in MARL, paired with a Neural Manifold Diversity metric, enabling sequential role reassignment while preserving reward maximization.
ABGD parametrizes piecewise linear functions as difference of max-affine functions and converges linearly to an epsilon-accurate solution with O(d max(sigma/epsilon,1)^2) samples under sub-Gaussian noise, which is minimax optimal up to logs.
Scion is a new stochastic LMO-based optimizer family that unifies existing methods, supports unconstrained problems, and delivers hyperparameter transferability plus speedups on nanoGPT training.
XPERT extracts and reuses cross-domain expert knowledge from pre-trained MoE LLMs via inference analysis and tensor decomposition to improve performance and convergence in downstream language model training.
An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.
TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.
E-value sequential tests enable early stopping of MCMC sampling in Bayesian deep ensembles, often needing only a fraction of the full budget while improving over standard deep ensembles.
SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on benchmarks.
Neo, a cGAN, super-resolves HSC images to HST-like quality and improves galaxy morphological parameter accuracy by factors of 2-10.
citing papers explorer
-
Convergent Stochastic Training of Attention and Understanding LoRA
Attention and LoRA regression losses induce Poincaré inequalities under mild regularization, so SGD-mimicking SDEs converge to minimizers with no assumptions on data or model size.
-
Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning
Events trigger on-the-fly LoRA module generation via hypernetworks over a shared team policy in MARL, paired with a Neural Manifold Diversity metric, enabling sequential role reassignment while preserving reward maximization.
-
Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions
ABGD parametrizes piecewise linear functions as difference of max-affine functions and converges linearly to an epsilon-accurate solution with O(d max(sigma/epsilon,1)^2) samples under sub-Gaussian noise, which is minimax optimal up to logs.
-
Training Deep Learning Models with Norm-Constrained LMOs
Scion is a new stochastic LMO-based optimizer family that unifies existing methods, supports unconstrained problems, and delivers hyperparameter transferability plus speedups on nanoGPT training.
-
XPERT: Expert Knowledge Transfer for Effective Training of Language Models
XPERT extracts and reuses cross-domain expert knowledge from pre-trained MoE LLMs via inference analysis and tensor decomposition to improve performance and convergence in downstream language model training.
-
On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference
An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.
-
TLoRA: Task-aware Low Rank Adaptation of Large Language Models
TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.
-
Towards E-Value Based Stopping Rules for Bayesian Deep Ensembles
E-value sequential tests enable early stopping of MCMC sampling in Bayesian deep ensembles, often needing only a fraction of the full budget while improving over standard deep ensembles.
-
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
SPIN lets weak LLMs become strong by self-generating training data from previous model versions and training to prefer human-annotated responses over its own outputs, outperforming DPO even with extra GPT-4 data on benchmarks.
-
Photometric Super-Resolution for Improving Galaxy Morphological Measurements using Conditional Generative Adversarial Networks
Neo, a cGAN, super-resolves HSC images to HST-like quality and improves galaxy morphological parameter accuracy by factors of 2-10.
- Inpainting physics: self-supervised learning for context-driven fluid simulation