HyperNetworks
22 Pith papers cite this work.
abstract
This work explores hypernetworks: an approach in which one network, known as a hypernetwork, generates the weights for another network. Hypernetworks provide an abstraction that is similar to what is found in nature: the relationship between a genotype (the hypernetwork) and a phenotype (the main network). Though they are also reminiscent of HyperNEAT in evolution, our hypernetworks are trained end-to-end with backpropagation and thus are usually faster. The focus of this work is to make hypernetworks useful for deep convolutional networks and long recurrent networks, where hypernetworks can be viewed as a relaxed form of weight-sharing across layers. Our main result is that hypernetworks can generate non-shared weights for LSTM and achieve near state-of-the-art results on a variety of sequence modelling tasks, including character-level language modelling, handwriting generation and neural machine translation, challenging the weight-sharing paradigm for recurrent networks. Our results also show that hypernetworks applied to convolutional networks still achieve respectable results for image recognition tasks compared to state-of-the-art baseline models while requiring fewer learnable parameters.
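The genotype/phenotype idea above can be sketched in a few lines: a small embedding stands in for the genotype, and a linear hypernetwork maps it to the weights of a main layer. A minimal numpy sketch with illustrative sizes, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Main network: a single linear layer y = W x + b, but W is not a free
# parameter -- it is produced by a hypernetwork from a small embedding z.
d_in, d_out, d_z = 8, 4, 2

z = rng.normal(size=d_z)                   # layer embedding ("genotype")
H = rng.normal(size=(d_out * d_in, d_z))   # hypernetwork projection
b = np.zeros(d_out)

W = (H @ z).reshape(d_out, d_in)           # generated weights ("phenotype")

x = rng.normal(size=d_in)
y = W @ x + b
print(y.shape)  # (4,)
```

In training, gradients flow through W back into H and z, so only the hypernetwork's parameters (and the per-layer embeddings) are learned; sharing H across layers is what gives the relaxed weight-sharing the abstract describes.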
citing papers explorer
- Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights
TFlow enables multi-agent LLMs to collaborate via transient low-rank LoRA perturbations derived from sender activations, yielding accuracy gains of up to 8.5 points and 83% token reduction versus text-based baselines on Qwen3-4B models.
- Stylized Text-to-Motion Generation via Hypernetwork-Driven Low-Rank Adaptation
A hypernetwork maps style motion embeddings to LoRA updates that stylize text-driven motion diffusion models with improved generalization to unseen styles via contrastive structuring of the style space.
- Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning
Events trigger on-the-fly LoRA module generation via hypernetworks over a shared team policy in MARL, paired with a Neural Manifold Diversity metric, enabling sequential role reassignment while preserving reward maximization.
- Environment-Conditioned Diffusion Meta-Learning for Data-Efficient WiFi Localization
EnvCoLoc uses 3D point cloud-conditioned diffusion meta-learning to reduce mean WiFi localization error by up to 20% in NLOS scenarios with only 10 support samples.
- NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search
NonZero introduces an interaction score and bandit-formalized proposal rule for local agent deviations in multi-agent MCTS, delivering a sublinear local-regret guarantee and improved sample efficiency on game benchmarks without full joint-action enumeration.
- Wireless Communication Enhanced Value Decomposition for Multi-Agent Reinforcement Learning
CLOVER augments value decomposition with a GNN mixer whose weights depend on the realized wireless communication graph, proving permutation invariance, monotonicity, and greater expressiveness than QMIX while showing gains on Predator-Prey and Lumberjacks under p-CSMA channels.
- Instance-Adaptive Parametrization for Amortized Variational Inference
IA-VAE augments amortized variational inference with hypernetwork-generated instance-adaptive modulations, strictly containing the standard variational family and improving held-out ELBO on synthetic and image data.
- Searching for Activation Functions
Automated search discovers Swish activation f(x) = x * sigmoid(βx) that improves top-1 ImageNet accuracy over ReLU by 0.9% on Mobile NASNet-A and 0.6% on Inception-ResNet-v2.
- MULTI: Disentangling Camera Lens, Sensor, View, and Domain for Novel Image Generation
MULTI uses two-stage textual inversion to disentangle camera lens, sensor, view, and domain factors for novel image generation, supporting dataset extension and ControlNet modifications on the new DF-RICO benchmark.
- Hystar: Hypernetwork-driven Style-adaptive Retrieval via Dynamic SVD Modulation
Hystar adapts CLIP-like models to unseen query styles by generating per-input singular-value perturbations with a hypernetwork for attention layers and a new StyleNCE contrastive loss.
- RareCP: Regime-Aware Retrieval for Efficient Conformal Prediction
RareCP improves interval efficiency for time series conformal prediction by retrieving and weighting regime-specific calibration examples while adapting to drift and maintaining coverage.
- MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning
MoMo uses Feature-Wise Linear Modulation and low-rank neural modulation to condition contrastive planning representations on user preferences while preserving inference efficiency and probability density ratios.
- Linear-Time Global Visual Modeling without Explicit Attention
Dynamic parameterization of standard layers can replace explicit attention for linear-time global visual modeling.
- Exploring the Potential of Probabilistic Transformer for Time Series Modeling: A Report on the ST-PT Framework
ST-PT turns transformers into explicit factor graphs for time series, enabling structural injection of symbolic priors, per-sample conditional generation, and principled latent autoregressive forecasting via MFVI iterations.
- The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation
Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering raise deep-conflict accuracy to 71-72.5% on Gemma-2B and Mistral-7B.
- FLARE: A Data-Efficient Surrogate for Predicting Displacement Fields in Directed Energy Deposition
FLARE predicts post-cooling displacement fields in directed energy deposition by encoding simulations as implicit neural fields whose weights are regularized to follow an affine structure in parameter space, enabling data-efficient prediction via weight mixing.
- Hyperfastrl: Hypernetwork-based reinforcement learning for unified control of parametric chaotic PDEs
Hypernetworks map a forcing parameter directly to policy weights in an RL framework, enabling unified stabilization of the Kuramoto-Sivashinsky equation across regimes with KAN architectures showing strongest extrapolation.
- HyperFitS -- Hypernetwork Fitting Spectra for metabolic quantification of 1H MR spectroscopic imaging
HyperFitS is a hypernetwork for configurable spectral fitting in 1H MRSI that matches conventional LCModel results while processing whole-brain data in seconds instead of hours and adapting to varied protocols without retraining.
- HOI-aware Adaptive Network for Weakly-supervised Action Segmentation
AdaAct employs a HOI encoder and two-branch hypernetwork to adaptively adjust temporal encoding parameters based on video-level human-object interactions for improved weakly-supervised action segmentation.
- Neural Computers
Neural Computers are introduced as a new machine form where computation, memory, and I/O are unified in a learned runtime state, with initial video-model experiments showing acquisition of basic interface primitives from traces.
- Why Invariance is Not Enough for Biomedical Domain Generalization and How to Fix It
MaskGen improves domain generalization for biomedical image segmentation by using source intensities plus domain-stable foundation model representations with minimal added complexity.
- Adaptive Learned State Estimation based on KalmanNet
AM-KNet adds sensor-specific modules, hypernetwork conditioning on target type and pose, and Joseph-form covariance estimation to KalmanNet, yielding better accuracy and stability than base KalmanNet on nuScenes and View-of-Delft data.
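A pattern that recurs across several of the citing papers (TFlow, the stylized-motion work, Hystar, the event-triggered MARL paper) is a hypernetwork that maps a conditioning vector to a low-rank LoRA update for a frozen base weight. A minimal numpy sketch of that shared pattern, with illustrative sizes and names rather than any single paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypernetwork conditioned on a context vector (a style embedding,
# an event code, a sender activation, ...) emits the two LoRA factors
# A and B; the adapted weight is W0 + B @ A, a rank-r perturbation.
d, r, d_ctx = 16, 2, 4

W0 = rng.normal(size=(d, d))               # frozen base weight
ctx = rng.normal(size=d_ctx)               # conditioning vector

# Hypernetwork: two linear maps from the context to the LoRA factors.
H_A = rng.normal(size=(r * d, d_ctx)) * 0.1
H_B = rng.normal(size=(d * r, d_ctx)) * 0.1

A = (H_A @ ctx).reshape(r, d)
B = (H_B @ ctx).reshape(d, r)

W = W0 + B @ A                             # adapted weight, rank-r delta
print(np.linalg.matrix_rank(B @ A))        # at most r = 2
```

Because the delta is generated per input and never written back into W0, the adaptation is transient and cheap to discard, which is what makes the pattern attractive for per-query or per-event conditioning.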