Title resolution pending

David Ha, Andrew Dai, Quoc V. Le · 2016 · cs.LG · arXiv 1609.09106

22 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

abstract

This work explores hypernetworks: an approach of using one network, also known as a hypernetwork, to generate the weights for another network. Hypernetworks provide an abstraction that is similar to what is found in nature: the relationship between a genotype (the hypernetwork) and a phenotype (the main network). Though they are also reminiscent of HyperNEAT in evolution, our hypernetworks are trained end-to-end with backpropagation and thus are usually faster. The focus of this work is to make hypernetworks useful for deep convolutional networks and long recurrent networks, where hypernetworks can be viewed as a relaxed form of weight-sharing across layers. Our main result is that hypernetworks can generate non-shared weights for LSTM and achieve near state-of-the-art results on a variety of sequence modelling tasks including character-level language modelling, handwriting generation and neural machine translation, challenging the weight-sharing paradigm for recurrent networks. Our results also show that hypernetworks applied to convolutional networks still achieve respectable results for image recognition tasks compared to state-of-the-art baseline models while requiring fewer learnable parameters.
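
The core idea in the abstract, one network producing the weights of another, can be illustrated with a minimal sketch. It is not the paper's architecture: the single linear hypernetwork, the layer sizes, and the names `hyper_w`, `layer_z`, `generate_weights`, and `main_network` are all assumptions made for illustration, and no training step is shown. Each main-network layer owns only a small embedding, and one shared hypernetwork turns that embedding into the layer's weight matrix, which is one way to read "relaxed weight-sharing across layers".

```python
import math
import random

random.seed(0)

# Hypothetical sizes chosen only for illustration.
N_IN, N_OUT, Z_DIM, N_LAYERS = 8, 8, 2, 4


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Hypernetwork (assumed form): one linear map from a layer embedding z
# to a flat vector holding all N_OUT * N_IN weights of a main-network layer.
hyper_w = [[random.gauss(0, 0.1) for _ in range(Z_DIM)]
           for _ in range(N_OUT * N_IN)]
hyper_b = [0.0] * (N_OUT * N_IN)

# One small learnable embedding per main-network layer.
layer_z = [[random.gauss(0, 1) for _ in range(Z_DIM)]
           for _ in range(N_LAYERS)]


def generate_weights(z):
    """Run the hypernetwork on z and reshape the result to N_OUT x N_IN."""
    flat = [w + b for w, b in zip(matvec(hyper_w, z), hyper_b)]
    return [flat[i * N_IN:(i + 1) * N_IN] for i in range(N_OUT)]


def main_network(x):
    """Forward pass in which every layer's weights come from the hypernetwork."""
    h = x
    for z in layer_z:
        h = [math.tanh(v) for v in matvec(generate_weights(z), h)]
    return h


y = main_network([1.0] * N_IN)

# Learnable parameters: shared hypernetwork plus per-layer embeddings,
# versus storing a separate weight matrix for every layer.
hyper_params = len(hyper_w) * Z_DIM + len(hyper_b) + N_LAYERS * Z_DIM
direct_params = N_LAYERS * N_OUT * N_IN
print(hyper_params, direct_params)
```

With these assumed sizes the generated layers differ from one another (each has its own embedding) while the learnable parameter count is lower than storing four independent weight matrices. In a real setup both `hyper_w` and the embeddings would be trained end-to-end by backpropagating through `generate_weights`.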

representative citing papers

Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

TFlow enables multi-agent LLMs to collaborate via transient low-rank LoRA perturbations derived from sender activations, yielding up to 8.5 accuracy gains and 83% token reduction versus text-based baselines on Qwen3-4B models.

Stylized Text-to-Motion Generation via Hypernetwork-Driven Low-Rank Adaptation

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

A hypernetwork maps style motion embeddings to LoRA updates that stylize text-driven motion diffusion models with improved generalization to unseen styles via contrastive structuring of the style space.

Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning

cs.MA · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Events trigger on-the-fly LoRA module generation via hypernetworks over a shared team policy in MARL, paired with a Neural Manifold Diversity metric, enabling sequential role reassignment while preserving reward maximization.

Environment-Conditioned Diffusion Meta-Learning for Data-Efficient WiFi Localization

eess.SP · 2026-05-11 · unverdicted · novelty 7.0

EnvCoLoc uses 3D point cloud-conditioned diffusion meta-learning to reduce mean WiFi localization error by up to 20% in NLOS scenarios with only 10 support samples.

NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

NonZero introduces an interaction score and bandit-formalized proposal rule for local agent deviations in multi-agent MCTS, delivering a sublinear local-regret guarantee and improved sample efficiency on game benchmarks without full joint-action enumeration.

Wireless Communication Enhanced Value Decomposition for Multi-Agent Reinforcement Learning

cs.LG · 2026-04-09 · unverdicted · novelty 7.0

CLOVER augments value decomposition with a GNN mixer whose weights depend on the realized wireless communication graph, proving permutation invariance, monotonicity, and greater expressiveness than QMIX while showing gains on Predator-Prey and Lumberjacks under p-CSMA channels.

Instance-Adaptive Parametrization for Amortized Variational Inference

cs.LG · 2026-04-08 · unverdicted · novelty 7.0

IA-VAE augments amortized variational inference with hypernetwork-generated instance-adaptive modulations, strictly containing the standard variational family and improving held-out ELBO on synthetic and image data.

Searching for Activation Functions

cs.NE · 2017-10-16 · conditional · novelty 7.0

Automated search discovers Swish activation f(x) = x * sigmoid(βx) that improves top-1 ImageNet accuracy over ReLU by 0.9% on Mobile NASNet-A and 0.6% on Inception-ResNet-v2.

MULTI: Disentangling Camera Lens, Sensor, View, and Domain for Novel Image Generation

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

MULTI uses two-stage textual inversion to disentangle camera lens, sensor, view, and domain factors for novel image generation, supporting dataset extension and ControlNet modifications on the new DF-RICO benchmark.

Hystar: Hypernetwork-driven Style-adaptive Retrieval via Dynamic SVD Modulation

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

Hystar adapts CLIP-like models to unseen query styles by generating per-input singular-value perturbations with a hypernetwork for attention layers and a new StyleNCE contrastive loss.

RareCP: Regime-Aware Retrieval for Efficient Conformal Prediction

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

RareCP improves interval efficiency for time series conformal prediction by retrieving and weighting regime-specific calibration examples while adapting to drift and maintaining coverage.

MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

MoMo uses Feature-Wise Linear Modulation and low-rank neural modulation to condition contrastive planning representations on user preferences while preserving inference efficiency and probability density ratios.

Linear-Time Global Visual Modeling without Explicit Attention

cs.CV · 2026-05-03 · unverdicted · novelty 6.0

Dynamic parameterization of standard layers can replace explicit attention for linear-time global visual modeling.

Exploring the Potential of Probabilistic Transformer for Time Series Modeling: A Report on the ST-PT Framework

cs.LG · 2026-04-29 · unverdicted · novelty 6.0

ST-PT turns transformers into explicit factor graphs for time series, enabling structural injection of symbolic priors, per-sample conditional generation, and principled latent autoregressive forecasting via MFVI iterations.

The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation

cs.LG · 2026-04-26 · conditional · novelty 6.0 · 2 refs

Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering raise deep-conflict accuracy to 71-72.5% on Gemma-2B and Mistral-7B.

FLARE: A Data-Efficient Surrogate for Predicting Displacement Fields in Directed Energy Deposition

cs.LG · 2026-04-17 · unverdicted · novelty 6.0

FLARE predicts post-cooling displacement fields in directed energy deposition by encoding simulations as implicit neural fields whose weights are regularized to follow an affine structure in parameter space, enabling data-efficient prediction via weight mixing.

Hyperfastrl: Hypernetwork-based reinforcement learning for unified control of parametric chaotic PDEs

cs.CE · 2026-04-07 · unverdicted · novelty 6.0

Hypernetworks map a forcing parameter directly to policy weights in an RL framework, enabling unified stabilization of the Kuramoto-Sivashinsky equation across regimes with KAN architectures showing strongest extrapolation.

HyperFitS -- Hypernetwork Fitting Spectra for metabolic quantification of ${}^1$H MR spectroscopic imaging

cs.LG · 2026-04-03 · unverdicted · novelty 6.0

HyperFitS is a hypernetwork for configurable spectral fitting in 1H MRSI that matches conventional LCModel results while processing whole-brain data in seconds instead of hours and adapting to varied protocols without retraining.

HOI-aware Adaptive Network for Weakly-supervised Action Segmentation

cs.CV · 2026-04-29 · unverdicted · novelty 5.0

AdaAct employs a HOI encoder and two-branch hypernetwork to adaptively adjust temporal encoding parameters based on video-level human-object interactions for improved weakly-supervised action segmentation.

Neural Computers

cs.LG · 2026-04-07 · unverdicted · novelty 5.0

Neural Computers are introduced as a new machine form where computation, memory, and I/O are unified in a learned runtime state, with initial video-model experiments showing acquisition of basic interface primitives from traces.

Why Invariance is Not Enough for Biomedical Domain Generalization and How to Fix It

eess.IV · 2026-04-02 · unverdicted · novelty 5.0

MaskGen improves domain generalization for biomedical image segmentation by using source intensities plus domain-stable foundation model representations with minimal added complexity.

Adaptive Learned State Estimation based on KalmanNet

cs.RO · 2026-04-02 · unverdicted · novelty 5.0

AM-KNet adds sensor-specific modules, hypernetwork conditioning on target type and pose, and Joseph-form covariance estimation to KalmanNet, yielding better accuracy and stability than base KalmanNet on nuScenes and View-of-Delft data.

citing papers explorer

Showing 22 of 22 citing papers.

Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights · cs.CL · 2026-05-13 · unverdicted · none · ref 19 · internal anchor
Stylized Text-to-Motion Generation via Hypernetwork-Driven Low-Rank Adaptation · cs.CV · 2026-05-13 · unverdicted · none · ref 26 · internal anchor
Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning · cs.MA · 2026-05-12 · unverdicted · none · ref 10 · 2 links · internal anchor
Environment-Conditioned Diffusion Meta-Learning for Data-Efficient WiFi Localization · eess.SP · 2026-05-11 · unverdicted · none · ref 29 · internal anchor
NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search · cs.LG · 2026-05-01 · unverdicted · none · ref 3 · internal anchor
Wireless Communication Enhanced Value Decomposition for Multi-Agent Reinforcement Learning · cs.LG · 2026-04-09 · unverdicted · none · ref 57 · internal anchor
Instance-Adaptive Parametrization for Amortized Variational Inference · cs.LG · 2026-04-08 · unverdicted · none · ref 24 · internal anchor
Searching for Activation Functions · cs.NE · 2017-10-16 · conditional · none · ref 8 · internal anchor
MULTI: Disentangling Camera Lens, Sensor, View, and Domain for Novel Image Generation · cs.CV · 2026-05-12 · unverdicted · none · ref 18 · internal anchor
Hystar: Hypernetwork-driven Style-adaptive Retrieval via Dynamic SVD Modulation · cs.CV · 2026-05-11 · unverdicted · none · ref 7 · internal anchor
RareCP: Regime-Aware Retrieval for Efficient Conformal Prediction · cs.LG · 2026-05-09 · unverdicted · none · ref 39 · internal anchor
MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning · cs.LG · 2026-05-08 · unverdicted · none · ref 49 · internal anchor
Linear-Time Global Visual Modeling without Explicit Attention · cs.CV · 2026-05-03 · unverdicted · none · ref 12 · internal anchor
Exploring the Potential of Probabilistic Transformer for Time Series Modeling: A Report on the ST-PT Framework · cs.LG · 2026-04-29 · unverdicted · none · ref 46 · internal anchor
The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation · cs.LG · 2026-04-26 · conditional · none · ref 13 · 2 links · internal anchor
FLARE: A Data-Efficient Surrogate for Predicting Displacement Fields in Directed Energy Deposition · cs.LG · 2026-04-17 · unverdicted · none · ref 35 · internal anchor
Hyperfastrl: Hypernetwork-based reinforcement learning for unified control of parametric chaotic PDEs · cs.CE · 2026-04-07 · unverdicted · none · ref 71 · internal anchor
HyperFitS -- Hypernetwork Fitting Spectra for metabolic quantification of ${}^1$H MR spectroscopic imaging · cs.LG · 2026-04-03 · unverdicted · none · ref 35 · internal anchor
HOI-aware Adaptive Network for Weakly-supervised Action Segmentation · cs.CV · 2026-04-29 · unverdicted · none · ref 10 · internal anchor
Neural Computers · cs.LG · 2026-04-07 · unverdicted · none · ref 13 · internal anchor
Why Invariance is Not Enough for Biomedical Domain Generalization and How to Fix It · eess.IV · 2026-04-02 · unverdicted · none · ref 19 · internal anchor
Adaptive Learned State Estimation based on KalmanNet · cs.RO · 2026-04-02 · unverdicted · none · ref 19 · internal anchor
