FiLM: Visual Reasoning with a General Conditioning Layer

Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville · 2017 · cs.CV · arXiv 1709.07871

10 Pith papers cite this work. Polarity classification is still indexing.

abstract

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.
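The feature-wise affine transformation described in the abstract can be illustrated with a small sketch. It assumes the standard FiLM form, in which each feature channel c is scaled by gamma[c] and shifted by beta[c], with both produced from the conditioning input. The linear generator, the helper names `linear` and `film`, and all toy weights below are hypothetical choices for illustration, not the architecture used in the paper.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # rows = feature channels, columns = conditioning dims


def linear(weights: Matrix, bias: Vector, z: Sequence[float]) -> Vector:
    """Affine map W z + b, used here as a stand-in FiLM generator (assumption)."""
    return [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(weights, bias)]


def film(features: List[Vector], gamma: Vector, beta: Vector) -> List[Vector]:
    """Apply gamma[c] * x + beta[c] to every activation x in feature channel c."""
    return [[g * x + b for x in channel] for channel, g, b in zip(features, gamma, beta)]


# Hypothetical toy setup: 2 feature channels, 3-dimensional conditioning vector.
z = [1.0, 0.0, -1.0]
W_gamma = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
b_gamma = [1.0, 0.0]
W_beta = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]]
b_beta = [0.0, 0.0]

gamma = linear(W_gamma, b_gamma, z)  # one scale per channel
beta = linear(W_beta, b_beta, z)     # one shift per channel

features = [[1.0, 2.0], [3.0, 4.0]]  # each inner list is one channel's activations
modulated = film(features, gamma, beta)
print(modulated)
```

Because gamma and beta depend only on the conditioning vector, the same pair is shared across every position within a channel. A negative or zero gamma can flip or switch off a channel, so the conditioning input can select which features the rest of the network sees.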

representative citing papers

Discrete Diffusion for Complex and Congested Multi-Agent Path Finding with Sparse Social Attention

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

DiffLNS uses a discrete diffusion initializer to produce warm-start plans that lift LNS2 success rates to 95.8% across 20 congested MAPF settings, generalizing from 96 to 312 agents.

SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving SOTA results in most benchmarks without relying on augmentations.

Diffusion Models Beat GANs on Image Synthesis

cs.LG · 2021-05-11 · accept · novelty 7.0

Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.

MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

MoMo uses Feature-Wise Linear Modulation and low-rank neural modulation to condition contrastive planning representations on user preferences while preserving inference efficiency and probability density ratios.

Self-Supervised Spatial And Zero-Shot Angular Super-Resolution by Spatial-Angular Implicit Representation For Rotating-View SNR-Efficient Diffusion MRI

cs.CV · 2026-05-04 · unverdicted · novelty 6.0

SA-INR achieves self-supervised spatial super-resolution and zero-shot angular super-resolution in rotating-view dMRI from single-view acquisitions per direction, reaching 34.82 dB PSNR on trained and 33.08 dB on unseen directions in simulated data while improving DTI fitting.

Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation

cs.RO · 2026-04-30 · unverdicted · novelty 6.0

Lucid-XR uses XR-headset physics simulation and physics-guided video generation to create synthetic data that trains robot policies transferring zero-shot to unseen real-world manipulation tasks.

The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation

cs.LG · 2026-04-26 · conditional · novelty 6.0 · 2 refs

Knowledge conflicts in hypernetwork LLM adaptation stem from constant adapter margins losing to frequency-dependent pretrained margins; selective layer boosting and conflict-aware triggering raise deep-conflict accuracy to 71-72.5% on Gemma-2B and Mistral-7B.

Train-Small Deploy-Large: Leveraging Diffusion-Based Multi-Robot Planning

cs.RO · 2026-04-08 · unverdicted · novelty 6.0

A diffusion-based multi-robot planner trained on few agents generalizes to larger numbers during deployment using inter-agent attention and temporal convolution.

AE-ViT: Stable Long-Horizon Parametric Partial Differential Equations Modeling

cs.LG · 2026-04-07 · unverdicted · novelty 6.0

AE-ViT combines a convolutional autoencoder with a latent-space transformer and multi-stage parameter plus coordinate injection to deliver stable long-horizon predictions for parametric PDEs, cutting relative rollout error by roughly five times versus prior DL-ROMs and ViTs on advection-diffusion-re

Federated Medical Image Classification under Class and Domain Imbalance exploiting Synthetic Sample Generation

cs.CV · 2026-04-29 · unverdicted · novelty 5.0

FedSSG generates and shares synthetic samples within a federated setup to reduce class imbalance and domain shift problems in medical image classification.

citing papers explorer

Showing 10 of 10 citing papers.

Discrete Diffusion for Complex and Congested Multi-Agent Path Finding with Sparse Social Attention
cs.AI · 2026-05-13 · unverdicted · none · ref 12 · internal anchor

SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data
cs.LG · 2026-05-08 · unverdicted · none · ref 227

Diffusion Models Beat GANs on Image Synthesis
cs.LG · 2021-05-11 · accept · none · ref 48

MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning
cs.LG · 2026-05-08 · unverdicted · none · ref 48

Self-Supervised Spatial And Zero-Shot Angular Super-Resolution by Spatial-Angular Implicit Representation For Rotating-View SNR-Efficient Diffusion MRI
cs.CV · 2026-05-04 · unverdicted · none · ref 14

Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation
cs.RO · 2026-04-30 · unverdicted · none · ref 21

The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation
cs.LG · 2026-04-26 · conditional · none · ref 29 · 2 links

Train-Small Deploy-Large: Leveraging Diffusion-Based Multi-Robot Planning
cs.RO · 2026-04-08 · unverdicted · none · ref 15

AE-ViT: Stable Long-Horizon Parametric Partial Differential Equations Modeling
cs.LG · 2026-04-07 · unverdicted · none · ref 19

Federated Medical Image Classification under Class and Domain Imbalance exploiting Synthetic Sample Generation
cs.CV · 2026-04-29 · unverdicted · none · ref 24
