Brain-IT-VQA decodes visual question answers from fMRI using a transformer to extract language tokens and introduces the NSD-VQA benchmark with 20 controlled questions per image across 20 categories.
The wisdom of a crowd of brains: A universal brain encoder
5 Pith papers cite this work. Polarity classification is still indexing.
abstract
Image-to-fMRI encoding is important for both neuroscience research and practical applications. However, such "Brain-Encoders" have been typically trained per-subject and per fMRI-dataset, thus restricted to very limited training data. In this paper we propose a Universal Brain-Encoder, which can be trained jointly on data from many different subjects/datasets/machines. What makes this possible is our new voxel-centric Encoder architecture, which learns a unique "voxel-embedding" per brain-voxel. Our Encoder trains to predict the response of each brain-voxel on every image, by directly computing the cross-attention between the brain-voxel embedding and multi-level deep image features. This voxel-centric architecture allows the functional role of each brain-voxel to naturally emerge from the voxel-image cross-attention. We show the power of this approach to (i) combine data from multiple different subjects (a "Crowd of Brains") to improve each individual brain-encoding, (ii) quick & effective Transfer-Learning across subjects, datasets, and machines (e.g., 3-Tesla, 7-Tesla), with few training examples, and (iii) use the learned voxel-embeddings as a powerful tool to explore brain functionality (e.g., what is encoded where in the brain).
citation-role summary
citation-polarity summary
years
2026 5verdicts
UNVERDICTED 5roles
background 2polarities
background 2representative citing papers
BrainCause recovers known visual localizations and finds new candidate representations by validating causal specificity via counterfactual stimuli and encoding models, showing activation alone produces many false positives.
TRIBE v2 is a multimodal AI model that predicts human brain activity more accurately than linear encoding models and recovers established neuroscientific findings through in-silico testing.
A meta-optimized in-context learning approach enables training-free cross-subject semantic visual decoding from fMRI by inferring individual neural encoding patterns via hierarchical inference on a few examples.
ViBE generates M/EEG signals from visual stimuli by reconstructing neural responses with a TSC-VAE and aligning CLIP image features to its latent space via Q-Former, MSE, and sliced Wasserstein losses.
citing papers explorer
-
A foundation model of vision, audition, and language for in-silico neuroscience
TRIBE v2 is a multimodal AI model that predicts human brain activity more accurately than linear encoding models and recovers established neuroscientific findings through in-silico testing.
-
Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding
A meta-optimized in-context learning approach enables training-free cross-subject semantic visual decoding from fMRI by inferring individual neural encoding patterns via hierarchical inference on a few examples.