From Activation to Causality: Discovery of Causal Visual Representations in the Human Brain

Antonio Torralba; Aude Oliva; Matias Cosarinsky; Michal Irani; Navve Wasserman; Roman Beliy; Tamar Rott Shaham; Yuval Golbari

arXiv: 2605.23895 · v1 · pith:CRW6I6NM · submitted 2026-05-22 · 💻 cs.CV


Yuval Golbari, Navve Wasserman, Matias Cosarinsky, Roman Beliy, Aude Oliva, Antonio Torralba, Michal Irani, Tamar Rott Shaham

Pith reviewed 2026-05-25 04:22 UTC · model grok-4.3

classification 💻 cs.CV

keywords causal validation, visual representations, fMRI encoding, counterfactual stimuli, functional localization, activation maximization, BrainCause, concept representations


The pith

Causal validation with counterfactual stimuli is required to identify true visual representations in the brain, since activation alone yields many false positives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces BrainCause, a framework that builds stimulus sets containing a target concept, counterfactual versions that remove the concept while preserving other content, and images with correlated distractors. It then applies an image-to-fMRI encoding model to predict responses and isolate regions that respond selectively to the concept rather than to correlated cues. The method recovers known functional areas and surfaces new candidates across many concepts, but demonstrates that a substantial share of activation-based localizations fail the causal test. A sympathetic reader would care because the work replaces correlation-based mapping with a procedure that directly tests whether a region represents the queried concept itself.

Core claim

BrainCause automates the construction of targeted stimulus sets comprising concept images, counterfactual edits that remove the target concept while preserving other image content, and images with candidate correlated distractors. It then uses an image-to-fMRI encoding model to predict brain responses and identifies representations that respond specifically to the target concept rather than to correlated alternatives. The framework recovers known localizations and shows that, without this causal step, a large fraction of activation-based findings would be false positives.
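To make the gap between activation and causal specificity concrete, here is a minimal Python sketch of how a per-voxel causal score might be computed from an encoding model's predicted responses. The scoring rule is not given in this review, so `activation_score`, `causal_score`, the min-of-two-contrasts rule, and the toy response values are all hypothetical illustrations, not the paper's formula.

```python
from statistics import mean


def activation_score(pos):
    """Activation-style score: mean predicted response to concept (positive) images."""
    return mean(pos)


def causal_score(pos, cf, neg):
    """Hypothetical causal score for one voxel (not the paper's formula).

    pos[i] and cf[i] are predicted responses to a positive image and to its
    counterfactual edit (same image with the target concept removed).
    neg holds predicted responses to semantic negatives (correlated distractors).
    A voxel scores high only if removing the concept lowers its response AND
    the distractors alone do not drive it.
    """
    if len(pos) != len(cf):
        raise ValueError("positives and counterfactuals must be paired one-to-one")
    edit_drop = mean(p - c for p, c in zip(pos, cf))  # paired contrast
    distractor_gap = mean(pos) - mean(neg)            # unpaired contrast
    return min(edit_drop, distractor_gap)


# Toy predicted responses, invented for illustration.
face_voxel = dict(pos=[2.0, 1.8, 2.2], cf=[0.3, 0.2, 0.4], neg=[0.5, 0.4, 0.6])
cue_voxel = dict(pos=[2.1, 1.9, 2.0], cf=[1.9, 1.8, 2.0], neg=[1.7, 1.8, 1.9])

for name, v in [("face_voxel", face_voxel), ("cue_voxel", cue_voxel)]:
    print(name, round(activation_score(v["pos"]), 2), round(causal_score(**v), 2))
```

Both toy voxels have the same activation score, so activation maximization cannot separate them; only the face voxel keeps a high score once the counterfactual edits and distractors are taken into account. Taking the minimum of the two contrasts is one conservative design choice: a voxel must pass both tests.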

What carries the argument

The BrainCause framework, which synthesizes controlled stimuli with counterfactual edits and applies an image-to-fMRI encoding model to test neural representations causally.

If this is right

Known functional localizations such as face and place areas are recovered and confirmed through the causal procedure.
New candidate representations are proposed for dozens of visual concepts.
A large fraction of localizations identified by activation maximization alone are shown to be false positives.
The framework generates specific follow-up fMRI experiments to further test or extend the validated candidates.
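The false-positive claim can be read as a comparison of two rankings over the same candidates. The hypothetical sketch below selects the top-k candidates by activation, counts how many fall below a causal-score threshold, and then re-ranks by causal score. The threshold, `k`, and the toy scores are assumptions; the paper's actual selection and classification criteria are not stated in this review.

```python
def top_k(scores, k):
    """Return the k keys with the highest scores, in descending order."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def false_positive_fraction(activation, causal, k, threshold):
    """Share of the top-k activation-selected candidates whose causal score is below threshold.

    The threshold is a hypothetical choice for illustration.
    """
    selected = top_k(activation, k)
    failures = [v for v in selected if causal[v] < threshold]
    return len(failures) / len(selected), failures


# Hypothetical per-voxel scores for one concept.
activation = {"v1": 2.0, "v2": 1.9, "v3": 1.8, "v4": 0.5}
causal = {"v1": 1.5, "v2": 0.1, "v3": 0.05, "v4": 0.4}

frac, failed = false_positive_fraction(activation, causal, k=3, threshold=0.3)
print(frac, failed)          # share of activation picks that fail the causal test
print(top_k(causal, k=3))    # causal ranking promotes v4 and drops v3
```

Note that v4 is only weakly activated but causally specific, so an activation-only pipeline would never surface it.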

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the encoding model holds for the counterfactual stimuli, the method could be used to screen many candidate concepts before committing scanner time to new experiments.
Applying similar counterfactual editing to existing large fMRI datasets might allow re-analysis of prior activation maps for causal validity.
The same stimulus-construction logic could be extended to test representations of more abstract or relational visual properties beyond object categories.

Load-bearing premise

The image-to-fMRI encoding model accurately predicts brain responses to counterfactual stimuli that differ from the training distribution in targeted ways.
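One way to probe this premise is to score the encoding model not only on raw responses to edited images but on the edit effect itself, since the causal test depends on the difference between a positive image and its counterfactual. The sketch below assumes measured responses to the same image pairs are available for one voxel and correlates predicted with measured differences. The function names and toy values are hypothetical, and Pearson correlation is a common but not the only choice of metric.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def differential_accuracy(pred_pos, pred_cf, meas_pos, meas_cf):
    """Hypothetical metric: correlate predicted and measured edit effects
    (positive minus counterfactual) across image pairs for one voxel."""
    pred_diff = [p - c for p, c in zip(pred_pos, pred_cf)]
    meas_diff = [p - c for p, c in zip(meas_pos, meas_cf)]
    return pearson(pred_diff, meas_diff)


# Toy responses for four positive/counterfactual pairs (invented).
pred_pos, pred_cf = [1.0, 2.0, 3.0, 4.0], [0.5, 1.0, 2.5, 1.0]
meas_pos, meas_cf = [1.2, 2.1, 2.9, 4.2], [0.6, 1.0, 2.5, 1.1]

print(round(pearson(pred_pos, meas_pos), 3))  # raw accuracy on positives
print(round(differential_accuracy(pred_pos, pred_cf, meas_pos, meas_cf), 3))
```

A model can score well on raw responses, which are dominated by content shared within each pair, while still misestimating the edit effect; the second number is the one the causal claims actually lean on.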

What would settle it

Recording actual fMRI responses to the generated counterfactual stimuli and finding that they fail to match the encoding model's predictions for the candidate regions would falsify the claim that those regions represent the target concept.
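This falsification test could be run on measured data with a simple paired test. Below is a minimal sketch of a one-sided sign-flip permutation test on measured positive-minus-counterfactual differences for one candidate region. It is a swapped-in, assumption-light choice, not necessarily the statistic the paper uses, and the toy differences are invented. A region whose measured responses show no reliable drop when the concept is removed would fail.

```python
import random


def sign_flip_test(diffs, n_perm=10_000, seed=0):
    """One-sided sign-flip permutation test for mean(diffs) > 0.

    diffs[i] is the measured response to positive image i minus the response to
    its counterfactual edit. Under the null hypothesis (removing the concept has
    no effect) each difference is equally likely to have either sign.
    """
    rng = random.Random(seed)
    n = len(diffs)
    observed = sum(diffs) / n
    hits = 0
    for _ in range(n_perm):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if flipped >= observed:
            hits += 1
    # The +1 terms count the observed sign pattern itself.
    return (hits + 1) / (n_perm + 1)


# Invented measured differences for two hypothetical regions.
concept_region = [0.8, 1.1, 0.6, 0.9, 1.2, 0.7, 1.0, 0.5]   # consistent drop
cue_region = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.05, -0.15]  # no reliable drop

print(sign_flip_test(concept_region))
print(sign_flip_test(cue_region))
```

With eight all-positive differences the exact p-value is 1/256, so the number of image pairs bounds how much evidence any single region can provide.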

Figures

Figures reproduced from arXiv: 2605.23895 by Antonio Torralba, Aude Oliva, Matias Cosarinsky, Michal Irani, Navve Wasserman, Roman Beliy, Tamar Rott Shaham, Yuval Golbari.

**Figure 1.** Overview of our approach. Activation-maximization based methods (top) identify regions with high responses to a target concept, but cannot distinguish true concept representations from correlated cues. In contrast, BrainCause (bottom) performs causal evaluation using targeted counterfactual stimuli that isolate the concept. Regions with high activation but low response difference between original and coun…

**Figure 2.** Concept-Targeted Causal Data Generation. Given a target concept (e.g., human face), BrainCause constructs a causal dataset consisting of three types of stimuli: positive images, counterfactuals, and semantic negatives. A language model generates prompts for each type, which are used by a text-to-image model to synthesize images. Counterfactual and semantic negatives are designed to isolate the target conce…

**Figure 3.** Causal evaluation of discovered regions. Top: regions discovered by BrainCause, showing high activation for positives and lower activation for counterfactual edits and semantic negatives. Bottom: regions discovered by an activation-based method, which remain highly activated after edits or for related negatives, indicating false positives driven by correlated cues.

**Figure 4.** Causal ranking reduces false discoveries. Activation-based discovery frequently selects regions that respond strongly to the target concept but are not causally specific, leading to many false positives. By ranking candidates using causal score, BrainCause suppresses correlation-driven discoveries, reduces false positives, and recovers more faithful concept representations.

**Figure 5.** Concepts discovered by causal evaluation. We show voxel-wise causal scores on brain maps for three example concepts, with representative positive images above each map. Each panel shows a flatmap of high-level visual cortex for subject 1. Each voxel is colored by its causal score, where warmer colors indicate higher concept-specific causal evidence. Black outlines and labels mark NSD functional ROIs, allow…

**Figure 6.** Fine-grained organization of related concepts. Left: body-related concepts, including human face, human hand, and human leg, show distinct voxel patterns across face- and body-selective regions. Right: text-related concepts, including handwritten text, symbolic signs, and logos, show distinct voxel patterns across word- and object-related visual areas. These results show that BrainCause discovers nearby se…
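Figure 2 describes three stimulus types generated from a single concept query. The sketch below shows one hypothetical way to hold such a set in code and to enforce the property the paired causal contrast depends on: every positive has exactly one counterfactual edit. The template strings stand in for the language-model step and are illustrative assumptions; the paper's actual prompts and models are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class CausalStimulusSet:
    """Prompts for one concept query (hypothetical structure)."""
    concept: str
    positives: list = field(default_factory=list)             # text-to-image prompts containing the concept
    counterfactual_edits: list = field(default_factory=list)  # one edit instruction per positive
    negatives: list = field(default_factory=list)             # correlated distractors, concept absent

    def validate(self):
        if len(self.positives) != len(self.counterfactual_edits):
            raise ValueError("each positive needs exactly one counterfactual edit")


def build_prompts(concept, scenes, distractors):
    """Hypothetical template-based stand-in for the language-model prompt generator."""
    stim = CausalStimulusSet(concept)
    for scene in scenes:
        stim.positives.append(f"a photo of a {concept} in {scene}")
        # The edit is meant to be applied to the synthesized positive image,
        # so everything except the concept should be preserved.
        stim.counterfactual_edits.append(f"remove the {concept}; keep everything else unchanged")
    for d in distractors:
        stim.negatives.append(f"a photo of {d}, with no {concept} visible")
    stim.validate()
    return stim


stim = build_prompts(
    "human face",
    ["a kitchen", "a park"],
    ["a hat on a coat rack", "a pair of glasses on a table"],
)
print(stim.positives[0])
print(stim.negatives[1])
```

Keeping the pairing explicit matters because the paired contrast (a positive minus its own edit) is what cancels the image content the two share.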

Original abstract

Identifying which brain regions represent a visual concept in the human brain is a central challenge in neuroscience. Existing approaches have localized coarse functional regions (e.g., faces, places) through activation maximization, identifying regions that activate strongly for a target concept relative to other concepts. Yet strong activation alone does not establish that a region represents the concept itself, as responses may instead be driven by correlated visual or semantic cues. We introduce BrainCause, an automated framework that combines generative and brain models to synthesize controlled stimuli and validate neural representations through targeted causal testing. Given a query specifying a concept of interest, our framework constructs targeted stimulus sets comprising concept images, counterfactual edits that remove the target concept while preserving other image content, and images with candidate correlated distractors. It then uses an image-to-fMRI encoding model to predict brain responses and searches for representations that respond specifically to the target concept over correlated alternatives. BrainCause returns validated candidate representations and proposes follow-up fMRI experiments to further test or extend its discoveries. Our approach successfully recovers known functional localizations and identifies new candidate representations across dozens of concepts, validated on both predicted and measured fMRI data. Critically, we show that without causal validation, a large fraction of localizations would be false positives, confirming that activation alone is insufficient evidence of representation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

BrainCause pairs generative counterfactual edits with encoding models to test causal representations in fMRI, recovering known areas while flagging activation false positives, but the whole thing rests on unshown model accuracy for the edited images.


The main takeaway: the paper builds BrainCause to move past activation maximization. It generates concept images, counterfactual edits that remove the target while holding other content fixed, and distractor sets, then uses an image-to-fMRI model to isolate regions that respond specifically to the concept. It recovers standard localizations such as faces and places, surfaces new candidate regions for many concepts, and reports that a large share of activation-only findings would be false positives.

The stimulus construction and the explicit comparison against correlated alternatives are the parts that feel like a genuine step forward from prior work on activation maps. The measured fMRI validation is also a plus, because it does not rely solely on model predictions.

The soft spot is exactly the load-bearing premise flagged above. Every causal claim requires the encoding model to give accurate differential predictions on the generative edits, which sit outside the natural-image training distribution. The abstract reports validation on predicted and measured fMRI data but supplies no numbers on held-out correlation or error rates for those specific edited images. If that performance is weak or biased, the claimed separation between causal and activation-based localizations does not hold.

The paper is aimed at visual neuroscientists who already work with encoding models and want tighter localization tools. A reader in that group would get a usable framework and a clear cautionary result about activation methods, even if they later disagree with some of the new candidates. It deserves a serious referee: the question is substantive, and the approach is new enough that the details of model generalization need to be checked in review rather than dismissed at the desk. Send it to peer review.

Referee Report

2 major / 1 minor

Summary. The paper introduces BrainCause, a framework combining generative models to synthesize concept images, counterfactual edits that remove a target visual concept while preserving other content, and distractor images, with an image-to-fMRI encoding model to test for causal specificity of brain responses. It claims to recover known functional localizations, identify new candidate representations across dozens of concepts on both predicted and measured fMRI data, and demonstrate that activation-based localization produces a large fraction of false positives, confirming activation alone is insufficient for establishing representation.

Significance. If the encoding model's predictions remain accurate on the generative counterfactual edits, the work could provide a scalable, automated method for causal discovery of visual representations in neuroscience, moving the field beyond correlational activation maps toward more rigorous validation and targeted follow-up experiments.

major comments (2)

[Abstract] Abstract: the claim that 'without causal validation, a large fraction of localizations would be false positives' is load-bearing for the central thesis but supplies no quantitative fraction, exact validation metrics, or criteria for classifying a localization as a false positive (e.g., differential response thresholds between concept and edit conditions).
[Abstract] Abstract (validation step and stimulus construction paragraph): the image-to-fMRI encoding model's accuracy on out-of-distribution generative counterfactual edits is not quantified (no held-out correlation, error metrics, or OOD generalization results reported), yet this generalization is required for the causal specificity tests to be reliable; systematic misprediction on edited stimuli would collapse the activation-versus-causality distinction.

minor comments (1)

[Abstract] Abstract: the description of how the framework 'searches for representations that respond specifically to the target concept over correlated alternatives' would benefit from explicit mention of the statistical test or similarity metric employed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We agree that additional quantitative details will strengthen the presentation and will revise the abstract accordingly. Responses to the major comments follow.

Point-by-point responses

Referee: [Abstract] Abstract: the claim that 'without causal validation, a large fraction of localizations would be false positives' is load-bearing for the central thesis but supplies no quantitative fraction, exact validation metrics, or criteria for classifying a localization as a false positive (e.g., differential response thresholds between concept and edit conditions).

Authors: The results section of the manuscript reports the quantitative fraction of activation-based localizations that fail causal validation, along with the exact metrics and the statistical criterion used to classify false positives (regions showing no significant differential response between concept and counterfactual-edit conditions). We will revise the abstract to include these specific values and a concise statement of the classification criterion. revision: yes
Referee: [Abstract] Abstract (validation step and stimulus construction paragraph): the image-to-fMRI encoding model's accuracy on out-of-distribution generative counterfactual edits is not quantified (no held-out correlation, error metrics, or OOD generalization results reported), yet this generalization is required for the causal specificity tests to be reliable; systematic misprediction on edited stimuli would collapse the activation-versus-causality distinction.

Authors: The manuscript validates the encoding model on held-out data that includes the generative counterfactual edits and reports the relevant correlation and error metrics in the methods and results sections. We will revise the abstract to explicitly state these OOD generalization numbers so that the reliability of the causal tests is clear from the abstract alone. revision: yes

Circularity Check

0 steps flagged

No circularity; causal claims rest on measured fMRI validation, not model fit alone

Full rationale

The derivation fits an image-to-fMRI encoding model on (presumably natural-image) data and uses its predictions to screen for concept-specific responses versus counterfactual edits and distractors. However, the paper states that candidate representations are 'validated on both predicted and measured fMRI data,' supplying an external benchmark that is independent of the fitted parameters. No equations, self-citations, or self-definitional steps are shown that would make the causal-versus-activation distinction reduce to the training fit by construction. The central claim therefore remains self-contained against the measured data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities are described. The approach implicitly depends on the existence of a sufficiently accurate image-to-fMRI encoding model and on the generative model being able to perform targeted concept removal without introducing artifacts, but these are not itemized.
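The second implicit dependency, that the generative model removes the concept without altering anything else, can be spot-checked when a mask of the concept region is available. The sketch below is a hypothetical check on toy grayscale arrays (lists of lists) that measures how much of the image outside the mask an edit changed. How such a mask would be obtained, and what leakage is acceptable, are assumptions outside this review.

```python
def edit_leakage(original, edited, mask, tol=0.0):
    """Fraction of pixels outside the concept mask that changed by more than tol.

    original, edited: 2D lists of grayscale intensities with the same shape.
    mask: 2D list of booleans, True where the target concept is.
    """
    outside = changed = 0
    for row_o, row_e, row_m in zip(original, edited, mask):
        for o, e, m in zip(row_o, row_e, row_m):
            if not m:
                outside += 1
                if abs(o - e) > tol:
                    changed += 1
    return changed / outside if outside else 0.0


# Toy 3x3 example (invented): the concept occupies the centre pixel.
original = [[10, 10, 10],
            [10, 90, 10],
            [10, 10, 10]]
edited = [[10, 10, 10],
          [10, 12, 10],
          [10, 10, 40]]  # concept removed at the centre, but one corner also changed
mask = [[False, False, False],
        [False, True, False],
        [False, False, False]]

print(edit_leakage(original, edited, mask))
```

Pixel-level leakage is a crude proxy: an artifact can also be perceptual or semantic, so this is a spot check rather than a guarantee.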

pith-pipeline@v0.9.0 · 5790 in / 1198 out tokens · 22977 ms · 2026-05-25T04:22:20.836025+00:00 · methodology



[1] Martin I. Sereno, Anders M. Dale, John B. Reppas, Kai K. Kwong, John W. Belliveau, Thomas J. Brady, Bruce R. Rosen, and Roger B. H. Tootell. Borders of multiple visual areas in humans revealed by functional magnetic resonance imaging. Science, 268(5212):889–893, 1995. doi: 10.1126/science.7754376

[2] Stephen A. Engel, Gary H. Glover, and Brian A. Wandell. Retinotopic organization in human visual cortex and the spatial precision of functional MRI. Cerebral Cortex, 7(2):181–192, 1997. doi: 10.1093/cercor/7.2.181

[3] Nancy Kanwisher, Josh McDermott, and Marvin M Chun. The fusiform face area: A module in human extrastriate cortex specialized for face perception. Journal of Neuroscience, 1997

[4] Russell Epstein and Nancy Kanwisher. A cortical representation of the local visual environment. Nature, 1998

[5] Paul E Downing, Yuhong Jiang, Miles Shuman, and Nancy Kanwisher. A cortical area selective for visual processing of the human body. Science, 293(5539):2470–2473, 2001

[6] Laurent Cohen, Stanislas Dehaene, Lionel Naccache, Stéphane Lehéricy, Ghislaine Dehaene-Lambertz, Marie-Anne Hénaff, and François Michel. The visual word form area: spatial and temporal characterization of an initial stage of reading in normal subjects and posterior split-brain patients. Brain, 123(2):291–307, 2000

[7] Guangyin Bao, Qi Zhang, Zixuan Gong, Zhuojia Wu, and Duoqian Miao. MindSimulator: Exploring brain concept localization via synthetic fMRI. arXiv preprint arXiv:2503.02351, 2025

[8] Ethan Hwang, Hossein Adeli, Wenxuan Guo, Andrew Luo, and Nikolaus Kriegeskorte. In silico mapping of visual categorical selectivity across the whole brain. arXiv preprint arXiv:2510.21142, 2025

[9] Andrew F Luo, Jacob Yeung, Rushikesh Zawar, Shaurya Dewan, Margaret M Henderson, Leila Wehbe, and Michael J Tarr. Brain mapping with dense features: Grounding cortical semantic selectivity in natural images with vision transformers. arXiv preprint arXiv:2410.05266, 2024

[10] Zijin Gu, Keith Wakefield Jamison, Meenakshi Khosla, Emily J. Allen, Yihan Wu, Ghislain St-Yves, Thomas Naselaris, Kendrick Kay, Mert R. Sabuncu, and Amy Kuceyeski. NeuroGen: Activation optimized image synthesis for discovery neuroscience. NeuroImage, 247:118812. ISSN 1053-8119. doi: https://doi.org/10.1016/j.neuroimage.2021.118812. URL https://www.sciencedirect.com/science/article/pii/S1053811921010831

[12] Leonard E van Dyck, Martin N Hebart, and Katharina Dobs. Multidimensional feature tuning in category-selective areas of human visual cortex. bioRxiv, pages 2025–06, 2025

[13] Navve Wasserman, Matias Cosarinsky, Yuval Golbari, Aude Oliva, Antonio Torralba, Tamar Rott Shaham, and Michal Irani. BrainExplore: Large-scale discovery of interpretable visual representations in the human brain. arXiv preprint arXiv:2512.08560, 2025

[14] E. A. DeYoe, G. J. Carman, P. Bandettini, S. Glickman, J. Wieser, R. Cox, D. Miller, and J. Neitz. Mapping striate and extrastriate visual areas in human cerebral cortex. Proceedings of the National Academy of Sciences, 93(6):2382–2386, 1996. doi: 10.1073/pnas.93.6.2382

[15] Yukiyasu Kamitani and Frank Tong. Decoding the visual and subjective contents of the human brain. Nature Neuroscience, 8(5):679–685, 2005. doi: 10.1038/nn1444

[16] Linda Henriksson, Lauri Nurminen, Aapo Hyvärinen, and Simo Vanni. Spatial frequency tuning in human retinotopic visual areas. Journal of Vision, 8(10):5:1–13, 2008. doi: 10.1167/8.10.5

[17] Bevil R. Conway, Sebastian Moeller, and Doris Y. Tsao. Specialized color modules in macaque extrastriate cortex. Neuron, 56(3):560–573, 2007. doi: 10.1016/j.neuron.2007.10.008

[18] Roger B. H. Tootell, John B. Reppas, Anders M. Dale, Rodney B. Look, Martin I. Sereno, Rafael Malach, Thomas J. Brady, and Bruce R. Rosen. Visual motion aftereffect in human cortical area MT revealed by functional magnetic resonance imaging. Nature, 375(6527):139–141, 1995. doi: 10.1038/375139a0

[19] Nancy Kanwisher and Galit Yovel. The fusiform face area: a cortical region specialized for the perception of faces. Philosophical Transactions of the Royal Society B: Biological Sciences, 361(1476):2109–2128, 2006

[20] Russell A Epstein and Chris I Baker. Scene perception in the human brain. Annual Review of Vision Science, 5(1):373–397, 2019

[21] Navve Wasserman, Roman Beliy, Roy Urbach, and Michal Irani. Functional brain-to-brain transformation without shared stimuli. NeuroImage, 327:121741, 2026

[22] Kentaro Yamada, Yoichi Miyawaki, and Yukiyasu Kamitani. Inter-subject neural code converter for visual image representation. NeuroImage, 113:289–297, 2015

[23] Paul S Scotti, Mihir Tripathy, Cesar Kadir Torrico Villanueva, Reese Kneeland, Tong Chen, Ashutosh Narang, Charan Santhirasegaran, Jonathan Xu, Thomas Naselaris, Kenneth A Norman, and Tanishq Mathew Abraham. MindEye2: Shared-subject models enable fMRI-to-image with 1 hour of data. arXiv preprint arXiv:2403.11207, 2024

[24] Roman Beliy, Amit Zalcher, Jonathan Kogman, Navve Wasserman, and Michal Irani. Brain-IT: Image reconstruction from fMRI via brain-interaction transformer. arXiv preprint arXiv:2510.25976, 2025

[25] N Apurva Ratan Murty, Pouya Bashivan, Alex Abate, James J DiCarlo, and Nancy Kanwisher. Computational models of category-selective brain regions enable high-throughput tests of selectivity. Nature Communications, 12(1):5540, 2021

[26] Kendrick N. Kay, Thomas Naselaris, Ryan J. Prenger, and Jack L. Gallant. Identifying natural images from human brain activity. Nature, 452(7185):352–355, 2008. doi: 10.1038/nature06713

[27] Thomas Naselaris, Kendrick N. Kay, Shinji Nishimoto, and Jack L. Gallant. Encoding and decoding in fMRI. NeuroImage, 56(2):400–410, 2011. ISSN 1053-8119. doi: https://doi.org/10.1016/j.neuroimage.2010.07.073. URL https://www.sciencedirect.com/science/article/pii/S1053811910010657. Multivariate Decoding and Brain Reading

[28] Roman Beliy, Navve Wasserman, Amit Zalcher, and Michal Irani. The wisdom of a crowd of brains: A universal brain encoder. arXiv preprint arXiv:2406.12179, 2024

[29] Hossein Adeli, Sun Minni, and Nikolaus Kriegeskorte. Transformer brain encoders explain human high-level visual responses. arXiv preprint arXiv:2505.17329, 2025

[30] Andrew Luo, Maggie Henderson, Leila Wehbe, and Michael Tarr. Brain diffusion for visual exploration: Cortical discovery using large scale generative models. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 75740–75781. Curran Associates, Inc., 2023. UR...

[31] Natalia Z Bielczyk, Sebo Uithol, Tim van Mourik, Paul Anderson, Jeffrey C Glennon, and Jan K Buitelaar. Disentangling causal webs in the brain using functional magnetic resonance imaging: A review of current approaches. Network Neuroscience, 3(2):237–273, 2019

[32] Shan H Siddiqi, Konrad P Kording, Josef Parvizi, and Michael D Fox. Causal mapping of human brain function. Nature Reviews Neuroscience, 23(6):361–375, 2022

[33] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022

[34] Tim Brooks, Aleksander Holynski, and Alexei A Efros. InstructPix2Pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18392–18402, 2023

[35] Navve Wasserman, Noam Rotstein, Roy Ganz, and Ron Kimmel. Paint by inpaint: Learning to add image objects by removing them first. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 18313–18324, 2025

[36] Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, Sumith Kulal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas Müller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, and Luke Smith. FLUX.1 Kontext: Flow matching for in-context image ...

[37] Emre Kiciman, Robert Ness, Amit Sharma, and Chenhao Tan. Causal reasoning and large language models: Opening a new frontier for causality. Transactions on Machine Learning Research, 2023

[38] Atilla Kaan Alkan, Shashwat Sourav, Maja Jablonska, Simone Astarita, Rishabh Chakrabarty, Nikhil Garuda, Pranav Khetarpal, Maciej Pióro, Dimitrios Tanoglidis, Kartheik G Iyer, et al. A survey on hypothesis generation for scientific discovery in the era of large language models. arXiv preprint arXiv:2504.05496, 2025

[39] Yangqiaoyu Zhou, Haokun Liu, Tejes Srivastava, Hongyuan Mei, and Chenhao Tan. Hypothesis generation with large language models. In Proceedings of the 1st Workshop on NLP for Science (NLP4Science), pages 117–139, 2024

[40] OpenAI. GPT-5 System Card. https://cdn.openai.com/gpt-5-system-card.pdf, 2025. Accessed: 2026-05-06

[41] Gemma Team and Google DeepMind. Gemma 3. arXiv preprint arXiv:2503.19786, 2025. URL https://arxiv.org

[42] Black Forest Labs. FLUX.2: Frontier Visual Intelligence. https://bfl.ai/blog/flux-2, 2025

[43] Qwen Team. Qwen3 technical report, 2025. URL https://arxiv.org/abs/2505.09388

[44] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021

[45] Emily J Allen, Ghislain St-Yves, Yihan Wu, Jesse L Breedlove, Jacob S Prince, Logan T Dowdle, Matthias Nau, Brad Caron, Franco Pestilli, Ian Charest, et al. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience, 25(1):116–126, 2022

Appendix A Ablation & Analysis

A.1 Consistency Across Subjects

While...