ViBE: Visual-to-M/EEG Brain Encoding via Spatio-Temporal VAE and Distribution-Aligned Projection
Pith reviewed 2026-05-07 12:16 UTC · model grok-4.3
The pith
ViBE generates M/EEG signals from visual stimuli by reconstructing neural responses in a spatio-temporal latent space and aligning visual embeddings to it.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ViBE generates magnetoencephalography and electroencephalography signals from visual stimuli by first using a spatio-temporal convolutional variational autoencoder to reconstruct neural responses, then employing a Q-Former to map CLIP image embeddings into the autoencoder latent space as neural proxy embeddings, and finally applying both mean squared error and sliced Wasserstein distance to align the proxy embeddings with the true latent embeddings.
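To make the three-stage pipeline concrete, here is a minimal sketch of the generation path in PyTorch. The function and module names, interfaces, and tensor shapes are assumptions for illustration, not the authors' implementation.

```python
import torch

# Hypothetical shapes: B images, C M/EEG channels, T time points.
# clip_encoder, q_former, and tsc_vae stand in for the paper's modules;
# their interfaces here are assumptions for illustration only.
def vibe_generate(images, clip_encoder, q_former, tsc_vae):
    """Sketch of ViBE's generation path: image -> CLIP -> Q-Former -> latent -> M/EEG."""
    with torch.no_grad():
        clip_emb = clip_encoder(images)        # (B, N_patches, D_clip), frozen CLIP
    proxy_latent = q_former(clip_emb)          # (B, D_latent) neural proxy embedding
    generated = tsc_vae.decode(proxy_latent)   # (B, C, T) synthesized M/EEG signal
    return proxy_latent, generated
```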
What carries the argument
Spatio-temporal convolutional variational autoencoder (TSC-VAE) whose latent space receives Q-Former-mapped CLIP embeddings and is aligned to real neural latents via combined MSE and sliced Wasserstein losses.
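The combined alignment objective is standard enough to sketch. Below is a common Monte-Carlo estimator of the (squared) sliced Wasserstein-2 distance paired with MSE; the weight `lam`, the number of projections, and the equal-batch-size convention are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def sliced_wasserstein(x, y, n_proj=64):
    """Monte-Carlo estimator of the squared sliced Wasserstein-2 distance
    between two batches of embeddings x, y of shape (B, D). Assumes equal
    batch sizes; a standard estimator, not the authors' exact code."""
    d = x.size(1)
    theta = torch.randn(d, n_proj, device=x.device)
    theta = theta / theta.norm(dim=0, keepdim=True)  # random unit directions
    px = (x @ theta).sort(dim=0).values              # sorted 1-D projections
    py = (y @ theta).sort(dim=0).values
    return ((px - py) ** 2).mean()

def alignment_loss(proxy, latent, lam=1.0):
    """Point-wise matching (MSE) plus distribution alignment (SWD),
    as the abstract describes; lam is a hypothetical trade-off weight."""
    return F.mse_loss(proxy, latent) + lam * sliced_wasserstein(proxy, latent)
```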
If this is right
- Higher-fidelity reconstruction of spatio-temporal M/EEG patterns on the THINGS-EEG2 and THINGS-MEG datasets.
- Tighter cross-modal alignment between visual feature spaces and neural response spaces.
- A modular pipeline that separates reconstruction from alignment for future brain encoding models.
- Direct applicability to visual stimulus-to-brain-signal generation tasks.
Where Pith is reading between the lines
- The same alignment strategy might transfer to other sensory modalities where one needs to map external stimuli into recorded neural activity.
- Successful generation could supply synthetic training data for downstream brain-computer interface decoders.
- The framework could be tested on real-time streaming visual input to check whether alignment remains stable outside static image sets.
Load-bearing premise
The assumption that visual features from CLIP, once mapped by Q-Former and aligned with MSE plus sliced Wasserstein distance inside the TSC-VAE latent space, produce M/EEG signals that match real neural responses outside the training distribution.
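Since the premise leans on the Q-Former mapping, a minimal sketch of that component may help: learned query tokens cross-attend to CLIP patch embeddings and are pooled into a single proxy latent. The dimensions, single attention layer, and mean pooling are assumptions; a full Q-Former stacks several such blocks.

```python
import torch
import torch.nn as nn

class QueryMapper(nn.Module):
    """Q-Former-style mapper: learned queries cross-attend to frozen CLIP
    patch embeddings and are pooled into a proxy latent. A sketch under
    assumed dimensions, not the paper's architecture."""
    def __init__(self, d_clip=768, d_latent=256, n_queries=8, n_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, d_latent))
        self.proj_kv = nn.Linear(d_clip, d_latent)
        self.attn = nn.MultiheadAttention(d_latent, n_heads, batch_first=True)
        self.out = nn.Linear(d_latent, d_latent)

    def forward(self, clip_patches):              # (B, N, d_clip)
        kv = self.proj_kv(clip_patches)           # (B, N, d_latent)
        q = self.queries.unsqueeze(0).expand(kv.size(0), -1, -1)
        attended, _ = self.attn(q, kv, kv)        # queries attend to CLIP patches
        return self.out(attended.mean(dim=1))     # (B, d_latent) proxy embedding
```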
What would settle it
The claim would fail if generated signals for held-out visual stimuli from the THINGS datasets showed low correlation, or mismatched statistical structure, with the M/EEG actually recorded for those stimuli when compared channel-by-channel and time-point-by-time-point (one such check is sketched below).
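As a sketch, one way to run that check is a per-channel Pearson correlation over time between generated and recorded trials. The array shapes and the trial-averaging are assumptions about how the evaluation would be set up, not a protocol stated in the abstract.

```python
import numpy as np

def channelwise_correlation(generated, recorded):
    """Pearson correlation over time, per channel, between generated and
    recorded M/EEG, both shaped (trials, channels, time). Returns the mean
    correlation per channel across trials; thresholds are up to the evaluator."""
    g = generated - generated.mean(axis=-1, keepdims=True)
    r = recorded - recorded.mean(axis=-1, keepdims=True)
    num = (g * r).sum(axis=-1)
    den = np.sqrt((g ** 2).sum(axis=-1) * (r ** 2).sum(axis=-1)) + 1e-12
    return (num / den).mean(axis=0)   # (channels,)
```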
Original abstract
Brain encoding models not only serve to decipher how visual stimuli are transformed into neural responses, but also represent a critical step toward visual prostheses that restore vision for patients with severe vision disorders. Brain encoding involves two fundamental steps: achieving faithful reconstruction of neural responses and establishing cross-modal alignment between visual stimuli and neural responses. To this end, we propose ViBE, a novel brain encoding framework for generating magnetoencephalography (MEG) and electroencephalography (EEG) signals from visual stimuli. Specifically, we first design a spatio-temporal convolutional variational autoencoder (TSC-VAE) that captures the spatio-temporal characteristics of M/EEG signals for effective neural response reconstruction. To bridge the modality gap between visual features and neural representations, we employ Q-Former to map CLIP image embeddings to the TSC-VAE latent space, producing neural proxy embeddings. For comprehensive cross-modal alignment, we combine mean squared error (MSE) loss for point-wise feature matching with sliced Wasserstein distance (SWD) for probability distribution alignment between the neural proxy embeddings and TSC-VAE latent embeddings. We conduct extensive experiments on the THINGS-EEG2 and THINGS-MEG datasets, demonstrating the effectiveness of our approach in generating high-quality M/EEG signals from visual stimuli.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ViBE, a brain encoding framework that reconstructs M/EEG signals from visual stimuli using a spatio-temporal convolutional variational autoencoder (TSC-VAE) to capture neural response characteristics, a Q-Former to map CLIP image embeddings into the TSC-VAE latent space as neural proxy embeddings, and a joint training objective combining MSE for point-wise matching with sliced Wasserstein distance (SWD) for distribution alignment. Extensive experiments are claimed on the THINGS-EEG2 and THINGS-MEG datasets to demonstrate high-quality M/EEG signal generation from visual inputs.
Significance. If the quantitative results, baselines, and ablations support the claims, the work could contribute to brain encoding research by integrating spatio-temporal VAE reconstruction with cross-modal distribution alignment, potentially advancing applications toward visual prostheses. The architecture is coherent and the use of standard losses on held-out data avoids obvious circularity, but the lack of reported metrics in the abstract leaves the practical impact unverified from the given description.
major comments (1)
- Abstract: the central claim of 'demonstrating the effectiveness of our approach in generating high-quality M/EEG signals' is not supported by any quantitative metrics, baseline comparisons, ablation results, or error analysis. Without these, the effectiveness cannot be assessed and the experiments section must supply them to substantiate the contribution.
Simulated Author's Rebuttal
We thank the referee for the constructive comment on the abstract. We address the point below and will revise the manuscript to strengthen the presentation of our results.
Point-by-point responses
Referee: Abstract: the central claim of 'demonstrating the effectiveness of our approach in generating high-quality M/EEG signals' is not supported by any quantitative metrics, baseline comparisons, ablation results, or error analysis. Without these, the effectiveness cannot be assessed and the experiments section must supply them to substantiate the contribution.
Authors: We agree that the abstract would benefit from explicit quantitative support for the effectiveness claim. The Experiments section of the manuscript already reports quantitative metrics (reconstruction quality on held-out data), baseline comparisons, ablation studies on the TSC-VAE, Q-Former, and loss components, and error analyses across the THINGS-EEG2 and THINGS-MEG datasets. To make the abstract self-contained and directly substantiate the central claim, we will revise it to include key numerical results from those experiments.
Revision: yes
Circularity Check
No significant circularity detected
Full rationale
The paper presents a standard encoder-decoder architecture: TSC-VAE reconstructs M/EEG signals from their own spatio-temporal structure, Q-Former maps external CLIP visual embeddings into the VAE latent space, and training uses ordinary MSE plus sliced Wasserstein losses on held-out splits of the public THINGS-EEG2 and THINGS-MEG datasets. None of the reported performance numbers (reconstruction fidelity, alignment metrics) are algebraically forced by the fitted parameters themselves or by any self-referential normalization; the derivation chain consists of empirical training and evaluation against independent ground-truth neural recordings.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: CLIP image embeddings contain visual features that are linearly or non-linearly mappable to M/EEG representations.
invented entities (1)
- TSC-VAE (no independent evidence)
Reference graph
Works this paper leans on
- [1] Thomas Naselaris, Kendrick N Kay, Shinji Nishimoto, and Jack L Gallant. Encoding and decoding in fMRI. NeuroImage, 56(2):400–410, 2011.
- [2] Guangyin Bao, Qi Zhang, Zixuan Gong, Zhuojia Wu, and Duoqian Miao. MindSimulator: Exploring brain concept localization via synthetic fMRI. arXiv preprint arXiv:2503.02351, 2025.
- [3] Andrew Luo, Maggie Henderson, Leila Wehbe, and Michael Tarr. Brain diffusion for visual exploration: Cortical discovery using large scale generative models. Advances in Neural Information Processing Systems, 36:75740–75781, 2023.
- [4] Andrew F Luo, Jacob Yeung, Rushikesh Zawar, Shaurya Dewan, Margaret M Henderson, Leila Wehbe, and Michael J Tarr. Brain mapping with dense features: Grounding cortical semantic selectivity in natural images with vision transformers. arXiv preprint arXiv:2410.05266, 2024.
- [5] Eduardo Fernandez. Development of visual neuroprostheses: trends and challenges. Bioelectronic Medicine, 4(1):12, 2018.
- [6] Ahmed Soltan, John Martin Barrett, Pleun Maaskant, Niall Armstrong, Walid Al-Atabany, Lionel Chaudet, Mark Neil, Evelyne Sernagor, and Patrick Degenaar. A head mounted device stimulator for optogenetic retinal prosthesis. Journal of Neural Engineering, 15(6):065002, 2018.
- [7] Jacob Granley, Lucas Relic, and Michael Beyeler. Hybrid neural autoencoders for stimulus encoding in visual and other sensory neuroprostheses. Advances in Neural Information Processing Systems, 35:22671–22685, 2022.
- [8] Weijian Mai, Jiamin Wu, Yu Zhu, Zhouheng Yao, Dongzhan Zhou, Andrew F Luo, Qihao Zheng, Wanli Ouyang, and Chunfeng Song. SynBrain: Enhancing visual-to-fMRI synthesis via probabilistic representation learning. arXiv preprint arXiv:2508.10298, 2025.
- [9] Shaobo Wang, Yicun Yang, Zhiyuan Liu, Chenghao Sun, Xuming Hu, Conghui He, and Linfeng Zhang. Dataset distillation with neural characteristic function: A minmax perspective. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 25570–25580, 2025.
- [10] Ganxi Xu, Zhao-Rong Lai, Yuting Tang, Yonghao Song, Guoxu Zhou, Jian Zhu, Jinyi Long, et al. Image-to-brain signal generation for visual prosthesis with CLIP guided multimodal diffusion models. arXiv preprint arXiv:2509.00787, 2025.
- [11] Leila Montazeri, Nizar El Zarif, Stuart Trenholm, and Mohamad Sawan. Optogenetic stimulation for restoring vision to patients suffering from retinal degenerative diseases: current strategies and future directions. IEEE Transactions on Biomedical Circuits and Systems, 13(6):1792–1807, 2019.
- [12] Alessandro T Gifford, Kshitij Dwivedi, Gemma Roig, and Radoslaw M Cichy. A large and rich EEG dataset for modeling human visual object recognition. NeuroImage, 264:119754, 2022.
- [13] Martin N Hebart, Oliver Contier, Lina Teichmann, Adam H Rockter, Charles Y Zheng, Alexis Kidder, Anna Corriveau, Maryam Vaziri-Pashkam, and Chris I Baker. THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. eLife, 12:e82580, 2023.
- [14] William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4195–4205, 2023.
- [15] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
- [16] Dongyang Li, Chen Wei, Shiying Li, Jiachen Zou, Haoyang Qin, and Quanying Liu. Visual decoding and reconstruction via EEG embeddings with guided diffusion. arXiv preprint arXiv:2403.07721, 2024.
- [17] Qiming Zhang, Jing Zhang, Yufei Xu, and Dacheng Tao. Vision transformer with quadrangle attention. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(5):3608–3624, 2024.
- [18] Nicolas Bonneel, Julien Rabin, Gabriel Peyré, and Hanspeter Pfister. Sliced and Radon Wasserstein barycenters of measures. Journal of Mathematical Imaging and Vision, 51(1):22–45, 2015.
- [19] Zijin Gu, Keith Jamison, Mert Sabuncu, and Amy Kuceyeski. Personalized visual encoding model construction with small data. Communications Biology, 5(1):1382, 2022.
- [20] Alexander G Huth, Wendy A De Heer, Thomas L Griffiths, Frédéric E Theunissen, and Jack L Gallant. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature, 532(7600):453–458, 2016.
- [21] Tom M Mitchell, Svetlana V Shinkareva, Andrew Carlson, Kai-Min Chang, Vicente L Malave, Robert A Mason, and Marcel Adam Just. Predicting human brain activity associated with the meanings of nouns. Science, 320(5880):1191–1195, 2008.
- [22] Jerry Tang, Meng Du, Vy Vo, Vasudev Lal, and Alexander Huth. Brain encoding models based on multimodal transformers can transfer across language and vision. Advances in Neural Information Processing Systems, 36:29654–29666, 2023.
- [23] Hossein Adeli, Sun Minni, and Nikolaus Kriegeskorte. Predicting brain activity using transformers. bioRxiv, 2023.
- [24] Roman Beliy, Navve Wasserman, Amit Zalcher, and Michal Irani. The wisdom of a crowd of brains: A universal brain encoder. arXiv preprint arXiv:2406.12179, 2024.
- [25] Alessandro T Gifford, Benjamin Lahner, Sari Saba-Sadiya, Martina G Vilas, Alex Lascelles, Aude Oliva, Kendrick Kay, Gemma Roig, and Radoslaw M Cichy. The Algonauts Project 2023 challenge: How the human brain makes sense of natural scenes. arXiv preprint arXiv:2301.03198, 2023.
- [26] Volker Busskamp, Jens Duebel, David Balya, Mathias Fradot, Tim James Viney, Sandra Siegert, Anna C Groner, Erik Cabuy, Valérie Forster, Mathias Seeliger, et al. Genetic reactivation of cone photoreceptors restores visual responses in retinitis pigmentosa. Science, 329(5990):413–417, 2010.
- [27] Jasmina Cehajic-Kapetanovic, Mandeep S Singh, Eberhart Zrenner, and Robert E MacLaren. Bioengineering strategies for restoring vision. Nature Biomedical Engineering, 7(4):387–404, 2023.
- [28] Paul-Henri Prévot, Kevin Gehere, Fabrice Arcizet, Himanshu Akolkar, Mina A Khoei, Kévin Blaize, Omar Oubari, Pierre Daye, Marion Lanoë, Manon Valet, et al. Behavioural responses to a photovoltaic subretinal prosthesis implanted in non-human primates. Nature Biomedical Engineering, 4(2):172–180, 2020.
- [29] Michael H Berry, Amy Holt, Joshua Levitz, Johannes Broichhagen, Benjamin M Gaub, Meike Visel, Cherise Stanley, Krishan Aghi, Yang Joon Kim, Kevin Cao, et al. Restoration of patterned vision with an engineered photoactivatable G protein-coupled receptor. Nature Communications, 8(1):1862, 2017.
- [30] José-Alain Sahel, Elise Boulanger-Scemama, Chloé Pagot, Angelo Arleo, Francesco Galluppi, Joseph N Martel, Simona Degli Esposti, Alexandre Delaux, Jean-Baptiste de Saint Aubert, Caroline de Montleau, et al. Partial recovery of visual function in a blind patient after optogenetic therapy. Nature Medicine, 27(7):1223–1229, 2021.
- [31] Maxwell H Turner, Luis Gonzalo Sanchez Giraldo, Odelia Schwartz, and Fred Rieke. Stimulus- and goal-oriented frameworks for understanding natural vision. Nature Neuroscience, 22(1):15–24, 2019.
- [32] Olivier Barnich and Marc Van Droogenbroeck. ViBe: A universal background subtraction algorithm for video sequences. IEEE Transactions on Image Processing, 20(6):1709–1724, 2010.
- [33] Justin R Boyle, Anthony J Maeder, and Wageeh W Boles. Region-of-interest processing for electronic visual prostheses. Journal of Electronic Imaging, 17(1):013002, 2008.
- [34] Fei Guo, Yuan Yang, and Yong Gao. Optimization of visual information presentation for visual prosthesis. International Journal of Biomedical Imaging, 2018(1):3198342, 2018.
- [35] Fei Guo, Yuan Yang, Yang Xiao, Yong Gao, and Ningmei Yu. Recognition of moving object in high dynamic scene for visual prosthesis. IEICE Transactions on Information and Systems, 102(7):1321–1331, 2019.
- [36] Burcu Küçükoğlu, Bodo Rueckauer, Nasir Ahmad, Jaap de Ruyter van Steveninck, Umut Güçlü, and Marcel van Gerven. Optimization of neuroprosthetic vision via end-to-end deep reinforcement learning. International Journal of Neural Systems, 32(11):2250052, 2022.
- [37] Jacob Granley and Michael Beyeler. A computational model of phosphene appearance for epiretinal prostheses. In 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pages 4477–4481. IEEE, 2021.
- [38] Maureen van der Grinten, Jaap de Ruyter van Steveninck, Antonio Lozano, Laura Pijnacker, Bodo Rueckauer, Pieter Roelfsema, Marcel van Gerven, Richard van Wezel, Umut Güçlü, and Yağmur Güçlütürk. Towards biologically plausible phosphene simulation for the differentiable optimization of visual cortical prostheses. eLife, 13:e85812, 2024.
- [39] Jaap de Ruyter van Steveninck, Umut Güçlü, Richard van Wezel, and Marcel van Gerven. End-to-end optimization of prosthetic vision. Journal of Vision, 22(2):20, 2022.
- [40] Yonghao Song, Bingchuan Liu, Xiang Li, Nanlin Shi, Yijun Wang, and Xiaorong Gao. Decoding natural images from EEG for object recognition. arXiv preprint arXiv:2308.13234, 2023.
- [41] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
- [42] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
- [43] Umut Güçlü and Marcel AJ van Gerven. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience, 35(27):10005–10014, 2015.
- [44] Daniel LK Yamins, Ha Hong, Charles F Cadieu, Ethan A Solomon, Darren Seibert, and James J DiCarlo. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23):8619–8624, 2014.
- [45] Matthias Guggenmos, Philipp Sterzer, and Radoslaw Martin Cichy. Multivariate pattern analysis for MEG: A comparison of dissimilarity measures. NeuroImage, 173:434–447, 2018.
discussion (0)