Bridging Brain and Semantics: A Hierarchical Framework for Semantically Enhanced fMRI-to-Video Reconstruction
Pith reviewed 2026-05-15 01:49 UTC · model grok-4.3
The pith
CineNeuron reconstructs videos from fMRI signals through bottom-up semantic enrichment followed by top-down memory integration.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a bottom-up semantic enrichment stage maps fMRI signals to comprehensive embeddings spanning textual, visual, action, and object information, while a subsequent top-down stage uses Mixture-of-Memories to select and integrate relevant prior memories, enabling video reconstructions that capture dynamic cues such as actions more effectively than prior methods.
What carries the argument
A two-stage hierarchical framework: bottom-up semantic enrichment of fMRI signals into multi-aspect embeddings, followed by top-down Mixture-of-Memories integration that dynamically fuses those embeddings with prior data.
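The two-stage flow described above can be sketched in miniature. Everything below is a hedged toy illustration: the function names, the linear projections, the residual fusion weight, and the softmax memory selection are assumptions for exposition, not the paper's actual architecture.

```python
import math

def enrich(fmri, proj):
    """Bottom-up stage (toy): project one fMRI vector into several semantic aspects.

    `proj` maps an aspect name (e.g. "text", "action") to a weight matrix
    given as a list of rows; each aspect embedding is a plain matrix-vector product.
    """
    return {aspect: [sum(w * x for w, x in zip(row, fmri)) for row in mat]
            for aspect, mat in proj.items()}

def mixture_of_memories(query, bank, temperature=1.0):
    """Top-down stage (toy): softmax-weighted fusion of memory vectors with a query.

    Scores each memory by dot product with the query, normalizes with a
    numerically stable softmax, and blends the weighted memory average back
    into the query with a fixed residual weight.
    """
    scores = [sum(q * m for q, m in zip(query, mem)) / temperature for mem in bank]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    fused = [sum(w * mem[i] for w, mem in zip(weights, bank))
             for i in range(len(query))]
    refined = [0.5 * q + 0.5 * f for q, f in zip(query, fused)]
    return refined, weights
```

For example, a query aligned with the first of two memories receives the larger softmax weight for that memory, so the fused output stays close to the query while still mixing in prior data.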
If this is right
- Reconstructed videos incorporate action and object details more accurately than methods relying on incomplete embeddings.
- Dynamic selection of prior memories allows the model to refine outputs using previously seen data without fixed tuning.
- The framework produces superior quantitative and qualitative results on existing fMRI-to-video benchmarks.
- The dual-stage design mirrors the human brain's dual-pathway processing to bridge the semantic gap between signals and video content.
Where Pith is reading between the lines
- The same enrichment-plus-memory pattern could be tested on other brain-signal modalities such as EEG for cross-modal video decoding.
- If the memory integration step proves robust, it might extend to real-time reconstruction tasks in brain-computer interface settings.
- The separation of bottom-up feature mapping from top-down selection offers a template for other multimodal reconstruction problems where prior knowledge must be selectively applied.
Load-bearing premise
The assumption that the bottom-up enrichment and top-down memory stages can reliably extract video-specific cues like actions from noisy fMRI signals without benchmark-specific overfitting.
What would settle it
Failure of CineNeuron to outperform baselines on a new fMRI-to-video dataset containing unseen actions and object categories would indicate the claim does not hold.
Original abstract
Reconstructing dynamic visual experiences as videos from functional magnetic resonance imaging (fMRI) is pivotal for advancing the understanding of neural processes. However, current fMRI-to-video reconstruction methods are hindered by a semantic gap between noisy fMRI signals and the rich content of videos, stemming from a reliance on incomplete semantic embeddings that neither capture video-specific cues (e.g., actions) nor integrate prior knowledge. To this end, we draw inspiration from the dual-pathway processing mechanism in human brain and introduce CineNeuron, a novel hierarchical framework for semantically enhanced video reconstruction from fMRI signals with two synergistic stages. First, a bottom-up semantic enrichment stage maps fMRI signals to a rich embedding space that comprehensively captures textual semantics, image contents, action concepts, and object categories. Second, a top-down memory integration stage utilizes the proposed Mixture-of-Memories method to dynamically select relevant "memories" from previously seen data and fuse them with the fMRI embedding to refine the video reconstruction. Extensive experimental results on two fMRI-to-video benchmarks demonstrate that CineNeuron surpasses state-of-the-art methods across various metrics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes CineNeuron, a hierarchical framework for fMRI-to-video reconstruction inspired by dual-pathway brain processing. It consists of a bottom-up semantic enrichment stage mapping fMRI signals to rich embeddings capturing textual semantics, image contents, action concepts, and object categories, followed by a top-down memory integration stage that employs Mixture-of-Memories to dynamically select and fuse relevant memories from previously seen data. Extensive experiments on two fMRI-to-video benchmarks are reported to show that CineNeuron surpasses state-of-the-art methods across various metrics.
Significance. If the reported gains prove robust under controls for data leakage, the work could advance fMRI-to-video reconstruction by addressing the semantic gap through explicit integration of video-specific cues (actions, dynamics) via brain-inspired stages, building on existing embeddings without introducing circular parameter fitting.
Major comments (2)
- [top-down memory integration stage] Top-down memory integration stage: the Mixture-of-Memories method selects and fuses memories from previously seen data, yet the manuscript provides no explicit mechanism (e.g., train/test split rules or disjoint memory bank construction) ensuring the bank excludes test distributions. This directly undermines the central claim of synergistic improvement, as gains on the two benchmarks could arise from memorization of benchmark statistics rather than general semantic enrichment.
- [experimental results] Experimental results section: the abstract asserts superiority across metrics without reporting ablation studies isolating the contribution of bottom-up vs. top-down stages, error bars, or data exclusion rules. Without these, it is impossible to confirm whether the synergistic gains are robust or affected by post-hoc choices, weakening the evidence for the hierarchical framework's effectiveness.
Minor comments (1)
- [Abstract] Abstract: the phrase 'surpasses state-of-the-art methods across various metrics' would be clearer if the specific metrics (e.g., PSNR, SSIM, semantic similarity) and quantitative improvements were stated.
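As one concrete example of the specific metrics this comment asks for, peak signal-to-noise ratio can be computed directly from per-pixel error. The sketch below is a hedged illustration on flat pixel lists; it is not drawn from the manuscript, and real evaluations would use library implementations over full frames.

```python
import math

def psnr(reference, reconstruction, max_val=255.0):
    """Peak signal-to-noise ratio between two equally sized frames.

    Frames are given as flat lists of pixel intensities; PSNR is
    10 * log10(max_val^2 / MSE), with infinity for a perfect match.
    """
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0.0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)
```

A perfect reconstruction yields infinite PSNR; larger per-pixel error lowers the score, which is why the referee asks for the exact numbers rather than a blanket superiority claim.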
Simulated Author's Rebuttal
We thank the referee for the thoughtful comments, which help clarify key aspects of our framework. We address each major point below and will revise the manuscript to improve transparency and robustness.
Point-by-point responses
-
Referee: [top-down memory integration stage] Top-down memory integration stage: the Mixture-of-Memories method selects and fuses memories from previously seen data, yet the manuscript provides no explicit mechanism (e.g., train/test split rules or disjoint memory bank construction) ensuring the bank excludes test distributions. This directly undermines the central claim of synergistic improvement, as gains on the two benchmarks could arise from memorization of benchmark statistics rather than general semantic enrichment.
Authors: We agree that explicit safeguards against data leakage are essential for validating the top-down stage. The memory bank is constructed solely from training-set embeddings with no overlap to test samples, following standard benchmark splits (e.g., subject-wise or video-wise disjoint partitions). However, this protocol was described only at a high level. In revision we will add a dedicated subsection with precise train/test split rules, pseudocode for disjoint bank construction, and confirmation that test distributions are fully excluded from memory selection and fusion. revision: yes
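The disjoint-bank construction the authors promise might look like the following minimal sketch. All names here are hypothetical: the point is only that a candidate pool containing any test-split sample should fail loudly rather than be silently filtered, so leakage is surfaced instead of hidden.

```python
def build_memory_bank(pool, test_ids):
    """Build a memory bank from candidate embeddings, refusing test-split samples.

    `pool` maps sample id -> embedding; `test_ids` lists the held-out split.
    Raises ValueError if the pool overlaps the test split at all.
    """
    test = set(test_ids)
    leaked = test & pool.keys()
    if leaked:
        raise ValueError(f"candidate pool contains test samples: {sorted(leaked)}")
    return dict(pool)
```

Failing on overlap (rather than dropping the offending ids) makes the train/test split rule auditable: any pipeline bug that routes test embeddings into the bank is caught at construction time.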
-
Referee: [experimental results] Experimental results section: the abstract asserts superiority across metrics without reporting ablation studies isolating the contribution of bottom-up vs. top-down stages, error bars, or data exclusion rules. Without these, it is impossible to confirm whether the synergistic gains are robust or affected by post-hoc choices, weakening the evidence for the hierarchical framework's effectiveness.
Authors: We acknowledge the need for stronger empirical isolation of each stage. The original experiments compare the full model against prior methods but do not include stage-wise ablations or variance estimates. We will expand the experimental section with (i) ablation tables removing bottom-up enrichment or top-down integration, (ii) error bars or standard deviations over multiple random seeds, and (iii) explicit restatement of the data-exclusion rules already used for the memory bank. These additions will directly demonstrate the synergistic contribution of the two stages. revision: yes
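The promised variance estimates amount to simple per-variant aggregation over seeds; a hedged stdlib sketch (variant names are invented for illustration):

```python
import statistics

def summarize_ablations(scores_by_variant):
    """Per ablation variant, report (mean, sample std) of a metric over seeds.

    `scores_by_variant` maps a variant name (e.g. "full", "no_memory")
    to the list of scores obtained with different random seeds.
    """
    return {variant: (statistics.mean(runs), statistics.stdev(runs))
            for variant, runs in scores_by_variant.items()}
```

Reporting mean plus or minus standard deviation per variant is what lets a reader judge whether the gap between the full model and each ablation exceeds seed-to-seed noise.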
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper describes a hierarchical framework (bottom-up semantic enrichment followed by top-down Mixture-of-Memories integration) that builds on existing embeddings and memory concepts. No equations, derivations, or self-referential reductions appear in the provided text; performance claims rest on benchmark experiments that remain externally falsifiable rather than being forced by fitted parameters or self-citation chains. The approach is self-contained against external benchmarks with no load-bearing self-definitional steps.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, theorem reality_from_one_distinction (tagged unclear)
  unclear: the relation between the paper passage and the cited Recognition theorem.
  Matched passage: "Extensive experimental results on two fMRI-to-video benchmarks demonstrate that CineNeuron surpasses state-of-the-art methods"
What do these tags mean?
- matches: the paper's claim is directly supported by a theorem in the formal canon.
- supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: the paper appears to rely on the theorem as machinery.
- contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.