Recognition: no theorem link
EmambaIR: Efficient Visual State Space Model for Event-guided Image Reconstruction
Pith reviewed 2026-05-11 02:00 UTC · model grok-4.3
The pith
A state space model reconstructs images from event streams with linear complexity and better accuracy than CNNs or transformers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EmambaIR combines a cross-modal Top-k Sparse Attention Module (TSAM), which performs efficient pixel-level top-k attention to guide event-image fusion, with a Gated State-Space Module (GSSM), which adds a nonlinear gated unit to vanilla linear state space models. Together these components capture global contextual dependencies and temporal information from sparse event streams without quadratic cost. Applied to three reconstruction tasks across six datasets, the architecture yields higher-quality outputs than prior CNN- and ViT-based methods while consuming substantially less memory and computation.
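The review does not reproduce TSAM's equations; a minimal sketch of pixel-level top-k sparse attention under the usual interpretation (keep only the k largest query-key scores per query before the softmax) might look like this — the function name and plain-list tensors are illustrative, not the authors' implementation:

```python
import math

def topk_sparse_attention(queries, keys, values, k):
    """For each query, attend only to the k keys with the largest
    dot-product score; all other scores are dropped before softmax.
    queries/keys: lists of equal-length float lists; values: one per key."""
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
        # indices of the k largest scores -- the "sparse" part
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        exps = [math.exp(scores[i]) for i in top]
        z = sum(exps)
        weights = [e / z for e in exps]
        dim = len(values[0])
        out = [sum(w * values[i][d] for w, i in zip(weights, top)) for d in range(dim)]
        outputs.append(out)
    return outputs
```

With k equal to the number of keys this reduces to dense attention; a smaller k zeroes out weak cross-modal correspondences, which is plausibly the source of the "rich yet sparse fusion features" the paper claims.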
What carries the argument
The cross-modal Top-k Sparse Attention Module for sparse cross-modal fusion paired with the Gated State-Space Module that enhances temporal representation inside linear-complexity state space layers.
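GSSM's exact gating is not given in this review; a toy sketch of the general pattern (a linear state-space recurrence whose output is modulated by a nonlinear gate computed from the input) could look like the following, where the scalar parameters and sigmoid gate are assumptions standing in for the paper's gated unit:

```python
import math

def gated_ssm_scan(xs, a=0.9, b=1.0, c=1.0, wg=1.0):
    """Linear SSM recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t,
    modulated by a sigmoid gate on the input -- a toy stand-in for
    the nonlinear gated unit described in the paper.
    Runs in O(n) for a sequence of length n."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x                        # linear state update
        gate = 1.0 / (1.0 + math.exp(-wg * x))   # nonlinear gate
        ys.append(c * h * gate)
    return ys
```

A real GSSM would use vector-valued states and learned, per-channel parameters; the point of the sketch is only that the gate adds nonlinearity without breaking the O(n) scan.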
If this is right
- It delivers higher reconstruction quality than state-of-the-art methods on motion deblurring, deraining, and HDR enhancement across six datasets.
- It achieves substantial reductions in memory consumption and computational cost relative to vision transformers.
- Its linear complexity supports use in high-resolution scenarios where prior transformer approaches become prohibitive.
- It effectively processes the spatially sparse and temporally continuous nature of event streams for image restoration.
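The high-resolution argument in these bullets is just arithmetic on the complexity classes; a quick, assumption-free comparison of operation counts:

```python
def pairwise_ops(h, w):
    """Dense self-attention computes a score for every token pair:
    O(n^2) with n = h*w tokens."""
    n = h * w
    return n * n

def scan_ops(h, w):
    """A state-space scan touches each token once: O(n)."""
    return h * w

# Doubling resolution in both dimensions quadruples the token count,
# so attention cost grows 16x while the scan grows only 4x.
ratio_attn = pairwise_ops(512, 512) / pairwise_ops(256, 256)
ratio_scan = scan_ops(512, 512) / scan_ops(256, 256)
```

This gap is what makes transformer attention prohibitive at megapixel resolutions while a linear scan remains tractable.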
Where Pith is reading between the lines
- The same sparse-attention-plus-gating pattern could be tested on other event-based tasks such as tracking or segmentation.
- If the linear scaling holds beyond the reported resolutions, the model might run in real time on edge hardware for autonomous systems.
- The gated enhancement technique could be applied to other state-space-model variants in computer vision to improve temporal modeling.
Load-bearing premise
That the top-k sparse attention and gated state-space additions together model global dependencies and event-image interactions more effectively than CNNs or vision transformers while preserving linear scaling.
What would settle it
Direct head-to-head runs on the six datasets for motion deblurring, deraining, and HDR enhancement would settle it: if EmambaIR consumed more memory or time than the current best methods, or produced lower-quality reconstructions, the central efficiency and performance claims would be disproved.
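Settling the efficiency half of the claim only requires instrumented runs; a minimal, library-agnostic harness (the profiled callable below is a placeholder, not the authors' model) could record both wall time and peak Python-heap memory:

```python
import time
import tracemalloc

def profile(fn, *args):
    """Return (result, seconds, peak_bytes) for one call to fn."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak

# Stand-in workload; in a real comparison, fn would be each model's
# forward pass on identical inputs.
out, secs, peak = profile(sum, range(10_000))
```

For GPU models the same protocol applies with the framework's own timers and memory counters, run on identical inputs and resolutions for every method compared.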
Original abstract
Recent event-based image reconstruction methods predominantly rely on Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) to process complementary event information. However, these architectures face fundamental limitations: CNNs often fail to capture global feature correlations, whereas ViTs incur quadratic computational complexity (e.g., $O(n^2)$), hindering their application in high-resolution scenarios. To address these bottlenecks, we introduce EmambaIR, an Efficient visual State Space Model designed for image reconstruction using spatially sparse and temporally continuous event streams. Our framework introduces two key components: the cross-modal Top-k Sparse Attention Module (TSAM) and the Gated State-Space Module (GSSM). TSAM efficiently performs pixel-level top-k sparse attention to guide cross-modal interactions, yielding rich yet sparse fusion features. Subsequently, GSSM utilizes a nonlinear gated unit to enhance the temporal representation of vanilla linear-complexity ($O(n)$) SSMs, effectively capturing global contextual dependencies without the typical computational overhead. Extensive experiments on six datasets across three diverse image reconstruction tasks - motion deblurring, deraining, and High Dynamic Range (HDR) enhancement - demonstrate that EmambaIR significantly outperforms state-of-the-art methods while offering substantial reductions in memory consumption and computational cost. The source code and data are publicly available at: https://github.com/YunhangWickert/EmambaIR
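The abstract leaves the event representation unspecified; a common choice in event-based vision (an assumption here, not something the abstract confirms) is to bin raw events $(x, y, t, p)$ into a spatio-temporal voxel grid before feeding them to the network:

```python
def events_to_voxel_grid(events, h, w, bins, t0, t1):
    """Accumulate signed event polarities into `bins` temporal slices.
    events: iterable of (x, y, t, polarity) with polarity in {-1, +1}.
    Returns a bins x h x w nested list."""
    grid = [[[0.0] * w for _ in range(h)] for _ in range(bins)]
    span = max(t1 - t0, 1e-9)
    for x, y, t, p in events:
        b = min(int((t - t0) / span * bins), bins - 1)  # temporal slice
        grid[b][y][x] += float(p)
    return grid
```

Such a grid preserves the spatial sparsity and temporal ordering the paper emphasizes, which is exactly what the referee asks to be held consistent across datasets.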
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces EmambaIR, an efficient visual state space model for event-guided image reconstruction from sparse event streams. It proposes two modules: the cross-modal Top-k Sparse Attention Module (TSAM) for efficient pixel-level sparse attention to enable cross-modal fusion, and the Gated State-Space Module (GSSM) that augments linear-complexity SSMs with a nonlinear gated unit for improved temporal modeling. The framework is evaluated on motion deblurring, deraining, and HDR enhancement tasks across six datasets, claiming superior performance over CNN- and ViT-based SOTA methods alongside substantial reductions in memory and compute.
Significance. If the empirical results and efficiency claims hold, the work offers a promising direction for scalable event-based vision by leveraging SSMs to achieve global context with linear complexity, potentially enabling high-resolution applications where ViTs are prohibitive. Public release of code and data is a positive factor for reproducibility.
Minor comments (3)
- [Abstract] The claim of significant outperformance and efficiency gains is stated without numerical metrics, error bars, or dataset identifiers, which reduces immediate assessability even though the full manuscript presumably contains these details in the experiments section.
- [§4 Experiments] Confirm that all reported comparisons use consistent evaluation protocols (e.g., the same input resolutions and event representations) across the six datasets to support the cross-task generalization claim.
- [Figure 3, or equivalent architecture diagram] Clarify the exact integration point of TSAM outputs into GSSM to avoid ambiguity in how cross-modal features propagate through the state-space layers.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of EmambaIR and the recommendation for minor revision. We appreciate the recognition that the approach offers a promising direction for scalable event-based vision by achieving global context with linear complexity.
Circularity Check
No significant circularity detected
full rationale
The manuscript introduces EmambaIR as an architectural framework combining TSAM (cross-modal Top-k Sparse Attention) and GSSM (Gated State-Space Module) to process event streams for image reconstruction. Its central claims rest on empirical results across six datasets for deblurring, deraining, and HDR tasks, plus the standard linear-complexity property of SSMs. No derivation chain, equation, or performance prediction reduces by construction to a fitted parameter, self-definition, or self-citation; the modules are defined independently and evaluated externally. This is the expected non-finding for an empirical architecture paper whose results are not forced by internal re-labeling of inputs.