Robust Promptable Video Object Segmentation
Pith reviewed 2026-05-13 07:13 UTC · model grok-4.3
The pith
A memory-conditioned adaptation technique enables promptable video object segmentation to remain accurate despite input corruptions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central contribution is Memory-object-conditioned Gated-rank Adaptation (MoGA), a method for robust promptable video object segmentation. MoGA conditions the adaptation process on object-specific representations stored in memory across video frames, which lets the model handle degradations differently for each tracked object while keeping the segmentation temporally consistent. Experiments on a new benchmark with real-world adverse conditions, together with synthetic data, show notable performance gains across many corruption types.
What carries the argument
Memory-object-conditioned Gated-rank Adaptation (MoGA), a technique that conditions robustification on per-object memory representations to achieve temporally consistent handling of degradations.
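To make the machinery concrete, the following is a minimal sketch of how a memory-conditioned gated-rank adapter could be wired: a LoRA-style low-rank update whose rank components are gated by a per-object memory readout. The module and tensor names are illustrative assumptions, not the paper's implementation.

```python
# Sketch only: a LoRA-style adapter gated per rank component by an object-specific
# memory readout. Names and shapes are assumptions for illustration.
import torch
import torch.nn as nn

class MemoryConditionedGatedRankAdapter(nn.Module):
    def __init__(self, dim: int, rank: int = 8, mem_dim: int = 256):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)   # low-rank down-projection
        self.up = nn.Linear(rank, dim, bias=False)     # low-rank up-projection
        # Gate network: maps the per-object memory readout to one gate per rank component.
        self.gate = nn.Sequential(nn.Linear(mem_dim, rank), nn.Sigmoid())
        nn.init.zeros_(self.up.weight)                 # adapter starts as an identity residual

    def forward(self, x: torch.Tensor, obj_memory: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim) frame features; obj_memory: (B, mem_dim) memory readout for one tracked object.
        g = self.gate(obj_memory).unsqueeze(1)         # (B, 1, rank) object-specific gates
        delta = self.up(g * self.down(x))              # gated low-rank feature update
        return x + delta                               # residual adaptation of the frame features
```

On this reading, the gate vector is what lets shared adapter weights act differently on each tracked object, while deriving it from the memory readout rather than from the current frame alone is what would keep the gating temporally consistent.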
If this is right
- MoGA delivers consistent and significant performance gains across diverse corruption types on both synthetic and real-world data.
- The memory mechanism allows object-specific adaptation without sacrificing frame-to-frame consistency in the segmented objects.
- The new benchmark with 351 real-world clips provides a standardized way to measure robustness in promptable video object segmentation.
- Training on temporally varying synthetic corruptions prepares models to handle the range of degradations seen in practical video inputs.
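For the last point, here is a minimal sketch of what a temporally varying synthetic corruption might look like: a smooth per-frame severity schedule applied to a toy noise corruption. The schedule shape, severity range, and corruption family are assumptions; the paper's actual generation protocol is not reproduced here.

```python
# Sketch only: per-frame severity schedule plus a toy corruption applied to a clip.
import numpy as np

def severity_schedule(num_frames, low=0.1, high=0.9, period=60, rng=None):
    """Smoothly varying per-frame severity in [low, high], e.g. fog thickening and thinning over time."""
    rng = rng or np.random.default_rng()
    phase = rng.uniform(0, 2 * np.pi)
    t = np.arange(num_frames)
    s = 0.5 * (1 + np.sin(2 * np.pi * t / period + phase))  # values in [0, 1]
    return low + (high - low) * s

def add_gaussian_noise(frame, severity):
    """Toy stand-in corruption; a real pipeline would also cover fog, blur, low light, compression, etc."""
    noise = np.random.normal(0.0, 0.25 * severity, size=frame.shape)
    return np.clip(frame + noise, 0.0, 1.0)

def corrupt_clip(frames):
    """frames: (T, H, W, C) float video in [0, 1]; returns a clip with time-varying corruption severity."""
    sched = severity_schedule(len(frames))
    return np.stack([add_gaussian_noise(f, s) for f, s in zip(frames, sched)])
```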
Where Pith is reading between the lines
- The same per-object memory conditioning could be transferred to improve robustness in neighboring tasks such as video instance segmentation or multi-object tracking.
- Existing promptable video object segmentation models could be upgraded by inserting the MoGA adaptation layer with only modest additional training.
- Expanding the benchmark to include more video domains and sensor types would test whether the observed gains hold under broader real-world distributions.
Load-bearing premise
The assumption that the synthetic corruptions applied to existing VOS datasets and the new real-world benchmark sufficiently represent the distribution of adverse conditions encountered in actual deployments.
What would settle it
A controlled test in which MoGA shows no improvement, or loses accuracy, on a fresh collection of real-world videos containing corruption types absent from the benchmark would undercut the claim that the approach generalizes beyond the benchmark's corruption distribution.
Original abstract
The performance of promptable video object segmentation (PVOS) models substantially degrades under input corruptions, which prevents PVOS deployment in safety-critical domains. This paper offers the first comprehensive study on robust PVOS (RobustPVOS). We first construct a new, comprehensive benchmark with two real-world evaluation datasets of 351 video clips and more than 2,500 object masks under real-world adverse conditions. At the same time, we generate synthetic training data by applying diverse and temporally varying corruptions to existing VOS datasets. Moreover, we present a new RobustPVOS method, dubbed Memory-object-conditioned Gated-rank Adaptation (MoGA). The key to successfully performing RobustPVOS is two-fold: effectively handling object-specific degradation and ensuring temporal consistency in predictions. MoGA leverages object-specific representations maintained in memory across frames to condition the robustification process, which allows the model to handle each tracked object differently in a temporally consistent way. Extensive experiments on our benchmark validate MoGA's efficacy, showing consistent and significant improvements across diverse corruption types on both synthetic and real-world datasets, establishing a strong baseline for future RobustPVOS research. Our benchmark is publicly available at https://sohyun-l.github.io/RobustPVOS_project_page/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the first comprehensive benchmark and method for Robust Promptable Video Object Segmentation (RobustPVOS). It creates real-world evaluation datasets (351 clips, >2500 masks under adverse conditions) and synthetic training data via temporally varying corruptions on existing VOS datasets. The proposed MoGA method uses object-specific representations maintained in memory to condition gated-rank adaptation, enabling per-object handling of degradation while preserving temporal consistency. Experiments report consistent improvements across corruption types on both synthetic and real data, positioning MoGA as a baseline for future work.
Significance. If the robustness gains hold under scrutiny, the work is significant for enabling PVOS deployment in safety-critical applications. The public benchmark release and focus on real-world adverse conditions address a clear gap. The object-specific memory conditioning idea is a plausible mechanism for handling heterogeneous corruptions, and the dual synthetic/real evaluation setup strengthens the claims relative to purely synthetic robustness studies.
Major comments (3)
- [Experiments] Experiments section: No ablation isolates the contribution of object-specific memory conditioning. The paper must include a controlled variant that disables or freezes the memory module while retaining gated-rank adaptation (and vice versa) to substantiate the claim that memory conditioning, rather than adaptation alone, drives the per-object robustness and temporal consistency gains.
- [Benchmark Construction] Benchmark and data generation: The description of how temporally varying corruptions are applied to create synthetic training data lacks sufficient detail (e.g., per-frame corruption schedules, parameter ranges, and correlation with object masks) to allow reproduction or to verify that the synthetic distribution meaningfully approximates the real-world benchmark.
- [Experiments] Real-world results: Improvements on the new real-world datasets are reported without error bars, standard deviations across runs, or statistical tests. Given the modest number of clips (351), this makes it difficult to assess whether the gains are robust or could arise from dataset-specific biases.
Minor comments (2)
- [Method] Method section: Provide a clearer diagram or pseudocode for the memory conditioning and gating mechanism to improve reproducibility.
- [Abstract] Abstract and introduction: The phrase 'object-specific representations maintained in memory' should be defined at first use with a forward reference to the relevant equation or figure.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point by point below. Where revisions are needed, we will incorporate them in the next version of the manuscript to improve clarity, reproducibility, and statistical rigor.
Point-by-point responses
-
Referee: [Experiments] Experiments section: No ablation isolates the contribution of object-specific memory conditioning. The paper must include a controlled variant that disables or freezes the memory module while retaining gated-rank adaptation (and vice versa) to substantiate the claim that memory conditioning, rather than adaptation alone, drives the per-object robustness and temporal consistency gains.
Authors: We agree that a controlled ablation isolating the object-specific memory conditioning is essential to substantiate its role. In the revised manuscript, we will add experiments that (i) disable the memory module while retaining gated-rank adaptation and (ii) freeze the adaptation parameters while retaining the memory module. These variants will be evaluated on both synthetic and real-world data to quantify the specific contributions to per-object robustness and temporal consistency. revision: yes
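A sketch of how the promised controlled variants could be organized; the flag names are hypothetical and serve only to make the ablation design concrete.

```python
# Sketch only: the two controlled variants described above, plus the full model.
ABLATIONS = {
    "full_moga":      dict(memory_conditioning=True,  train_gated_rank_adapter=True),
    "no_memory":      dict(memory_conditioning=False, train_gated_rank_adapter=True),   # variant (i)
    "frozen_adapter": dict(memory_conditioning=True,  train_gated_rank_adapter=False),  # variant (ii)
}

for name, cfg in ABLATIONS.items():
    # Each variant would be trained identically and evaluated on both synthetic and real-world splits.
    print(name, cfg)
```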
-
Referee: [Benchmark Construction] Benchmark and data generation: The description of how temporally varying corruptions are applied to create synthetic training data lacks sufficient detail (e.g., per-frame corruption schedules, parameter ranges, and correlation with object masks) to allow reproduction or to verify that the synthetic distribution meaningfully approximates the real-world benchmark.
Authors: We acknowledge the need for greater detail to ensure reproducibility. We will expand the data generation section to specify the per-frame corruption schedules, the exact parameter ranges and sampling distributions for each corruption type, and the procedure for correlating corruptions with object masks (including any mask-aware application rules). This will also clarify how the synthetic distribution was designed to approximate the real-world adverse conditions. revision: yes
-
Referee: [Experiments] Real-world results: Improvements on the new real-world datasets are reported without error bars, standard deviations across runs, or statistical tests. Given the modest number of clips (351), this makes it difficult to assess whether the gains are robust or could arise from dataset-specific biases.
Authors: We agree that reporting variability and statistical significance is important given the dataset size. In the revision, we will rerun all real-world experiments across multiple random seeds, report standard deviations and error bars, and include statistical tests (e.g., paired t-tests or Wilcoxon signed-rank tests) to assess the significance of MoGA's improvements over baselines. revision: yes
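A minimal sketch of the paired tests mentioned above, assuming per-clip scores for MoGA and a baseline are available as aligned arrays (the variable names are hypothetical).

```python
# Sketch only: paired significance tests over per-clip scores from the same 351 clips.
import numpy as np
from scipy.stats import ttest_rel, wilcoxon

def compare_methods(moga_scores, baseline_scores):
    moga = np.asarray(moga_scores)
    base = np.asarray(baseline_scores)
    t_stat, t_p = ttest_rel(moga, base)        # parametric paired t-test
    w_stat, w_p = wilcoxon(moga, base)         # non-parametric Wilcoxon signed-rank test
    return {"mean_gain": float((moga - base).mean()),
            "paired_t_p": float(t_p),
            "wilcoxon_p": float(w_p)}
```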
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper introduces a new benchmark (synthetic corruptions on existing VOS data plus two new real-world datasets) and a novel architecture MoGA whose core components (object-specific memory conditioning and gated-rank adaptation) are defined independently of the evaluation metrics. Performance gains are reported via empirical comparison on held-out synthetic and real data rather than any quantity that is fitted or defined in terms of itself. No equations, self-citations, or uniqueness theorems are invoked that would reduce the central claim to a tautology or to a parameter fit performed inside the same paper.