SUMO: Segment and Track Any Motion with Nonlinear State Space Models
Pith reviewed 2026-06-30 06:35 UTC · model grok-4.3
The pith
A nonlinear state space model from robotics enables zero-shot tracking and segmentation of objects with complex motions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SUMO develops a nonlinear State Space Model to represent object dynamics and introduces a Selective Unscented Filter that applies joint scoring and dynamic fusion of multi-source predictions, together with a memory selection mechanism, to achieve state-of-the-art results on VOT and MOS benchmarks in a zero-shot setting.
What carries the argument
The nonlinear State Space Model (SSM) that encodes object motion dynamics, paired with the Selective Unscented Filter (SUF) that performs state estimation through scoring and fusion of predictions.
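To make the role of an unscented filter over a nonlinear motion model concrete, here is a minimal sketch of the prediction step only. It is not SUMO's implementation: this page does not give the paper's state vector, dynamics, or noise settings. The unicycle-style model `unicycle_step`, the five-dimensional state, and the `kappa` default are all assumptions chosen for illustration. Process noise and the measurement update are omitted.

```python
import math

def unicycle_step(state, dt=1.0):
    """Hypothetical nonlinear motion model (an assumption, not the paper's SSM).
    state = [x, y, heading, speed, turn_rate]."""
    x, y, theta, v, omega = state
    return [
        x + v * math.cos(theta) * dt,
        y + v * math.sin(theta) * dt,
        theta + omega * dt,
        v,
        omega,
    ]

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def unscented_predict(mean, cov, f, kappa=0.0):
    """Propagate (mean, cov) through f using the basic 2n+1 sigma-point transform."""
    n = len(mean)
    scale = n + kappa
    root = cholesky([[scale * c for c in row] for row in cov])
    # Sigma points: the mean, plus and minus each column of the scaled square root.
    points = [list(mean)]
    for j in range(n):
        col = [root[i][j] for i in range(n)]
        points.append([m + c for m, c in zip(mean, col)])
        points.append([m - c for m, c in zip(mean, col)])
    weights = [kappa / scale] + [1.0 / (2.0 * scale)] * (2 * n)
    moved = [f(p) for p in points]
    # Weighted sample mean and covariance of the propagated points.
    # The heading is averaged naively, which is only reasonable for small spreads.
    new_mean = [sum(w * p[i] for w, p in zip(weights, moved)) for i in range(n)]
    new_cov = [[sum(w * (p[i] - new_mean[i]) * (p[j] - new_mean[j])
                    for w, p in zip(weights, moved))
                for j in range(n)] for i in range(n)]
    return new_mean, new_cov
```

For a target heading along the x-axis at unit speed with a small heading uncertainty, the predicted x-position comes out slightly below 1.0, because the sigma points that turn away from the axis travel less far along it. A linearized prediction through the mean alone would not show that effect.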
Load-bearing premise
The nonlinear state space model accurately captures the motion patterns of objects in real videos.
What would settle it
Videos in which object motion deviates strongly from the assumed nonlinear dynamics and the Selective Unscented Filter produces incorrect state estimates despite clear visual evidence.
Original abstract
Visual Object Tracking (VOT) and Moving Object Segmentation (MOS) are two fundamental tasks in computer vision that involve both spatial and temporal object dynamics. Existing methods rely predominantly on visual cues and thus often falter in real-world scenarios where object motions are inherently complex and nonlinear. To address this limitation, we propose SUMO, a zero-shot, training-free, unified framework integrating nonlinear dynamics with vision-based segmentation for accurate and consistent VOT and MOS. Specifically, we develop a nonlinear State Space Model (SSM) inspired by robotics principles to capture the complex object dynamics. Building on this model, we propose a Selective Unscented Filter (SUF) for accurate state estimation, which features a joint scoring mechanism and dynamically fuses multi-source predictions to identify the most plausible object state over time. Furthermore, we apply a memory selection mechanism to evaluate the reliability of memory frames. Our extensive experimental results show that SUMO achieves state-of-the-art performance on both VOT and MOS tasks.
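The abstract describes a joint scoring mechanism over multi-source predictions and a memory selection step, but this page gives no formulas for either. The sketch below shows one plausible shape such a mechanism could take, under loudly stated assumptions: the motion score is the IoU between a candidate box and the filter's predicted box, the joint score is a fixed weighted sum, and a frame enters memory when that score clears a threshold. The names `Candidate`, `select_candidate`, and `keep_in_memory`, along with `motion_weight` and `threshold`, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    box: tuple           # (x1, y1, x2, y2) proposed by the segmentation model
    visual_score: float  # segmentation confidence in [0, 1]

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def select_candidate(candidates, predicted_box, motion_weight=0.5):
    """Pick the candidate with the highest joint (motion + visual) score.
    The weighted sum is an assumed scoring rule, not the paper's."""
    def joint(c):
        motion = iou(c.box, predicted_box)
        return motion_weight * motion + (1.0 - motion_weight) * c.visual_score
    best = max(candidates, key=joint)
    return best, joint(best)

def keep_in_memory(joint_score, threshold=0.6):
    """Assumed memory rule: only frames with a reliable joint score are stored."""
    return joint_score >= threshold
```

With a predicted box of (10, 10, 20, 20), a candidate that matches the prediction with visual score 0.6 beats a distant distractor with visual score 0.9, which is the kind of failure that visual-cue-only selection is prone to.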
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SUMO, a zero-shot training-free unified framework for Visual Object Tracking (VOT) and Moving Object Segmentation (MOS). It combines a robotics-inspired nonlinear state space model (SSM) to capture complex object dynamics, a Selective Unscented Filter (SUF) with joint scoring and dynamic multi-source fusion for state estimation, and a memory selection mechanism. The central claim is that this yields state-of-the-art performance on both VOT and MOS tasks.
Significance. If the results hold, the work could be significant as a training-free alternative that incorporates nonlinear dynamics from robotics to improve robustness on complex motions where visual-cue-only methods fail. The zero-shot nature and the SUF's joint scoring mechanism are strengths that could generalize across tasks. However, the significance hinges on whether the nonlinear SSM component is demonstrably necessary, which is not yet established.
major comments (2)
- [Method description of nonlinear SSM and SUF] The SOTA claim on VOT and MOS rests on the premise that the nonlinear SSM plus SUF delivers superior state estimation, yet the manuscript provides no ablation that replaces the nonlinear dynamics with a linear SSM (or EKF/UKF) while keeping the SUF, joint scoring, and segmentation pipeline fixed. This is load-bearing for the central claim that nonlinear modeling is required to address the limitations of existing methods.
- [Experimental results] No quantitative tables, baseline comparisons, or statistical details are referenced to support the SOTA performance assertions, and the experimental evaluation lacks controls that would allow attribution of gains specifically to the robotics-inspired nonlinear component versus the vision backbone.
minor comments (1)
- The abstract is dense and would benefit from explicit separation of the three main contributions (nonlinear SSM, SUF, memory selection) for readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight the need for stronger empirical isolation of the nonlinear SSM's contribution, which we address by committing to additional experiments in revision. We respond point-by-point below.
- Referee: [Method description of nonlinear SSM and SUF] The SOTA claim on VOT and MOS rests on the premise that the nonlinear SSM plus SUF delivers superior state estimation, yet the manuscript provides no ablation that replaces the nonlinear dynamics with a linear SSM (or EKF/UKF) while keeping the SUF, joint scoring, and segmentation pipeline fixed. This is load-bearing for the central claim that nonlinear modeling is required to address the limitations of existing methods.
  Authors: We agree that an explicit ablation isolating the nonlinear dynamics (replacing the nonlinear SSM with a linear SSM or EKF/UKF while freezing SUF, joint scoring, and the segmentation pipeline) is necessary to substantiate the central claim. The current manuscript motivates the nonlinear SSM from robotics principles for complex motions but does not include this controlled comparison. In the revised version we will add the requested ablation on standard VOT and MOS benchmarks and report the resulting performance deltas.
  Revision: yes
- Referee: [Experimental results] No quantitative tables, baseline comparisons, or statistical details are referenced to support the SOTA performance assertions, and the experimental evaluation lacks controls that would allow attribution of gains specifically to the robotics-inspired nonlinear component versus the vision backbone.
  Authors: The manuscript contains quantitative results and baseline comparisons in the experimental section; however, we acknowledge that these do not yet include the specific controls needed to attribute gains to the nonlinear SSM versus the vision backbone. The additional ablation described in the response to the first comment will directly address this attribution gap. We will also ensure all tables, statistical details, and controls are clearly referenced in the revised text.
  Revision: partial
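The ablation discussed in the exchange above compares a linear motion model against a nonlinear one with everything else held fixed. The toy experiment below sketches the idea on synthetic data only. It is not the paper's ablation and reports no benchmark numbers: the trajectory is a noise-free constant-rate turn, `predict_linear` is a constant-velocity model, and `predict_turn` is a constant-turn model standing in for the unspecified nonlinear SSM. All function names and parameter values are assumptions.

```python
import math

def simulate_turn(steps, speed=1.0, turn_rate=0.2, dt=1.0):
    """Ground-truth positions of a target moving on a constant-rate turn."""
    x = y = theta = 0.0
    path = [(x, y)]
    for _ in range(steps):
        x += speed * math.cos(theta) * dt
        y += speed * math.sin(theta) * dt
        theta += turn_rate * dt
        path.append((x, y))
    return path

def predict_linear(p_prev, p_curr):
    """Constant-velocity model: repeat the last displacement."""
    return (2 * p_curr[0] - p_prev[0], 2 * p_curr[1] - p_prev[1])

def predict_turn(p_prev2, p_prev, p_curr):
    """Constant-turn model: rotate the last displacement by the last heading change."""
    d1 = (p_prev[0] - p_prev2[0], p_prev[1] - p_prev2[1])
    d2 = (p_curr[0] - p_prev[0], p_curr[1] - p_prev[1])
    dtheta = math.atan2(d2[1], d2[0]) - math.atan2(d1[1], d1[0])
    c, s = math.cos(dtheta), math.sin(dtheta)
    return (p_curr[0] + c * d2[0] - s * d2[1],
            p_curr[1] + s * d2[0] + c * d2[1])

def mean_error(path, predictor, history):
    """Average one-step prediction error, using `history` past points per prediction."""
    errors = []
    for t in range(history, len(path)):
        pred = predictor(*path[t - history:t])
        errors.append(math.dist(pred, path[t]))
    return sum(errors) / len(errors)

path = simulate_turn(steps=50)
linear_err = mean_error(path, predict_linear, history=2)
turn_err = mean_error(path, predict_turn, history=3)
```

On this trajectory the constant-velocity predictor carries a persistent error of about `2 * sin(turn_rate / 2)` per step, while the constant-turn predictor is exact up to floating-point error. A real ablation would need noisy detections, the full filter, and the benchmark metrics, so this only illustrates why the controlled comparison is informative.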
Circularity Check
No circularity: empirical SOTA claim rests on independent model proposal and experiments
The paper proposes a new zero-shot framework (nonlinear SSM + SUF + memory selection) for VOT/MOS, with performance claims grounded in experimental results on standard benchmarks rather than any derivation that reduces to its own inputs. No equations, fitted parameters, or self-citations appear in the abstract or described structure that would trigger self-definitional, fitted-input, or load-bearing self-citation patterns. The robotics inspiration is presented as motivation for an ansatz, not as a uniqueness theorem imported from prior author work. The derivation chain is therefore self-contained against external benchmarks.