Recognition: no theorem link
Event6D: Event-based Novel Object 6D Pose Tracking
Pith reviewed 2026-05-14 22:04 UTC · model grok-4.3
The pith
EventTrack6D tracks 6D poses of novel objects using event cameras by reconstructing intensity and depth from event streams at over 120 FPS.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EventTrack6D is an event-depth tracking framework that generalizes to novel objects without object-specific training by reconstructing both intensity and depth at arbitrary timestamps between depth frames. Conditioned on the most recent depth measurement, the dual reconstruction recovers dense photometric and geometric cues from sparse event streams, operating at over 120 FPS while maintaining temporal consistency under rapid motion.
What carries the argument
The dual reconstruction network that recovers dense photometric and geometric cues from sparse event streams, conditioned on the most recent depth measurement.
If this is right
- Operates in real time at over 120 FPS for fast dynamic scenes.
- Generalizes from synthetic training data to real-world scenarios without fine-tuning.
- Maintains accurate tracking across diverse objects and motion patterns.
- Provides a benchmark suite including synthetic training data and real and simulated evaluation sets.
Where Pith is reading between the lines
- This reconstruction approach could potentially be adapted for other event-based vision tasks such as optical flow estimation in high-speed scenarios.
- By avoiding object-specific training, the method opens possibilities for deploying tracking systems in environments with frequently changing object sets.
- The reliance on recent depth measurements suggests integration with depth sensors could further improve performance in varying lighting conditions.
Load-bearing premise
The dual reconstruction network can reliably recover dense photometric and geometric cues from sparse event streams for arbitrary novel objects and rapid motions when conditioned only on the most recent depth measurement.
What would settle it
Observing a significant drop in tracking accuracy on real event data with rapid object motions or unseen object shapes compared to synthetic benchmarks would indicate the reconstruction does not generalize as claimed.
Figures
read the original abstract
Event cameras provide microsecond latency, making them suitable for 6D object pose tracking in fast, dynamic scenes where conventional RGB and depth pipelines suffer from motion blur and large pixel displacements. We introduce EventTrack6D, an event-depth tracking framework that generalizes to novel objects without object-specific training by reconstructing both intensity and depth at arbitrary timestamps between depth frames. Conditioned on the most recent depth measurement, our dual reconstruction recovers dense photometric and geometric cues from sparse event streams. Our EventTrack6D operates at over 120 FPS and maintains temporal consistency under rapid motion. To support training and evaluation, we introduce a comprehensive benchmark suite: a large-scale synthetic dataset for training and two complementary evaluation sets, including real and simulated event datasets. Trained exclusively on synthetic data, EventTrack6D generalizes effectively to real-world scenarios without fine-tuning, maintaining accurate tracking across diverse objects and motion patterns. Our method and datasets validate the effectiveness of event cameras for event-based 6D pose tracking of novel objects. Code and datasets are publicly available at https://chohoonhee.github.io/Event6D.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents EventTrack6D, an event-based framework for 6D pose tracking of novel objects. It uses a dual reconstruction network to recover dense intensity and depth cues from sparse event streams, conditioned on the most recent depth measurement, enabling tracking at arbitrary timestamps between depth frames. Trained exclusively on synthetic data, the method claims to generalize effectively to real-world scenarios without fine-tuning, operating above 120 FPS while maintaining temporal consistency under rapid motion. The work introduces a large-scale synthetic training dataset and two evaluation sets (real and simulated events) to support this, with public code and data release.
Significance. If the synthetic-to-real generalization without fine-tuning holds under the reported conditions, the result would advance event-camera applications in fast-motion 6D tracking by eliminating object-specific training requirements. The public datasets and code provide a concrete benchmark that could facilitate follow-on work in event-based vision.
major comments (1)
- [§3.2] §3.2 (Dual Reconstruction Network): Conditioning the network solely on the single most recent depth measurement creates a risk that geometric priors become stale under rapid object motion or inter-frame depth changes. Because the central no-fine-tuning generalization claim depends on reliable dense cue recovery from events alone, the manuscript should include targeted ablations or error analysis on motion speed and depth variation to confirm that inference from the event stream remains accurate for novel shapes.
minor comments (2)
- [Abstract] The abstract states performance at 'over 120 FPS' without specifying the hardware platform, input resolution, or exact timing breakdown between reconstruction and tracking stages.
- [§5] Table or figure captions for the benchmark datasets should explicitly list the number of objects, motion types, and event rates to allow direct comparison with prior event-tracking work.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the significance of EventTrack6D. We address the single major comment below by agreeing to incorporate the requested analyses, which we believe will further strengthen the evidence for our synthetic-to-real generalization claims under rapid motion.
read point-by-point responses
-
Referee: [§3.2] §3.2 (Dual Reconstruction Network): Conditioning the network solely on the single most recent depth measurement creates a risk that geometric priors become stale under rapid object motion or inter-frame depth changes. Because the central no-fine-tuning generalization claim depends on reliable dense cue recovery from events alone, the manuscript should include targeted ablations or error analysis on motion speed and depth variation to confirm that inference from the event stream remains accurate for novel shapes.
Authors: We thank the referee for this insightful observation on potential staleness of geometric priors. Our dual reconstruction network is explicitly trained on synthetic sequences that include diverse motion speeds and depth variations, allowing the event stream to provide continuous high-frequency updates that compensate for any outdated depth conditioning. Original experiments already demonstrate robust tracking at >120 FPS on rapid-motion real and simulated sequences without fine-tuning. To directly address the request, the revised manuscript adds a new ablation subsection that systematically varies inter-frame motion velocity and depth change magnitude, reporting both reconstruction error and final 6D pose accuracy for novel objects. These results show graceful degradation, confirming that event-based inference remains reliable even when the most recent depth measurement is stale. revision: yes
Circularity Check
No circularity: empirical generalization presented as experimental outcome
full rationale
The paper's central claim—that EventTrack6D, trained only on synthetic data, generalizes to real novel objects without fine-tuning—is framed as an empirical result validated on introduced benchmarks rather than a quantity derived by definition or self-referential fitting. The abstract describes the dual reconstruction network as a methodological component conditioned on recent depth to recover cues from events, but does not equate the reported tracking accuracy or generalization performance to any fitted parameter or input defined from the same data. No equations, self-citations, or uniqueness theorems are invoked in the provided text to force the result; the synthetic-to-real transfer is presented as an observed outcome of the framework and datasets. This leaves the derivation self-contained against external evaluation.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Event camera output can be treated as a reliable sparse signal of brightness changes that, when combined with occasional depth frames, suffices to reconstruct dense intensity and geometry.
Reference graph
Works this paper leans on
-
[1]
Mtevent: A multi-task event camera dataset for 6d pose estimation and moving object detection
Shrutarv Awasthi, Anas Gouda, Sven Franke, J ´erˆome Ruti- nowski, Frank Hoffmann, and Moritz Roidl. Mtevent: A multi-task event camera dataset for 6d pose estimation and moving object detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5102–5110, 2025. 3
work page 2025
-
[2]
Seunghyeok Back, Joosoon Lee, Kangmin Kim, Heeseon Rho, Geonhyup Lee, Raeyoung Kang, Sangbeom Lee, Sangjun Noh, Youngjin Lee, Taeyeop Lee, et al. Grasp- clutter6d: A large-scale real-world dataset for robust per- ception and grasping in cluttered scenes.IEEE Robotics and Automation Letters, 2025. 2
work page 2025
-
[3]
Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Fan Zhang, Jade Fountain, Edward Miller, Selen Basol, Richard Newcombe, Robert Wang, et al. Introducing hot3d: An egocentric dataset for 3d hand and object tracking.arXiv preprint arXiv:2406.09598,
-
[4]
Hot3d: Hand and object tracking in 3d from egocentric multi-view videos
Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Shangchen Han, Fan Zhang, Linguang Zhang, Jade Fountain, Edward Miller, Selen Basol, et al. Hot3d: Hand and object tracking in 3d from egocentric multi-view videos. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 7061–7071, 2025. 5
work page 2025
-
[5]
Simultaneous optical flow and intensity estimation from an event camera
Patrick Bardow, Andrew J Davison, and Stefan Leuteneg- ger. Simultaneous optical flow and intensity estimation from an event camera. InProceedings of the IEEE con- ference on computer vision and pattern recognition, pages 884–892, 2016. 2
work page 2016
-
[6]
Real-time image- based tracking of planes using efficient second-order min- imization
Selim Benhimane and Ezio Malis. Real-time image- based tracking of planes using efficient second-order min- imization. In2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)(IEEE Cat. No. 04CH37566), pages 943–948. IEEE, 2004. 2
work page 2004
-
[7]
Method for registration of 3-d shapes
Paul J Besl and Neil D McKay. Method for registration of 3-d shapes. InSensor fusion IV: control paradigms and data structures, pages 586–606. Spie, 1992. 2, 6, 4
work page 1992
-
[8]
The ycb object and model set: Towards common benchmarks for manip- ulation research
Berk Calli, Arjun Singh, Aaron Walsman, Siddhartha Srini- vasa, Pieter Abbeel, and Aaron M Dollar. The ycb object and model set: Towards common benchmarks for manip- ulation research. In2015 international conference on ad- vanced robotics (ICAR), pages 510–517. IEEE, 2015. 1, 5, 3
work page 2015
-
[9]
Dexycb: A benchmark for capturing hand grasping of objects
Yu-Wei Chao, Wei Yang, Yu Xiang, Pavlo Molchanov, Ankur Handa, Jonathan Tremblay, Yashraj S Narang, Karl Van Wyk, Umar Iqbal, Stan Birchfield, et al. Dexycb: A benchmark for capturing hand grasping of objects. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9044–9053, 2021. 1, 5
work page 2021
-
[10]
Repurposing pre- trained video diffusion models for event-based video inter- polation
Jingxi Chen, Brandon Y Feng, Haoming Cai, Tianfu Wang, Levi Burner, Dehao Yuan, Cornelia Fermuller, Christo- pher A Metzler, and Yiannis Aloimonos. Repurposing pre- trained video diffusion models for event-based video inter- polation. InProceedings of the Computer Vision and Pat- tern Recognition Conference, pages 12456–12466, 2025. 3
work page 2025
-
[11]
Peiyu Chen, Weipeng Guan, and Peng Lu. Esvio: Event- based stereo visual inertial odometry.IEEE Robotics and Automation Letters, 8(6):3661–3668, 2023. 3
work page 2023
-
[12]
Gre-slam: 6-dof pure event- based slam with semi-dense depth recovery assisted bundle adjustment
Yang Chen and Lin Zhang. Gre-slam: 6-dof pure event- based slam with semi-dense depth recovery assisted bundle adjustment. InProceedings of the 2025 International Con- ference on Multimedia Retrieval, pages 90–98, 2025. 3
work page 2025
-
[13]
Tem- poral event stereo via joint learning with stereoscopic flow
Hoonhee Cho, Jae-Young Kang, and Kuk-Jin Yoon. Tem- poral event stereo via joint learning with stereoscopic flow. InEuropean Conference on Computer Vision, pages 294–
-
[14]
Hoonhee Cho, Taewoo Kim, Yuhwan Jeong, and Kuk-Jin Yoon. A benchmark dataset for event-guided human pose estimation and tracking in extreme conditions.Advances in Neural Information Processing Systems, 37:134826– 134840, 2024. 3
work page 2024
-
[15]
Hoonhee Cho, Taewoo Kim, Yuhwan Jeong, and Kuk-Jin Yoon. Tta-evf: Test-time adaptation for event-based video frame interpolation via reliable pixel and sample estima- tion. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 25701–25711,
-
[16]
Ev-3dod: Pushing the temporal boundaries of 3d object detection with event cameras
Hoonhee Cho, Jae-young Kang, Youngho Kim, and Kuk- Jin Yoon. Ev-3dod: Pushing the temporal boundaries of 3d object detection with event cameras. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 27197–27210, 2025. 3
work page 2025
-
[17]
Dense hand-object (ho) graspnet with full grasping taxonomy and dynamics
Woojin Cho, Jihyun Lee, Minjae Yi, Minje Kim, Taeyun Woo, Donghwan Kim, Taewook Ha, Hyokeun Lee, Je- Hwan Ryu, Woontack Woo, et al. Dense hand-object (ho) graspnet with full grasping taxonomy and dynamics. In European Conference on Computer Vision, pages 284–303. Springer, 2024. 5, 3
work page 2024
-
[18]
Real-time markerless tracking for augmented reality: the virtual visual servoing framework
Andrew I Comport, Eric Marchand, Muriel Pressigout, and Francois Chaumette. Real-time markerless tracking for augmented reality: the virtual visual servoing framework. IEEE Transactions on visualization and computer graph- ics, 12(4):615–628, 2006. 2
work page 2006
-
[19]
Interacting maps for fast vi- sual interpretation
Matthew Cook, Luca Gugelmann, Florian Jug, Christoph Krautz, and Angelika Steger. Interacting maps for fast vi- sual interpretation. InThe 2011 International Joint Con- ference on Neural Networks, pages 770–776. IEEE, 2011. 2
work page 2011
-
[20]
Robust 3d track- ing with descriptor fields
Alberto Crivellaro and Vincent Lepetit. Robust 3d track- ing with descriptor fields. InProceedings of the IEEE con- ference on computer vision and pattern recognition, pages 3414–3421, 2014. 2
work page 2014
-
[21]
Xinke Deng, Arsalan Mousavian, Yu Xiang, Fei Xia, Tim- othy Bretl, and Dieter Fox. Poserbpf: A rao–blackwellized particle filter for 6-d object pose tracking.IEEE Transac- tions on Robotics, 37(5):1328–1342, 2021. 2
work page 2021
-
[22]
Blenderproc: Reducing the reality gap with photorealistic rendering
Maximilian Denninger, Martin Sundermeyer, Dominik Winkelbauer, Dmitry Olefir, Tomas Hodan, Youssef Zidan, Mohamad Elbadrawy, Markus Knauer, Harinandan Katam, and Ahsan Lodhi. Blenderproc: Reducing the reality gap with photorealistic rendering. In16th Robotics: Science and Systems, RSS 2020, Workshops, 2020. 5
work page 2020
-
[23]
Maximilian Denninger, Dominik Winkelbauer, Martin Sundermeyer, Wout Boerdijk, Markus Wendelin Knauer, Klaus H Strobl, Matthias Humt, and Rudolph Triebel. Blenderproc2: A procedural pipeline for photorealistic ren- dering.Journal of Open Source Software, 8(82):4901,
-
[24]
Google scanned objects: A high-quality dataset of 3d scanned household items
Laura Downs, Anthony Francis, Nate Koenig, Brandon Kinman, Ryan Hickman, Krista Reymann, Thomas B McHugh, and Vincent Vanhoucke. Google scanned objects: A high-quality dataset of 3d scanned household items. In 2022 International Conference on Robotics and Automa- tion (ICRA), pages 2553–2560. Ieee, 2022. 5, 1, 3
work page 2022
-
[25]
Tom Drummond and Roberto Cipolla. Real-time visual tracking of complex structures.IEEE Transactions on Pattern Analysis & Machine Intelligence, 24(07):932–946,
-
[26]
Pizza: A powerful image-only zero-shot zero- cad approach to 6 dof tracking
Yuming Du, Yang Xiao, Michael Ramamonjisoa, Vincent Lepetit, et al. Pizza: A powerful image-only zero-shot zero- cad approach to 6 dof tracking. In2022 International Con- ference on 3D Vision (3DV), pages 515–525. IEEE, 2022. 2
work page 2022
-
[27]
Rgb-de: Event camera calibration for fast 6-dof object tracking
Etienne Dubeau, Mathieu Garon, Benoit Debaque, Raoul de Charette, and Jean-Franc ¸ois Lalonde. Rgb-de: Event camera calibration for fast 6-dof object tracking. In2020 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 127–135. IEEE, 2020. 1, 2, 3, 5
work page 2020
-
[28]
Real-time 6-dof pose estimation by an event-based camera using active led markers
Gerald Ebmer, Adam Loch, Minh Nhat Vu, Roberto Mecca, Germain Haessig, Christian Hartl-Nesic, Markus Vincze, and Andreas Kugi. Real-time 6-dof pose estimation by an event-based camera using active led markers. InProceed- ings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 8137–8146, 2024. 3
work page 2024
-
[29]
Unsupervised event-based video reconstruction
Gereon Fox, Xingang Pan, Ayush Tewari, Mohamed El- gharib, and Christian Theobalt. Unsupervised event-based video reconstruction. InProceedings of the IEEE/CVF Win- ter Conference on Applications of Computer Vision, pages 4179–4188, 2024. 3
work page 2024
-
[30]
Unified temporal and spatial calibration for multi-sensor systems
Paul Furgale, Joern Rehder, and Roland Siegwart. Unified temporal and spatial calibration for multi-sensor systems. In2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1280–1286. IEEE, 2013. 1
work page 2013
-
[31]
Comparative anal- ysis of optitrack motion capture systems
Joshua S Furtado, Hugh HT Liu, Gilbert Lai, Herve Lacheray, and Jason Desouza-Coelho. Comparative anal- ysis of optitrack motion capture systems. InAdvances in Motion Sensing and Control for Robotic Applications: Selected Papers from the Symposium on Mechatronics, Robotics, and Control (SMRC’18)-CSME International Congress 2018, May 27-30, 2018 Toronto, C...
work page 2018
-
[32]
Guillermo Gallego, Tobi Delbr ¨uck, Garrick Orchard, Chiara Bartolozzi, Brian Taba, Andrea Censi, Stefan Leutenegger, Andrew J Davison, J ¨org Conradt, Kostas Daniilidis, et al. Event-based vision: A survey.IEEE Trans- actions on Pattern Analysis and Machine Intelligence, 44 (1):154–180, 2020. 1
work page 2020
-
[33]
Deep 6-dof tracking.IEEE transactions on visualization and computer graphics, 23(11):2410–2418, 2017
Mathieu Garon and Jean-Franc ¸ois Lalonde. Deep 6-dof tracking.IEEE transactions on visualization and computer graphics, 23(11):2410–2418, 2017. 2
work page 2017
-
[34]
Low-latency au- tomotive vision with event cameras.Nature, 629(8014): 1034–1040, 2024
Daniel Gehrig and Davide Scaramuzza. Low-latency au- tomotive vision with event cameras.Nature, 629(8014): 1034–1040, 2024. 3
work page 2024
-
[35]
Asynchronous, photometric feature tracking using events and frames
Daniel Gehrig, Henri Rebecq, Guillermo Gallego, and Davide Scaramuzza. Asynchronous, photometric feature tracking using events and frames. InProceedings of the European Conference on Computer Vision (ECCV), pages 750–765, 2018. 2
work page 2018
-
[36]
Video to events: Recycling video datasets for event cameras
Daniel Gehrig, Mathias Gehrig, Javier Hidalgo-Carri ´o, and Davide Scaramuzza. Video to events: Recycling video datasets for event cameras. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3586–3595, 2020. 1
work page 2020
-
[37]
Recurrent vision transformers for object detection with event cameras
Mathias Gehrig and Davide Scaramuzza. Recurrent vision transformers for object detection with event cameras. In Proceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 13884–13893, 2023. 6
work page 2023
-
[38]
Mathias Gehrig, Willem Aarents, Daniel Gehrig, and Da- vide Scaramuzza. Dsec: A stereo event camera dataset for driving scenarios.IEEE Robotics and Automation Letters, 6(3):4947–4954, 2021. 3
work page 2021
-
[39]
E-raft: Dense optical flow from event cameras
Mathias Gehrig, Mario Millh ¨ausler, Daniel Gehrig, and Da- vide Scaramuzza. E-raft: Dense optical flow from event cameras. In2021 International Conference on 3D Vision (3DV), pages 197–206. IEEE, 2021. 3
work page 2021
-
[40]
Edopt: Event-camera 6-dof dynamic object pose tracking
Arren Glover, Luna Gava, Zhichao Li, and Chiara Bar- tolozzi. Edopt: Event-camera 6-dof dynamic object pose tracking. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 18200–18206. IEEE, 2024. 3
work page 2024
-
[41]
Edopt: Event-camera 6-dof dynamic object pose tracking
Arren Glover, Luna Gava, Zhichao Li, and Chiara Bar- tolozzi. Edopt: Event-camera 6-dof dynamic object pose tracking. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 18200–18206. IEEE, 2024. 5
work page 2024
-
[42]
Deio: Deep event inertial odometry
Weipeng Guan, Fuling Lin, Peiyu Chen, and Peng Lu. Deio: Deep event inertial odometry. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 4606–4615, 2025. 3
work page 2025
-
[43]
Andrew Guo, Bowen Wen, Jianhe Yuan, Jonathan Trem- blay, Stephen Tyree, Jeffrey Smith, and Stan Birchfield. Handal: A dataset of real-world manipulable object cate- gories with pose annotations, affordances, and reconstruc- tions. In2023 IEEE/RSJ International Conference on In- telligent Robots and Systems (IROS), pages 11428–11435. IEEE, 2023. 1
work page 2023
-
[44]
Measuring depth accuracy in rgbd cameras
Hussein Haggag, Mohammed Hossny, Despina Filippidis, Douglas Creighton, Saeid Nahavandi, and Vinod Puri. Measuring depth accuracy in rgbd cameras. In2013, 7th international conference on signal processing and commu- nication systems (ICSPCS), pages 1–7. IEEE, 2013. 1
work page 2013
-
[45]
Etap: Event- based tracking of any point
Friedhelm Hamann, Daniel Gehrig, Filbert Febryanto, Kostas Daniilidis, and Guillermo Gallego. Etap: Event- based tracking of any point. InProceedings of the Com- puter Vision and Pattern Recognition Conference, pages 27186–27196, 2025. 6, 1, 4
work page 2025
-
[46]
Honnotate: A method for 3d annotation of hand and object poses
Shreyas Hampali, Mahdi Rad, Markus Oberweger, and Vin- cent Lepetit. Honnotate: A method for 3d annotation of hand and object poses. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3196–3206, 2020. 2, 5, 1
work page 2020
-
[47]
Rapid-a video rate object tracker
Chris Harris and Carl Stennett. Rapid-a video rate object tracker. InBMVC, page 3, 1990. 2
work page 1990
-
[48]
Oussama Abdul Hay, Xiaoqian Huang, Abdulla Ayyad, Es- lam Sherif, Randa Almadhoun, Yusra Abdulrahman, Lak- mal Seneviratne, Abdulqader Abusafieh, and Yahya Zweiri. E-pose: A large scale event camera dataset for object pose estimation.Scientific data, 12(1):245, 2025. 1, 2, 3, 5
work page 2025
-
[49]
Learning monocular dense depth from events
Javier Hidalgo-Carri ´o, Daniel Gehrig, and Davide Scara- muzza. Learning monocular dense depth from events. In 2020 International Conference on 3D Vision (3DV), pages 534–542. IEEE, 2020. 3
work page 2020
-
[50]
Stefan Hinterstoisser, Vincent Lepetit, Slobodan Ilic, Stefan Holzer, Gary Bradski, Kurt Konolige, and Nassir Navab. Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes. InAsian conference on computer vision, pages 548–562. Springer,
-
[51]
Bop challenge 2023 on detection segmentation and pose estimation of seen and unseen rigid objects
Tomas Hodan, Martin Sundermeyer, Yann Labbe, Van Nguyen Nguyen, Gu Wang, Eric Brachmann, Bertram Drost, Vincent Lepetit, Carsten Rother, and Jiri Matas. Bop challenge 2023 on detection segmentation and pose estimation of seen and unseen rigid objects. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5610–5619, 2024. 1, 5
work page 2023
-
[52]
Ede-distill: Boosting event-based monocular depth estimation performance via knowledge distillation
Chenming Hu, Junjie Jiang, Yaohui Li, Mingyuan Sun, and Zheng Fang. Ede-distill: Boosting event-based monocular depth estimation performance via knowledge distillation. IEEE Robotics and Automation Letters, 2025. 3
work page 2025
-
[53]
Depth-based object tracking using a robust gaussian filter
Jan Issac, Manuel W ¨uthrich, Cristina Garcia Cifuentes, Jeannette Bohg, Sebastian Trimpe, and Stefan Schaal. Depth-based object tracking using a robust gaussian filter. In2016 IEEE international conference on robotics and au- tomation (ICRA), pages 608–615. IEEE, 2016. 2
work page 2016
-
[54]
To- wards robust event-based networks for nighttime via un- paired day-to-night event translation
Yuhwan Jeong, Hoonhee Cho, and Kuk-Jin Yoon. To- wards robust event-based networks for nighttime via un- paired day-to-night event translation. InEuropean Confer- ence on Computer Vision, pages 286–306. Springer, 2024. 3
work page 2024
-
[55]
What mat- ters in unsupervised optical flow
Rico Jonschkowski, Austin Stone, Jonathan T Barron, Ariel Gordon, Kurt Konolige, and Anelia Angelova. What mat- ters in unsupervised optical flow. InEuropean conference on computer vision, pages 557–572. Springer, 2020. 4
work page 2020
-
[56]
Jae-Young Kang, Hoonhee Cho, and Kuk-Jin Yoon. Tem- poral stereo matching from event cameras via joint learning with stereoscopic flow.IEEE Transactions on Pattern Anal- ysis and Machine Intelligence, 2025. 3
work page 2025
-
[57]
Un- leashing the temporal potential of stereo event cameras for continuous-time 3d object detection
Jae-Young Kang, Hoonhee Cho, and Kuk-Jin Yoon. Un- leashing the temporal potential of stereo event cameras for continuous-time 3d object detection. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 6869–6881, 2025. 3
work page 2025
-
[58]
Intel realsense stereo- scopic depth cameras
Leonid Keselman, John Iselin Woodfill, Anders Grunnet- Jepsen, and Achintya Bhowmik. Intel realsense stereo- scopic depth cameras. InProceedings of the IEEE con- ference on computer vision and pattern recognition work- shops, pages 1–10, 2017. 1
work page 2017
-
[59]
Simultaneous mosaicing and track- ing with an event camera.J
Hanme Kim, Ankur Handa, Ryad Benosman, Sio-Hoi Ieng, and Andrew J Davison. Simultaneous mosaicing and track- ing with an event camera.J. Solid State Circ, 43:566–576,
-
[60]
Real-time 3d reconstruction and 6-dof tracking with an event camera
Hanme Kim, Stefan Leutenegger, and Andrew J Davison. Real-time 3d reconstruction and 6-dof tracking with an event camera. InEuropean conference on computer vision, pages 349–364. Springer, 2016. 2, 3
work page 2016
-
[61]
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. InProceedings of the IEEE/CVF international conference on computer vision, pages 4015–4026, 2023. 2
work page 2023
-
[62]
Simon Klenk, Marvin Motzet, Lukas Koestler, and Daniel Cremers. Deep event visual odometry. In2024 Inter- national conference on 3D vision (3DV), pages 739–749. IEEE, 2024. 1
work page 2024
-
[63]
Cosypose: Consistent multi-view multi-object 6d pose estimation
Yann Labb ´e, Justin Carpentier, Mathieu Aubry, and Josef Sivic. Cosypose: Consistent multi-view multi-object 6d pose estimation. InEuropean conference on computer vi- sion, pages 574–591. Springer, 2020. 2, 4
work page 2020
-
[64]
arXiv preprint arXiv:2212.06870 (2022)
Yann Labb ´e, Lucas Manuelli, Arsalan Mousavian, Stephen Tyree, Stan Birchfield, Jonathan Tremblay, Justin Carpen- tier, Mathieu Aubry, Dieter Fox, and Josef Sivic. Mega- pose: 6d pose estimation of novel objects via render & com- pare.arXiv preprint arXiv:2212.06870, 2022. 1, 2, 3, 4, 6, 7
-
[65]
Taeyeop Lee, Byeong-Uk Lee, Myungchul Kim, and In So Kweon. Category-level metric scale object shape and pose estimation.IEEE Robotics and Automation Letters, 6(4): 8575–8582, 2021. 1, 2
work page 2021
-
[66]
Tta-cope: Test-time adaptation for category-level object pose estimation
Taeyeop Lee, Jonathan Tremblay, Valts Blukis, Bowen Wen, Byeong-Uk Lee, Inkyu Shin, Stan Birchfield, In So Kweon, and Kuk-Jin Yoon. Tta-cope: Test-time adaptation for category-level object pose estimation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21285–21295, 2023. 2
work page 2023
-
[67]
Taeyeop Lee, Gyuree Kang, Bowen Wen, Youngho Kim, Seunghyeok Back, In So Kweon, David Hyunchul Shim, and Kuk-Jin Yoon. Delta: Demonstration and language- guided novel transparent object manipulation.arXiv preprint arXiv:2510.05662, 2025. 1
-
[68]
Any6d: Model-free 6d pose estimation of novel objects
Taeyeop Lee, Bowen Wen, Minjun Kang, Gyuree Kang, In So Kweon, and Kuk-Jin Yoon. Any6d: Model-free 6d pose estimation of novel objects. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11633–11643, 2025. 2
work page 2025
-
[69]
Ep n p: An accurate o (n) solution to the p n p problem
Vincent Lepetit, Francesc Moreno-Noguer, and Pascal Fua. Ep n p: An accurate o (n) solution to the p n p problem. International Journal of Computer Vision, 81(2):155–166,
-
[70]
Vincent Lepetit, Francesc Moreno-Noguer, and Pascal Fua. Epnp: An accurate o (n) solution to the p n p problem.Inter- national Journal of Computer Vision, 81(2):155–166, 2009. 4
work page 2009
-
[71]
Deepim: Deep iterative matching for 6d pose estimation
Yi Li, Gu Wang, Xiangyang Ji, Yu Xiang, and Dieter Fox. Deepim: Deep iterative matching for 6d pose estimation. InProceedings of the European conference on computer vi- sion (ECCV), pages 683–698, 2018. 2, 4
work page 2018
-
[72]
6-dof object tracking with event-based optical flow and frames
Zhichao Li, Arren Glover, Chiara Bartolozzi, and Lorenzo Natale. 6-dof object tracking with event-based optical flow and frames. In2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 18880– 18887. IEEE, 2025. 3
work page 2025
-
[73]
Sam-6d: Segment anything model meets zero-shot 6d object pose es- timation
Jiehong Lin, Lihua Liu, Dekun Lu, and Kui Jia. Sam-6d: Segment anything model meets zero-shot 6d object pose es- timation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27906– 27916, 2024. 1
work page 2024
-
[74]
Keypoint-based category-level object pose tracking from an rgb sequence with uncertainty estimation
Yunzhi Lin, Jonathan Tremblay, Stephen Tyree, Patricio A Vela, and Stan Birchfield. Keypoint-based category-level object pose tracking from an rgb sequence with uncertainty estimation. In2022 International Conference on Robotics and Automation (ICRA), pages 1258–1264. IEEE, 2022. 2
work page 2022
-
[75]
Spatiotemporal registration for event-based visual odometry
Daqi Liu, Alvaro Parra, and Tat-Jun Chin. Spatiotemporal registration for event-based visual odometry. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4937–4946, 2021. 3
work page 2021
-
[76]
Xu Liu, Xiaopeng Fan, Jianing Li, Dianze Li, Wei Zhang, Zhengyu Ma, and Yonghong Tian. High-rate monocu- lar depth estimation via cross frame-rate collaboration of frames and events.International Journal of Computer Vi- sion, 133(10):7332–7351, 2025. 3
work page 2025
-
[77]
Zhe Liu, Dianxi Shi, Ruihao Li, Yi Zhang, and Shaowu Yang. T-esvo: improved event-based stereo visual odome- try via adaptive time-surface and truncated signed distance function.Advanced Intelligent Systems, 5(9):2300027,
-
[78]
Optical flow-guided 6dof ob- ject pose tracking with an event camera
Zibin Liu, Banglei Guan, Yang Shang, Shunkun Liang, Zhenbao Yu, and Qifeng Yu. Optical flow-guided 6dof ob- ject pose tracking with an event camera. InProceedings of the 32nd ACM International Conference on Multimedia, pages 6501–6509, 2024. 3
work page 2024
-
[79]
Zibin Liu, Banglei Guan, Yang Shang, Qifeng Yu, and Lau- rent Kneip. Line-based 6-dof object pose estimation and tracking with an event camera.IEEE Transactions on Im- age Processing, 33:4765–4780, 2024. 5
work page 2024
-
[80]
Zibin Liu, Banglei Guan, Yang Shang, Qifeng Yu, and Lau- rent Kneip. Line-based 6-dof object pose estimation and tracking with an event camera.IEEE Transactions on Im- age Processing, 33:4765–4780, 2024. 3
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.