arxiv: 2603.28045 · v2 · submitted 2026-03-30 · 💻 cs.CV

Recognition: no theorem link

Event6D: Event-based Novel Object 6D Pose Tracking

Jae-Young Kang , Hoonhee Cho , Taeyeop Lee , Minjun Kang , Bowen Wen , Youngho Kim , Kuk-Jin Yoon

Authors on Pith no claims yet

Pith reviewed 2026-05-14 22:04 UTC · model grok-4.3

classification 💻 cs.CV

keywords event camera6D pose trackingnovel object trackingevent-based visiondepth reconstructionsynthetic to realreal-time trackingobject pose estimation

0 comments

The pith

EventTrack6D tracks 6D poses of novel objects using event cameras by reconstructing intensity and depth from event streams at over 120 FPS.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework for 6D object pose tracking that uses event camera data to handle fast motions where standard cameras fail due to blur. It reconstructs dense intensity and depth information from sparse event streams using the latest depth measurement as conditioning. This allows the system to track objects it has never seen before during training, generalizing from synthetic data to real scenarios without additional fine-tuning. Such an approach matters because it enables reliable pose estimation in dynamic environments like robotics or augmented reality where speed and adaptability to new objects are essential.

Core claim

EventTrack6D is an event-depth tracking framework that generalizes to novel objects without object-specific training by reconstructing both intensity and depth at arbitrary timestamps between depth frames. Conditioned on the most recent depth measurement, the dual reconstruction recovers dense photometric and geometric cues from sparse event streams, operating at over 120 FPS while maintaining temporal consistency under rapid motion.

What carries the argument

The dual reconstruction network that recovers dense photometric and geometric cues from sparse event streams, conditioned on the most recent depth measurement.

If this is right

Operates in real time at over 120 FPS for fast dynamic scenes.
Generalizes from synthetic training data to real-world scenarios without fine-tuning.
Maintains accurate tracking across diverse objects and motion patterns.
Provides a benchmark suite including synthetic training data and real and simulated evaluation sets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This reconstruction approach could potentially be adapted for other event-based vision tasks such as optical flow estimation in high-speed scenarios.
By avoiding object-specific training, the method opens possibilities for deploying tracking systems in environments with frequently changing object sets.
The reliance on recent depth measurements suggests integration with depth sensors could further improve performance in varying lighting conditions.

Load-bearing premise

The dual reconstruction network can reliably recover dense photometric and geometric cues from sparse event streams for arbitrary novel objects and rapid motions when conditioned only on the most recent depth measurement.

What would settle it

Observing a significant drop in tracking accuracy on real event data with rapid object motions or unseen object shapes compared to synthetic benchmarks would indicate the reconstruction does not generalize as claimed.

Figures

Figures reproduced from arXiv: 2603.28045 by Bowen Wen, Hoonhee Cho, Jae-Young Kang, Kuk-Jin Yoon, Minjun Kang, Taeyeop Lee, Youngho Kim.

**Figure 2.** Figure 2: Overview of our EventTrack6D. EventTrack6D consists of a dual-modal reconstruction module and a pose refinement module. It can perform 6D [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: System designed for acquiring the Event6D dataset. The event [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Qualitative comparison of 6D object tracking at 120 FPS on the Event6D dataset. Original FoundationPose(FP) [ [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Qualitative depth-reconstruction results on depth-absent intervals. The future depth [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Examples of the data used for camera calibration. [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 7.** Figure 7: Illustration of Hand-eye calibration. We denote the OptiTrack [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗

**Figure 8.** Figure 8: Visualization of trigger signals for overall system. [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗

**Figure 10.** Figure 10: EventBlender6D samples visualized as temporal streams of RGB, event, depth, and corresponding 6D object poses. [PITH_FULL_IMAGE:figures/full_fig_p022_10.png] view at source ↗

**Figure 11.** Figure 11: EventHO3D samples visualized as temporal streams of RGB, event, depth, and corresponding 6D object poses. [PITH_FULL_IMAGE:figures/full_fig_p023_11.png] view at source ↗

**Figure 12.** Figure 12: Event6D test samples visualized as temporal streams of RGB, event, depth, and corresponding 6D object poses. [PITH_FULL_IMAGE:figures/full_fig_p024_12.png] view at source ↗

**Figure 13.** Figure 13: Qualitative comparison on the Event6D drill object sequence. Although the event-based methods operate at intervals corresponding to 120 FPS, [PITH_FULL_IMAGE:figures/full_fig_p025_13.png] view at source ↗

**Figure 14.** Figure 14: Qualitative comparison on the Event6D marker object sequence. Although the event-based methods operate at intervals corresponding to 120 FPS, [PITH_FULL_IMAGE:figures/full_fig_p026_14.png] view at source ↗

read the original abstract

Event cameras provide microsecond latency, making them suitable for 6D object pose tracking in fast, dynamic scenes where conventional RGB and depth pipelines suffer from motion blur and large pixel displacements. We introduce EventTrack6D, an event-depth tracking framework that generalizes to novel objects without object-specific training by reconstructing both intensity and depth at arbitrary timestamps between depth frames. Conditioned on the most recent depth measurement, our dual reconstruction recovers dense photometric and geometric cues from sparse event streams. Our EventTrack6D operates at over 120 FPS and maintains temporal consistency under rapid motion. To support training and evaluation, we introduce a comprehensive benchmark suite: a large-scale synthetic dataset for training and two complementary evaluation sets, including real and simulated event datasets. Trained exclusively on synthetic data, EventTrack6D generalizes effectively to real-world scenarios without fine-tuning, maintaining accurate tracking across diverse objects and motion patterns. Our method and datasets validate the effectiveness of event cameras for event-based 6D pose tracking of novel objects. Code and datasets are publicly available at https://chohoonhee.github.io/Event6D.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Event6D gives a workable event-based route to 6D tracking of unseen objects by reconstructing intensity and depth from events conditioned on the latest depth frame, with synthetic-only training that appears to carry over to real data.

read the letter

The core advance is the dual reconstruction network that turns sparse event streams into dense photometric and geometric maps at arbitrary times between depth frames, then feeds those into a tracker. They train everything on synthetic data and report that it holds up on real novel objects without fine-tuning, while running above 120 FPS. Releasing the code, the large synthetic training set, and both real and simulated test sets is the part that actually helps the field move forward; other groups can now build on the benchmark instead of starting from scratch. The abstract makes clear performance claims on generalization across objects and motions, and the circularity risk looks low since the tracking numbers are not defined by the same fitted parameters used in training. The main soft spot is the conditioning choice: the network sees only the single most recent depth measurement. Under rapid object motion or depth changes between those frames, the event stream has to carry the full burden of inferring updated geometry for shapes the model has never seen. If that inference slips, the recovered cues become noisy inputs to the pose estimator and the no-fine-tuning claim weakens. The abstract does not include the ablation or error analysis that would show how often this happens, so the soundness number stays moderate until the full methods and tables are checked. This paper is aimed at researchers who already work with event cameras for robotics or AR and need something that survives motion blur on new objects. It is worth sending to peer review because the released assets and the concrete speed claim give referees something concrete to test, even if the conditioning assumption needs tighter validation.

Referee Report

1 major / 2 minor

Summary. The paper presents EventTrack6D, an event-based framework for 6D pose tracking of novel objects. It uses a dual reconstruction network to recover dense intensity and depth cues from sparse event streams, conditioned on the most recent depth measurement, enabling tracking at arbitrary timestamps between depth frames. Trained exclusively on synthetic data, the method claims to generalize effectively to real-world scenarios without fine-tuning, operating above 120 FPS while maintaining temporal consistency under rapid motion. The work introduces a large-scale synthetic training dataset and two evaluation sets (real and simulated events) to support this, with public code and data release.

Significance. If the synthetic-to-real generalization without fine-tuning holds under the reported conditions, the result would advance event-camera applications in fast-motion 6D tracking by eliminating object-specific training requirements. The public datasets and code provide a concrete benchmark that could facilitate follow-on work in event-based vision.

major comments (1)

[§3.2] §3.2 (Dual Reconstruction Network): Conditioning the network solely on the single most recent depth measurement creates a risk that geometric priors become stale under rapid object motion or inter-frame depth changes. Because the central no-fine-tuning generalization claim depends on reliable dense cue recovery from events alone, the manuscript should include targeted ablations or error analysis on motion speed and depth variation to confirm that inference from the event stream remains accurate for novel shapes.

minor comments (2)

[Abstract] The abstract states performance at 'over 120 FPS' without specifying the hardware platform, input resolution, or exact timing breakdown between reconstruction and tracking stages.
[§5] Table or figure captions for the benchmark datasets should explicitly list the number of objects, motion types, and event rates to allow direct comparison with prior event-tracking work.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the significance of EventTrack6D. We address the single major comment below by agreeing to incorporate the requested analyses, which we believe will further strengthen the evidence for our synthetic-to-real generalization claims under rapid motion.

read point-by-point responses

Referee: [§3.2] §3.2 (Dual Reconstruction Network): Conditioning the network solely on the single most recent depth measurement creates a risk that geometric priors become stale under rapid object motion or inter-frame depth changes. Because the central no-fine-tuning generalization claim depends on reliable dense cue recovery from events alone, the manuscript should include targeted ablations or error analysis on motion speed and depth variation to confirm that inference from the event stream remains accurate for novel shapes.

Authors: We thank the referee for this insightful observation on potential staleness of geometric priors. Our dual reconstruction network is explicitly trained on synthetic sequences that include diverse motion speeds and depth variations, allowing the event stream to provide continuous high-frequency updates that compensate for any outdated depth conditioning. Original experiments already demonstrate robust tracking at >120 FPS on rapid-motion real and simulated sequences without fine-tuning. To directly address the request, the revised manuscript adds a new ablation subsection that systematically varies inter-frame motion velocity and depth change magnitude, reporting both reconstruction error and final 6D pose accuracy for novel objects. These results show graceful degradation, confirming that event-based inference remains reliable even when the most recent depth measurement is stale. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical generalization presented as experimental outcome

full rationale

The paper's central claim—that EventTrack6D, trained only on synthetic data, generalizes to real novel objects without fine-tuning—is framed as an empirical result validated on introduced benchmarks rather than a quantity derived by definition or self-referential fitting. The abstract describes the dual reconstruction network as a methodological component conditioned on recent depth to recover cues from events, but does not equate the reported tracking accuracy or generalization performance to any fitted parameter or input defined from the same data. No equations, self-citations, or uniqueness theorems are invoked in the provided text to force the result; the synthetic-to-real transfer is presented as an observed outcome of the framework and datasets. This leaves the derivation self-contained against external evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that a learned dual reconstruction can invert sparse event data into usable dense cues for any novel object; no explicit free parameters, new axioms, or invented physical entities are named in the abstract.

axioms (1)

domain assumption Event camera output can be treated as a reliable sparse signal of brightness changes that, when combined with occasional depth frames, suffices to reconstruct dense intensity and geometry.
Invoked in the description of the dual reconstruction step.

pith-pipeline@v0.9.0 · 5511 in / 1273 out tokens · 38759 ms · 2026-05-14T22:04:20.260511+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

150 extracted references · 150 canonical work pages · 1 internal anchor

[1]

Mtevent: A multi-task event camera dataset for 6d pose estimation and moving object detection

Shrutarv Awasthi, Anas Gouda, Sven Franke, J ´erˆome Ruti- nowski, Frank Hoffmann, and Moritz Roidl. Mtevent: A multi-task event camera dataset for 6d pose estimation and moving object detection. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5102–5110, 2025. 3

work page 2025
[2]

Grasp- clutter6d: A large-scale real-world dataset for robust per- ception and grasping in cluttered scenes.IEEE Robotics and Automation Letters, 2025

Seunghyeok Back, Joosoon Lee, Kangmin Kim, Heeseon Rho, Geonhyup Lee, Raeyoung Kang, Sangbeom Lee, Sangjun Noh, Youngjin Lee, Taeyeop Lee, et al. Grasp- clutter6d: A large-scale real-world dataset for robust per- ception and grasping in cluttered scenes.IEEE Robotics and Automation Letters, 2025. 2

work page 2025
[3]

Introducing hot3d: An egocentric dataset for 3d hand and object tracking.arXiv preprint arXiv:2406.09598,

Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Fan Zhang, Jade Fountain, Edward Miller, Selen Basol, Richard Newcombe, Robert Wang, et al. Introducing hot3d: An egocentric dataset for 3d hand and object tracking.arXiv preprint arXiv:2406.09598,

work page arXiv
[4]

Hot3d: Hand and object tracking in 3d from egocentric multi-view videos

Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Shangchen Han, Fan Zhang, Linguang Zhang, Jade Fountain, Edward Miller, Selen Basol, et al. Hot3d: Hand and object tracking in 3d from egocentric multi-view videos. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 7061–7071, 2025. 5

work page 2025
[5]

Simultaneous optical flow and intensity estimation from an event camera

Patrick Bardow, Andrew J Davison, and Stefan Leuteneg- ger. Simultaneous optical flow and intensity estimation from an event camera. InProceedings of the IEEE con- ference on computer vision and pattern recognition, pages 884–892, 2016. 2

work page 2016
[6]

Real-time image- based tracking of planes using efficient second-order min- imization

Selim Benhimane and Ezio Malis. Real-time image- based tracking of planes using efficient second-order min- imization. In2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)(IEEE Cat. No. 04CH37566), pages 943–948. IEEE, 2004. 2

work page 2004
[7]

Method for registration of 3-d shapes

Paul J Besl and Neil D McKay. Method for registration of 3-d shapes. InSensor fusion IV: control paradigms and data structures, pages 586–606. Spie, 1992. 2, 6, 4

work page 1992
[8]

The ycb object and model set: Towards common benchmarks for manip- ulation research

Berk Calli, Arjun Singh, Aaron Walsman, Siddhartha Srini- vasa, Pieter Abbeel, and Aaron M Dollar. The ycb object and model set: Towards common benchmarks for manip- ulation research. In2015 international conference on ad- vanced robotics (ICAR), pages 510–517. IEEE, 2015. 1, 5, 3

work page 2015
[9]

Dexycb: A benchmark for capturing hand grasping of objects

Yu-Wei Chao, Wei Yang, Yu Xiang, Pavlo Molchanov, Ankur Handa, Jonathan Tremblay, Yashraj S Narang, Karl Van Wyk, Umar Iqbal, Stan Birchfield, et al. Dexycb: A benchmark for capturing hand grasping of objects. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9044–9053, 2021. 1, 5

work page 2021
[10]

Repurposing pre- trained video diffusion models for event-based video inter- polation

Jingxi Chen, Brandon Y Feng, Haoming Cai, Tianfu Wang, Levi Burner, Dehao Yuan, Cornelia Fermuller, Christo- pher A Metzler, and Yiannis Aloimonos. Repurposing pre- trained video diffusion models for event-based video inter- polation. InProceedings of the Computer Vision and Pat- tern Recognition Conference, pages 12456–12466, 2025. 3

work page 2025
[11]

Esvio: Event- based stereo visual inertial odometry.IEEE Robotics and Automation Letters, 8(6):3661–3668, 2023

Peiyu Chen, Weipeng Guan, and Peng Lu. Esvio: Event- based stereo visual inertial odometry.IEEE Robotics and Automation Letters, 8(6):3661–3668, 2023. 3

work page 2023
[12]

Gre-slam: 6-dof pure event- based slam with semi-dense depth recovery assisted bundle adjustment

Yang Chen and Lin Zhang. Gre-slam: 6-dof pure event- based slam with semi-dense depth recovery assisted bundle adjustment. InProceedings of the 2025 International Con- ference on Multimedia Retrieval, pages 90–98, 2025. 3

work page 2025
[13]

Tem- poral event stereo via joint learning with stereoscopic flow

Hoonhee Cho, Jae-Young Kang, and Kuk-Jin Yoon. Tem- poral event stereo via joint learning with stereoscopic flow. InEuropean Conference on Computer Vision, pages 294–

work page
[14]

A benchmark dataset for event-guided human pose estimation and tracking in extreme conditions.Advances in Neural Information Processing Systems, 37:134826– 134840, 2024

Hoonhee Cho, Taewoo Kim, Yuhwan Jeong, and Kuk-Jin Yoon. A benchmark dataset for event-guided human pose estimation and tracking in extreme conditions.Advances in Neural Information Processing Systems, 37:134826– 134840, 2024. 3

work page 2024
[15]

Tta-evf: Test-time adaptation for event-based video frame interpolation via reliable pixel and sample estima- tion

Hoonhee Cho, Taewoo Kim, Yuhwan Jeong, and Kuk-Jin Yoon. Tta-evf: Test-time adaptation for event-based video frame interpolation via reliable pixel and sample estima- tion. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 25701–25711,

work page
[16]

Ev-3dod: Pushing the temporal boundaries of 3d object detection with event cameras

Hoonhee Cho, Jae-young Kang, Youngho Kim, and Kuk- Jin Yoon. Ev-3dod: Pushing the temporal boundaries of 3d object detection with event cameras. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 27197–27210, 2025. 3

work page 2025
[17]

Dense hand-object (ho) graspnet with full grasping taxonomy and dynamics

Woojin Cho, Jihyun Lee, Minjae Yi, Minje Kim, Taeyun Woo, Donghwan Kim, Taewook Ha, Hyokeun Lee, Je- Hwan Ryu, Woontack Woo, et al. Dense hand-object (ho) graspnet with full grasping taxonomy and dynamics. In European Conference on Computer Vision, pages 284–303. Springer, 2024. 5, 3

work page 2024
[18]

Real-time markerless tracking for augmented reality: the virtual visual servoing framework

Andrew I Comport, Eric Marchand, Muriel Pressigout, and Francois Chaumette. Real-time markerless tracking for augmented reality: the virtual visual servoing framework. IEEE Transactions on visualization and computer graph- ics, 12(4):615–628, 2006. 2

work page 2006
[19]

Interacting maps for fast vi- sual interpretation

Matthew Cook, Luca Gugelmann, Florian Jug, Christoph Krautz, and Angelika Steger. Interacting maps for fast vi- sual interpretation. InThe 2011 International Joint Con- ference on Neural Networks, pages 770–776. IEEE, 2011. 2

work page 2011
[20]

Robust 3d track- ing with descriptor fields

Alberto Crivellaro and Vincent Lepetit. Robust 3d track- ing with descriptor fields. InProceedings of the IEEE con- ference on computer vision and pattern recognition, pages 3414–3421, 2014. 2

work page 2014
[21]

Poserbpf: A rao–blackwellized particle filter for 6-d object pose tracking.IEEE Transac- tions on Robotics, 37(5):1328–1342, 2021

Xinke Deng, Arsalan Mousavian, Yu Xiang, Fei Xia, Tim- othy Bretl, and Dieter Fox. Poserbpf: A rao–blackwellized particle filter for 6-d object pose tracking.IEEE Transac- tions on Robotics, 37(5):1328–1342, 2021. 2

work page 2021
[22]

Blenderproc: Reducing the reality gap with photorealistic rendering

Maximilian Denninger, Martin Sundermeyer, Dominik Winkelbauer, Dmitry Olefir, Tomas Hodan, Youssef Zidan, Mohamad Elbadrawy, Markus Knauer, Harinandan Katam, and Ahsan Lodhi. Blenderproc: Reducing the reality gap with photorealistic rendering. In16th Robotics: Science and Systems, RSS 2020, Workshops, 2020. 5

work page 2020
[23]

Blenderproc2: A procedural pipeline for photorealistic ren- dering.Journal of Open Source Software, 8(82):4901,

Maximilian Denninger, Dominik Winkelbauer, Martin Sundermeyer, Wout Boerdijk, Markus Wendelin Knauer, Klaus H Strobl, Matthias Humt, and Rudolph Triebel. Blenderproc2: A procedural pipeline for photorealistic ren- dering.Journal of Open Source Software, 8(82):4901,

work page
[24]

Google scanned objects: A high-quality dataset of 3d scanned household items

Laura Downs, Anthony Francis, Nate Koenig, Brandon Kinman, Ryan Hickman, Krista Reymann, Thomas B McHugh, and Vincent Vanhoucke. Google scanned objects: A high-quality dataset of 3d scanned household items. In 2022 International Conference on Robotics and Automa- tion (ICRA), pages 2553–2560. Ieee, 2022. 5, 1, 3

work page 2022
[25]

Real-time visual tracking of complex structures.IEEE Transactions on Pattern Analysis & Machine Intelligence, 24(07):932–946,

Tom Drummond and Roberto Cipolla. Real-time visual tracking of complex structures.IEEE Transactions on Pattern Analysis & Machine Intelligence, 24(07):932–946,

work page
[26]

Pizza: A powerful image-only zero-shot zero- cad approach to 6 dof tracking

Yuming Du, Yang Xiao, Michael Ramamonjisoa, Vincent Lepetit, et al. Pizza: A powerful image-only zero-shot zero- cad approach to 6 dof tracking. In2022 International Con- ference on 3D Vision (3DV), pages 515–525. IEEE, 2022. 2

work page 2022
[27]

Rgb-de: Event camera calibration for fast 6-dof object tracking

Etienne Dubeau, Mathieu Garon, Benoit Debaque, Raoul de Charette, and Jean-Franc ¸ois Lalonde. Rgb-de: Event camera calibration for fast 6-dof object tracking. In2020 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 127–135. IEEE, 2020. 1, 2, 3, 5

work page 2020
[28]

Real-time 6-dof pose estimation by an event-based camera using active led markers

Gerald Ebmer, Adam Loch, Minh Nhat Vu, Roberto Mecca, Germain Haessig, Christian Hartl-Nesic, Markus Vincze, and Andreas Kugi. Real-time 6-dof pose estimation by an event-based camera using active led markers. InProceed- ings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 8137–8146, 2024. 3

work page 2024
[29]

Unsupervised event-based video reconstruction

Gereon Fox, Xingang Pan, Ayush Tewari, Mohamed El- gharib, and Christian Theobalt. Unsupervised event-based video reconstruction. InProceedings of the IEEE/CVF Win- ter Conference on Applications of Computer Vision, pages 4179–4188, 2024. 3

work page 2024
[30]

Unified temporal and spatial calibration for multi-sensor systems

Paul Furgale, Joern Rehder, and Roland Siegwart. Unified temporal and spatial calibration for multi-sensor systems. In2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1280–1286. IEEE, 2013. 1

work page 2013
[31]

Comparative anal- ysis of optitrack motion capture systems

Joshua S Furtado, Hugh HT Liu, Gilbert Lai, Herve Lacheray, and Jason Desouza-Coelho. Comparative anal- ysis of optitrack motion capture systems. InAdvances in Motion Sensing and Control for Robotic Applications: Selected Papers from the Symposium on Mechatronics, Robotics, and Control (SMRC’18)-CSME International Congress 2018, May 27-30, 2018 Toronto, C...

work page 2018
[32]

Event-based vision: A survey.IEEE Trans- actions on Pattern Analysis and Machine Intelligence, 44 (1):154–180, 2020

Guillermo Gallego, Tobi Delbr ¨uck, Garrick Orchard, Chiara Bartolozzi, Brian Taba, Andrea Censi, Stefan Leutenegger, Andrew J Davison, J ¨org Conradt, Kostas Daniilidis, et al. Event-based vision: A survey.IEEE Trans- actions on Pattern Analysis and Machine Intelligence, 44 (1):154–180, 2020. 1

work page 2020
[33]

Deep 6-dof tracking.IEEE transactions on visualization and computer graphics, 23(11):2410–2418, 2017

Mathieu Garon and Jean-Franc ¸ois Lalonde. Deep 6-dof tracking.IEEE transactions on visualization and computer graphics, 23(11):2410–2418, 2017. 2

work page 2017
[34]

Low-latency au- tomotive vision with event cameras.Nature, 629(8014): 1034–1040, 2024

Daniel Gehrig and Davide Scaramuzza. Low-latency au- tomotive vision with event cameras.Nature, 629(8014): 1034–1040, 2024. 3

work page 2024
[35]

Asynchronous, photometric feature tracking using events and frames

Daniel Gehrig, Henri Rebecq, Guillermo Gallego, and Davide Scaramuzza. Asynchronous, photometric feature tracking using events and frames. InProceedings of the European Conference on Computer Vision (ECCV), pages 750–765, 2018. 2

work page 2018
[36]

Video to events: Recycling video datasets for event cameras

Daniel Gehrig, Mathias Gehrig, Javier Hidalgo-Carri ´o, and Davide Scaramuzza. Video to events: Recycling video datasets for event cameras. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3586–3595, 2020. 1

work page 2020
[37]

Recurrent vision transformers for object detection with event cameras

Mathias Gehrig and Davide Scaramuzza. Recurrent vision transformers for object detection with event cameras. In Proceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 13884–13893, 2023. 6

work page 2023
[38]

Dsec: A stereo event camera dataset for driving scenarios.IEEE Robotics and Automation Letters, 6(3):4947–4954, 2021

Mathias Gehrig, Willem Aarents, Daniel Gehrig, and Da- vide Scaramuzza. Dsec: A stereo event camera dataset for driving scenarios.IEEE Robotics and Automation Letters, 6(3):4947–4954, 2021. 3

work page 2021
[39]

E-raft: Dense optical flow from event cameras

Mathias Gehrig, Mario Millh ¨ausler, Daniel Gehrig, and Da- vide Scaramuzza. E-raft: Dense optical flow from event cameras. In2021 International Conference on 3D Vision (3DV), pages 197–206. IEEE, 2021. 3

work page 2021
[40]

Edopt: Event-camera 6-dof dynamic object pose tracking

Arren Glover, Luna Gava, Zhichao Li, and Chiara Bar- tolozzi. Edopt: Event-camera 6-dof dynamic object pose tracking. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 18200–18206. IEEE, 2024. 3

work page 2024
[41]

Edopt: Event-camera 6-dof dynamic object pose tracking

Arren Glover, Luna Gava, Zhichao Li, and Chiara Bar- tolozzi. Edopt: Event-camera 6-dof dynamic object pose tracking. In2024 IEEE International Conference on Robotics and Automation (ICRA), pages 18200–18206. IEEE, 2024. 5

work page 2024
[42]

Deio: Deep event inertial odometry

Weipeng Guan, Fuling Lin, Peiyu Chen, and Peng Lu. Deio: Deep event inertial odometry. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 4606–4615, 2025. 3

work page 2025
[43]

Handal: A dataset of real-world manipulable object cate- gories with pose annotations, affordances, and reconstruc- tions

Andrew Guo, Bowen Wen, Jianhe Yuan, Jonathan Trem- blay, Stephen Tyree, Jeffrey Smith, and Stan Birchfield. Handal: A dataset of real-world manipulable object cate- gories with pose annotations, affordances, and reconstruc- tions. In2023 IEEE/RSJ International Conference on In- telligent Robots and Systems (IROS), pages 11428–11435. IEEE, 2023. 1

work page 2023
[44]

Measuring depth accuracy in rgbd cameras

Hussein Haggag, Mohammed Hossny, Despina Filippidis, Douglas Creighton, Saeid Nahavandi, and Vinod Puri. Measuring depth accuracy in rgbd cameras. In2013, 7th international conference on signal processing and commu- nication systems (ICSPCS), pages 1–7. IEEE, 2013. 1

work page 2013
[45]

Etap: Event- based tracking of any point

Friedhelm Hamann, Daniel Gehrig, Filbert Febryanto, Kostas Daniilidis, and Guillermo Gallego. Etap: Event- based tracking of any point. InProceedings of the Com- puter Vision and Pattern Recognition Conference, pages 27186–27196, 2025. 6, 1, 4

work page 2025
[46]

Honnotate: A method for 3d annotation of hand and object poses

Shreyas Hampali, Mahdi Rad, Markus Oberweger, and Vin- cent Lepetit. Honnotate: A method for 3d annotation of hand and object poses. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3196–3206, 2020. 2, 5, 1

work page 2020
[47]

Rapid-a video rate object tracker

Chris Harris and Carl Stennett. Rapid-a video rate object tracker. InBMVC, page 3, 1990. 2

work page 1990
[48]

E-pose: A large scale event camera dataset for object pose estimation.Scientific data, 12(1):245, 2025

Oussama Abdul Hay, Xiaoqian Huang, Abdulla Ayyad, Es- lam Sherif, Randa Almadhoun, Yusra Abdulrahman, Lak- mal Seneviratne, Abdulqader Abusafieh, and Yahya Zweiri. E-pose: A large scale event camera dataset for object pose estimation.Scientific data, 12(1):245, 2025. 1, 2, 3, 5

work page 2025
[49]

Learning monocular dense depth from events

Javier Hidalgo-Carri ´o, Daniel Gehrig, and Davide Scara- muzza. Learning monocular dense depth from events. In 2020 International Conference on 3D Vision (3DV), pages 534–542. IEEE, 2020. 3

work page 2020
[50]

Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes

Stefan Hinterstoisser, Vincent Lepetit, Slobodan Ilic, Stefan Holzer, Gary Bradski, Kurt Konolige, and Nassir Navab. Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes. InAsian conference on computer vision, pages 548–562. Springer,

work page
[51]

Bop challenge 2023 on detection segmentation and pose estimation of seen and unseen rigid objects

Tomas Hodan, Martin Sundermeyer, Yann Labbe, Van Nguyen Nguyen, Gu Wang, Eric Brachmann, Bertram Drost, Vincent Lepetit, Carsten Rother, and Jiri Matas. Bop challenge 2023 on detection segmentation and pose estimation of seen and unseen rigid objects. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5610–5619, 2024. 1, 5

work page 2023
[52]

Ede-distill: Boosting event-based monocular depth estimation performance via knowledge distillation

Chenming Hu, Junjie Jiang, Yaohui Li, Mingyuan Sun, and Zheng Fang. Ede-distill: Boosting event-based monocular depth estimation performance via knowledge distillation. IEEE Robotics and Automation Letters, 2025. 3

work page 2025
[53]

Depth-based object tracking using a robust gaussian filter

Jan Issac, Manuel W ¨uthrich, Cristina Garcia Cifuentes, Jeannette Bohg, Sebastian Trimpe, and Stefan Schaal. Depth-based object tracking using a robust gaussian filter. In2016 IEEE international conference on robotics and au- tomation (ICRA), pages 608–615. IEEE, 2016. 2

work page 2016
[54]

To- wards robust event-based networks for nighttime via un- paired day-to-night event translation

Yuhwan Jeong, Hoonhee Cho, and Kuk-Jin Yoon. To- wards robust event-based networks for nighttime via un- paired day-to-night event translation. InEuropean Confer- ence on Computer Vision, pages 286–306. Springer, 2024. 3

work page 2024
[55]

What mat- ters in unsupervised optical flow

Rico Jonschkowski, Austin Stone, Jonathan T Barron, Ariel Gordon, Kurt Konolige, and Anelia Angelova. What mat- ters in unsupervised optical flow. InEuropean conference on computer vision, pages 557–572. Springer, 2020. 4

work page 2020
[56]

Tem- poral stereo matching from event cameras via joint learning with stereoscopic flow.IEEE Transactions on Pattern Anal- ysis and Machine Intelligence, 2025

Jae-Young Kang, Hoonhee Cho, and Kuk-Jin Yoon. Tem- poral stereo matching from event cameras via joint learning with stereoscopic flow.IEEE Transactions on Pattern Anal- ysis and Machine Intelligence, 2025. 3

work page 2025
[57]

Un- leashing the temporal potential of stereo event cameras for continuous-time 3d object detection

Jae-Young Kang, Hoonhee Cho, and Kuk-Jin Yoon. Un- leashing the temporal potential of stereo event cameras for continuous-time 3d object detection. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 6869–6881, 2025. 3

work page 2025
[58]

Intel realsense stereo- scopic depth cameras

Leonid Keselman, John Iselin Woodfill, Anders Grunnet- Jepsen, and Achintya Bhowmik. Intel realsense stereo- scopic depth cameras. InProceedings of the IEEE con- ference on computer vision and pattern recognition work- shops, pages 1–10, 2017. 1

work page 2017
[59]

Simultaneous mosaicing and track- ing with an event camera.J

Hanme Kim, Ankur Handa, Ryad Benosman, Sio-Hoi Ieng, and Andrew J Davison. Simultaneous mosaicing and track- ing with an event camera.J. Solid State Circ, 43:566–576,

work page
[60]

Real-time 3d reconstruction and 6-dof tracking with an event camera

Hanme Kim, Stefan Leutenegger, and Andrew J Davison. Real-time 3d reconstruction and 6-dof tracking with an event camera. InEuropean conference on computer vision, pages 349–364. Springer, 2016. 2, 3

work page 2016
[61]

Segment anything

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. InProceedings of the IEEE/CVF international conference on computer vision, pages 4015–4026, 2023. 2

work page 2023
[62]

Deep event visual odometry

Simon Klenk, Marvin Motzet, Lukas Koestler, and Daniel Cremers. Deep event visual odometry. In2024 Inter- national conference on 3D vision (3DV), pages 739–749. IEEE, 2024. 1

work page 2024
[63]

Cosypose: Consistent multi-view multi-object 6d pose estimation

Yann Labb ´e, Justin Carpentier, Mathieu Aubry, and Josef Sivic. Cosypose: Consistent multi-view multi-object 6d pose estimation. InEuropean conference on computer vi- sion, pages 574–591. Springer, 2020. 2, 4

work page 2020
[64]

arXiv preprint arXiv:2212.06870 (2022)

Yann Labb ´e, Lucas Manuelli, Arsalan Mousavian, Stephen Tyree, Stan Birchfield, Jonathan Tremblay, Justin Carpen- tier, Mathieu Aubry, Dieter Fox, and Josef Sivic. Mega- pose: 6d pose estimation of novel objects via render & com- pare.arXiv preprint arXiv:2212.06870, 2022. 1, 2, 3, 4, 6, 7

work page arXiv 2022
[65]

Category-level metric scale object shape and pose estimation.IEEE Robotics and Automation Letters, 6(4): 8575–8582, 2021

Taeyeop Lee, Byeong-Uk Lee, Myungchul Kim, and In So Kweon. Category-level metric scale object shape and pose estimation.IEEE Robotics and Automation Letters, 6(4): 8575–8582, 2021. 1, 2

work page 2021
[66]

Tta-cope: Test-time adaptation for category-level object pose estimation

Taeyeop Lee, Jonathan Tremblay, Valts Blukis, Bowen Wen, Byeong-Uk Lee, Inkyu Shin, Stan Birchfield, In So Kweon, and Kuk-Jin Yoon. Tta-cope: Test-time adaptation for category-level object pose estimation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21285–21295, 2023. 2

work page 2023
[67]

Delta: Demonstration and language- guided novel transparent object manipulation.arXiv preprint arXiv:2510.05662, 2025

Taeyeop Lee, Gyuree Kang, Bowen Wen, Youngho Kim, Seunghyeok Back, In So Kweon, David Hyunchul Shim, and Kuk-Jin Yoon. Delta: Demonstration and language- guided novel transparent object manipulation.arXiv preprint arXiv:2510.05662, 2025. 1

work page arXiv 2025
[68]

Any6d: Model-free 6d pose estimation of novel objects

Taeyeop Lee, Bowen Wen, Minjun Kang, Gyuree Kang, In So Kweon, and Kuk-Jin Yoon. Any6d: Model-free 6d pose estimation of novel objects. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11633–11643, 2025. 2

work page 2025
[69]

Ep n p: An accurate o (n) solution to the p n p problem

Vincent Lepetit, Francesc Moreno-Noguer, and Pascal Fua. Ep n p: An accurate o (n) solution to the p n p problem. International Journal of Computer Vision, 81(2):155–166,

work page
[70]

Epnp: An accurate o (n) solution to the p n p problem.Inter- national Journal of Computer Vision, 81(2):155–166, 2009

Vincent Lepetit, Francesc Moreno-Noguer, and Pascal Fua. Epnp: An accurate o (n) solution to the p n p problem.Inter- national Journal of Computer Vision, 81(2):155–166, 2009. 4

work page 2009
[71]

Deepim: Deep iterative matching for 6d pose estimation

Yi Li, Gu Wang, Xiangyang Ji, Yu Xiang, and Dieter Fox. Deepim: Deep iterative matching for 6d pose estimation. InProceedings of the European conference on computer vi- sion (ECCV), pages 683–698, 2018. 2, 4

work page 2018
[72]

6-dof object tracking with event-based optical flow and frames

Zhichao Li, Arren Glover, Chiara Bartolozzi, and Lorenzo Natale. 6-dof object tracking with event-based optical flow and frames. In2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 18880– 18887. IEEE, 2025. 3

work page 2025
[73]

Sam-6d: Segment anything model meets zero-shot 6d object pose es- timation

Jiehong Lin, Lihua Liu, Dekun Lu, and Kui Jia. Sam-6d: Segment anything model meets zero-shot 6d object pose es- timation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27906– 27916, 2024. 1

work page 2024
[74]

Keypoint-based category-level object pose tracking from an rgb sequence with uncertainty estimation

Yunzhi Lin, Jonathan Tremblay, Stephen Tyree, Patricio A Vela, and Stan Birchfield. Keypoint-based category-level object pose tracking from an rgb sequence with uncertainty estimation. In2022 International Conference on Robotics and Automation (ICRA), pages 1258–1264. IEEE, 2022. 2

work page 2022
[75]

Spatiotemporal registration for event-based visual odometry

Daqi Liu, Alvaro Parra, and Tat-Jun Chin. Spatiotemporal registration for event-based visual odometry. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4937–4946, 2021. 3

work page 2021
[76]

High-rate monocu- lar depth estimation via cross frame-rate collaboration of frames and events.International Journal of Computer Vi- sion, 133(10):7332–7351, 2025

Xu Liu, Xiaopeng Fan, Jianing Li, Dianze Li, Wei Zhang, Zhengyu Ma, and Yonghong Tian. High-rate monocu- lar depth estimation via cross frame-rate collaboration of frames and events.International Journal of Computer Vi- sion, 133(10):7332–7351, 2025. 3

work page 2025
[77]

T-esvo: improved event-based stereo visual odome- try via adaptive time-surface and truncated signed distance function.Advanced Intelligent Systems, 5(9):2300027,

Zhe Liu, Dianxi Shi, Ruihao Li, Yi Zhang, and Shaowu Yang. T-esvo: improved event-based stereo visual odome- try via adaptive time-surface and truncated signed distance function.Advanced Intelligent Systems, 5(9):2300027,

work page
[78]

Optical flow-guided 6dof ob- ject pose tracking with an event camera

Zibin Liu, Banglei Guan, Yang Shang, Shunkun Liang, Zhenbao Yu, and Qifeng Yu. Optical flow-guided 6dof ob- ject pose tracking with an event camera. InProceedings of the 32nd ACM International Conference on Multimedia, pages 6501–6509, 2024. 3

work page 2024
[79]

Line-based 6-dof object pose estimation and tracking with an event camera.IEEE Transactions on Im- age Processing, 33:4765–4780, 2024

Zibin Liu, Banglei Guan, Yang Shang, Qifeng Yu, and Lau- rent Kneip. Line-based 6-dof object pose estimation and tracking with an event camera.IEEE Transactions on Im- age Processing, 33:4765–4780, 2024. 5

work page 2024
[80]

Line-based 6-dof object pose estimation and tracking with an event camera.IEEE Transactions on Im- age Processing, 33:4765–4780, 2024

Zibin Liu, Banglei Guan, Yang Shang, Qifeng Yu, and Lau- rent Kneip. Line-based 6-dof object pose estimation and tracking with an event camera.IEEE Transactions on Im- age Processing, 33:4765–4780, 2024. 3

work page 2024

Showing first 80 references.