Flow6D: Discrete-to-Continuous Flow Matching for Efficient and Accurate Category-Level 6D Pose Estimation

Han Sun; Huiliang Shen; Li Zhang; Mingyu Mei; Xinyue Zhao; Zaixing He; Zibo Dai

arxiv: 2606.23293 · v1 · pith:ROQ32WVUnew · submitted 2026-06-22 · 💻 cs.CV · cs.RO

Mingyu Mei, Li Zhang, Zibo Dai, Han Sun, Xinyue Zhao, Huiliang Shen, Zaixing He

Pith reviewed 2026-06-26 08:37 UTC · model grok-4.3

keywords 6D pose estimation · flow matching · category-level · discrete-to-continuous · latent space localization · articulated objects · real-time inference

The pith

Flow6D narrows 6D pose search with discrete flow matching then refines via continuous residuals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Flow6D as a way to handle the accuracy and speed problems in category-level 6D pose estimation. Rotation and translation values are first placed into discrete bins so a flow matching model can restrict the latent space to regions near the correct pose. A second continuous flow matching stage then samples inside that restricted space to predict small corrections and arrive at a final accurate pose. This two-stage structure also supports extension to objects with movable parts and runs at real-time speeds on both synthetic and real data.

Core claim

By first discretizing rotation and translation parameters into bins, a discrete flow matching model can lock the latent space around the true pose and thereby shrink the search space; sampling from that localized space then lets a continuous flow matching model predict local residuals that regress to an accurate final pose.
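The bin-then-refine decomposition in the core claim can be sketched for a single scalar pose parameter. This is a minimal illustration, not the paper's implementation: the helper names `to_bin` and `from_bin`, the 36-bin count, and the choice of one angle in [-π, π] are assumptions made here, and this page does not say how Flow6D actually parameterizes or bins rotations.

```python
import math

def to_bin(value, lo, hi, num_bins):
    """Split a scalar in [lo, hi] into (bin index, bin center, residual)."""
    width = (hi - lo) / num_bins
    # Clamp the index so that out-of-range inputs land in the first or last bin.
    idx = max(0, min(int((value - lo) // width), num_bins - 1))
    center = lo + (idx + 0.5) * width
    return idx, center, value - center

def from_bin(idx, residual, lo, hi, num_bins):
    """Rebuild the continuous value from a bin index plus a residual."""
    width = (hi - lo) / num_bins
    return lo + (idx + 0.5) * width + residual

# Hypothetical example: one rotation angle, 36 bins of 10 degrees each.
LO, HI, BINS = -math.pi, math.pi, 36
angle = 1.0  # radians
idx, center, residual = to_bin(angle, LO, HI, BINS)
recovered = from_bin(idx, residual, LO, HI, BINS)
print(idx, round(center, 4), round(residual, 4), round(recovered, 4))
```

In this framing the discrete stage only has to pick `idx`, and the continuous stage only has to predict a residual whose magnitude is at most half a bin width.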

What carries the argument

The two-stage discrete latent space localization followed by continuous pose regression using flow matching models.

If this is right

The method outperforms prior state-of-the-art approaches on both synthetic and real datasets for category-level 6D pose estimation.
Inference reaches real-time speeds of 70 frames per second.
The same hierarchical structure applies directly to articulated objects without further redesign.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Faster and more robust pose estimates could reduce planning time in robotic manipulation pipelines that rely on category-level recognition.
The discrete-then-continuous pattern may generalize to other high-dimensional continuous regression tasks where exhaustive search is prohibitive.
If binning errors prove common under real-world lighting or occlusion, adding a small number of overlapping bins could serve as a low-cost safeguard.
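One possible reading of "overlapping bins" is a second grid shifted by half a bin width, so that a value sitting on a boundary of one grid sits near a center of the other. The sketch below illustrates that reading only; it is an editorial assumption, not something the paper describes, and `nearest_center` and `pick_grid` are hypothetical helper names.

```python
import math

def nearest_center(value, lo, width, offset):
    """Center of the bin containing `value` on a grid shifted by `offset`."""
    idx = math.floor((value - lo - offset) / width)
    return lo + offset + (idx + 0.5) * width

def pick_grid(value, lo, width):
    """Return (grid id, center) for whichever grid holds `value` more centrally."""
    centers = [nearest_center(value, lo, width, off) for off in (0.0, width / 2)]
    grid = min((0, 1), key=lambda g: abs(value - centers[g]))
    return grid, centers[grid]

# A value just below a bin boundary of the unshifted grid (boundary at 1.0).
value, lo, width = 0.98, 0.0, 1.0
grid, center = pick_grid(value, lo, width)
print(grid, center, abs(value - center))
```

With two grids offset by half a width, the combined set of centers is spaced half a width apart, so the nearest center is never more than a quarter width away.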

Load-bearing premise

Discretizing rotation and translation into bins lets the discrete flow stage reliably center its latent distribution on the true pose without systematic misses caused by bin boundaries or sensor noise.
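How far the true pose can sit from a correctly chosen anchor follows from elementary quantization bounds. The following is a worked bound under assumptions this page does not confirm: uniform bins, anchors at bin centers, translation binned per axis with width $w_t$, and a single rotation angle binned with width $w_\theta$. It is a worst-case bound written for illustration, not the authors' derivation.

```latex
% Correct bin chosen on every axis:
|t_i - c_i| \le \frac{w_t}{2}, \quad i \in \{x, y, z\}
\;\Longrightarrow\;
\lVert t - c \rVert_2 \le \frac{\sqrt{3}}{2}\, w_t ,
\qquad
|\theta - c_\theta| \le \frac{w_\theta}{2} .

% Discrete stage off by k bins on one axis:
|t_i - c_i| \le \left(k + \tfrac{1}{2}\right) w_t .
```

Under these assumptions, a continuous stage that can correct at most $r$ per axis tolerates a discrete miss of $k$ bins only if $r \ge (k + 1/2)\, w_t$. Finer bins shrink the residual, but a fixed amount of noise then spans more bins.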

What would settle it

A benchmark dataset in which added sensor noise or fine bin boundaries cause the true pose to lie outside every bin chosen by the discrete stage, after which the continuous stage cannot recover an accurate estimate.
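The kind of test described above can be prototyped on a toy 1-D problem before building a full benchmark. The sketch below is hypothetical throughout: the learned discrete stage is replaced by simply binning a noisy observation, the continuous stage is modeled as a fixed correction window around the chosen bin center, and the `width`, `window`, and noise levels are arbitrary illustrative values rather than settings from the paper.

```python
import math
import random

def miss_rate(sigma, width=0.1, window=0.06, trials=2000, seed=0):
    """Fraction of trials where the true value falls outside the refinement window.

    Assumption: the discrete stage is mimicked by binning a noisy observation,
    and the continuous stage can correct at most `window` around the bin center.
    """
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        true_value = rng.uniform(0.0, 1.0)
        observed = true_value + rng.gauss(0.0, sigma)
        # Bin the noisy observation and take that bin's center as the anchor.
        anchor = (math.floor(observed / width) + 0.5) * width
        if abs(true_value - anchor) > window:
            misses += 1
    return misses / trials

for sigma in (0.0, 0.02, 0.05, 0.1):
    print(f"sigma={sigma:.2f}  miss rate={miss_rate(sigma):.3f}")
```

A real version of this check would replace the binning proxy with the trained discrete model and the fixed window with the measured reach of the continuous stage.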

Figures

Figures reproduced from arXiv: 2606.23293 by Han Sun, Huiliang Shen, Li Zhang, Mingyu Mei, Xinyue Zhao, Zaixing He, Zibo Dai.

**Figure 1.** Comparison between prior candidate-based pose estimation pipelines and our approach. (a) Previous methods depend on brute-force and limited candidate ranking, incurring high cost and accuracy limitations. (b) Our method adopts latent-space localization and continuous pose regression, achieving higher accuracy and faster speed.

To address the aforementioned challenges, unlike prior work [35], which relies …

**Figure 2.** Overview of our two-stage pose estimation framework. Stage I performs discrete anchor-bin probability prediction by uniformly sampling rotation and translation spaces and selecting an anchor pose via discrete flow matching. Stage II optimizes the pose via continuous flow matching with adaptive latent pose sampling, enabling fine-grained, gap-free pose regression for accurate final estimation.

**Figure 3.** Results on the real-world REAL275 dataset. Red and green 3D boxes represent ground truth and our predictions, respectively.

IV. EXPERIMENTS. A. Experimental Settings. Dataset. Our method is designed to handle both rigid and articulated objects and is evaluated on a diverse set of synthetic and real-world datasets. Concretely, CAMERA25 [29] and ArtImage [34] are used for evaluation of the synthetic data…

**Figure 4.** Results on the ArtImage dataset.

… and its output pose results can provide reliable support for practical applications such as robotic grasping. C. Ablation Study. We conduct experiments on the ArtImage dataset (base part of the Laptop category) to evaluate the impact of different design choices in our two-stage framework. Discrete Bin Size. The discrete flow matching model for latent space localization relie…

Original abstract

6D pose estimation is a key task in computer vision and embodied AI, widely used in robotic manipulation, augmented reality, etc. Existing methods directly regress in a high-dimensional continuous space, facing two key challenges in category-level pose estimation: limited accuracy due to noise and local optima, and inefficient search over an infinite space that hinders real-time performance. This paper proposes Flow6D, a hierarchical flow matching framework with a two-stage discrete latent space localization-continuous pose regression strategy. Rotation and translation parameters are first discretized into bins, with a discrete flow matching model locking the latent space around the true pose to reduce search complexity. Then, by sampling in the latent space, a continuous flow matching model predicts local pose residuals to optimize the estimate and regress to an accurate pose. The framework also naturally extends to articulated objects, outperforming state-of-the-art methods on synthetic and real datasets with real-time inference at 70 FPS. Project website: https://flow6d.github.io/.
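The sampling loop of a continuous flow matching stage can be sketched with a fixed-step Euler integrator. In a trained model the velocity would come from a network conditioned on the observation; here an oracle that already knows the target residual stands in for it, so the sketch only demonstrates the integration from noise to a residual and says nothing about Flow6D's actual network, step count, or pose parameterization. The names `oracle_velocity` and `euler_sample` and all numeric values are assumptions.

```python
import random

def oracle_velocity(x, t, target):
    """Conditional velocity of the straight-line path x_t = (1 - t) * x0 + t * target."""
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]

def euler_sample(x0, target, steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x, h = list(x0), 1.0 / steps
    for k in range(steps):
        t = k * h  # t stays below 1, so the division in oracle_velocity is safe
        v = oracle_velocity(x, t, target)
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x

rng = random.Random(0)
true_residual = [0.03, -0.01, 0.02]               # hypothetical offset from the anchor
start = [rng.gauss(0.0, 0.05) for _ in range(3)]  # noise drawn near the anchor
print(euler_sample(start, true_residual))
```

Because the straight-line path has constant velocity along each trajectory, Euler integration follows it without discretization error, which is one reason few sampling steps can suffice for paths of this kind.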

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Flow6D's two-stage discrete-to-continuous flow matching is a practical attempt to fix the efficiency-accuracy tradeoff in category-level 6D pose, but the abstract leaves the discretization step's reliability unproven.

The paper's core idea is a hierarchical flow matching setup that first bins rotation and translation parameters into a discrete latent space to localize around the true pose, then samples from there with a continuous flow matching stage to regress residuals. This is presented as a direct response to the problems of noise, local optima, and slow search in high-dimensional continuous pose estimation.

What stands out is the explicit two-stage pipeline and the claim that it extends naturally to articulated objects while hitting 70 FPS. Those are concrete engineering targets for robotics and AR applications, and the abstract frames the discrete stage as a way to shrink the search space before refinement.

The main limitation is that the description stops at the abstract level. There are no equations, no training details, no ablation tables, and no error analysis to show whether the binning step actually avoids systematic misses when sensor noise or discretization granularity pushes the true pose into the wrong bin. The outperformance claims on synthetic and real data therefore rest on unexamined implementation choices.

This work is aimed at computer vision researchers who already work on category-level pose or flow-based generative models for robotics. Someone looking for a new way to combine discrete and continuous matching would find the structure worth examining.

It deserves peer review. The idea is specific enough that referees can test the discretization assumption and the speed-accuracy numbers directly, even if the current write-up is high-level.

Referee Report

1 major / 0 minor

Summary. The paper proposes Flow6D, a hierarchical flow matching framework for category-level 6D pose estimation using a two-stage discrete-to-continuous strategy. Rotation and translation parameters are discretized into bins; a discrete flow matching model first localizes the latent space around the true pose, after which a continuous flow matching model predicts local pose residuals via sampling. The method is claimed to outperform state-of-the-art approaches on synthetic and real datasets, run at 70 FPS, and extend naturally to articulated objects.

Significance. If the two-stage pipeline is shown to work, the approach could meaningfully improve both accuracy and real-time performance in category-level 6D pose estimation by shrinking the effective search space while retaining continuous refinement, with direct relevance to robotics and AR. The claimed extension to articulated objects would further broaden its utility.

major comments (1)

Abstract: The central claim that binning rotation/translation parameters enables the discrete flow matching stage to reliably localize the latent distribution around the true pose rests on an unverified assumption; no analysis, equations, or experiments are supplied to quantify discretization error, bin size effects, or robustness to sensor noise, leaving open the possibility of systematic misses that would undermine the subsequent continuous stage.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and for highlighting the need for stronger justification of the discrete stage. We address the major comment below and will revise the manuscript accordingly.

Referee: Abstract: The central claim that binning rotation/translation parameters enables the discrete flow matching stage to reliably localize the latent distribution around the true pose rests on an unverified assumption; no analysis, equations, or experiments are supplied to quantify discretization error, bin size effects, or robustness to sensor noise, leaving open the possibility of systematic misses that would undermine the subsequent continuous stage.

Authors: We agree that the abstract states the localization benefit of discretization without accompanying analysis. In the revised manuscript we will add a dedicated subsection (and supporting equations) that derives the expected localization radius as a function of bin width, reports empirical success rates of the discrete stage in placing the true pose inside the continuous refinement window, and includes ablation experiments on bin size and additive sensor noise. These additions will quantify the discretization error and demonstrate that systematic misses remain below the threshold handled by the continuous stage.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper introduces a new two-stage hierarchical flow-matching pipeline (discrete binning of rotation/translation parameters to localize latent space, followed by continuous residual regression). No equations or claims in the abstract reduce a derived quantity to a fitted input by construction, nor do they rely on self-citation chains or uniqueness theorems imported from prior author work. The method is presented as an original architecture whose performance claims are empirical rather than tautological. This is the normal case of a self-contained technical proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no explicit free parameters, axioms, or invented entities are stated. Bin sizes and flow model architectures are implicit but unspecified.

Reference graph

Works this paper leans on

40 extracted references · 1 linked inside Pith

[1] Jacob Austin et al. “Structured denoising diffusion models in discrete state-spaces”. In: Advances in Neural Information Processing Systems 34 (2021), pp. 17981–17993

[2] Kai Chen and Qi Dou. “Sgpa: Structure-guided prior adaptation for category-level 6d object pose estimation”. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021, pp. 2773–2782

[3] Yamei Chen et al. “Secondpose: SE(3)-consistent dual-stream feature fusion for category-level pose estimation”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024, pp. 9959–9969

[4] Marius Cordts et al. “The cityscapes dataset for semantic urban scene understanding”. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016, pp. 3213–3223

[5] Yan Di et al. “Gpv-pose: Category-level object pose estimation via geometry-guided point-wise voting”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022, pp. 6781–6791

[6] Itai Gat et al. “Discrete flow matching”. In: Advances in Neural Information Processing Systems 37 (2024), pp. 133345–133385

[7] Kaiming He et al. “Mask r-cnn”. In: Proceedings of the IEEE international conference on computer vision. 2017, pp. 2961–2969

[8] Rand Hidayah et al. “Walking with augmented reality: A preliminary assessment of visual feedback with a cable-driven active leg exoskeleton (C-ALEX)”. In: IEEE Robotics and Automation Letters 4.4 (2019), pp. 3948–3954

[9] Junwen Huang et al. “RayPose: Ray Bundling Diffusion for Template Views in Unseen 6D Object Pose Estimation”. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2025, pp. 9102–9112

[10] Haobo Jiang et al. “SE(3) diffusion model-based point cloud registration for robust 6d object pose estimation”. In: Advances in Neural Information Processing Systems 36 (2023), pp. 21285–21297
[11] Theodora Kastritsi and Arash Ajoudani. “A Passive Power-Based Control Strategy for pHRI Tasks With Omni-Directional Robotic Mobile Platforms”. In: IEEE Robotics and Automation Letters 9.8 (2024), pp. 6959–6966

[12] Fanxing Kong et al. “Design and implementation of a ferrofluid-based liquid robot for small-scale manipulation”. In: IEEE Robotics and Automation Letters 9.4 (2023), pp. 3060–3067

[13] Weihang Li et al. “Gce-pose: Global context enhancement for category-level object pose estimation”. In: Proceedings of the Computer Vision and Pattern Recognition Conference. 2025, pp. 27154–27165

[14] Xiaolong Li et al. “Category-level articulated object pose estimation”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020, pp. 3706–3715

[15] Jiehong Lin et al. “Category-level 6d object pose and size estimation using self-supervised deep prior deformation networks”. In: European Conference on Computer Vision. Springer. 2022, pp. 19–34

[16] Jiehong Lin et al. “Dualposenet: Category-level 6d object pose and size estimation using dual pose network with refined learning of pose consistency”. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021, pp. 3560–3569

[17] Jiehong Lin et al. “Vi-net: Boosting category-level 6d object pose estimation via learning decoupled rotations on the spherical representations”. In: Proceedings of the IEEE/CVF international conference on computer vision. 2023, pp. 14001–14011

[18] Yaron Lipman et al. “Flow matching for generative modeling”. In: arXiv preprint arXiv:2210.02747 (2022)

[19] Jian Liu et al. “Diff9d: Diffusion-based domain-generalized category-level 9-dof object pose estimation”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)

[20] Jian Liu et al. “Monodiff9d: Monocular category-level 9d object pose estimation via diffusion model”. In: 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE. 2025, pp. 8687–8694
[21] Liu Liu et al. “Category-Level Articulated Object 9D Pose Estimation via Reinforcement Learning”. In: Proceedings of the 31st ACM International Conference on Multimedia. 2023, pp. 728–736

[22] Liu Liu et al. “Toward real-world category-level articulation pose estimation”. In: IEEE Transactions on Image Processing 31 (2022), pp. 1072–1083

[23] Tong Nie et al. “Category-level 6D pose estimation using geometry-guided instance-aware prior and multi-stage reconstruction”. In: IEEE Robotics and Automation Letters 8.4 (2023), pp. 2381–2388

[24] Wanli Peng et al. “Self-supervised category-level 6D object pose estimation with deep implicit shape representation”. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 36. 2. 2022, pp. 2082–2090

[25] Charles Ruizhongtai Qi et al. “Pointnet++: Deep hierarchical feature learning on point sets in a metric space”. In: Advances in neural information processing systems 30 (2017)

[26] Alberto Remus et al. “i2c-net: Using instance-level neural networks for monocular category-level 6D pose estimation”. In: IEEE Robotics and Automation Letters 8.3 (2023), pp. 1515–1522

[27] Jiaming Song, Chenlin Meng, and Stefano Ermon. “Denoising Diffusion Implicit Models”. In: International Conference on Learning Representations. 2021

[28] Yuyang Tu et al. “Language-Embedded 6D Pose Estimation for Tool Manipulation”. In: IEEE Robotics and Automation Letters (2025)

[29] He Wang et al. “Normalized object coordinate space for category-level 6d object pose and size estimation”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, pp. 2642–2651

[30] Jiaze Wang, Kai Chen, and Qi Dou. “Category-level 6d object pose estimation via cascaded relation and recurrent reconstruction networks”. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE. 2021, pp. 4807–4814
[31] Weiquan Wang et al. “Di2Pose: Discrete Diffusion Model for Occluded 3D Human Pose Estimation”. In: Advances in Neural Information Processing Systems 37 (2024), pp. 98717–98741

[32] Yijia Weng et al. “Captra: Category-level pose tracking for rigid and articulated objects from point clouds”. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021, pp. 13209–13218

[33] Li Xu et al. “6d-diff: A keypoint diffusion framework for 6d object pose estimation”. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2024, pp. 9676–9686

[34] Han Xue et al. “OMAD: Object Model with Articulated Deformations for Pose Estimation and Retrieval”. In: arXiv preprint arXiv:2112.07334 (2021)

[35] Jiyao Zhang, Mingdong Wu, and Hao Dong. “Generative category-level object pose estimation via diffusion models”. In: Advances in Neural Information Processing Systems 36 (2023), pp. 54627–54644

[36] Li Zhang et al. “GaPT-DAR: Category-level Garments Pose Tracking via Integrated 2D Deformation and 3D Reconstruction”. In: Proceedings of the Computer Vision and Pattern Recognition Conference. 2025, pp. 22638–22647

[37] Li Zhang et al. “R^2-Art: Category-Level Articulation Pose Estimation from Single RGB Image via Cascade Render Strategy”. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 39. 9. 2025, pp. 9985–9993

[38] Li Zhang et al. “U-COPE: Taking a Further Step to Universal 9D Category-Level Object Pose Estimation”. In: European Conference on Computer Vision. Springer. 2025, pp. 254–270

[39] Ruida Zhang et al. “Rbp-pose: Residual bounding box projection for category-level pose estimation”. In: Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part I. Springer. 2022, pp. 655–672

[40] Lu Zou et al. “6d-vit: Category-level 6d object pose estimation via transformer-based instance representation learning”. In: IEEE Transactions on Image Processing 31 (2022), pp. 6907–6921
