pith. machine review for the scientific record.

arxiv: 2604.13309 · v1 · submitted 2026-04-14 · 💻 cs.RO

Recognition: unknown

Utilizing Inpainting for Keypoint Detection for Vision-Based Control of Robotic Manipulators

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 14:31 UTC · model grok-4.3

classification 💻 cs.RO
keywords visual servoing · keypoint detection · image inpainting · robotic manipulators · vision-based control · occlusion handling · ArUco markers

The pith

Inpainting creates labeled natural images for training keypoint detectors that enable markerless vision-based robot control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method to detect and track natural keypoints on robotic manipulators for vision-based control, without relying on external markers during operation. It attaches temporary ArUco markers to label images automatically, then applies inpainting to remove them, yielding training data with a natural, markerless appearance. At runtime, a separate inpainting model reconstructs occluded regions to maintain detection, and an Unscented Kalman Filter refines the predictions for stable control. The approach enables model-free control in configuration space using only vision, and is demonstrated under both full visibility and partial occlusion.
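As a rough illustration of the data-generation step, the sketch below detects ArUco markers with OpenCV, records their centers as keypoint labels, and masks them for an inpainting pass. The `inpaint` callable stands in for the modified LaMa model the paper describes; the marker dictionary, mask padding, and return format are illustrative assumptions, not the authors' settings.

```python
import cv2
import numpy as np

# Legacy-style ArUco API (cv2.aruco.detectMarkers); newer OpenCV builds expose
# cv2.aruco.ArucoDetector instead. Dictionary choice and mask padding are assumptions.
ARUCO_DICT = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)

def label_and_mask(image_bgr, pad=8):
    """Detect ArUco markers; return their centers (keypoint labels) and a binary mask."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, ARUCO_DICT)
    mask = np.zeros(gray.shape, dtype=np.uint8)
    keypoints = []
    for quad in corners:                              # one (1, 4, 2) corner array per marker
        pts = quad.reshape(4, 2)
        keypoints.append(pts.mean(axis=0).tolist())   # marker center used as the keypoint label
        x, y, w, h = cv2.boundingRect(pts.astype(np.int32))
        y0, x0 = max(y - pad, 0), max(x - pad, 0)
        mask[y0:y + h + pad, x0:x + w + pad] = 255    # padded mask covering the marker
    return keypoints, mask

def make_training_sample(image_bgr, inpaint):
    """inpaint(image, mask) stands in for the modified LaMa model; returns a markerless image."""
    keypoints, mask = label_and_mask(image_bgr)
    markerless = inpaint(image_bgr, mask)
    return markerless, {"keypoints": keypoints}
```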

Core claim

By attaching ArUco markers for automatic labeling and then inpainting to remove them, the method generates training data for a keypoint detector that works on unmarked robot images. At runtime, a second inpainting model handles occlusions in real time, combined with UKF filtering, to achieve robust visual servoing without camera calibration or robot models.
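The figure captions identify the detector as a Keypoint R-CNN from the PyTorch vision library. The sketch below shows one plausible way to instantiate and fine-tune such a detector on the inpainted, auto-labeled images; the class count, keypoint count, and optimizer settings are placeholders rather than the paper's reported configuration.

```python
import torch
import torchvision

NUM_KEYPOINTS = 5  # the paper's figures also show 3-, 4-, 6-, and 8-keypoint configurations

model = torchvision.models.detection.keypointrcnn_resnet50_fpn(
    weights=None,
    num_classes=2,               # background + robot arm (assumed)
    num_keypoints=NUM_KEYPOINTS,
)
optimizer = torch.optim.SGD(model.parameters(), lr=5e-3, momentum=0.9)

def train_step(images, targets):
    # Each target dict needs "boxes", "labels", and "keypoints" of shape (N, K, 3)
    # with (x, y, visibility) per keypoint, as torchvision's detection API expects.
    model.train()
    losses = model(images, targets)   # dict of detection and keypoint losses
    loss = sum(losses.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```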

What carries the argument

Dual inpainting pipeline: one for generating markerless labeled training data by removing temporary markers, and another for real-time occlusion removal to sustain keypoint detection.
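A minimal sketch of the runtime half of that pipeline, assuming a filterpy UKF with a constant-velocity model per keypoint and the distance-based gating that Figure 24's caption alludes to. The `inpaint` and `detect` callables are placeholders for the runtime inpainting model and the trained detector; the frame rate, gate radius, and noise settings are assumptions.

```python
import numpy as np
from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

DT, GATE_PX = 1.0 / 30.0, 40.0   # frame period and outlier-rejection radius (assumed)

def fx(x, dt):                    # state [px, py, vx, vy], constant-velocity motion model
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]])
    return F @ x

def hx(x):                        # measurement is the pixel position only
    return x[:2]

def make_ukf(init_xy):
    pts = MerweScaledSigmaPoints(n=4, alpha=0.1, beta=2.0, kappa=0.0)
    ukf = UnscentedKalmanFilter(dim_x=4, dim_z=2, dt=DT, hx=hx, fx=fx, points=pts)
    ukf.x = np.array([init_xy[0], init_xy[1], 0.0, 0.0])
    ukf.P *= 50.0
    ukf.R *= 4.0
    return ukf

def track_frame(frame, occlusion_mask, inpaint, detect, ukfs):
    """One control-loop iteration: reconstruct occluded regions, detect, filter."""
    restored = inpaint(frame, occlusion_mask)        # runtime inpainting model (not shown)
    measured = detect(restored)                      # (K, 2) keypoint pixels from the detector
    filtered = []
    for ukf, z in zip(ukfs, measured):
        ukf.predict()
        if np.linalg.norm(z - ukf.x[:2]) < GATE_PX:  # gate implausible detections
            ukf.update(z)
        filtered.append(ukf.x[:2].copy())
    return np.array(filtered)                        # keypoints fed to the visual-servo law
```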

Load-bearing premise

The inpainting accurately removes markers without distorting the underlying keypoint locations in training images and enables continuous accurate detection under occlusion at runtime.

What would settle it

Demonstration that keypoint predictions deviate substantially from true positions in inpainted images, or that control performance degrades under occlusion despite the runtime inpainter and filter.
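One concrete form such a test could take, assuming access to held-out frames paired with the original marker centers: measure the pixel displacement between those centers and the detector's predictions on the corresponding inpainted images. The `detect` callable and data pairing below are placeholders.

```python
import numpy as np

def keypoint_displacement(held_out, detect):
    """held_out: iterable of (inpainted_image, marker_centers[K, 2]) pairs."""
    errors = []
    for image, centers in held_out:
        preds = detect(image)                            # (K, 2) predicted keypoint pixels
        errors.append(np.linalg.norm(preds - centers, axis=1))
    errors = np.concatenate(errors)
    return {"mean_px": float(errors.mean()),
            "p95_px": float(np.percentile(errors, 95)),
            "max_px": float(errors.max())}
```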

Figures

Figures reproduced from arXiv: 2604.13309 by Abhinav Gandhi, Berk Calli, Sreejani Chatterjee, Venkatesh Mullur.

Figure 1. Image (a) shows the original images for different robot configurations covering the workspace with ArUco markers attached, and image (b) presents the corresponding binary masks highlighting these markers. Both (a) and (b) are provided as input to a modified LaMa inpainting model, which generates the corresponding reconstructed images shown in (c).
Figure 2. The data collection phase for out-of-plane motion. Image (a) shows the original images for different robot configurations covering the workspace in 3D with ArUco markers attached, and image (b) presents the corresponding binary masks highlighting these markers. Both (a) and (b) are provided as input to a modified LaMa inpainting model, which generates the corresponding reconstructed images shown in (c).
Figure 3. Overall data-collection pipeline.
Figure 4. Overall pipeline for adaptive visual servoing using predicted keypoints from inpainted images.
Figure 5. Comparison of inpainting results across multiple models. Each row shows an input image, its ground truth label, and outputs from AttU-Net, ResU-Net, U2NET, and CoModGAN. Column headers are shown only once for clarity. The reconstructed images with annotated JSON files are archived (DOI: 10.5281/zenodo.17309869).
Figure 6. Overview of the Attention U-Net based GAN model, with occlusion patches drawn from YCB objects, random shapes and colors, and random scenes.
Figure 7.
Figure 8. The images with ArUco markers are the ground truth; the red dot on the corresponding image is the keypoint predicted by the Keypoint R-CNN model trained with data generated using the present method.
Figure 9. Smooth keypoint detection during continuous robot motion for the 4-keypoint configuration.
Figure 10. The images with ArUco markers are the ground truth; the red dot on the corresponding image is the keypoint predicted by the Keypoint R-CNN model trained with data generated using the present method.
Figure 11. Smooth keypoint detection during continuous robot motion for the 5-keypoint configuration.
Figure 12. The images with ArUco markers are the ground truth; the red dot on the corresponding image is the keypoint predicted by the Keypoint R-CNN model trained with data generated using the present method.
Figure 13. Smooth keypoint detection during continuous robot motion in 3D with the 8-keypoint configuration.
Figure 14.
Figure 15. Image feature error norm (top) and individual image feature errors (bottom) for 3 keypoints.
Figure 17. Image feature error norm (top) and individual image feature errors (bottom) for 5 keypoints without occlusion in the scene.
Figure 16. Comparison of trajectory smoothness between the proposed and prior methods. The low-noise profiles indicate that the proposed method achieves robustness comparable to the existing approach.
Figure 18. Image feature error norm (top) and individual image feature errors (bottom) for 5 keypoints with occlusion in the scene.
Figure 19. Control experiment results comparing baseline keypoint detection with keypoint detection under varying levels of occlusion. Each row corresponds to the same start and goal configuration. The first image in each row shows the baseline without occlusion, the second includes a smaller occlusion, and the third includes a larger occlusion. The robot configuration with red keypoints represents the goal.
Figure 20. Image feature error norm (top) and individual image feature errors (bottom) for control experiments with 3D motion.
Figure 21. Control experiment results for out-of-plane motions using 6 keypoints and 4 joints. The robot configurations with the red keypoints represent the goal. The controller achieves smooth and noiseless trajectories in all cases.
Figure 22.
Figure 23. Example of pose ambiguity introduced by occlusion. All four images show the same configuration for the first two joints, but the occluded end-effector is reconstructed differently in each case, resulting in multiple plausible but incorrect poses for the last joint.
Figure 24. The first image contains a large occlusion completely covering the end effector. The inpainting model reconstructs two end-effector regions due to pose ambiguity, as seen in the second image. As a result, the initial keypoint prediction (red circles) is missed. Despite this, the UKF, leveraging distance-based thresholding, generates a reasonably accurate estimate of the keypoint (green circles).
Figure 25. The first image contains a bright white occlusion that significantly affects reconstruction quality, as seen in the second image. As a result, the initial keypoint prediction (red circle) is severely mislocalized, falling outside the correction range of the UKF (green circle).
Original abstract

In this paper we present a novel visual servoing framework to control a robotic manipulator in the configuration space by using purely natural visual features. Our goal is to develop methods that can robustly detect and track natural features or keypoints on robotic manipulators that would be used for vision-based control, especially for scenarios where placing external markers on the robot is not feasible or preferred at runtime. For the model training process of our data driven approach, we create a data collection pipeline where we attach ArUco markers along the robot's body, label their centers as keypoints, and then utilize an inpainting method to remove the markers and reconstruct the occluded regions. By doing so, we generate natural (markerless) robot images that are automatically labeled with the marker locations. These images are used to train a keypoint detection algorithm, which is used to control the robot configuration using natural features of the robot. Unlike the prior methods that rely on accurate camera calibration and robot models for labeling training images, our approach eliminates these dependencies through inpainting. To achieve robust keypoint detection even in the presence of occlusion, we introduce a second inpainting model, this time to utilize during runtime, that reconstructs occluded regions of the robot in real time, enabling continuous keypoint detection. To further enhance the consistency and robustness of keypoint predictions, we integrate an Unscented Kalman Filter (UKF) that refines the keypoint estimates over time, adding to stable and reliable control performance. We obtained successful control results with this model-free and purely vision-based control strategy, utilizing natural robot features in the runtime, both under full visibility and partial occlusion.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a visual servoing framework for controlling robotic manipulators in configuration space using purely natural visual features. It generates training data by attaching ArUco markers, labeling their centers as keypoints, and applying inpainting to remove the markers and produce automatically labeled markerless images. A keypoint detector is trained on these images; at runtime, a second inpainting model reconstructs occluded regions, and an Unscented Kalman Filter (UKF) refines the estimates to enable stable model-free control under both full visibility and partial occlusion.

Significance. If the inpainting steps preserve keypoint geometry without systematic bias and the reported control performance is quantitatively validated, the method would provide a practical route to markerless, calibration-free vision-based control that relies only on natural robot appearance. The combination of runtime inpainting for occlusion handling with temporal filtering via UKF addresses a common robustness gap in visual servoing and could reduce reliance on external markers or kinematic models.

major comments (2)
  1. [Abstract] The claim of obtaining 'successful control results' with the model-free strategy is stated without any quantitative metrics (e.g., end-effector tracking error, success rate over trials, or comparison to baselines); such metrics are load-bearing for evaluating whether the framework actually achieves robust performance under full visibility and partial occlusion.
  2. [Abstract] Training pipeline description: the automatic labeling procedure assumes that inpainting removes ArUco markers without shifting the true keypoint locations or introducing artifacts that the detector will exploit. No validation is reported, such as mean pixel displacement between original marker centers and post-inpainting detections on held-out frames or reprojection error statistics; this gap directly undermines the validity of the training labels and the model-free claim.
minor comments (1)
  1. [Abstract] The abstract would benefit from briefly naming the keypoint detection architecture and the specific inpainting models employed, as these details are central to reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We appreciate the emphasis on strengthening the abstract and validating key assumptions in the training pipeline. Below we respond point by point to the major comments and indicate planned revisions.

Point-by-point responses
  1. Referee: [Abstract] The claim of obtaining 'successful control results' with the model-free strategy is stated without any quantitative metrics (e.g., end-effector tracking error, success rate over trials, or comparison to baselines); such metrics are load-bearing for evaluating whether the framework actually achieves robust performance under full visibility and partial occlusion.

    Authors: We agree that the abstract would be strengthened by including quantitative metrics. The manuscript body reports experimental results on tracking error and robustness, but these are not summarized numerically in the abstract. In the revised version we will add concise quantitative indicators (e.g., mean end-effector tracking error and trial success rates under both full visibility and partial occlusion) to the abstract while retaining the overall length limit. revision: yes

  2. Referee: [Abstract] Training pipeline description: the automatic labeling procedure assumes that inpainting removes ArUco markers without shifting the true keypoint locations or introducing artifacts that the detector will exploit. No validation is reported, such as mean pixel displacement between original marker centers and post-inpainting detections on held-out frames or reprojection error statistics; this gap directly undermines the validity of the training labels and the model-free claim.

    Authors: We acknowledge that the original submission did not include explicit quantitative validation of keypoint preservation after inpainting. While the training pipeline uses the original marker centers as ground truth before inpainting, we did not report displacement or artifact statistics. In the revision we will add a short validation analysis (mean pixel displacement on held-out frames and qualitative artifact checks) to the methods or results section and reference it briefly in the abstract to support the label quality. revision: yes

Circularity Check

0 steps flagged

No significant circularity; pipeline relies on external assumptions rather than self-referential reduction

full rationale

The paper's core pipeline attaches ArUco markers, records centers as labels, applies inpainting to create markerless training images, trains a detector, and deploys with runtime inpainting plus UKF. No equations or derivations are shown that equate a 'prediction' to a fitted input by construction, nor are self-citations used to import uniqueness theorems or ansatzes. The claim of eliminating calibration dependencies rests on the empirical performance of inpainting (an external technique), not on any definitional loop within the paper's own steps. This is a standard data-generation assumption whose validity is independent of the reported control results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that inpainting can be used to create accurate synthetic training data from marked images and handle runtime occlusions effectively without introducing significant errors in keypoint positions.

axioms (1)
  • domain assumption: Inpainting can accurately reconstruct robot appearance without markers while preserving keypoint positions for training and runtime use.
    Central to generating labeled data and enabling continuous detection under occlusion.

pith-pipeline@v0.9.0 · 5609 in / 1248 out tokens · 54366 ms · 2026-05-10T14:31:55.697339+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

158 extracted references · 20 canonical work pages

  1. [1]

    Patch-based image inpainting via two-stage low rank approximation

    Guo Q, Gao S, Zhang X, Yin Y and Zhang C. Patch-based image inpainting via two-stage low rank approximation. IEEE Trans on Visualization and Computer Graphics 2017; 24(6): 2023--2036

  2. [2]

    Image inpainting

    Bertalmio M, Sapiro G, Caselles V and Ballester C. Image inpainting. In Proc. ACM SIGGRAPH Conf. on Computer Graphics. pp. 417--424

  3. [3]

    Region filling and object removal by exemplar-based image inpainting

    Criminisi A, Perez P and Toyama K. Region filling and object removal by exemplar-based image inpainting. IEEE Trans on Image Processing 2004; 13(9): 1200--1212. doi:10.1109/TIP.2004.833105

  4. [4]

    Texture synthesis by non-parametric sampling

    Efros A and Leung T. Texture synthesis by non-parametric sampling. In Proc. IEEE Intl. Conf. on Computer Vision , volume 2. pp. 1033--1038 vol.2. doi:10.1109/ICCV.1999.790383

  5. [5]

    Facial image inpainting with variational autoencoder

    Tu CT and Chen YF. Facial image inpainting with variational autoencoder. In 2019 2nd international conference of intelligent robotic and control engineering (IRCE). IEEE, pp. 119--122

  6. [6]

    Image inpainting using autoencoder and guided selection of predicted pixels

    Givkashi MH, Hadipour M, PariZanganeh A, Nabizadeh Z, Karimi N and Samavi S. Image inpainting using autoencoder and guided selection of predicted pixels. In 2022 30th International Conference on Electrical Engineering (ICEE). IEEE, pp. 700--704

  7. [7]

    Deep learning-based image and video inpainting: A survey

    Quan W, Chen J, Liu Y, Yan DM and Wonka P. Deep learning-based image and video inpainting: A survey. Intl J of Computer Vision 2024; 132(7): 2367--2400

  8. [8]

    Deep learning for image inpainting: A survey

    Xiang H, Zou Q, Nawaz MA, Huang X, Zhang F and Yu H. Deep learning for image inpainting: A survey. Pattern Recognition 2023; 134: 109046

  9. [9]

    Context encoders: Feature learning by inpainting

    Pathak D, Krähenbühl P, Donahue J, Darrell T and Efros AA. Context encoders: Feature learning by inpainting. In IEEE Conf. on Computer Vision and Pattern Recognition . pp. 2536--2544. doi:10.1109/CVPR.2016.278

  10. [10]

    Free-form image inpainting with gated convolution

    Yu J, Lin Z, Yang J, Shen X, Lu X and Huang T. Free-form image inpainting with gated convolution. In IEEE/CVF Intl. Conf. on Computer Vision . pp. 4470--4479. doi:10.1109/ICCV.2019.00457

  11. [11]

    Resolution-robust large mask inpainting with fourier convolutions

    Suvorov R, Logacheva E, Mashikhin A, Remizova A, Ashukha A, Silvestrov A, Kong N, Goka H, Park K and Lempitsky V. Resolution-robust large mask inpainting with fourier convolutions. In IEEE/CVF Winter Conf. on Appl. of Computer Vision . pp. 3172--3182. doi:10.1109/WACV51458.2022.00323

  12. [12]

    Occlusion aware unsupervised learning of optical flow

    Wang Y, Yang Y, Yang Z, Zhao L, Wang P and Xu W. Occlusion aware unsupervised learning of optical flow. In IEEE Conf. on Computer Vision and Pattern Recognition . pp. 4884--4893

  13. [13]

    Mask r-cnn

    He K, Gkioxari G, Dollár P and Girshick R. Mask r-cnn. In IEEE/CVF Intl. Conf. on Computer Vision. pp. 2961--2969

  14. [14]

    Rgi: robust gan-inversion for mask-free image inpainting and unsupervised pixel-wise anomaly detection

    Mou S, Gu X, Cao M, Bai H, Huang P, Shan J and Shi J. Rgi: robust gan-inversion for mask-free image inpainting and unsupervised pixel-wise anomaly detection. In Intl. Conf. on Learning Rep

  15. [15]

    Vcnet: A robust approach to blind image inpainting

    Wang Y, Chen YC, Tao X and Jia J. Vcnet: A robust approach to blind image inpainting. In Proc. of the European Conf. on Computer Vision. Springer, pp. 752--768

  16. [16]

    Inpaint anything: Segment anything meets image inpainting

    Yu T, Feng R, Feng R, Liu J, Jin X, Zeng W and Chen Z. Inpaint anything: Segment anything meets image inpainting. arXiv preprint 2023

  17. [17]

    Empty cities: Image inpainting for a dynamic-object-invariant space

    Bescos B, Neira J, Siegwart R and Cadena C. Empty cities: Image inpainting for a dynamic-object-invariant space. In IEEE Intl. Conf. Robot. Autom. IEEE, pp. 5460--5466

  18. [18]

    Patch-based image inpainting with generative adversarial networks

    Demir U and Unal G. Patch-based image inpainting with generative adversarial networks. arXiv preprint 2018

  19. [19]

    Wasserstein generative adversarial networks

    Arjovsky M, Chintala S and Bottou L. Wasserstein generative adversarial networks. In Intl. Conf. on Machine Learning. PMLR, pp. 214--223

  20. [20]

    Improved training of wasserstein gans

    Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V and Courville AC. Improved training of wasserstein gans. Advances in Neural Information Processing Systems 2017; 30

  21. [21]

    Image inpainting via context discriminator and u-net

    Wei R and Wu Y. Image inpainting via context discriminator and u-net. Math Probs in Engg 2022; 2022(1): 7328045

  22. [22]

    Globally and locally consistent image completion

    Iizuka S, Simo-Serra E and Ishikawa H. Globally and locally consistent image completion. ACM Transactions on Graphics (ToG) 2017; 36(4): 1--14

  23. [23]

    Image inpainting for irregular holes using partial convolutions

    Liu G, Reda FA, Shih KJ, Wang TC, Tao A and Catanzaro B. Image inpainting for irregular holes using partial convolutions. In Proc. of the European Conf. on Computer Vision. pp. 85--100

  24. [24]

    Repaint: Inpainting using denoising diffusion probabilistic models

    Lugmayr A, Danelljan M, Romero A, Yu F, Timofte R and Van Gool L. Repaint: Inpainting using denoising diffusion probabilistic models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 11461--11471

  25. [25]

    A superior image inpainting scheme using transformer-based self-supervised attention gan model

    Zhou M, Liu X, Yi T, Bai Z and Zhang P. A superior image inpainting scheme using transformer-based self-supervised attention gan model. Expert Syst with Appl 2023; 233: 120906

  26. [26]

    T-former: An efficient transformer for image inpainting

    Deng Y, Hui S, Zhou S, Meng D and Wang J. T-former: An efficient transformer for image inpainting. In Proc. ACM Intl. Conf. on multimedia. pp. 6559--6568

  27. [27]

    Mat: Mask-aware transformer for large hole image inpainting

    Li W, Lin Z, Zhou K, Qi L, Wang Y and Jia J. Mat: Mask-aware transformer for large hole image inpainting. In IEEE Conf. on Computer Vision and Pattern Recognition . pp. 10748--10758. doi:10.1109/CVPR52688.2022.01049

  28. [28]

    Attention gated networks: Learning to leverage salient regions in medical images

    Schlemper J, Oktay O, Schaap M, Heinrich M, Kainz B, Glocker B and Rueckert D. Attention gated networks: Learning to leverage salient regions in medical images. Medical Image Analysis 2019; 53: 197--207

  29. [29]

    2d human pose estimation: New benchmark and state of the art analysis

    Andriluka M, Pishchulin L, Gehler P and Schiele B. 2d human pose estimation: New benchmark and state of the art analysis. In IEEE Conf. on Computer Vision and Pattern Recognition . pp. 3686--3693

  30. [30]

    A flexible new technique for camera calibration

    Zhang Z. A flexible new technique for camera calibration. IEEE Trans on Pattern Analysis and Machine Intelligence 2000; 22: 1330--1334

  31. [31]

    A tutorial on visual servo control

    Hutchinson S, Hager GD and Corke PI. A tutorial on visual servo control. IEEE Trans Robot 1996; 12: 651--670. doi:10.1109/70.538972

  32. [32]

    Camera calibration and 3d reconstruction (OpenCV documentation)

    Camera calibration and 3d reconstruction. https://docs.opencv.org/4.x/d9/d0c/group__calib3d.html

  33. [33]

    Camera-to-robot pose estimation from a single image

    Lee TE, Tremblay J, To T, Cheng J, Mosier T, Kroemer O, Fox D and Birchfield S. Camera-to-robot pose estimation from a single image. In IEEE Intl. Conf. Robot. Autom. pp. 9426--9432

  34. [34]

    Deeppose: Human pose estimation via deep neural networks

    Toshev A and Szegedy C. Deeppose: Human pose estimation via deep neural networks. In IEEE Conf. on Computer Vision and Pattern Recognition . pp. 1653--1660

  35. [35]

    Dynamic visual servoing with kalman filter-based depth and velocity estimator

    Chang TY, Chang WC, Cheng MY and Yang SS. Dynamic visual servoing with kalman filter-based depth and velocity estimator. Intl J of Advanced Robotic Syst 2021; 18. doi:10.1177/17298814211016674

  36. [36]

    Experimental evaluation of uncalibrated visual servoing for precision manipulation

    Jagersand M, Fuentes O and Nelson R. Experimental evaluation of uncalibrated visual servoing for precision manipulation. IEEE Intl Conf Robot Autom 1997; 4: 2874--2880. doi:10.1109/robot.1997.606723

  37. [37]

    Faster R-CNN: Towards real-time object detection with region proposal networks

    Ren S, He K, Girshick R and Sun J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans on Pattern Analysis and Machine Intelligence 2017; 39: 1137--1149. doi:10.1109/TPAMI.2016.2577031

  38. [38]

    How to train a custom keypoint detection model with pytorch, 2021

    P A. How to train a custom keypoint detection model with pytorch, 2021

  39. [39]

    Human pose estimation using keypoint rcnn in pytorch, 2021

    Patil C and Gupta V. Human pose estimation using keypoint rcnn in pytorch, 2021

  40. [40]

    Microsoft coco: Common objects in context

    Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P and Zitnick CL. Microsoft coco: Common objects in context. In Computer Vision--ECCV, Zurich, Switzerland, Proceedings, Part V 13. Springer, pp. 740--755

  41. [41]

    OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields

    Cao Z, Hidalgo G, Simon T, Wei SE and Sheikh Y. Openpose: Realtime multi-person 2d pose estimation using part affinity fields. IEEE Trans on Pattern Analysis and Machine Intelligence 2021; 43: 172--186. doi:10.1109/TPAMI.2019.2929257

  42. [42]

    Pose estimation for robot manipulators via keypoint optimization and sim-to-real transfer

    Lu J, Richter F and Yip MC. Pose estimation for robot manipulators via keypoint optimization and sim-to-real transfer. IEEE Robot Autom Letters 2022; 7: 4622--4629. doi:10.1109/LRA.2022.3151981

  43. [43]

    Robust jacobian estimation for uncalibrated visual servoing

    Shademan A, Farahmand AM and Jägersand M. Robust jacobian estimation for uncalibrated visual servoing. In IEEE Intl. Conf. Robot. Autom. pp. 5564--5569

  44. [44]

    Springer Handbook of Robotics

    Chaumette F, Hutchinson S and Corke P. Springer Handbook of Robotics. Springer, 2016

  45. [45]

    Versatile visual servoing without knowledge of true jacobian

    Hosoda K and Asada M. Versatile visual servoing without knowledge of true jacobian. In IEEE/RSJ Intl. Conf. Intell. Robots and Syst. pp. 186--193. doi:10.1109/IROS.1994.407392

  46. [46]

    Introduction to Robotics Mechanics and Control

    Craig JJ. Introduction to Robotics Mechanics and Control. 3 ed. Pearson Education International, 2005

  47. [47]

    Camper's plane localization and head pose estimation based on multi-view rgbd sensors

    Wang H, Huang L, Yu K, Song T, Yuan F, Yang H and Zhang H. Camper's plane localization and head pose estimation based on multi-view rgbd sensors. IEEE Access 2022; 10: 131722--131734. doi:10.1109/ACCESS.2022.3227572

  48. [48]

    A human-robot collaboration method using a pose estimation network for robot learning of assembly manipulation trajectories from demonstration videos

    Deng X, Liu J, Gong H, Gong H and Huang J. A human-robot collaboration method using a pose estimation network for robot learning of assembly manipulation trajectories from demonstration videos. IEEE Trans on Industrial Informatics 2022; doi:10.1109/TII.2022.3224966

  49. [49]

    Toward vision-based adaptive configuring of a bidirectional two-segment soft continuum manipulator

    Lai J, Huang K, Lu B and Chu HK. Toward vision-based adaptive configuring of a bidirectional two-segment soft continuum manipulator . IEEE/ASME Intl Conf on Adv Intell Mechatronics, AIM 2020; July: 934--939. doi:10.1109/AIM43001.2020.9158975

  50. [50]

    Vision-based precision manipulation with underactuated hands: Simple and effective solutions for dexterity

    Calli B and Dollar AM. Vision-based precision manipulation with underactuated hands: Simple and effective solutions for dexterity. In IEEE/RSJ Intl. Conf. Intell. Robots and Syst. pp. 1012--1018. doi:10.1109/IROS.2016.7759173

  51. [51]

    PyTorch

    Imambi S, Prakash KB and Kanagachidambaresan GR. PyTorch. Cham: Springer International Publishing, 2021. pp. 87--104. ISBN 978-3-030-57077-4. doi:10.1007/978-3-030-57077-4_10

  52. [52]

    Modern Computer Vision with PyTorch: Explore deep learning concepts and implement over 50 real-world image applications

    Ayyadevara V and Reddy Y. Modern Computer Vision with PyTorch: Explore deep learning concepts and implement over 50 real-world image applications. Packt Publishing, 2020. ISBN 9781839216534

  53. [53]

    Densepose: Dense human pose estimation in the wild

    Güler RA, Neverova N and Kokkinos I. Densepose: Dense human pose estimation in the wild. In IEEE Conf. on Computer Vision and Pattern Recognition. pp. 7297--7306

  54. [54]

    Hybrid multi-camera visual servoing to moving target

    Cuevas-Velasquez H, Li N, Tylecek R, Saval-Calvo M and Fisher RB. Hybrid multi-camera visual servoing to moving target. In IEEE/RSJ Intl. Conf. Intell. Robots and Syst. pp. 1132--1137

  55. [55]

    Reaching and grasping of objects by humanoid robots through visual servoing

    Ardón P, Dragone M and Erden MS. Reaching and grasping of objects by humanoid robots through visual servoing. In Haptics: Science, Technology, and Applications: 11th Intl. Conf., EuroHaptics, Proceedings, Part II 11. Springer, pp. 353--365

  56. [56]

    Resolution-robust large mask inpainting with fourier convolutions

    Suvorov R, Logacheva E, Mashikhin A, Remizova A, Ashukha A, Silvestrov A, Kong N, Goka H, Park K and Lempitsky V. Resolution-robust large mask inpainting with fourier convolutions. In IEEE/CVF Winter Conf. on Appl. of Computer Vision . pp. 2149--2159

  57. [57]

    Detectron2 2019

    Wu Y, Kirillov A, Massa F, Lo WY and Girshick R. Detectron2 2019

  58. [58]

    Ntire 2022 image inpainting challenge: Report

    Romero A, Castillo A, Abril-Nova J, Timofte R, Das R, Hira S, Pan Z, Zhang M, Li B, He D, Lin T, Li F, Wu C, Liu X, Wang X, Yu Y, Yang J, Li R, Zhao Y, Guo Z, Fan B, Li X, Zhang R, Lu Z, Huang J, Wu G, Jiang J, Cai J, Li C, Tao X, Tai YW, Zhou X and Huang H. Ntire 2022 image inpainting challenge: Report. In IEEE Conf. on Computer Vision and Pattern Recogn...

  59. [59]

    Large scale image completion via co-modulated generative adversarial networks

    Zhao S, Cui J, Sheng Y, Dong Y, Liang X, Chang EI and Xu Y. Large scale image completion via co-modulated generative adversarial networks. arXiv preprint arXiv:210310428 2021

  60. [60]

    Online marker-free extrinsic camera calibration using person keypoint detections

    Pätzold B, Bultmann S and Behnke S. Online marker-free extrinsic camera calibration using person keypoint detections. In DAGM German Conf. on Pattern Recognition. Springer, pp. 300--316

  61. [61]

    Robot arm pose estimation through pixel-wise part classification

    Bohg J, Romero J, Herzog A and Schaal S. Robot arm pose estimation through pixel-wise part classification. In IEEE Intl. Conf. Robot. Autom. pp. 3143--3150

  62. [62]

    Keypoints-based adaptive visual servoing for control of robotic manipulators in configuration space

    Chatterjee S, Karade AC, Gandhi A and Calli B. Keypoints-based adaptive visual servoing for control of robotic manipulators in configuration space. In IEEE/RSJ Intl. Conf. Intell. Robots and Syst. pp. 6387--6394

  63. [63]

    Optimizing keypoint-based single-shot camera-to-robot pose estimation through shape segmentation

    Lambrecht J, Grosenick P and Meusel M. Optimizing keypoint-based single-shot camera-to-robot pose estimation through shape segmentation. In IEEE Intl. Conf. Robot. Autom. pp. 13843--13849

  64. [64]

    A vision-based marker-less pose estimation system for articulated construction robots

    Liang CJ, Lundeen KM, McGee W, Menassa CC, Lee S and Kamat VR. A vision-based marker-less pose estimation system for articulated construction robots. Autom in Construction 2019; 104: 80--94

  65. [65]

    A review on vision-based control of robot manipulators

    Hashimoto K. A review on vision-based control of robot manipulators. Adv Robot 2003; 17(10): 969--991

  66. [66]

    A tutorial on visual servo control

    Hutchinson S, Hager GD and Corke PI. A tutorial on visual servo control. IEEE Trans Robot 1996; 12(5): 651--670

  67. [67]

    Towards high performance human keypoint detection

    Zhang J, Chen Z and Tao D. Towards high performance human keypoint detection. Intl J of Computer Vision 2021; 129(9): 2639--2662

  68. [68]

    Feature refinement to improve high resolution image inpainting

    Kulshreshtha P, Pugh B and Jiddi S. Feature refinement to improve high resolution image inpainting. arXiv preprint arXiv:220613644 2022

  69. [69]

    Free-form image inpainting with gated convolution

    Yu J, Lin Z, Yang J, Shen X, Lu X and Huang TS. Free-form image inpainting with gated convolution. In IEEE/CVF Intl. Conf. on Computer Vision . pp. 4471--4480

  70. [70]

    Edgeconnect: Generative image inpainting with adversarial edge learning

    Nazeri K, Ng E, Joseph T, Qureshi FZ and Ebrahimi M. Edgeconnect: Generative image inpainting with adversarial edge learning. arXiv preprint arXiv:190100212 2019

  71. [71]

    Image inpainting by end-to-end cascaded refinement with mask awareness

    Zhu M, He D, Li X, Li C, Li F, Liu X, Ding E and Zhang Z. Image inpainting by end-to-end cascaded refinement with mask awareness. IEEE Trans on Img Processing 2021; 30: 4855--4866

  72. [72]

    Contextual residual aggregation for ultra high-resolution image inpainting

    Yi Z, Tang Q, Azizi S, Jang D and Xu Z. Contextual residual aggregation for ultra high-resolution image inpainting. In IEEE Conf. on Computer Vision and Pattern Recognition . pp. 7508--7517

  73. [73]

    Path planning using lazy prm

    Bohlin R and Kavraki LE. Path planning using lazy prm. In IEEE Intl. Conf. Robot. Autom. , volume 1. pp. 521--528

  74. [74]

    Skeleton-based adaptive visual servoing for control of robotic manipulators in configuration space

    Gandhi A, Chatterjee S and Calli B. Skeleton-based adaptive visual servoing for control of robotic manipulators in configuration space. In IEEE/RSJ Intl. Conf. Intell. Robots and Syst. pp. 2182--2189

  75. [75]

    Obstacle avoidance using image-based visual servoing integrated with nonlinear model predictive control

    Lee D, Lim H and Kim HJ. Obstacle avoidance using image-based visual servoing integrated with nonlinear model predictive control. In IEEE Conf. on Decision and Control and European Control Conference . pp. 5689--5694

  76. [76]

    Path planning in image space for robust visual servoing

    Mezouar Y and Chaumette F. Path planning in image space for robust visual servoing. In IEEE Intl. Conf. Robot. Autom. , volume 3. pp. 2759--2764

  77. [77]

    End-to-end training of deep visuomotor policies

    Levine S, Finn C, Darrell T and Abbeel P. End-to-end training of deep visuomotor policies. J of Machine Learning Research 2016; 17(39): 1--40. https://jmlr.org/papers/v17/15-522.html

  78. [78]

    Learning latent dynamics for planning from pixels

    Hafner D, Lillicrap T, Fischer I, Villegas R, Ha D, Lee H and Davidson J. Learning latent dynamics for planning from pixels. In Intl. Conf. on Machine Learning. PMLR, pp. 2555--2565

  79. [79]

    Robot motion planning in learned latent spaces

    Ichter B and Pavone M. Robot motion planning in learned latent spaces. IEEE Robot Autom Letters 2019; 4(3): 2407--2414

  80. [80]

    Probabilistic roadmaps for path planning in high-dimensional configuration spaces

    Kavraki LE, Svestka P, Latombe JC and Overmars MH. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Trans Robot 1996; 12(4): 566--580

Showing first 80 references.