pith. sign in

arxiv: 2606.02906 · v1 · pith:QNNRLFWSnew · submitted 2026-06-01 · 📡 eess.IV · cs.CV

Depth from Dual Differential Defocus and Stereo Consensus

Pith reviewed 2026-06-28 11:56 UTC · model grok-4.3

classification 📡 eess.IV cs.CV
keywords depth estimationdepth from defocusstereo visionconsensuspassive rangingcompact rangefinderdual differential defocus
0
0 comments X

The pith

D^3S combines dual differential defocus and stereo consensus to estimate depth accurately with a 10 times smaller baseline than prior triangulation systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents D^3S Consensus as a closed-form method that estimates depth from a pair of dual-defocus stereo images by first computing an overdetermined set of depths using a new Dual Differential Defocus theory together with stereo cues. It then retains only the prediction that shows agreement between these physically independent sources. This produces high-resolution depth maps over an extended range while using a much smaller camera separation than conventional stereo or defocus designs alone. A reader would care because the approach points toward compact passive depth sensors that fit into smaller devices without sacrificing working distance or accuracy.

Core claim

Given dual-defocus stereo images, the method computes depths from both Dual Differential Defocus and stereo, then selects the estimate that enforces consensus between the two cues; this rejects unreliable predictions and yields depth maps with 1 cm mean absolute error over 0.3-1.64 m using only a 4 mm baseline.

What carries the argument

The consensus step that retains only the depth prediction showing agreement between the DfD-derived and stereo-derived estimates.

If this is right

  • Achieves a comparable working range at the same error tolerance with a 10 times smaller baseline than previous triangulation-based depth systems.
  • Enables compact passive binocular rangefinders with substantially smaller form factors than conventional stereo or DfD designs.
  • Produces up to 900 by 1800 pixel depth maps from a single snapshot acquisition.
  • Surpasses the reported accuracy of some commercially available stereo cameras that use much larger baselines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The snapshot nature of the acquisition opens a route to video-rate depth maps if the same consensus logic can be applied across frames.
  • Small-baseline operation could allow depth sensing to be added to space-constrained platforms such as phones or small robots without active illumination.
  • Because the two cues are physically independent, the consensus filter may improve robustness in scenes that contain textureless regions or depth discontinuities.
  • The same fusion idea could be tested with other pairs of passive cues, such as focus and polarization, to further shrink sensor size.

Load-bearing premise

The consensus step between DfD-derived depths and stereo-derived depths can reliably keep accurate estimates and discard unreliable ones without introducing systematic bias or losing coverage.

What would settle it

Ground-truth depth measurements on scenes engineered so that defocus and stereo cues disagree on the same points; check whether the consensus rule still returns the correct value or instead selects an erroneous depth when both cues err.

Figures

Figures reproduced from arXiv: 2606.02906 by Dylan Chu, Emma Alexander, Junjie Luo, Qi Guo, Wei Xu.

Figure 1
Figure 1. Figure 1: (a) Overview of D3S Consensus. It utilizes a compact binocular camera system to capture two images of different defocus and perspectives (blue and red boxes). It then use the images to generate a highly accurate sparse depth map through a purely physics-based algorithm, which can be optionally densified using off-the-shelf depth completion models, e.g. [9]. (b) The accuracy of our D3S prototype is sufficie… view at source ↗
Figure 2
Figure 2. Figure 2: The image formed on the photosensor at time [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 2
Figure 2. Figure 2: Image formation model. A front-parallel target with texture T(\x ) is imaged by a thin-lens camera. This paper derives an analytical relationship between the image derivatives and target’s depth from the camera Z when some or all of the optical pa￾rameters, including optical power \rho , aperture radius A, sensor distance s, and camera￾center position (\x _0, z) are differentially varied. General Depth Rel… view at source ↗
Figure 3
Figure 3. Figure 3: Overview of D3S Consensus. (a) Consider two cameras centered at X0 and X1 with different optical powers imaging a front-parallel target at depth Z. We shift the captured images I1 towards I0 by ∆x so that the effective baseline distance between I0 and the shifted I1, ∆X, is sufficiently small for accurately computing Eq. 13 via finite difference approximation. (b) To solve the target depth, D3S Consensus g… view at source ↗
Figure 4
Figure 4. Figure 4: D 3S Consensus Prototype. (a) CAD rendering of the system assembly. (b) Photograph of the built prototype. (c) Sample image pair captured in a laboratory environment. The raw measurement exhibits slight overlap between the two views; non-overlapping regions are cropped to obtain I0 and I1. A measuring tape is included in the scene to indicate the focal plane of each image. The images I0 and I1 contain both… view at source ↗
Figure 5
Figure 5. Figure 5: Analysis of the D3S Consensus prototype in real experiments. (a) Mean absolute error (MAE) as a function of ground-truth depth under different confidence thresholds. With a confidence threshold of Cthre = 0.8, the prototype achieves a working range from 0.3 m to 1.64 m while maintaining MAE below 1 cm. (b) Working range (blue) and prediction density (red) as functions of the confidence threshold. Increasin… view at source ↗
Figure 6
Figure 6. Figure 6: Projected working ranges of prior depth-from-defocus and dual-defocus stereo systems [17, 40, 42, 44–46], estimated from their reported simulation performance. The dashed line illustrates the traditional trade-off between system compactness and work￾ing range for triangulation-based rangefinders. Our method departs from this trend by achieving a substantially more compact form factor while maintaining a lo… view at source ↗
Figure 7
Figure 7. Figure 7: Sample real-world scenes captured by the D3S prototype. The sparse depth maps generated by D3S (second column) are filtered using Eq. 16 with a constant threshold Cthre = 0.5. These results demonstrate reliable depth estimation across di￾verse objects and surface textures. The densified depth maps (third column) are ob￾tained by feeding the sparse depth maps together with the image I0 into an off-the-shelf… view at source ↗
Figure 6
Figure 6. Figure 6: Our system achieves a favorable combination of compactness and operat [PITH_FULL_IMAGE:figures/full_fig_p026_6.png] view at source ↗
read the original abstract

We introduce D^3S Consensus, a physics-based, closed-form algorithm that unifies depth-from-defocus (DfD) and stereo to achieve highly accurate depth estimation throughout an extended working range beyond the depth-of-field (DoF) of cameras. Given a pair of dual-defocus stereo images, the method estimates an overdetermined set of depth using a novel DfD theory, Dual Differential Defocus (D^3), and (S)tereo in a coupled fashion. It then picks the most confident depth prediction from the set by enforcing consensus between these physically independent cues to reject unreliable estimates. Analysis shows that D^3S achieves a comparable working range under the same error tolerance with 10x smaller baseline than previous triangulation-based depth estimation systems. This enables compact passive binocular rangefinders with substantially smaller form factors than conventional stereo and DfD designs. We demonstrate the first D^3S prototype with only 4 mm baseline and 12 mm EFL. It generates up to 900 x 1800-pixel depth maps with 1-cm mean absolute error over 0.3-1.64 m from a snapshot acquisition. This has surpassed the reported accuracy of certain commercially available stereo cameras with much larger form factors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces D^3S Consensus, a physics-based closed-form algorithm that unifies Dual Differential Defocus (D^3) and stereo cues from a pair of dual-defocus stereo images. It computes an overdetermined set of depth estimates and selects the most confident value by enforcing consensus between the physically independent cues, thereby rejecting unreliable estimates. The central claim is that this yields a comparable working range under the same error tolerance with a 10x smaller baseline than prior triangulation-based systems; a prototype with 4 mm baseline and 12 mm EFL is reported to produce up to 900 x 1800-pixel depth maps with 1 cm MAE over 0.3-1.64 m from a single snapshot.

Significance. If the independence of the D^3 and stereo cues and the unbiased filtering property of the consensus rule hold, the result would enable compact passive depth sensors with substantially reduced form factors while extending usable range beyond the camera DoF; this would be a meaningful advance for applications requiring small-baseline binocular rangefinders.

major comments (2)
  1. [Abstract] Abstract: the assertion that 'analysis shows' a 10x baseline reduction rests on an unshown derivation of how the closed-form consensus step preserves working range under fixed error tolerance when the stereo cue is weakened by the 4 mm baseline; the manuscript must supply the explicit equations or bounds that establish this scaling without introducing systematic retention of outliers.
  2. [Consensus step] Consensus step (described in abstract): the load-bearing assumption that the consensus metric reliably retains only estimates whose error lies below tolerance, without coverage loss or bias when cues share error modes (textureless patches, DoF boundaries, or common illumination), is not yet supported by the provided text; a concrete test or counter-example analysis is required to substantiate the 10x claim.
minor comments (1)
  1. [Prototype results] Prototype results paragraph: specify the exact functional form of the consensus metric and the threshold used to select the final depth value.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and will revise the paper to incorporate the requested material.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the assertion that 'analysis shows' a 10x baseline reduction rests on an unshown derivation of how the closed-form consensus step preserves working range under fixed error tolerance when the stereo cue is weakened by the 4 mm baseline; the manuscript must supply the explicit equations or bounds that establish this scaling without introducing systematic retention of outliers.

    Authors: We agree the abstract claim requires explicit support. The manuscript contains a theoretical analysis of the consensus step, but we acknowledge the scaling derivation with respect to baseline is not presented with sufficient detail or bounds. In the revised version we will add the explicit equations and error-tolerance bounds in the theory section, showing how consensus preserves range at the 4 mm baseline while excluding systematic outlier retention. revision: yes

  2. Referee: [Consensus step] Consensus step (described in abstract): the load-bearing assumption that the consensus metric reliably retains only estimates whose error lies below tolerance, without coverage loss or bias when cues share error modes (textureless patches, DoF boundaries, or common illumination), is not yet supported by the provided text; a concrete test or counter-example analysis is required to substantiate the 10x claim.

    Authors: The consensus rule is predicated on the physical independence of the D^3 and stereo cues. We recognize that the current text does not supply explicit validation against shared error modes. The revised manuscript will add a dedicated analysis subsection containing concrete tests and counter-example evaluations on textureless patches, DoF boundaries, and common illumination variations to demonstrate retention of estimates within tolerance and to support the 10x baseline result. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation relies on independent physical cues and external analysis

full rationale

The abstract and description present D^3 as a novel DfD theory combined with stereo via a consensus step on physically independent cues. The 10x baseline claim is stated to follow from analysis of the overdetermined set, with no quoted equations or steps showing that any prediction reduces by construction to a fitted input, self-citation chain, or renamed ansatz. The method is benchmarked against external systems and prototypes, satisfying the self-contained criterion. No load-bearing step matches the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The method rests on standard optics and stereo geometry assumptions plus the domain claim that the two cues are independent enough for consensus to work; no free parameters or invented entities are named in the abstract.

axioms (1)
  • domain assumption Dual Differential Defocus and stereo provide physically independent depth cues whose agreement can be used to reject unreliable estimates.
    Central to the consensus step described in the abstract.

pith-pipeline@v0.9.1-grok · 5753 in / 1158 out tokens · 22709 ms · 2026-06-28T11:56:09.287674+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

72 extracted references

  1. [1]

    FirstName LastName , title =

  2. [2]

    FirstName Alpher , title =

  3. [3]

    Journal of Foo , volume = 13, number = 1, pages =

    FirstName Alpher and FirstName Fotheringham-Smythe , title =. Journal of Foo , volume = 13, number = 1, pages =

  4. [4]

    Journal of Foo , volume = 14, number = 1, pages =

    FirstName Alpher and FirstName Fotheringham-Smythe and FirstName Gamow , title =. Journal of Foo , volume = 14, number = 1, pages =

  5. [5]

    FirstName Alpher and FirstName Gamow , title =

  6. [6]

    2019 , publisher =

    A theory of depth from differential defocus , author =. 2019 , publisher =

  7. [7]

    Proceedings of the IEEE international conference on computer vision , pages=

    Focal track: Depth and accommodation with oscillating lens deformation , author=. Proceedings of the IEEE international conference on computer vision , pages=

  8. [8]

    Computer Vision--ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part III 14 , pages=

    Focal flow: Measuring distance and velocity with defocus and differential motion , author=. Computer Vision--ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part III 14 , pages=. 2016 , organization=

  9. [9]

    International Journal of Computer Vision , volume=

    Focal flow: Velocity and depth from differential defocus through motion , author=. International Journal of Computer Vision , volume=. 2018 , publisher=

  10. [10]

    Proceedings of the National Academy of Sciences , volume=

    Compact single-shot metalens depth sensors inspired by eyes of jumping spiders , author=. Proceedings of the National Academy of Sciences , volume=. 2019 , publisher=

  11. [11]

    Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

    Focal Split: Untethered Snapshot Depth from Differential Defocus , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

  12. [12]

    Luo et al

    Depth from Coupled Optical Differentiation: J. Luo et al. , author=. International Journal of Computer Vision , pages=. 2025 , publisher=

  13. [13]

    Computer vision , pages=

    Photon, poisson noise , author=. Computer vision , pages=. 2014 , publisher=

  14. [14]

    Acs Photonics , volume=

    3D imaging using extreme dispersion in optical metasurfaces , author=. Acs Photonics , volume=. 2021 , publisher=

  15. [15]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Single image depth estimation trained via depth from defocus cues , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  16. [16]

    2021 IEEE International Conference on Computational Photography (ICCP) , pages=

    Depth from defocus with learned optics for imaging and occlusion-aware depth estimation , author=. 2021 IEEE International Conference on Computational Photography (ICCP) , pages=. 2021 , organization=

  17. [17]

    IEEE Transactions on Image Processing , volume=

    Joint depth and defocus estimation from a single image using physical consistency , author=. IEEE Transactions on Image Processing , volume=. 2021 , publisher=

  18. [18]

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

    Fully self-supervised depth estimation from defocus clue , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

  19. [19]

    Machine vision and applications , volume=

    Deblur and deep depth from single defocus image , author=. Machine vision and applications , volume=. 2021 , publisher=

  20. [20]

    IEEE Transactions on Computational Imaging , volume=

    Depth estimation and image restoration by deep learning from defocused images , author=. IEEE Transactions on Computational Imaging , volume=. 2023 , publisher=

  21. [21]

    Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

    Learned Binocular-Encoding Optics for RGBD Imaging Using Joint Stereo and Focus Cues , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

  22. [22]

    Ou et al

    Learning RGBD imaging via asymmetrically focused stereo cameras: L. Ou et al. , author=. The Visual Computer , pages=. 2025 , publisher=

  23. [23]

    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

    Codedstereo: Learned phase masks for large depth-of-field stereo , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

  24. [24]

    European Conference on Computer Vision , pages=

    Low-Cost Stereoscopic Optical-Coding Design for Depth Estimation Using End-to-End Optimization , author=. European Conference on Computer Vision , pages=. 2024 , organization=

  25. [25]

    International Conference on Computer Vision Theory and Applications , volume=

    Coded aperture stereo-for extension of depth of field and refocusing , author=. International Conference on Computer Vision Theory and Applications , volume=. 2012 , organization=

  26. [26]

    Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

    Depth from defocus in the wild , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

  27. [27]

    Journal of Electronic Imaging , volume=

    Dual-focus stereo imaging , author=. Journal of Electronic Imaging , volume=. 2010 , publisher=

  28. [28]

    Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

    Fusing depth from defocus and stereo with coded apertures , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

  29. [29]

    ACM Transactions on Applied Perception (TAP) , volume=

    Depth of field affects perceived depth in stereographs , author=. ACM Transactions on Applied Perception (TAP) , volume=. 2014 , publisher=

  30. [30]

    ACM Transactions on Graphics (TOG) , volume=

    Split-aperture 2-in-1 computational cameras , author=. ACM Transactions on Graphics (TOG) , volume=. 2024 , publisher=

  31. [31]

    International Journal of Computer Vision , volume=

    Rational filters for passive depth from defocus , author=. International Journal of Computer Vision , volume=. 1998 , publisher=

  32. [32]

    Journal of the Optical Society of America A , volume=

    Range estimation by optical differentiation , author=. Journal of the Optical Society of America A , volume=. 1998 , publisher=

  33. [33]

    The international journal of robotics research , volume=

    Vision meets robotics: The kitti dataset , author=. The international journal of robotics research , volume=. 2013 , publisher=

  34. [34]

    German conference on pattern recognition , pages=

    High-resolution stereo datasets with subpixel-accurate ground truth , author=. German conference on pattern recognition , pages=. 2014 , organization=

  35. [35]

    IEEE transactions on pattern analysis and machine intelligence , number=

    A new sense for depth of field , author=. IEEE transactions on pattern analysis and machine intelligence , number=. 1987 , publisher=

  36. [36]

    stereo: How different really are they? , author=

    Depth from defocus vs. stereo: How different really are they? , author=. International Journal of Computer Vision , volume=. 2000 , publisher=

  37. [37]

    Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

    BridgeDepth: Bridging Monocular and Stereo Reasoning with Latent Alignment , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

  38. [38]

    Advances in Neural Information Processing Systems , volume=

    Depth anything v2 , author=. Advances in Neural Information Processing Systems , volume=

  39. [39]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Repurposing diffusion-based image generators for monocular depth estimation , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  40. [40]

    International Journal of computer vision , volume=

    Depth from defocus: A spatial domain approach , author=. International Journal of computer vision , volume=. 1994 , publisher=

  41. [41]

    2009 IEEE 12th international conference on computer vision , pages=

    Coded aperture pairs for depth from defocus , author=. 2009 IEEE 12th international conference on computer vision , pages=. 2009 , organization=

  42. [42]

    2003 , publisher=

    Multiple view geometry in computer vision , author=. 2003 , publisher=

  43. [43]

    2008 IEEE conference on computer vision and pattern recognition , pages=

    Variable baseline/resolution stereo , author=. 2008 IEEE conference on computer vision and pattern recognition , pages=. 2008 , organization=

  44. [44]

    Robotics and Autonomous Systems , volume=

    Depth accuracy analysis of the ZED 2i Stereo Camera in an indoor Environment , author=. Robotics and Autonomous Systems , volume=. 2024 , publisher=

  45. [45]

    Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

    Marigold-dc: Zero-shot monocular depth completion with guided diffusion , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

  46. [46]

    2022 , publisher=

    Computer vision: algorithms and applications , author=. 2022 , publisher=

  47. [47]

    Intel RealSenseTM Depth Camera D435 , publisher=

    Intel realsenseTM depth camera D435 , url=. Intel RealSenseTM Depth Camera D435 , publisher=

  48. [48]

    IEEE Access , year=

    Empirical comparison of four stereoscopic depth sensing cameras for robotics applications , author=. IEEE Access , year=

  49. [49]

    Intel realsenseTM depth camera D455 , publisher=

  50. [50]

    Zed 2 - AI stereo camera , publisher=

  51. [51]

    2019 IEEE International Conference on Computational Photography (ICCP) , pages=

    Phasecam3d—learning phase masks for passive single view depth estimation , author=. 2019 IEEE International Conference on Computational Photography (ICCP) , pages=. 2019 , organization=

  52. [52]

    Proceedings of the IEEE/CVF international conference on computer vision , pages=

    Deep optics for monocular depth estimation and 3d object detection , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=

  53. [53]

    2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05) , volume=

    Accurate and efficient stereo processing by semi-global matching and mutual information , author=. 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05) , volume=. 2005 , organization=

  54. [54]

    Microsoft Research Technical Report , year=

    Micro-baseline stereo , author=. Microsoft Research Technical Report , year=

  55. [55]

    ACM Transactions on Graphics (ToG) , volume=

    Fast depth densification for occlusion-aware augmented reality , author=. ACM Transactions on Graphics (ToG) , volume=. 2018 , publisher=

  56. [56]

    ACM transactions on graphics (TOG) , volume=

    Image and depth from a conventional camera with a coded aperture , author=. ACM transactions on graphics (TOG) , volume=. 2007 , publisher=

  57. [57]

    European conference on computer vision , pages=

    The fast bilateral solver , author=. European conference on computer vision , pages=. 2016 , organization=

  58. [58]

    Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

    Blurry-Edges: Photon-Limited Depth Estimation from Defocused Boundaries , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

  59. [59]

    2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=

    Depth coefficients for depth completion , author=. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=. 2019 , organization=

  60. [60]

    2018 IEEE international conference on robotics and automation (ICRA) , pages=

    Sparse-to-dense: Depth prediction from sparse depth samples and a single image , author=. 2018 IEEE international conference on robotics and automation (ICRA) , pages=. 2018 , organization=

  61. [61]

    IEEE Transactions on Image Processing , volume=

    Learning guided convolutional network for depth completion , author=. IEEE Transactions on Image Processing , volume=. 2020 , publisher=

  62. [62]

    arXiv preprint arXiv:2207.14780 , year=

    D-flat: A differentiable flat-optics framework for end-to-end metasurface visual sensor design , author=. arXiv preprint arXiv:2207.14780 , year=

  63. [63]

    To appear in Proceedings of the Computer Vision and Pattern Recognition Conference , year=

    SpiderCam: Low-Power Snapshot Depth from Differential Defocus , author=. To appear in Proceedings of the Computer Vision and Pattern Recognition Conference , year=

  64. [64]

    IEICE TRANSACTIONS on Information and Systems , volume=

    Depth from defocus technique based on cross reblurring , author=. IEICE TRANSACTIONS on Information and Systems , volume=. 2019 , publisher=

  65. [65]

    CUReT: Columbia Utrecht reflectance and texture database , author=

  66. [66]

    International journal of computer vision , volume=

    Distinctive image features from scale-invariant keypoints , author=. International journal of computer vision , volume=. 2004 , publisher=

  67. [67]

    In: European conference on computer vision, Springer (2014) 740–755

    Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: European conference on computer vision, Springer (2014) 740–755

  68. [68]

    In: European conference on computer vision, Springer (2014) 54–70

    Crowley, E.J., Zisserman, A.: In search of art. In: European conference on computer vision, Springer (2014) 54–70

  69. [69]

    In: Proceedings of the IEEE conference on computer vision and pattern recognition

    Mayer, N., Ilg, E., Hausser, P., Fischer, P., Cremers, D., Dosovitskiy, A., Brox, T.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. (2016) 4040–4048

  70. [70]

    Science China Information Sciences 63(11) (2020) 212101

    Bao, W., Wang, W., Xu, Y., Guo, Y., Hong, S., Zhang, X.: Instereo2k: a large real dataset for stereo matching in indoor scenes. Science China Information Sciences 63(11) (2020) 212101

  71. [71]

    In: 2007 IEEE conference on computer vision and pattern recognition, IEEE (2007) 1–8

    Hirschmuller, H., Scharstein, D.: Evaluation of cost functions for stereo matching. In: 2007 IEEE conference on computer vision and pattern recognition, IEEE (2007) 1–8

  72. [72]

    In: Soft Materials 18(2-3), Taylor & Francis (2020) 215--227

    Terao, T.: A machine learning approach to analyze the structural formation of soft matter via image recognition. In: Soft Materials 18(2-3), Taylor & Francis (2020) 215--227