arxiv: 1906.11096 · v1 · pith:4G526MEInew · submitted 2019-06-26 · 💻 cs.CV

Mapped Convolutions

Pith reviewed 2026-05-25 15:23 UTC · model grok-4.3

classification 💻 cs.CV
keywords mapped convolution · spherical convolution · equirectangular image · geodesic grid · depth estimation · convolutional neural network · structured data

The pith

Mapped convolution decouples sampling from weighted summation to apply kernels to any structured data such as geodesic meshes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a mapped convolution that separates the choice of which input points to sample from the computation of their weighted sum. This separation removes the requirement that data lie on a regular pixel grid. The authors test the idea on spherical images by improving equirectangular sampling and by projecting the image onto a geodesic mesh for direct convolution. A reader would care because the change lets the same convolutional machinery operate on meshes, point sets, or other non-grid structures while producing a reported gain of nearly 17 percent on spherical depth estimation.

Core claim

Mapped convolutions are obtained by replacing the fixed grid sampling of standard convolution with an arbitrary sampling function that selects input values before the kernel performs its weighted sum. On spherical data this formulation supports both an improved sampling scheme for equirectangular images and a projection of the image onto a geodesic grid so that convolution occurs directly on the textured mesh. The resulting networks exceed prior spherical-convolution methods by nearly 17 percent on dense depth estimation.
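
The decoupling described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the names `mapped_conv` and `grid_sample_map` are hypothetical, the sampling map holds integer indices (nearest-sample lookup, with no interpolation), and the signal has a single channel. The sketch assumes that a sampling map built from a regular sliding window should reproduce an ordinary "valid" convolution (in the cross-correlation sense used by CNNs).

```python
import numpy as np

def mapped_conv(values, sample_map, weights):
    """Gather inputs through a sampling map, then take the weighted sum.

    values:     (N,) flattened input signal (pixels, mesh vertices, ...)
    sample_map: (M, K) integer indices into `values`, one row per output
    weights:    (K,) kernel weights shared by every output location
    """
    gathered = values[sample_map]   # sampling step, shape (M, K)
    return gathered @ weights       # weighted-sum step, shape (M,)

def grid_sample_map(height, width, ksize):
    """Hypothetical helper: sampling map for a sliding ksize x ksize window."""
    out_h, out_w = height - ksize + 1, width - ksize + 1
    rows = np.arange(out_h)[:, None, None, None] + np.arange(ksize)[None, None, :, None]
    cols = np.arange(out_w)[None, :, None, None] + np.arange(ksize)[None, None, None, :]
    flat = rows * width + cols      # broadcasts to (out_h, out_w, ksize, ksize)
    return flat.reshape(out_h * out_w, ksize * ksize)

# A 3x3 box filter on a 4x4 image, expressed as a mapped convolution.
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.full((3, 3), 1.0 / 9.0)
out = mapped_conv(image.ravel(), grid_sample_map(4, 4, 3), kernel.ravel())
print(out.reshape(2, 2))
```

Swapping `grid_sample_map` for any other index table (mesh neighborhoods, distortion-aware pixel offsets) changes where the kernel looks without touching the weights or the summation.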

What carries the argument

The mapped convolution, defined by an explicit sampling function that selects input locations before the kernel weights are applied.
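
To make "an explicit sampling function" concrete, here is one possible sampling function for an equirectangular image. It is an illustrative stand-in only: this summary does not specify the paper's improved sampling scheme, so the sketch uses a tangent-plane (gnomonic) kernel footprint, a standard map projection. The function names, the nearest-pixel rounding, and the choice of one pixel of latitude as the kernel spacing are all assumptions made for the example.

```python
import numpy as np

def inverse_gnomonic(x, y, lon0, lat0):
    """Map tangent-plane offsets (x east, y north) at (lon0, lat0) to (lon, lat) in radians."""
    rho2 = x * x + y * y
    # Normalising the 3D point n + x*east + y*north gives these closed forms.
    lat = np.arcsin(np.clip((np.sin(lat0) + y * np.cos(lat0)) / np.sqrt(1.0 + rho2), -1.0, 1.0))
    lon = lon0 + np.arctan2(x, np.cos(lat0) - y * np.sin(lat0))
    return lon, lat

def equirect_sample_map(height, width, row, col, ksize=3):
    """Flat pixel indices hit by a ksize x ksize tangent-plane kernel centred on (row, col)."""
    lon0 = (col + 0.5) / width * 2.0 * np.pi - np.pi
    lat0 = np.pi / 2.0 - (row + 0.5) / height * np.pi
    step = np.tan(np.pi / height)                 # assumed spacing: one pixel of latitude
    offsets = (np.arange(ksize) - ksize // 2) * step
    x, y = np.meshgrid(offsets, -offsets)         # image rows grow downward, y grows north
    lon, lat = inverse_gnomonic(x, y, lon0, lat0)
    cols = np.floor((lon + np.pi) / (2.0 * np.pi) * width).astype(int) % width  # wrap the seam
    rows = np.clip(np.floor((np.pi / 2.0 - lat) / np.pi * height).astype(int), 0, height - 1)
    return rows * width + cols

equator = equirect_sample_map(64, 128, row=32, col=64)
polar = equirect_sample_map(64, 128, row=2, col=64)
print(equator % 128)   # column indices near the equator
print(polar % 128)     # column indices near the pole
```

Near the equator the footprint coincides with the ordinary 3x3 pixel neighborhood, while near the pole the same kernel spreads across many more columns, which is the kind of distortion a sampling function can compensate for.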

If this is right

  • Convolution kernels can be applied to any type of structured data once an appropriate sampling function is supplied.
  • Equirectangular spherical images can be processed with a sampling method that reduces distortion compared with earlier approaches.
  • Spherical images can be projected onto a geodesic grid so that convolution occurs directly on the mesh surface.
  • Spherical depth estimation accuracy can exceed the previous state of the art by nearly 17 percent.
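
A geodesic grid of the kind mentioned above can be built by repeatedly subdividing an icosahedron and pushing the new vertices onto the unit sphere. The sketch below uses plain midpoint subdivision with renormalization; this is an assumption made for illustration, and the paper's own subdivision scheme and resolution may differ. The function names are hypothetical.

```python
import numpy as np

def icosahedron():
    """Unit icosahedron: 12 vertices, 20 triangular faces."""
    t = (1.0 + 5.0 ** 0.5) / 2.0
    verts = np.array([
        [-1, t, 0], [1, t, 0], [-1, -t, 0], [1, -t, 0],
        [0, -1, t], [0, 1, t], [0, -1, -t], [0, 1, -t],
        [t, 0, -1], [t, 0, 1], [-t, 0, -1], [-t, 0, 1]], dtype=float)
    faces = [(0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
             (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
             (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
             (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1)]
    return verts / np.linalg.norm(verts, axis=1, keepdims=True), faces

def subdivide(verts, faces):
    """Split every triangle into four, projecting edge midpoints onto the sphere."""
    verts = list(verts)
    cache = {}                      # shared edges reuse the same midpoint vertex

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in cache:
            m = verts[i] + verts[j]
            verts.append(m / np.linalg.norm(m))
            cache[key] = len(verts) - 1
        return cache[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return np.array(verts), new_faces

def icosphere(level):
    verts, faces = icosahedron()
    for _ in range(level):
        verts, faces = subdivide(verts, faces)
    return verts, faces

verts, faces = icosphere(2)
print(verts.shape, len(faces))
```

Each subdivision level quadruples the face count, so a level-n icosphere has 20 * 4**n faces and 10 * 4**n + 2 vertices. The per-vertex neighborhoods of such a mesh are what a sampling map would index when convolving on the mesh.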

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same separation of sampling from summation could be applied to other grid-based operations such as pooling or transposed convolution on irregular domains.
  • Geodesic meshes may serve as a more natural representation than equirectangular or cube-map formats for omnidirectional vision tasks.
  • Performance differences among spherical CNN methods may largely trace to discretization choices rather than to the convolution operator itself.
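
As a rough illustration of the first extension in the list above, the same gather-then-reduce pattern gives a pooling operator on irregular data by replacing the weighted sum with a maximum. This is a speculative sketch in line with the editorial extension, not something the paper is stated to implement. The function name, the toy signal, and the fixed-size neighborhoods (padded by repeating an index) are assumptions.

```python
import numpy as np

def mapped_max_pool(values, sample_map):
    """Gather inputs through a sampling map, then reduce each row with max.

    values:     (N,) input signal
    sample_map: (M, K) integer indices into `values`, one row per pooled output
    """
    return values[sample_map].max(axis=1)

# Six samples pooled into three outputs; the last neighborhood has only two
# distinct members, so one index is repeated to keep the map rectangular.
values = np.array([0.2, 1.5, -0.3, 0.9, 2.0, 0.1])
sample_map = np.array([[0, 1, 2],
                       [3, 4, 5],
                       [2, 3, 3]])
print(mapped_max_pool(values, sample_map))
```

Repeating an index is harmless for a max reduction, which makes padding simpler here than for a weighted sum, where a repeated sample would be counted twice.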

Load-bearing premise

The proposed sampling improvements and geodesic-grid projection produce a fair comparison to earlier spherical convolution methods without introducing large discretization artifacts that inflate the reported performance gain.

What would settle it

Run the identical depth-estimation network on the same equirectangular test set using only prior spherical convolution operators, without the mapped sampling or the geodesic projection, and check whether the 17 percent margin disappears.

Figures

Figures reproduced from arXiv: 1906.11096 by Akash Bapat, Jan-Michael Frahm, Marc Eder, Thanh Vu, True Price.

Figure 1: A visualization of how our proposed mapped con…
Figure 2: Simple encoder-decoder network architecture used for all experiments. For the filter bank method evaluated in…
Figure 3: A comparison of sampling functions used for spherical images. The blue points represent the kernel center. Note…
Figure 4: Absolute error maps for depth predictions from an example equirectangular input image (top left). Our ISEA…
Figure 5: Sampling the cube map according to latitude and…
Figure 7: Subdividing an icosahedron to an icosphere.
Figure 8: (Best viewed in color) Performance comparison…

Original abstract

We present a versatile formulation of the convolution operation that we term a "mapped convolution." The standard convolution operation implicitly samples the pixel grid and computes a weighted sum. Our mapped convolution decouples these two components, freeing the operation from the confines of the image grid and allowing the kernel to process any type of structured data. As a test case, we demonstrate its use by applying it to dense inference on spherical data. We perform an in-depth study of existing spherical image convolution methods and propose an improved sampling method for equirectangular images. Then, we discuss the impact of data discretization when deriving a sampling function, highlighting drawbacks of the cube map representation for spherical data. Finally, we illustrate how mapped convolutions enable us to convolve directly on a mesh by projecting the spherical image onto a geodesic grid and training on the textured mesh. This method exceeds the state of the art for spherical depth estimation by nearly 17%. Our findings suggest that mapped convolutions can be instrumental in expanding the application scope of convolutional neural networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces mapped convolutions as a formulation that decouples sampling from the weighted sum in standard convolution, enabling the kernel to operate on arbitrary structured data beyond image grids. As a test case on spherical data, the authors propose an improved equirectangular sampling method, critique cube-map discretization drawbacks, project spherical images onto a geodesic mesh, and report that this approach exceeds prior state-of-the-art spherical depth estimation performance by nearly 17%.

Significance. If the reported gains can be shown to arise specifically from the decoupling rather than from the accompanying sampling and projection choices, the approach would offer a general mechanism for extending CNNs to non-grid structured data such as meshes.

major comments (1)
  1. [Abstract / experimental results] Abstract and experimental results: the 17% improvement is reported only after combining mapped convolutions with a new equirectangular sampling method and a geodesic-mesh projection. No ablation is described that fixes the sampling function and varies only the convolution operator itself, leaving open the possibility that the gain is driven by discretization changes rather than the claimed decoupling. This directly affects the central claim that mapped convolutions are the enabling factor.
minor comments (1)
  1. [Abstract] Abstract: the headline performance claim lacks any mention of baselines, error bars, data splits, or evaluation protocol, reducing verifiability of the empirical result.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. The concern about isolating the contribution of mapped convolutions from sampling and discretization choices is valid and directly impacts the strength of our central claim. We address this below and commit to revisions that will clarify the role of the decoupling.

Point-by-point responses
  1. Referee: [Abstract / experimental results] Abstract and experimental results: the 17% improvement is reported only after combining mapped convolutions with a new equirectangular sampling method and a geodesic-mesh projection. No ablation is described that fixes the sampling function and varies only the convolution operator itself, leaving open the possibility that the gain is driven by discretization changes rather than the claimed decoupling. This directly affects the central claim that mapped convolutions are the enabling factor.

    Authors: We agree that the current experiments do not include an ablation that holds the sampling function fixed while varying only the convolution operator (standard vs. mapped). This omission leaves ambiguity about whether the reported gains stem primarily from the decoupling or from the accompanying sampling improvements and mesh projection. In the revised manuscript we will add such an ablation: we will fix the improved equirectangular sampling and compare a standard convolution baseline (adapted to the same sampling locations where feasible) against mapped convolution on the same data. We will also clarify that the geodesic-mesh experiment is only possible because mapped convolutions decouple sampling from the weighted sum, allowing direct operation on unstructured mesh vertices; standard convolutions cannot be applied in the same way without additional discretization steps. These additions will strengthen the evidence that the decoupling itself is the enabling mechanism.

    revision: yes

Circularity Check

0 steps flagged

No circularity: mapped convolution is a direct definitional decoupling with independent empirical validation

The paper defines mapped convolution explicitly as the decoupling of sampling from the weighted sum operation, which is a constructive formulation rather than a derivation that reduces to its own inputs or prior self-citations. No equations or claims in the provided abstract or description show a self-definitional loop, a fitted parameter renamed as prediction, or a load-bearing uniqueness theorem imported from the authors' own prior work. The 17% gain is presented as an empirical outcome on spherical depth estimation after introducing sampling improvements and geodesic projection; these are separate methodological choices whose effects are not used to justify the core operator definition itself. The derivation chain remains self-contained against external benchmarks and does not rely on any of the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are identifiable from the provided text.

pith-pipeline@v0.9.0 · 5707 in / 1114 out tokens · 24271 ms · 2026-05-25T15:23:55.316076+00:00 · methodology
