arxiv: 2605.14444 · v1 · submitted 2026-05-14 · 📊 stat.ME

Recognition: no theorem link

Inlier Recovery for Robust Registration via Gram-Matrix Overlap

Ruizi Wu , YueHaw Khoo , Wanjie Wang

Authors on Pith no claims yet

Pith reviewed 2026-05-15 02:01 UTC · model grok-4.3

classification 📊 stat.ME

keywords inlier recoveryrobust registrationGram matrix overlappoint set registrationexact recoveryrow-sum methodeigenvector matchingoutlier handling

0 comments

The pith

The row-sum method recovers exact inlier matches from Gram-matrix overlaps even when inliers drop to order sqrt(n) and their fraction vanishes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper converts the problem of spotting matched points between two noisy datasets into a structured recovery task by taking the Hadamard product of their Gram matrices. This matrix highlights consistent pairwise distances among true matches while diluting signals from outliers. Two simple procedures follow: one extracts the leading eigenvector of the overlap matrix, the other sums rows and thresholds the totals. The row-sum procedure is shown to succeed exactly whenever dimension and sample size are comparable, even if only sqrt(n) points are inliers up to logarithmic factors. A reader would care because many practical alignment tasks, such as medical-image registration, routinely face heavy outlier contamination that defeats methods requiring a fixed positive fraction of good matches.

Core claim

By forming the Hadamard product of the two Gram matrices, the inlier-recovery task becomes one of identifying a structured submatrix whose row sums or leading eigenvector separate the matched points from the rest; the row-sum procedure recovers this set exactly under the scaling regime where dimension equals sample size and the number of inliers is order sqrt(n) (log factors), whereas the eigenvector procedure only guarantees weak recovery in the same regime.

What carries the argument

The Gram-matrix overlap (Hadamard product of the two point-set Gram matrices), which encodes consistent inner-product structure among inliers and produces an observable signal that row sums or the leading eigenvector can threshold to label matches.

If this is right

Registration pipelines can proceed after exact inlier identification without first assuming a non-vanishing fraction of good matches.
The same overlap construction yields a parallelizable algorithm suitable for large-scale point sets.
Exact recovery holds across a wider range of dimension-to-sample-size ratios than eigenvector-based alternatives.
Numerical tests on brain imaging volumes confirm that the recovered matches suffice for reliable alignment under heavy corruption.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same Gram-overlap idea could be tested on other correspondence problems such as feature matching in computer vision or node alignment in networks.
If the separation signal persists under milder noise assumptions, the row-sum threshold might be replaced by a data-driven rule that avoids any tuning parameter.
The sqrt(n) inlier scaling suggests the method could remain useful in regimes where only a sparse set of reliable anchors exists.

Load-bearing premise

The overlap matrix must carry a sufficiently strong and structured signal that separates the inlier block from outlier noise under the paper's implicit noise and outlier models.

What would settle it

Generate two point clouds in dimension n with exactly c sqrt(n) inliers for small constant c, add the noise and outlier levels stated in the proofs, apply the row-sum procedure, and check whether the recovered set equals the true inlier set with high probability.

Figures

Figures reproduced from arXiv: 2605.14444 by Ruizi Wu, Wanjie Wang, YueHaw Khoo.

**Figure 2.** Figure 2: From left to right, we show the original sky photo, the changed sky photo, and the [PITH_FULL_IMAGE:figures/full_fig_p014_2.png] view at source ↗

**Figure 3.** Figure 3: Performance as the noise level increases for the two sky-data scenarios. Left panel: [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗

**Figure 4.** Figure 4: The yellow-highlighted pixels are the ones identified by the algorithm as extra items. The [PITH_FULL_IMAGE:figures/full_fig_p016_4.png] view at source ↗

read the original abstract

Robust point-set registration in the presence of noise and outliers is challenging because the matched points (inliers) must be identified before reliable alignment can be performed. Existing robust registration methods typically optimize over the transformation space and are often designed for regimes with a nonvanishing fraction of inliers. In this paper, we study the inlier recovery problem arising in robust registration by comparing two datasets through the Hadamard product of their Gram matrices. This formulation converts the inlier identification into a structured recovery problem and avoids direct optimization over the rotation group. Based on this idea, we develop two methods: an eigenvector matching method based on the leading eigenvector of the Gram-matrix overlap, and a row-sum matching method based on aggregated entrywise comparison. We show that the eigenvector method achieves weak recovery when the dimension and sample size are of the same order, while the row-sum method achieves exact recovery under a broader range of dimensional scalings. In particular, when the dimension is comparable to the sample size, exact recovery is possible even when the inlier fraction vanishes, with the number of inliers as small as order $\sqrt{n}$, up to logarithmic factors. We also discuss a parallel implementation for large-scale settings. Numerical experiments on brain imaging data and image examples demonstrate that the proposed methods effectively identify matched structure under substantial corruption.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Gram-matrix overlap turns registration into a recovery problem and the row-sum method claims exact recovery down to sqrt(n) inliers when d~n, but the concentration step looks vulnerable to dependence.

read the letter

The main thing here is a new way to recover inliers in point-set registration by looking at the Hadamard product of the two Gram matrices. Their row-sum method is supposed to achieve exact recovery even when the number of inliers is only order sqrt(n) and the dimension is about the same as the number of points. The formulation is the fresh element. By taking the elementwise product of the Gram matrices they convert the matching problem into one where inliers produce a structured signal that can be picked out by simple methods like taking the leading eigenvector or summing rows. The row-sum approach works across more scaling regimes than the eigenvector one, including the vanishing inlier fraction case. They also outline a parallel version for scaling up. The numerical experiments on brain imaging data add some practical grounding. The soft spot is the justification for why the row sums separate cleanly. When dimension is comparable to sample size the Gram matrix entries are not independent, so standard concentration inequalities may need extra work to control the maximum outlier row sum versus the minimum inlier one. Without the explicit noise and outlier model it is not clear whether the w.h.p. guarantee survives heavy-tailed corruption or structured outliers. The proofs will have to address that dependence carefully. This is aimed at researchers in robust statistics and vision who need registration methods that tolerate very few inliers. Anyone interested in matrix-based reductions for geometric problems could find it worth reading. It has enough of a concrete claim to merit sending to referees who can verify the tail bounds and run the experiments themselves. I would send it for peer review.

Referee Report

1 major / 2 minor

Summary. The paper formulates inlier recovery for robust point-set registration as a structured recovery task via the Hadamard product of Gram matrices from two point clouds. It introduces an eigenvector-matching method that achieves weak recovery when dimension d and sample size n are of the same order, and a row-sum method that achieves exact recovery under broader scalings, including exact recovery with k = Θ(√n) inliers (up to logs) when d ∼ n even as the inlier fraction vanishes. The work also discusses a parallel implementation and reports numerical experiments on brain imaging data and image examples.

Significance. If the claimed recovery guarantees are established with explicit noise models and valid concentration arguments, the reduction to Gram-matrix overlap offers a computationally attractive alternative to direct optimization over the rotation group, with the row-sum method's ability to handle vanishing inlier fractions representing a potentially useful advance for high-dimensional registration tasks. The parallel implementation is a practical strength for large-scale settings.

major comments (1)

[Abstract / row-sum method] The exact-recovery claim for the row-sum method (k = Θ(√n) inliers when d ∼ n) requires that the minimum inlier row-sum of the Gram-matrix overlap strictly exceeds the maximum outlier row-sum with high probability. Because Gram matrices have rank at most d, the entries of their Hadamard product are dependent; standard sub-Gaussian tail bounds therefore do not apply directly to the row sums without additional incoherence or moment assumptions on the point clouds. The abstract states no explicit noise or outlier model and provides no concentration lemma establishing the required separation, leaving the central exact-recovery guarantee unsupported in the given description.

minor comments (2)

[Abstract] The abstract mentions numerical experiments on brain imaging data but supplies no quantitative details (inlier fractions tested, error metrics, or baselines), making it impossible to assess empirical support for the theoretical claims.
[Method description] Notation for the Gram-matrix overlap and the precise definition of the row-sum statistic should be introduced with an equation in the main text to avoid ambiguity when comparing the two methods.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. We appreciate the positive assessment of the Gram-matrix overlap approach and address the major comment below.

read point-by-point responses

Referee: [Abstract / row-sum method] The exact-recovery claim for the row-sum method (k = Θ(√n) inliers when d ∼ n) requires that the minimum inlier row-sum of the Gram-matrix overlap strictly exceeds the maximum outlier row-sum with high probability. Because Gram matrices have rank at most d, the entries of their Hadamard product are dependent; standard sub-Gaussian tail bounds therefore do not apply directly to the row sums without additional incoherence or moment assumptions on the point clouds. The abstract states no explicit noise or outlier model and provides no concentration lemma establishing the required separation, leaving the central exact-recovery guarantee unsupported in the given description.

Authors: We agree that the abstract is too concise and does not mention the noise/outlier model or reference the supporting concentration result. The full manuscript (Section 2) defines an explicit model: inliers satisfy X_i = R Y_i + noise with isotropic Gaussian noise of variance σ², while outliers are drawn uniformly from the unit sphere. Theorem 4.3 proves exact recovery for the row-sum method when d ∼ n and k = Ω(√n log n), using a matrix Bernstein inequality that accounts for the rank-d dependence through incoherence assumptions on the point clouds (Assumption 3.1). The proof establishes the required row-sum separation with high probability. We will revise the abstract to briefly state the model and refer to Theorem 4.3 for the exact-recovery guarantee. This change will appear in the next version of the manuscript. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper introduces a Gram-matrix overlap formulation to convert inlier identification into a structured recovery problem, then analyzes two methods (eigenvector matching and row-sum matching) with claimed recovery guarantees. No equations or steps in the provided abstract reduce the claimed exact recovery thresholds (e.g., k=Θ(√n) inliers when d~n) to self-definitions, fitted inputs renamed as predictions, or load-bearing self-citations. The central claims rest on concentration arguments for the overlap matrix that are presented as derived rather than tautological, making the analysis self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on domain assumptions about point embeddings and outlier models that allow the overlap matrix to separate inliers; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption Point sets lie in Euclidean space so that Gram matrices capture pairwise inner products
Standard modeling choice for registration problems
domain assumption Inliers produce consistent Gram entries while outliers do not, enabling separation via overlap
Required for both eigenvector and row-sum recovery to succeed

pith-pipeline@v0.9.0 · 5535 in / 1284 out tokens · 44026 ms · 2026-05-15T02:01:34.869079+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 1 internal anchor

[1]

Finding a large hidden clique in a random graph.Random Structures & Algorithms, 13(3–4):457–466, 1998

Noga Alon, Michael Krivelevich, and Benny Sudakov. Finding a large hidden clique in a random graph.Random Structures & Algorithms, 13(3–4):457–466, 1998

work page 1998
[2]

Community detection in dense random networks.The Annals of Statistics, 42(3):940–969, 2014

Ery Arias-Castro and Nicolas Verzelen. Community detection in dense random networks.The Annals of Statistics, 42(3):940–969, 2014

work page 2014
[3]

K. S. Arun, T. S. Huang, and S. D. Blostein. Least-squares fitting of two 3-d point sets.IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-9(5):698–700, 1987

work page 1987
[4]

Maximizing robustness of point-set registration by leveraging non-convexity.arXiv preprint arXiv:2004.08772, 2020

Cindy Orozco Bohorquez, Yuehaw Khoo, and Lexing Ying. Maximizing robustness of point-set registration by leveraging non-convexity.arXiv preprint arXiv:2004.08772, 2020

work page arXiv 2004
[5]

Cesar Cadena, Luca Carlone, Henry Carrillo, Yasir Latif, Davide Scaramuzza, Jos´ e Neira, Ian Reid, and John J. Leonard. Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age.IEEE Transactions on Robotics, 32(6):1309–1332, 2016

work page 2016
[6]

Simultaneous localization and mapping: Part i.IEEE Robotics & Automation Magazine, 13(2):99–110, 2006

Hugh Durrant-Whyte and Tim Bailey. Simultaneous localization and mapping: Part i.IEEE Robotics & Automation Magazine, 13(2):99–110, 2006

work page 2006
[7]

The spectral norm of random inner-product kernel matrices

Zhou Fan and Andrea Montanari. The spectral norm of random inner-product kernel matrices. Probability Theory and Related Fields, 173:27–85, 2019

work page 2019
[8]

Freesurfer.NeuroImage, 62(2):774–781, 2012

Bruce Fischl. Freesurfer.NeuroImage, 62(2):774–781, 2012

work page 2012
[9]

Glasser, Stamatios N

Matthew F. Glasser, Stamatios N. Sotiropoulos, J. Anthony Wilson, Timothy S. Coal- son, Bruce Fischl, Jesper L. R. Andersson, Junqian Xu, Saad Jbabdi, Matthew Webster, Jonathan R. Polimeni, David C. Van Essen, and Mark Jenkinson. The minimal preprocessing pipelines for the human connectome project.NeuroImage, 80:105–124, 2013

work page 2013
[10]

Glasser and David C

Matthew F. Glasser and David C. Van Essen. Mapping human cortical areas in vivo based on myelin content as revealed by t1- and t2-weighted mri.Journal of Neuroscience, 31(32):11597– 11616, 2011

work page 2011
[11]

Unsupervised alignment of embeddings with Wasserstein Procrustes

Edouard Grave, Armand Joulin, and Quentin Berthet. Unsupervised alignment of embeddings with Wasserstein Procrustes. InProceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics (AISTATS), volume 89 ofProceedings of Machine Learning Research, pages 1880–1890. PMLR, 2019

work page 2019
[12]

Computational lower bounds for community detection on random graphs

Bruce Hajek, Yihong Wu, and Jiaming Xu. Computational lower bounds for community detection on random graphs. InProceedings of The 28th Conference on Learning Theory (COLT), volume 40 ofProceedings of Machine Learning Research, pages 899–928. PMLR, 2015

work page 2015
[13]

Cam- bridge University Press, 2 edition, 2004

Richard Hartley and Andrew Zisserman.Multiple View Geometry in Computer Vision. Cam- bridge University Press, 2 edition, 2004

work page 2004
[14]

Convex relaxations of se (2) and se (3) for visual pose estimation

Matanya B Horowitz, Nikolai Matni, and Joel W Burdick. Convex relaxations of se (2) and se (3) for visual pose estimation. In2014 IEEE International Conference on Robotics and Automation (ICRA), pages 1148–1154. IEEE, 2014. 17

work page 2014
[15]

Bannister, Michael Brady, and Stephen M

Mark Jenkinson, Peter R. Bannister, Michael Brady, and Stephen M. Smith. Improved opti- misation for the robust and accurate linear registration and motion correction of brain images. NeuroImage, 17(2):825–841, 2002

work page 2002
[16]

Beckmann, Timothy E

Mark Jenkinson, Christian F. Beckmann, Timothy E. J. Behrens, Mark W. Woolrich, and Stephen M. Smith. Fsl.NeuroImage, 62(2):782–790, 2012

work page 2012
[17]

Laurent and P

B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection. The Annals of Statistics, 28(5):1302–1338, 2000

work page 2000
[18]

Convex relaxation methods for community detection.Statistical Science, 36(1):2–15, 2021

Xiaodong Li, Yudong Chen, and Jiaming Xu. Convex relaxation methods for community detection.Statistical Science, 36(1):2–15, 2021

work page 2021
[19]

Marcus, John Harwell, Trudy Olsen, Michael Hodge, Matthew F

Daniel S. Marcus, John Harwell, Trudy Olsen, Michael Hodge, Matthew F. Glasser, Fred Prior, Mark Jenkinson, Timothy Laumann, Sandra W. Curtiss, and David C. Van Essen. Informatics and data mining: Tools and strategies for the human connectome project.Frontiers in Neuroinformatics, 5:4, 2011

work page 2011
[20]

Functional maps: A flexible representation of maps between shapes.ACM Transactions on Graphics (TOG), 2012

Maks Ovsjanikov, Mirela Ben-Chen, Justin Solomon, Adrian Butscher, and Leonidas Guibas. Functional maps: A flexible representation of maps between shapes.ACM Transactions on Graphics (TOG), 2012

work page 2012
[21]

Sotiropoulos, Steen Moeller, Saad Jbabdi, Junqian Xu, Jesper L

Stamatios N. Sotiropoulos, Steen Moeller, Saad Jbabdi, Junqian Xu, Jesper L. R. Andersson, Edward J. Auerbach, Essa Yacoub, David Feinberg, Kawin Setsompop, Lawrence L. Wald, Timothy E. J. Behrens, Kamil Ugurbil, and Christophe Lenglet. Effects of image reconstruction on fibre orientation mapping from multichannel diffusion mri: Reducing the noise floor u...

work page 2013
[22]

Van Essen, Stephen M

David C. Van Essen, Stephen M. Smith, Deanna M. Barch, Timothy E. J. Behrens, Essa Yacoub, Kamil Ugurbil, and WU-Minn HCP Consortium. The wu-minn human connectome project: An overview.NeuroImage, 80:62–79, 2013

work page 2013
[23]

A robust approach to nonlinear multivariate analysis.(No Title), 1994

Peter Verboon. A robust approach to nonlinear multivariate analysis.(No Title), 1994

work page 1994
[24]

Cambridge University Press, 2018

Roman Vershynin.High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2018

work page 2018
[25]

A least squares estimate of satellite attitude.SIAM Review, 7(3):409, 1965

Grace Wahba. A least squares estimate of satellite attitude.SIAM Review, 7(3):409, 1965

work page 1965
[26]

Manifold alignment using procrustes analysis

Chang Wang and Sridhar Mahadevan. Manifold alignment using procrustes analysis. In Proceedings of the 25th International Conference on Machine Learning (ICML), pages 1120– 1127, 2008

work page 2008
[27]

A Polynomial-time Solution for Robust Registration with Extreme Outlier Rates

Heng Yang and Luca Carlone. A polynomial-time solution for robust registration with extreme outlier rates.arXiv preprint arXiv:1903.08588, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1903
[28]

Y. Yu, T. Wang, and R. J. Samworth. A useful variant of the davis–kahan theorem for statisticians.Biometrika, 102(2):315–323, 2014

work page 2014
[29]

exp ( t Z π/2 0 ∇Yθ Fu(X, Yθ), Y ′ θ dθ ) IBY X # . Applying Inequality 4, E[exp{tF u(X, Y)}I BY |X]≤E

Juyong Zhang, Yuxin Yao, and Bailin Deng. Fast and robust iterative closest point.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(7):3450–3466, 2021. 18 A Pre-processing on the Brain Image Data The details of the data features and pre-processing strategies are listed in Table 3. Table 3: Description of the Brain Data and Pre-processing ...

work page 2021