Recognition: 2 theorem links
· Lean TheoremAsymptotic Hausdorff and Language Similarity
Pith reviewed 2026-05-12 04:28 UTC · model grok-4.3
The pith
The Asymptotic Hausdorff lifting turns element-level metrics into pseudo-metrics on sets that ignore finite deviations and reflect asymptotic similarity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Asymptotic Hausdorff lifting AH_d is defined so that the distance between two sets X and Y is the limit superior, as n tends to infinity, of the classical Hausdorff distance between the n-sized subsets of X and Y. Applied to normalized edit distances on words, this yields a pseudo-metric on languages that captures asymptotic edit behavior. For regular languages the equivalence classes under AH_ned are determined by the languages' asymptotic essence, and the distance itself is computable for both regular languages and bounded context-free languages.
What carries the argument
The Asymptotic Hausdorff lifting AH_d, which replaces the classical Hausdorff distance with its asymptotic limit to ignore finite-size outliers while preserving metric properties.
If this is right
- Regular languages fall into equivalence classes determined by their asymptotic edit behavior under AH_ned.
- The resulting distances remain pseudo-metrics, satisfying non-negativity, symmetry, and the triangle inequality.
- AH_ned can be computed algorithmically for all regular languages and all bounded context-free languages.
- The construction applies to any underlying element metric d that admits a natural size measure.
Where Pith is reading between the lines
- The same lifting could be used on other infinite objects equipped with a size function, such as infinite trees or graphs, to obtain asymptotic similarity measures.
- Equivalence classes under AH_ned might correspond to recognizable properties of the generating automata that are stable under finite modifications.
- The method suggests a route to approximate language learning that only needs to match behavior on sufficiently long strings.
Load-bearing premise
That the limit superior of the scaled Hausdorff distances exists and that ignoring all finite deviations still produces a meaningful notion of similarity between languages.
What would settle it
Two regular languages that differ by only finitely many words yet receive a positive AH_ned value, or a pair of bounded context-free languages for which the computed AH_ned value contradicts direct enumeration of long words.
Figures
read the original abstract
We introduce the \textit{Asymptotic Hausdorff} lifting, denoted $\mathbb{AH}_{d}$, a general method for lifting an element-level metric $d$ to a (pseudo-) metric on sets, that captures asymptotic similarity in infinite domains equipped with a notion of size. The construction is designed to be insensitive to finite deviations and to avoid the limitations of classical Hausdorff-based approaches, which are often overly sensitive to outliers and fail to reflect asymptotic behavior. Formal languages provide a central motivating instance of this framework, where elements are words and sets are languages. When applied to normalized edit distances, the Asymptotic Hausdorff lifting yields metric-valued distances between languages that reflect asymptotic edit behavior while preserving metric structure. We study the equivalence classes of regular languages induced by $\mathbb{AH}_{d}$ for normalized edit distances $d$, and characterize their asymptotic essence. Focusing in particular on the normalized edit distance of Marzal and Vidal, $\textsf{ned}$, we investigate the computation of $\mathbb{AH}_{\textsf{ned}}$ for regular languages and for bounded context-free languages.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the Asymptotic Hausdorff lifting AH_d, a general construction for lifting an element-level metric d to a (pseudo-)metric on sets that is insensitive to finite deviations and captures asymptotic similarity. Applied to normalized edit distances on formal languages, it produces distances reflecting asymptotic edit behavior while preserving metric structure. The work studies the equivalence classes of regular languages induced by AH_d, characterizes their asymptotic essence, and investigates computation of AH_ned (for the normalized edit distance ned of Marzal and Vidal) on regular languages and bounded context-free languages.
Significance. If the lifting is shown to satisfy the metric axioms (particularly the triangle inequality) and to induce well-defined equivalence classes on regular languages that exactly capture shared asymptotic edit behavior, the construction would offer a principled, outlier-robust alternative to classical Hausdorff distances for comparing infinite objects such as languages. This could be useful in automata theory and applications requiring asymptotic similarity measures. The restriction to regular languages and the focus on computability for ned add a concrete computational angle, though no machine-checked proofs or reproducible examples are referenced.
major comments (1)
- [Abstract] Abstract: the central claims that AH_d 'preserves metric structure' and 'yields characterizations' of asymptotic essence rest on an asserted lifting construction, yet the manuscript supplies neither the explicit definition of AH_d nor a proof (or even sketch) that the triangle inequality holds on the space of languages. This is load-bearing for the claim that the result is a (pseudo-)metric reflecting asymptotic edit behavior rather than a heuristic.
Simulated Author's Rebuttal
Thank you for the careful review of our manuscript. We address the major comment below and will make targeted revisions to improve clarity.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claims that AH_d 'preserves metric structure' and 'yields characterizations' of asymptotic essence rest on an asserted lifting construction, yet the manuscript supplies neither the explicit definition of AH_d nor a proof (or even sketch) that the triangle inequality holds on the space of languages. This is load-bearing for the claim that the result is a (pseudo-)metric reflecting asymptotic edit behavior rather than a heuristic.
Authors: We thank the referee for this observation. The manuscript does contain the explicit definition of the Asymptotic Hausdorff lifting AH_d in the main body (following the introduction of the general construction), along with a full proof that it satisfies the triangle inequality and thus forms a pseudo-metric on languages. The proof proceeds by applying the triangle inequality of the base metric d to the finite approximations and then taking the lim sup. We agree that the abstract would benefit from a brief pointer to these elements to make the central claims more immediately verifiable from the abstract alone. We will therefore revise the abstract to include a short reference to the definition and the metric property. revision: partial
Circularity Check
No significant circularity in the derivation chain
full rationale
The paper introduces the Asymptotic Hausdorff lifting AH_d as a direct, general construction that lifts an arbitrary base metric d on elements to a (pseudo-)metric on sets, designed to ignore finite deviations. The central claims about metric preservation, equivalence classes of regular languages under AH_ned, and asymptotic behavior follow from the explicit definition of the lifting rather than from any fitted parameters, self-referential equations, or load-bearing self-citations. No step reduces a claimed prediction or uniqueness result to its own inputs by construction; the derivation remains self-contained against the stated axioms of the base distance.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption d is a metric (or pseudo-metric) on the underlying word space.
- domain assumption There exists a notion of size on the infinite domain that allows asymptotic limits to be defined.
invented entities (1)
-
Asymptotic Hausdorff lifting AH_d
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclearWe introduce the Asymptotic Hausdorff lifting, denoted AH_d, a general method for lifting an element-level metric d to a (pseudo-) metric on sets, that captures asymptotic similarity in infinite domains equipped with a notion of size. ... AH^▷_d(X,Y) = lim_{k→∞} sup_{x∈X, s(x)≥k} inf_{y∈Y} d(x,y)
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclearWe study the equivalence classes of regular languages induced by AH_d ... asymptotic essence ... Efa(A) obtained by replacing transitions not in SCCs by ε-transitions
Reference graph
Works this paper leans on
- [1]
-
[2]
THE DISTRIBUTION OF THE FLORA IN THE ALPINE ZONE.1 , author=. New Phytologist , year=
- [3]
-
[4]
Noam Chomsky and George A. Miller , title =. Inf. Control. , volume =. 1958 , url =. doi:10.1016/S0019-9958(58)90082-2 , timestamp =
-
[5]
Proceedings of the American Mathematical Society , volume=
Bounded regular sets , author=. Proceedings of the American Mathematical Society , volume=
- [6]
- [7]
-
[8]
Binary codes capable of correcting deletions, insertions and reversals
Levenshtein, Vladimir Iosifovich , biburl =. Binary codes capable of correcting deletions, insertions and reversals. , volume =. Soviet Physics Doklady , keywords =
-
[9]
Robert A. Wagner , title =. Commun. 1974 , url =. doi:10.1145/360980.360995 , timestamp =
-
[10]
International Symposium on Mathematical Foundations of Computer Science , pages=
Characterizing regular languages with polynomial densities , author=. International Symposium on Mathematical Foundations of Computer Science , pages=. 1992 , organization=
work page 1992
-
[11]
Marzal, A. and Vidal, E. , journal=. Computation of normalized edit distance and applications , year=
-
[12]
Discrete Applied Mathematics , volume=
Thin and slender languages , author=. Discrete Applied Mathematics , volume=. 1995 , publisher=
work page 1995
-
[13]
Mehryar Mohri , title =. Comput. Linguistics , volume =. 1997 , timestamp =
work page 1997
-
[14]
Mohri, Mehryar , title =. Proceedings of the 7th International Conference on Implementation and Application of Automata , pages =. 2002 , isbn =
work page 2002
-
[15]
A Normalized Levenshtein Distance Metric , year=
Yujian, Li and Bo, Liu , journal=. A Normalized Levenshtein Distance Metric , year=
-
[16]
Dalia Krieger and Narad Rampersad and Jeffrey O. Shallit , title =. CoRR , volume =. 2007 , url =. 0711.4990 , timestamp =
-
[17]
A contextual normalised edit distance , booktitle =
Colin de la Higuera and Luisa Mic. A contextual normalised edit distance , booktitle =
- [18]
-
[19]
Lecture notes LIAFA, Universit
Mathematical foundations of automata theory , author=. Lecture notes LIAFA, Universit
- [20]
-
[21]
Cui, Cewei and Dang, Zhe and Fischer, Thomas R. and Ibarra, Oscar H. , title =. 2013 , publisher =
work page 2013
-
[22]
Proceedings of the 26th Annual
Michael Benedikt and Gabriele Puppis and Cristian Riveros , title =. Proceedings of the 26th Annual
-
[23]
The Cost of Traveling between Languages , booktitle =
Michael Benedikt and Gabriele Puppis and Cristian Riveros , editor =. The Cost of Traveling between Languages , booktitle =
-
[24]
Michael Benedikt and Gabriele Puppis and Cristian Riveros , title =. J. Comput. Syst. Sci. , volume =
-
[25]
Theoretical Computer Science , volume =
Michael Benedikt and Gabriele Puppis and Cristian Riveros , title =. Theoretical Computer Science , volume =. 2014 , issn =
work page 2014
-
[26]
Samanta, Roopsha and Deshmukh, Jyotirmoy V. and Chaudhuri, Swarat. Robustness Analysis of String Transducers. Automated Technology for Verification and Analysis. 2013
work page 2013
-
[27]
Henzinger and Jan Otop and Roopsha Samanta , editor =
Thomas A. Henzinger and Jan Otop and Roopsha Samanta , editor =. Lipschitz Robustness of Finite-state Transducers , booktitle =
- [28]
-
[29]
Cewei Cui and Zhe Dang and Thomas R. Fischer and Oscar H. Ibarra , title =. Inf. Comput. , volume =
-
[30]
International Conference on Developments in Language Theory , pages=
Relative Prefix Distance Between Languages , author=. International Conference on Developments in Language Theory , pages=. 2017 , organization=
work page 2017
-
[31]
Weighted Transducers for Robustness Verification , booktitle =
Emmanuel Filiot and Nicolas Mazzocchi and Jean. Weighted Transducers for Robustness Verification , booktitle =
-
[32]
33rd Annual Symposium on Combinatorial Pattern Matching,
Dana Fisman and Joshua Grogin and Oded Margalit and Gera Weiss , title =. 33rd Annual Symposium on Combinatorial Pattern Matching,
-
[33]
A Normalized Edit Distance on Infinite Words , booktitle =
Dana Fisman and Joshua Grogin and Gera Weiss , editor =. A Normalized Edit Distance on Infinite Words , booktitle =
-
[34]
When Is the Normalized Edit Distance over Non-Uniform Weights a Metric? , booktitle =
Dana Fisman and Ilay Tzarfati , editor =. When Is the Normalized Edit Distance over Non-Uniform Weights a Metric? , booktitle =
-
[35]
Bruse, Florian and Herwig, Maurice and Lange, Martin , title =. 2022 , isbn =. doi:10.1007/978-3-031-07469-1_6 , booktitle =
-
[36]
A Complete Quantitative Axiomatisation of Behavioural Distance of Regular Expressions , author=. 2024 , eprint=
work page 2024
-
[37]
Lawvere, F. William , title =. Rendiconti del Seminario Matematico e Fisico di Milano , volume =
- [38]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.