arxiv: 2605.09668 · v1 · submitted 2026-05-10 · 💻 cs.FL · math.GN

Recognition: 2 theorem links

· Lean Theorem

Asymptotic Hausdorff and Language Similarity

Dana Fisman, Gal Meirom

Authors on Pith no claims yet

Pith reviewed 2026-05-12 04:28 UTC · model grok-4.3

classification 💻 cs.FL math.GN

keywords asymptotic Hausdorff distancelanguage similaritynormalized edit distanceregular languagesformal languagespseudo-metriccontext-free languages

0 comments

The pith

The Asymptotic Hausdorff lifting turns element-level metrics into pseudo-metrics on sets that ignore finite deviations and reflect asymptotic similarity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Asymptotic Hausdorff lifting AH_d as a general way to extend a distance d defined on individual elements to a distance on entire infinite sets such as formal languages. Unlike classical Hausdorff distance, the new construction examines the limit behavior as element size grows, so that occasional mismatches do not dominate the result. When d is a normalized edit distance on words, AH_d produces well-behaved distances between languages that still satisfy the triangle inequality and other metric axioms. The authors then classify regular languages up to the equivalence induced by these distances and show how the value can be computed exactly for regular languages and for bounded context-free languages.

Core claim

The Asymptotic Hausdorff lifting AH_d is defined so that the distance between two sets X and Y is the limit superior, as n tends to infinity, of the classical Hausdorff distance between the n-sized subsets of X and Y. Applied to normalized edit distances on words, this yields a pseudo-metric on languages that captures asymptotic edit behavior. For regular languages the equivalence classes under AH_ned are determined by the languages' asymptotic essence, and the distance itself is computable for both regular languages and bounded context-free languages.

What carries the argument

The Asymptotic Hausdorff lifting AH_d, which replaces the classical Hausdorff distance with its asymptotic limit to ignore finite-size outliers while preserving metric properties.

If this is right

Regular languages fall into equivalence classes determined by their asymptotic edit behavior under AH_ned.
The resulting distances remain pseudo-metrics, satisfying non-negativity, symmetry, and the triangle inequality.
AH_ned can be computed algorithmically for all regular languages and all bounded context-free languages.
The construction applies to any underlying element metric d that admits a natural size measure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same lifting could be used on other infinite objects equipped with a size function, such as infinite trees or graphs, to obtain asymptotic similarity measures.
Equivalence classes under AH_ned might correspond to recognizable properties of the generating automata that are stable under finite modifications.
The method suggests a route to approximate language learning that only needs to match behavior on sufficiently long strings.

Load-bearing premise

That the limit superior of the scaled Hausdorff distances exists and that ignoring all finite deviations still produces a meaningful notion of similarity between languages.

What would settle it

Two regular languages that differ by only finitely many words yet receive a positive AH_ned value, or a pair of bounded context-free languages for which the computed AH_ned value contradicts direct enumeration of long words.

Figures

Figures reproduced from arXiv: 2605.09668 by Dana Fisman, Gal Meirom.

read the original abstract

We introduce the \textit{Asymptotic Hausdorff} lifting, denoted $\mathbb{AH}_{d}$, a general method for lifting an element-level metric $d$ to a (pseudo-) metric on sets, that captures asymptotic similarity in infinite domains equipped with a notion of size. The construction is designed to be insensitive to finite deviations and to avoid the limitations of classical Hausdorff-based approaches, which are often overly sensitive to outliers and fail to reflect asymptotic behavior. Formal languages provide a central motivating instance of this framework, where elements are words and sets are languages. When applied to normalized edit distances, the Asymptotic Hausdorff lifting yields metric-valued distances between languages that reflect asymptotic edit behavior while preserving metric structure. We study the equivalence classes of regular languages induced by $\mathbb{AH}_{d}$ for normalized edit distances $d$, and characterize their asymptotic essence. Focusing in particular on the normalized edit distance of Marzal and Vidal, $\textsf{ned}$, we investigate the computation of $\mathbb{AH}_{\textsf{ned}}$ for regular languages and for bounded context-free languages.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper defines an asymptotic Hausdorff lifting that ignores finite prefixes when comparing languages under a base metric, then characterizes the resulting equivalence classes for regular languages under normalized edit distance.

read the letter

The core contribution is the Asymptotic Hausdorff lifting AH_d. It takes any element-level metric d and produces a distance on sets that discards finite deviations, so two infinite languages are close if their tails match well under d. When d is normalized edit distance, this yields distances between languages that track asymptotic edit behavior instead of being skewed by initial segments or outliers, which is the usual problem with classical Hausdorff distance on languages. They then classify the equivalence classes this induces on regular languages and describe their asymptotic essence, plus they look at computation for regular languages and bounded context-free languages under the specific ned distance of Marzal and Vidal. That characterization step is the part that feels most concrete and potentially useful inside automata theory. The lifting construction itself is presented as a direct, parameter-free operation that inherits metric properties from d, and the stress-test note finds no immediate contradiction with triangle inequality or well-definedness on regular languages. The main soft spot is that the abstract supplies no proof sketches or counter-example checks, so it is still necessary to verify that the lifted operation really satisfies the metric axioms and that the claimed equivalence classes are exactly the ones sharing the same asymptotic edit behavior. The computation claims for regular languages also need the actual algorithms or complexity bounds to be evaluated. This is a paper for people already working on distances between formal languages or on similarity measures in automata theory. A reader who needs a way to compare infinite regular languages without finite-prefix noise could extract the equivalence characterization and try to apply it. It is worth sending to peer review because the lifting idea is new in the cited literature and the restriction to regular languages keeps the claims checkable, even though the proofs will have to be examined carefully in the full text.

Referee Report

1 major / 0 minor

Summary. The paper introduces the Asymptotic Hausdorff lifting AH_d, a general construction for lifting an element-level metric d to a (pseudo-)metric on sets that is insensitive to finite deviations and captures asymptotic similarity. Applied to normalized edit distances on formal languages, it produces distances reflecting asymptotic edit behavior while preserving metric structure. The work studies the equivalence classes of regular languages induced by AH_d, characterizes their asymptotic essence, and investigates computation of AH_ned (for the normalized edit distance ned of Marzal and Vidal) on regular languages and bounded context-free languages.

Significance. If the lifting is shown to satisfy the metric axioms (particularly the triangle inequality) and to induce well-defined equivalence classes on regular languages that exactly capture shared asymptotic edit behavior, the construction would offer a principled, outlier-robust alternative to classical Hausdorff distances for comparing infinite objects such as languages. This could be useful in automata theory and applications requiring asymptotic similarity measures. The restriction to regular languages and the focus on computability for ned add a concrete computational angle, though no machine-checked proofs or reproducible examples are referenced.

major comments (1)

[Abstract] Abstract: the central claims that AH_d 'preserves metric structure' and 'yields characterizations' of asymptotic essence rest on an asserted lifting construction, yet the manuscript supplies neither the explicit definition of AH_d nor a proof (or even sketch) that the triangle inequality holds on the space of languages. This is load-bearing for the claim that the result is a (pseudo-)metric reflecting asymptotic edit behavior rather than a heuristic.

Simulated Author's Rebuttal

1 responses · 0 unresolved

Thank you for the careful review of our manuscript. We address the major comment below and will make targeted revisions to improve clarity.

read point-by-point responses

Referee: [Abstract] Abstract: the central claims that AH_d 'preserves metric structure' and 'yields characterizations' of asymptotic essence rest on an asserted lifting construction, yet the manuscript supplies neither the explicit definition of AH_d nor a proof (or even sketch) that the triangle inequality holds on the space of languages. This is load-bearing for the claim that the result is a (pseudo-)metric reflecting asymptotic edit behavior rather than a heuristic.

Authors: We thank the referee for this observation. The manuscript does contain the explicit definition of the Asymptotic Hausdorff lifting AH_d in the main body (following the introduction of the general construction), along with a full proof that it satisfies the triangle inequality and thus forms a pseudo-metric on languages. The proof proceeds by applying the triangle inequality of the base metric d to the finite approximations and then taking the lim sup. We agree that the abstract would benefit from a brief pointer to these elements to make the central claims more immediately verifiable from the abstract alone. We will therefore revise the abstract to include a short reference to the definition and the metric property. revision: partial

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper introduces the Asymptotic Hausdorff lifting AH_d as a direct, general construction that lifts an arbitrary base metric d on elements to a (pseudo-)metric on sets, designed to ignore finite deviations. The central claims about metric preservation, equivalence classes of regular languages under AH_ned, and asymptotic behavior follow from the explicit definition of the lifting rather than from any fitted parameters, self-referential equations, or load-bearing self-citations. No step reduces a claimed prediction or uniqueness result to its own inputs by construction; the derivation remains self-contained against the stated axioms of the base distance.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The work rests on standard metric axioms for the base distance d and on the existence of a size notion for infinite domains; the lifting operator itself is the main invented entity.

axioms (2)

domain assumption d is a metric (or pseudo-metric) on the underlying word space.
Required for the lifted object to inherit metric properties.
domain assumption There exists a notion of size on the infinite domain that allows asymptotic limits to be defined.
Invoked to make the lifting insensitive to finite deviations.

invented entities (1)

Asymptotic Hausdorff lifting AH_d no independent evidence
purpose: Lift an element metric d to a set metric that ignores finite deviations.
New operator introduced to overcome limitations of classical Hausdorff distance.

pith-pipeline@v0.9.0 · 5479 in / 1310 out tokens · 47640 ms · 2026-05-12T04:28:57.855377+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
We introduce the Asymptotic Hausdorff lifting, denoted AH_d, a general method for lifting an element-level metric d to a (pseudo-) metric on sets, that captures asymptotic similarity in infinite domains equipped with a notion of size. ... AH^▷_d(X,Y) = lim_{k→∞} sup_{x∈X, s(x)≥k} inf_{y∈Y} d(x,y)
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear
We study the equivalence classes of regular languages induced by AH_d ... asymptotic essence ... Efa(A) obtained by replacing transitions not in SCCs by ε-transitions

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages

[1]

2025 , howpublished =

Hausdorff distance ---. 2025 , howpublished =

work page 2025
[2]

New Phytologist , year=

THE DISTRIBUTION OF THE FLORA IN THE ALPINE ZONE.1 , author=. New Phytologist , year=

work page
[3]

and Weaver, Warren , biburl =

Shannon, Claude E. and Weaver, Warren , biburl =

work page
[4]

Miller , title =

Noam Chomsky and George A. Miller , title =. Inf. Control. , volume =. 1958 , url =. doi:10.1016/S0019-9958(58)90082-2 , timestamp =

work page doi:10.1016/s0019-9958(58)90082-2 1958
[5]

Proceedings of the American Mathematical Society , volume=

Bounded regular sets , author=. Proceedings of the American Mathematical Society , volume=

work page
[6]

, title =

Ginsburg, Seymour and Spanier, Edwin H. , title =. Pacific Journal of Mathematics , volume =. 1966 , publisher =

work page 1966
[7]

, title =

Parikh, Rohit J. , title =. Journal of the Association for Computing Machinery , volume =. 1966 , publisher =

work page 1966
[8]

Binary codes capable of correcting deletions, insertions and reversals

Levenshtein, Vladimir Iosifovich , biburl =. Binary codes capable of correcting deletions, insertions and reversals. , volume =. Soviet Physics Doklady , keywords =

work page
[9]

Wagner , title =

Robert A. Wagner , title =. Commun. 1974 , url =. doi:10.1145/360980.360995 , timestamp =

work page doi:10.1145/360980.360995 1974
[10]

International Symposium on Mathematical Foundations of Computer Science , pages=

Characterizing regular languages with polynomial densities , author=. International Symposium on Mathematical Foundations of Computer Science , pages=. 1992 , organization=

work page 1992
[11]

and Vidal, E

Marzal, A. and Vidal, E. , journal=. Computation of normalized edit distance and applications , year=

work page
[12]

Discrete Applied Mathematics , volume=

Thin and slender languages , author=. Discrete Applied Mathematics , volume=. 1995 , publisher=

work page 1995
[13]

Mehryar Mohri , title =. Comput. Linguistics , volume =. 1997 , timestamp =

work page 1997
[14]

Proceedings of the 7th International Conference on Implementation and Application of Automata , pages =

Mohri, Mehryar , title =. Proceedings of the 7th International Conference on Implementation and Application of Automata , pages =. 2002 , isbn =

work page 2002
[15]

A Normalized Levenshtein Distance Metric , year=

Yujian, Li and Bo, Liu , journal=. A Normalized Levenshtein Distance Metric , year=

work page
[16]

Shallit , title =

Dalia Krieger and Narad Rampersad and Jeffrey O. Shallit , title =. CoRR , volume =. 2007 , url =. 0711.4990 , timestamp =

work page arXiv 2007
[17]

A contextual normalised edit distance , booktitle =

Colin de la Higuera and Luisa Mic. A contextual normalised edit distance , booktitle =

work page
[18]

2008 , url=

Distances of formal languages , author=. 2008 , url=

work page 2008
[19]

Lecture notes LIAFA, Universit

Mathematical foundations of automata theory , author=. Lecture notes LIAFA, Universit

work page
[20]

, title =

Boker, Udi and Henzinger, Thomas A. , title =. IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2012) , pages =. 2012 , address =

work page 2012
[21]

and Ibarra, Oscar H

Cui, Cewei and Dang, Zhe and Fischer, Thomas R. and Ibarra, Oscar H. , title =. 2013 , publisher =

work page 2013
[22]

Proceedings of the 26th Annual

Michael Benedikt and Gabriele Puppis and Cristian Riveros , title =. Proceedings of the 26th Annual

work page
[23]

The Cost of Traveling between Languages , booktitle =

Michael Benedikt and Gabriele Puppis and Cristian Riveros , editor =. The Cost of Traveling between Languages , booktitle =

work page
[24]

Michael Benedikt and Gabriele Puppis and Cristian Riveros , title =. J. Comput. Syst. Sci. , volume =

work page
[25]

Theoretical Computer Science , volume =

Michael Benedikt and Gabriele Puppis and Cristian Riveros , title =. Theoretical Computer Science , volume =. 2014 , issn =

work page 2014
[26]

and Chaudhuri, Swarat

Samanta, Roopsha and Deshmukh, Jyotirmoy V. and Chaudhuri, Swarat. Robustness Analysis of String Transducers. Automated Technology for Verification and Analysis. 2013

work page 2013
[27]

Henzinger and Jan Otop and Roopsha Samanta , editor =

Thomas A. Henzinger and Jan Otop and Roopsha Samanta , editor =. Lipschitz Robustness of Finite-state Transducers , booktitle =

work page
[28]

2016 , eprint=

Regular Language Distance and Entropy , author=. 2016 , eprint=

work page 2016
[29]

Fischer and Oscar H

Cewei Cui and Zhe Dang and Thomas R. Fischer and Oscar H. Ibarra , title =. Inf. Comput. , volume =

work page
[30]

International Conference on Developments in Language Theory , pages=

Relative Prefix Distance Between Languages , author=. International Conference on Developments in Language Theory , pages=. 2017 , organization=

work page 2017
[31]

Weighted Transducers for Robustness Verification , booktitle =

Emmanuel Filiot and Nicolas Mazzocchi and Jean. Weighted Transducers for Robustness Verification , booktitle =

work page
[32]

33rd Annual Symposium on Combinatorial Pattern Matching,

Dana Fisman and Joshua Grogin and Oded Margalit and Gera Weiss , title =. 33rd Annual Symposium on Combinatorial Pattern Matching,

work page
[33]

A Normalized Edit Distance on Infinite Words , booktitle =

Dana Fisman and Joshua Grogin and Gera Weiss , editor =. A Normalized Edit Distance on Infinite Words , booktitle =

work page
[34]

When Is the Normalized Edit Distance over Non-Uniform Weights a Metric? , booktitle =

Dana Fisman and Ilay Tzarfati , editor =. When Is the Normalized Edit Distance over Non-Uniform Weights a Metric? , booktitle =

work page
[35]

2022 , isbn =

Bruse, Florian and Herwig, Maurice and Lange, Martin , title =. 2022 , isbn =. doi:10.1007/978-3-031-07469-1_6 , booktitle =

work page doi:10.1007/978-3-031-07469-1_6 2022
[36]

2024 , eprint=

A Complete Quantitative Axiomatisation of Behavioural Distance of Regular Expressions , author=. 2024 , eprint=

work page 2024
[37]

William , title =

Lawvere, F. William , title =. Rendiconti del Seminario Matematico e Fisico di Milano , volume =

work page
[38]

, title =

Flagg, Robert C. , title =. Algebra Universalis , volume =

work page