pith. machine review for the scientific record. sign in

arxiv: 2605.09668 · v1 · submitted 2026-05-10 · 💻 cs.FL · math.GN

Recognition: 2 theorem links

· Lean Theorem

Asymptotic Hausdorff and Language Similarity

Dana Fisman, Gal Meirom

Authors on Pith no claims yet

Pith reviewed 2026-05-12 04:28 UTC · model grok-4.3

classification 💻 cs.FL math.GN
keywords asymptotic Hausdorff distancelanguage similaritynormalized edit distanceregular languagesformal languagespseudo-metriccontext-free languages
0
0 comments X

The pith

The Asymptotic Hausdorff lifting turns element-level metrics into pseudo-metrics on sets that ignore finite deviations and reflect asymptotic similarity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Asymptotic Hausdorff lifting AH_d as a general way to extend a distance d defined on individual elements to a distance on entire infinite sets such as formal languages. Unlike classical Hausdorff distance, the new construction examines the limit behavior as element size grows, so that occasional mismatches do not dominate the result. When d is a normalized edit distance on words, AH_d produces well-behaved distances between languages that still satisfy the triangle inequality and other metric axioms. The authors then classify regular languages up to the equivalence induced by these distances and show how the value can be computed exactly for regular languages and for bounded context-free languages.

Core claim

The Asymptotic Hausdorff lifting AH_d is defined so that the distance between two sets X and Y is the limit superior, as n tends to infinity, of the classical Hausdorff distance between the n-sized subsets of X and Y. Applied to normalized edit distances on words, this yields a pseudo-metric on languages that captures asymptotic edit behavior. For regular languages the equivalence classes under AH_ned are determined by the languages' asymptotic essence, and the distance itself is computable for both regular languages and bounded context-free languages.

What carries the argument

The Asymptotic Hausdorff lifting AH_d, which replaces the classical Hausdorff distance with its asymptotic limit to ignore finite-size outliers while preserving metric properties.

If this is right

  • Regular languages fall into equivalence classes determined by their asymptotic edit behavior under AH_ned.
  • The resulting distances remain pseudo-metrics, satisfying non-negativity, symmetry, and the triangle inequality.
  • AH_ned can be computed algorithmically for all regular languages and all bounded context-free languages.
  • The construction applies to any underlying element metric d that admits a natural size measure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same lifting could be used on other infinite objects equipped with a size function, such as infinite trees or graphs, to obtain asymptotic similarity measures.
  • Equivalence classes under AH_ned might correspond to recognizable properties of the generating automata that are stable under finite modifications.
  • The method suggests a route to approximate language learning that only needs to match behavior on sufficiently long strings.

Load-bearing premise

That the limit superior of the scaled Hausdorff distances exists and that ignoring all finite deviations still produces a meaningful notion of similarity between languages.

What would settle it

Two regular languages that differ by only finitely many words yet receive a positive AH_ned value, or a pair of bounded context-free languages for which the computed AH_ned value contradicts direct enumeration of long words.

Figures

Figures reproduced from arXiv: 2605.09668 by Dana Fisman, Gal Meirom.

Figure 5
Figure 5. Figure 5: (right) illustrates the partition of the [PITH_FULL_IMAGE:figures/full_fig_p018_5.png] view at source ↗
read the original abstract

We introduce the \textit{Asymptotic Hausdorff} lifting, denoted $\mathbb{AH}_{d}$, a general method for lifting an element-level metric $d$ to a (pseudo-) metric on sets, that captures asymptotic similarity in infinite domains equipped with a notion of size. The construction is designed to be insensitive to finite deviations and to avoid the limitations of classical Hausdorff-based approaches, which are often overly sensitive to outliers and fail to reflect asymptotic behavior. Formal languages provide a central motivating instance of this framework, where elements are words and sets are languages. When applied to normalized edit distances, the Asymptotic Hausdorff lifting yields metric-valued distances between languages that reflect asymptotic edit behavior while preserving metric structure. We study the equivalence classes of regular languages induced by $\mathbb{AH}_{d}$ for normalized edit distances $d$, and characterize their asymptotic essence. Focusing in particular on the normalized edit distance of Marzal and Vidal, $\textsf{ned}$, we investigate the computation of $\mathbb{AH}_{\textsf{ned}}$ for regular languages and for bounded context-free languages.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces the Asymptotic Hausdorff lifting AH_d, a general construction for lifting an element-level metric d to a (pseudo-)metric on sets that is insensitive to finite deviations and captures asymptotic similarity. Applied to normalized edit distances on formal languages, it produces distances reflecting asymptotic edit behavior while preserving metric structure. The work studies the equivalence classes of regular languages induced by AH_d, characterizes their asymptotic essence, and investigates computation of AH_ned (for the normalized edit distance ned of Marzal and Vidal) on regular languages and bounded context-free languages.

Significance. If the lifting is shown to satisfy the metric axioms (particularly the triangle inequality) and to induce well-defined equivalence classes on regular languages that exactly capture shared asymptotic edit behavior, the construction would offer a principled, outlier-robust alternative to classical Hausdorff distances for comparing infinite objects such as languages. This could be useful in automata theory and applications requiring asymptotic similarity measures. The restriction to regular languages and the focus on computability for ned add a concrete computational angle, though no machine-checked proofs or reproducible examples are referenced.

major comments (1)
  1. [Abstract] Abstract: the central claims that AH_d 'preserves metric structure' and 'yields characterizations' of asymptotic essence rest on an asserted lifting construction, yet the manuscript supplies neither the explicit definition of AH_d nor a proof (or even sketch) that the triangle inequality holds on the space of languages. This is load-bearing for the claim that the result is a (pseudo-)metric reflecting asymptotic edit behavior rather than a heuristic.

Simulated Author's Rebuttal

1 responses · 0 unresolved

Thank you for the careful review of our manuscript. We address the major comment below and will make targeted revisions to improve clarity.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claims that AH_d 'preserves metric structure' and 'yields characterizations' of asymptotic essence rest on an asserted lifting construction, yet the manuscript supplies neither the explicit definition of AH_d nor a proof (or even sketch) that the triangle inequality holds on the space of languages. This is load-bearing for the claim that the result is a (pseudo-)metric reflecting asymptotic edit behavior rather than a heuristic.

    Authors: We thank the referee for this observation. The manuscript does contain the explicit definition of the Asymptotic Hausdorff lifting AH_d in the main body (following the introduction of the general construction), along with a full proof that it satisfies the triangle inequality and thus forms a pseudo-metric on languages. The proof proceeds by applying the triangle inequality of the base metric d to the finite approximations and then taking the lim sup. We agree that the abstract would benefit from a brief pointer to these elements to make the central claims more immediately verifiable from the abstract alone. We will therefore revise the abstract to include a short reference to the definition and the metric property. revision: partial

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper introduces the Asymptotic Hausdorff lifting AH_d as a direct, general construction that lifts an arbitrary base metric d on elements to a (pseudo-)metric on sets, designed to ignore finite deviations. The central claims about metric preservation, equivalence classes of regular languages under AH_ned, and asymptotic behavior follow from the explicit definition of the lifting rather than from any fitted parameters, self-referential equations, or load-bearing self-citations. No step reduces a claimed prediction or uniqueness result to its own inputs by construction; the derivation remains self-contained against the stated axioms of the base distance.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The work rests on standard metric axioms for the base distance d and on the existence of a size notion for infinite domains; the lifting operator itself is the main invented entity.

axioms (2)
  • domain assumption d is a metric (or pseudo-metric) on the underlying word space.
    Required for the lifted object to inherit metric properties.
  • domain assumption There exists a notion of size on the infinite domain that allows asymptotic limits to be defined.
    Invoked to make the lifting insensitive to finite deviations.
invented entities (1)
  • Asymptotic Hausdorff lifting AH_d no independent evidence
    purpose: Lift an element metric d to a set metric that ignores finite deviations.
    New operator introduced to overcome limitations of classical Hausdorff distance.

pith-pipeline@v0.9.0 · 5479 in / 1310 out tokens · 47640 ms · 2026-05-12T04:28:57.855377+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages

  1. [1]

    2025 , howpublished =

    Hausdorff distance ---. 2025 , howpublished =

  2. [2]

    New Phytologist , year=

    THE DISTRIBUTION OF THE FLORA IN THE ALPINE ZONE.1 , author=. New Phytologist , year=

  3. [3]

    and Weaver, Warren , biburl =

    Shannon, Claude E. and Weaver, Warren , biburl =

  4. [4]

    Miller , title =

    Noam Chomsky and George A. Miller , title =. Inf. Control. , volume =. 1958 , url =. doi:10.1016/S0019-9958(58)90082-2 , timestamp =

  5. [5]

    Proceedings of the American Mathematical Society , volume=

    Bounded regular sets , author=. Proceedings of the American Mathematical Society , volume=

  6. [6]

    , title =

    Ginsburg, Seymour and Spanier, Edwin H. , title =. Pacific Journal of Mathematics , volume =. 1966 , publisher =

  7. [7]

    , title =

    Parikh, Rohit J. , title =. Journal of the Association for Computing Machinery , volume =. 1966 , publisher =

  8. [8]

    Binary codes capable of correcting deletions, insertions and reversals

    Levenshtein, Vladimir Iosifovich , biburl =. Binary codes capable of correcting deletions, insertions and reversals. , volume =. Soviet Physics Doklady , keywords =

  9. [9]

    Wagner , title =

    Robert A. Wagner , title =. Commun. 1974 , url =. doi:10.1145/360980.360995 , timestamp =

  10. [10]

    International Symposium on Mathematical Foundations of Computer Science , pages=

    Characterizing regular languages with polynomial densities , author=. International Symposium on Mathematical Foundations of Computer Science , pages=. 1992 , organization=

  11. [11]

    and Vidal, E

    Marzal, A. and Vidal, E. , journal=. Computation of normalized edit distance and applications , year=

  12. [12]

    Discrete Applied Mathematics , volume=

    Thin and slender languages , author=. Discrete Applied Mathematics , volume=. 1995 , publisher=

  13. [13]

    Mehryar Mohri , title =. Comput. Linguistics , volume =. 1997 , timestamp =

  14. [14]

    Proceedings of the 7th International Conference on Implementation and Application of Automata , pages =

    Mohri, Mehryar , title =. Proceedings of the 7th International Conference on Implementation and Application of Automata , pages =. 2002 , isbn =

  15. [15]

    A Normalized Levenshtein Distance Metric , year=

    Yujian, Li and Bo, Liu , journal=. A Normalized Levenshtein Distance Metric , year=

  16. [16]

    Shallit , title =

    Dalia Krieger and Narad Rampersad and Jeffrey O. Shallit , title =. CoRR , volume =. 2007 , url =. 0711.4990 , timestamp =

  17. [17]

    A contextual normalised edit distance , booktitle =

    Colin de la Higuera and Luisa Mic. A contextual normalised edit distance , booktitle =

  18. [18]

    2008 , url=

    Distances of formal languages , author=. 2008 , url=

  19. [19]

    Lecture notes LIAFA, Universit

    Mathematical foundations of automata theory , author=. Lecture notes LIAFA, Universit

  20. [20]

    , title =

    Boker, Udi and Henzinger, Thomas A. , title =. IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2012) , pages =. 2012 , address =

  21. [21]

    and Ibarra, Oscar H

    Cui, Cewei and Dang, Zhe and Fischer, Thomas R. and Ibarra, Oscar H. , title =. 2013 , publisher =

  22. [22]

    Proceedings of the 26th Annual

    Michael Benedikt and Gabriele Puppis and Cristian Riveros , title =. Proceedings of the 26th Annual

  23. [23]

    The Cost of Traveling between Languages , booktitle =

    Michael Benedikt and Gabriele Puppis and Cristian Riveros , editor =. The Cost of Traveling between Languages , booktitle =

  24. [24]

    Michael Benedikt and Gabriele Puppis and Cristian Riveros , title =. J. Comput. Syst. Sci. , volume =

  25. [25]

    Theoretical Computer Science , volume =

    Michael Benedikt and Gabriele Puppis and Cristian Riveros , title =. Theoretical Computer Science , volume =. 2014 , issn =

  26. [26]

    and Chaudhuri, Swarat

    Samanta, Roopsha and Deshmukh, Jyotirmoy V. and Chaudhuri, Swarat. Robustness Analysis of String Transducers. Automated Technology for Verification and Analysis. 2013

  27. [27]

    Henzinger and Jan Otop and Roopsha Samanta , editor =

    Thomas A. Henzinger and Jan Otop and Roopsha Samanta , editor =. Lipschitz Robustness of Finite-state Transducers , booktitle =

  28. [28]

    2016 , eprint=

    Regular Language Distance and Entropy , author=. 2016 , eprint=

  29. [29]

    Fischer and Oscar H

    Cewei Cui and Zhe Dang and Thomas R. Fischer and Oscar H. Ibarra , title =. Inf. Comput. , volume =

  30. [30]

    International Conference on Developments in Language Theory , pages=

    Relative Prefix Distance Between Languages , author=. International Conference on Developments in Language Theory , pages=. 2017 , organization=

  31. [31]

    Weighted Transducers for Robustness Verification , booktitle =

    Emmanuel Filiot and Nicolas Mazzocchi and Jean. Weighted Transducers for Robustness Verification , booktitle =

  32. [32]

    33rd Annual Symposium on Combinatorial Pattern Matching,

    Dana Fisman and Joshua Grogin and Oded Margalit and Gera Weiss , title =. 33rd Annual Symposium on Combinatorial Pattern Matching,

  33. [33]

    A Normalized Edit Distance on Infinite Words , booktitle =

    Dana Fisman and Joshua Grogin and Gera Weiss , editor =. A Normalized Edit Distance on Infinite Words , booktitle =

  34. [34]

    When Is the Normalized Edit Distance over Non-Uniform Weights a Metric? , booktitle =

    Dana Fisman and Ilay Tzarfati , editor =. When Is the Normalized Edit Distance over Non-Uniform Weights a Metric? , booktitle =

  35. [35]

    2022 , isbn =

    Bruse, Florian and Herwig, Maurice and Lange, Martin , title =. 2022 , isbn =. doi:10.1007/978-3-031-07469-1_6 , booktitle =

  36. [36]

    2024 , eprint=

    A Complete Quantitative Axiomatisation of Behavioural Distance of Regular Expressions , author=. 2024 , eprint=

  37. [37]

    William , title =

    Lawvere, F. William , title =. Rendiconti del Seminario Matematico e Fisico di Milano , volume =

  38. [38]

    , title =

    Flagg, Robert C. , title =. Algebra Universalis , volume =