pith. machine review for the scientific record. sign in

arxiv: 2604.15601 · v1 · submitted 2026-04-17 · 💻 cs.DS

Recognition: unknown

The Communication Complexity of Pattern Matching with Edits Revisited

Jakob Nogler, Philip Wellnitz, Tomasz Kociumaka

Authors on Pith no claims yet

Pith reviewed 2026-05-10 08:24 UTC · model grok-4.3

classification 💻 cs.DS
keywords patterncdoteditsmatchingsigmabitscommunicationcomplexity
0
0 comments X

The pith

The one-way communication complexity of reporting k-edit occurrences (including the edit sequences) is Θ(n/m · k log(m|Σ|/k)) bits for 0 < k < m < n/2.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Pattern matching with edits asks: given a long text T and short pattern P, find every place in T that is at most k edits (insertions, deletions, or substitutions) away from P. In the communication model, Alice holds the whole instance and must send a short message to Bob so Bob can list all such places and the actual edits. The authors construct a new encoding whose size is O(n/m · k log(m|Σ|/k)) bits. They also prove a matching lower bound for the version that must report the edit sequences themselves. The new upper bound removes an extra log m factor from their earlier STOC'24 result and works for arbitrary alphabets.

Core claim

We improve the encoding size to O(n/m · k log(m|Σ|/k)), which matches the lower bound for constant-sized alphabets. We further establish a new tight lower bound of Ω(n/m · k log(m|Σ|/k)) for the edit sequence reporting variant.

Load-bearing premise

The analysis assumes the standard one-way communication model and the parameter regime 0 < k < m < n/2; the lower-bound construction may not extend outside this regime or to two-way communication.

Figures

Figures reproduced from arXiv: 2604.15601 by Jakob Nogler, Philip Wellnitz, Tomasz Kociumaka.

Figure 1
Figure 1. Figure 1: Three alignments of fragments of 𝑃 = ababbabaabb onto fragments of 𝑇 = abbbabaabbbababbb are consecutively added to the corresponding inference graph from Definition 3.1 [PITH_FULL_IMAGE:figures/full_fig_p019_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The connected components of the graphs of [PITH_FULL_IMAGE:figures/full_fig_p020_2.png] view at source ↗
read the original abstract

In the decades-old Pattern Matching with Edits problem, given a length-$n$ string $T$ (the text), a length-$m$ string $P$ (the pattern), and a positive integer $k$ (the threshold), the task is to list the $k$-error occurrences of $P$ in $T$, that is, all fragments of $T$ whose edit distance to $P$ is at most $k$. The one-way communication complexity of Pattern Matching with Edits is the minimum number of bits that Alice, given an instance $(P, T, k)$ of the problem, must send to Bob so that Bob can reconstruct the answer solely from that message. For the natural parameter regime of $0 < k < m < n/2$, our recent work [STOC'24] yields that ${\Omega}(n/m \cdot k \log(m/k))$ bits are necessary and $O(n/m \cdot k \log^2 m)$ bits are sufficient for Pattern Matching with Edits. More generally, for strings over an alphabet ${\Sigma}$, our recent work [STOC'24] gives an $O(n/m \cdot k \log m \log(m|{\Sigma}|))$-bit encoding that allows one to recover a shortest sequence of edits for every $k$-error occurrence of $P$ in $T$. In this work, we revisit the original proof and improve the encoding size to $O(n/m \cdot k \log(m|{\Sigma}|/k))$, which matches the lower bound for constant-sized alphabets. We further establish a new tight lower bound of ${\Omega}(n/m \cdot k \log(m|{\Sigma}|/k))$ for the edit sequence reporting variant that we solve. Our encoding size also matches the communication complexity established for the simpler Pattern Matching with Mismatches problem in the context of streaming algorithms [Clifford, Kociumaka, Porat; SODA'19].

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper revisits the one-way communication complexity of Pattern Matching with Edits. For 0 < k < m < n/2, it improves the encoding size from O(n/m · k log m log(m|Σ|)) to O(n/m · k log(m|Σ|/k)) bits, allowing recovery of shortest edit sequences for all k-error occurrences of P in T. It also proves a matching Ω(n/m · k log(m|Σ|/k)) lower bound for the edit sequence reporting variant, achieving tightness for constant-sized alphabets and matching the known complexity of the mismatches variant.

Significance. If correct, the result tightens the communication complexity of a core string problem by removing an extraneous log factor from the prior STOC'24 upper bound while establishing a new tight lower bound for sequence reporting. The explicit encoding construction and reduction-based lower bound provide a clean, parameter-improved characterization that aligns with the mismatches case from SODA'19.

major comments (1)
  1. Abstract and §1: the upper-bound construction is claimed to solve the edit-sequence-reporting variant, yet the lower bound is stated only for that variant; the manuscript must explicitly confirm that the O(n/m · k log(m|Σ|/k)) encoding outputs the sequences (not merely their existence or positions) so that the matching claim is load-bearing.
minor comments (2)
  1. Abstract: the regime statement '0 < k < m < n/2' is repeated but the dependence on |Σ| in the new bound should be highlighted earlier to clarify the constant-alphabet tightness.
  2. Notation: ensure that log(m|Σ|/k) is written unambiguously (parentheses) in all equations and that the base of the logarithm is stated once.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading of the manuscript and the positive recommendation for minor revision. We address the single major comment below.

read point-by-point responses
  1. Referee: [—] Abstract and §1: the upper-bound construction is claimed to solve the edit-sequence-reporting variant, yet the lower bound is stated only for that variant; the manuscript must explicitly confirm that the O(n/m · k log(m|Σ|/k)) encoding outputs the sequences (not merely their existence or positions) so that the matching claim is load-bearing.

    Authors: We agree that an explicit statement would strengthen clarity. The improved encoding we present is a direct refinement of the STOC'24 construction, which (as already noted in the manuscript) outputs a shortest edit sequence for every k-error occurrence rather than merely the positions or existence of those occurrences. The new lower bound is proved specifically for this sequence-reporting variant. We will revise the abstract and the first paragraph of §1 to state explicitly that the O(n/m · k log(m|Σ|/k))-bit message enables recovery of the edit sequences themselves, thereby making the tightness claim load-bearing for the reporting variant. revision: yes

Circularity Check

0 steps flagged

Minor self-citation to prior STOC'24 work; new upper/lower bounds are explicit constructions and reductions.

full rationale

The paper improves the encoding via an explicit construction to O(n/m · k log(m|Σ|/k)) and proves a new tight lower bound Ω(n/m · k log(m|Σ|/k)) for edit sequence reporting via reduction from a communication problem. These steps do not reduce to fitted parameters or self-definitions within the paper. The citation to the authors' own STOC'24 result is used only for context on prior (weaker) bounds and is not load-bearing for the new claims, which stand independently as constructions and reductions in the standard one-way model.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The result rests on the standard definition of edit distance and the one-way communication model. No free parameters are fitted; the bounds are asymptotic. No new entities are postulated.

axioms (2)
  • standard math Edit distance is the minimum number of insertions, deletions, and substitutions needed to transform one string into another.
    Invoked in the problem definition and throughout the encoding construction.
  • domain assumption One-way communication complexity is the minimum bits Alice must send to Bob so Bob can output the answer.
    Core model used for both upper and lower bounds.

pith-pipeline@v0.9.0 · 5673 in / 1324 out tokens · 43864 ms · 2026-05-10T08:24:36.126235+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

24 extracted references · 21 canonical work pages

  1. [1]

    write newline

    " write newline "" before.all 'output.state := FUNCTION fin.entry.original add.period write newline FUNCTION new.block output.state before.all = 'skip after.block 'output.state := if FUNCTION new.sentence output.state after.block = 'skip output.state before.all = 'skip after.sentence 'output.state := if if FUNCTION not #0 #1 if FUNCTION and 'skip pop #0 i...

  2. [2]

    Sign-rank of k - H amming distance is constant

    Panagiotis Charalampopoulos, Tomasz Kociumaka, and Philip Wellnitz. Pattern matching under weighted edit distance. FOCS , 2025. https://arxiv.org/abs/2510.17752 arXiv:2510.17752 , https://doi.org/10.1109/FOCS63196.2025.00102 doi:10.1109/FOCS63196.2025.00102

  3. [3]

    Near-optimal-time quantum algorithms for approximate pattern matching

    Tomasz Kociumaka, Jakob Nogler, and Philip Wellnitz. Near-optimal-time quantum algorithms for approximate pattern matching. SODA , 2025. https://arxiv.org/abs/2410.06808 arXiv:2410.06808 , https://doi.org/10.1137/1.9781611978322.15 doi:10.1137/1.9781611978322.15

  4. [4]

    Thankachan

    Daniel Gibney, Ce Jin, Tomasz Kociumaka, and Sharma V. Thankachan. Near-optimal quantum algorithms for bounded edit distance and Lempel-Ziv factorization. SODA , 2024. https://arxiv.org/abs/2311.01793 arXiv:2311.01793 , https://doi.org/10.1137/1.9781611977912.11 doi:10.1137/1.9781611977912.11

  5. [5]

    Communication lower bounds for collision problems via density increment arguments

    Tomasz Kociumaka, Jakob Nogler, and Philip Wellnitz. On the communication complexity of approximate pattern matching. STOC , 2024. https://arxiv.org/abs/2403.18812 arXiv:2403.18812 , https://doi.org/10.1145/3618260.3649604 doi:10.1145/3618260.3649604

  6. [6]

    Almost linear size edit distance sketch

    Michal Kouck \' y and Michael Saks. Almost linear size edit distance sketch. STOC , 2024. https://arxiv.org/abs/2406.11225 arXiv:2406.11225 , https://doi.org/10.1145/3618260.3649783 doi:10.1145/3618260.3649783

  7. [7]

    Streaming k -edit approximate pattern matching via string decomposition

    Sudatta Bhattacharya and Michal Kouck \' y . Streaming k -edit approximate pattern matching via string decomposition. ICALP , 2023. https://arxiv.org/abs/2305.00615 arXiv:2305.00615 , https://doi.org/10.4230/LIPIcs.ICALP.2023.22 doi:10.4230/LIPIcs.ICALP.2023.22

  8. [8]

    and Li, Jerry and Liu, Allen and Narayanan, Shyam , booktitle =

    Alejandro Cassis, Tomasz Kociumaka, and Philip Wellnitz. Optimal algorithms for bounded weighted edit distance. FOCS , 2023. https://arxiv.org/abs/2305.06659 arXiv:2305.06659 , https://doi.org/10.1109/FOCS57990.2023.00135 doi:10.1109/FOCS57990.2023.00135

  9. [9]

    Kim, Vinod Vaikuntanathan, and Or Zamir

    Panagiotis Charalampopoulos, Tomasz Kociumaka, and Philip Wellnitz. Faster pattern matching under edit distance: A reduction to dynamic puzzle matching and the seaweed monoid of permutation matrices. FOCS , 2022. https://arxiv.org/abs/2204.03087 arXiv:2204.03087 , https://doi.org/10.1109/FOCS54457.2022.00072 doi:10.1109/FOCS54457.2022.00072

  10. [10]

    Covering polygons is even harder

    Tomasz Kociumaka, Ely Porat, and Tatiana Starikovskaya. Small-space and streaming pattern matching with k edits. FOCS , 2021. https://arxiv.org/abs/2106.06037 arXiv:2106.06037 , https://doi.org/10.1109/FOCS52979.2021.00090 doi:10.1109/FOCS52979.2021.00090

  11. [11]

    Monochromatic Triangles, Triangle Listing and

    Panagiotis Charalampopoulos, Tomasz Kociumaka, and Philip Wellnitz. Faster approximate pattern matching: A unified approach. FOCS , 2020. https://arxiv.org/abs/2004.08350 arXiv:2004.08350 , https://doi.org/10.1109/FOCS46700.2020.00095 doi:10.1109/FOCS46700.2020.00095

  12. [12]

    The streaming k -mismatch problem

    Rapha \" e l Clifford, Tomasz Kociumaka, and Ely Porat. The streaming k -mismatch problem. SODA , 2019. https://arxiv.org/abs/1708.05223 arXiv:1708.05223 , https://doi.org/10.1137/1.9781611975482.68 doi:10.1137/1.9781611975482.68

  13. [13]

    Edit distance cannot be computed in strongly subquadratic time (unless SETH is false)

    Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). SIAM Journal on Computing , 2018. https://arxiv.org/abs/1412.0348 arXiv:1412.0348 , https://doi.org/10.1137/15M1053128 doi:10.1137/15M1053128

  14. [14]

    Communication and streaming complexity of approximate pattern matching

    Tatiana Starikovskaya. Communication and streaming complexity of approximate pattern matching. CPM , 2017. https://doi.org/10.4230/LIPIcs.CPM.2017.13 doi:10.4230/LIPIcs.CPM.2017.13

  15. [15]

    Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, and Oren Weimann

    Philip Bille, Gad M. Landau, Rajeev Raman, Kunihiko Sadakane, Srinivasa Rao Satti, and Oren Weimann. Random access to grammar-compressed strings and trees. SIAM Journal on Computing , 2015. https://arxiv.org/abs/1001.1565 arXiv:1001.1565 , https://doi.org/10.1137/130936889 doi:10.1137/130936889

  16. [16]

    Threshold approximate matching in grammar-compressed strings

    Alexander Tiskin. Threshold approximate matching in grammar-compressed strings. PSC , 2014

  17. [17]

    Beating O (nm) in approximate LZW -compressed pattern matching

    Paweł Gawrychowski and Damian Straszak. Beating O (nm) in approximate LZW -compressed pattern matching. ISAAC , 2013. https://arxiv.org/abs/1308.6509 arXiv:1308.6509 , https://doi.org/10.1007/978-3-642-45030-3_8 doi:10.1007/978-3-642-45030-3_8

  18. [18]

    Approximate string matching: A simpler faster algorithm

    Richard Cole and Ramesh Hariharan. Approximate string matching: A simpler faster algorithm. SIAM Journal on Computing , 2002. https://doi.org/10.1137/S0097539700370527 doi:10.1137/S0097539700370527

  19. [19]

    Landau and Uzi Vishkin

    Gad M. Landau and Uzi Vishkin. Fast parallel and serial approximate string matching. Journal of Algorithms , 1989. https://doi.org/10.1016/0196-6774(89)90010-2 doi:10.1016/0196-6774(89)90010-2

  20. [20]

    Landau and Uzi Vishkin

    Gad M. Landau and Uzi Vishkin. Fast string matching with k differences. Journal of Computer and System Sciences , 1988. https://doi.org/10.1016/0022-0000(88)90045-1 doi:10.1016/0022-0000(88)90045-1

  21. [21]

    Peter H. Sellers. The theory and computation of evolutionary distances: Pattern recognition. Journal of Algorithms , 1980. https://doi.org/10.1016/0196-6774(80)90016-4 doi:10.1016/0196-6774(80)90016-4

  22. [22]

    A universal algorithm for sequential data compression.IEEE Trans

    Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory , 1977. https://doi.org/10.1109/TIT.1977.1055714 doi:10.1109/TIT.1977.1055714

  23. [23]

    Fine and Herbert S

    Nathan J. Fine and Herbert S. Wilf. Uniqueness theorems for periodic functions. Proceedings of the American Mathematical Society , 1965. https://doi.org/10.1090/S0002-9939-1965-0174934-9 doi:10.1090/S0002-9939-1965-0174934-9

  24. [24]

    Binary codes capable of correcting deletions, insertions and reversals

    Vladimir Iosifovich Levenshtein. Binary codes capable of correcting deletions, insertions and reversals. Doklady Akademii Nauk SSSR , 1965