arxiv: 2605.04826 · v1 · submitted 2026-05-06 · 💻 cs.DS

Recognition: unknown

Faster Algorithms for Shortest Unique or Absent Substrings

Hilde Verbeek, Manal Mohamed, Panagiotis Charalampopoulos, Solon P. Pissis, Wiktor Zuba

Authors on Pith no claims yet

Pith reviewed 2026-05-08 17:03 UTC · model grok-4.3

classification 💻 cs.DS

keywords shortest unique substringshortest absent substringstring algorithmsword RAM modelwavelet treesde Bruijn sequencesgeometric problemssynchronizing sets

0 comments

The pith

Algorithms compute shortest unique and absent substrings in O(n log σ / √log n) time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops improved algorithms for two classic string problems: finding a shortest unique substring (SUS) and a shortest absent substring (SAS) in a string S of length n. Standard suffix-tree methods solve both in linear time, but the authors show this is not optimal in the word-RAM model when the alphabet size σ is small. They break candidate substrings down by length and period, then apply synchronizing sets, run analysis, and wavelet trees to turn the search into a geometric query that runs faster. The same framework, combined with fast de Bruijn sequence construction, yields an algorithm for SAS with identical complexity.

Core claim

By decomposing the search according to the length and period of candidate substrings and employing synchronizing sets, run analysis, and wavelet trees, the computation of a shortest unique substring reduces to a simple geometric problem solvable in O(n log σ / √log n) time; adapting the same reduction together with efficient de Bruijn sequence construction gives the same bound for shortest absent substrings.

What carries the argument

Decomposition of the problem by length and period of the target substring, together with synchronizing sets, run analysis, and wavelet trees that reduce SUS and SAS computation to a geometric problem.

Load-bearing premise

The length-and-period decomposition, synchronizing sets, run analysis, and wavelet trees correctly reduce SUS and SAS to a geometric problem solvable in the claimed time bound inside the word-RAM model.

What would settle it

A concrete string S of length n over small alphabet where the algorithm either returns an incorrect SUS or SAS or exceeds O(n log σ / √log n) operations on a standard word-RAM implementation.

Figures

Figures reproduced from arXiv: 2605.04826 by Hilde Verbeek, Manal Mohamed, Panagiotis Charalampopoulos, Solon P. Pissis, Wiktor Zuba.

**Figure 1.** Figure 1: A schematic illustration of two suffix trees. Left, the root-to- view at source ↗

**Figure 2.** Figure 2: An illustration of a suffix tree and a wavelet tree. view at source ↗

**Figure 3.** Figure 3: Node 𝑣 of T (F1) corresponds to prefix arba (left) with LCP list L𝑣 (right). Lemma 3.7 gives 7 (shortest unique prefix cadabra, in red), yielding a Shortest Unique String Pair candidate of length 4 + 7 = 11. The following lemma is a direct adaptation of the previous lemma, solving a Shortest Unique String Pair instance in the same setting. Lemma 3.6. Consider an instance of the Shortest Unique String Pair … view at source ↗

**Figure 4.** Figure 4: A set 𝑃 (left) and its primary skyline (right). The union of the shadows of the points in 𝑃 consists of all points within the shaded polygon but outside its heavily shaded part—the points in the heavily shaded part are dominated by multiple points from 𝑃. For this instance of Minimum Skyline Point, the output would be point (2, 3), shown with a rhombus. Lemma 3.10. Any instance of Minimum Skyline Point can… view at source ↗

**Figure 5.** Figure 5: Cases (1)–(3) of Lemma 3.15, illustrating the potential alignment of a run with exponent 𝑒 within a run with exponent 𝑒 ′ . Red alignments are guaranteed to be contained within the longer run; yellow alignments are contained only if the extensions of the runs satisfy specific inequalities. Lemma 3.15. Let 𝐹 and 𝐹 ′ be two runs in 𝑆 with the same Lyndon root 𝜆, where Lrepr(𝐹) = (𝜆, 𝑒, 𝛼, 𝛽) and Lrepr(𝐹 ′ ) … view at source ↗

**Figure 6.** Figure 6: The mapping 𝔥 from runs to sets of points in [0, 2𝑟 − 1] 2 . The light shaded part represents the primary skyline, while dark shaded regions indicate areas dominated by multiple points, where no unique substring can exist. • Lrepr(𝑅2) = (ab, 5, 0, 1) =⇒ 𝔥(𝑅2) = {(2, 1), (0, 3)}; • Lrepr(𝑅3) = (ab, 4, 1, 0) =⇒ 𝔥(𝑅3) = {(1, 0)}. The geometric domain is [0, 3] 2 , as 2𝑟 − 1 = 3. Applying the mapping function … view at source ↗

read the original abstract

We revisit two well-known algorithmic problems on strings: computing a shortest unique substring (SUS) and a shortest absent substring (SAS) of a string $S$ of length $n$. Both problems admit folklore $\mathcal{O}(n)$-time solutions using the suffix tree of $S$. However, for small alphabets, this complexity is not necessarily optimal in the word RAM model, where a string of length $n$ over alphabet $[0,\sigma)$ can be stored in $\mathcal{O}(n \log \sigma/\log n)$ space and read in $\mathcal{O}(n \log \sigma/\log n)$ time. We present an $\mathcal{O}(n \log \sigma/\sqrt{\log n})$-time algorithm for computing a SUS of $S$. This algorithm decomposes the problem according to the length and the period of the sought substring and uses several tools and techniques, such as synchronizing sets, the analysis of runs, and wavelet trees, to reduce the computation of a SUS to a simple geometric problem. Further, we adapt this algorithm and combine it with an efficient construction of de Bruijn sequences in order to obtain an $\mathcal{O}(n \log \sigma/\sqrt{\log n})$-time algorithm for computing a SAS of $S$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper improves SUS and SAS to O(n log σ / sqrt(log n)) via a length/period decomposition that reduces to geometric queries, but the case analysis is the part that needs verification.

read the letter

The main point is that this paper improves the time for computing a shortest unique substring and a shortest absent substring to O(n log σ / sqrt(log n)) in the word-RAM model. They start from the observation that the standard suffix-tree method is linear, but for small alphabets there is room to do better by exploiting the space bound. The algorithm decomposes the problem by the length and period of the target substring. Synchronizing sets and the analysis of runs then reduce the hard cases to a geometric range problem, which wavelet trees solve efficiently. For the absent substring they adapt the same framework and add an efficient de Bruijn sequence construction. This combination is new. The folklore solutions are the linear ones, and this specific path through geometry is not standard. The paper does a good job of using only established components, so the novelty is in the assembly and the resulting bound. The soft spot is in the decomposition step. It needs to partition every possible candidate correctly, including those with long periods or tricky overlaps. If any case slips through or if the wavelet tree queries introduce uncancelled log factors, the claimed bound does not hold. The abstract presents it as solved, but that is the section a referee should verify in detail. This paper is for people who work on fine-grained string algorithms. Someone interested in practical speedups for bioinformatics or data management might also look at it, though the gain is asymptotic. It is worth a serious referee. The claim is clear and the approach is grounded, so the work should go through review even if revisions are needed on the proof details.

Referee Report

2 major / 2 minor

Summary. The paper claims O(n log σ / √log n)-time algorithms for computing a shortest unique substring (SUS) and a shortest absent substring (SAS) of a string S of length n. The SUS algorithm decomposes the problem by substring length and period, then uses synchronizing sets, run analysis, and wavelet trees to reduce SUS computation to a geometric problem. The SAS algorithm adapts the same framework and combines it with an efficient de Bruijn sequence construction to achieve the same bound, improving on the folklore O(n) suffix-tree solutions in the word-RAM model.

Significance. If the reductions and time analysis hold, the result offers a non-trivial asymptotic improvement over linear-time methods for these string problems when the alphabet is small relative to n. The work appropriately credits and builds upon established primitives (runs, synchronizing sets, wavelet trees, de Bruijn sequences) rather than reinventing them, which strengthens the contribution. The approach could be relevant for large-scale string processing applications.

major comments (2)

[SUS algorithm reduction (length/period decomposition and geometric mapping)] The O(n log σ / √log n) bound for SUS rests on the length/period decomposition correctly partitioning all candidate substrings and the subsequent reduction via synchronizing sets and run analysis mapping the problem to geometry without extra logarithmic factors (see the description following the abstract and the claimed reduction). The manuscript must explicitly verify coverage of long-period cases and boundary overlaps in the word-RAM model; otherwise the improvement over O(n) fails.
[Time analysis of geometric queries via wavelet trees] The wavelet-tree solution to the resulting geometric problem must be shown to achieve the full √log n cancellation in the word-RAM model without hidden log factors from query overhead or data-structure construction (the time analysis paragraph after the reduction).

minor comments (2)

[Abstract] The abstract could include a one-sentence pointer to the specific prior O(n) suffix-tree results being improved upon.
[Preliminaries] Notation for alphabet size (σ) and string length (n) is clear but should be restated at the start of the technical sections for readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. We appreciate the recognition of the significance of the asymptotic improvements and the appropriate use of established primitives. We address each major comment below and will incorporate the requested clarifications and expansions into the revised version.

read point-by-point responses

Referee: [SUS algorithm reduction (length/period decomposition and geometric mapping)] The O(n log σ / √log n) bound for SUS rests on the length/period decomposition correctly partitioning all candidate substrings and the subsequent reduction via synchronizing sets and run analysis mapping the problem to geometry without extra logarithmic factors (see the description following the abstract and the claimed reduction). The manuscript must explicitly verify coverage of long-period cases and boundary overlaps in the word-RAM model; otherwise the improvement over O(n) fails.

Authors: We agree that explicit verification strengthens the presentation. The length/period decomposition partitions candidates into short-period and long-period regimes, with synchronizing sets identifying unique positions and run analysis (via the runs theorem) handling periodic repetitions to ensure every possible substring is considered exactly once. Boundary overlaps are resolved by the geometric mapping, which encodes positions and periods without introducing extra log factors in the word-RAM model. In the revision we will insert a dedicated lemma (with proof) that formally verifies complete coverage of long-period cases and all boundary conditions. revision: yes
Referee: [Time analysis of geometric queries via wavelet trees] The wavelet-tree solution to the resulting geometric problem must be shown to achieve the full √log n cancellation in the word-RAM model without hidden log factors from query overhead or data-structure construction (the time analysis paragraph after the reduction).

Authors: The wavelet-tree queries are performed on a structure built over the geometric encoding of candidate positions; each query costs O(log σ) time in the word-RAM model, and the total number of queries is bounded by O(n log σ / √log n) after the decomposition. Construction time is linear in the size of the encoded instance (O(n log σ / log n) words), which is absorbed into the overall bound. We will expand the time-analysis paragraph to include an explicit step-by-step accounting that isolates the √log n cancellation and confirms no additional logarithmic overhead arises from either queries or preprocessing. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the algorithmic reduction for SUS and SAS

full rationale

The paper's derivation is a constructive reduction: it decomposes SUS/SAS computation by substring length and period, then applies standard stringology primitives (synchronizing sets, maximal runs, wavelet trees) to map the remaining cases to a geometric problem whose solution yields the stated O(n log σ/√log n) bound in the word-RAM model. The SAS variant further adapts the same reduction and invokes an efficient de Bruijn sequence construction, a well-known independent primitive. No equation, time bound, or correctness claim reduces by definition to the paper's own inputs, nor does any load-bearing step rely on a self-citation chain that itself lacks external verification. All cited components (wavelet trees, run analysis, synchronizing sets) are established results outside the present work, making the overall derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work relies on standard stringology results and data structures without introducing new free parameters or postulated entities.

axioms (2)

standard math Suffix trees and runs in strings can be constructed and analyzed in linear time under the word-RAM model.
Invoked as the baseline and as a building block for the new decomposition.
domain assumption Synchronizing sets and wavelet trees admit efficient construction and query times in the word-RAM model for small alphabets.
Used to achieve the sub-linear per-character cost after decomposition.

pith-pipeline@v0.9.0 · 5546 in / 1384 out tokens · 72836 ms · 2026-05-08T17:03:16.385073+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

44 extracted references · 38 canonical work pages

[1]

Babenko, Paweł Gawrychowski, Tomasz Kociumaka, and Tatiana Starikovskaya

Maxim A. Babenko, Paweł Gawrychowski, Tomasz Kociumaka, and Tatiana Starikovskaya. Wavelet trees meetsuffixtrees. InProceedingsoftheTwenty-SixthAnnualACM-SIAMSymposiumonDiscreteAlgorithms, SODA 2015, pages 572–591, 2015.doi:10.1137/1.9781611973730.39

work page doi:10.1137/1.9781611973730.39 2015
[2]

Baeza-Yates

Ricardo A. Baeza-Yates. Improved string searching.Softw. Pract. Exp., 19(3):257–271, 1989.doi: 10.1002/SPE.4380190305

work page doi:10.1002/spe.4380190305 1989
[3]

Lyndon arrays in sublinear time

Hideo Bannai and Jonas Ellert. Lyndon arrays in sublinear time. In31st Annual European Symposium on Algorithms, ESA 2023, volume 274 ofLIPIcs, pages 14:1–14:16. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023.doi:10.4230/LIPICS.ESA.2023.14

work page doi:10.4230/lipics.esa.2023.14 2023
[4]

Jérémy Barbay, Johannes Fischer, and Gonzalo Navarro. Lrm-trees: Compressed indices, adaptive sorting, andcompressedpermutations.Theor.Comput.Sci.,459:26–41,2012.URL:https://doi.org/10.1016/ j.tcs.2012.08.010,doi:10.1016/J.TCS.2012.08.010

work page doi:10.1016/j.tcs.2012.08.010 2012
[5]

Worst case efficient single and multiple string matching in the RAM model

Djamal Belazzougui. Worst case efficient single and multiple string matching in the RAM model. In Combinatorial Algorithms - 21st International Workshop, IWOCA 2010, volume 6460 ofLecture Notes in Computer Science, pages 90–102. Springer, 2010.doi:10.1007/978-3-642-19222-7_10. 27

work page doi:10.1007/978-3-642-19222-7_10 2010
[6]

Towards optimal packed string matching.Theoretical Computer Science, 525:111–129, 2014.doi:10

Oren Ben-Kiki, Philip Bille, Dany Breslauer, Leszek Gasieniec, Roberto Grossi, and Oren Weimann. Towards optimal packed string matching.Theoretical Computer Science, 525:111–129, 2014.doi:10. 1016/j.tcs.2013.06.013

2014
[7]

LATIN 2000: Theoretical Informatics

Michael A. Bender and Martin Farach-Colton. The LCA problem revisited. In Gaston H. Gonnet, Daniel Panario,andAlfredoViola,editors,LATIN2000: TheoreticalInformatics,4thLatinAmericanSymposium, 2000, Proceedings, volume 1776 ofLecture Notes in Computer Science, pages 88–94. Springer, 2000. doi:10.1007/10719839\_9

work page doi:10.1007/10719839 2000
[8]

Fast searching in packed strings.Journal of Discrete Algorithms, 9(1):49–56, 2011.doi: 10.1016/j.jda.2010.09.003

Philip Bille. Fast searching in packed strings.Journal of Discrete Algorithms, 9(1):49–56, 2011.doi: 10.1016/j.jda.2010.09.003

work page doi:10.1016/j.jda.2010.09.003 2011
[9]

Deterministic indexing for packed strings

Philip Bille, Inge Li Gørtz, and Frederik Rye Skjoldjensen. Deterministic indexing for packed strings. In28th Annual Symposium on Combinatorial Pattern Matching, CPM 2017, volume 78 ofLIPIcs, pages 6:1–6:11.SchlossDagstuhl-Leibniz-ZentrumfürInformatik,2017.doi:10.4230/LIPIcs.CPM.2017.6

work page doi:10.4230/lipics.cpm.2017.6 2017
[10]

Constant-time word-size string matching

Dany Breslauer, Leszek Gasieniec, and Roberto Grossi. Constant-time word-size string matching. In Combinatorial Pattern Matching - 23rd Annual Symposium, CPM 2012, Proceedings, volume 7354 of Lecture Notes in Computer Science, pages 83–96. Springer, 2012.doi:10.1007/978-3-642-31265-6\ _7

work page doi:10.1007/978-3-642-31265-6 2012
[11]

Panagiotis Charalampopoulos, Maxime Crochemore, Gabriele Fici, Robert Mercas, and Solon P. Pissis. Alignment-free sequence comparison using absent words.Inf. Comput., 262:57–68, 2018.doi:10.1016/ J.IC.2018.06.002

2018
[12]

Iliopoulos, Tomasz Kociumaka, Solon P

Panagiotis Charalampopoulos, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis,JakubRadoszewski,WojciechRytter,andTomaszWalen. Linear-TimeAlgorithmforLongLCFwith k Mismatches. In29th Annual Symposium on Combinatorial Pattern Matching (CPM 2018), volume 105 ofLeibniz International Proceedings in Informatics (LIPIcs), pages 23:1–23:...

work page doi:10.4230/lipics.cpm.2018.23 2018
[13]

Iliopoulos, Chang Liu, and Solon P

Panagiotis Charalampopoulos, Costas S. Iliopoulos, Chang Liu, and Solon P. Pissis. Property suffix array with applications in indexing weighted sequences.ACM J. Exp. Algorithmics, 25:1–16, 2020.doi: 10.1145/3385898

work page doi:10.1145/3385898 2020
[14]

Panagiotis Charalampopoulos, Tomasz Kociumaka, Jakub Radoszewski, and Solon P. Pissis. Faster algo- rithms for longest common substring.ACM Trans. Algorithms, 2025.doi:10.1145/3774754

work page doi:10.1145/3774754 2025
[15]

Counting distinct square substrings in sublinear time

Panagiotis Charalampopoulos, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, Tomasz Walen, and Wiktor Zuba. Counting distinct square substrings in sublinear time. In50th International Symposium on Mathematical Foundations of Computer Science, MFCS 2025, volume 345 ofLIPIcs, pages 36:1–36:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2025.doi:1...

work page doi:10.4230/lipics.mfcs.2025.36 2025
[16]

Pissis, and Jakub Radoszewski

Panagiotis Charalampopoulos, Solon P. Pissis, and Jakub Radoszewski. Longest palindromic substring in sublinear time. In33rd Annual Symposium on Combinatorial Pattern Matching, CPM 2022, volume 223 ofLIPIcs, pages 20:1–20:9. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022.doi:10.4230/ LIPICS.CPM.2022.20

2022
[17]

Data compression using antidictionaries.Proc

Maxime Crochemore, Filippo Mignosi, Antonio Restivo, and Sergio Salemi. Data compression using antidictionaries.Proc. IEEE, 88(11):1756–1768, 2000.doi:10.1109/5.892711

work page doi:10.1109/5.892711 2000
[18]

A combinatorial problem.Proc

Nicolaas Govert de Bruijn. A combinatorial problem.Proc. Koninklijke Nederlandse Akademie V. Wetenschappen, 49:758–764, 1946. URL:http://resolver.tudelft.nl/uuid: f9a56551-402a-4428-98e6-1469e38e64c1. 28

1946
[19]

Génération d’une section des classes de conjugaison et arbre des mots de lyndon de longueur bornée.Theor

Jean-Pierre Duval. Génération d’une section des classes de conjugaison et arbre des mots de lyndon de longueur bornée.Theor. Comput. Sci., 60:255–283, 1988.doi:10.1016/0304-3975(88)90113-2

work page doi:10.1016/0304-3975(88)90113-2 1988
[20]

String Processing and Information Retrieval - 30th International Symposium,

Jonas Ellert. Sublinear time Lempel-Ziv (LZ77) factorization. InString Processing and Information Retrieval - 30th International Symposium, SPIRE 2023, Proceedings, volume 14240 ofLecture Notes in Computer Science, pages 171–187. Springer, 2023.doi:10.1007/978-3-031-43980-3\_14

work page doi:10.1007/978-3-031-43980-3 2023
[21]

Time-optimal construction of string synchronizing sets

Jonas Ellert and Tomasz Kociumaka. Time-optimal construction of string synchronizing sets. In43rd International Symposium on Theoretical Aspects of Computer Science, STACS 2026, LIPIcs, pages 36:1– 36:22.SchlossDagstuhl-Leibniz-ZentrumfürInformatik,2026.doi:10.4230/LIPICS.STACS.2026.36

work page doi:10.4230/lipics.stacs.2026.36 2026
[22]

Optimal suffix tree construction with large alphabets

Martin Farach. Optimal suffix tree construction with large alphabets. In38th Annual Symposium on Foundations of Computer Science, FOCS 1997, pages 137–143. IEEE Computer Society, 1997.doi: 10.1109/SFCS.1997.646102

work page doi:10.1109/sfcs.1997.646102 1997
[23]

Alphabet-dependent string searching with wexponential search trees

Johannes Fischer and Paweł Gawrychowski. Alphabet-dependent string searching with wexponential search trees. InCombinatorial Pattern Matching - 26th Annual Symposium, CPM 2015, Proceed- ings, volume 9133 ofLecture Notes in Computer Science, pages 160–171. Springer, 2015.doi: 10.1007/978-3-319-19929-0\_14

work page doi:10.1007/978-3-319-19929-0 2015
[24]

Necklaces of beads in k colors and k-ary de bruijn sequences

Harold Fredricksen and James Maiorana. Necklaces of beads in k colors and k-ary de bruijn sequences. Discret. Math., 23(3):207–210, 1978.doi:10.1016/0012-365X(78)90002-X

work page doi:10.1016/0012-365x(78)90002-x 1978
[25]

Shift-or string matching with super-alphabets.Information Processing Letters, 87(4):201–204, 2003.doi:10.1016/S0020-0190(03)00296-5

Kimmo Fredriksson. Shift-or string matching with super-alphabets.Information Processing Letters, 87(4):201–204, 2003.doi:10.1016/S0020-0190(03)00296-5

work page doi:10.1016/s0020-0190(03)00296-5 2003
[26]

Bit-parallel string matching under hamming distance in o(n[m/w])worstcasetime.Inf.Process.Lett.,105(5):182–187,2008.doi:10.1016/J.IPL.2007.08.021

Szymon Grabowski and Kimmo Fredriksson. Bit-parallel string matching under hamming distance in o(n[m/w])worstcasetime.Inf.Process.Lett.,105(5):182–187,2008.doi:10.1016/J.IPL.2007.08.021

work page doi:10.1016/j.ipl.2007.08.021 2008
[27]

High-order entropy-compressed text indexes

Roberto Grossi, Ankur Gupta, and Jeffrey Scott Vitter. High-order entropy-compressed text indexes. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 841–850,
[28]

URL:http://dl.acm.org/citation.cfm?id=644108.644250

work page arXiv
[29]

Genome comparison with- out alignment using shortest unique substrings.BMC Bioinform., 6:123, 2005.doi:10.1186/ 1471-2105-6-123

Bernhard Haubold, Nora Pierstorff, Friedrich Möller, and Thomas Wiehe. Genome comparison with- out alignment using shortest unique substrings.BMC Bioinform., 6:123, 2005.doi:10.1186/ 1471-2105-6-123

2005
[30]

Iliopoulos, Dennis W

Costas S. Iliopoulos, Dennis W. G. Moore, and William F. Smyth. A characterization of the squares in a fibonaccistring.Theor.Comput.Sci.,172(1-2):281–291,1997.doi:10.1016/S0304-3975(96)00141-7

work page doi:10.1016/s0304-3975(96)00141-7 1997
[31]

Linear-time longest-common- prefix computation in suffix arrays and its applications

Toru Kasai, Gunho Lee, Hiroki Arimura, Setsuo Arikawa, and Kunsoo Park. Linear-time longest-common- prefix computation in suffix arrays and its applications. InCombinatorial Pattern Matching, 12th Annual Symposium, CPM 2001, volume 2089 ofLecture Notes in Computer Science, pages 181–192. Springer, 2001.doi:10.1007/3-540-48194-X\_17

work page doi:10.1007/3-540-48194-x 2001
[32]

An Optimal Space Lower Bound for Approximating MAX-CUT

DominikKempaandTomaszKociumaka. Stringsynchronizingsets: sublinear-timeBWTconstructionand optimal LCE data structure. InProceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, pages 756–767. ACM, 2019.doi:10.1145/3313276.3316368

work page doi:10.1145/3313276.3316368 2019
[33]

Breaking theO(n)-barrier in the construction of compressed suffix arrays and suffix trees

Dominik Kempa and Tomasz Kociumaka. Breaking the𝑂(n)-barrier in the construction of compressed suffix arrays and suffix trees. InProceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, pages 5122–5202. SIAM, 2023.doi:10.1137/1.9781611977554.CH187. 29

work page doi:10.1137/1.9781611977554.ch187 2023
[34]

In: 65th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2024, Chicago, IL, USA, October 27-30, 2024

Dominik Kempa and Tomasz Kociumaka. Lempel-Ziv (LZ77) factorization in sublinear time. In65th IEEEAnnualSymposiumonFoundationsofComputerScience,FOCS2024,pages2045–2055.IEEE,2024. doi:10.1109/FOCS61266.2024.00122

work page doi:10.1109/focs61266.2024.00122 2055
[35]

On the hardness hierarchy for the o(n√log n) complexity in the word RAM

Dominik Kempa and Tomasz Kociumaka. On the hardness hierarchy for the o(n√log n) complexity in the word RAM. InProceedings of the 57th Annual ACM Symposium on Theory of Computing, STOC 2025, pages 290–300. ACM, 2025.doi:10.1145/3717823.3718291

work page doi:10.1145/3717823.3718291 2025
[36]

Ian Munro, Yakov Nekrich, and Jeffrey Scott Vitter

J. Ian Munro, Yakov Nekrich, and Jeffrey Scott Vitter. Fast construction of wavelet trees.Theor. Comput. Sci., 638:91–97, 2016.doi:10.1016/j.tcs.2015.11.011

work page doi:10.1016/j.tcs.2015.11.011 2016
[37]

A bit-parallel approach to suffix automata: Fast extended string matching

Gonzalo Navarro and Mathieu Raffinot. A bit-parallel approach to suffix automata: Fast extended string matching. InCombinatorialPatternMatching,9thAnnualSymposium,CPM98,Proceedings,volume1448 ofLecture Notes in Computer Science, pages 14–33. Springer, 1998.doi:10.1007/BFB0030778

work page doi:10.1007/bfb0030778 1998
[38]

On shortest unique substring queries

Jian Pei, Wush Chi-Hsuan Wu, and Mi-Yen Yeh. On shortest unique substring queries. In29th IEEE InternationalConferenceonDataEngineering,ICDE2013,pages937–948.IEEEComputerSociety,2013. doi:10.1109/ICDE.2013.6544887

work page doi:10.1109/icde.2013.6544887 2013
[39]

Persistent minimal sequences of SARS-CoV-2.Bioinform., 36(21):5129–5132, 2021.doi:10.1093/BIOINFORMATICS/BTAA686

Diogo Pratas and Jorge Miguel Silva. Persistent minimal sequences of SARS-CoV-2.Bioinform., 36(21):5129–5132, 2021.doi:10.1093/BIOINFORMATICS/BTAA686

work page doi:10.1093/bioinformatics/btaa686 2021
[40]

On computing the smallest suffixient set

Jakub Radoszewski and Wiktor Zuba. Computing string covers in sublinear time. InString Processing and InformationRetrieval-31stInternationalSymposium, SPIRE2024, Proceedings, volume14899ofLecture Notes in Computer Science, pages 272–288. Springer, 2024.doi:10.1007/978-3-031-72200-4\_21

work page doi:10.1007/978-3-031-72200-4 2024
[41]

Silva, Diogo Pratas, Luísa Castro, Armando J

Raquel M. Silva, Diogo Pratas, Luísa Castro, Armando J. Pinho, and Paulo Jorge S. G. Ferreira. Three minimal sequences found in ebola virus genomes and absent from human DNA.Bioinform., 31(15):2421– 2425, 2015.doi:10.1093/BIOINFORMATICS/BTV189

work page doi:10.1093/bioinformatics/btv189 2015
[42]

Sleator and Robert

Daniel Dominic Sleator and Robert Endre Tarjan. A data structure for dynamic trees.J. Comput. Syst. Sci., 26(3):362–391, 1983.doi:10.1016/0022-0000(83)90006-5

work page doi:10.1016/0022-0000(83)90006-5 1983
[43]

Packed compact tries: A fast andefficientdatastructureforonlinestringprocessing.IEICETrans.Fundam.Electron.Commun.Comput

Takuya Takagi, Shunsuke Inenaga, Kunihiko Sadakane, and Hiroki Arimura. Packed compact tries: A fast andefficientdatastructureforonlinestringprocessing.IEICETrans.Fundam.Electron.Commun.Comput. Sci., 100-A(9):1785–1793, 2017.doi:10.1587/TRANSFUN.E100.A.1785

work page doi:10.1587/transfun.e100.a.1785 2017
[44]

Linear pattern matching algorithms

Peter Weiner. Linear pattern matching algorithms. In14th Annual Symposium on Switching and Automata Theory, pages 1–11. IEEE Computer Society, 1973.doi:10.1109/SWAT.1973.13. 30

work page doi:10.1109/swat.1973.13 1973