pith. machine review for the scientific record. sign in

arxiv: 2605.04258 · v1 · submitted 2026-05-05 · 💻 cs.DS

Recognition: unknown

Constructing Suffixient Arrays Revisited

Authors on Pith no claims yet

Pith reviewed 2026-05-08 17:23 UTC · model grok-4.3

classification 💻 cs.DS
keywords suffixient arraysuffix arrayLCP arrayBurrows-Wheeler transformlinear time constructionone-pass algorithmtext indexingmaximal exact matches
0
0 comments X

The pith

A new one-pass algorithm builds the suffixient array in linear time under the RAM model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an algorithm that constructs the suffixient array by scanning the suffix array, LCP array, and Burrows-Wheeler transform of the reversed text exactly once. This index is a compact subset of the full suffix array that still lets a user locate one occurrence of a pattern or report its maximal exact matches when random access to the original text is allowed. Earlier methods either needed several passes over the same arrays to reach linear time or paid an extra logarithmic factor when restricted to a single pass. The new procedure removes that overhead while keeping the total construction cost strictly linear in the text length.

Core claim

Given the suffix array, LCP array, and BWT of the reversed text over alphabet size sigma, the algorithm produces a correct suffixient array after a single left-to-right traversal of these structures, using only constant-time operations per position and therefore O(n) time in the standard RAM model.

What carries the argument

A single-pass decision procedure that marks positions for inclusion in the suffixient array by comparing LCP values against run boundaries in the reverse BWT and maintaining a small amount of local state.

If this is right

  • The suffixient array becomes available after one sequential read of the auxiliary arrays instead of repeated traversals.
  • Construction time remains linear even when the algorithm is forced to process its input in a streaming fashion.
  • The same index still supports single-occurrence pattern location and maximal exact match enumeration with random access to the text.
  • No extra logarithmic dependence on alphabet size appears in the one-pass setting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The technique may reduce cache traffic when building indexes for large repetitive collections such as genomes.
  • Similar one-pass logic could be applied to other subset indices that currently require multiple passes.
  • It indicates that the extra log factor in the earlier single-pass method was an artifact of the algorithm rather than a lower bound.

Load-bearing premise

The suffix array, LCP array, and BWT of the reversed text are already computed and can be read sequentially once.

What would settle it

An input text on which the output set of suffix ranks either misses a required position for correct MEM reporting or includes an unnecessary position, or on which the measured running time exceeds O(n) in the worst case.

read the original abstract

Recently, Cenzato et al.\ proposed a new text index, called the \emph{suffixient array}, which is a subset of the suffix array and supports locating a single pattern occurrence or finding its maximal exact matches (MEMs), assuming random access to the input text $T[1..n]$ is available. They show that, given the suffix array, the longest common prefix array, and the Burrows--Wheeler transform (BWT) of the reverse of $T[1..n]$ over an alphabet $\{1,\ldots,\sigma\}$, a suffixient array can be constructed in linear time. However, their construction algorithms require multiple scans of these arrays. When restricted to a single pass over the arrays, they present an alternative construction algorithm running in $O(n + \overline{r} \log \sigma)$ time, where $\overline{r}$ is the number of runs in the BWT of the reversed text. In this paper, we present a new one-pass algorithm that constructs a suffixient array in linear time under the standard RAM model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper presents a new one-pass algorithm for constructing the suffixient array in linear time under the standard RAM model. Given the suffix array, LCP array, and BWT of the reversed text, the algorithm performs a single scan that uses LCP values to select the required suffix positions, avoiding the O(n + r-bar log sigma) overhead of the prior one-pass method by Cenzato et al.

Significance. If the algorithm is correct, the result delivers an optimal linear-time, single-pass construction for the suffixient array, a compact subset of the suffix array that supports single-occurrence pattern location and MEM finding with random access to the text. This removes the logarithmic dependence on the number of BWT runs and is a clear practical and theoretical advance for large-scale text indexing. The paper's direct algorithmic procedure under the word-RAM model, with no free parameters or hidden costs, is a strength.

minor comments (2)
  1. The manuscript would benefit from explicit pseudocode for the one-pass scan (currently described in prose) to make the constant-time per-element claim immediately verifiable.
  2. A short complexity analysis paragraph immediately after the algorithm description would help readers confirm that all operations (LCP lookups, position selections) are O(1) under the standard RAM model with random access.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the paper and for recommending minor revision. The referee correctly identifies that the new algorithm achieves true linear time in a single pass over the input arrays, removing the dependence on the number of BWT runs that appeared in the prior one-pass method.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper describes a direct one-pass algorithmic procedure for constructing the suffixient array given the SA, LCP, and BWT of the reversed text under the standard RAM model. The claimed linear time bound follows from a single scan that uses LCP values for position selection, without any equations, parameters, or claims that reduce the result to a fitted input, self-definition, or load-bearing self-citation chain. The construction is presented as an independent procedural contribution that does not rely on renaming known results or smuggling ansatzes via prior work by the same authors.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper relies on standard string-algorithm assumptions rather than introducing new fitted parameters or invented entities.

axioms (2)
  • domain assumption Standard RAM model with constant-time random access to arrays of size n
    Invoked when claiming linear time for the one-pass scan.
  • domain assumption Suffix array, LCP array, and BWT of reversed text are given as input
    Stated in the abstract as the starting point for construction.

pith-pipeline@v0.9.0 · 5480 in / 1216 out tokens · 34978 ms · 2026-05-08T17:23:58.334612+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

34 extracted references · 1 canonical work pages

  1. [1]

    15 Esko Ukkonen

    doi:10.1089/CMB.2021.0290. 15 Esko Ukkonen. On-Line Construction of Suffix Trees.Algorithmica, 14(3):249–260, 1995. doi:10.1007/BF01206331. 16 Peter Weiner. Linear Pattern Matching Algorithms. In14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15-17, 1973, pages1–11.IEEEComputer Society, 1973. doi:10.1109/SWAT.1973.13....

  2. [2]

    is_start ←(pre_bwt̸=−1) and (curr_bwt̸= pre_bwt)

  3. [3]

    is_end ←(next_bwt̸=−1) and (curr_bwt̸= next_bwt)

  4. [4]

    return max(wstart,wend); B The Pseudocode for Computing b(i) Algorithm 2 Initial-Stack()

  5. [5]

    push (1,−1,−1) as (index,val,b _val) ontoS

  6. [6]

    return S Algorithm 3 Compute_b(S, LCP[i],i)

  7. [7]

    if tuple.index=i then

  8. [8]

    while tuple.val≥LCP[i] do

  9. [9]

    push (i, LCP[i],tuple.index) ontoS

  10. [10]

    17 Algorithm 4 Compute Suffixient Arrays (BWT[1..n], SA[1..n], LCP[1..n])

    return tuple.index C The Pseudocode and an Example for Computing Suffixient Arrays Bonizzoni et al. 17 Algorithm 4 Compute Suffixient Arrays (BWT[1..n], SA[1..n], LCP[1..n])

  11. [11]

    BWT[n + 1]←LCP[n + 1]←−1

  12. [12]

    rowList←doubly-linked list

  13. [13]

    resulti←an empty list

  14. [14]

    if BWT[2]̸= BWT[1] then

  15. [15]

    append (n−SA[1] + 1, BWT[1], LCP[2]) as (ptext,char,weight ) to the front ofrowList

  16. [16]

    MAP[c]←head of rowList

  17. [17]

    ℓ←LCP[i]; ▷Step 1 starts

  18. [19]

    while ℓ<headTriple.weightdo

  19. [22]

    MAP[headTriple.char]←null

  20. [23]

    currWeight ←Compute_w(BWT[i−1], BWT[i], BWT[i + 1], LCP[i], LCP[i + 1])

  21. [24]

    curr_b←Compute_b(S, LCP[i],i ); ▷Step 1 ends

  22. [25]

    if currWeight> −1 then ▷Step 2 starts

  23. [26]

    if currWeight ̸= LCP[i] then

  24. [27]

    curr_b←Compute_b(S, LCP[i + 1],i + 1)

  25. [28]

    if i = Candtc(curr_b,i + 1) then

  26. [29]

    if MAP[c]̸=null then

  27. [30]

    remove triple from rowList

  28. [31]

    append (n−SA[i] + 1, BWT[i],currWeight ) to the front ofrowList

  29. [32]

    MAP[c]←head of rowList; ▷Step 2 ends

  30. [33]

    while rowList̸= null do ▷Final step starts

  31. [34]

    headTriple←rowList.head

  32. [35]

    append headTriple.ptext to resultheadTriple.char

  33. [36]

    remove the head entry from rowList

  34. [37]

    The table providesT rev [1..n], the LCP, BWT, and SA arrays, the variablecurr_b, the contents ofrowList, and the character-specific result listsResultc

    return [result1, result2,..., resultσ]; ▷Final step ends 18 Constructing Suffixient Arrays Revisited Table 1 Execution trace of the algorithm computing a suffixient array for the inputT = AGCACAGCA$. The table providesT rev [1..n], the LCP, BWT, and SA arrays, the variablecurr_b, the contents ofrowList, and the character-specific result listsResultc. The ...