Recognition: unknown
Constructing Suffixient Arrays Revisited
Pith reviewed 2026-05-08 17:23 UTC · model grok-4.3
The pith
A new one-pass algorithm builds the suffixient array in linear time under the RAM model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given the suffix array, LCP array, and BWT of the reversed text over alphabet size sigma, the algorithm produces a correct suffixient array after a single left-to-right traversal of these structures, using only constant-time operations per position and therefore O(n) time in the standard RAM model.
What carries the argument
A single-pass decision procedure that marks positions for inclusion in the suffixient array by comparing LCP values against run boundaries in the reverse BWT and maintaining a small amount of local state.
If this is right
- The suffixient array becomes available after one sequential read of the auxiliary arrays instead of repeated traversals.
- Construction time remains linear even when the algorithm is forced to process its input in a streaming fashion.
- The same index still supports single-occurrence pattern location and maximal exact match enumeration with random access to the text.
- No extra logarithmic dependence on alphabet size appears in the one-pass setting.
Where Pith is reading between the lines
- The technique may reduce cache traffic when building indexes for large repetitive collections such as genomes.
- Similar one-pass logic could be applied to other subset indices that currently require multiple passes.
- It indicates that the extra log factor in the earlier single-pass method was an artifact of the algorithm rather than a lower bound.
Load-bearing premise
The suffix array, LCP array, and BWT of the reversed text are already computed and can be read sequentially once.
What would settle it
An input text on which the output set of suffix ranks either misses a required position for correct MEM reporting or includes an unnecessary position, or on which the measured running time exceeds O(n) in the worst case.
read the original abstract
Recently, Cenzato et al.\ proposed a new text index, called the \emph{suffixient array}, which is a subset of the suffix array and supports locating a single pattern occurrence or finding its maximal exact matches (MEMs), assuming random access to the input text $T[1..n]$ is available. They show that, given the suffix array, the longest common prefix array, and the Burrows--Wheeler transform (BWT) of the reverse of $T[1..n]$ over an alphabet $\{1,\ldots,\sigma\}$, a suffixient array can be constructed in linear time. However, their construction algorithms require multiple scans of these arrays. When restricted to a single pass over the arrays, they present an alternative construction algorithm running in $O(n + \overline{r} \log \sigma)$ time, where $\overline{r}$ is the number of runs in the BWT of the reversed text. In this paper, we present a new one-pass algorithm that constructs a suffixient array in linear time under the standard RAM model.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a new one-pass algorithm for constructing the suffixient array in linear time under the standard RAM model. Given the suffix array, LCP array, and BWT of the reversed text, the algorithm performs a single scan that uses LCP values to select the required suffix positions, avoiding the O(n + r-bar log sigma) overhead of the prior one-pass method by Cenzato et al.
Significance. If the algorithm is correct, the result delivers an optimal linear-time, single-pass construction for the suffixient array, a compact subset of the suffix array that supports single-occurrence pattern location and MEM finding with random access to the text. This removes the logarithmic dependence on the number of BWT runs and is a clear practical and theoretical advance for large-scale text indexing. The paper's direct algorithmic procedure under the word-RAM model, with no free parameters or hidden costs, is a strength.
minor comments (2)
- The manuscript would benefit from explicit pseudocode for the one-pass scan (currently described in prose) to make the constant-time per-element claim immediately verifiable.
- A short complexity analysis paragraph immediately after the algorithm description would help readers confirm that all operations (LCP lookups, position selections) are O(1) under the standard RAM model with random access.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the paper and for recommending minor revision. The referee correctly identifies that the new algorithm achieves true linear time in a single pass over the input arrays, removing the dependence on the number of BWT runs that appeared in the prior one-pass method.
Circularity Check
No significant circularity
full rationale
The paper describes a direct one-pass algorithmic procedure for constructing the suffixient array given the SA, LCP, and BWT of the reversed text under the standard RAM model. The claimed linear time bound follows from a single scan that uses LCP values for position selection, without any equations, parameters, or claims that reduce the result to a fitted input, self-definition, or load-bearing self-citation chain. The construction is presented as an independent procedural contribution that does not rely on renaming known results or smuggling ansatzes via prior work by the same authors.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Standard RAM model with constant-time random access to arrays of size n
- domain assumption Suffix array, LCP array, and BWT of reversed text are given as input
Reference graph
Works this paper leans on
-
[1]
doi:10.1089/CMB.2021.0290. 15 Esko Ukkonen. On-Line Construction of Suffix Trees.Algorithmica, 14(3):249–260, 1995. doi:10.1007/BF01206331. 16 Peter Weiner. Linear Pattern Matching Algorithms. In14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15-17, 1973, pages1–11.IEEEComputer Society, 1973. doi:10.1109/SWAT.1973.13....
-
[2]
is_start ←(pre_bwt̸=−1) and (curr_bwt̸= pre_bwt)
-
[3]
is_end ←(next_bwt̸=−1) and (curr_bwt̸= next_bwt)
-
[4]
return max(wstart,wend); B The Pseudocode for Computing b(i) Algorithm 2 Initial-Stack()
-
[5]
push (1,−1,−1) as (index,val,b _val) ontoS
-
[6]
return S Algorithm 3 Compute_b(S, LCP[i],i)
-
[7]
if tuple.index=i then
-
[8]
while tuple.val≥LCP[i] do
-
[9]
push (i, LCP[i],tuple.index) ontoS
-
[10]
17 Algorithm 4 Compute Suffixient Arrays (BWT[1..n], SA[1..n], LCP[1..n])
return tuple.index C The Pseudocode and an Example for Computing Suffixient Arrays Bonizzoni et al. 17 Algorithm 4 Compute Suffixient Arrays (BWT[1..n], SA[1..n], LCP[1..n])
-
[11]
BWT[n + 1]←LCP[n + 1]←−1
-
[12]
rowList←doubly-linked list
-
[13]
resulti←an empty list
-
[14]
if BWT[2]̸= BWT[1] then
-
[15]
append (n−SA[1] + 1, BWT[1], LCP[2]) as (ptext,char,weight ) to the front ofrowList
-
[16]
MAP[c]←head of rowList
-
[17]
ℓ←LCP[i]; ▷Step 1 starts
-
[19]
while ℓ<headTriple.weightdo
-
[22]
MAP[headTriple.char]←null
-
[23]
currWeight ←Compute_w(BWT[i−1], BWT[i], BWT[i + 1], LCP[i], LCP[i + 1])
-
[24]
curr_b←Compute_b(S, LCP[i],i ); ▷Step 1 ends
-
[25]
if currWeight> −1 then ▷Step 2 starts
-
[26]
if currWeight ̸= LCP[i] then
-
[27]
curr_b←Compute_b(S, LCP[i + 1],i + 1)
-
[28]
if i = Candtc(curr_b,i + 1) then
-
[29]
if MAP[c]̸=null then
-
[30]
remove triple from rowList
-
[31]
append (n−SA[i] + 1, BWT[i],currWeight ) to the front ofrowList
-
[32]
MAP[c]←head of rowList; ▷Step 2 ends
-
[33]
while rowList̸= null do ▷Final step starts
-
[34]
headTriple←rowList.head
-
[35]
append headTriple.ptext to resultheadTriple.char
-
[36]
remove the head entry from rowList
-
[37]
The table providesT rev [1..n], the LCP, BWT, and SA arrays, the variablecurr_b, the contents ofrowList, and the character-specific result listsResultc
return [result1, result2,..., resultσ]; ▷Final step ends 18 Constructing Suffixient Arrays Revisited Table 1 Execution trace of the algorithm computing a suffixient array for the inputT = AGCACAGCA$. The table providesT rev [1..n], the LCP, BWT, and SA arrays, the variablecurr_b, the contents ofrowList, and the character-specific result listsResultc. The ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.