pith. sign in

arxiv: 1907.09271 · v1 · pith:GMPLWH3Ynew · submitted 2019-07-22 · 💻 cs.DS · cs.DM· cs.FL

Succinct Representation for (Non)Deterministic Finite Automata

Pith reviewed 2026-05-24 18:11 UTC · model grok-4.3

classification 💻 cs.DS cs.DMcs.FL
keywords succinct data structuresdeterministic finite automatanondeterministic finite automataspace-efficient representationstring acceptance queryacyclic automataautomata operations
0
0 comments X

The pith

A DFA with n states over σ symbols fits in (σ-1)n log n + O(n log σ) bits and decides acceptance of string x in O(|x| log σ) time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs compact representations for deterministic finite automata that stay close to the information-theoretic minimum while supporting fast membership queries. For any DFA the encoding uses (σ-1)n log n plus lower-order terms and answers whether an input string is accepted in time linear in the length of the string times log σ. When the DFA is acyclic the space drops further and the query time becomes strictly linear in |x|. The same approach yields a representation for nondeterministic automata and efficient algorithms for the usual language operations of union, intersection, and complement.

Core claim

Any deterministic finite automaton with n states over a σ-letter alphabet can be stored in (σ-1)n log n + O(n log σ) bits so that, given an input string x, membership in the accepted language is decided in O(|x| log σ) time with constant extra words of working space. For acyclic DFAs the space improves to (σ-1)(n-1) log n + 3n + O(log² σ) + o(n) bits and the query time becomes exactly proportional to |x|. An NFA with n states is represented in σ n² + n bits and accepts a string in O(n² |x|) time. The same representations support union, intersection, and complement operations on the languages.

What carries the argument

The succinct encoding of the transition function, which compresses the n-by-σ table of next states while supporting rapid successive lookups for each symbol of the input string.

If this is right

  • Union and intersection of two languages can be formed while staying within the same asymptotic space bound.
  • Complementation of a language is possible without a significant increase in the stored size.
  • Acyclic DFAs achieve both the improved space bound and strictly linear query time.
  • The NFA representation allows acceptance testing with quadratic dependence on the number of states.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same compression ideas could be tested on other finite-state models such as transducers or weighted automata.
  • In memory-limited settings the representation would let regular-expression engines operate on larger patterns than the explicit transition table allows.
  • The acyclic case suggests that DAG-shaped automata arising in string algorithms may admit even tighter succinct forms.

Load-bearing premise

That the transition table of any DFA admits an encoding whose size is bounded by the stated formula while still allowing the claimed query procedure to run in the stated time.

What would settle it

A concrete family of DFAs for which every encoding that supports O(|x| log σ) acceptance queries requires more than (σ-1)n log n + o(n log σ) bits in the worst case.

read the original abstract

Deterministic finite automata are one of the simplest and most practical models of computation studied in automata theory. Their conceptual extension is the non-deterministic finite automata which also have plenty of applications. In this article, we study these models through the lens of succinct data structures where our ultimate goal is to encode these mathematical objects using information-theoretically optimal number of bits along with supporting queries on them efficiently. Towards this goal, we first design a succinct data structure for representing any deterministic finite automaton $\mathcal{D}$ having $n$ states over a $\sigma$-letter alphabet $\Sigma$ using $(\sigma-1) n\log n + O(n \log \sigma)$ bits of space, which can determine, given an input string $x$ over $\Sigma$, whether $\mathcal{D}$ accepts $x$ in $O(|x| \log \sigma)$ time, using constant words of working space. When the input deterministic finite automaton is acyclic, not only we can improve the above space-bound significantly to $(\sigma -1) (n-1)\log n+ 3n + O(\log^2 \sigma) + o(n)$ bits, we also obtain optimal query time for string acceptance checking. More specifically, using our succinct representation, we can check if a given input string $x$ can be accepted by the acyclic deterministic finite automaton using time proportional to the length of $x$, hence, the optimal query time. We also exhibit a succinct data structure for representing a non-deterministic finite automaton $\mathcal{N}$ having $n$ states over a $\sigma$-letter alphabet $\Sigma$ using $\sigma n^2+n$ bits of space, such that given an input string $x$, we can decide whether $\mathcal{N}$ accepts $x$ efficiently in $O(n^2|x|)$ time. Finally, we also provide time and space-efficient algorithms for performing several standard operations such as union, intersection, and complement on the languages accepted by deterministic finite automata.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript presents succinct data structures for deterministic and non-deterministic finite automata. For any DFA with n states over a σ-letter alphabet, it claims a representation in (σ−1)n log n + O(n log σ) bits supporting acceptance queries in O(|x| log σ) time with constant working space. For acyclic DFAs the space improves to (σ−1)(n−1) log n + 3n + O(log² σ) + o(n) bits with optimal (linear in |x|) query time. For NFAs it claims σ n² + n bits with O(n² |x|) query time. It also gives time- and space-efficient algorithms for union, intersection and complement on DFA languages.

Significance. If the central claims held, the work would advance succinct data structures for automata by supplying explicit space bounds and query procedures that improve on naïve transition-table encodings while supporting standard language operations. The acyclic-DFA and NFA results, together with the operation algorithms, would be of interest in string algorithms and formal-language applications.

major comments (2)
  1. [Abstract] Abstract: the claimed space bound for an arbitrary DFA, (σ−1)n log n + O(n log σ) bits, lies strictly below the information-theoretic minimum of σ n log n bits. There are n^{σ n} distinct transition functions δ : Q × Σ → Q; any injective encoding therefore requires Ω(σ n log n) bits in the worst case. The stated construction cannot represent every DFA.
  2. [Abstract] Abstract: the acyclic-DFA bound (σ−1)(n−1) log n + 3n + O(log² σ) + o(n) bits is likewise o(σ n log n) and cannot accommodate all acyclic transition structures, which still number super-exponentially many.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and for highlighting the discrepancy between our claimed space bounds and the information-theoretic lower bounds. We agree that the issues raised are valid and require correction in the manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claimed space bound for an arbitrary DFA, (σ−1)n log n + O(n log σ) bits, lies strictly below the information-theoretic minimum of σ n log n bits. There are n^{σ n} distinct transition functions δ : Q × Σ → Q; any injective encoding therefore requires Ω(σ n log n) bits in the worst case. The stated construction cannot represent every DFA.

    Authors: We acknowledge the validity of this observation. The space bound we provided does not suffice to encode all possible DFAs, as correctly noted by the referee. This suggests an oversight in our analysis of the space requirements. We will revise the abstract and the main text to either adjust the space bound to the correct information-theoretic minimum or to specify the subclass of DFAs for which the given bound applies. revision: yes

  2. Referee: [Abstract] Abstract: the acyclic-DFA bound (σ−1)(n−1) log n + 3n + O(log² σ) + o(n) bits is likewise o(σ n log n) and cannot accommodate all acyclic transition structures, which still number super-exponentially many.

    Authors: We agree with this assessment as well. The bound for acyclic DFAs is also below what is needed for all possible acyclic automata. We will make the necessary revisions to correct this in the manuscript. revision: yes

Circularity Check

0 steps flagged

No significant circularity; bounds stated as explicit functions of n and σ

full rationale

The paper states its space bounds directly as (σ−1)n log n + O(n log σ) for general DFAs and similar explicit expressions for acyclic DFAs and NFAs, with query times likewise given as functions of input size. No load-bearing step equates a claimed result to a fitted parameter, renames a known quantity, or reduces via self-citation to an unverified premise internal to the work. The derivation is presented as a construction achieving the stated encoding and query procedures; any information-theoretic objection concerns external correctness rather than internal circular reduction of the claimed quantities to their own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on standard definitions of finite automata and the existence of rank/select structures for bit vectors; no free parameters, new entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)
  • domain assumption Finite automata are defined by a finite set of states, a transition function, start state, and accepting states.
    The succinct encodings presuppose the standard mathematical definition of DFA and NFA.

pith-pipeline@v0.9.0 · 5916 in / 1227 out tokens · 19704 ms · 2026-05-24T18:11:13.517982+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.