Recognition: no theorem link
On the Complexity of the Matching Problem of Regular Expressions with Backreferences
Pith reviewed 2026-05-11 01:05 UTC · model grok-4.3
The pith
String matching for regular expressions with k backreferences cannot be solved in O(n^{2k-ε}) time for any ε > 0.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The string matching problem for k-REWBs cannot be solved in O(n^{2k-ε}) time for any ε > 0, is W[2]-hard when parameterized by expression length, and for 2-use 2-REWBs cannot be solved in n^{1+o(1)} time. At the same time, 1-use REWBs admit an O(n log² n)-time algorithm built from suffix trees, transition monoids, factorization forests, and string periodicity.
What carries the argument
The r-use k-REWB model that restricts how often each backreference may be used, together with fine-grained reductions that transfer known lower bounds onto the matching problem.
If this is right
- No algorithm can match k-backreference expressions substantially faster than quadratic time per backreference.
- The matching problem is W[2]-hard when the parameter is the length of the expression.
- Even the restricted case of two backreferences each used twice is as hard as triangle detection for near-linear time.
- The 1-use restriction allows a near-linear algorithm that improves on the prior quadratic bound.
Where Pith is reading between the lines
- Practical regex engines could enforce single-use limits on backreferences to guarantee faster worst-case performance.
- The same hardness techniques may apply to other string-processing tasks that track repeated substrings.
- Testing the constants hidden in the O(n log² n) algorithm on real-world input distributions would show how close it comes to linear time in practice.
Load-bearing premise
The reductions from established hard problems to REWB matching preserve the precise exponents in the running-time bounds.
What would settle it
An algorithm that solves k-REWB matching in O(n^{2k-1}) time for some fixed k, or a concrete string-expression pair that violates one of the stated reductions.
read the original abstract
ReDoS is a well-known type of algorithmic complexity attack, where an adversary supplies maliciously crafted strings to a regular expression matching engine, aiming to exhaust computational resources of systems. Even quadratic-time behavior in matching engines has been exploited in successful attacks, as exemplified by major outages at Stack Overflow (2016) and Cloudflare (2019). These incidents motivate a fundamental question: Is it possible to construct matching engines that are provably efficient, running in (near-)linear time in the length of the input string? For classical regular expressions (REGEX), Thompson's construction yields a linear-time algorithm. However, practical engines support powerful features such as backreferences, which strictly extend the expressive power of REGEX but unfortunately increase the risk of ReDoS attacks. This paper investigates the fine-grained complexity of the string matching problem for regular expressions with backreferences (REWBs). Specifically, we consider $r$-use $k$-REWBs. On the hardness side, we show that the string matching problem for $k$-REWBs cannot be solved in $O(n^{2k-\epsilon})$ time for any $\epsilon > 0$ under SETH. We also prove that this problem is \textbf{W[2]}-hard when parameterized by the length of the REWB expression, strengthening the previous \textbf{W[1]}-hardness. Moreover, we prove that this problem for $2$-use $2$-REWBs cannot be solved in $n^{1+o(1)}$ time unless the triangle detection problem can be solved in that time. On the algorithmic side, we present an $O(n \log^2 n)$-time algorithm for $1$-use REWBs, which significantly improves upon the recent $O(n^2)$-time algorithm by Nogami and Terauchi (MFCS, 2025). Our algorithm employs several techniques including suffix trees, transition monoids of REGEXes, factorization forest data structures, and periodicity of strings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper investigates the fine-grained complexity of the string matching problem for regular expressions with backreferences (REWBs), focusing on r-use k-REWBs. It proves that k-REWBs cannot be solved in O(n^{2k-ε}) time for any ε>0 under SETH, establishes W[2]-hardness parameterized by the length of the REWB expression (strengthening prior W[1]-hardness), and shows that 2-use 2-REWBs cannot be solved in n^{1+o(1)} time unless triangle detection can be solved in that time. On the positive side, it gives an O(n log² n)-time algorithm for 1-use REWBs that combines suffix trees, transition monoids of the regular skeleton, factorization forests, and string periodicity lemmas, improving on the recent O(n²) algorithm of Nogami and Terauchi.
Significance. If the results hold, the paper delivers a tight complexity landscape for REWB matching that directly informs ReDoS mitigation. The SETH lower bound is linear in the string length and matches the exponent 2k, the W[2]-hardness result is a clear strengthening, and the triangle-detection reduction for constant-use cases is a useful conditional barrier. The algorithmic contribution is a substantial improvement that correctly assembles standard string data structures without hidden exponential dependence on expression size, as confirmed by the analysis of transition monoids and factorization-forest height.
minor comments (2)
- [Abstract] Abstract: the statement of the O(n log² n) bound should explicitly note the (polynomial) dependence on the size of the input REWB, since transition-monoid construction is performed on the regular skeleton.
- [Introduction] The paper would benefit from a summary table in the introduction that lists the complexity results for each combination of k and r (including the new 1-use algorithm and the three hardness results).
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work and the recommendation for minor revision. The report accurately summarizes the main contributions: the SETH lower bound showing that k-REWB matching cannot be solved in O(n^{2k-ε}) time, the strengthening to W[2]-hardness parameterized by expression length, the conditional n^{1+o(1)} lower bound for 2-use 2-REWBs unless triangle detection is in n^{1+o(1)}, and the O(n log² n) algorithm for 1-use REWBs via suffix trees, transition monoids, factorization forests, and string periodicity. No specific major comments appear in the provided report.
Circularity Check
No significant circularity identified
full rationale
The paper's central results consist of conditional lower bounds obtained via explicit reductions from SETH (for the n^{2k-ε} bound on k-REWBs) and from triangle detection (for the n^{1+o(1)} bound on 2-use 2-REWBs), together with an upper bound for 1-use REWBs assembled from standard, externally defined primitives (suffix trees, transition monoids of the regular skeleton, factorization forests of logarithmic height, and string periodicity lemmas). None of these steps define a quantity in terms of itself, rename a fitted parameter as a prediction, or rely on a load-bearing self-citation whose content is unverified outside the present manuscript. The reductions are stated to produce linear-length strings with the claimed number of backreferences, and the algorithmic analysis contains no hidden exponential dependence on expression size. The derivation chain is therefore independent of its own outputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Strong Exponential Time Hypothesis (SETH)
Reference graph
Works this paper leans on
-
[1]
A. Abboud and V. V. Williams. Popular conjectures imply strong lower bounds for dynamic problems. In FOCS ’14 , pages 434–443. IEEE Computer Society, 2014. doi:10.1109/FOCS. 2014.53
-
[2]
A. Abboud and V. V. Williams. Popular conjectures imply strong lower bounds for dynamic problems, 2014. arXiv:1402.0054, doi:10.48550/arXiv.1402.0054
-
[3]
A. V. Aho. Algorithms for finding patterns in strings. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity , pages 255–300. MIT Press, 1990. doi:10.1016/B978-0-444-88071-0.50010-2
- [4]
-
[5]
A. Amir, T. Kopelowitz, A. Levy, S. Pettie, E. Porat, and B. R. Shalom. Mind the gap! online dictionary matching with one gap. Algorithmica, 81:2123–2157, 2019. doi:10.1007/ s00453-018-0526-2
2019
-
[6]
D. Angluin. Finding patterns common to a set of strings (extended abstract). In STOC ’79 , pages 130–141. ACM, 1979. doi:10.1145/800135.804406
-
[7]
D. Angluin. Finding patterns common to a set of strings. JCSS, 21(1):46–62, 1980. doi: 10.1016/0022-0000(80)90041-0
-
[8]
Checking quadratic behaviors of regex of cloudflare outage
Anonymous Author(s). Checking quadratic behaviors of regex of cloudflare outage. https: //regex101.com/r/hidvB3/1/debugger, 2025
2025
-
[9]
Checking quadratic behaviors of regex of stack overflow outage
Anonymous Author(s). Checking quadratic behaviors of regex of stack overflow outage. https://regex101.com/r/pR6DbC/1/debugger, 2025
2025
-
[10]
A. Backurs and P. Indyk. Which Regular Expression Patterns Are Hard to Match? . In FOCS, pages 457–466. IEEE Computer Society, 2016. doi:10.1109/FOCS.2016.56
-
[11]
A. Barrière and C. Pit-Claudel. Linear matching of javascript regular expressions. Proc. ACM Program. Lang., 8(PLDI), jun 2024. doi:10.1145/3656431
-
[12]
M. Berglund and B. van der Merwe. Re-examining regular expressions with backreferences. Theor. Comput. Sci. , 940(Part):66–80, 2023. doi:10.1016/j.tcs.2022.10.041
-
[13]
P. Bille and I. L. Gørtz. Sparse regular expression matching. In SODA 2024, pages 3354–3375. SIAM, 2024. doi:10.1137/1.9781611977912.120
-
[14]
P. Bille, I. L. Gørtz, H. W. Vildhøj, and D. K. Wind. String matching with variable length gaps. Theoretical Computer Science , 443:25–34, 2012. doi:10.1016/j.tcs.2012.03.029. 77
-
[15]
M. Bojańczyk. Algorithms for regular languages that use algebra. SIGMOD Rec., 41(2):5–14, aug 2012. doi:10.1145/2350036.2350038
-
[16]
M. Bojańczyk. Languages recognised by finite semigroups, and their generalisations to objects such as trees and graphs, with an emphasis on definability in monadic second-order logic. CoRR, abs/2008.11635, 2020. arXiv:2008.11635, doi:10.48550/arxiv.2008.11635
-
[17]
M. Bojańczyk and P. Parys. Efficient evaluation of nondeterministic automata using factoriza- tion forests. In S. Abramsky, C. Gavoille, C. Kirchner, F. Meyer auf der Heide, and P. G. Spi- rakis, editors, ICALP, pages 515–526. Springer, 2010. doi:10.1007/978-3-642-14165-2\ _44
-
[18]
K. Bringmann, A. Grønlund, M. Künnemann, and K. G. Larsen. The NF A Acceptance Hypothesis: Non-Combinatorial and Dynamic Lower Bounds. In ITCS 2024 , volume 287 of LIPIcs, pages 22:1–22:25. Schloss Dagstuhl, 2024. doi:10.4230/LIPIcs.ITCS.2024.22
-
[19]
A. Buckles. simple-markdown: inlinecode: Fix ReDoS & improve escape semantics (commit 89797fe). GitHub commit, 2019. URL: https://github.com/ariabuckles/ simple-markdown/commit/89797fe
2019
-
[20]
A. K. Chandra, D. C. Kozen, and L. J. Stockmeyer. Alternation. J. ACM , 28(1):114–133, jan 1981. doi:10.1145/322234.322243
-
[21]
A. Chattopadhyay, A. W. Li, and K. Mamouras. Verified and efficient matching of reg- ular expressions with lookaround. In Proceedings of the 14th ACM SIGPLAN Interna- tional Conference on Certified Programs and Proofs , CPP ’25, pages 198–213. ACM, 2025. doi:10.1145/3703595.3705884
-
[22]
Choffrut and J
C. Choffrut and J. Karhumäki. Combinatorics of words. In G. Rozenberg and A. Salomaa, editors, Handbook of Formal Languages, Volume 1: Word, Language, Grammar , pages 329–
-
[23]
doi:10.1007/978-3-642-59136-5_6
Springer, 1997. doi:10.1007/978-3-642-59136-5_6
-
[24]
Colcombet
T. Colcombet. The factorisation forest theorem. In J. Pin, editor, Handbook of Automata The- ory, pages 653–693. European Mathematical Society Publishing House, Zürich, Switzerland,
-
[25]
doi:10.4171/Automata-1/18
- [26]
-
[27]
R. Cox. Regular expression matching can be simple and fast. https://swtch.com/~rsc/ regexp/regexp1.html, January 2007
2007
-
[28]
R. Cox. Regular expression matching in the wild. https://swtch.com/~rsc/regexp/ regexp3.html, mar 2010
2010
-
[29]
M. Crochemore. An optimal algorithm for computing the repetitions in a word. Information Processing Letters, 12(5):244–250, 1981. doi:10.1016/0020-0190(81)90024-7
-
[30]
M. Crochemore, C. Hancart, and T. Lecroq. Algorithms on Strings . Cambridge University Press, 2007. doi:10.1017/CBO9780511546853. 78
-
[31]
Crochemore and W
M. Crochemore and W. Rytter. Jewels of stringology . World Scientific, 2002. doi:10.1142/ 4838
2002
-
[32]
S. A. Crosby. Denial of service through regular expressions. USENIX Association,
-
[33]
URL: https://www.usenix.org/conference/12th-usenix-security-symposium/ denial-service-through-regular-expressions
-
[34]
S. A. Crosby and D. S. Wallach. Denial of service via algorithmic complexity at- tacks. In 12th USENIX Security Symposium (USENIX Security 03) . USENIX Association,
-
[35]
URL: https://www.usenix.org/conference/12th-usenix-security-symposium/ denial-service-algorithmic-complexity-attacks
-
[36]
Fomin and Lukasz Kowalik and Daniel Lokshtanov and D
M. Cygan, F. V. Fomin, L. Kowalik, D. Lokshtanov, D. Marx, M. Pilipczuk, M. Pilipczuk, and S. Saurabh. Parameterized Algorithms. Springer, 2015. doi:10.1007/978-3-319-21275-3
-
[37]
Dąbrowski and W
R. Dąbrowski and W. Plandowski. Solving two-variable word equations (extended abstract). In J. Díaz, J. Karhumäki, A. Lepistö, and D. Sannella, editors, Automata, Languages and Pro- gramming: 31st International Colloquium, ICALP 2004, Turku, Finland, July 12-16, 2004. Proceedings, volume 3142 of Lecture Notes in Computer Science , pages 408–419. Springer,
2004
-
[38]
doi:10.1007/978-3-540-27836-8_36
-
[39]
R. Dąbrowski and W. Plandowski. On word equations in one variable. Algorithmica, 60(4):819–828, Aug 2011. doi:10.1007/s00453-009-9375-3
-
[40]
J. C. Davis, F. Servant, and D. Lee. Using selective memoization to defeat regular expression denial of service (redos). In SP’21, pages 1–17, 2021. doi:10.1109/SP40001.2021.00032
-
[41]
V. Diekert, C. Gutierrez, and C. Hagenah. The existential theory of equations with rational constraints in free groups is pspace-complete. Inf. Comput. , 202(2):105–140, 2005. doi: 10.1016/j.ic.2005.04.002
-
[42]
R. G. Downey and M. R. Fellows. Fixed-parameter tractability and completeness i: Basic re- sults. SIAM Journal on Computing , 24(4):873–921, 1995. doi:10.1137/S0097539792228228
-
[43]
Duraj, M
L. Duraj, M. Künnemann, and A. Polak. Tight conditional lower bounds for longest common increasing subsequence. Algorithmica, 81(10):3968–3992, 2019. doi:10.1007/ S00453-018-0485-7
2019
-
[44]
A. Ehrenfeucht and G. Rozenberg. Finding a homomorphism between two words in np- complete. Information Processing Letters , 9(2):86–88, 1979. doi:10.1016/0020-0190(79) 90135-2
-
[45]
H. Fernau, F. Manea, R. Mercaş, and M. L. Schmid. Pattern Matching with Variables: Fast Algorithms and New Hardness Results. In STACS 2015, volume 30 of LIPIcs, pages 302–315. Schloss Dagstuhl, 2015. doi:10.4230/LIPIcs.STACS.2015.302
-
[46]
H. Fernau, F. Manea, R. Mercaş, and M. L. Schmid. Pattern matching with variables: Efficient algorithms and complexity results. ACM Trans. Comput. Theory , 12(1), feb 2020. doi:10.1145/3369935. 79
-
[47]
H. Fernau and M. L. Schmid. Pattern matching with variables: A multivariate complexity analysis. Inf. Comput. , 242:287–305, 2015. doi:10.1016/j.ic.2015.03.006
-
[48]
Fernau, M
H. Fernau, M. L. Schmid, and Y. Villanger. On the parameterised complexity of string morphism problems. Theory of Computing Systems , 59(1):24–51, 2016. doi:10.1007/ s00224-015-9635-3
2016
-
[49]
N. J. Fine and H. S. Wilf. Uniqueness theorems for periodic functions. Proceedings of the American Mathematical Society , 16(1):109–114, 1965. doi:10.2307/2034009
-
[50]
M. J. Fischer and A. R. Meyer. Boolean matrix multiplication and transitive closure. In Annual Symposium on Switching and Automata Theory , pages 129–131, 1971. doi:10.1109/ SWAT.1971.4
1971
-
[51]
Fredriksson and S
K. Fredriksson and S. Grabowski. Efficient algorithms for pattern matching with general gaps, character classes, and transposition invariance. Information Retrieval , 11:335–357,
-
[52]
doi:10.1007/s10791-008-9054-z
-
[53]
D. D. Freydenberger and M. L. Schmid. Deterministic regular expressions with back- references. J. Comput. Syst. Sci. , 105:1–39, 2019. doi:10.1016/j.jcss.2019.04.001
- [54]
-
[55]
Fujinami and I
H. Fujinami and I. Hasuo. Efficient matching with memoization for regexes with look-around and atomic grouping. In S. Weirich, editor, 33rd European Symposium on Programming, ESOP 2024 , volume 14577 of Lecture Notes in Computer Science , pages 90–118. Springer,
2024
-
[56]
doi:10.1007/978-3-031-57267-8\_4
-
[57]
M. Furman. Application of a method of rapid multiplication of matrices to the problem of finding the transitive closure of a graph. Doklady Akademii Nauk , 194(3):524–524, 1970. URL: https://www.mathnet.ru/eng/dan35686
1970
-
[58]
[FVP13] Jan Faigl, Vojtěch Vonásek, and Libor Přeučil
A. Gajentaan and M. H. Overmars. On a class of O(n2) problems in computational geometry. Computational Geometry , 5(3):165–185, 1995. doi:10.1016/0925-7721(95)00022-2
-
[59]
S. Ginsburg and E. H. Spanier. Semigroups, Presburger formulas, and languages. Pacific Journal of Mathematics , 16(2):285–296, 1966. doi:10.2140/pjm.1966.16.285
-
[60]
Goyvaerts
J. Goyvaerts. Runaway Regular Expressions: Catastrophic Backtracking. https://www. regular-expressions.info/catastrophic.html, 2021
2021
-
[61]
Goyvaerts
J. Goyvaerts. Atomic grouping. https://www.regular-expressions.info/atomic.html, 2025
2025
-
[62]
Goyvaerts
J. Goyvaerts. Lookahead and Lookbehind Zero-Length Assertions. https://www. regular-expressions.info/lookaround.html, 2025
2025
-
[63]
Goyvaerts
J. Goyvaerts. Repetition with star and plus. https://www.regular-expressions.info/ repeat.html, 2025. 80
2025
-
[64]
J. Graham-Cumming. Details of the Cloudflare outage on July 2, 2019. Cloudflare Blog. Archived at https://web. archive.org/web/20190712160002/https://blog.cloudflare.com/ details-of-the-cloudflare-outage-on-july-2-2019/ , July 2019. URL: https: //blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/
-
[65]
D. Gusfield. Algorithms on Strings, Trees, and Sequences - Computer Science and Computa- tional Biology . Cambridge University Press, 1997. doi:10.1017/cbo9780511574931
-
[66]
T. Haapasalo, P. Silvasti, S. Sippu, and E. Soisalon-Soininen. Online dictionary matching with variable-length gaps. In Experimental Algorithms , volume 6630 of Lecture Notes in Computer Science , pages 76–87. Springer, 2011. doi:10.1007/978-3-642-20662-7_7
- [67]
-
[68]
M. Holzer. Comments on monoids induced by NF As. IFIG Research Report 2301, Institut für Informatik, Universität Giessen, March 2023. URL: https://jlupub.ub.uni-giessen. de/items/c469513c-e738-48ae-900b-397efc96b042 , doi:10.22029/jlupub-14955
-
[69]
W.-K. Hon, T.-W. Lam, R. Shah, S. V. Thankachan, H.-F. Ting, and Y. Yang. Dictionary matching with a bounded gap in pattern or in text. Algorithmica, 80:698–713, 2018. doi: 10.1007/s00453-017-0288-2
-
[70]
O. H. Ibarra, T. Pong, and S. M. Sohn. A note on parsing pattern languages. Pattern Recognit. Lett., 16(2):179–182, 1995. doi:10.1016/0167-8655(94)00091-G
-
[71]
A. Jeż. One-variable word equations in linear time. Algorithmica, 74(1):1–48, 2016. doi: 10.1007/s00453-014-9931-3
-
[72]
A. Jeż. Recompression: A simple and powerful technique for word equations. J. ACM, 63(1), feb 2016. doi:10.1145/2743014
-
[73]
A. Jeż. Solving Word Equations (And Other Unification Problems) by Recompression. In M. Fernández and A. Muscholl, editors, 28th EACSL Annual Conference on Computer Science Logic, CSL 2020, Barcelona, Spain, January 13-16, 2020 , volume 152 of LIPIcs, pages 3:1– 3:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPIcs.CSL. 2020.3
-
[74]
T. Jiang, E. Kinber, A. Salomaa, K. Salomaa, and S. Yu. Pattern languages with and without erasing. International Journal of Computer Mathematics , 50(3–4):147–163, 1994. doi:10.1080/00207169408804252
-
[75]
T. Kopelowitz, S. Pettie, and E. Porat. Higher lower bounds from the 3SUM conjecture. In SODA 2016 , pages 1272–1287. SIAM, 2016. doi:10.1137/1.9781611974331.ch89
-
[76]
M. Kufleitner. A proof of the factorization forest theorem, 2007. URL: https://arxiv.org/ abs/0710.5130, arXiv:0710.5130, doi:10.48550/arXiv.0710.5130. 81
-
[77]
M. Kufleitner. The height of factorization forests. In E. Ochmański and J. Tyszkiewicz, editors, MFCS, pages 443–454. Springer, 2008. doi:10.1007/978-3-540-85238-4\_36
-
[78]
R. E. Ladner, R. J. Lipton, and L. J. Stockmeyer. Alternating pushdown and stack automata. SIAM Journal on Computing , 13(1):135–155, 1984. doi:10.1137/0213010
-
[79]
Levy and B
A. Levy and B. R. Shalom. A comparative study of dictionary matching with gaps: Limitations, techniques and challenges. Algorithmica, 84:590–638, 2022. doi:10.1007/ s00453-021-00851-6
2022
-
[80]
Mamouras and A
K. Mamouras and A. Chattopadhyay. Efficient matching of regular expressions with lookaround assertions. Proc. ACM Program. Lang. , 8(POPL), jan 2024. doi:10.1145/ 3632934
2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.