pith. sign in

arxiv: 2606.01183 · v4 · pith:P632DZTZnew · submitted 2026-05-31 · 💻 cs.DC · cs.DB· cs.DS· cs.PF

The World's Fastest Matching Engine Algorithm

Pith reviewed 2026-06-28 16:32 UTC · model grok-4.3

classification 💻 cs.DC cs.DBcs.DScs.PF
keywords order matching enginepriority queuedata structureslow latencyelectronic tradingsearch treeshigh throughputorder book
0
0 comments X

The pith

Priority-Indicated Nodes let a single CPU core match 32 million orders per second at sub-microsecond latency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that conventional order-book structures impose pointer-chased traversals and root-to-leaf searches on every operation, creating tail-latency spikes during market bursts. It replaces those structures with two new mechanisms: the Priority-Indicated Node, a priority queue whose fixed contiguous slots and priority indicators let insertion position be read directly without entry comparisons, and neighbor-aware insertion and deletion that use already-known neighbors to perform O(1) reference updates followed by single-path rebalancing. These changes are shown to deliver 4.7-11 times higher throughput than existing open-source engines on identical hardware while keeping median end-to-end latency under one microsecond. A sympathetic reader would care because the resulting capacity on a single core and on a modest 96-core server exceeds current market-data feed volumes by more than an order of magnitude.

Core claim

The central discovery is that the Priority-Indicated Node resolves insertion position directly from indicators in O(1) time independent of queue size, while a depth-aware capacity model keeps hot entries in L1, and that neighbor-aware operations on red-black, AVL, and B+-trees attach or remove nodes with O(1) writes plus single-path rebalancing when neighbors are known in advance. These two changes together remove the dominant storage-layer costs of linked-list and balanced-tree order books, enabling a single core to sustain 32 million order messages per second at sub-microsecond median host-path latency and 4.7-11 times the rate of the best open-source engines.

What carries the argument

The Priority-Indicated Node (PIN), a priority queue whose entries occupy fixed-capacity contiguous slots whose indicators encode global priority status so that insertion position is read directly without comparisons.

If this is right

  • A 96-core commodity server sustains roughly 640 million messages per second across 10,000 symbols.
  • This capacity exceeds twenty times the provisioned volume of the U.S. consolidated quote feed.
  • Tail-latency spikes under micro-bursts are eliminated, preserving market quality precisely when liquidity demand peaks.
  • The neighbor-aware technique works across red-black, AVL, and B+-tree variants without changing their balance invariants.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same fixed-slot indicator approach could accelerate priority scheduling or packet classification in other high-rate real-time systems.
  • Production trading platforms might reduce reliance on specialized hardware if these structures prove portable across languages and architectures.
  • The assumption that neighbor references arrive at zero cost holds in order matching but would require separate validation in domains where neighbors are not already known.

Load-bearing premise

The reported speedups were measured on identical hardware using representative workloads and fair comparisons to open-source engines, with no undisclosed workload-specific optimizations or post-selection of test conditions.

What would settle it

Re-implementing the PIN and neighbor-aware methods, running them on the same hardware and workloads as the compared open-source engines, and measuring throughput below four times the best baseline would falsify the central performance claim.

Figures

Figures reproduced from arXiv: 2606.01183 by Jake Yoon.

Figure 1
Figure 1. Figure 1: System Architecture: a single segment of the matching engine pipeline ingests orders from 𝑁 TCP shards into a shared ingress queue, sequences them, and dispatches them to 𝑀 sharded core matchers. the core stalls on memory rather than arithmetic. The obvi￾ous opposite extreme—storing a per-price queue as a single contiguous array—plays nicely with caches and prefetchers, but is mismatched to real workloads … view at source ↗
Figure 2
Figure 2. Figure 2: Single-matcher setup. Orders are injected directly into the matcher, which pushes events to an output queue drained on an adjacent core. (the empty configuration adds no prefill and leaves only the transient book the workload itself sustains). Book-depth statistics (median active price levels) are measured in a sepa￾rate, untimed pass, so the timed throughput runs carry no in-window statistics overhead [P… view at source ↗
Figure 3
Figure 3. Figure 3: End-to-end host-path measurement span: order flow from TCP ingress through sequencing, matching, and draining (NIC and network excluded). their cancels and modifies) used for [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: shows the corresponding end-to-end host-path response latency distribution for the same representative run: a sharp principal mode at the median, a small secondary mode near 465 ns, and a thin tail out to P99.9 = 1.69 𝜇s. The concentrated mode reflects the contention-free com￾mon path through parse, match, and encode; the tail cap￾tures the occasional kernel scheduling and queueing effects noted above. The… view at source ↗
Figure 5
Figure 5. Figure 5: Representative stochastic price path during a volatile micro-burst: this draw swings ∼40% peak-to-trough from the $167.52 open ($147.48–$215.22 range). The engine processes ∼2M order messages over the ∼60 ms burst at ∼31 M order messages/s, within the 30.4–32.5 M/s band sus￾tained across the five volatility regimes ( [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗
read the original abstract

A single CPU core sustains 32 million order messages per second at sub-microsecond median end-to-end host-path response latency, 4.7-11 times faster than the best available open-source matching engines on identical hardware. Scaled out, a single 96-core commodity server (~$1,630/month) sustains ~640 million messages per second across 10,000 symbols, over 20 times the provisioned capacity of the U.S. consolidated quote feed. We reach these numbers by attacking the storage layer that sets matching latency. The dominant order-book implementation, linked lists chained through a balanced tree, imposes two costs on every operation: pointer-chased traversal to the insertion point, and root-to-leaf search to locate the target price level. Under micro-bursts these costs produce tail-latency spikes that degrade market quality precisely when liquidity is most needed. We present two data-structure contributions that eliminate them. The first is the Priority-Indicated Node (PIN), a priority queue in which entries occupy fixed-capacity, contiguously addressable slots, with indicators encoding the entry's global priority status. Unlike heaps, which require O(log n) comparisons per operation, the PIN resolves insertion position directly from the indicators without comparing entries; indicator updates are O(1), independent of queue size. A depth-aware capacity model sizes each PIN so hot entries fit within L1 residency. The second targets a broader inefficiency: balanced search trees search from root to leaf on every insertion and deletion, even when the caller already knows the key's in-order neighbors, which in electronic trading are available at zero cost. Neighbor-aware insertion and deletion use known neighbor references to attach or remove a node with O(1) reference writes, followed by single-path rebalancing, across red-black, AVL, and B+-tree variants.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents two data-structure innovations for order-book matching engines: the Priority-Indicated Node (PIN), a fixed-capacity priority queue that resolves insertion position from indicators in O(1) time without entry comparisons, and neighbor-aware insertion/deletion for balanced trees (red-black, AVL, B+) that exploit known in-order neighbors to perform O(1) attachment followed by single-path rebalancing. These are claimed to eliminate pointer-chasing and root-to-leaf search costs that cause tail-latency spikes under micro-bursts. Empirically, the resulting engine sustains 32 million order messages per second on a single CPU core at sub-microsecond median latency, 4.7–11× faster than the best open-source engines on identical hardware, scaling to ~640 million messages per second on a 96-core server across 10,000 symbols.

Significance. If the performance numbers are obtained under representative workloads and fair, reproducible comparisons, the result would be a meaningful systems contribution to high-frequency trading infrastructure, directly addressing tail-latency degradation during liquidity stress. The PIN construction and neighbor-aware tree updates are concrete algorithmic advances that could transfer to other priority-queue and ordered-set workloads; the depth-aware capacity model for L1 residency is a practical engineering insight. The manuscript supplies no machine-checked proofs or parameter-free derivations, but the empirical scaling claim (20× U.S. consolidated quote feed capacity on commodity hardware) is falsifiable and therefore valuable if the methodology is fully documented.

major comments (2)
  1. [Evaluation] Evaluation section (and abstract): the central performance claims (32 M msg/s single-core, 4.7–11× speedup, 640 M msg/s scaled) are presented without any description of the workload traces, the exact list of compared open-source engines, how micro-bursts were generated, hardware micro-architecture details, number of trials, error bars, or measurement methodology for end-to-end host-path latency. Because these details are load-bearing for the headline speedup factor, the empirical contribution cannot be assessed or reproduced from the manuscript as written.
  2. [§3] §3 (PIN data structure): the claim that indicator updates are O(1) independent of queue size and that insertion position is resolved directly from indicators without comparisons is central to the latency argument, yet the manuscript provides no pseudocode or walk-through showing how the fixed-capacity slots and priority indicators interact under concurrent or high-contention updates; without this, it is impossible to verify that the structure avoids the O(log n) comparisons of a standard heap while preserving correctness.
minor comments (2)
  1. [Title] The title 'The World's Fastest Matching Engine Algorithm' is stronger than the evidence supplied; a more precise title would better reflect the scope.
  2. [§3 and §4] Notation for the depth-aware capacity model and the single-path rebalancing invariants should be introduced with a small example or diagram to aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We agree that the evaluation methodology requires fuller documentation for reproducibility and that the PIN description would benefit from explicit pseudocode. We will revise the manuscript accordingly to address both points.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section (and abstract): the central performance claims (32 M msg/s single-core, 4.7–11× speedup, 640 M msg/s scaled) are presented without any description of the workload traces, the exact list of compared open-source engines, how micro-bursts were generated, hardware micro-architecture details, number of trials, error bars, or measurement methodology for end-to-end host-path latency. Because these details are load-bearing for the headline speedup factor, the empirical contribution cannot be assessed or reproduced from the manuscript as written.

    Authors: We agree that the current manuscript does not supply the full methodological details needed for independent assessment or reproduction. In the revised version we will expand the Evaluation section (and adjust the abstract) to document: the workload traces (synthetic micro-bursts constructed to emulate liquidity stress plus any real-market data used), the precise list and versions of the open-source engines compared, the procedure used to generate micro-bursts, hardware specifications including CPU model, cache hierarchy and micro-architecture, number of trials performed, statistical reporting (error bars or latency percentiles), and the exact end-to-end host-path latency measurement method. These additions will make the reported speedups verifiable. revision: yes

  2. Referee: [§3] §3 (PIN data structure): the claim that indicator updates are O(1) independent of queue size and that insertion position is resolved directly from indicators without comparisons is central to the latency argument, yet the manuscript provides no pseudocode or walk-through showing how the fixed-capacity slots and priority indicators interact under concurrent or high-contention updates; without this, it is impossible to verify that the structure avoids the O(log n) comparisons of a standard heap while preserving correctness.

    Authors: We accept that the manuscript currently lacks pseudocode and a concrete walk-through. The revised §3 will include (1) explicit pseudocode for PIN insertion, deletion and indicator maintenance, and (2) a worked example demonstrating how fixed-capacity slots and priority indicators allow direct position resolution in O(1) time without entry comparisons. On concurrency: the design targets the conventional single-threaded-per-order-book model used in matching engines; we will state this assumption explicitly and note that multi-threaded contention would require separate synchronization not covered in the present work. Under the single-threaded model the claimed O(1) indicator updates hold. revision: yes

Circularity Check

0 steps flagged

No circularity: performance claims are empirical measurements with no derivations or fitted predictions

full rationale

The manuscript describes two new data structures (Priority-Indicated Node and neighbor-aware tree operations) and states measured throughput and latency numbers on hardware. No equations, parameter fits, predictions derived from subsets of data, or self-citations appear in the provided text. The central claims are presented as direct experimental outcomes rather than reductions of any input by construction, satisfying the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Only the abstract is available, so the ledger is minimal. No fitted parameters or invented physical entities are described. The contributions rest on standard assumptions about tree balancing and cache behavior.

axioms (1)
  • standard math Balanced search trees maintain ordering and balance invariants after local updates
    Invoked implicitly for the neighbor-aware insertion and deletion to require only single-path rebalancing.
invented entities (1)
  • Priority-Indicated Node (PIN) no independent evidence
    purpose: Provide O(1) priority resolution via fixed slots and indicators instead of comparisons
    New structure introduced to eliminate pointer-chasing and comparison costs in order book priority queues.

pith-pipeline@v0.9.1-grok · 5858 in / 1346 out tokens · 59732 ms · 2026-06-28T16:32:33.489681+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

56 extracted references · 13 canonical work pages · 1 internal anchor

  1. [1]

    Gene M. Amdahl. 1967. Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities. InProceedings of the April 18–20, 1967, Spring Joint Computer Conference (AFIPS ’67 (Spring)). 483–485. doi:10.1145/1465482.1465560

  2. [2]

    Arms Race

    Matteo Aquilina, Eric Budish, and Peter O’Neill. 2022. Quantifying the High-Frequency Trading “Arms Race”.The Quarterly Journal of Economics137, 1 (February 2022), 493–564. doi:10.1093/qje/qjab032 https://academic.oup.com/qje/article/137/1/493/6368348

  3. [3]

    BATS Global Markets. 2011. BATS Announces New Real-time Latency Monitoring Service. Technical notice. Accessed 2025- 12-11.https://cdn.cboe.com/resources/release_notes/2011/BATS- Announces-New-Real-time-Latency-Monitoring-Service-Effective- Tuesday-February-1-2011.pdf

  4. [4]

    McCreight

    Rudolf Bayer and Edward M. McCreight. 1972. Organization and Maintenance of Large Ordered Indexes.Acta Informatica1, 3 (1972), 173–189. doi:10.1007/BF00288683

  5. [5]

    Adam Belay, George Prekas, Ana Klimovic, Samuel Grossman, Chris- tos Kozyrakis, and Edouard Bugnion. 2014. IX: A Protected Dataplane Operating System for High Throughput and Low Latency. InProceed- ings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Association

  6. [6]

    Jean-Philippe Bouchaud, Marc Mézard, and Marc Potters

  7. [7]

    Statistical properties of stock order books: Empirical results and models.Quantitative Finance2, 4 (2002), 251–

  8. [8]

    doi:10.1088/1469-7688/2/4/301 https://www.cfm.com/wp- content/uploads/2022/12/255-2002-statistical-propertes-of-stock- order-books-empiricals-results-and-model.pdf

  9. [9]

    Eric Budish, Peter Cramton, and John Shim. 2015. The High-Frequency Trading Arms Race: Frequent Batch Auctions as a Market Design Response.The Quarterly Journal of Economics130, 4 (2015), 1547–1621. doi:10.1093/qje/qjv027 https://academic.oup.com/qje/article/130/4/ 1547/1916146

  10. [10]

    Cboe Global Markets. 2025. The Necessity of Real-Time Options Data for Retail Participants.https://www.cboe.com/insights/posts/the- necessity-of-real-time-options-data-for-retail-participants/. Q3 2024 OPRA peak message volume of 44.8 million messages per second. Accessed 2025-12-11

  11. [11]

    Cboe Global Markets. 2025. U.S. Equities Market Volume Summary. Market statistics dashboard. Accessed 2025-12-11.https://www.cboe. com/us/equities/market_share/

  12. [12]

    Consolidated Tape Association. 2025. Consolidated Tape Association: CTA Overview.https://www.ctaplan.com/index. Accessed 2025-12-11

  13. [13]

    2010.A stochastic model for order book dynamics

    Rama Cont, Sasha Stoikov, and Rishi Talreja. 2010.A stochastic model for order book dynamics. Technical Report. Columbia University.https: //www.columbia.edu/~ww2040/orderbook.pdf

  14. [14]

    Cormen, Charles E

    Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2022.Introduction to Algorithms(4 ed.). The MIT Press.https://mitpress.mit.edu/9780262046305/introduction- to-algorithms/

  15. [15]

    Databento. 2025. What is the Options Price Reporting Authority (OPRA)?https://databento.com/microstructure/opra. April 2025 peak 1-millisecond bursts exceeding 187 million messages per second. Accessed 2025-06-01

  16. [16]

    Deutsche Börse Group. 2016. Xetra Trading System: Xetra In- sights. Technical presentation. Accessed 2025-12-11.https: //www.cashmarket.deutsche-boerse.com/resource/blob/307522/ 5a13437b23985d15540ab20c28893f60/data/Xetra_Insights.pdf

  17. [17]

    Deutsche Börse Group. 2025. Insights into Trading System Dynamics: Deutsche Börse’s T7. Technical presentation. Ac- cessed 2025-12-11.https://www.eurex.com/resource/blob/48918/ e8d4df56f75c9a96fb0f6fff6b18a14f/data/presentation_insights-into- trading-system-dynamics_en.pdf

  18. [18]

    Aleksandar Dragojević, Dushyanth Narayanan, Orion Hodson, and Miguel Castro. 2014. FaRM: Fast Remote Memory. InProceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI). USENIX Association

  19. [19]

    Eurex Exchange. 2024. Publication of T7 Documentation: In- sights into Trading System Dynamics. Implementation News. https://www.eurex.com/ex-en/support/information-channels/ implementation-news/Publication-of-T7-Documentation-4042720

  20. [20]

    Joshua Fried, Zhenyuan Ruan, Amy Ousterhout, and Adam Belay

  21. [21]

    InProceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI)

    Caladan: Mitigating Interference at Microsecond Timescales. InProceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Association

  22. [22]

    Fujitsu Limited and Tokyo Stock Exchange, Inc. 2015. New, En- hanced TSEarrowheadCash Equity Trading System — For a Safer, More Convenient Market. Press release. Accessed 2025- 12-11.https://www.fujitsu.com/global/about/resources/news/press- releases/2015/0924-01.html

  23. [23]

    Igal Galperin and Ronald L. Rivest. 1993. Scapegoat Trees. InProceed- ings of the Fourth Annual ACM-SIAM Symposium on Discrete Algo- rithms (SODA). SIAM, 165–174.https://people.csail.mit.edu/rivest/ pubs/GR93.pdf

  24. [24]

    Gould, Mason A

    Martin D. Gould, Mason A. Porter, Stacy Williams, Mark McDon- ald, Daniel J. Fenn, and Sam D. Howison. 2013. Limit order books. Quantitative Finance13, 11 (2013), 1709–1742. doi:10.1080/14697688. 2013.803148 https://www.math.ucla.edu/~mason/papers/gould-qf- final.pdf

  25. [25]

    Guibas and Robert Sedgewick

    Leo J. Guibas and Robert Sedgewick. 1978. A Dichromatic Framework for Balanced Trees. InProceedings of the 19th Annual Symposium on Foundations of Computer Science (FOCS). 8–21. doi:10.1109/SFCS.1978.3

  26. [26]

    Japan Exchange Group, Inc. 2016. JPX Report 2016. Integrated report. Accessed 2025-12-11.https://www.jpx.co.jp/english/corporate/ investor-relations/ir-library/integrated-report/tvdivq0000008t9q- att/JPXreport2016e_all.pdf

  27. [27]

    Ivan Jericevich, Dharmesh Sing, and Tim Gebbie. 2022. CoinTossX: An open-source low-latency high-throughput matching engine.Soft- wareX19 (2022), 101136. doi:10.1016/j.softx.2022.101136

  28. [28]

    Kostis Kaffes, Timothy Chong, Jack Tigar Humphries, Adam Belay, David Mazières, and Christos Kozyrakis. 2019. Shinjuku: Preemptive Scheduling forµsecond-Scale Tail Latency. InProceedings of the 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI). USENIX Association, 345–360

  29. [29]

    Andersen

    Anuj Kalia, Michael Kaminsky, and David G. Andersen. 2019. Datacen- ter RPCs Can Be General and Fast. InProceedings of the 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI). USENIX Association. Best Paper Award

  30. [31]

    Putni ¯nš

    Marta Khomyn and T ¯alis J. Putni ¯nš. 2021. Algos gone wild: What drives the extreme order cancellation rates in modern markets?Jour- nal of Banking & Finance129 (2021), 106170. doi:10.1016/j.jbankfin. 2021.106170

  31. [32]

    Sida Li, Mao Ye, and Miles Zheng. 2023. Refusing the Best Price? Journal of Financial Economics147, 2 (2023), 317–337. doi:10.1016/j. jfineco.2022.11.004

  32. [33]

    Andersen, and Michael Kamin- sky

    Hyeontaek Lim, Dongsu Han, David G. Andersen, and Michael Kamin- sky. 2014. MICA: A Holistic Approach to Fast In-Memory Key-Value Storage. InProceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI). USENIX Association, 429– 444

  33. [34]

    Yandong Mao, Eddie Kohler, and Robert Tappan Morris. 2012. Cache Craftiness for Fast Multicore Key-Value Storage. InProceedings of the 7th ACM European Conference on Computer Systems (EuroSys). ACM, 19 Jake Yoon 183–196

  34. [35]

    Menkveld

    Albert J. Menkveld. 2018. High-Frequency Trading as Viewed through an Electron Microscope.Financial Analysts Journal74, 2 (2018), 24–31. doi:10.2469/faj.v74.n2.1Also available via SSRN (doi:10.2139/ssrn.2875612)

  35. [36]

    Ioane Muni Toke. 2013. The order book as a queueing system: av- erage depth and influence of the size of limit orders. arXiv preprint arXiv:1311.5661.https://arxiv.org/abs/1311.5661

  36. [37]

    Hamish Murray, Thu Phuong Pham, and Harminder Singh. 2016. La- tency reduction and market quality: The case of the Australian Stock Exchange.International Review of Financial Analysis46 (2016), 257–265. doi:10.1016/j.irfa.2015.09.001

  37. [38]

    Nasdaq, Inc. 2024. Reimagining the Markets of Tomorrow: U.S. Equity Market Data. Whitepaper.https://www.nasdaq.com/docs/2024/US_ Equity_Market_Data_Whitepaper

  38. [39]

    Object Computing, Inc. 2013. Liquibook: An Open Source C++ Order Matching Engine. GitHub repository. Accessed 2025-12-11.https: //github.com/ObjectComputing/liquibook

  39. [40]

    2021.Characterization of High-Speed Trading

    Atsuyuki Ohyama, Yoshitaka Fukuyama, Shintaro Okude, and Kenta Suzuki. 2021.Characterization of High-Speed Trading. Techni- cal Report. Financial Services Agency (Japan). Accessed 2025- 12-11.https://www.fsa.go.jp/frtc/english/seika/srhonbun/20210707_ Characterization_of_high_speed_tradingEN.pdf

  40. [41]

    Amy Ousterhout, Joshua Fried, Jonathan Behrens, Adam Belay, and Hari Balakrishnan. 2019. Shenango: Achieving High CPU Efficiency for Latency-Sensitive Datacenter Workloads. InProceedings of the 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI). USENIX Association, 361–378

  41. [42]

    Simon Peter, Jialin Li, Irene Zhang, Dan R. K. Ports, Doug Woos, Arvind Krishnamurthy, Thomas Anderson, and Timothy Roscoe. 2014. Arrakis: The Operating System is the Control Plane. InProceedings of the 11th USENIX Symposium on Operating Systems Design and Imple- mentation (OSDI). USENIX Association

  42. [43]

    PostgreSQL Global Development Group. 2024. Database Page Layout. PostgreSQL Documentation, Chapter 66.https://www.postgresql.org/ docs/current/storage-page-layout.html

  43. [44]

    George Prekas, Marios Kogias, and Edouard Bugnion. 2017. ZygOS: Achieving Low Tail Latency for Microsecond-Scale Networked Tasks. InProceedings of the 26th ACM Symposium on Operating Systems Prin- ciples (SOSP). ACM, 325–341

  44. [45]

    QuantCup 1 Contest. 2011. Price–Time Matching Engine (Winning Implementation). GitHub Gist.https://gist.github.com/druska/ d6ce3f2bac74db08ee9007cdf98106ef

  45. [46]

    Zia Ur Rahman. 2021. limit-order-book: Fast, Multi-threaded Trade Matching Engine. GitHub repository.https://github.com/ziaagikian/ limit-order-book

  46. [47]

    Sepideh Roghanchi, Jakob Eriksson, and Nilanjana Basu. 2017. Ffwd: Delegation Is (Much) Faster Than You Think. InProceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP). ACM, 342–358

  47. [48]

    W. K. Selph. 2011. How to Build a Fast Limit Order Book. Blog post.https://web.archive.org/web/20110219163448/http://howtohft. wordpress.com/2011/02/15/how-to-build-a-fast-limit-order-book/

  48. [49]

    Sergej Teverovski. 2023. Open Day 2023: T7 – Latency Roadmap. Deutsche Börse Group presentation. Accessed 2025-12- 11.https://www.deutsche-boerse.com/resource/blob/3690194/ fe6b01b1e14800eb40374a95516debf2/data/Open%20Day%202023% 20-%20Presentation,%20T7-Latency%20Roadmap.pdf

  49. [50]

    Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden. 2013. Speedy Transactions in Multicore In-Memory Databases. InProceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP). ACM, 18–32

  50. [51]

    Securities and Exchange Commission

    U.S. Securities and Exchange Commission. 2005. 17 CFR § 242.611 — Order Protection Rule (Regulation NMS Rule 611). Code of Federal Regulations.https://www.ecfr.gov/current/title-17/chapter-II/part- 242/subject-group-ECFRac68bdd026a46db/section-242.611

  51. [52]

    Securities and Exchange Commission

    U.S. Securities and Exchange Commission. 2013. Quote Lifetime Distri- butions (Data Highlight 2013-04). SEC Market Structure Research. Last reviewed Aug. 16, 2022.https://www.sec.gov/about/quote-lifetime- distributions

  52. [53]

    Securities and Exchange Commission

    U.S. Securities and Exchange Commission. 2015. Memorandum: Rule 611 of Regulation NMS. SEC staff memo.https://www.sec. gov/spotlight/emsac/memo-rule-611-regulation-nms.pdf

  53. [54]

    Securities and Exchange Commission

    U.S. Securities and Exchange Commission. 2022. Trade to Or- der Volume Ratios. SEC Data Visualizations. Updated Aug. 6, 2025.https://www.sec.gov/data-research/statistics-data- visualizations/trade-order-volume-ratios

  54. [55]

    Securities and Exchange Commission

    U.S. Securities and Exchange Commission. 2025. Quote Life: Large Stocks (Conditional Frequency, Q1 2025). SEC Market Structure Data Visualizations.https://www.sec.gov/marketstructure/datavis/ quotelife_stocks_lg.html

  55. [56]

    Maksim Zheravin. 2019. Exchange-core: Ultra-fast matching engine. GitHub repository.https://github.com/exchange-core/exchange-core

  56. [57]

    The power of patience: A behavioral regularity in limit order placement

    Ilija I. Zovko and J. Doyne Farmer. 2002. The power of patience: A behavioral regularity in limit order placement.Quantitative Finance 2, 5 (2002), 387–392. doi:10.1088/1469-7688/2/5/308 https://arxiv.org/ abs/cond-mat/0206280. 20