arxiv: 2604.16253 · v1 · submitted 2026-04-17 · ⚛️ physics.ins-det

Recognition: unknown

A Complexity Agnostic Clustering Engine for Time Projection Chambers and its Implementation in FPGA

Datao Gong (1) ((1) Fermi National Accelerator Laboratory), Jinyuan Wu (1), Michael Wang (1)

Authors on Pith no claims yet

Pith reviewed 2026-05-10 06:36 UTC · model grok-4.3

classification ⚛️ physics.ins-det

keywords FPGAclusteringtime projection chamberTPCdata reorganizationfixed latencyhardware implementationpredictable timing

0 comments

The pith

An FPGA clustering block for time projection chambers completes its work in exactly twice the input filling time with no dependence on the number or size of clusters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes a hardware clustering engine built in FPGA that takes hit data from time projection chambers and reorganizes it so that all hits from the same cluster appear together in the output stream. The engine splits its work into a data-filling phase followed by a data-outputting phase that consumes precisely the same number of clock cycles as the filling phase, producing a total latency of exactly twice the filling interval. This fixed-time behavior holds for any number of clusters or hits per cluster provided the total hit count stays inside a pre-allocated memory size. A sympathetic reader cares because detector data pipelines at high event rates cannot afford variable or quadratic processing delays that would require extra buffering or risk dropping events.

Core claim

The clustering functional block reorganizes input data and the hits data belonging to the same clusters are output together. The clustering operation consists of two phases, data filling phase and data outputting phase, and the later uses the same number of clock cycles as the data filling phase. The clustering block can accommodate events with arbitrary number of clusters and number of hits per cluster as long as the total number of hits is within a predesigned limit. The operation time is exactly twice of the data filling time with no residual O(n2) term.

What carries the argument

The two-phase clustering functional block that fills a memory array in one pass and then streams grouped cluster data in a second pass of equal length.

If this is right

Downstream processing stages receive hits already grouped by cluster without additional sorting logic.
The design supports any mixture of cluster counts and sizes as long as aggregate hits fit the memory.
Operating frequency of 200 MHz was demonstrated on a low-cost FPGA evaluation board.
No quadratic term appears in the execution time, removing the usual scaling penalty of software clustering.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The fixed-latency property could simplify real-time trigger decisions in detectors that must decide within a strict clock window.
The same two-phase memory-reorganization pattern might be reused for other high-rate sensor grouping tasks where worst-case timing must be bounded.
Larger on-chip memory would extend the maximum event size the engine can handle without changing the timing guarantee.

Load-bearing premise

The total number of hits in any single event must stay inside the memory capacity built into the clustering block.

What would settle it

Feed the block an event whose total hit count exceeds the pre-designed memory limit and observe whether all hits are still grouped and emitted within the fixed number of clock cycles.

Figures

Figures reproduced from arXiv: 2604.16253 by Datao Gong (1) ((1) Fermi National Accelerator Laboratory), Jinyuan Wu (1), Michael Wang (1).

**Figure 9.** Figure 9: Output of the cascaded clustering engines [PITH_FULL_IMAGE:figures/full_fig_p004_9.png] view at source ↗

read the original abstract

A clustering functional block implemented in field-programable-gate-array (FPGA) for time projection chambers (TPC) operating with predictable time regardless the complexity of the event is described in this paper. The clustering functional block reorganizes input data and the hits data belonging to the same clusters are output together for further process in the later stages. The clustering operation consists of two phases, data filling phase and data outputting phase, and the later uses the same number of clock cycles as the data filling phase. The clustering block can accommodate events with arbitrary number of clusters and number of hits per cluster as long as the total number of hits is within a predesigned limit. The operation time is exactly twice of the data filling time with no residual O(n2) term. The clustering block has been implemented with operating frequency of 200 MHz in a low-cost FPGA evaluation module and test results confirm the expected performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript describes a clustering functional block implemented in an FPGA for Time Projection Chambers (TPCs). The block reorganizes input hit data so that hits from the same clusters are grouped together for downstream processing. It operates in two phases (data filling and data outputting) that each consume the same number of clock cycles, yielding a total latency of exactly twice the filling time with no residual O(n²) term. The design handles arbitrary numbers of clusters and hits per cluster provided the total hit count remains within a pre-synthesized memory limit, and it has been realized at 200 MHz on a low-cost FPGA evaluation board with test results stated to confirm the expected fixed-time behavior.

Significance. If the fixed-time guarantee and implementation details hold, the work offers a practical hardware primitive for real-time, bounded-latency clustering in high-rate TPC readout systems. Such a module could reduce variable software latency in online event building or trigger pipelines in particle-physics experiments. The explicit conditioning on a total-hit memory ceiling and the two-phase reorganization approach are clear engineering contributions, though their generality depends on the quantitative evidence supplied in the full manuscript.

major comments (2)

[Abstract / Implementation] Abstract and implementation section: the claim of a 200 MHz operating frequency and confirmed test performance is presented without accompanying resource-utilization figures (LUTs, FFs, BRAMs, DSPs), timing closure reports, or block diagrams of the memory organization and state machines. These data are load-bearing for assessing whether the fixed-time property is achieved in practice on the target device.
[Design description] Design description (implicit in the two-phase operation claim): the fixed-time guarantee (exactly 2× filling cycles, no O(n²) remainder) is explicitly conditioned on the total number of hits staying inside a pre-designed memory limit. The manuscript should specify the concrete memory allocation (array dimensions, hash-table sizing, or dual-port RAM usage) and state what occurs on overflow—data dropping, stall, or fallback behavior—because this boundary directly determines whether the complexity-agnostic property holds for all events the TPC may produce.

minor comments (2)

[Abstract] The abstract contains a typographical error: 'field-programable-gate-array' should read 'field-programmable gate array'.
[Results / Figures] Figure captions and timing diagrams (if present) should explicitly label the filling-phase and output-phase cycle counts to make the 'exactly twice' claim visually verifiable.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript describing the FPGA clustering engine for TPCs. We address each major comment below and will revise the manuscript to incorporate the requested clarifications and supporting data.

read point-by-point responses

Referee: [Abstract / Implementation] Abstract and implementation section: the claim of a 200 MHz operating frequency and confirmed test performance is presented without accompanying resource-utilization figures (LUTs, FFs, BRAMs, DSPs), timing closure reports, or block diagrams of the memory organization and state machines. These data are load-bearing for assessing whether the fixed-time property is achieved in practice on the target device.

Authors: We agree that the manuscript would benefit from explicit resource utilization data and implementation diagrams to substantiate the 200 MHz operation and fixed-latency claims. In the revised version we will add a table reporting post-implementation resource usage (LUTs, flip-flops, BRAMs, and DSPs) on the target low-cost FPGA, a timing summary confirming closure at 200 MHz, and a block diagram of the memory organization and control state machines. These additions will allow readers to verify that the two-phase design indeed delivers the stated constant-time behavior without hidden variable costs. revision: yes
Referee: [Design description] Design description (implicit in the two-phase operation claim): the fixed-time guarantee (exactly 2× filling cycles, no O(n²) remainder) is explicitly conditioned on the total number of hits staying inside a pre-designed memory limit. The manuscript should specify the concrete memory allocation (array dimensions, hash-table sizing, or dual-port RAM usage) and state what occurs on overflow—data dropping, stall, or fallback behavior—because this boundary directly determines whether the complexity-agnostic property holds for all events the TPC may produce.

Authors: We concur that the operating envelope must be defined precisely. The revised manuscript will specify the exact memory structures employed (including array dimensions for hit storage, cluster pointer tables, and dual-port RAM configuration), the maximum total hit capacity for which the fixed 2× latency is guaranteed, and the overflow policy. In our implementation, excess hits beyond the pre-allocated memory are dropped to preserve the deterministic latency; this choice and its implications for TPC event rates will be stated explicitly. revision: yes

Circularity Check

0 steps flagged

No circularity: hardware design description with explicit precondition

full rationale

The paper describes an FPGA clustering block whose fixed-time behavior (exactly two phases, each taking N clock cycles for N hits) follows directly from the stated architecture of sequential data filling followed by output reorganization. The 'no residual O(n²) term' and complexity-agnostic claims are architectural consequences of the two-phase design and the pre-synthesis memory allocation, not a derived prediction or fitted result. The total-hit limit is stated explicitly as a precondition rather than hidden, so the performance guarantee is conditional but self-contained with no reduction to self-definition, self-citation, or ansatz smuggling.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The design rests on standard digital logic assumptions and a single capacity limit rather than new physical principles or fitted constants.

free parameters (1)

Maximum total hits capacity
The block guarantees fixed time only when total hits stay below this pre-designed memory limit; the value is chosen at design time.

axioms (1)

domain assumption FPGA fabric can realize the two-phase clustering logic without timing violations at the target clock rate
Assumes standard synthesis and place-and-route tools will meet timing for the described architecture.

pith-pipeline@v0.9.0 · 5467 in / 1242 out tokens · 52483 ms · 2026-05-10T06:36:05.564402+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

4 extracted references · 2 canonical work pages

[1]

The clustering block has been implemented with operating frequency of 200 MHz in a low-cost FPGA evaluation module and test results confirm the expected performance

term. The clustering block has been implemented with operating frequency of 200 MHz in a low-cost FPGA evaluation module and test results confirm the expected performance. Index Terms — Trigger Systems , FPGA Applications , Time Projection Chamber, Clustering I. INTRODUCTION N modern high energy physics and nuclear physics experiments, time projection cha...
[2]

The clustering operation is a type of doublet finding process, Manuscript submitted Feb

terms. The clustering operation is a type of doublet finding process, Manuscript submitted Feb. 23, 2026. This manuscript has been authored by Fermi Forward Discovery Group, LLC under Contract No. 89243024CSC000002 with the U.S. Department of Energy, Office of Science, Office of High Energy Physics. and therefore, it intrinsically should be possible to fi...

2026
[3]

A Fast General-Purpose Clustering Algorithm Based on FPGAs for High-Throughput Data Processing,

A. Annovi and M. Beretta, “A Fast General-Purpose Clustering Algorithm Based on FPGAs for High-Throughput Data Processing,” 2009 IEEE Nuclear Science Symposium Conference Record (NSS/MIC), Orlando, FL, USA, 2009, pp. 4138-4141, doi:10.1109/NSSMIC.2009.5402322

work page doi:10.1109/nssmic.2009.5402322 2009
[4]

Online pattern recognition for the ALICE high level trigger,

V. Lindenstruth et al., "Online pattern recognition for the ALICE high level trigger," in IEEE Transactions on Nuclear Science, vol. 51, no. 3, pp. 383-390, June 2004, doi: 10.1109/TNS.2004.829384 (a) (b) Fig. 7. Output from the FPGA clustering engine (a) (b) Fig. 8. An event with very long clusters Fig. 9. Output of the cascaded clustering engines

work page doi:10.1109/tns.2004.829384 2004