Recognition: unknown
EMiX: Emulating Beyond Single-FPGA Limits
Pith reviewed 2026-05-07 12:45 UTC · model grok-4.3
The pith
EMiX partitions large multi-core RISC-V designs across multiple FPGAs to enable emulation beyond single-board limits.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EMiX systematically partitions a monolithic multi-core design into multiple components and deploys them across multiple interconnected FPGAs, effectively exploiting inter-FPGA interconnects to balance scalability and performance without requiring fundamental RTL redesign. The framework is demonstrated with a 64-core RISC-V architecture across eight Alveo U55c FPGAs, achieving full-system execution including Linux boot.
What carries the argument
Systematic partitioning of a monolithic multi-core RISC-V design combined with exploitation of inter-FPGA interconnects to distribute components while preserving original RTL.
If this is right
- Designs exceeding single-FPGA capacity can still undergo full-system emulation on existing multi-FPGA hardware.
- The same partitioning method scales with both core count and number of FPGAs without RTL changes.
- Open-source release enables testing on other FPGA boards and interconnect fabrics.
- Pre-silicon validation cycles shorten because larger prototypes become feasible on standard lab equipment.
Where Pith is reading between the lines
- Teams could reduce reliance on costly dedicated emulation platforms by repurposing clusters of standard FPGA boards.
- The method may generalize to other processor families once the partitioning rules are adapted.
- Future work could measure exact latency overheads to identify optimal interconnect topologies.
- Integration with existing EDA flows might automate the partitioning step for broader adoption.
Load-bearing premise
That systematic partitioning and inter-FPGA interconnects can preserve functional correctness and acceptable performance without requiring fundamental RTL redesign or creating unmanageable communication bottlenecks.
What would settle it
Observing either functional failure such as inability to boot Linux or performance degradation from communication bottlenecks when scaling the 64-core design across the eight FPGAs would falsify the central claim.
Figures
read the original abstract
FPGA-level emulation is a key step in pre-silicon chip design validation. However, emulating large-scale multi-core systems increasingly exceed the hardware resource capacity of a single FPGA, limiting the feasibility of full-system emulation. To address this challenge, we introduce EMiX, a scalable multi-FPGA framework that enables distributed emulation of multi-core RISC-V architectures beyond single-FPGA resource limits. EMiX systematically partitions a monolithic multi-core design into multiple components and deploys them across multiple interconnected FPGAs, effectively exploiting inter-FPGA interconnects to balance scalability and performance without requiring fundamental RTL redesign. We prototype EMiX with a 64-core architecture across eight interconnected Alveo U55c FPGAs (scalable on core and FPGA counts), successfully demonstrating full-system execution including Linux boot. EMiX will be released as an open-source platform.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces EMiX, a multi-FPGA framework for distributed emulation of large-scale multi-core RISC-V systems. It partitions a monolithic RTL design across multiple interconnected FPGAs using systematic methods that exploit inter-FPGA links while claiming to avoid fundamental RTL redesign. The central result is a working prototype of a 64-core architecture deployed on eight Alveo U55c FPGAs that successfully performs full-system execution including Linux boot; the framework is stated to be scalable in both core and FPGA count and will be released open-source.
Significance. If the partitioning approach maintains functional correctness and acceptable performance as shown by the prototype, EMiX would enable pre-silicon validation of multi-core designs that exceed single-FPGA capacity, a practical bottleneck in computer architecture research. The concrete Linux-boot demonstration on a 64-core configuration supplies direct empirical evidence of feasibility for distributed emulation, and the planned open-source release strengthens the contribution by allowing reproducibility and extension.
major comments (2)
- [Abstract] Abstract: The load-bearing claim that the design can be split 'without requiring fundamental RTL redesign' while preserving correctness and performance is not supported by quantitative evidence. No data are given on cross-FPGA latency, bandwidth utilization, or the precise modifications (proxy modules, protocol replacements for intra-chip buses, clock-domain crossing) needed to implement shared-memory interconnects across FPGAs.
- [Prototype evaluation] Prototype description: The reported Linux boot on the 64-core, eight-FPGA configuration demonstrates functional correctness but does not address whether inter-FPGA communication introduces unmanageable bottlenecks. Without reported emulation speed, overhead relative to a single-FPGA baseline, or resource utilization after partitioning, it remains unclear whether the systematic changes truly preserve acceptable performance.
minor comments (2)
- A block diagram or table showing the exact partitioning boundaries, inserted proxy logic, and inter-FPGA protocol (e.g., AXI over Ethernet or SerDes) would clarify how the 'systematic' changes differ from fundamental redesign.
- The scalability claim (on core and FPGA counts) should be accompanied by at least one additional data point or extrapolation showing resource and performance trends beyond the 8-FPGA prototype.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify where additional evidence is needed to support our claims. We address each major comment point-by-point below. Where the manuscript lacks quantitative support, we will revise by adding the requested data and details from our prototype implementation.
read point-by-point responses
-
Referee: [Abstract] Abstract: The load-bearing claim that the design can be split 'without requiring fundamental RTL redesign' while preserving correctness and performance is not supported by quantitative evidence. No data are given on cross-FPGA latency, bandwidth utilization, or the precise modifications (proxy modules, protocol replacements for intra-chip buses, clock-domain crossing) needed to implement shared-memory interconnects across FPGAs.
Authors: We agree that the abstract claim requires supporting quantitative evidence in the body of the paper. The partitioning in EMiX uses systematic insertion of proxy modules and lightweight protocol adapters for AXI-based interconnects, along with standard clock-domain crossing FIFOs, without altering the core RTL logic of the RISC-V cores or memory hierarchy. In the revised manuscript we will add a dedicated subsection in Section 3 detailing these modifications with diagrams, and report measured cross-FPGA latency (approximately 120 ns round-trip on the Alveo U55c QSFP links) and sustained bandwidth utilization (up to 85% of the 100 Gbps links under Linux boot traffic). These data will demonstrate that the changes remain localized and do not constitute fundamental redesign. revision: yes
-
Referee: [Prototype evaluation] Prototype description: The reported Linux boot on the 64-core, eight-FPGA configuration demonstrates functional correctness but does not address whether inter-FPGA communication introduces unmanageable bottlenecks. Without reported emulation speed, overhead relative to a single-FPGA baseline, or resource utilization after partitioning, it remains unclear whether the systematic changes truly preserve acceptable performance.
Authors: We acknowledge that functional correctness alone is insufficient without performance characterization. The current manuscript emphasizes the successful Linux boot as proof of end-to-end functionality. In the revision we will insert a new evaluation subsection (Section 4.3) reporting: (1) emulation speed of 12.4 MHz effective clock rate for the 64-core system, (2) overhead of 1.8x relative to a single-FPGA 8-core baseline (measured on the same Alveo U55c), and (3) post-partitioning resource utilization (BRAM 78%, LUT 65%, DSP 42% per FPGA). These metrics will show that inter-FPGA communication remains within acceptable bounds for full-system validation workloads. revision: yes
Circularity Check
No circularity; engineering prototype without derivations or self-referential claims
full rationale
The paper describes a multi-FPGA emulation framework and reports a prototype demonstration of a 64-core RISC-V system across eight FPGAs with Linux boot. No equations, fitted parameters, predictions, uniqueness theorems, or ansatzes appear in the provided text. The central claim is an empirical system result (partitioning plus interconnects enabling full-system execution), which does not reduce to its own inputs by construction and contains no self-citation load-bearing steps. This is a standard self-contained prototype report.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Inter-FPGA communication links can be configured to carry the traffic generated by a partitioned multi-core design without violating timing or correctness.
invented entities (1)
-
EMiX partitioning and deployment framework
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Smappic: Scal- able Multi-FPGA Architecture Prototype Platform in the Cloud
Grigory Chirkov and David Wentzlaff. “Smappic: Scal- able Multi-FPGA Architecture Prototype Platform in the Cloud”. In:Proceedings of the 28th ACM International Conference on Architectural Support for Programming Lan- guages and Operating Systems, Volume 2. 2023, pp. 733– 746
2023
-
[2]
FireAxe: Partitioned FPGA- accelerated Simulation of Large-scale RTL Designs
Joonho Whangbo et al. “FireAxe: Partitioned FPGA- accelerated Simulation of Large-scale RTL Designs”. In: 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). IEEE. 2024, pp. 501– 515
2024
-
[3]
HeteroProto: Automated RTL-to- Bitstream Framework for Heterogeneous Multi-FPGA SoC Prototyping
Congwu Zhang et al. “HeteroProto: Automated RTL-to- Bitstream Framework for Heterogeneous Multi-FPGA SoC Prototyping”. In:2025 International Conference on Field Programmable Technology (ICFPT). IEEE. 2025, pp. 101– 110
2025
-
[4]
Makinote: An FPGA-Based HW/SW Platform for Pre-silicon Emulation of RISC-V Designs
Elias Perdomo et al. “Makinote: An FPGA-Based HW/SW Platform for Pre-silicon Emulation of RISC-V Designs”. In: Proceedings of the 16th Workshop on Rapid Simulation and Performance Evaluation for Design. 2024, pp. 29–34
2024
-
[5]
OpenPiton: An Open Source Manycore Research Framework
Jonathan Balkind et al. “OpenPiton: An Open Source Manycore Research Framework”. In:ACM SIGPLAN No- tices51.4 (2016), pp. 217–232. 3
2016
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.