arxiv: 2604.12715 · v1 · submitted 2026-04-14 · 💻 cs.AR · cs.DC


EPAC: The Last Dance

Filippo Mantovani, Fabio Banchelli, Pablo Vizcaino, Roger Ferrer, Oscar Palomar, Francesco Minervini, Jesus Labarta, Mauro Olivieri, Sebastiano Pomata, Pedro Marcuello, Jordi Cortina, Alberto Moreno, Josep Sans, Roger Espasa, Vassilis Papaefstathiou, Nikolaos Dimou, Georgios Ieronymakis, Antonis Psathakis, Michalis Giaourtas, Iasonas Mastorakis, Manolis Marazakis, Eric Guthmuller, Andrea Bocco, Jérôme Fereyre, César Fuguet, Mate Kovač, Mario Kovač, Luka Mrković, Josip Ramljak, Luca Bertaccini, Tim Fischer, Frank K. Gurkaynak, Paul Scheffler, Luca Benini, Bhavishya Goel, Madhavan Manivannan, Tiago Rocha, Nuno Neves, Jens Krüger


Pith reviewed 2026-05-10 14:29 UTC · model grok-4.3

classification 💻 cs.AR cs.DC

keywords RISC-V · accelerator chip · HPC · network-on-chip · chip design · 22FDX · vector processing · variable precision


The pith

A RISC-V accelerator chip integrating vector, stencil, and variable-precision tiles has been fabricated and validated in 22nm technology.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EPAC, a RISC-V-based accelerator chip developed within the European Processor Initiative (EPI) to advance a European HPC processor ecosystem. It combines three specialized tiles: VEC for double-precision vector workloads, STX for stencil and machine-learning tasks, and VRP for variable-precision numerical solvers. The tiles are connected through a CHI-based network-on-chip with a distributed L2 cache and reach external memory over a SerDes link. The 27 mm² chip, implemented in GF22FDX technology with approximately 0.3 billion transistors, was taped out and successfully brought up, with all major IP blocks validated. The authors also describe the architecture, the physical implementation, and lessons learned from coordinating a multi-partner European design effort.

Core claim

EPAC is a RISC-V accelerator chip in GF22FDX technology that integrates a vector processing tile for HPC, a many-core stencil and ML tile, and a variable-precision tile, all linked by a coherent CHI network-on-chip with distributed L2 cache and SerDes memory interface. The 27 sq mm die containing approximately 0.3 billion transistors was taped out and successfully brought up, validating all major IP blocks.
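As a quick sanity check on the headline figures, the average transistor density implied by the stated numbers (about 0.3 billion transistors on 27 mm²) works out as below. This is a die-wide average that mixes logic, SRAM, and I/O; it is simple arithmetic on the abstract's figures, not a per-block density reported by the authors.

```latex
\rho \;\approx\; \frac{0.3 \times 10^{9}\ \text{transistors}}{27\ \text{mm}^2}
      \;\approx\; 1.1 \times 10^{7}\ \text{transistors/mm}^2
      \;\approx\; 11\ \text{M transistors/mm}^2
```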

What carries the argument

The CHI-based network-on-chip interconnecting the VEC, STX, and VRP tiles with distributed L2 cache and SerDes external memory link.
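To make the idea of a distributed L2 concrete: in CHI-based designs each cache line is typically owned by one home node, commonly co-located with a shared-cache slice, which orders coherence requests for that line. The sketch below shows one simple way addresses could be spread across slices (line-granularity modulo interleaving). The line size, slice count, and mapping function are all hypothetical; the summary here does not specify EPAC's actual address hashing.

```python
"""Hypothetical sketch: mapping physical addresses to distributed L2 slices.

Each cache line is assigned one "home" slice that would serialize coherence
requests for it. LINE_BYTES, NUM_SLICES, and the modulo interleaving are
illustrative assumptions, not EPAC's documented configuration.
"""

LINE_BYTES = 64   # assumed cache-line size in bytes
NUM_SLICES = 4    # assumed number of L2 slices on the NoC


def home_slice(addr: int, line_bytes: int = LINE_BYTES,
               num_slices: int = NUM_SLICES) -> int:
    """Return the index of the L2 slice that owns the line containing addr."""
    if addr < 0:
        raise ValueError("address must be non-negative")
    line_index = addr // line_bytes    # drop the byte offset within the line
    return line_index % num_slices     # rotate consecutive lines across slices


if __name__ == "__main__":
    # Bytes within one line share a home; consecutive lines rotate through slices.
    for a in (0x0, 0x3F, 0x40, 0x80, 0xC0, 0x100):
        print(hex(a), "-> slice", home_slice(a))
```

Plain modulo interleaving is chosen only because it is easy to read; real designs often hash more address bits to avoid hot-spotting a single slice with strided access patterns.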

If this is right

Heterogeneous RISC-V tiles can address diverse HPC workload classes on one die.
The physical implementation process in 22FDX technology proves viable for such multi-tile designs.
Distributed L2 cache and CHI NoC enable coherent communication across specialized compute units.
Multi-partner academic and industrial coordination can deliver a full chip tape-out and bring-up.
The architecture supplies a working platform for extended-precision and stencil computations alongside standard vector processing.
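For readers less familiar with how a vector tile like VEC consumes long arrays, the sketch below models RISC-V Vector (RVV) strip-mining for a double-precision DAXPY loop in plain Python. It assumes the hardware grants vl = min(AVL, VLMAX) on each `vsetvl` request, one of the choices the RVV specification permits, and uses a hypothetical VLMAX of 256 float64 elements; neither value is taken from the paper.

```python
"""Illustrative model of RISC-V vector strip-mining for DAXPY (y = a*x + y).

Assumption: each vsetvl-style request is granted vl = min(avl, VLMAX).
VLMAX = 256 is a hypothetical value chosen only for illustration.
"""
from typing import List, Tuple

VLMAX = 256  # hypothetical maximum vector length, in float64 elements


def vsetvl(avl: int, vlmax: int = VLMAX) -> int:
    """Model of vsetvl: grant at most vlmax of the avl requested elements."""
    return min(avl, vlmax)


def daxpy_stripmined(a: float, x: List[float], y: List[float],
                     vlmax: int = VLMAX) -> Tuple[List[float], int]:
    """Compute a*x + y in vector-length chunks; return result and chunk count."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    out = list(y)                           # leave the caller's y untouched
    i, chunks = 0, 0
    while i < len(x):
        vl = vsetvl(len(x) - i, vlmax)      # elements handled by this "instruction"
        for j in range(i, i + vl):          # stands in for one vector multiply-add over vl lanes
            out[j] = a * x[j] + out[j]
        i += vl
        chunks += 1
    return out, chunks


if __name__ == "__main__":
    n = 1000
    res, chunks = daxpy_stripmined(2.0, [1.0] * n, [0.5] * n)
    print(res[0], res[-1], chunks)   # 2.5 2.5 4  (256 + 256 + 256 + 232)
```

With VLMAX = 256 the 1000-element loop runs in four chunks (three full, one of 232 elements); a larger VLMAX would cut the number of loop iterations, and the scalar overhead that comes with them, roughly in proportion.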

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The working chip provides a concrete reference design that could encourage broader RISC-V adoption in European HPC systems.
Bring-up data from the SerDes link and NoC may inform power and latency optimizations in follow-on chips.
Similar modular tile approaches could extend to other domains such as embedded AI or scientific computing accelerators.

Load-bearing premise

The tiles, integrated through the NoC and the SerDes link, function without major physical or functional issues; the successful bring-up claim rests on this assumption.

What would settle it

Test results or measurements showing that any major IP block, such as the vector unit or the coherent interconnect, failed to operate after bring-up would disprove the validation success.

Figures

Figures reproduced from arXiv: 2604.12715 by Alberto Moreno, Andrea Bocco, Antonis Psathakis, Bhavishya Goel, César Fuguet, Eric Guthmuller, Fabio Banchelli, Filippo Mantovani, Francesco Minervini, Frank K. Gurkaynak, Georgios Ieronymakis, Iasonas Mastorakis, Jens Krüger, Jérôme Fereyre, Jesus Labarta, Jordi Cortina, Josep Sans, Josip Ramljak, Luca Benini, Luca Bertaccini, Luka Mrković, Madhavan Manivannan, Manolis Marazakis, Mario Kovač, Mate Kovač, Mauro Olivieri, Michalis Giaourtas, Nikolaos Dimou, Nuno Neves, Oscar Palomar, Pablo Vizcaino, Paul Scheffler, Pedro Marcuello, Roger Espasa, Roger Ferrer, Sebastiano Pomata, Tiago Rocha, Tim Fischer, Vassilis Papaefstathiou.

**Figure 1.** EPAC test-chip block diagram. Surrounding text: "…accelerator chip (EPAC); iv) a coordinated engineering effort across multiple academic and industrial partners. The remaining part of the document is organized as follows: Section 2 introduces the EPAC overall system architecture; Section 3 details the RISC-V compute tiles; Section 4 summarizes the hardware development necessary for a test-chip that are external to the compute…"

**Figure 2.** Avispado + VPU block diagram. Surrounding text: "The internal structure of the core follows a typical general-purpose processor design as depicted in …"

**Figure 3.** STX block diagram. Surrounding text: "…DMA capabilities to manage data movement. The system can be extended with SPU units, which are optional co-processors developed by Fraunhofer and optimized for stencil workloads with static access patterns and local data dependencies. These units are designed through a hardware-software co-design approach and provide additional performance for stencil kernels. STX follows a modular desig…"

**Figure 4.** Left: floorplan with area of EPAC components.
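The Figure 3 text describes the optional SPU co-processors as optimized for stencil workloads with static access patterns and local data dependencies. The kernel below is a generic illustration of that workload class, a plain 2D five-point Jacobi sweep in Python; it is not STX or SPU code, and the grid size and boundary values are arbitrary choices for the example.

```python
"""Minimal 2D five-point Jacobi stencil (illustrative; not STX/SPU code).

Each interior output point depends only on its four nearest neighbours from
the previous sweep: a fixed (static) access pattern with local dependencies.
"""
from typing import List

Grid = List[List[float]]


def jacobi_sweep(u: Grid) -> Grid:
    """One Jacobi sweep; boundary rows and columns are held fixed."""
    rows, cols = len(u), len(u[0])
    new = [row[:] for row in u]            # write into a copy so reads see only old values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] +
                                u[i][j - 1] + u[i][j + 1])
    return new


if __name__ == "__main__":
    # 4x4 grid with a hot top edge (1.0) and cold everywhere else (0.0).
    grid = [[1.0] * 4] + [[0.0] * 4 for _ in range(3)]
    g1 = jacobi_sweep(grid)
    print(g1[1])   # [0.0, 0.25, 0.25, 0.0]
```

Because every point reads the same relative offsets, the addresses touched by a sweep are known in advance, which is the property that lets dedicated hardware prefetch or stream the data rather than react to irregular accesses.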

Original abstract

This paper presents EPAC, a RISC-V-based accelerator chip developed within the European Processor Initiative (EPI) as part of a multi-year, multi-partner effort to build a European HPC processor ecosystem. EPAC is implemented in GlobalFoundries 22FDX (GF22FDX) technology, covers an area of 27 sq mm with approximately 0.3 billion transistors, and integrates three distinct RISC-V compute tiles targeting different workload classes: VEC, a vector processing tile for double-precision HPC workloads; STX, a many-core tile optimized for stencil and machine learning computations; and VRP, a variable-precision tile for iterative numerical solvers requiring extended floating-point formats. All tiles are connected through a Coherent Hub Interface (CHI) based network-on-chip with a distributed L2 cache system and communicate with external memory via a SerDes link. The chip was taped out in GF22FDX technology and successfully brought up, with all major IP blocks validated. This paper describes the architecture of each tile and the uncore infrastructure, the integration and physical implementation process, and the board-level bring-up activities. It also reflects on the engineering and coordination lessons learned from a full chip design effort distributed across academic and industrial partners in Europe.
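The VRP tile targets iterative solvers that need more precision than float64. A rough intuition for why, using Python's standard `decimal` module as a stand-in for a variable-precision unit (an assumption for illustration only; it does not model the VRP tile's formats or instructions): a sum with heavy cancellation loses its answer entirely in float64 and at 10 decimal digits, but recovers it once the working precision is large enough.

```python
"""Cancellation in a sum at different working precisions (illustration only).

Python's decimal module stands in for a variable-precision unit here; this
does not model the VRP tile's number formats or instructions.
"""
from decimal import Decimal, localcontext
from typing import Sequence


def naive_sum_float(values: Sequence[float]) -> float:
    """Left-to-right float64 summation with no error compensation."""
    total = 0.0
    for v in values:          # explicit loop: built-in sum() may compensate on newer Pythons
        total += v
    return total


def naive_sum_decimal(values: Sequence[int], digits: int) -> Decimal:
    """Left-to-right summation keeping `digits` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits                 # working precision chosen per call
        total = Decimal(0)
        for v in values:
            total += Decimal(v)           # Decimal(int) is exact; each += rounds to `digits`
        return total


if __name__ == "__main__":
    print(naive_sum_float([1e16, 1.0, -1e16]))   # 0.0: the 1.0 is absorbed
    exact_terms = [10**16, 1, -10**16]
    for digits in (10, 17, 34):
        print(digits, int(naive_sum_decimal(exact_terms, digits)))
    # prints "10 0", then "17 1", then "34 1"
```

The intermediate value 10^16 + 1 needs 17 significant digits, so any working precision below that absorbs the 1; once the precision covers it, the cancellation is harmless. Iterative solvers accumulate long sums (dot products, residuals) where this kind of cancellation can occur, which is one reason extended formats can help them.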

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

EPAC documents a real 22nm tapeout of a three-tile RISC-V accelerator with CHI NoC, but the bring-up success rests on qualitative statements without performance or power numbers.


The main thing to know is that this group actually taped out and brought up a heterogeneous RISC-V chip in GF22FDX with VEC, STX, and VRP tiles linked by a CHI-based NoC and distributed L2. That is a concrete engineering result from the European Processor Initiative effort, not just a design study. The paper walks through the tile architectures, the uncore infrastructure, the multi-partner physical implementation flow, and some coordination lessons from the distributed design. Those sections give a useful picture of how the pieces fit together and what practical issues arise when academic and industrial teams build a full chip. The description of the SerDes memory link and the specific workload targets for each tile is straightforward and informative.

The soft spot is the validation. The claim that the chip was successfully brought up and major IP blocks validated is repeated, but the board-level bring-up section stays at a high level with no clock frequencies, power measurements, test pattern results, or error logs from silicon. For a paper whose central assertion is that the integration worked, those numbers would make the difference between a report and a verifiable outcome.

This is the sort of paper that hardware architects and people following open or European processor projects would read for the implementation details. It is not a theoretical contribution and does not need to be, but it deserves referee time because the tapeout itself is the evidence. I would send it to peer review and ask the authors to add basic bring-up metrics.

Referee Report

1 major / 0 minor

Summary. The paper presents EPAC, a 27 mm² RISC-V accelerator chip fabricated in GF22FDX technology with ~0.3 billion transistors. It integrates three heterogeneous compute tiles (VEC for double-precision vector HPC, STX for stencil/ML, VRP for variable-precision iterative solvers) connected by a CHI-based NoC with distributed L2 cache and a SerDes external memory interface. The manuscript describes the per-tile architectures, uncore infrastructure, physical implementation and integration process, and board-level bring-up, concluding that the chip was successfully taped out and all major IP blocks were validated.

Significance. If the bring-up and validation claims hold with supporting data, the work would constitute a concrete milestone in the European Processor Initiative by demonstrating first-silicon functionality of a multi-tile, multi-workload RISC-V SoC in an advanced node. The engineering coordination across academic and industrial partners is itself noteworthy for large-scale European HPC hardware efforts.

major comments (1)

Board-level bring-up activities section: the central claim that the chip 'was taped out ... and successfully brought up, with all major IP blocks validated' is supported only by qualitative statements. No quantitative metrics (achieved frequencies, measured power, test-pattern pass rates, or per-block error logs) are reported for the integrated VEC/STX/VRP tiles, CHI NoC, distributed L2, or SerDes link. This absence directly undermines verification of the integration success that underpins the entire contribution.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the board-level bring-up section. We agree that quantitative metrics are essential to substantiate the validation claims and will revise the manuscript to include them.

Point-by-point responses

Referee: Board-level bring-up activities section: the central claim that the chip 'was taped out ... and successfully brought up, with all major IP blocks validated' is supported only by qualitative statements. No quantitative metrics (achieved frequencies, measured power, test-pattern pass rates, or per-block error logs) are reported for the integrated VEC/STX/VRP tiles, CHI NoC, distributed L2, or SerDes link. This absence directly undermines verification of the integration success that underpins the entire contribution.

Authors: We acknowledge that the current manuscript presents the bring-up results primarily through qualitative statements. In the revised version, we will expand the board-level bring-up activities section with available quantitative data from post-silicon validation, including achieved frequencies for the VEC, STX, and VRP tiles, measured power consumption under representative workloads, functional test-pattern pass rates for the compute tiles and uncore components, and summarized error logs for the CHI NoC, distributed L2, and SerDes interface. These metrics were collected during board-level testing but were not included in the initial submission to maintain focus on architectural and implementation details; adding them will directly address the concern and strengthen the evidence for successful integration.

Revision: yes

Circularity Check

0 steps flagged

No circularity: purely descriptive engineering report with no derivations

Full rationale

The paper is a factual engineering implementation report describing the EPAC chip architecture, tile designs (VEC, STX, VRP), CHI-based NoC integration, physical implementation in GF22FDX, tape-out, and board-level bring-up. It contains no equations, predictions, fitted parameters, or derivation chains that could reduce to inputs by construction. All content consists of architectural descriptions and process narratives; the success claim rests on reported design steps rather than on self-referential logic or load-bearing self-citations. No instances of self-definitional claims, fitted-input predictions, or ansatz smuggling appear.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper describes a hardware implementation without introducing new mathematical parameters, axioms, or invented entities. All components rely on standard RISC-V architecture and existing IP blocks from prior literature.

pith-pipeline@v0.9.0 · 5707 in / 1176 out tokens · 58889 ms · 2026-05-10T14:29:03.629777+00:00 · methodology


Reference graph

Works this paper leans on

23 extracted references · 17 canonical work pages

[1] Semidynamics Technology Services. Avispado: A RISC-V core supporting the RISC-V vector instruction set. URL https://semidynamics.com/file/pb/mxymcvviuxxyroh4oul3tipjy0zjcd. Accessed: 2026-04-02.

[2] Francesco Minervini, Oscar Palomar, Osman Unsal, Enrico Reggiani, Josue Quiroga, Joan Marimon, Carlos Rojas, Roger Figueras, Abraham Ruiz, Alberto Gonzalez, et al. Vitruvius+: An area-efficient RISC-V decoupled vector coprocessor for high performance computing applications. ACM Transactions on Architecture and Code Optimization, 20(2):1–25, 2023. doi: 10.1145/3575861

[3] Mate Kovač, Leon Dragić, Branimir Malnar, Francesco Minervini, Oscar Palomar, Carlos Rojas, Mauro Olivieri, Josip Knezović, and Mario Kovač. FAUST: Design and implementation of a pipelined RISC-V vector floating-point unit. Microprocessors and Microsystems, 97:104762, 2023. doi: 10.1016/j.micpro.2023.104762

[4] Fraunhofer Institute for Industrial Mathematics ITWM. STX – Supercomputing Hardware and Software Design. https://www.itwm.fraunhofer.de/en/departments/analytics-computing-en/stx-supercomputing-hardware-software-design.html. Accessed: 2026-04-02.

[5] César Fuguet, Eric Guthmuller, Andrea Bocco, Jérôme Fereyre, Adrian Evans, and Yves Durand. A Variable and Extended Precision (VRP) Accelerator and its 22 nm SoC Implementation. In 2024 39th Conference on Design of Circuits and Integrated Systems (DCIS), pages 1–6. IEEE, 2024. doi: 10.1109/DCIS62603.2024.10769136

[6] Pablo Vizcaino, Georgios Ieronymakis, Nikolaos Dimou, Vassilis Papaefstathiou, Jesus Labarta, and Filippo Mantovani. Short reasons for long vectors in HPC CPUs: A study based on RISC-V. In Proceedings of the SC'23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pages 1543–1549, 2023. doi: 10.1145/3624062.3624231

[7] Semidynamics Technology Services. Open Vector Interface specification. https://github.com/semidynamics/OpenVectorInterface. Accessed: 2026-04-02.

[8] Barcelona Supercomputing Center. llvm-epi: EPI support for LLVM. https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi. Accessed: 2026-04-02.

[9] Filippo Mantovani, Pablo Vizcaino, Fabio Banchelli, Marta Garcia-Gasulla, Roger Ferrer, Georgios Ieronymakis, Nikolaos Dimou, Vassilis Papaefstathiou, and Jesus Labarta. Software Development Vehicles to enable extended and early co-design: a RISC-V and HPC case of study. In International Conference on High Performance Computing, pages 526–537. Springer, 2023. doi: 10.1007/978-3-031-40843-4_39

[10] Pablo Vizcaino, Filippo Mantovani, Jesus Labarta, and Roger Ferrer. Designing a QEMU plugin to profile multicore long vector RISC-V architectures: RAVE. Future Generation Computer Systems, page 108100, 2025. doi: 10.1016/j.future.2025.108100

[11] Fabio Banchelli, Rafel Albert Bros Esqueu, Tiago Rocha, Nuno Roma, Pedro Tomás, Nuno Neves, and Filippo Mantovani. RISC-V in HPC: a Look Into Tools for Performance Monitoring. In International Conference on High Performance Computing, pages 562–575. Springer, 2025. doi: 10.1007/978-3-032-07612-0_43

[12] Fabio Banchelli, David Jurado, Marta Garcia-Gasulla, and Filippo Mantovani. Exploring RISC-V long vector capabilities: A case study in Earth Sciences. Future Generation Computer Systems, 174:107932, 2026. doi: 10.1016/j.future.2025.107932

[13] Fabio Banchelli, Marta Garcia-Gasulla, and Filippo Mantovani. Batched DGEMMs for scientific codes running on long vector architectures. In International Conference on Parallel Processing and Applied Mathematics, pages 17–31. Springer, 2024. doi: 10.1007/978-3-031-85700-3_2

[14] Rogeli Grima Torres, Pablo Vizcaíno, Filippo Mantovani, and José Julio Gutiérrez Moreno. Co-designing ab initio electronic structure methods on a RISC-V vector architecture. Open Research Europe, 4(165):165, 2024. doi: 10.12688/openreseurope.18321.4

[15] Pablo Vizcaino, Jesus Labarta, and Filippo Mantovani. Graph computing on long vector architectures (yes, it works!). In 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 986–995. IEEE, 2024. doi: 10.1109/IPDPSW63119.2024.00169

[16] Marc Blancafort, Roger Ferrer, Guillaume Houzeaux, Marta Garcia-Gasulla, and Filippo Mantovani. Exploiting long vectors with a CFD code: a co-design show case. In 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 453–464. IEEE, 2024. doi: 10.1109/IPDPS57955.2024.00047

[17] Florian Zaruba, Fabian Schuiki, Torsten Hoefler, and Luca Benini. Snitch: A tiny pseudo dual-issue processor for area and energy efficient execution of floating-point intensive workloads. IEEE Transactions on Computers, 70(11):1845–1860, 2021. doi: 10.1109/TC.2020.3027900

[18] Eric Guthmuller, César Fuguet, Andrea Bocco, Jérôme Fereyre, Adrian Evans, and Yves Durand. Variable and extended precision (VRP) accelerator implemented in a 22 nm SoC. Electronics Letters, 61(1):e70255, 2025. doi: 10.1049/ell2.70255

[19] Yves Durand, Eric Guthmuller, Cesar Fuguet, Jerome Fereyre, Andrea Bocco, and Riccardo Alidori. Accelerating variants of the conjugate gradient with the variable precision processor. In 2022 IEEE 29th Symposium on Computer Arithmetic (ARITH), pages 51–57. IEEE, 2022. doi: 10.1109/ARITH54963.2022.00017

[20] Alexandre Hoffmann, Yves Durand, and Jérome Fereyre. Stabilizing the Block BiCG with Extended Precision: A Case Study. In International Conference on Parallel Processing and Applied Mathematics, pages 65–81. Springer, 2024. doi: 10.1007/978-3-031-85697-6_5

[21] Eric Guthmuller, César Fuguet, Andrea Bocco, Jérôme Fereyre, Riccardo Alidori, Ihsane Tahir, and Yves Durand. Xvpfloat: RISC-V ISA extension for variable extended precision floating point computation. IEEE Transactions on Computers, 73(7):1683–1697, 2024. doi: 10.1109/TC.2024.3383964