pith. machine review for the scientific record. sign in

arxiv: 2604.12715 · v1 · submitted 2026-04-14 · 💻 cs.AR · cs.DC

Recognition: unknown

EPAC: The Last Dance

Authors on Pith no claims yet

Pith reviewed 2026-05-10 14:29 UTC · model grok-4.3

classification 💻 cs.AR cs.DC
keywords RISC-Vaccelerator chipHPCnetwork-on-chipchip design22FDXvector processingvariable precision
0
0 comments X

The pith

A RISC-V accelerator chip integrating vector, stencil, and variable-precision tiles has been fabricated and validated in 22nm technology.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EPAC, a RISC-V-based accelerator chip developed in the European Processor Initiative to advance a European HPC processor ecosystem. It combines three specialized tiles: VEC for double-precision vector workloads, STX for stencil and machine learning tasks, and VRP for variable-precision numerical solvers. These tiles connect through a CHI-based network-on-chip with distributed L2 cache and use a SerDes link for external memory. The 27 square millimeter chip in GF22FDX technology with 0.3 billion transistors was taped out and successfully brought up, with major IP blocks validated. The authors also outline the architecture, physical implementation, and lessons from multi-partner European design coordination.

Core claim

EPAC is a RISC-V accelerator chip in GF22FDX technology that integrates a vector processing tile for HPC, a many-core stencil and ML tile, and a variable-precision tile, all linked by a coherent CHI network-on-chip with distributed L2 cache and SerDes memory interface. The 27 sq mm die containing approximately 0.3 billion transistors was taped out and successfully brought up, validating all major IP blocks.

What carries the argument

The CHI-based network-on-chip interconnecting the VEC, STX, and VRP tiles with distributed L2 cache and SerDes external memory link.

If this is right

  • Heterogeneous RISC-V tiles can address diverse HPC workload classes on one die.
  • The physical implementation process in 22FDX technology proves viable for such multi-tile designs.
  • Distributed L2 cache and CHI NoC enable coherent communication across specialized compute units.
  • Multi-partner academic and industrial coordination can deliver a full chip tape-out and bring-up.
  • The architecture supplies a working platform for extended-precision and stencil computations alongside standard vector processing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The working chip provides a concrete reference design that could encourage broader RISC-V adoption in European HPC systems.
  • Bring-up data from the SerDes link and NoC may inform power and latency optimizations in follow-on chips.
  • Similar modular tile approaches could extend to other domains such as embedded AI or scientific computing accelerators.

Load-bearing premise

The assumption that the tile integration via the NoC and SerDes link functions without major physical or functional issues, which supports the successful bring-up claim.

What would settle it

Test results or measurements showing that any major IP block, such as the vector unit or the coherent interconnect, failed to operate after bring-up would disprove the validation success.

Figures

Figures reproduced from arXiv: 2604.12715 by Alberto Moreno, Andrea Bocco, Antonis Psathakis, Bhavishya Goel, C\'esar Fuguet, Eric Guthmuller, Fabio Banchelli, Filippo Mantovani, Francesco Minervini, Frank K. Gurkaynak, Georgios Ieronymakis, Iasonas Mastorakis, Jens Kr\"uger, J\'er\^ome Fereyre, Jesus Labarta, Jordi Cortina, Josep Sans, Josip Ramljak, Luca Benini, Luca Bertaccini, Luka Mrkovi\'c, Madhavan Manivannan, Manolis Marazakis, Mario Kova\v{c}, Mate Kova\v{c}, Mauro Olivieri, Michalis Giaourtas, Nikolaos Dimou, Nuno Neves, Oscar Palomar, Pablo Vizcaino, Paul Scheffler, Pedro Marcuello, Roger Espasa, Roger Ferrer, Sebastiano Pomata, Tiago Rocha, Tim Fischer, Vassilis Papaefstathiou.

Figure 1
Figure 1. Figure 1: EPAC test-chip block diagram accelerator chip (EPAC); iv) a coordinated engineering effort across multiple academic and industrial partners. The remaining part of the document is organized as follows: Section 2 introduces the EPAC overall system architecture; Sec￾tion 3 details the RISC-V compute tiles; Section 4 summarizes the hardware development necessary for a test-chip that are external to the compute… view at source ↗
Figure 2
Figure 2. Figure 2: Avispado + VPU block diagram The internal structure of the core follows a typical general￾purpose processor design as depicted in [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: STX block diagram DMA capabilities to manage data movement. The system can be extended with SPU units, which are optional co-processors de￾veloped by Fraunhofer and optimized for stencil workloads with static access patterns and local data dependencies. These units are designed through a hardware-software co-design approach and provide additional performance for stencil kernels. STX follows a modular desig… view at source ↗
Figure 4
Figure 4. Figure 4: Left: Floorplan with area of EPAC components. [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
read the original abstract

This paper presents EPAC, a RISC-V-based accelerator chip developed within the European Processor Initiative (EPI) as part of a multi-year, multi-partner effort to build a European HPC processor ecosystem. EPAC is implemented in GlobalFoundries 22FDX (GF22FDX) technology, covers an area of 27 sq mm with approximately 0.3 billion transistors, and integrates three distinct RISC-V compute tiles targeting different workload classes: VEC, a vector processing tile for double-precision HPC workloads; STX, a many-core tile optimized for stencil and machine learning computations; and VRP, a variable-precision tile for iterative numerical solvers requiring extended floating-point formats. All tiles are connected through a Coherent Hub Interface (CHI) based network-on-chip with a distributed L2 cache system and communicate with external memory via a SerDes link. The chip was taped out in GF22FDX technology and successfully brought up, with all major IP blocks validated. This paper describes the architecture of each tile and the uncore infrastructure, the integration and physical implementation process, and the board-level bring-up activities. It also reflects on the engineering and coordination lessons learned from a full chip design effort distributed across academic and industrial partners in Europe.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper presents EPAC, a 27 mm² RISC-V accelerator chip fabricated in GF22FDX technology with ~0.3 billion transistors. It integrates three heterogeneous compute tiles (VEC for double-precision vector HPC, STX for stencil/ML, VRP for variable-precision iterative solvers) connected by a CHI-based NoC with distributed L2 cache and a SerDes external memory interface. The manuscript describes the per-tile architectures, uncore infrastructure, physical implementation and integration process, and board-level bring-up, concluding that the chip was successfully taped out and all major IP blocks were validated.

Significance. If the bring-up and validation claims hold with supporting data, the work would constitute a concrete milestone in the European Processor Initiative by demonstrating first-silicon functionality of a multi-tile, multi-workload RISC-V SoC in an advanced node. The engineering coordination across academic and industrial partners is itself noteworthy for large-scale European HPC hardware efforts.

major comments (1)
  1. [board-level bring-up activities section] Board-level bring-up activities section: the central claim that the chip 'was taped out ... and successfully brought up, with all major IP blocks validated' is supported only by qualitative statements. No quantitative metrics—achieved frequencies, measured power, test-pattern pass rates, or per-block error logs—are reported for the integrated VEC/STX/VRP tiles, CHI NoC, distributed L2, or SerDes link. This absence directly undermines verification of the integration success that underpins the entire contribution.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on the board-level bring-up section. We agree that quantitative metrics are essential to substantiate the validation claims and will revise the manuscript to include them.

read point-by-point responses
  1. Referee: [board-level bring-up activities section] Board-level bring-up activities section: the central claim that the chip 'was taped out ... and successfully brought up, with all major IP blocks validated' is supported only by qualitative statements. No quantitative metrics—achieved frequencies, measured power, test-pattern pass rates, or per-block error logs—are reported for the integrated VEC/STX/VRP tiles, CHI NoC, distributed L2, or SerDes link. This absence directly undermines verification of the integration success that underpins the entire contribution.

    Authors: We acknowledge that the current manuscript presents the bring-up results primarily through qualitative statements. In the revised version, we will expand the board-level bring-up activities section with available quantitative data from post-silicon validation, including achieved frequencies for the VEC, STX, and VRP tiles, measured power consumption under representative workloads, functional test-pattern pass rates for the compute tiles and uncore components, and summarized error logs for the CHI NoC, distributed L2, and SerDes interface. These metrics were collected during board-level testing but were not included in the initial submission to maintain focus on architectural and implementation details; adding them will directly address the concern and strengthen the evidence for successful integration. revision: yes

Circularity Check

0 steps flagged

No circularity: purely descriptive engineering report with no derivations

full rationale

The paper is a factual engineering implementation report describing the EPAC chip architecture, tile designs (VEC, STX, VRP), CHI-based NoC integration, physical implementation in GF22FDX, tape-out, and board-level bring-up. It contains no equations, predictions, fitted parameters, or derivation chains that could reduce to inputs by construction. All content consists of architectural descriptions and process narratives; the success claim rests on reported design steps rather than any self-referential logic or self-citation load-bearing premises. No instances of self-definitional claims, fitted-input predictions, or ansatz smuggling appear.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper describes a hardware implementation without introducing new mathematical parameters, axioms, or invented entities. All components rely on standard RISC-V architecture and existing IP blocks from prior literature.

pith-pipeline@v0.9.0 · 5707 in / 1176 out tokens · 58889 ms · 2026-05-10T14:29:03.629777+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

23 extracted references · 17 canonical work pages

  1. [1]

    Avispado: A RISC-V core supporting the RISC-V vector instruction set,

    Semidynamics Technology Services. Avispado: A RISC-V core supporting the RISC-V vector instruction set, . URL https://semidynamics.com/file/pb/ mxymcvviuxxyroh4oul3tipjy0zjcd. Accessed: 2026-04-02

  2. [2]

    Francesco Minervini, Oscar Palomar, Osman Unsal, Enrico Reggiani, Josue Quiroga, Joan Marimon, Carlos Rojas, Roger Figueras, Abraham Ruiz, Alberto Gonzalez, et al. Vitruvius+: An area-efficient RISC-V decoupled vector copro- cessor for high performance computing applications.ACM Transactions on Architecture and Code Optimization, 20(2):1–25, 2023. doi: 10...

  3. [3]

    FAUST: Design and implementation of a pipelined RISC-V vector floating-point unit.Micropro- cessors and microsystems, 97:104762, 2023

    Mate Kovač, Leon Dragić, Branimir Malnar, Francesco Minervini, Oscar Palomar, Carlos Rojas, Mauro Olivieri, Josip Knezović, and Mario Kovač. FAUST: Design and implementation of a pipelined RISC-V vector floating-point unit.Micropro- cessors and microsystems, 97:104762, 2023. doi: 10.1016/j.micpro.2023.104762

  4. [4]

    STX – Supercom- puting Hardware and Software Design

    Fraunhofer Institute for Industrial Mathematics ITWM. STX – Supercom- puting Hardware and Software Design. https://www.itwm.fraunhofer.de/ en/departments/analytics-computing-en/stx-supercomputing-hardware- software-design.html. Accessed: 2026-04-02

  5. [5]

    A Variable and Extended Precision (VRP) Accelerator and its 22 nm soc Implementation

    César Fuguet, Eric Guthmuller, Andrea Bocco, Jérôme Fereyre, Adrian Evans, and Yves Durand. A Variable and Extended Precision (VRP) Accelerator and its 22 nm soc Implementation. In2024 39th Conference on Design of Circuits and Integrated Systems (DCIS), pages 1–6. IEEE, 2024. doi: 10.1109/DCIS62603.2024.10769136

  6. [6]

    Short reasons for long vectors in HPC CPUs: A study based on RISC-V

    Pablo Vizcaino, Georgios Ieronymakis, Nikolaos Dimou, Vassilis Papaefstathiou, Jesus Labarta, and Filippo Mantovani. Short reasons for long vectors in HPC CPUs: A study based on RISC-V. InProceedings of the SC’23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pages 1543–1549, 2023. doi: 10.1145/362...

  7. [7]

    Open Vector Interface specification

    Semidynamics Technology Services. Open Vector Interface specification. https: //github.com/semidynamics/OpenVectorInterface, . Accessed: 2026-04-02

  8. [8]

    llvm-epi: EPI support for LLVM

    Barcelona Supercomputing Center. llvm-epi: EPI support for LLVM. https: //repo.hca.bsc.es/gitlab/rferrer/llvm-epi. Accessed: 2026-04-02

  9. [9]

    Software Development Vehicles to enable extended and early co-design: a RISC-V and HPC case of study

    Filippo Mantovani, Pablo Vizcaino, Fabio Banchelli, Marta Garcia-Gasulla, Roger Ferrer, Georgios Ieronymakis, Nikolaos Dimou, Vassilis Papaefstathiou, and Jesus Labarta. Software Development Vehicles to enable extended and early co-design: a RISC-V and HPC case of study. InInternational Conference on High Performance Computing, pages 526–537. Springer, 20...

  10. [10]

    Designing a QEMU plugin to profile multicore long vector RISC-V architectures: RAVE

    Pablo Vizcaino, Filippo Mantovani, Jesus Labarta, and Roger Ferrer. Designing a QEMU plugin to profile multicore long vector RISC-V architectures: RAVE. Future Generation Computer Systems, page 108100, 2025. doi: 10.1016/j.future. 2025.108100

  11. [11]

    RISC-V in HPC: a Look Into Tools for Performance Monitoring

    Fabio Banchelli, Rafel Albert Bros Esqueu, Tiago Rocha, Nuno Roma, Pedro Tomás, Nuno Neves, and Filippo Mantovani. RISC-V in HPC: a Look Into Tools for Performance Monitoring. InInternational Conference on High Performance Computing, pages 562–575. Springer, 2025. doi: 10.1007/978-3-032-07612-0_43

  12. [12]

    Exploring RISC-V long vector capabilities: A case study in Earth Sciences.Future Generation Computer Systems, 174:107932, 2026

    Fabio Banchelli, David Jurado, Marta Garcia-Gasulla, and Filippo Mantovani. Exploring RISC-V long vector capabilities: A case study in Earth Sciences.Future Generation Computer Systems, 174:107932, 2026. doi: 10.1016/j.future.2025.107932

  13. [13]

    Batched DGEMMs for scientific codes running on long vector architectures

    Fabio Banchelli, Marta Garcia-Gasulla, and Filippo Mantovani. Batched DGEMMs for scientific codes running on long vector architectures. InInternational Confer- ence on Parallel Processing and Applied Mathematics, pages 17–31. Springer, 2024. doi: 10.1007/978-3-031-85700-3_2

  14. [14]

    Co-designing ab initio electronic structure methods on a RISC-V vector architecture.Open Research Europe, 4(165):165, 2024

    Rogeli Grima Torres, Pablo Vizcaíno, Filippo Mantovani, and José Julio Gutiérrez Moreno. Co-designing ab initio electronic structure methods on a RISC-V vector architecture.Open Research Europe, 4(165):165, 2024. doi: 10.12688/openreseurope. 18321.4

  15. [15]

    Graph computing on long vector architectures (yes, it works!)

    Pablo Vizcaino, Jesus Labarta, and Filippo Mantovani. Graph computing on long vector architectures (yes, it works!). In2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 986–995. IEEE,

  16. [16]

    doi: 10.1109/IPDPSW63119.2024.00169

  17. [17]

    Alternative basis matrix multiplication is fast and stable,

    Marc Blancafort, Roger Ferrer, Guillaume Houzeaux, Marta Garcia-Gasulla, and Filippo Mantovani. Exploiting long vectors with a CFD code: a co-design show case. In2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 453–464. IEEE, 2024. doi: 10.1109/IPDPS57955.2024.00047

  18. [18]

    Snitch: A tiny pseudo dual-issue processor for area and energy efficient execution of floating- point intensive workloads.IEEE Transactions on Computers, 70(11):1845–1860,

    Florian Zaruba, Fabian Schuiki, Torsten Hoefler, and Luca Benini. Snitch: A tiny pseudo dual-issue processor for area and energy efficient execution of floating- point intensive workloads.IEEE Transactions on Computers, 70(11):1845–1860,

  19. [19]

    doi: 10.1109/TC.2020.3027900

  20. [20]

    Variable and extended precision (VRP) accelerator implemented in a 22 nm SoC.Electronics Letters, 61(1):e70255, 2025

    Eric Guthmuller, César Fuguet, Andrea Bocco, Jérôme Fereyre, Adrian Evans, and Yves Durand. Variable and extended precision (VRP) accelerator implemented in a 22 nm SoC.Electronics Letters, 61(1):e70255, 2025. doi: 10.1049/ell2.70255

  21. [21]

    Accelerating variants of the conjugate gradient with the variable precision processor

    Yves Durand, Eric Guthmuller, Cesar Fuguet, Jerome Fereyre, Andrea Bocco, and Riccardo Alidori. Accelerating variants of the conjugate gradient with the variable precision processor. In2022 IEEE 29th Symposium on Computer Arithmetic (ARITH), pages 51–57. IEEE, 2022. doi: 10.1109/ARITH54963.2022.00017

  22. [22]

    Stabilizing the Block BiCG with Extended Precision: A Case Study

    Alexandre Hoffmann, Yves Durand, and Jérome Fereyre. Stabilizing the Block BiCG with Extended Precision: A Case Study. InInternational Conference on Parallel Processing and Applied Mathematics, pages 65–81. Springer, 2024. doi: 10.1007/978-3-031-85697-6_5

  23. [23]

    Xvpfloat: RISC-V ISA extension for variable extended precision floating point computation.IEEE Transactions on Computers, 73(7):1683–1697, 2024

    Eric Guthmuller, César Fuguet, Andrea Bocco, Jérôme Fereyre, Riccardo Alidori, Ihsane Tahir, and Yves Durand. Xvpfloat: RISC-V ISA extension for variable extended precision floating point computation.IEEE Transactions on Computers, 73(7):1683–1697, 2024. doi: 10.1109/TC.2024.3383964