arxiv: 2605.08229 · v1 · submitted 2026-05-06 · 💻 cs.AR

Recognition: no theorem link

REPTILES: Repeated Tiles of Sargantana, a RISC-V multicore based on OpenPiton

Abbas Haghi, Albert Aguilera, Alexander Kropotov, Alireza Foroodnia, Alireza Monemi, Arnau Bigas, Behzad Salami, Cesar Fuguet, Davy Million, Francesc Moll, Ignacio Genovese, Ivan D\'iaz, Jonathan Balkind, Juan Antonio Rodr\'iguez, Lluc Alvarez, Lori\'en L\'opez-Villellas, Max Doblas, Miquel Moret\'o, Narc\'is Rodas, Neiel Leyva, Noelia Oliete-Escu\'in, Oscar Lostes-Cazorla, Oscar Palomar, Ra\'ul Gilabert, Roger Figueras, Sajjad Ahmad, S\'erik P\'erez, V\'ictor Soria-Pardos, Xavier Carril

Pith reviewed 2026-05-12 00:45 UTC · model grok-4.3

classification 💻 cs.AR

keywords multicore processorscalabilityhigh-performance computingopen-source designvector processingtiled architectureperformance evaluation

0 comments

The pith

Tiled multicore design delivers 3.1 times average speedup with four cores and 9.3 times on vector addition

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a multicore processor built from repeated core tiles connected through a memory hierarchy, with added features to support high-performance computing workloads. It aims to show that such a design offers good scalability and performance improvements over single-core setups. A sympathetic reader would care because it addresses the challenge of building complex multicore systems with limited resources by providing an open and flexible framework. The reported results indicate that performance scales to 3.1 times faster on average when using four cores, and vector addition runs 9.3 times faster due to the core enhancements. This would matter if it enables more researchers to experiment with and optimize multicore architectures for demanding applications.

Core claim

By repeating tiles of an enhanced core and linking them with a memory hierarchy, the design achieves scalable performance, specifically a 3.1x average speedup with four cores and a 9.3x improvement on vector addition benchmarks through new core features.

What carries the argument

The repeated tiling of cores interconnected by the memory hierarchy, which carries the scalability argument by allowing additional cores to be added without proportional increases in complexity.

Load-bearing premise

That the measured speedups accurately represent the system's behavior without overlooked overheads from integration and that the benchmarks used are representative of modern high-performance computing applications.

What would settle it

Running the vector addition benchmark and the parallel scalability tests on physical hardware implementing the full design and observing whether the speedups match the reported 3.1x and 9.3x values; significant underperformance would indicate the claim does not hold.

Figures

Figures reproduced from arXiv: 2605.08229 by Abbas Haghi, Albert Aguilera, Alexander Kropotov, Alireza Foroodnia, Alireza Monemi, Arnau Bigas, Behzad Salami, Cesar Fuguet, Davy Million, Francesc Moll, Ignacio Genovese, Ivan D\'iaz, Jonathan Balkind, Juan Antonio Rodr\'iguez, Lluc Alvarez, Lori\'en L\'opez-Villellas, Max Doblas, Miquel Moret\'o, Narc\'is Rodas, Neiel Leyva, Noelia Oliete-Escu\'in, Oscar Lostes-Cazorla, Oscar Palomar, Ra\'ul Gilabert, Roger Figueras, Sajjad Ahmad, S\'erik P\'erez, V\'ictor Soria-Pardos, Xavier Carril.

**Figure 1.** Figure 1: Execution speedup of NAS benchmarks over 2, 3, and 4 threads with respect to 1 thread. References [1] Jonathan Balkind et al. “OpenPiton: An Open Source Manycore Research Framework”. In: SIGARCH Comput. Archit. News 44.2 (Mar. 2016), pp. 217–232. issn: 0163- 5964. doi: 10.1145/2980024.2872414. url: https://doi. org/10.1145/2980024.2872414. [2] Neiel Leyva et al. “OpenPiton4HPC: Optimizing OpenPiton Toward… view at source ↗

read the original abstract

Chip industry continues advancing and expanding modern computing systems, resulting in more complex multi-core processors. Conversely, academic projects face scalability challenges due to limited resources, highlighting the need for open-source frameworks that enable innovation and knowledge sharing. Recently, several open-source proposals have emerged, offering flexible and scalable designs, but fail to meet the performance demands of modern High-Performance Computing (HPC) applications. In this project, we present REPTILES, an open-source RISC-V multicore framework based on OpenPiton\thanks. REPTILES interconnects multiple Sargantana cores with the memory hierarchy of OpenPiton. Moreover, we present the new features incorporated in Sargantana and OpenPiton designs to improve the performance of HPC applications. We demonstrate that REPTILES presents suitable scalability, achieving a speedup of 3.1x on average with 4 cores. Additionally, we show that Sargantana's new features increase the performance of vector addition benchmark in a 9.3x.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

REPTILES wires Sargantana cores into OpenPiton with some HPC tweaks, but the 3.1x and 9.3x claims sit on unstated simulation details and unquantified overheads.

read the letter

The paper's core move is to take the Sargantana RISC-V core, add a few targeted features for vector and HPC work, then tile multiple instances using OpenPiton's existing memory hierarchy and NoC. That produces a 4-core design they call REPTILES. The open-source release and the specific integration choices are the actual new pieces; everything else builds directly on prior projects rather than inventing new mechanisms from scratch. They report 3.1x average speedup across cores and a 9.3x gain on vector addition, which on paper looks like a practical step forward for academic open hardware.

Referee Report

2 major / 2 minor

Summary. The manuscript presents REPTILES, an open-source RISC-V multicore framework that tiles multiple Sargantana cores onto OpenPiton’s memory hierarchy and coherence fabric. It describes new microarchitectural features added to Sargantana (and supporting changes in OpenPiton) intended to improve HPC workloads, and reports two headline results: an average 3.1× speedup across unspecified workloads when scaling to four cores, plus a 9.3× improvement on a vector-addition kernel attributed to the new Sargantana features.

Significance. If the reported speedups can be reproduced with a fully documented experimental methodology, the work would supply a concrete, open-source multicore RISC-V platform that academic groups could use to explore scalable HPC hardware without starting from scratch. The reuse of OpenPiton’s existing tile and NoC infrastructure is a pragmatic strength that could aid reproducibility and incremental research.

major comments (2)

[Abstract] Abstract: the central performance claims (3.1× average speedup with four cores and 9.3× vector-add gain) are stated without any description of the experimental setup, simulation model (cycle-accurate RTL vs. higher-level), baseline single-core configuration, cache-coherence or NoC latency measurements, or verification steps. Because these numbers are the primary evidence offered for the scalability and feature-benefit assertions, their unsupported presentation is load-bearing.
[Results] Results section (wherever the 3.1× and 9.3× figures appear): no quantitative accounting is given for integration overheads incurred when wiring Sargantana cores into OpenPiton’s memory hierarchy, nor is it stated whether the vector-add 9.3× figure includes or excludes the new Sargantana features. Without these data the headline scalability numbers cannot be evaluated.

minor comments (2)

[Abstract] The phrase “OpenPiton” is followed by an unexpanded “thanks” footnote marker in the abstract; this should be cleaned up.
The manuscript would benefit from an explicit table or paragraph listing the exact benchmarks, input sizes, and single-core baseline numbers used to compute the reported speedups.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and have revised the manuscript to improve clarity on the experimental methodology and quantitative details supporting our performance claims.

read point-by-point responses

Referee: [Abstract] Abstract: the central performance claims (3.1× average speedup with four cores and 9.3× vector-add gain) are stated without any description of the experimental setup, simulation model (cycle-accurate RTL vs. higher-level), baseline single-core configuration, cache-coherence or NoC latency measurements, or verification steps. Because these numbers are the primary evidence offered for the scalability and feature-benefit assertions, their unsupported presentation is load-bearing.

Authors: We agree that the abstract would benefit from additional context on the evaluation methodology. In the revised manuscript, we have added a concise statement to the abstract indicating that the reported speedups derive from cycle-accurate RTL simulations of the REPTILES framework, using the single-core Sargantana configuration as baseline. Full details on the simulation model, cache-coherence protocol, NoC latencies, and verification steps are already present in the Experimental Methodology section; the abstract revision now explicitly directs readers to this section. revision: yes
Referee: [Results] Results section (wherever the 3.1× and 9.3× figures appear): no quantitative accounting is given for integration overheads incurred when wiring Sargantana cores into OpenPiton’s memory hierarchy, nor is it stated whether the vector-add 9.3× figure includes or excludes the new Sargantana features. Without these data the headline scalability numbers cannot be evaluated.

Authors: We acknowledge the need for explicit quantitative data on integration overheads and clarification of the 9.3× result. We have expanded the Results section with a new paragraph reporting measured area and latency overheads from integrating Sargantana cores into OpenPiton’s memory hierarchy and coherence fabric. We have also clarified that the 9.3× vector-addition improvement is measured with the new Sargantana microarchitectural features enabled; a direct comparison excluding these features has been added to isolate their contribution. Supporting NoC and coherence latency measurements from our simulations are now included to substantiate the scalability numbers. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical benchmarking report with no derivations or self-referential predictions

full rationale

The paper describes an open-source multicore hardware design (REPTILES based on OpenPiton and Sargantana) and reports measured speedups (3.1x on 4 cores, 9.3x on vector-add) from benchmarks. No equations, fitted parameters, uniqueness theorems, or predictive claims appear in the abstract or described content. All performance numbers are presented as direct simulation or implementation outcomes rather than derivations that reduce to the paper's own inputs. This is a standard systems paper whose central claims rest on external verification via benchmarks, not internal circular logic.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

This is an engineering systems paper that reuses prior open-source components and relies on standard computer-architecture assumptions about memory hierarchies and core integration. No free parameters, new axioms, or invented entities are introduced.

axioms (1)

domain assumption OpenPiton's memory hierarchy integrates with multiple Sargantana cores to deliver scalable performance without major coherence or latency penalties.
The scalability claim rests on this unstated premise about the underlying OpenPiton design.

pith-pipeline@v0.9.0 · 5634 in / 1201 out tokens · 37937 ms · 2026-05-12T00:45:15.937234+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

4 extracted references · 4 canonical work pages

[1]

OpenPiton: An Open Source Manycore Research Framework

Jonathan Balkind et al. “OpenPiton: An Open Source Manycore Research Framework”. In:SIGARCH Comput. Archit. News44.2 (Mar. 2016), pp. 217–232.issn: 0163- 5964.doi: 10.1145/2980024.2872414.url: https://doi. org/10.1145/2980024.2872414

work page doi:10.1145/2980024.2872414.url: 2016
[2]

OpenPiton4HPC: Optimizing Open- Piton Toward High-Performance Manycores

Neiel Leyva et al. “OpenPiton4HPC: Optimizing Open- Piton Toward High-Performance Manycores”. In:IEEE Journal on Emerging and Selected Topics in Circuits and Systems14.3 (2024), pp. 395–408.doi: 10.1109/JETCAS. 2024.3428929

work page doi:10.1109/jetcas 2024
[3]

HPDcache: Open-Source High-Performance L1 Data Cache for RISC-V Cores

César Fuguet. “HPDcache: Open-Source High-Performance L1 Data Cache for RISC-V Cores”. In:Proceedings of the 20th ACM International Conference on Computing Fron- tiers. CF ’23. Bologna, Italy: Association for Computing Machinery, 2023, pp. 377–378.isbn: 9798400701405.doi: 10.1145/3587135.3591413 .url: https://doi.org/10. 1145/3587135.3591413

work page doi:10.1145/3587135.3591413 2023
[4]

Sargantana: A 1 GHz+ in-order RISC-V processor with SIMD vector extensions in 22nm FD-SOI

Victor Soria et al. “Sargantana: A 1 GHz+ in-order RISC-V processor with SIMD vector extensions in 22nm FD-SOI”. In:2022 25th Euromicro Conference on Digital System Design (DSD). IEEE. 2022, pp. 254–261. 2 RISC-V Summit Europe, Paris, 12-15th May 2025

work page 2022