pith. machine review for the scientific record. sign in

arxiv: 2511.09447 · v2 · submitted 2025-11-12 · 💻 cs.DC · cs.PL

Recognition: unknown

SpaDA: A Spatial Dataflow Architecture Programming Language

Authors on Pith no claims yet
classification 💻 cs.DC cs.PL
keywords programmingspadadataflowoperationsacrossarchitecturesasynchronouscerebras
0
0 comments X
read the original abstract

Spatial dataflow architectures like the Cerebras Wafer-Scale Engine deliver exceptional performance in AI and scientific computing by distributing scratchpad memory across hundreds of thousands of processing elements (PEs). Yet programming these architectures remains difficult: with no shared memory, data movement requires explicit configuration, and asynchronous task management introduces substantial complexity. We present SpaDA, a programming language that offers precise control over data placement, dataflow patterns, and asynchronous operations while abstracting low-level architectural details. We design and implement a compiler targeting Cerebras CSL through multi-level lowering and unique optimization passes. SpaDA functions as a high-level programming interface and an intermediate representation for domain-specific languages (DSLs), demonstrated here with the GT4Py stencil DSL. SpaDA enables concise expression of operations with complex parallel patterns -- including pipelined collective operations, multi-dimensional stencils, and dense linear algebra -- in 14.09x fewer lines than CSL, achieving over 260 TFlop/s across 730,000 PEs on a single device.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Stencil Computations on Cerebras Wafer-Scale Engine

    cs.DC 2026-05 unverdicted novelty 6.0

    CStencil on the WSE-3 achieves up to 342x speedup for 2D stencils versus an adapted single-precision GPU solver and saturates both compute and on-chip memory bandwidth.