pith. machine review for the scientific record. sign in

arxiv: 2605.01140 · v1 · submitted 2026-05-01 · 💻 cs.PL · cs.PF

Recognition: unknown

SoCal: A Language for Memory-Layout Factorization of Recursive Datatypes

Artem Pelenitsyn, Michael H. Borkowski, Mikah Kainen, Mike Vollmer, Milind Kulkarni, Vidush Singhal

Authors on Pith no claims yet

Pith reviewed 2026-05-09 14:13 UTC · model grok-4.3

classification 💻 cs.PL cs.PF
keywords memory layout factorizationrecursive datatypesstructure of arrayscompiler transformationfunctional programmingtree processingdata serialization
0
0 comments X

The pith

A new language and compiler automatically convert recursive datatypes into multi-buffer layouts that separate fields for better memory performance in tree code.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SoCal, a language for creating factored representations of algebraic datatypes in which each field occupies its own separate buffer instead of being interleaved in one block. This produces structure-of-arrays style storage for serialized recursive structures such as trees. The Colobus compiler applies the transformation automatically to functional programs so they operate on the new layout. Evaluation on tree-processing benchmarks reports a 1.46 times geometric mean speedup. Readers would care because many programs in compilers, browsers, and scientific computing manipulate tree-shaped data and stand to gain locality and parallelism benefits without manual rewriting.

Core claim

We introduce factored, multi-buffer layouts that store different ADT fields in separate buffers, enabling SoA-like layouts for serialized recursive data structures. We formalize this approach in SoCal, a language for generating factored ADT representations, and implement it in a compiler called Colobus. Colobus automatically transforms functional programs to operate over a serialized, factored layout of recursive ADTs. Our evaluation shows a 1.46x geometric mean speedup on a suite of tree-processing benchmarks.

What carries the argument

Factored multi-buffer layouts, in which each ADT field is stored in its own contiguous buffer with linking information to preserve recursive structure across buffers.

If this is right

  • Existing functional programs on recursive datatypes can run faster after automatic conversion without source changes.
  • Serialized trees gain the locality and parallel-access benefits previously available only to regular arrays.
  • Applications such as AST manipulation, DOM processing, and k-d tree workloads become candidates for the same gains.
  • The approach extends the classic AoS-to-SoA transformation to irregular, pointer-based data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The technique could be combined with existing vectorization passes to exploit the independent buffers for SIMD execution.
  • Memory-footprint trade-offs from extra buffers and linking data might limit applicability in memory-constrained settings.
  • Similar factorization might be applied to other irregular structures if a serialization step is first defined.

Load-bearing premise

The automatic program transformation preserves the original semantics exactly for every possible input.

What would settle it

A test case in which the transformed program produces different results from the original program on the same inputs, or a controlled benchmark run that isolates the layout change and shows no performance difference.

Figures

Figures reproduced from arXiv: 2605.01140 by Artem Pelenitsyn, Michael H. Borkowski, Mikah Kainen, Mike Vollmer, Milind Kulkarni, Vidush Singhal.

Figure 1
Figure 1. Figure 1: flattened vs. factored Representation of a Haskell-Style Cons List in view at source ↗
Figure 2
Figure 2. Figure 2: sumTree source and compiled to C using flattened and factored ADT layout. representations of datatypes and demonstrate both Gibbon’s flat view of the memory layout and our factored view. 2.1 Pointer-Based Data Representations Functional programming is a convenient tool for defining and processing recursive datatypes like lists and trees. For instance, in a language like Haskell, a tree datatype can be defi… view at source ↗
Figure 3
Figure 3. Figure 3: sumtree implemented in LoCal and SoCal While Gibbon commits to the flat-buffer assumption, Colobus lifts this assumption making it possible to store a serialized tree with the integers of leaf nodes factored out into a separate buffer from the node tags. A program factored in this way tracks two byte streams: one for all data constructor tags and another for all integers in the leaf nodes (Figure 2c). The … view at source ↗
Figure 4
Figure 4. Figure 4: buildTree in SoCal. therefore reside in a single flat buffer. The associative list −−−−−−−−−−−−−−−−−−⇀ (DataCon, Idx,𝑙𝑜𝑐𝑖) maps each data￾constructor/field-index pair to the corresponding factored location. For Int and other primitive types, this location is a disjoint single buffer. For fields that are datatypes, this could either be a flattened location or another factored location. To get a better sense… view at source ↗
Figure 5
Figure 5. Figure 5: Grammar of SoCal over locations. These constraints, together with types, describe both data layout and traversal order, giving a high-level account of the low-level memory behavior induced by the compiler. 3.1 Grammar of SoCal view at source ↗
Figure 6
Figure 6. Figure 6: Static typings of SoCal. T-LetLoc-ProjField similarly projects a field location using a key (𝐾,𝑖) via projFieldLoc. Ill-formed keys (constructor mismatch or out-of-bounds field index) are rejected statically. As with T-LetLoc￾ProjTag, this projection introduces the projected location into N, updates the corresponding allocation pointer in A, and records the projection constraint in C. T-LetLoc-IntroLocVec … view at source ↗
Figure 7
Figure 7. Figure 7: Step-by-step example of type checking a simple expression. view at source ↗
Figure 8
Figure 8. Figure 8: Selected dynamic-semantics rules for SoCal. view at source ↗
Figure 9
Figure 9. Figure 9: Comparison of Packed flattened Mutable and Immutable Add1 Tree Implementations view at source ↗
Figure 10
Figure 10. Figure 10: Grammar of NoCal (an extension of SoCal) view at source ↗
Figure 11
Figure 11. Figure 11: sumTree in NoCal with mutable cursors view at source ↗
Figure 12
Figure 12. Figure 12: Grammar of SoCal. This is a copy of Figure 5. view at source ↗
Figure 13
Figure 13. Figure 13: Static typings of SoCal from Figure 6 view at source ↗
Figure 14
Figure 14. Figure 14: Static typings of SoCal (remaining rules). view at source ↗
Figure 15
Figure 15. Figure 15: Copy of dynamic semantics shown in Figure 8. view at source ↗
Figure 16
Figure 16. Figure 16: Remaining dynamic semantics for SoCal D.2 End-witness judgements Judgement forms. 𝜏; ⟨r, i𝑠 ⟩; S ⊢ew ⟨r, i𝑒 ⟩ 𝜏; ⟨𝑟𝑑, i𝑠 ⟩ 𝑙𝑑 −−−−−−−−−−⇀ (K, j, 𝑐𝑙𝑜𝑐𝑠j); S ⊢ewrec ⟨𝑟𝑑, i𝑒 ⟩ 𝑙𝑑 −−−−−−−−−−⇀ (K, j, 𝑐𝑙𝑜𝑐𝑒j) The judgement ; ; ⊢ew is the contiguous witness relation inherited from LoCal: it states that the concrete location on the right is one past the final cell of the value of type 𝜏 rooted at the concrete loc… view at source ↗
read the original abstract

Array-of-structures (AoS) to structure-of-arrays (SoA) is a classic compiler transformation that improves memory locality and enables data-parallel execution. Existing AoS-to-SoA transformations primarily target regular, array-based programs in imperative languages like C and C++. In contrast, many applications manipulate tree-shaped data structures, for example, ASTs in compilers, DOM trees in browsers, and k-d trees in scientific workloads. Prior work improves the performance of functional programs operating on such data by serializing algebraic datatypes (ADTs) into contiguous memory buffers. However, these representations interleave fields within a single buffer, similar to AoS layouts. We introduce factored, multi-buffer layouts that store different ADT fields in separate buffers, enabling SoA-like layouts for serialized recursive data structures. We formalize this approach in SoCal, a language for generating factored ADT representations, and implement it in a compiler called Colobus. Colobus automatically transforms functional programs to operate over a serialized, factored layout of recursive ADTs. Our evaluation shows a 1.46x geometric mean speedup on a suite of tree-processing benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces SoCal, a language for memory-layout factorization of recursive algebraic datatypes (ADTs) that enables factored multi-buffer representations analogous to structure-of-arrays (SoA) layouts. It describes the Colobus compiler, which performs an automatic semantics-preserving program transformation to operate over these serialized factored layouts, and evaluates the approach on tree-processing benchmarks, reporting a 1.46x geometric mean speedup.

Significance. If the central claims hold, this work meaningfully extends classic AoS-to-SoA transformations from imperative array-based programs to recursive ADTs common in functional languages. The formalization of factored layouts in SoCal and the automatic rewrite in Colobus provide a principled foundation that could improve locality and enable data-parallelism for applications such as AST processing and scientific tree structures. The evaluation, if substantiated, demonstrates practical performance gains.

major comments (1)
  1. [Evaluation] Evaluation section: The reported 1.46x geometric mean speedup is central to the performance claim, yet the manuscript supplies no information on benchmark selection criteria, measurement methodology (including hardware, number of runs, and timing), error bars, or comparisons against strong baselines such as existing ADT serialization techniques or hand-tuned layouts. This information is required to attribute observed speedups specifically to the factored multi-buffer layout rather than incidental effects.
minor comments (2)
  1. [§3] §3 (SoCal Language): The presentation of layout rules would benefit from a concrete running example showing how a small recursive ADT (e.g., a binary tree) is mapped to multiple buffers, to clarify the factorization mechanics for readers.
  2. [Related Work] Related Work: The discussion of prior ADT serialization work could be expanded with explicit comparisons to recent efforts on memory layouts in functional compilers, to better position the novelty of the multi-buffer approach.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential of SoCal and Colobus to extend AoS-to-SoA ideas to recursive ADTs. We agree that the evaluation requires more methodological detail to support the reported speedups and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section: The reported 1.46x geometric mean speedup is central to the performance claim, yet the manuscript supplies no information on benchmark selection criteria, measurement methodology (including hardware, number of runs, and timing), error bars, or comparisons against strong baselines such as existing ADT serialization techniques or hand-tuned layouts. This information is required to attribute observed speedups specifically to the factored multi-buffer layout rather than incidental effects.

    Authors: We acknowledge that the current manuscript omits these experimental details. In the revised version we will expand the Evaluation section with: (1) explicit criteria used to select the tree-processing benchmarks, (2) hardware platform specifications, (3) number of runs, timing methodology, and statistical reporting (including error bars), and (4) direct comparisons against standard ADT serialization baselines and, where feasible, hand-tuned layouts. These additions will clarify that the observed gains are attributable to the factored multi-buffer representation. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper introduces SoCal as a language for defining factored multi-buffer layouts for recursive ADTs and Colobus as the implementing compiler that performs automatic program transformations. The central results are the language definition itself (which supplies the layout rules by construction) and measured wall-clock speedups on tree-processing benchmarks. No equations, fitted parameters, or first-principles derivations are present that could reduce to their own inputs. The evaluation relies on external benchmarks rather than any self-referential prediction or self-citation chain. The approach is self-contained: the formalization defines the layouts, the compiler applies the corresponding rewrites, and performance is reported empirically. This matches the common case of a systems paper whose claims rest on implementation and measurement rather than any load-bearing circular step.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The central claim rests on the unstated assumption that functional programs operating on recursive ADTs can be rewritten to use factored layouts while preserving semantics, and that the benchmarks used for evaluation are representative of real workloads. No free parameters or invented physical entities are mentioned.

axioms (1)
  • domain assumption Functional programs on recursive ADTs can be automatically transformed to equivalent programs on serialized factored layouts without changing observable behavior.
    Implicit in the description of Colobus as an automatic transformer.
invented entities (2)
  • SoCal language no independent evidence
    purpose: To specify factored multi-buffer layouts for recursive ADTs
    New language introduced to describe the layouts.
  • Colobus compiler no independent evidence
    purpose: To perform the automatic transformation of programs to use the factored layouts
    New compiler implementation that realizes the SoCal layouts.

pith-pipeline@v0.9.0 · 5520 in / 1502 out tokens · 61485 ms · 2026-05-09T14:13:45.119857+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

25 extracted references · 2 canonical work pages

  1. [1]

    Trishul M

    ISBN 978-0412048418. Trishul M. Chilimbi, Bob Davidson, and James R. Larus. Cache-conscious structure definition. InProceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation, PLDI ’99, pages 13–24, New York, NY, USA, 1999. ACM. ISBN 1-58113-094-5. doi: http://doi.acm.org/10.1145/301618.301635. URL http://doi.acm.org/...

  2. [2]

    related passes. We use an adapted ADT for the Octree and 2 passes over the octree (palet- teEntriesQuantizedandquantizationErrorProxy) that estimates the number of palettes entries after quantization and the second estimates the quantization error after quantization of the image. Results are reported in Table 27. A.11 PiecewiseFunctions ThePiecewiseFuncti...

  3. [3]

    If(l r ↦→𝜏) ∈Σ, then there existi 𝑠,i 𝑒 such that (lr ↦→ ⟨r,i 𝑠 ⟩) ∈Mand𝜏;⟨r,i 𝑠 ⟩;S⊢ ew ⟨r,i 𝑒 ⟩. SoCal: A Language for Memory-Layout Factorization of Recursive Datatypes 41 [T -V ar] Γ(𝑥)=𝜏@𝑙𝑜𝑐Σ(𝑙𝑜𝑐)=𝜏 Γ;Σ;C;A;N⊢A;N;𝑥:𝜏@𝑙𝑜𝑐 [T -Concrete-Loc] Σ(𝑙𝑜𝑐)=𝜏 Γ;Σ;C;A;N⊢A;N;𝑐𝑙𝑜𝑐:𝜏@𝑙𝑜𝑐 [T -LetLoc-ProjTag] 𝑙𝑑 𝑟𝑑 , 𝑙𝑜𝑐1 ≠𝑙𝑜𝑐 2 A(r 𝑑 )=∅𝑙𝑜𝑐 1 ∈N𝑙𝑜𝑐 1, 𝑙𝑑 𝑟𝑑 ∉N ′′ Γ;Σ...

  4. [4]

    If (𝑙𝑜𝑐 1 ↦→𝜏) ∈ Σand 𝑙𝑜𝑐 1 =𝑙 𝑑 𝑟𝑑 − − − − − − − −⇀(K,j, 𝑙𝑜𝑐 j), then there exist corresponding concrete components⟨𝑟 𝑑,i 𝑠 ⟩𝑙𝑑 − − − − − − − − − −⇀(K,j, 𝑐𝑙𝑜𝑐 𝑠j )and⟨𝑟 𝑑,i 𝑒 ⟩𝑙𝑑 − − − − − − − − − −⇀(K,j, 𝑐𝑙𝑜𝑐 𝑒j )such that (𝑙𝑜𝑐 1 ↦→ ⟨𝑟 𝑑,i 𝑠 ⟩𝑙𝑑 − − − − − − − − − −⇀(K,j, 𝑐𝑙𝑜𝑐 𝑠j )) ∈M and 𝜏;⟨𝑟 𝑑,i 𝑠 ⟩𝑙𝑑 − − − − − − − − − −⇀(K,j, 𝑐𝑙𝑜𝑐 𝑠j );S⊢ ewrec ⟨𝑟𝑑,i...

  5. [5]

    42 Singhal et al

    The constraint environment is realized by the runtime configuration: C⊢ 𝑤 𝑓𝑐 𝑓 𝑐 M;S. 42 Singhal et al. [T -Int] Γ;Σ;C;A;N⊢A;N;𝑛:𝜏@𝑙𝑜𝑐 [T -App] | − −⇀𝑙𝑜𝑐 ′ |=| − − −⇀𝑙𝑜𝑐 ′′′ |𝑛=| −⇀v|=| −⇀𝑥|𝑙𝑜𝑐 𝑜𝑢𝑡 ∈N Γ;Σ;C;A;N⊢A;N; −⇀v𝑖 : − − − − −⇀𝜏𝑖 @𝑙𝑜𝑐𝑖 A(𝑟𝑒𝑔)=𝑙𝑜𝑐 𝑜𝑢𝑡 ∀𝑖 .∃ 𝑗 .( − − −⇀𝑙𝑜𝑐 ′′′ 𝑖 = − − −⇀𝑙𝑜𝑐 ′′ 𝑗 ∧ − −⇀𝑙𝑜𝑐 𝑖 = − −⇀𝑙𝑜𝑐 ′ 𝑗 ) ∃𝑘 .(𝑙𝑜𝑐 ′′′ 𝑜𝑢𝑡 = − − −⇀𝑙𝑜𝑐...

  6. [6]

    The allocation and nursery environments are realized by the runtime configuration: A;N⊢ 𝑤 𝑓𝑐𝑎 M;S

  7. [7]

    Typed roots and nursery locations are disjoint: 𝑑𝑜𝑚(Σ) ∩N=∅. SoCal: A Language for Memory-Layout Factorization of Recursive Datatypes 43 [D-Let-Expr] S;M;e 1 ↩→S ′;M ′;e ′ 1 e1 ≠v S;M;let𝑥: ˆ𝜏=e 1 ine 2 ↩→S ′;M ′;let𝑥: ˆ𝜏=e ′ 1 ine 2 [D-Let-V al] S;M;let𝑥: ˆ𝜏=v 1 ine 2 ↩→S;M;e 2 [v1/𝑥] [D-App] S;M;𝑓[ −⇀𝑙𝑜𝑐] −⇀v↩→S;M;e[ −⇀v/ −⇀𝑥] [ −⇀𝑙𝑜𝑐/ − −⇀𝑙𝑜𝑐 ′ ] where...

  8. [8]

    If𝜏=IntandS(r) (i 𝑠 )=n, then Int;⟨r,i 𝑠 ⟩;S⊢ ew ⟨r,i 𝑠 +1⟩

  9. [9]

    |K ′ −⇀𝜏 ′ |

    IfS(r) (i 𝑠 )=K ′ and data𝜏= − − − − −⇀ K1 −⇀𝜏 1 |. . .|K ′ −⇀𝜏 ′ |. . .| − − − − − −⇀ K𝑚 −⇀𝜏 𝑚, then the tag at⟨r,i 𝑠 ⟩is a valid start for a value of type𝜏

  10. [10]

    For constructor fields −⇀𝜏 ′ 1,

    Let𝑤 0 =i 𝑠 +1. For constructor fields −⇀𝜏 ′ 1, . . . , −⇀𝜏 ′ 𝑛, require −⇀𝜏 ′ 1;⟨r, 𝑤 0⟩;S⊢ ew ⟨r, 𝑤 1⟩ and − −⇀𝜏 ′ 𝑗+1 ;⟨r, 𝑤 𝑗 ⟩;S⊢ ew ⟨r, 𝑤 𝑗+1 ⟩𝑗∈ {1, . . . , 𝑛−1}

  11. [11]

    44 Singhal et al

    The overall end witness is the end of the final field, or the cell one past the tag if the constructor has no fields. 44 Singhal et al. Recursive witness relation for factored roots.The recursive witness relation captures the additional invariants needed for factored roots in factored layouts

  12. [12]

    The data-constructor component of the factored root is present at the tag buffer location: S(𝑟 𝑑 ) (i𝑠 )=K ′ for some constructorK ′ of type𝜏

  13. [13]

    Scalar fields and datatypes laid out in an flattened form use the contiguous witness relation, while nested factored fields may in turn require the recursive witness relation

    For each non-self-recursive field entry (K ′, j, 𝑐𝑙𝑜𝑐𝑠j ) in the component structure, the pro- jected concrete field component is well typed for the corresponding field type. Scalar fields and datatypes laid out in an flattened form use the contiguous witness relation, while nested factored fields may in turn require the recursive witness relation

  14. [14]

    Self-recursive fields omitted from the component structure are recovered by the recursive structure of the layout. In the preorder fragment used in the main paper, these descendants are reached by following the tag component and the sequence of factored (after·) con- straints generated by T-DataConstructor-FullyFactored. For instance, the first self recur...

  15. [15]

    In particular, the tag component yields the next constructor position in its buffer, and the field components provide the corresponding end witnesses for the non-tag buffers

    The resulting end witness is the componentwise end of the entire rooted value. In particular, the tag component yields the next constructor position in its buffer, and the field components provide the corresponding end witnesses for the non-tag buffers. D.3 Well-formedness of constructor progress constraints Judgement form. C⊢ 𝑤 𝑓𝑐 𝑓 𝑐 M;S This judgement ...

  16. [16]

    If(𝑙𝑜𝑐↦→ (startr)) ∈C, thenM(𝑙𝑜𝑐)=ZeroIdx(𝑙𝑜𝑐)

  17. [17]

    If(𝑙𝑜𝑐↦→𝑙𝑜𝑐 ′ +1) ∈CandM(𝑙𝑜𝑐 ′)=⟨r,i⟩, thenM(𝑙𝑜𝑐)=⟨r,i+1⟩

  18. [18]

    If(𝑙𝑜𝑐↦→ (after𝜏@𝑙𝑜𝑐 ′)) ∈CandM(𝑙𝑜𝑐 ′)=𝑐𝑙𝑜𝑐 ′, then there exists𝑐𝑙𝑜𝑐 ′′ such that 𝜏;𝑐𝑙𝑜𝑐 ′;S⊢ ew 𝑐𝑙𝑜𝑐 ′′ andM(𝑙𝑜𝑐)=𝑐𝑙𝑜𝑐 ′′

  19. [19]

    If (𝑙𝑑 𝑟𝑑 ↦→ (projTagLoc𝑙𝑜𝑐 1)) ∈ CandM (𝑙𝑜𝑐 1) is a factored concrete root, then its tag component is the concrete location bound to𝑙 𝑑 𝑟𝑑

  20. [20]

    If (𝑙𝑜𝑐 𝑖 ↦→ (projFieldLoc( K, i)𝑙𝑜𝑐 1)) ∈ CandM (𝑙𝑜𝑐 1) is a factored concrete root whose component structure contains(K,i, 𝑐𝑙𝑜𝑐 𝑖 ), thenM(𝑙𝑜𝑐 𝑖 )=𝑐𝑙𝑜𝑐 𝑖

  21. [21]

    D.4 Well-formedness of allocation Judgement form

    If (𝑙𝑜𝑐↦→ (introLocVec𝑙 𝑑 𝑟𝑑 𝑓 𝑙𝑑𝑠)) ∈ C, thenM (𝑙𝑜𝑐) is the factored concrete root obtained by combining the concrete tag location for 𝑙𝑑 𝑟𝑑 with the concrete field locations denoted by the entries of𝑓 𝑙𝑑𝑠. D.4 Well-formedness of allocation Judgement form. A;N⊢ 𝑤 𝑓𝑐𝑎 M;S This judgement ensures that allocation remains linear and that no store cell is overwritten

  22. [22]

    SoCal: A Language for Memory-Layout Factorization of Recursive Datatypes 45

    If 𝑙𝑜𝑐∈ N, then 𝑙𝑜𝑐 is mapped byM, but the concrete store cell(s) denoted by that location have not yet been written inS. SoCal: A Language for Memory-Layout Factorization of Recursive Datatypes 45

  23. [23]

    For a factored location with multiple buffers, the concrete loc will have the highest index across all buffers

    If (r ↦→𝑙𝑜𝑐) ∈ Aand 𝑙𝑜𝑐∈ N, then the concrete location denoted by 𝑙𝑜𝑐 lies strictly past the highest written cell in regionr. For a factored location with multiple buffers, the concrete loc will have the highest index across all buffers

  24. [24]

    If (r ↦→𝑙𝑜𝑐) ∈ Aand 𝑙𝑜𝑐∉ N, then the end witness of the value rooted at 𝑙𝑜𝑐 lies at or past the current allocation frontier for regionr

  25. [25]

    If(r↦→ ∅) ∈A, then regionrhas no written cells inS