pith. machine review for the scientific record. sign in

arxiv: 2605.02691 · v1 · submitted 2026-05-04 · 💻 cs.PL

Recognition: unknown

Compile-Time Java Stream Fusion via mapMulti

Authors on Pith no claims yet

Pith reviewed 2026-05-08 01:38 UTC · model grok-4.3

classification 💻 cs.PL
keywords Java Stream APIstream fusionmapMulticompile-time optimizationbytecode transformationperformanceintermediate objects
0
0 comments X

The pith

A bytecode optimizer fuses chains of Java map and filter calls into one mapMulti to cut intermediate objects while preserving the stream programming model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes a compile-time tool that rewrites Java Stream pipelines by collapsing consecutive map and filter operations into a single mapMulti call introduced in Java 16. This fusion removes the performance cost of creating temporary stream objects without converting the entire pipeline into imperative loops. The approach sidesteps restrictions in earlier optimizers around variable assignment, object escaping in lambdas, and stream passing. Tests on nine benchmarks showed better or equal speed in most cases, and the tool processed the full bytecode of a large real-world system without altering test outcomes. If the transformation holds, stream-based code can stay expressive yet run closer to hand-written loops.

Core claim

The central claim is that merging successive map and filter stages into a single mapMulti invocation at the bytecode level achieves stream fusion for Java, delivering performance gains comparable to loop unrolling while avoiding the semantic restrictions of prior methods.

What carries the argument

The mapMulti operation, which lets one element produce zero or more output elements inside a single pipeline stage, acts as the fusion target that absorbs preceding map and filter logic.

If this is right

  • Stream pipelines can be written with multiple operations yet execute with fewer object allocations.
  • The same bytecode-level pass works on already-compiled libraries without source changes.
  • Cases that break loop-unrolling optimizers, such as streams stored in variables, remain fusable.
  • A large existing codebase can be optimized and still pass its full test suite.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar fusion patterns could be applied to other languages that expose a multi-output operation analogous to mapMulti.
  • Automatic fusion might encourage wider adoption of stream APIs in performance-sensitive code.
  • Extending the analyzer to additional stream methods such as flatMap would increase the set of fusable pipelines.

Load-bearing premise

The mapMulti rewrite must keep the exact original stream behavior for every pipeline the analyzer selects.

What would settle it

Apply the optimizer to a stream pipeline that assigns an intermediate stream to a variable or passes it through a method call, then compare the final output and side effects against the unoptimized version.

Figures

Figures reproduced from arXiv: 2605.02691 by Maxim Trunnikov, Vladimir Zakharov, Yegor Bugayenko.

Figure 1
Figure 1. Figure 1: Two variants of computing sums of even squares suggested by Møller and Veileborg [22] to demonstrate that the usage of the Stream API degrades performance. Modern JVMs handle streams better than their earlier versions. We reproduced the experiment of Møller and Veile￾borg [22] view at source ↗
Figure 2
Figure 2. Figure 2: Java class after compilation and disassembly to 𝜑-expression. The Java compiler translates lambda expressions into invokedynamic and invokeinterface instruction pairs [27]. We rewrite these pairs into Φ.filter or Φ.map pragmas—𝜑-calculus formations that temporarily carry rewriting data but don’t exist in the final bytecode. Both Φ.filter and Φ.map are then unified into Φ.distill, an abstraction accepting a… view at source ↗
read the original abstract

The Java Stream API, introduced in Java 8, makes data processing more expressive and concise compared to imperative loops. However, this abstraction can come with significant performance overhead, often due to the creation of multiple intermediate objects during pipeline execution. In functional languages such as Haskell, this problem is addressed through stream fusion, a compile-time optimization that eliminates unnecessary intermediate structures. Inspired by this idea, Streamliner was the first tool to perform ahead-of-time, bytecode-to-bytecode stream optimization for Java by unrolling stream pipelines into imperative loops. In this paper, we introduce an open-source optimizer that takes a different approach. Instead of unrolling streams into loops, it merges consecutive map() and filter() operations into a single mapMulti() call, available since Java 16. Our method avoids several limitations of Streamliner, including its sensitivity to escaping objects in lambda expressions and its restrictions on assigning or passing streams as variables. We evaluated our optimizer on nine benchmarks and observed superior performance in two cases and comparable results in most others. We also applied it to the bytecode of Apache Kafka, successfully executing all 31,799 unit tests without failures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents a compile-time, bytecode-to-bytecode optimizer for Java Streams that fuses sequences of map() and filter() operations into single mapMulti() calls (available since Java 16) rather than unrolling pipelines into imperative loops as in the prior Streamliner tool. It claims this avoids Streamliner's limitations regarding escaping objects in lambdas and restrictions on stream variable assignments or passing. Evaluation consists of nine benchmarks (superior performance in two cases, comparable in most) plus successful execution of all 31,799 unit tests on the transformed bytecode of Apache Kafka.

Significance. If semantic preservation can be established more rigorously, the work supplies a practical, source-compatible alternative to loop-unrolling stream fusion for Java that leverages a standard library primitive. The Kafka-scale validation on a production codebase is a concrete strength, demonstrating that the approach can be applied to large real-world code without immediate breakage.

major comments (2)
  1. [Evaluation] The evaluation reports only qualitative outcomes (superior in two of nine benchmarks, comparable in most) with no quantitative speedup factors, no description of the benchmark workloads, no baseline details (unoptimized streams vs. Streamliner), and no measurement methodology. This gap directly weakens the performance-claim component of the central thesis.
  2. [Implementation and Semantics] Semantic preservation of the mapMulti fusion is load-bearing for the entire contribution, yet the only evidence is the Kafka test-suite result. The paper does not discuss or test cases involving stateful lambdas, side-effecting operations, parallel streams, exception semantics, or order-dependent behavior that could differ when collapsing intermediate Stream objects; the static analysis rules for selecting fusable pipelines are also not shown to be sound on the full space of patterns.
minor comments (1)
  1. [Abstract and Evaluation] The abstract and evaluation would be clearer if they explicitly stated the Java version targeted and whether the optimizer handles short-circuiting operations or terminal operations beyond the fused maps/filters.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed comments, which highlight important areas for improvement in our presentation of the evaluation and semantic arguments. We will revise the manuscript to strengthen these sections while preserving the core contribution of a practical bytecode-to-bytecode optimizer using mapMulti.

read point-by-point responses
  1. Referee: [Evaluation] The evaluation reports only qualitative outcomes (superior in two of nine benchmarks, comparable in most) with no quantitative speedup factors, no description of the benchmark workloads, no baseline details (unoptimized streams vs. Streamliner), and no measurement methodology. This gap directly weakens the performance-claim component of the central thesis.

    Authors: We agree that the evaluation section is insufficiently detailed. In the revised version we will report concrete speedup factors (or slowdowns) for all nine benchmarks, provide full descriptions of each workload including input sizes and operations performed, specify the exact baselines (unoptimized streams and Streamliner where applicable), and document the measurement methodology including hardware, JVM version, number of iterations, warm-up strategy, and statistical reporting. revision: yes

  2. Referee: [Implementation and Semantics] Semantic preservation of the mapMulti fusion is load-bearing for the entire contribution, yet the only evidence is the Kafka test-suite result. The paper does not discuss or test cases involving stateful lambdas, side-effecting operations, parallel streams, exception semantics, or order-dependent behavior that could differ when collapsing intermediate Stream objects; the static analysis rules for selecting fusable pipelines are also not shown to be sound on the full space of patterns.

    Authors: We accept that the current evidence for semantic preservation is primarily empirical via the Kafka suite. The revision will add a new subsection that explicitly discusses stateful lambdas, side effects, parallel streams, exceptions, and ordering. We will clarify that the optimizer targets only stateless map/filter pipelines and relies on mapMulti's standard-library semantics to preserve observable behavior for those cases; parallel streams are currently out of scope and will be stated as such. The static analysis rules will be presented in full, accompanied by informal soundness reasoning based on behavioral equivalence for supported patterns and additional micro-benchmarks exercising the listed edge cases. A machine-checked formal proof over the entire Java language is outside the scope of this practical engineering paper. revision: partial

standing simulated objections not resolved
  • A complete formal proof of soundness for the static analysis rules across the full space of possible Java stream usage patterns

Circularity Check

0 steps flagged

No circularity; empirical claims rest on direct implementation and external testing

full rationale

The paper describes an engineering implementation that rewrites stream pipelines into mapMulti calls at the bytecode level, then reports benchmark timings and successful execution of Kafka's 31,799 unit tests. No equations, fitted parameters, uniqueness theorems, or derivation steps appear in the provided text. Claims of semantic preservation and performance improvement are supported solely by the external test corpus and direct measurement rather than by any self-referential reduction or self-citation chain. The central precondition (that the chosen transformations leave observable behavior unchanged) is an engineering assumption verified by the Kafka suite, not a logical step that collapses into the method's own definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that mapMulti fusion is semantically equivalent to the original map/filter pipeline and that the bytecode transformer can reliably detect applicable cases.

axioms (1)
  • domain assumption Java Stream API semantics are preserved when consecutive map and filter operations are replaced by a single mapMulti call
    The optimizer's correctness depends on this equivalence holding for the patterns it rewrites.

pith-pipeline@v0.9.0 · 5498 in / 1375 out tokens · 76914 ms · 2026-05-08T01:38:44.975584+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

33 extracted references · 29 canonical work pages

  1. [1]

    Matteo Basso, Filippo Schiavio, Andrea Rosà, and Walter Binder. 2022. Optimizing Parallel Java Streams. InProceedings of the 26th Interna- tional Conference on Engineering of Complex Computer Systems. 23–32. doi:10.1109/ICECCS54210.2022.00012

  2. [2]

    Aggelos Biboudis, Nick Palladinos, and Yannis Smaragdakis. 2014. Clash of the Lambdas. arXiv:1406.6631 [cs.PL]

  3. [3]

    R. S. Bird. 1989. Algebraic Identities for Program Calculation.The Computer Journal32, 2 (1 1989), 122–126. doi:10.1093/comjnl/32.2.122

  4. [4]

    Yegor Bugayenko and Maxim Trunnikov. 2021. 𝜑-Calculus: Object-Oriented Formalism. arXiv:2111.13384 [cs.PL]

  5. [5]

    Yegor Bugayenko, Maxim Trunnikov, and Vladimir Zakharov. 2026. Compile-Time Java Stream Fusion via mapMulti. doi:10.5281/zenodo .19837769

  6. [6]

    Wei-Ngan Chin. 1992. Safe Fusion of Functional Expressions. InPro- ceedings of the Conference on LISP and Functional Programming. 11–20. doi:10.1145/141471.141494

  7. [7]

    Duncan Coutts, Roman Leshchinskiy, and Don Stewart. 2007. Stream Fusion: From Lists to Streams to Nothing at All.SIGPLAN Notices42, 9 (10 2007), 315–326. doi:10.1145/1291220.1291199

  8. [8]

    Gilles Duboscq, Lukas Stadler, Thomas Würthinger, Doug Simon, Christian Wimmer, and Hanspeter Mössenböck. 2013. Graal IR: An Extensible Declarative Intermediate Representation

  9. [9]

    Michael Eichberg and Ben Hermann. 2014. A Software Product Line for Static Analyses: The OPAL Framework. InProceedings of the 3rd SIGPLAN International Workshop on the State of the Art in Java Program Analysis. 1–6. doi:10.1145/2614628.2614630

  10. [10]

    Andy Georges, Dries Buytaert, and Lieven Eeckhout. 2007. Statistically Rigorous Java Performance Evaluation.ACM SIGPLAN Notices42, 10 (2007), 57–76. doi:10.1145/1297105.1297033 Compile-Time Java Stream Fusion viamapMultiSOAP ’26, June 15–19, 2026, Boulder, CO, USA

  11. [11]

    Peyton Jones

    Andrew Gill, John Launchbury, and Simon L. Peyton Jones. 1993. A Short Cut to Deforestation. InProceedings of the Conference on Func- tional Programming Languages and Computer Architecture. 223–232. doi:10.1145/165180.165214

  12. [12]

    G. W. Hamilton. 2002. Extending Higher-Order Deforestation: Trans- forming Programs to Eliminate Even More Trees

  13. [13]

    Ralf Hinze, Thomas Harper, and Daniel W. H. James. 2011. Theory and Practice of Fusion. InProceedings of the Implementation and Application of Functional Languages. 19–37. doi:10.1007/978-3-642-24276-2_2

  14. [14]

    Martin Hirzel, Robert Soulé, Scott Schneider, Buğra Gedik, and Robert Grimm. 2014. A Catalog of Stream Processing Optimizations.ACM Computing Surveys46, 4 (2014), 1–34. doi:10.1145/2528412

  15. [15]

    Ruyi Ji, Yuwei Zhao, Nadia Polikarpova, Yingfei Xiong, and Zhenjiang Hu. 2024. Superfusion: Eliminating Intermediate Data Structures via Inductive Synthesis.Proceedings of ACM Programmning Languages8, 1 (6 2024). doi:10.1145/3656415

  16. [16]

    Raffi Khatchadourian, Yiming Tang, and Mehdi Bagherzadeh. 2020. Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams.Science of Computer Programming195, 1 (2020). doi:10.1016/j. scico.2020.102476

  17. [17]

    Raffi Khatchadourian, Yiming Tang, Mehdi Bagherzadeh, and Baishakhi Ray. 2020. An Empirical Study on the Use and Misuse of Java 8 Streams. InProceedings of the International Conference on Fun- damental Approaches to Software Engineering. 97–118. doi:10.1007/978- 3-030-45234-6_5

  18. [18]

    Oleg Kiselyov, Aggelos Biboudis, Nick Palladinos, and Yannis Smarag- dakis. 2017. Stream Fusion, to Completeness.SIGPLAN Notices52, 1 (1 2017), 285–299. doi:10.1145/3093333.3009880

  19. [19]

    Oleg Kiselyov, Tomoaki Kobayashi, and Nick Palladinos. 2024. Complete Fusion for Stateful Streams: Equational Theory of Stateful Streams and Fusion as Normalization-by-Evaluation. arXiv:2412.15768 [cs.PL]

  20. [20]

    Davood Mazinanian, Ameya Ketkar, Nikolaos Tsantalis, and Danny Dig. 2017. Understanding the Use of Lambda Expressions in Java. Proceedings of the ACM on Programming Languages1, 1 (2017), 1–31. doi:10.1145/3133909

  21. [21]

    Erik Meijer, Maarten Fokkinga, and Ross Paterson. 1991. Functional Programming With Bananas, Lenses, Envelopes and Barbed Wire. In Proceedings of the Functional Programming Languages and Computer Architecture. 124–144. doi:10.1007/3540543961_7

  22. [22]

    Anders Møller and Oskar Haarklou Veileborg. 2020. Eliminating Ab- straction Overhead of Java Stream Pipelines Using Ahead-of-Time Pro- gram Optimization. InProceedings of the Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). 1–29. doi:10.1145/3428236

  23. [23]

    Derek Gordon Murray, Michael Isard, and Yuan Yu. 2011. Steno: Au- tomatic Optimization of Declarative Queries. InProceedings of the 32md Conference on Programming Language Design and Implementa- tion. 121–131. doi:10.1145/1993498.1993513

  24. [24]

    Erik Poll and Simon Thompson. 1999. Algebra of Programming by Richard Bird and Oege De Moor.Journal of Functional Programming 9, 3 (1999), 347–354. doi:10.1017/S095679689922326X

  25. [25]

    Michael Reif, Florian Kübler, Dominik Helm, Ben Hermann, Michael Eichberg, and Mira Mezini. 2020. TACAI: An Intermediate Repre- sentation Based on Abstract Interpretation. InProceedings of the 9th SIGPLAN International Workshop on the State of the Art in Program Analysis. 2–7. doi:10.1145/3394451.3397204

  26. [26]

    Francisco José Torres Ribeiro. 2018. Java Stream Optimization Through Program Fusion.https://hdl.handle.net/1822/59688. [Online; accessed 23-07-2025]

  27. [27]

    John R. Rose. 2009. Bytecodes Meet Combinators: Invokedynamic on the JVM. InProceedings of the Third Workshop on Virtual Machines and Intermediate Languages. 1–11. doi:10.1145/1711506.1711508

  28. [28]

    Joanna C. S. Santos and Julian Dolby. 2022. Program Analysis Using WALA. InProceedings of the 30th Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

  29. [29]

    doi:10.1145/3540250.3569449

  30. [30]

    Akihiko Takano and Erik Meijer. 1995. Shortcut Deforestation in Calculational Form. InProceedings of the 7th International Confer- ence on Functional Programming Languages and Computer Architecture. 306–313. doi:10.1145/224164.224221

  31. [31]

    Tian Tan and Yue Li. 2023. Tai-e: A Developer-Friendly Static Analysis Framework for Java by Harnessing the Good Designs of Classics. In Proceedings of the 32nd International Symposium on Software Testing and Analysis. 1093–1105. doi:10.1145/3597926.3598120

  32. [32]

    Raja Vallée-Rai, Phong Co, Etienne Gagnon, Laurie Hendren, Patrick Lam, and Vijay Sundaresan. 1999. Soot: A Java Bytecode Optimization Framework. InProceedings of the Conference of the Centre for Advanced Studies on Collaborative Research (CASCON’99). 214–224. doi:10.5555/ 781995.782008

  33. [33]

    Philip Wadler. 1990. Deforestation: Transforming Programs to Elim- inate Trees.Theoretical Computer Science73, 2 (1990), 231–248. doi:10.1016/0304-3975(90)90147-A Received 2026-02-10; accepted 2026-04-15