arxiv: 2605.02691 · v1 · submitted 2026-05-04 · 💻 cs.PL

Recognition: unknown

Compile-Time Java Stream Fusion via mapMulti

Yegor Bugayenko , Maxim Trunnikov , Vladimir Zakharov

Authors on Pith no claims yet

Pith reviewed 2026-05-08 01:38 UTC · model grok-4.3

classification 💻 cs.PL

keywords Java Stream APIstream fusionmapMulticompile-time optimizationbytecode transformationperformanceintermediate objects

0 comments

The pith

A bytecode optimizer fuses chains of Java map and filter calls into one mapMulti to cut intermediate objects while preserving the stream programming model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes a compile-time tool that rewrites Java Stream pipelines by collapsing consecutive map and filter operations into a single mapMulti call introduced in Java 16. This fusion removes the performance cost of creating temporary stream objects without converting the entire pipeline into imperative loops. The approach sidesteps restrictions in earlier optimizers around variable assignment, object escaping in lambdas, and stream passing. Tests on nine benchmarks showed better or equal speed in most cases, and the tool processed the full bytecode of a large real-world system without altering test outcomes. If the transformation holds, stream-based code can stay expressive yet run closer to hand-written loops.

Core claim

The central claim is that merging successive map and filter stages into a single mapMulti invocation at the bytecode level achieves stream fusion for Java, delivering performance gains comparable to loop unrolling while avoiding the semantic restrictions of prior methods.

What carries the argument

The mapMulti operation, which lets one element produce zero or more output elements inside a single pipeline stage, acts as the fusion target that absorbs preceding map and filter logic.

If this is right

Stream pipelines can be written with multiple operations yet execute with fewer object allocations.
The same bytecode-level pass works on already-compiled libraries without source changes.
Cases that break loop-unrolling optimizers, such as streams stored in variables, remain fusable.
A large existing codebase can be optimized and still pass its full test suite.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar fusion patterns could be applied to other languages that expose a multi-output operation analogous to mapMulti.
Automatic fusion might encourage wider adoption of stream APIs in performance-sensitive code.
Extending the analyzer to additional stream methods such as flatMap would increase the set of fusable pipelines.

Load-bearing premise

The mapMulti rewrite must keep the exact original stream behavior for every pipeline the analyzer selects.

What would settle it

Apply the optimizer to a stream pipeline that assigns an intermediate stream to a variable or passes it through a method call, then compare the final output and side effects against the unoptimized version.

Figures

Figures reproduced from arXiv: 2605.02691 by Maxim Trunnikov, Vladimir Zakharov, Yegor Bugayenko.

**Figure 1.** Figure 1: Two variants of computing sums of even squares suggested by Møller and Veileborg [22] to demonstrate that the usage of the Stream API degrades performance. Modern JVMs handle streams better than their earlier versions. We reproduced the experiment of Møller and Veileborg [22] view at source ↗

**Figure 2.** Figure 2: Java class after compilation and disassembly to 𝜑-expression. The Java compiler translates lambda expressions into invokedynamic and invokeinterface instruction pairs [27]. We rewrite these pairs into Φ.filter or Φ.map pragmas—𝜑-calculus formations that temporarily carry rewriting data but don’t exist in the final bytecode. Both Φ.filter and Φ.map are then unified into Φ.distill, an abstraction accepting a… view at source ↗

read the original abstract

The Java Stream API, introduced in Java 8, makes data processing more expressive and concise compared to imperative loops. However, this abstraction can come with significant performance overhead, often due to the creation of multiple intermediate objects during pipeline execution. In functional languages such as Haskell, this problem is addressed through stream fusion, a compile-time optimization that eliminates unnecessary intermediate structures. Inspired by this idea, Streamliner was the first tool to perform ahead-of-time, bytecode-to-bytecode stream optimization for Java by unrolling stream pipelines into imperative loops. In this paper, we introduce an open-source optimizer that takes a different approach. Instead of unrolling streams into loops, it merges consecutive map() and filter() operations into a single mapMulti() call, available since Java 16. Our method avoids several limitations of Streamliner, including its sensitivity to escaping objects in lambda expressions and its restrictions on assigning or passing streams as variables. We evaluated our optimizer on nine benchmarks and observed superior performance in two cases and comparable results in most others. We also applied it to the bytecode of Apache Kafka, successfully executing all 31,799 unit tests without failures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a bytecode optimizer that fuses Java map/filter streams into mapMulti calls, sidestepping some Streamliner limits with decent real-world checks but thin quantitative detail.

read the letter

The main takeaway is a new compile-time tool that rewrites chains of map and filter operations into single mapMulti calls at the bytecode level. This is positioned as an alternative to Streamliner's unrolling approach, and it claims to work in more situations, such as when streams are stored in variables or when lambdas let objects escape. The authors report that their optimizer ran on nine benchmarks, outperforming the baseline in two cases and matching it in most others. They also applied it to Apache Kafka bytecode and confirmed that all 31,799 unit tests still passed. That last result is a useful practical signal that the changes did not cause obvious breakage on a large codebase. The paper does a clear job laying out the motivation around intermediate object overhead in the Stream API and why mapMulti fusion can reduce it without turning everything into loops. The shift to mapMulti is a genuine technical difference from the cited prior work. The soft spots are mostly around evidence and assumptions. The abstract supplies no concrete speedup numbers or descriptions of the benchmark workloads, so the size of the improvement is hard to judge. The central claim depends on the static analysis correctly selecting only safe pipelines and the fused mapMulti producing identical observable behavior, including side effects, exceptions, and ordering. The Kafka test suite is reassuring but does not cover stateful lambdas, parallel streams, or every edge case the analysis might encounter. Without more on the analysis rules or additional verification, the semantic preservation part stays under-supported. This paper is for readers working on Java performance tools or language implementations who want an example of a different fusion tactic. Someone already following Streamliner or similar optimizers would find the approach worth seeing. I would send it to peer review. The idea is distinct enough and the Kafka result is a positive sign, but it needs tighter numbers and more discussion of correctness to hold up under scrutiny.

Referee Report

2 major / 1 minor

Summary. The paper presents a compile-time, bytecode-to-bytecode optimizer for Java Streams that fuses sequences of map() and filter() operations into single mapMulti() calls (available since Java 16) rather than unrolling pipelines into imperative loops as in the prior Streamliner tool. It claims this avoids Streamliner's limitations regarding escaping objects in lambdas and restrictions on stream variable assignments or passing. Evaluation consists of nine benchmarks (superior performance in two cases, comparable in most) plus successful execution of all 31,799 unit tests on the transformed bytecode of Apache Kafka.

Significance. If semantic preservation can be established more rigorously, the work supplies a practical, source-compatible alternative to loop-unrolling stream fusion for Java that leverages a standard library primitive. The Kafka-scale validation on a production codebase is a concrete strength, demonstrating that the approach can be applied to large real-world code without immediate breakage.

major comments (2)

[Evaluation] The evaluation reports only qualitative outcomes (superior in two of nine benchmarks, comparable in most) with no quantitative speedup factors, no description of the benchmark workloads, no baseline details (unoptimized streams vs. Streamliner), and no measurement methodology. This gap directly weakens the performance-claim component of the central thesis.
[Implementation and Semantics] Semantic preservation of the mapMulti fusion is load-bearing for the entire contribution, yet the only evidence is the Kafka test-suite result. The paper does not discuss or test cases involving stateful lambdas, side-effecting operations, parallel streams, exception semantics, or order-dependent behavior that could differ when collapsing intermediate Stream objects; the static analysis rules for selecting fusable pipelines are also not shown to be sound on the full space of patterns.

minor comments (1)

[Abstract and Evaluation] The abstract and evaluation would be clearer if they explicitly stated the Java version targeted and whether the optimizer handles short-circuiting operations or terminal operations beyond the fused maps/filters.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed comments, which highlight important areas for improvement in our presentation of the evaluation and semantic arguments. We will revise the manuscript to strengthen these sections while preserving the core contribution of a practical bytecode-to-bytecode optimizer using mapMulti.

read point-by-point responses

Referee: [Evaluation] The evaluation reports only qualitative outcomes (superior in two of nine benchmarks, comparable in most) with no quantitative speedup factors, no description of the benchmark workloads, no baseline details (unoptimized streams vs. Streamliner), and no measurement methodology. This gap directly weakens the performance-claim component of the central thesis.

Authors: We agree that the evaluation section is insufficiently detailed. In the revised version we will report concrete speedup factors (or slowdowns) for all nine benchmarks, provide full descriptions of each workload including input sizes and operations performed, specify the exact baselines (unoptimized streams and Streamliner where applicable), and document the measurement methodology including hardware, JVM version, number of iterations, warm-up strategy, and statistical reporting. revision: yes
Referee: [Implementation and Semantics] Semantic preservation of the mapMulti fusion is load-bearing for the entire contribution, yet the only evidence is the Kafka test-suite result. The paper does not discuss or test cases involving stateful lambdas, side-effecting operations, parallel streams, exception semantics, or order-dependent behavior that could differ when collapsing intermediate Stream objects; the static analysis rules for selecting fusable pipelines are also not shown to be sound on the full space of patterns.

Authors: We accept that the current evidence for semantic preservation is primarily empirical via the Kafka suite. The revision will add a new subsection that explicitly discusses stateful lambdas, side effects, parallel streams, exceptions, and ordering. We will clarify that the optimizer targets only stateless map/filter pipelines and relies on mapMulti's standard-library semantics to preserve observable behavior for those cases; parallel streams are currently out of scope and will be stated as such. The static analysis rules will be presented in full, accompanied by informal soundness reasoning based on behavioral equivalence for supported patterns and additional micro-benchmarks exercising the listed edge cases. A machine-checked formal proof over the entire Java language is outside the scope of this practical engineering paper. revision: partial

standing simulated objections not resolved

A complete formal proof of soundness for the static analysis rules across the full space of possible Java stream usage patterns

Circularity Check

0 steps flagged

No circularity; empirical claims rest on direct implementation and external testing

full rationale

The paper describes an engineering implementation that rewrites stream pipelines into mapMulti calls at the bytecode level, then reports benchmark timings and successful execution of Kafka's 31,799 unit tests. No equations, fitted parameters, uniqueness theorems, or derivation steps appear in the provided text. Claims of semantic preservation and performance improvement are supported solely by the external test corpus and direct measurement rather than by any self-referential reduction or self-citation chain. The central precondition (that the chosen transformations leave observable behavior unchanged) is an engineering assumption verified by the Kafka suite, not a logical step that collapses into the method's own definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that mapMulti fusion is semantically equivalent to the original map/filter pipeline and that the bytecode transformer can reliably detect applicable cases.

axioms (1)

domain assumption Java Stream API semantics are preserved when consecutive map and filter operations are replaced by a single mapMulti call
The optimizer's correctness depends on this equivalence holding for the patterns it rewrites.

pith-pipeline@v0.9.0 · 5498 in / 1375 out tokens · 76914 ms · 2026-05-08T01:38:44.975584+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

33 extracted references · 29 canonical work pages

[1]

Matteo Basso, Filippo Schiavio, Andrea Rosà, and Walter Binder. 2022. Optimizing Parallel Java Streams. InProceedings of the 26th Interna- tional Conference on Engineering of Complex Computer Systems. 23–32. doi:10.1109/ICECCS54210.2022.00012

work page doi:10.1109/iceccs54210.2022.00012 2022
[2]

Aggelos Biboudis, Nick Palladinos, and Yannis Smaragdakis. 2014. Clash of the Lambdas. arXiv:1406.6631 [cs.PL]

work page arXiv 2014
[3]

R. S. Bird. 1989. Algebraic Identities for Program Calculation.The Computer Journal32, 2 (1 1989), 122–126. doi:10.1093/comjnl/32.2.122

work page doi:10.1093/comjnl/32.2.122 1989
[4]

Yegor Bugayenko and Maxim Trunnikov. 2021. 𝜑-Calculus: Object-Oriented Formalism. arXiv:2111.13384 [cs.PL]

work page arXiv 2021
[5]

Yegor Bugayenko, Maxim Trunnikov, and Vladimir Zakharov. 2026. Compile-Time Java Stream Fusion via mapMulti. doi:10.5281/zenodo .19837769

work page doi:10.5281/zenodo 2026
[6]

Wei-Ngan Chin. 1992. Safe Fusion of Functional Expressions. InPro- ceedings of the Conference on LISP and Functional Programming. 11–20. doi:10.1145/141471.141494

work page doi:10.1145/141471.141494 1992
[7]

Duncan Coutts, Roman Leshchinskiy, and Don Stewart. 2007. Stream Fusion: From Lists to Streams to Nothing at All.SIGPLAN Notices42, 9 (10 2007), 315–326. doi:10.1145/1291220.1291199

work page doi:10.1145/1291220.1291199 2007
[8]

Gilles Duboscq, Lukas Stadler, Thomas Würthinger, Doug Simon, Christian Wimmer, and Hanspeter Mössenböck. 2013. Graal IR: An Extensible Declarative Intermediate Representation

2013
[9]

Michael Eichberg and Ben Hermann. 2014. A Software Product Line for Static Analyses: The OPAL Framework. InProceedings of the 3rd SIGPLAN International Workshop on the State of the Art in Java Program Analysis. 1–6. doi:10.1145/2614628.2614630

work page doi:10.1145/2614628.2614630 2014
[10]

Andy Georges, Dries Buytaert, and Lieven Eeckhout. 2007. Statistically Rigorous Java Performance Evaluation.ACM SIGPLAN Notices42, 10 (2007), 57–76. doi:10.1145/1297105.1297033 Compile-Time Java Stream Fusion viamapMultiSOAP ’26, June 15–19, 2026, Boulder, CO, USA

work page doi:10.1145/1297105.1297033 2007
[11]

Peyton Jones

Andrew Gill, John Launchbury, and Simon L. Peyton Jones. 1993. A Short Cut to Deforestation. InProceedings of the Conference on Func- tional Programming Languages and Computer Architecture. 223–232. doi:10.1145/165180.165214

work page doi:10.1145/165180.165214 1993
[12]

G. W. Hamilton. 2002. Extending Higher-Order Deforestation: Trans- forming Programs to Eliminate Even More Trees

2002
[13]

Ralf Hinze, Thomas Harper, and Daniel W. H. James. 2011. Theory and Practice of Fusion. InProceedings of the Implementation and Application of Functional Languages. 19–37. doi:10.1007/978-3-642-24276-2_2

work page doi:10.1007/978-3-642-24276-2_2 2011
[14]

Martin Hirzel, Robert Soulé, Scott Schneider, Buğra Gedik, and Robert Grimm. 2014. A Catalog of Stream Processing Optimizations.ACM Computing Surveys46, 4 (2014), 1–34. doi:10.1145/2528412

work page doi:10.1145/2528412 2014
[15]

Ruyi Ji, Yuwei Zhao, Nadia Polikarpova, Yingfei Xiong, and Zhenjiang Hu. 2024. Superfusion: Eliminating Intermediate Data Structures via Inductive Synthesis.Proceedings of ACM Programmning Languages8, 1 (6 2024). doi:10.1145/3656415

work page doi:10.1145/3656415 2024
[16]

Raffi Khatchadourian, Yiming Tang, and Mehdi Bagherzadeh. 2020. Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams.Science of Computer Programming195, 1 (2020). doi:10.1016/j. scico.2020.102476

work page doi:10.1016/j 2020
[17]

Raffi Khatchadourian, Yiming Tang, Mehdi Bagherzadeh, and Baishakhi Ray. 2020. An Empirical Study on the Use and Misuse of Java 8 Streams. InProceedings of the International Conference on Fun- damental Approaches to Software Engineering. 97–118. doi:10.1007/978- 3-030-45234-6_5

work page doi:10.1007/978- 2020
[18]

Oleg Kiselyov, Aggelos Biboudis, Nick Palladinos, and Yannis Smarag- dakis. 2017. Stream Fusion, to Completeness.SIGPLAN Notices52, 1 (1 2017), 285–299. doi:10.1145/3093333.3009880

work page doi:10.1145/3093333.3009880 2017
[19]

Oleg Kiselyov, Tomoaki Kobayashi, and Nick Palladinos. 2024. Complete Fusion for Stateful Streams: Equational Theory of Stateful Streams and Fusion as Normalization-by-Evaluation. arXiv:2412.15768 [cs.PL]

work page arXiv 2024
[20]

Davood Mazinanian, Ameya Ketkar, Nikolaos Tsantalis, and Danny Dig. 2017. Understanding the Use of Lambda Expressions in Java. Proceedings of the ACM on Programming Languages1, 1 (2017), 1–31. doi:10.1145/3133909

work page doi:10.1145/3133909 2017
[21]

Erik Meijer, Maarten Fokkinga, and Ross Paterson. 1991. Functional Programming With Bananas, Lenses, Envelopes and Barbed Wire. In Proceedings of the Functional Programming Languages and Computer Architecture. 124–144. doi:10.1007/3540543961_7

work page doi:10.1007/3540543961_7 1991
[22]

Anders Møller and Oskar Haarklou Veileborg. 2020. Eliminating Ab- straction Overhead of Java Stream Pipelines Using Ahead-of-Time Pro- gram Optimization. InProceedings of the Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). 1–29. doi:10.1145/3428236

work page doi:10.1145/3428236 2020
[23]

Derek Gordon Murray, Michael Isard, and Yuan Yu. 2011. Steno: Au- tomatic Optimization of Declarative Queries. InProceedings of the 32md Conference on Programming Language Design and Implementa- tion. 121–131. doi:10.1145/1993498.1993513

work page doi:10.1145/1993498.1993513 2011
[24]

Erik Poll and Simon Thompson. 1999. Algebra of Programming by Richard Bird and Oege De Moor.Journal of Functional Programming 9, 3 (1999), 347–354. doi:10.1017/S095679689922326X

work page doi:10.1017/s095679689922326x 1999
[25]

Michael Reif, Florian Kübler, Dominik Helm, Ben Hermann, Michael Eichberg, and Mira Mezini. 2020. TACAI: An Intermediate Repre- sentation Based on Abstract Interpretation. InProceedings of the 9th SIGPLAN International Workshop on the State of the Art in Program Analysis. 2–7. doi:10.1145/3394451.3397204

work page doi:10.1145/3394451.3397204 2020
[26]

Francisco José Torres Ribeiro. 2018. Java Stream Optimization Through Program Fusion.https://hdl.handle.net/1822/59688. [Online; accessed 23-07-2025]

2018
[27]

John R. Rose. 2009. Bytecodes Meet Combinators: Invokedynamic on the JVM. InProceedings of the Third Workshop on Virtual Machines and Intermediate Languages. 1–11. doi:10.1145/1711506.1711508

work page doi:10.1145/1711506.1711508 2009
[28]

Joanna C. S. Santos and Julian Dolby. 2022. Program Analysis Using WALA. InProceedings of the 30th Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

2022
[29]

doi:10.1145/3540250.3569449

work page doi:10.1145/3540250.3569449
[30]

Akihiko Takano and Erik Meijer. 1995. Shortcut Deforestation in Calculational Form. InProceedings of the 7th International Confer- ence on Functional Programming Languages and Computer Architecture. 306–313. doi:10.1145/224164.224221

work page doi:10.1145/224164.224221 1995
[31]

Tian Tan and Yue Li. 2023. Tai-e: A Developer-Friendly Static Analysis Framework for Java by Harnessing the Good Designs of Classics. In Proceedings of the 32nd International Symposium on Software Testing and Analysis. 1093–1105. doi:10.1145/3597926.3598120

work page doi:10.1145/3597926.3598120 2023
[32]

Raja Vallée-Rai, Phong Co, Etienne Gagnon, Laurie Hendren, Patrick Lam, and Vijay Sundaresan. 1999. Soot: A Java Bytecode Optimization Framework. InProceedings of the Conference of the Centre for Advanced Studies on Collaborative Research (CASCON’99). 214–224. doi:10.5555/ 781995.782008

work page arXiv 1999
[33]

Philip Wadler. 1990. Deforestation: Transforming Programs to Elim- inate Trees.Theoretical Computer Science73, 2 (1990), 231–248. doi:10.1016/0304-3975(90)90147-A Received 2026-02-10; accepted 2026-04-15

work page doi:10.1016/0304-3975(90)90147-a 1990