Recognition: unknown
Presynthesis: Towards Scaling Up Program Synthesis with Finer-Grained Abstract Semantics
Pith reviewed 2026-05-10 13:14 UTC · model grok-4.3
The pith
An offline presynthesis phase builds a tree automaton and oracle so that finer-grained abstract semantics can accelerate program synthesis without prohibitive pruning costs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given any DSL equipped with abstract semantics, an offline presynthesis phase first builds a tree automaton A whose runs correspond exactly to program executions under those semantics; it then extracts an oracle O that, for any input-output example, returns precisely the set of DSL programs satisfying the example under the abstract semantics. During online synthesis the oracle replaces expensive per-program abstract evaluation, so the synthesis engine can safely adopt finer abstractions and still finish faster overall.
What carries the argument
The presynthesis pipeline that first materializes a tree automaton capturing abstract program behavior and then materializes an oracle enabling constant-time satisfaction checks against examples.
If this is right
- Synthesis latency drops when finer abstract domains are used because pruning cost no longer grows with abstraction precision.
- The same presynthesis technique can be instantiated for any DSL that already possesses abstract semantics.
- Sound pruning becomes available without enumerating candidate programs or evaluating them one by one.
- The three evaluated domains (SQL, strings, matrices) each obtain higher success rates or lower timeouts than previous synthesizers.
Where Pith is reading between the lines
- The method could be combined with existing enumerative or neural synthesizers to supply cheap, sound filters at any granularity.
- If the oracle construction itself can be incremental, presynthesis might support live, continuously refined abstractions inside an IDE.
- The automaton representation may also supply debugging information about why a program was pruned, an aspect left implicit in the current work.
- Domains whose abstract semantics are already expensive to evaluate stand to gain the largest speed-ups once an oracle is precomputed.
Load-bearing premise
The one-time cost of building the automaton and oracle is amortized across many synthesis queries and the oracle's pruning decisions are both sound and fast enough to more than offset the modeling work.
What would settle it
Run the full pipeline (presynthesis plus synthesis) on the paper's benchmarks and compare total wall-clock time against the prior state-of-the-art baselines; if the new method is not faster on at least two of the three domains, the central claim does not hold.
Figures
read the original abstract
Abstract semantics has proven to be instrumental for accelerating search-based program synthesis, by enabling the sound pruning of a set of incorrect programs (without enumerating them). One may expect faster synthesis with increasingly finer-grained abstract semantics. Unfortunately, to the best of our knowledge, this is not the case, yet. The reason is because, as abstraction granularity increases -- while fewer programs are enumerated -- pruning becomes more costly. This imposes a fundamental limit on the overall synthesis performance, which we aim to address in this work. Our key idea is to introduce an offline presynthesis phase, which consists of two steps. Given a DSL with abstract semantics, the first semantics modeling step constructs a tree automaton A for a space of inputs -- such that, for any program P and for any considered input I, A has a run that corresponds to P's execution on I under abstract semantics. Then, the second step builds an oracle O for A. This O enables fast pruning during synthesis, by allowing us to efficiently find exactly those DSL programs that satisfy a given input-output example under abstract semantics. We have implemented this presynthesis-based synthesis paradigm in a framework, Foresighter. On top of it, we have developed three instantiations for SQL, string transformation, and matrix manipulation. All of them significantly outperform prior work in the respective domains.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a presynthesis paradigm for program synthesis that performs an offline phase to construct a tree automaton A representing abstract executions of DSL programs over a space of inputs, followed by building an oracle O that enables fast, sound pruning of incorrect programs during online synthesis. This is realized in the Foresighter framework and instantiated for three domains (SQL, string transformations, and matrix manipulation), with the claim that all three significantly outperform prior work.
Significance. If the offline presynthesis costs amortize over multiple queries and the oracle provides strictly faster yet sound pruning decisions than on-the-fly abstract interpretation, the approach could remove a key scalability barrier for finer-grained abstractions in search-based synthesis. The engineering separation of semantics modeling from synthesis queries is a concrete contribution that, if empirically validated, would be of interest to the program synthesis community.
major comments (2)
- [Abstract] Abstract: the claim that 'all of them significantly outperform prior work' is presented without any quantitative data on synthesis times, success rates, automaton sizes, oracle query latencies, or break-even query counts; this is load-bearing for the central performance thesis and must be substantiated with concrete experimental results (e.g., in the evaluation section).
- [Abstract] Abstract: the description states that O 'enables fast pruning' and that presynthesis time 'is amortized,' yet supplies no complexity bounds, construction-time measurements, or comparison showing that oracle lookups are cheaper than standard abstract interpretation; without this analysis the amortization prerequisite for outperformance remains unverified.
minor comments (1)
- [Abstract] Abstract: the phrase 'a space of inputs' is used without specifying how the space is chosen or bounded, which directly affects both automaton size and pruning completeness.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and indicate the revisions we will make to strengthen the presentation of our results.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim that 'all of them significantly outperform prior work' is presented without any quantitative data on synthesis times, success rates, automaton sizes, oracle query latencies, or break-even query counts; this is load-bearing for the central performance thesis and must be substantiated with concrete experimental results (e.g., in the evaluation section).
Authors: The evaluation section of the manuscript reports the requested quantitative results for all three domains, including synthesis times, success rates, automaton sizes, and direct comparisons to prior work that demonstrate the claimed outperformance. To address the concern that the abstract itself does not display these numbers, we will revise the abstract to include a concise summary of the key empirical findings (e.g., speedups and break-even query counts). revision: yes
-
Referee: [Abstract] Abstract: the description states that O 'enables fast pruning' and that presynthesis time 'is amortized,' yet supplies no complexity bounds, construction-time measurements, or comparison showing that oracle lookups are cheaper than standard abstract interpretation; without this analysis the amortization prerequisite for outperformance remains unverified.
Authors: The full manuscript details the offline construction of the tree automaton A and oracle O and presents empirical evidence from the Foresighter instantiations that oracle queries are faster than on-the-fly abstract interpretation in practice, with amortization occurring after a modest number of synthesis queries. We agree that the abstract would benefit from explicit support for these claims. We will revise the abstract to reference the complexity of oracle lookups and include construction-time and latency comparisons drawn from the evaluation. revision: yes
Circularity Check
No circularity; presynthesis technique is an independent engineering contribution
full rationale
The paper introduces an offline presynthesis phase that builds a tree automaton A and oracle O from a DSL's abstract semantics, then uses them for pruning in synthesis. No equations, fitted parameters, or self-citations appear in the derivation; the central claim of outperformance by the three instantiations (SQL, string, matrix) is presented as an empirical engineering result rather than a mathematical reduction to prior inputs. The amortization and pruning-speed assumptions are stated as prerequisites but are not derived from self-referential steps.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Abstract semantics for a DSL can be faithfully represented by a tree automaton over input spaces.
invented entities (2)
-
Tree automaton A
no independent evidence
-
Oracle O
no independent evidence
Reference graph
Works this paper leans on
-
[1]
https://github.com/OutSystems/CUBES
CUBES tool. https://github.com/OutSystems/CUBES
-
[2]
https://en.wikipedia.org/wiki/Elliptic_curve_point_multiplication
Elliptic curve point multiplication. https://en.wikipedia.org/wiki/Elliptic_curve_point_multiplication
-
[3]
https://en.wikipedia.org/wiki/Rainbow_table
Rainbow table (wiki). https://en.wikipedia.org/wiki/Rainbow_table
-
[4]
https://sygus-org.github.io/
Syntax-Guided Synthesis website. https://sygus-org.github.io/
-
[5]
Rajeev Alur, Rastislav Bodik, Garvit Juniwal, Milo M. K. Martin, Mukund Raghothaman, Sanjit A. Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. 2013. Syntax-guided synthesis. In2013 Formal Methods in Computer-Aided Design. 1–8. doi:10.1109/FMCAD.2013.6679385
-
[6]
Ricardo Brancas, Miguel Terra-Neves, Miguel Ventura, Vasco Manquinho, and Ruben Martins. 2024. Towards reliable SQL synthesis: Fuzzing-based evaluation and disambiguation. InInternational Conference on Fundamental Approaches to Software Engineering. Springer Nature Switzerland Cham, 232–254
2024
-
[7]
Ricardo Brancas, Miguel Terra-Neves, Miguel Ventura, Vasco Manquinho, and Ruben Martins. 2025. CUBES: A Parallel Synthesizer for SQL Using Examples.Form. Asp. Comput.(Sept. 2025). doi:10.1145/3768578
-
[8]
Xin Chen, You Peng, Sibo Wang, and Jeffrey Xu Yu. 2022. DLCR: efficient indexing for label-constrained reachability queries on large dynamic graphs.Proceedings of the VLDB Endowment15, 8 (2022), 1645–1657
2022
-
[9]
Edith Cohen, Eran Halperin, Haim Kaplan, and Uri Zwick. 2003. Reachability and distance queries via 2-hop labels. SIAM J. Comput.32, 5 (2003), 1338–1355
2003
-
[10]
Hubert Comon, Max Dauchet, Rémi Gilleron, Florent Jacquemard, Denis Lugiez, Christof Löding, Sophie Tison, and Marc Tommasi. 2008. Tree automata techniques and applications
2008
-
[11]
Patrick Cousot and Radhia Cousot. 1977. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. InProceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages. ACM Press, Los Angeles, California, 238–252. doi:10.1145/512950.512973
-
[12]
Rui Dong, Qingyue Wu, Danny Ding, Zheng Guo, Ruyi Ji, and Xinyu Wang. 2026.Foresighter (Presynthesis: Towards Scaling Up Program Synthesis with Finer-Grained Abstract Semantics). doi:10.5281/zenodo.19078478
-
[13]
Azadeh Farzan and Victor Nicolet. 2021. Counterexample-guided partial bounding for recursive function synthesis. In International Conference on Computer Aided Verification. Springer, 832–855
2021
-
[14]
Yu Feng, Ruben Martins, Jacob Van Geffen, Isil Dillig, and Swarat Chaudhuri. 2017. Component-based synthesis of table consolidation and transformation tasks from examples.ACM SIGPLAN Notices52, 6 (2017), 422–436
2017
-
[15]
Yu Feng, Ruben Martins, Yuepeng Wang, Isil Dillig, and Thomas W Reps. 2017. Component-based synthesis for complex APIs. InProceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages. 599–612
2017
-
[16]
John K Feser, Swarat Chaudhuri, and Isil Dillig. 2015. Synthesizing data structure transformations from input-output examples.ACM SIGPLAN Notices50, 6 (2015), 229–239. doi:10.1145/2737924.2737977
-
[17]
Sumit Gulwani. 2011. Automating string processing in spreadsheets using input-output examples.ACM Sigplan Notices 46, 1 (2011), 317–330. doi:10.1145/1926385.1926423
-
[18]
Sumit Gulwani, Oleksandr Polozov, Rishabh Singh, et al . 2017. Program synthesis.Foundations and Trends®in Programming Languages4, 1-2 (2017), 1–119
2017
-
[19]
Zheng Guo, Michael James, David Justo, Jiaxiao Zhou, Ziteng Wang, Ranjit Jhala, and Nadia Polikarpova. 2019. Program synthesis by type-guided abstraction refinement.Proceedings of the ACM on Programming Languages4, POPL (2019), 1–28
2019
-
[20]
Sankha Narayan Guria, Jeffrey S Foster, and David Van Horn. 2023. Absynthe: Abstract interpretation-guided synthesis. Proceedings of the ACM on Programming Languages7, PLDI (2023), 1584–1607. Proc. ACM Program. Lang., Vol. 10, No. PLDI, Article 210. Publication date: June 2026. 210:24 Rui Dong, Qingyue Wu, Danny Ding, Zheng Guo, Ruyi Ji, and Xinyu Wang
2023
-
[21]
Tihomir Gvero, Viktor Kuncak, Ivan Kuraj, and Ruzica Piskac. 2013. Complete completion using types and weights. In Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation. 27–38
2013
- [22]
-
[23]
Ruoming Jin, Yang Xiang, Ning Ruan, and Haixun Wang. 2008. Efficiently answering reachability queries on very large directed graphs. InProceedings of the 2008 ACM SIGMOD international conference on Management of data. 595–608
2008
-
[24]
Keith JC Johnson, Rahul Krishnan, Thomas Reps, and Loris D’Antoni. 2024. Automating Pruning in Top-Down Enumeration for Program Synthesis Problems with Monotonic Semantics.Proceedings of the ACM on Programming Languages8, OOPSLA2 (2024), 935–961
2024
-
[25]
Rajeev Joshi, Greg Nelson, and Keith Randall. 2002. Denali: A goal-directed superoptimizer.ACM SIGPLAN Notices37, 5 (2002), 304–314
2002
-
[26]
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling laws for neural language models.arXiv preprint arXiv:2001.08361(2020)
work page internal anchor Pith review Pith/arXiv arXiv 2020
-
[27]
James Koppel, Zheng Guo, Edsko De Vries, Armando Solar-Lezama, and Nadia Polikarpova. 2022. Searching entangled program spaces.Proceedings of the ACM on Programming Languages6, ICFP (2022), 23–51. doi:10.1145/3547622
-
[28]
Woosuk Lee. 2021. Combining the top-down propagation and bottom-up enumeration for inductive program synthesis. Proceedings of the ACM on Programming Languages5, POPL (2021), 1–28. doi:10.1145/3434335
-
[29]
Xiang Li, Xiangyu Zhou, Rui Dong, Yihong Zhang, and Xinyu Wang. 2024. Efficient bottom-up synthesis for programs with local variables.Proceedings of the ACM on Programming Languages8, POPL (2024), 1540–1568
2024
-
[30]
Yuanbo Li, Kris Satya, and Qirun Zhang. 2022. Efficient algorithms for dynamic bidirected dyck-reachability.Proceedings of the ACM on Programming Languages6, POPL (2022), 1–29
2022
-
[31]
Yuanbo Li, Qirun Zhang, and Thomas Reps. 2021. On the complexity of bidirected interleaved Dyck-reachability. Proceedings of the ACM on Programming Languages5, POPL (2021), 1–28
2021
-
[32]
Yuanbo Li, Qirun Zhang, and Thomas Reps. 2022. Fast graph simplification for interleaved-dyck reachability.ACM Transactions on Programming Languages and Systems (TOPLAS)44, 2 (2022), 1–28
2022
-
[33]
David Mandelin, Lin Xu, Rastislav Bodík, and Doug Kimelman. 2005. Jungloid mining: helping to navigate the API jungle.ACM Sigplan Notices40, 6 (2005), 48–61
2005
-
[34]
Stephen Mell, Steve Zdancewic, and Osbert Bastani. 2024. Optimal Program Synthesis via Abstract Interpretation. Proc. ACM Program. Lang.8, POPL, Article 16 (Jan. 2024), 25 pages. doi:10.1145/3632858
-
[35]
Anders Miltner, Adrian Trejo Nuñez, Ana Brendel, Swarat Chaudhuri, and Isil Dillig. 2022. Bottom-up synthesis of recursive functional programs using angelic execution.Proceedings of the ACM on Programming Languages6, POPL (2022), 1–29. doi:10.1145/3498682
-
[36]
Pedro Orvalho, Miguel Terra-Neves, Miguel Ventura, Ruben Martins, and Vasco Manquinho. 2020. SQUARES: A SQL synthesizer using query reverse engineering.Proceedings of the VLDB Endowment13, 12 (2020), 2853–2856
2020
-
[37]
Mihai Patrascu and Liam Roditty. 2010. Distance oracles beyond the Thorup-Zwick bound. In2010 IEEE 51st Annual Symposium on Foundations of Computer Science. IEEE, 815–823
2010
-
[38]
You Peng, Ying Zhang, Xuemin Lin, Lu Qin, and Wenjie Zhang. 2020. Answering billion-scale label-constrained reachability queries within microsecond.Proceedings of the VLDB Endowment13, 6 (2020), 812–825
2020
-
[39]
Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. 2016. Program synthesis from polymorphic refinement types.ACM SIGPLAN Notices51, 6 (2016), 522–538
2016
-
[40]
Oleksandr Polozov and Sumit Gulwani. 2015. Flashmeta: A framework for inductive program synthesis. InProceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications. 107–126
2015
-
[41]
Gokul Subramanian Ravi, Pranav Gokhale, Yi Ding, William Kirby, Kaitlin Smith, Jonathan M Baker, Peter J Love, Henry Hoffmann, Kenneth R Brown, and Frederic T Chong. 2022. CAFQA: A classical simulation bootstrap for variational quantum algorithms. InProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and...
2022
-
[42]
Alex Reinking and Ruzica Piskac. 2015. A type-directed approach to program repair. InInternational Conference on Computer Aided Verification. Springer, 511–517
2015
-
[43]
Sherif Sakr and Ghazi Al-Naymat. 2012. An overview of graph indexing and querying techniques.Graph Data Management: Techniques and Applications(2012), 71–88
2012
-
[44]
2008.Program synthesis by sketching
Armando Solar-Lezama. 2008.Program synthesis by sketching. University of California, Berkeley
2008
-
[45]
Armando Solar-Lezama, Liviu Tancau, Rastislav Bodik, Sanjit Seshia, and Vijay Saraswat. 2006. Combinatorial sketching for finite programs. InProceedings of the 12th international conference on Architectural support for programming languages and operating systems. 404–415
2006
-
[46]
Saurabh Srivastava, Sumit Gulwani, and Jeffrey S Foster. 2010. From program verification to program synthesis. In Proceedings of the 37th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 313–326. Proc. ACM Program. Lang., Vol. 10, No. PLDI, Article 210. Publication date: June 2026. Presynthesis 210:25
2010
-
[47]
Keita Takenouchi, Takashi Ishio, Joji Okada, and Yuji Sakata. 2021. PATSQL: efficient synthesis of SQL queries from example tables with quick inference of projected columns.Proc. VLDB Endow.14, 11 (July 2021), 1937–1949. doi:10.14778/3476249.3476253
-
[48]
Mikkel Thorup. 2004. Compact oracles for reachability and approximate distances in planar digraphs.Journal of the ACM (JACM)51, 6 (2004), 993–1024
2004
-
[49]
Mikkel Thorup and Uri Zwick. 2005. Approximate distance oracles.Journal of the ACM (JACM)52, 1 (2005), 1–24
2005
-
[50]
Quoc Trung Tran, Chee-Yong Chan, and Srinivasan Parthasarathy. 2009. Query by output. InProceedings of the 2009 ACM SIGMOD International Conference on Management of data. 535–548
2009
-
[51]
Lucien DJ Valstar, George HL Fletcher, and Yuichi Yoshida. 2017. Landmark indexing for evaluation of label-constrained reachability queries. InProceedings of the 2017 ACM International Conference on Management of Data. 345–358
2017
-
[52]
Chenglong Wang, Alvin Cheung, and Rastislav Bodik. 2017. Synthesizing highly expressive SQL queries from input-output examples. InProceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation. 452–466
2017
-
[53]
2019.An efficient programming-by-example framework
Xinyu Wang. 2019.An efficient programming-by-example framework. Ph. D. Dissertation
2019
-
[54]
Xinyu Wang, Greg Anderson, Isil Dillig, and Kenneth L McMillan. 2018. Learning Abstractions for Program Synthesis. InInternational Conference on Computer Aided Verification. Springer, 407–426
2018
-
[55]
Xinyu Wang, Isil Dillig, and Rishabh Singh. 2017. Program Synthesis using Abstraction Refinement.Proceedings of the ACM on Programming Languages2, POPL (2017), 1–30. doi:10.1145/3158151
-
[56]
Xinyu Wang, Isil Dillig, and Rishabh Singh. 2017. Synthesis of data completion scripts using finite tree automata. Proceedings of the ACM on Programming Languages1, OOPSLA (2017), 1–26. doi:10.1145/3133886
-
[57]
Yuepeng Wang, Xinyu Wang, and Isil Dillig. 2018. Relational program synthesis.Proceedings of the ACM on Programming Languages2, OOPSLA (2018), 1–27. doi:10.1145/3276525
-
[58]
Navid Yaghmazadeh, Xinyu Wang, and Isil Dillig. 2018. Automated migration of hierarchical data to relational tables using programming-by-example.VLDB11, 5 (2018), 580–593. doi:10.1145/3187009.3177735
-
[59]
Yongho Yoon, Woosuk Lee, and Kwangkeun Yi. 2023. Inductive Program Synthesis via Iterative Forward-Backward Abstract Interpretation.Proc. ACM Program. Lang.7, PLDI, Article 174 (June 2023), 25 pages. doi:10.1145/3591288 Proc. ACM Program. Lang., Vol. 10, No. PLDI, Article 210. Publication date: June 2026. 210:26 Rui Dong, Qingyue Wu, Danny Ding, Zheng Guo...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.