arxiv: 2605.20763 · v1 · pith:6XHSE5AQnew · submitted 2026-05-20 · 💻 cs.LG

ShapeBench: A Scalable Benchmark and Diagnostic Suite for Standardized Evaluation in Aerodynamic Shape Optimization

Pith reviewed 2026-05-21 06:29 UTC · model grok-4.3

classification 💻 cs.LG
keywords aerodynamic shape optimization · benchmark · optimizer evaluation · surrogate models · evolutionary algorithms · LLM-driven optimization · shape categories · fidelity gap

The pith

A new benchmark for aerodynamic shape optimization reveals that optimizer performance rankings change dramatically across different shape categories and problem types.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ShapeBench to address the lack of standardized evaluation in aerodynamic shape optimization. It provides a unified set of 103 tasks across eight shape categories with validated surrogates for efficient testing and options for high-fidelity checks. By including consistent baselines and a new LLM-based method, the benchmark shows that optimizer effectiveness does not transfer well between tasks. This matters because relying on results from single problems can lead to misleading conclusions about which methods work best in real applications. Researchers can now compare approaches more reliably and identify where general solutions are still needed.

Core claim

ShapeBench is an open-source benchmark with a unified API for 103 aerodynamic shape optimization tasks spanning eight shape categories and multiple optimization regimes. Each task comes with a validated surrogate model for fast search and, when feasible, a high-fidelity CFD pipeline for final verification. Under a consistent budget metric, comparisons of classical optimizers and LLM-driven methods, including the new ShapeEvolve baseline, show substantial variance in optimizer rankings, with a mean pairwise Spearman correlation of only 0.013. The authors conclude that single-task results do not generalize reliably across problem classes, and they report that classical methods are rarely applicable across all shape categories and tasks.
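To make the headline statistic concrete, here is a minimal sketch of one way a mean pairwise Spearman correlation could be computed. It assumes the mean is taken over pairs of tasks, with each task contributing a vector of final objective values (one per optimizer, lower is better); the page does not state the exact definition, so this pairing, the task names, and the scores are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """1-based ranks (lowest value gets rank 1); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho = Pearson correlation of the rank vectors (inputs must not be constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def mean_pairwise_spearman(scores_by_task):
    """Average rho over all unordered pairs of tasks (assumed definition)."""
    pairs = combinations(sorted(scores_by_task), 2)
    return mean(spearman(scores_by_task[a], scores_by_task[b]) for a, b in pairs)


# Hypothetical final objective values for four optimizers on three tasks.
scores = {
    "task_a": [1.0, 2.0, 3.0, 4.0],
    "task_b": [4.0, 3.0, 2.0, 1.0],  # ranking reversed relative to task_a
    "task_c": [1.0, 2.0, 3.0, 4.0],
}
print(mean_pairwise_spearman(scores))
```

In this toy input two task pairs disagree completely and one agrees completely, so the mean is negative; a value near zero, as reported for ShapeBench, would mean that knowing the ranking on one task says almost nothing about the ranking on another.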

What carries the argument

ShapeBench, a scalable benchmark suite that standardizes tasks, provides surrogates for search, and uses a matched-budget protocol to enable fair comparisons across optimizers and shape classes.
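A matched-budget protocol can be sketched as a wrapper that counts objective evaluations and stops every optimizer at the same count. The sketch below assumes the budget is measured in surrogate evaluations (the page only says "consistent budget metric"); the class, function names, toy objective, and both toy optimizers are hypothetical and are not ShapeBench's API.

```python
import random


class BudgetExceeded(Exception):
    """Raised when an optimizer asks for more evaluations than it was granted."""


class BudgetedObjective:
    """Wraps an objective, counts calls, and tracks the best value seen so far."""

    def __init__(self, fn, budget):
        self.fn = fn
        self.budget = budget
        self.calls = 0
        self.best = float("inf")

    def __call__(self, x):
        if self.calls >= self.budget:
            raise BudgetExceeded
        self.calls += 1
        value = self.fn(x)
        self.best = min(self.best, value)
        return value


def toy_surrogate(x):
    """Stand-in for a task surrogate (sphere function, lower is better)."""
    return sum(v * v for v in x)


def random_search(obj, dim, rng):
    while True:  # runs until the wrapper raises BudgetExceeded
        obj([rng.uniform(-1.0, 1.0) for _ in range(dim)])


def coordinate_search(obj, dim, rng):
    x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    fx = obj(x)
    step = 0.5
    while True:  # also stopped by the budget wrapper
        for i in range(dim):
            for delta in (step, -step):
                cand = list(x)
                cand[i] += delta
                fc = obj(cand)
                if fc < fx:
                    x, fx = cand, fc
        step *= 0.5


def run_matched(optimizers, fn, dim, budget, seed=0):
    """Give every optimizer the same evaluation budget and the same seed."""
    results = {}
    for name, optimizer in optimizers.items():
        obj = BudgetedObjective(fn, budget)
        try:
            optimizer(obj, dim, random.Random(seed))
        except BudgetExceeded:
            pass
        results[name] = (obj.best, obj.calls)
    return results


results = run_matched(
    {"random": random_search, "coordinate": coordinate_search},
    toy_surrogate, dim=3, budget=50,
)
for name, (best, calls) in results.items():
    print(name, calls, best)
```

Enforcing the budget in the wrapper rather than inside each optimizer means a method cannot overspend by accident, which is the property a fair comparison needs.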

Load-bearing premise

The surrogates used in the benchmark are accurate enough representations of the true optimization landscapes to make search results meaningful.

What would settle it

If a broad set of optimizers shows consistently similar ranking orders when evaluated across all eight shape categories using the same budget, that would contradict the observed variance.
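One way to operationalize "consistently similar ranking orders" is Kendall's coefficient of concordance W, which is 1 when every category ranks the optimizers identically and 0 when the rankings cancel out. This statistic is swapped in here for illustration and is not something the page says the paper uses; the sketch assumes complete rankings with no ties, and the example rankings are hypothetical.

```python
def kendalls_w(rank_rows):
    """Kendall's W for m rankings (rows) of n items, assuming no tied ranks.

    Each row holds the ranks 1..n that one shape category assigns to the optimizers.
    """
    m = len(rank_rows)
    n = len(rank_rows[0])
    totals = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical: three categories that agree exactly on four optimizers.
agree = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
# Hypothetical: two categories with exactly opposite rankings.
oppose = [[1, 2, 3, 4], [4, 3, 2, 1]]

print(kendalls_w(agree))   # 1.0
print(kendalls_w(oppose))  # 0.0
```

A W close to 1 across all eight categories at a fixed budget would be the kind of evidence that contradicts the reported variance.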

Figures

Figures reproduced from arXiv: 2605.20763 by Jack Guo, Krissh Chawla, Madeleine Udell, Matthias Ihme, Shaghayegh Fazliani, Yiren Shen.

Figure 1: An overview of ShapeBench: covered geometries, optimizers, simulation environments, an…
Figure 2: ShapeBench generates design- and run-level visuals (e.g., geometry/field plots and opti…
Figure 3: Overview of the ShapeEvolve pipeline. Adjacent text: … and benchmarked against all baselines. Figure 4a shows that evaluation across diverse shape categories and task setups can surface optimizer behaviors that are not visible in single-task studies; that is, performance is strongly task-dependent. For instance, Bayesian optimization and PSO perform best on the CERAS and delta-wing tasks, but rank among the weakest methods on …
Figure 4: (a) Final median objective values for each task per optimizer. Colors are column-wise…
Figure 5: Adjacent text: Figure 5 shows a representative CCA experiment and illustrates ShapeBench’s cross-optimizer visualization tools. Panel 5(b) also compares the best designs from each optimizer against the reference geometry produced by nTop [31]. Optimizers behave differently on this task than on many others from Figure 4a: ShapeEvolve converges to the best L/D, significantly outperforming classical methods. The source of this distin…
Figure 6: (a) Convergence plot (objective vs. evaluations) for the 2D airfoil multi-point drag…
Figure 7: (a) Best designs for the baseline and for each method (2D side views and 3D isometric…
Figure 8: Median normalized rank trajectory of each optimizer over relative evaluation budget across…
Figure 9: Schematic of the proposed VortexNet graph neural network (GNN) architecture. Colors indicate different network blocks. A black arrow represents direct message passing between blocks, and a blue arrow denotes a skip connection to the receiving block. The figure also presents snapshots of the graph at each computational step, showing nodes, edges, and their associated feature arrays. Figure is from…
Figure 10: The HF surface pressure coefficient obtained from CFD for a wing with…
Figure 11: Overall architecture of the surrogate model. The network consists of two main components:…
Figure 12: Illustration of the blended wing body (BWB) planform parameterization, showing key…
Figure 13: Three-view diagram of a typical kink wing. Figure is from […
Figure 14: Correlation of predicted aerodynamic coefficients CL and CD against VLM data for…
Figure 15: CCA: Collaborative Combat Aircraft. Adjacent table of design-variable bounds:
    Parameter               Lower Bound     Upper Bound
    Dihedral angle (deg)    0.25            15
    Max Wing Blend (mm)     25              1000
    Inlet Angle 1 (deg)     0               45
    Inlet Angle 2 (deg)     0               10
    Wing Position           0.22            0.51
    Rear Point (mm)         (4500, 0, 0)    (7500, 0, 0)
    . . .                   . . .           . . .
    Inlet Location          0.2             0.6
    NACA 4-digit code       {1412, 0012, 2408, 4412}
    Fore Top Angle (deg)    0               10
    Aft Top Angle (deg)     12              32.5
    Top Height Aft (mm)     36              220
    B…
Figure 16: Grid convergence
Figure 17: Sampled CCA designs representative of geometries in…
Figure 18: COCOANet CCA surrogate data characteristics
Figure 19: Correlation of predicted aerodynamic coefficients CL and CD against CFD data
Figure 20: Single-point lift-to-drag optimization results
Figure 21: Best shape per method for lift-to-drag delta wing task
Figure 22: Convergence trajectories for optimizers for two-point Vortnet task
Figure 23: Best design overlay of delta wing design for two-point objective
Figure 24: Objectives vs. evaluations plot for 3D BWB Design multipoint tasks. As given in…
Figure 25: Best designs, 2D planform and 3D isometric views, for the min-…
Figure 26: Cross-objective scatter plot for the min-…
Figure 27: Objectives vs. evaluations plot for max-…
Figure 28: Best designs, 2D planform and 3D isometric views, max-…
Figure 29: Best designs, 2D planform overlay comparisons across both objectives. Left-hand side…
Figure 30: 3D BWB experiment with different initializations; (a) reward vs. evaluations & (b) BWB…
Figure 31: Three-stage optimization pipeline for the 2D airfoil single-point maximum lift-to-drag task.
Figure 32: Stage 1 convergence plot (objective vs. evaluations) of the four optimization methods for…
Figure 33: Overlay of best IPOPT-refined airfoil profiles (with…
Figure 34: Convergence plot (objective vs. evaluations) for the 2D airfoil multi-point drag…
Figure 35: Airfoil design profiles (y/c vs. x/c) for best-performing designs from all five methods, for the 2D airfoil multi-point drag minimization task. [Plot residue: legend entries Adjoint (IPOPT), ShapeEvolve, PSO (120p×500i), L-BFGS-B, Bayes. Opt. (exact GP); y-axis: Weighted CD (lower is better); panel title: Best-design XFOIL vs NeuralFoil evaluation, multipoint CL targets (CL ∈ {0.8, 1.0, 1.2, …]
Figure 36: NeuralFoil and XFOIL evaluations of best design for each method, for the 2D airfoil multi-point drag minimization task. Adjacent text:
    • thin airfoil height, relative to that of the 2D airfoil single-point maximum lift-to-drag task seen in Figure 33
    • moderate camber (maximum camber value of y/c ≃ 0.12 near x/c ≃ 0.35)
    The Bayesian optimization design is over-cambered and thicker overall, with a correspondingly larger C…
Figure 37: Single-objective SuperWing case: minimize CD with CL constraint
Figure 38: SuperWing best design overlay per method for single-objective drag minimization. Adjacent text: Operating points $M_0 \in \{0.75, 0.80, 0.86, 0.90\}$. Objective: $\min_x f(x) = \min_x \frac{1}{K} \sum_{k \in K} \Big[ -M_k \, \frac{C_L(x; \alpha_0^{(k)}, M_k)}{C_D(x; \alpha_0^{(k)}, M_k)}$ λ …
Figure 39: Optimizer trajectories for multi-point range maximization problem for…
Figure 40: Optimum best design per method for SuperWing multi-point range maximization problem. Adjacent text: Objective: $\min_x f(x) = \min_x \sum_j w_j \Big[ -M_0 \, \frac{C_L(x; \alpha_0^{(j)}, M_0)}{C_D(x; \alpha_0^{(j)}, M_0)}$ λ …
Figure 41: ShapeBench SuperWing problem 30: optimization methods and results
Figure 42: ShapeBench SuperWing problem 30: best design overlay
Figure 43: Enter Caption. Adjacent text: D.5 3D Collaborative Combat Aircraft (CCA) Design. Design variables: We use the following 3D single-duct drone parametrization from nTop [31].
Figure 44: CCA evaluation for optimization of lift-to-drag ratio. (a) optimization method performance…
Figure 45: Convergence plot (objective vs. evaluations) for minimized…
Figure 46: Best designs for the baseline and for each method (2D side views and 3D isometric views)
Figure 47: CERAS fuelmass objective optimization results
Figure 48: Best designs per method overlaid for CERAS fuelmass case
Original abstract

Rapid progress in aerodynamic shape optimization (ASO) has outpaced currently-available standardized evaluation frameworks. Fair comparison requires a unified benchmark spanning diverse shape classes, objective formulations, and matched-budget state-of-the-art baselines. We introduce ShapeBench, an open-source ASO benchmark with a unified API spanning 103 tasks across eight shape categories and multiple optimization regimes. Each ShapeBench task includes a validated surrogate for fast search; when feasible, a high-fidelity Computational Fluid Dynamics (CFD) pipeline for final verification is available, enabling systematic fidelity-gap analysis. ShapeBench provides a reproducible protocol with well-configured baselines to compare fairly using a consistent budget metric, allowing for comparison among both classical and LLM-driven methods, including general-purpose optimizers and a new domain-specialized evolutionary LLM baseline, ShapeEvolve. Results on ShapeBench demonstrate substantial variance in optimizer rankings across shape categories and problem formulations, with mean pairwise Spearman $\rho = 0.013$, so single-task conclusions do not reliably generalize across problem classes. The benchmark is also far from saturation; classical methods are rarely applicable across all shape categories and tasks, further highlighting the need for more general-purpose approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces ShapeBench, an open-source benchmark for aerodynamic shape optimization (ASO) with a unified API spanning 103 tasks across eight shape categories and multiple optimization regimes. Each task supplies a validated surrogate for fast search and, when feasible, a high-fidelity CFD pipeline for verification and fidelity-gap analysis. The benchmark supplies reproducible protocols and matched-budget baselines that include both classical optimizers and LLM-driven methods, notably a new domain-specialized evolutionary LLM baseline (ShapeEvolve). The central empirical result is substantial variance in optimizer rankings across shape categories and problem formulations, quantified by a mean pairwise Spearman ρ of 0.013, from which the authors conclude that single-task conclusions do not reliably generalize.

Significance. If the surrogates are demonstrated to preserve relative optimizer rankings that would be obtained under high-fidelity CFD, ShapeBench would provide a much-needed standardized, scalable evaluation framework for ASO that enables fair comparison of classical and emerging LLM-based approaches. The public release, consistent budget metric, and explicit support for fidelity-gap studies are clear strengths. The reported low cross-task correlation usefully challenges the common practice of drawing broad conclusions from single-task experiments. The overall significance remains moderate until quantitative surrogate validation is supplied.

major comments (1)
  1. [Abstract] Abstract (paragraph on task structure and baselines): The central claim that observed ranking variance (mean pairwise Spearman ρ = 0.013) demonstrates failure of single-task conclusions to generalize rests on the assumption that surrogate error does not distort relative optimizer performance. The abstract states that surrogates are “validated” and that a CFD pipeline enables “systematic fidelity-gap analysis,” yet no quantitative validation statistics—RMSE, rank correlation with CFD on optimized designs, or per-category fidelity-gap tables—are reported. Without these, it remains possible that non-uniform approximation error across shape categories inflates the reported variance.
minor comments (1)
  1. [Abstract] Abstract: The phrase “mean pairwise Spearman ρ = 0.013” would benefit from an explicit statement of the exact set of optimizer pairs and tasks over which the mean is taken, and whether the value is computed on final performance or on the entire optimization trajectory.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive feedback. The concern about surrogate validation and its potential effect on the reported ranking variance is a substantive point that merits direct response. We address it below and outline planned revisions.

point-by-point responses
  1. Referee: The central claim that observed ranking variance (mean pairwise Spearman ρ = 0.013) demonstrates failure of single-task conclusions to generalize rests on the assumption that surrogate error does not distort relative optimizer performance. The abstract states that surrogates are “validated” and that a CFD pipeline enables “systematic fidelity-gap analysis,” yet no quantitative validation statistics—RMSE, rank correlation with CFD on optimized designs, or per-category fidelity-gap tables—are reported. Without these, it remains possible that non-uniform approximation error across shape categories inflates the reported variance.

    Authors: We agree that the interpretation of the low mean pairwise Spearman ρ = 0.013 as evidence against generalizing from single tasks implicitly assumes that surrogate approximation error does not systematically alter relative optimizer rankings across categories. Each surrogate is trained on CFD-generated data and the benchmark supplies a high-fidelity CFD pipeline for verification on feasible tasks, but the current manuscript does not report explicit quantitative metrics (RMSE on held-out CFD points, rank correlation of final designs, or per-category fidelity-gap tables) that would directly test preservation of optimizer orderings. In the revised manuscript we will add a dedicated validation subsection containing these statistics on a representative subset of tasks, together with an updated abstract that references the new results. This addition will allow readers to evaluate the possible contribution of non-uniform surrogate error to the observed variance. revision: yes
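The validation statistics named in this exchange (RMSE against CFD and rank correlation on final designs, reported per category) can be sketched as follows. This is a minimal illustration under stated assumptions: the category names and all numbers are hypothetical, the rank correlation uses the no-ties Spearman formula, and nothing here reflects the authors' actual validation code.

```python
import math


def rmse(pred, truth):
    """Root-mean-square error between surrogate predictions and CFD values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))


def ranks(values):
    """1-based ranks, lowest value first; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for position, index in enumerate(order, start=1):
        out[index] = position
    return out


def spearman_no_ties(x, y):
    """Spearman rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid only without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def fidelity_report(per_category):
    """Per-category RMSE and rank correlation of surrogate vs. CFD objectives."""
    return {
        category: {
            "rmse": rmse(surrogate, cfd),
            "rank_corr": spearman_no_ties(surrogate, cfd),
        }
        for category, (surrogate, cfd) in per_category.items()
    }


# Hypothetical objective values for each optimizer's best design: (surrogate, CFD).
data = {
    "airfoil_2d": ([0.010, 0.012, 0.015, 0.020], [0.011, 0.013, 0.016, 0.021]),
    "delta_wing": ([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]),
}
report = fidelity_report(data)
for category, stats in report.items():
    print(category, stats)
```

The second toy category shows why both numbers matter: a surrogate can have modest absolute error yet still reorder the optimizers, which is exactly the failure mode the referee raises.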

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces ShapeBench as an open benchmark with 103 tasks and reports an empirical finding of low mean pairwise Spearman ρ = 0.013 in optimizer rankings across categories. This statistic is obtained by running the listed baselines on the defined tasks and surrogates; it is a direct measurement rather than a quantity derived from prior inputs by construction, fitted parameter, or self-citation chain. No equations, uniqueness theorems, or ansatzes are invoked that reduce the central claim to the benchmark definition itself. The result is therefore self-contained as an observation on a publicly released resource.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the assumption that the 103 tasks and surrogates form a representative and validated testbed for comparing optimizers across regimes.

axioms (1)
  • domain assumption: Surrogates are validated and sufficiently accurate for optimization search; high-fidelity CFD is available for verification when feasible.
    Stated in the abstract description of each task.

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · 5 internal anchors

  1. [1]

    Nikhil Abhyankar, Sanchit Kabra, Saaketh Desai, and Chandan K Reddy. Accelerating materials design via LLM-guided evolutionary search. arXiv preprint arXiv:2510.22503, 2025

  2. [2]

    George R Anderson, Marian Nemec, and Michael J Aftosmis. Aerodynamic shape optimization benchmarks with error control and automatic parameterization. In 53rd AIAA Aerospace Sciences Meeting, page 1719, 2015

  3. [3]

    Joel Andersson, Joris Gillis, Greg Horn, Jim Rawlings, and Moritz Diehl. CasADi: a software framework for nonlinear optimization and optimal control. Mathematical Programming Computation, 11(1):1–36, 2018

  4. [4]

    Neil Ashton, Charles Mockett, Marian Fuchs, Louis Fliessbach, Hendrik Hetmann, Thilo Knacke, Norbert Schonwald, Vangelis Skaperdas, Grigoris Fotiadis, Astrid Walle, et al. DrivAerML: High-fidelity computational fluid dynamics dataset for road-car external aerodynamics. arXiv preprint arXiv:2408.11969, 2024

  5. [5]

    Maximilian Balandat, Brian Karrer, Daniel Jiang, Samuel Daulton, Ben Letham, Andrew G Wilson, and Eytan Bakshy. BoTorch: A framework for efficient Monte-Carlo Bayesian optimization. Advances in Neural Information Processing Systems, 33:21524–21538, 2020

  6. [6]

    Jennifer Dacles-Mariani, Gregory G. Zilliac, Jim S. Chow, and Peter Bradshaw. Numerical/experimental study of a wingtip vortex in the near field. AIAA Journal, 33(9):1561–1568, 1995

  7. [7]

    Christophe David, Scott Delbecq, Sebastien Defoort, Peter Schmollgruber, Emmanuel Benard, and Valerie Pommier-Budinger. From FAST to FAST-OAD: An open source framework for rapid overall aircraft design. In IOP Conference Series: Materials Science and Engineering, volume 1024, page 012062. IOP Publishing, 2021

  8. [8]

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020

  9. [9]

    Mark Drela. Pros & cons of airfoil optimization. In Frontiers of Computational Fluid Dynamics 1998, pages 363–381. World Scientific, 1998

  10. [10]

    Mohamed Elrefaie, Florin Morar, Angela Dai, and Faez Ahmed. DrivAerNet++: A large-scale multimodal car dataset with computational fluid dynamics simulations and deep learning benchmarks, 2025

  11. [11]

    Alexander Forrester, Andras Sobester, and Andy Keane. Engineering design via surrogate modelling: a practical guide. John Wiley & Sons, 2008

  12. [12]

    Peter I Frazier. A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811, 2018

  13. [13]

    KC Giannakoglou. Design of optimal aerodynamic shapes using stochastic optimization methods and computational intelligence. Progress in Aerospace Sciences, 38(1):43–76, 2002

  14. [14]

    Michael B Giles and Niles A Pierce. An introduction to the adjoint approach to design. Flow, Turbulence and Combustion, 65(3):393–415, 2000

  15. [15]

    Nikolaus Hansen, Youhei Akimoto, and Petr Baudis. CMA-ES/pycma on GitHub, 2019

  16. [16]

    Nikolaus Hansen, Anne Auger, Raymond Ros, Olaf Mersmann, Tea Tušar, and Dimo Brockhoff. COCO: A platform for comparing continuous optimizers in a black-box setting. Optimization Methods and Software, 36(1):114–144, 2021

  17. [17]

    Nikolaus Hansen and Andreas Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195, 2001

  18. [18]

    Robert Hooke and Terry A Jeeves. “Direct search” solution of numerical and statistical problems. Journal of the ACM (JACM), 8(2):212–229, 1961

  19. [19]

    Antony Jameson. Aerodynamic design via control theory. Journal of Scientific Computing, 3(3):233–260, 1988

  20. [20]

    James Kennedy and Russell Eberhart. Particle swarm optimization. In Proceedings of ICNN’95 - International Conference on Neural Networks, volume 4, pages 1942–1948. IEEE, 1995

  21. [21]

    Gaetan K Kenway and Joaquim RRA Martins. Aerodynamic shape optimization of the CRM configuration including buffet-onset conditions. In 54th AIAA Aerospace Sciences Meeting, page 1294, 2016

  22. [22]

    Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

  23. [23]

    Brenda M Kulfan. Universal parametric geometry representation method. Journal of Aircraft, 45(1):142–158, 2008

  24. [24]

    Robert Tjarko Lange, Yuki Imajuku, and Edoardo Cetin. ShinkaEvolve: Towards open-ended and sample-efficient program evolution, 2025

  25. [25]

    Stephen T LeDoux, John C Vassberg, David P Young, Spencer Fugal, Dmitry Kamenetskiy, William P Huffman, Robin G Melvin, and Matthew F Smith. Study based on the AIAA aerodynamic design optimization discussion group test cases. AIAA Journal, 53(7):1910–1935, 2015

  26. [26]

    Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, and Vedant Misra. Solving quantitative reasoning problems with language models, 2022

  27. [27]

    Jian Liu, Jianyu Wu, Hairun Xie, Guoqing Zhang, Jing Wang, Wei Liu, Wanli Ouyang, Junjun Jiang, Xianming Liu, Shixiang Tang, et al. AFBench: A large-scale benchmark for airfoil design. Advances in Neural Information Processing Systems, 37:82757–82780, 2024

  28. [28]

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017

  29. [29]

    Joaquim R.R.A. Martins and Andrew B. Lambe. Multidisciplinary design optimization: A survey of architectures. AIAA Journal, 51:2049–2075, 2013

  30. [30]

    MeshInspector. MeshLib: Mesh Processing Library. https://github.com/MeshInspector/MeshLib, 2025. Version 3.0.9.196, accessed 2026-04-30

  31. [31]

    ntop (release 4.1), 2025

    nTop Inc. ntop (release 4.1), 2025. Computational design software

  32. [32]

    Improving reproducibility in machine learning research (a report from the neurips 2019 reproducibility program), 2020

    Joelle Pineau, Philippe Vincent-Lamarre, Koustuv Sinha, Vincent Larivière, Alina Beygelzimer, Florence d’Alché Buc, Emily Fox, and Hugo Larochelle. Improving reproducibility in machine learning research (a report from the neurips 2019 reproducibility program), 2020

  33. [33]

    Drivaerstar: An industrial-grade cfd dataset for vehicle aerodynamic optimization.arXiv preprint arXiv:2510.16857, 2025

    Jiyan Qiu, Lyulin Kuang, Guan Wang, Yichen Xu, Leiyao Cui, Shaotong Fu, Yixin Zhu, and Ruihua Zhang. Drivaerstar: An industrial-grade cfd dataset for vehicle aerodynamic optimization.arXiv preprint arXiv:2510.16857, 2025

  34. [34]

    Surrogate-based analysis and optimization.Progress in aerospace sciences, 41(1):1–28, 2005

    Nestor V Queipo, Raphael T Haftka, Wei Shyy, Tushar Goel, Rajkumar Vaidyanathan, and P Kevin Tucker. Surrogate-based analysis and optimization.Progress in aerospace sciences, 41(1):1–28, 2005. 11

  35. [35]

    Mathematical discoveries from program search with large language models

    Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M Pawan Kumar, Emilien Dupont, Francisco JR Ruiz, Jordan S Ellenberg, Pengming Wang, Omar Fawzi, et al. Mathematical discoveries from program search with large language models. Nature, 625(7995):468–475, 2024

  36. [36]

    Evolution Strategies as a Scalable Alternative to Reinforcement Learning

    Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever. Evolution strategies as a scalable alternative to reinforcement learning.arXiv preprint arXiv:1703.03864, 2017

  37. [37]

    Marco Saporito, Andrea Da Ronch, Nathalie Bartoli, and Sébastien Defoort. Robust multidis- ciplinary analysis and optimization for conceptual design of flexible aircraft under dynamic aeroelastic constraints.Aerospace Science and Technology, 138:108349, 2023

  38. [38]

    Bayesian optimization for mixed variables using an adaptive dimension reduction process: applications to aircraft design

    Paul Saves, Nathalie Bartoli, Youssef Diouane, Thierry Lefebvre, Joseph Morlier, Christophe David, Eric Nguyen Van, and Sébastien Defoort. Bayesian optimization for mixed variables using an adaptive dimension reduction process: applications to aircraft design. InAIAA SciTech 2022 Forum, page 0082, 2022

  39. [39]

    Multidisciplinary design optimization with mixed categorical variables for aircraft design

    Paul Saves, Nathalie Bartoli, Youssef Diouane, Thierry Lefebvre, Joseph Morlier, Christophe David, Eric Nguyen Van, and Sébastien Defoort. Multidisciplinary design optimization with mixed categorical variables for aircraft design. In AIAA SCITECH 2022 Forum. American Institute of Aeronautics and Astronautics, January 2022

  40. [40]

    Openevolve: Open-source implementation of alphaevolve, 2025

    Asankhaya Sharma. Openevolve: Open-source implementation of alphaevolve, 2025. Accessed: 2026-05-07

  41. [41]

    Neuralfoil: An airfoil aerodynamics analysis tool using physics-informed machine learning, 2025

    Peter Sharpe and R. John Hansman. Neuralfoil: An airfoil aerodynamics analysis tool using physics-informed machine learning, 2025

  42. [42]

    Accelerating Practical Engineering Design Optimization with Computational Graph Transformations

    Peter D. Sharpe. Accelerating Practical Engineering Design Optimization with Computational Graph Transformations. PhD thesis, Massachusetts Institute of Technology, 2024

  43. [43]

    Aerosandbox: A differentiable framework for aircraft design optimization

    Peter D Sharpe and R John Hansman. Aerosandbox: A differentiable framework for aircraft design optimization. PhD thesis, 2021

  44. [44]

    Graph neural network-guided aerodynamic shape optimization for conceptual design of supersonic transport wings

    Yiren Shen and Juan Alonso. Graph neural network-guided aerodynamic shape optimization for conceptual design of supersonic transport wings. In AIAA AVIATION FORUM AND ASCEND 2025, page 3228, 2025

  45. [45]

    Vortexnet: A graph neural network-based multi-fidelity surrogate model for field predictions

    Yiren Shen, Jacob T Needels, and Juan J Alonso. Vortexnet: A graph neural network-based multi-fidelity surrogate model for field predictions. In AIAA SciTech 2025 Forum, page 0494, 2025

  46. [46]

    Blendednet++: A large-scale blended wing body aerodynamics dataset and benchmark, 2025

    Nicholas Sung, Steven Spreizer, Mohamed Elrefaie, Matthew C. Jones, and Faez Ahmed. Blendednet++: A large-scale blended wing body aerodynamics dataset and benchmark, 2025

  47. [47]

    Blendednet: A blended wing body aircraft dataset and surrogate model for aerodynamic predictions

    Nicholas Sung, Steven Spreizer, Mohamed Elrefaie, Kaira Samuel, Matthew C. Jones, and Faez Ahmed. Blendednet: A blended wing body aircraft dataset and surrogate model for aerodynamic predictions. In Volume 3B: 51st Design Automation Conference (DAC), IDETC-CIE2025. American Society of Mechanical Engineers, August 2025

  48. [48]

    Neural-solver-library: A library for advanced neural pde solvers

    Tsinghua University Machine Learning Group (THUML). Neural-solver-library: A library for advanced neural pde solvers. https://github.com/thuml/Neural-Solver-Library, 2025. Last accessed: 2025-09-29

  49. [49]

    On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical programming, 106(1):25–57, 2006

    Andreas Wächter and Lorenz T Biegler. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical programming, 106(1):25–57, 2006

  50. [50]

    No free lunch theorems for optimization. IEEE transactions on evolutionary computation, 1(1):67–82, 2002

    David H Wolpert and William G Macready. No free lunch theorems for optimization. IEEE transactions on evolutionary computation, 1(1):67–82, 2002

  51. [51]

    Transolver: A fast transformer solver for pdes on general geometries

    Haixu Wu, Huakun Luo, Haowen Wang, Jianmin Wang, and Mingsheng Long. Transolver: A fast transformer solver for pdes on general geometries. In International Conference on Machine Learning, 2024

  52. [52]

    Superwing: a comprehensive transonic wing dataset for data-driven aerodynamic design, 2025

    Yunjia Yang, Weishao Tang, Mengxin Liu, Nils Thuerey, Yufei Zhang, and Haixin Chen. Superwing: a comprehensive transonic wing dataset for data-driven aerodynamic design, 2025

  53. [53]

    Using large language models for parametric shape optimization, 2024

    Xinxin Zhang, Zhuoqun Xu, Guangpu Zhu, Chien Ming Jonathan Tay, Yongdong Cui, Boo Cheong Khoo, and Lailai Zhu. Using large language models for parametric shape optimization, 2024

  54. [54]

    Evolutionary optimization methods for high-dimensional expensive problems: A survey. IEEE/CAA Journal of Automatica Sinica, 11(5):1092–1105, 2024

    MengChu Zhou, Meiji Cui, Dian Xu, Shuwei Zhu, Ziyan Zhao, and Abdullah Abusorrah. Evolutionary optimization methods for high-dimensional expensive problems: A survey. IEEE/CAA Journal of Automatica Sinica, 11(5):1092–1105, 2024

  55. [55]

    Engibench: A benchmark for evaluating large language models on engineering problem solving, 2025

    Xiyuan Zhou, Xinlei Wang, Yirui He, Yang Wu, Ruixi Zou, Yuheng Cheng, Yulu Xie, Wenxuan Liu, Huan Zhao, Yan Xu, Jinjin Gu, and Junhua Zhao. Engibench: A benchmark for evaluating large language models on engineering problem solving, 2025

  56. [56]

    Algorithm 778: L-bfgs-b: Fortran subroutines for large-scale bound-constrained optimization

    Ciyou Zhu, Richard H Byrd, Peihuang Lu, and Jorge Nocedal. Algorithm 778: L-bfgs-b: Fortran subroutines for large-scale bound-constrained optimization. ACM Transactions on mathematical software (TOMS), 23(4):550–560, 1997

  57. [57]

    Transolver [51] is a neural operator architecture and one of the models used within DrivAerStar

    are the inclusion of vehicle features such as engine bays, cooling systems, and internal airflow, as well as higher wind tunnel validation accuracy, with errors of ∼1% compared to the >5% typical of previous studies. Transolver [51] is a neural operator architecture and one of the models used within DrivAerStar. In validation tests, the Transolver...

  58. [58]

    FFD-based geometric morphing in Blender

  59. [59]

    Yehudi break

    Surrogate evaluation with Transolver

    The surrogate achieves a total mean absolute percentage error (MAPE) of 2.422%, with per-style MAPEs of 2.633% for E, 2.195% for F, and 2.437% for N.

    C.4 NeuralFoil

    NeuralFoil [41, 42], when combined with the extension AeroSandbox, is a surrogate tool for rapid analysis of airfoils that can provide the aerodynamics for ...