pith. machine review for the scientific record.

arxiv: 2605.11540 · v1 · submitted 2026-05-12 · 📊 stat.ME

The design of selection experiments using a model-based approach

Alison B Smith, Brian R Cullis, David Butler, David GD Hughes

Pith reviewed 2026-05-13 01:28 UTC · model grok-4.3

classification 📊 stat.ME
keywords design of experiments · linear mixed models · selection experiments · genotype by environment interaction · optimal design · plant breeding · multi-environment trials · selection accuracy

The pith

A model-based approach builds optimal designs for selection experiments by aligning layouts with the linear mixed model used for analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method for constructing designs for plant breeding selection trials that are optimal or near-optimal under a linear mixed model incorporating genotype-by-environment interactions, genetic relatedness, and other sources of variation. The design model is chosen to match the model that will later be used to analyze the data, and the method adds a step that decides how many plots each genotype receives before genotypes are assigned to positions. The goal is to improve selection accuracy and genetic gain when resources are limited. Examples cover single- and multi-environment trials, and simulations compare the new designs against conventional ones. A sympathetic reader would care because breeding programs commit years of work and substantial land to trials; better designs could extract more reliable variety rankings from the same effort.

Core claim

We present an approach for constructing designs for selection experiments which are optimal or near optimal against a robust and sensible linear mixed model. This model reflects the models used for analysis. The approach is flexible and introduces an additional step to accommodate efficient resource allocation of replication status to genotypes, which is undertaken prior to the allocation of plots to genotypes.

What carries the argument

A linear mixed model for genotype-by-environment effects and genetic relatedness that is used both to optimize the experimental layout and to perform the subsequent analysis.
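
For orientation, a model of this general kind can be written in standard mixed-model notation. The symbols below are textbook conventions assumed for illustration, not necessarily the notation used in the paper, and the Kronecker form for G is one common choice rather than a detail confirmed by the source.

```latex
\begin{aligned}
y &= X\tau + Zu + e, \qquad \operatorname{var}(u) = G, \quad \operatorname{var}(e) = R, \\
G &= G_e \otimes G_g \quad \text{(assumed separable form, genotypes ordered within environments)}, \\
\operatorname{var}(\tilde{u} - u) &= \left( Z^\top S Z + G^{-1} \right)^{-1}, \qquad
S = R^{-1} - R^{-1} X \left( X^\top R^{-1} X \right)^{-1} X^\top R^{-1}.
\end{aligned}
```

Here y holds the plot data, \tau the fixed effects, u the genotype-by-environment effects, G_e a between-environment genetic covariance matrix and G_g a genetic relationship matrix. The last line is the prediction error variance matrix of the best linear unbiased predictor of u. It depends on the layout only through X, Z and R, which is why a design can be scored against the model before any data are collected.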

If this is right

  • Designs constructed to match the analysis model improve selection accuracy compared with designs that ignore that model.
  • Allocating replication numbers to genotypes before assigning plots to locations allows more efficient use of limited plot resources.
  • The same framework applies to both single-environment and multi-environment selection experiments.
  • In-silico simulations show measurable gains in accuracy under realistic variance structures for genotype-by-environment interaction.
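
The two-step structure described in the list above (fix replication counts first, then place genotypes on plots) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the priority rule for handing out spare plots, and the random layout in step 2 are all hypothetical stand-ins, whereas the paper performs both steps by model-based optimization.

```python
import random

def allocate_replication(n_genotypes, n_plots, priority=None):
    """Step 1: decide how many plots each genotype receives.

    Every genotype gets one plot; spare plots are handed out one at a time
    in priority order (a hypothetical rule, not the paper's criterion).
    """
    if n_plots < n_genotypes:
        raise ValueError("need at least one plot per genotype")
    order = list(priority) if priority is not None else list(range(n_genotypes))
    reps = [1] * n_genotypes
    for i in range(n_plots - n_genotypes):
        reps[order[i % len(order)]] += 1
    return reps

def assign_plots(reps, seed=0):
    """Step 2: place genotypes on plots, keeping the replication counts fixed.

    A random layout is used here purely as a placeholder for an optimized one.
    """
    layout = [g for g, r in enumerate(reps) for _ in range(r)]
    random.Random(seed).shuffle(layout)
    return layout

# Six genotypes on eight plots: genotypes 3 and 0 are first in line for a second plot.
reps = allocate_replication(n_genotypes=6, n_plots=8, priority=[3, 0, 5, 1, 2, 4])
layout = assign_plots(reps)
print(reps)    # [2, 1, 1, 2, 1, 1]
print(layout)  # one genotype index per plot
```

Separating the steps means the search over layouts in step 2 only has to consider arrangements with the replication counts already decided, which keeps the second search space smaller.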

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the assumed model is only approximately correct, the resulting designs may still outperform random or traditional layouts because they incorporate relatedness and interaction structure.
  • The replication-allocation step could be extended to incorporate costs or constraints that vary by genotype or environment.
  • Integration with genomic relationship matrices instead of pedigree-based relatedness would be a direct next use of the same machinery.

Load-bearing premise

The linear mixed model used for design optimization correctly represents the main sources of variation and relatedness that will appear in the actual trial data.

What would settle it

A field trial or simulation in which the model-based design produces lower selection accuracy than a conventional design when the true variance structure differs from the one assumed during design construction.
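
A check of this kind could be set up along the following lines. The sketch generates data from one set of variance components and analyses them with another, then reports the mean correlation between true and predicted genotype effects as a stand-in for selection accuracy. Everything here is an assumption made for illustration: the function name, the single-environment model with independent genotype effects and independent residuals, and the example variance values. It is not the paper's simulation protocol.

```python
import numpy as np

def simulate_accuracy(layout, n_genotypes, true_var, assumed_var, n_sims=200, seed=1):
    """Mean correlation between true and predicted genotype effects.

    true_var and assumed_var are (sigma_g2, sigma_e2) pairs: data are
    generated from true_var but analysed with assumed_var.
    """
    rng = np.random.default_rng(seed)
    layout = np.asarray(layout)
    n_plots = layout.size
    Z = np.zeros((n_plots, n_genotypes))
    Z[np.arange(n_plots), layout] = 1.0           # plot-by-genotype incidence
    W = np.hstack([np.ones((n_plots, 1)), Z])     # fixed intercept + random genotypes
    g2, e2 = assumed_var
    # Mixed model equations built from the ASSUMED variance components.
    C = W.T @ W / e2
    C[1:, 1:] += np.eye(n_genotypes) / g2
    accuracies = []
    for _ in range(n_sims):
        u = rng.normal(0.0, np.sqrt(true_var[0]), n_genotypes)
        y = 10.0 + Z @ u + rng.normal(0.0, np.sqrt(true_var[1]), n_plots)
        solution = np.linalg.solve(C, W.T @ y / e2)
        accuracies.append(np.corrcoef(u, solution[1:])[0, 1])
    return float(np.mean(accuracies))

# Partially replicated layout: 20 genotypes, the first 10 grown on two plots.
layout = list(range(20)) + list(range(10))
acc_matched = simulate_accuracy(layout, 20, true_var=(1.0, 1.0), assumed_var=(1.0, 1.0))
acc_misspecified = simulate_accuracy(layout, 20, true_var=(1.0, 1.0), assumed_var=(0.2, 1.0))
print(acc_matched, acc_misspecified)
```

To turn this into the test described above, the same function would be run on a model-based design and on a conventional design, with true_var (or a richer true model) deliberately set apart from the values used to build the design.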

original abstract

Plant breeding programs use data obtained from multi-environment selection experiments to produce improved varieties with the ultimate aim of maintaining high levels of genetic gain. Selection accuracy can be improved with the use of advanced statistical analytical methods that use informative and parsimonious variance models for the set of genotype by environment interaction effects, include information on genetic relatedness and appropriately accommodate non-genetic sources of variation within the framework of a single step estimation and prediction algorithm. Maximal gains from using these advanced techniques are more likely to be achieved if the designs used match the aims of the selection experiment and make full use of the available resources. In this paper we present an approach for constructing designs for selection experiments which are optimal or near optimal against a robust and sensible linear mixed model. This model reflects the models used for analysis. The approach is flexible and introduces an additional step to accommodate efficient resource allocation of replication status to genotypes, which is undertaken prior to the allocation of plots to genotypes. A motivating example is used to illustrate the approach, two illustrative examples are presented one each for single and multiple environment selection experiments and several in-silico simulation studies are used to demonstrate the advantages of these approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a model-based approach to construct optimal or near-optimal designs for plant breeding selection experiments. Designs are optimized against a linear mixed model (LMM) that incorporates genotype-by-environment (GxE) interactions, genetic relatedness, and non-genetic effects, matching the models used in subsequent analysis. The method adds a preliminary step for efficient replication allocation to genotypes before plot assignment. It is illustrated via a motivating example, single- and multi-environment cases, and in-silico simulations that compare the proposed designs to standard alternatives.

Significance. If the central claim holds, the work offers a practical extension of optimal design theory to modern LMMs employed in breeding, with the replication-allocation step providing a useful operational feature. The simulations under the design model provide initial evidence of improved selection accuracy, but the absence of misspecification tests limits the assessed impact on real programs where variance structures are uncertain.

major comments (2)
  1. [in-silico simulation studies] The in-silico simulation studies (described after the illustrative examples) generate data exclusively from the same LMM used for design optimization, including the assumed GxE covariance and additive relationship matrix. No results are shown for cases where the true data-generating process differs (e.g., altered GxE correlations or misspecified relatedness), which directly undermines the translation from model-based optimality to improved selection accuracy under realistic conditions.
  2. [illustrative examples] The optimality criterion employed for the multi-environment design (Section on illustrative examples) is not stated explicitly (e.g., A-optimality for prediction error variance of breeding values versus a custom selection-accuracy metric). Without this, it is unclear how the reported designs achieve near-optimality or how sensitive the results are to the chosen criterion.
minor comments (2)
  1. The abstract would be strengthened by naming the specific optimality criterion and briefly quantifying the simulation gains (e.g., relative improvement in selection accuracy).
  2. Notation for the LMM variance components (G, R, etc.) should be introduced once in the methods and used consistently; currently some parameters appear only in the examples.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments. We address each major comment below and indicate the revisions made to strengthen the manuscript.

point-by-point responses
  1. Referee: The in-silico simulation studies (described after the illustrative examples) generate data exclusively from the same LMM used for design optimization, including the assumed GxE covariance and additive relationship matrix. No results are shown for cases where the true data-generating process differs (e.g., altered GxE correlations or misspecified relatedness), which directly undermines the translation from model-based optimality to improved selection accuracy under realistic conditions.

    Authors: The simulations were conducted under the design model to demonstrate the gains achievable when the model assumptions hold, which aligns with standard practice for assessing model-based optimal designs. We agree that evaluations under misspecification would better inform real-world performance. Due to the high computational demands of the optimization and simulation workflow, such analyses could not be completed for this revision. We have added a discussion paragraph noting this limitation and identifying robustness checks as future work. revision: partial

  2. Referee: The optimality criterion employed for the multi-environment design (Section on illustrative examples) is not stated explicitly (e.g., A-optimality for prediction error variance of breeding values versus a custom selection-accuracy metric). Without this, it is unclear how the reported designs achieve near-optimality or how sensitive the results are to the chosen criterion.

    Authors: We appreciate the referee highlighting this lack of clarity. The multi-environment designs were constructed using the A-optimality criterion on the average prediction error variance of the breeding values, as defined in the general methodology. We have revised the illustrative examples section to state the criterion explicitly and to clarify its connection to selection accuracy. revision: yes
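
The criterion named in the response, A-optimality on the average prediction error variance, can be illustrated with a small calculation. This is a minimal sketch, assuming a single environment, independent genotype effects, independent residuals and a fixed intercept; the function names are hypothetical and the paper's criterion may be defined differently in detail (for example on pairwise differences rather than on individual effects).

```python
import numpy as np

def incidence(genotype_of_plot, n_genotypes):
    """Plot-by-genotype indicator matrix Z for a given layout."""
    genotype_of_plot = np.asarray(genotype_of_plot)
    Z = np.zeros((genotype_of_plot.size, n_genotypes))
    Z[np.arange(genotype_of_plot.size), genotype_of_plot] = 1.0
    return Z

def pev_matrix(Z, X, G, R):
    """Prediction error variance of u in y = X tau + Z u + e."""
    Rinv = np.linalg.inv(R)
    XtRinv = X.T @ Rinv
    # S = R^-1 - R^-1 X (X' R^-1 X)^-1 X' R^-1 sweeps out the fixed effects.
    S = Rinv - XtRinv.T @ np.linalg.solve(XtRinv @ X, XtRinv)
    return np.linalg.inv(Z.T @ S @ Z + np.linalg.inv(G))

def a_value(Z, X, G, R):
    """Average prediction error variance; smaller is better."""
    return float(np.mean(np.diag(pev_matrix(Z, X, G, R))))

# Four genotypes on eight plots, unit variances (illustrative values only).
X = np.ones((8, 1))
G = np.eye(4)
R = np.eye(8)
balanced = incidence([0, 0, 1, 1, 2, 2, 3, 3], 4)
unbalanced = incidence([0, 0, 0, 0, 0, 1, 2, 3], 4)
a_balanced = a_value(balanced, X, G, R)      # 0.5
a_unbalanced = a_value(unbalanced, X, G, R)  # about 0.571
print(a_balanced, a_unbalanced)
```

With independent residuals only the replication counts matter, so this toy case isolates the replication-allocation question. Replacing G with a relationship matrix, or R with a spatially correlated structure, would make the value depend on which genotypes are replicated and where they are placed.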

Circularity Check

0 steps flagged

No circularity: model-based design optimization is independent of analysis data

full rationale

The paper's central contribution is a method to construct experimental designs that are optimal or near-optimal with respect to a pre-specified linear mixed model (including GxE, relatedness, and non-genetic terms) that will later be used for analysis. This is a standard, non-circular workflow in optimal design theory: the design criterion is defined from the model structure before any data are collected, and the motivating examples, illustrative cases, and in-silico simulations simply evaluate performance under that same model. No equation or step reduces a claimed prediction or optimality result to a fitted quantity defined from the same data, nor does any load-bearing premise rest on a self-citation chain or imported uniqueness theorem. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the premise that optimizing experimental designs against a linear mixed model that matches the intended analysis will improve selection accuracy. No new entities are introduced. The approach assumes standard properties of linear mixed models hold in the breeding context.

free parameters (1)
  • variance components and covariance parameters of the linear mixed model
    These parameters define the model against which designs are optimized and must be specified or estimated; they are inputs to the design construction.
axioms (2)
  • domain assumption: The linear mixed model used for design optimization accurately represents the true genotype-by-environment interactions, genetic relatedness, and non-genetic sources of variation in the experiments.
    Invoked when stating that the design model 'reflects the models used for analysis' and will lead to maximal gains.
  • standard math: Standard assumptions of linear mixed models (normality, independence of residuals given the random effects structure) hold sufficiently for the optimality criterion to be meaningful.
    Implicit in any linear mixed model optimization for design.


