arxiv: 2605.11540 · v1 · submitted 2026-05-12 · 📊 stat.ME

The design of selection experiments using a model-based approach

Alison B Smith, Brian R Cullis, David Butler, David GD Hughes

Pith reviewed 2026-05-13 01:28 UTC · model grok-4.3

classification 📊 stat.ME

keywords design of experiments · linear mixed models · selection experiments · genotype by environment interaction · optimal design · plant breeding · multi-environment trials · selection accuracy

The pith

A model-based approach builds optimal designs for selection experiments by aligning layouts with the linear mixed model used for analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to create designs for plant breeding selection trials that are optimal or near-optimal under a linear mixed model incorporating genotype-by-environment interactions, genetic relatedness, and other sources of variation. This model is chosen to match the one that will later be used to analyze the data. The method adds a step that first decides how many plots each genotype receives and only then assigns genotypes to plot positions. The goal is to improve selection accuracy and genetic gain when resources are limited. Examples cover single- and multi-environment trials, and simulations compare the new designs against conventional ones. A sympathetic reader would care because breeding programs commit years of work and substantial land to trials; better designs could extract more reliable variety rankings from the same effort.

Core claim

We present an approach for constructing designs for selection experiments which are optimal or near optimal against a robust and sensible linear mixed model. This model reflects the models used for analysis. The approach is flexible and introduces an additional step to accommodate efficient resource allocation of replication status to genotypes, which is undertaken prior to the allocation of plots to genotypes.

What carries the argument

A linear mixed model for genotype-by-environment effects and genetic relatedness that is used both to optimize the experimental layout and to perform the subsequent analysis.
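
For orientation, a linear mixed model of this kind is usually written in the generic form sketched here. The symbols (\tau for fixed effects, u for random genotype effects, G and R for their variance matrices, X and Z for the design matrices) are conventional notation assumed for illustration and are not taken from the paper; X is assumed to have full column rank.

```latex
\begin{aligned}
y &= X\tau + Zu + e, \qquad u \sim N(0, G), \quad e \sim N(0, R), \\
\operatorname{var}(\tilde{u} - u) &= \left(Z^{\top} S Z + G^{-1}\right)^{-1}, \qquad
S = R^{-1} - R^{-1} X \left(X^{\top} R^{-1} X\right)^{-1} X^{\top} R^{-1}.
\end{aligned}
```

The layout enters only through X and Z, so for fixed G and R the prediction error variance matrix can be evaluated for any candidate design before data are collected. That is what allows one model to serve both for scoring layouts and for the later analysis.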

If this is right

Designs constructed to match the analysis model improve selection accuracy compared with designs that ignore that model.
Allocating replication numbers to genotypes before assigning plots to locations allows more efficient use of limited plot resources.
The same framework applies to both single-environment and multi-environment selection experiments.
In-silico simulations show measurable gains in accuracy under realistic variance structures for genotype-by-environment interaction.
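
The effect of replication allocation can be illustrated with a deliberately simplified sketch. It assumes independent genotypes, a known mean, and a single environment, none of which matches the paper's full model; under those assumptions the prediction error variance of a genotype with r plots is var_g * var_e / (var_e + r * var_g). The function names and the two candidate allocations are hypothetical.

```python
from fractions import Fraction


def pev(reps, var_g=Fraction(1), var_e=Fraction(1)):
    """Prediction error variance of one genotype's BLUP.

    Toy assumption: independent genotypes and a known mean, so the
    result depends only on the number of plots `reps`.
    """
    return var_g * var_e / (var_e + reps * var_g)


def mean_pev(allocation, var_g=Fraction(1), var_e=Fraction(1)):
    """Average PEV over genotypes for a vector of replication numbers."""
    return sum(pev(r, var_g, var_e) for r in allocation) / len(allocation)


# Two hypothetical ways to spend 9 plots on 6 genotypes.
balanced = [2, 2, 2, 1, 1, 1]
lopsided = [4, 1, 1, 1, 1, 1]

print(mean_pev(balanced))  # 5/12
print(mean_pev(lopsided))  # 9/20
```

With equal variances the more even allocation gives the lower average PEV for the same number of plots. Once genotypes are related or environments differ, the comparison no longer has this closed form, which is where a model-based allocation step becomes relevant.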

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the assumed model is only approximately correct, the resulting designs may still outperform random or traditional layouts because they incorporate relatedness and interaction structure.
The replication-allocation step could be extended to incorporate costs or constraints that vary by genotype or environment.
Integration with genomic relationship matrices instead of pedigree-based relatedness would be a direct next use of the same machinery.

Load-bearing premise

The linear mixed model used for design optimization correctly represents the main sources of variation and relatedness that will appear in the actual trial data.

What would settle it

A field trial or simulation in which the model-based design produces lower selection accuracy than a conventional design when the true variance structure differs from the one assumed during design construction.
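
The mechanics of such a check can be sketched in a toy setting. The code generates data under one variance ratio, predicts under another, and measures selection accuracy as the correlation between true and predicted genotype effects. It is not the paper's simulation: it assumes independent genotypes, a known zero mean, residual variance fixed at 1, and it perturbs only the variance ratio used for prediction rather than constructing designs. All names and parameter values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_accuracy(reps, true_gamma, assumed_gamma, seed=1):
    """Correlation between true and predicted genotype effects.

    gamma is the ratio of genetic to residual variance. Data are
    generated with `true_gamma`; shrinkage uses `assumed_gamma`.
    """
    rng = random.Random(seed)
    true_u, pred_u = [], []
    for r in reps:
        u = rng.gauss(0.0, math.sqrt(true_gamma))
        ybar = u + rng.gauss(0.0, 1.0 / math.sqrt(r))  # mean of r plots
        shrink = r * assumed_gamma / (1.0 + r * assumed_gamma)
        true_u.append(u)
        pred_u.append(shrink * ybar)
    return pearson(true_u, pred_u)


# Half the genotypes on one plot, half on four plots.
reps = [1, 4] * 10000
acc_matched = simulate_accuracy(reps, true_gamma=1.0, assumed_gamma=1.0)
acc_mismatch = simulate_accuracy(reps, true_gamma=1.0, assumed_gamma=0.05)
print(round(acc_matched, 3), round(acc_mismatch, 3))
```

Because replication is unequal, a wrong variance ratio changes the relative shrinkage of the two groups and therefore the ranking, so accuracy drops under mismatch. A test of the paper's claim would apply the same pattern to full designs, with the design built under the assumed structure and the data generated under a different one.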

Original abstract

Plant breeding programs use data obtained from multi-environment selection experiments to produce improved varieties with the ultimate aim of maintaining high levels of genetic gain. Selection accuracy can be improved with the use of advanced statistical analytical methods that use informative and parsimonious variance models for the set of genotype by environment interaction effects, include information on genetic relatedness and appropriately accommodate non-genetic sources of variation within the framework of a single step estimation and prediction algorithm. Maximal gains from using these advanced techniques are more likely to be achieved if the designs used match the aims of the selection experiment and make full use of the available resources. In this paper we present an approach for constructing designs for selection experiments which are optimal or near optimal against a robust and sensible linear mixed model. This model reflects the models used for analysis. The approach is flexible and introduces an additional step to accommodate efficient resource allocation of replication status to genotypes, which is undertaken prior to the allocation of plots to genotypes. A motivating example is used to illustrate the approach, two illustrative examples are presented one each for single and multiple environment selection experiments and several in-silico simulation studies are used to demonstrate the advantages of these approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper gives a clear, usable method for building selection trial designs that match the LMM analysis model, with a useful pre-step for deciding replication, but the simulations stay inside the assumed model and do not check performance when that model is off.

The core contribution here is a practical extension of model-based optimal design to plant breeding selection experiments. It adds an explicit step that first assigns replication status to genotypes before allocating plots, then optimizes the layout against a linear mixed model that includes GxE, genetic relatedness, and non-genetic effects. That matches what breeders actually analyze, so the designs are more likely to deliver the accuracy the analysis promises.

The single- and multi-environment examples plus the in-silico simulations show the approach in action and give some evidence that it improves selection accuracy when the working model is correct. That is useful work for the subfield; the method is flexible enough to handle realistic constraints on resources and plot numbers. The motivating example helps ground it in a real breeding context. The simulations are the main support for the claim of advantage, and they appear to be run cleanly under the design model. The paper does not claim robustness beyond that, which keeps the claims proportionate.

The main soft spot is exactly the one the stress-test flags: all the reported gains assume the variance structure (GxE correlations, additive matrix, etc.) is known and correct. In practice those parameters are estimated and often misspecified to some degree, and selection accuracy is sensitive to that. Without at least one set of simulations or real-data checks where the true structure differs from the assumed one, it is hard to know how much of the reported gain survives in the field. The abstract and examples do not address this, so the translation from optimality under the model to practical gain remains the least secure part.

This is a targeted methods paper aimed at quantitative geneticists and plant breeders who already use LMMs for analysis. Readers in that group will find the replication-allocation step and the worked examples directly usable. It is not a broad theoretical advance, but it solves a concrete design problem that matters in breeding programs. The work is coherent on its own terms and shows honest engagement with the practical constraints. I would send it for peer review; the method is worth referee scrutiny even if the robustness question needs more attention in revision.

Referee Report

2 major / 2 minor

Summary. The paper proposes a model-based approach to construct optimal or near-optimal designs for plant breeding selection experiments. Designs are optimized against a linear mixed model (LMM) that incorporates genotype-by-environment (GxE) interactions, genetic relatedness, and non-genetic effects, matching the models used in subsequent analysis. The method adds a preliminary step for efficient replication allocation to genotypes before plot assignment. It is illustrated via a motivating example, single- and multi-environment cases, and in-silico simulations that compare the proposed designs to standard alternatives.

Significance. If the central claim holds, the work offers a practical extension of optimal design theory to modern LMMs employed in breeding, with the replication-allocation step providing a useful operational feature. The simulations under the design model provide initial evidence of improved selection accuracy, but the absence of misspecification tests limits the assessed impact on real programs where variance structures are uncertain.

major comments (2)

[in-silico simulation studies] The in-silico simulation studies (described after the illustrative examples) generate data exclusively from the same LMM used for design optimization, including the assumed GxE covariance and additive relationship matrix. No results are shown for cases where the true data-generating process differs (e.g., altered GxE correlations or misspecified relatedness), which directly undermines the translation from model-based optimality to improved selection accuracy under realistic conditions.
[illustrative examples] The optimality criterion employed for the multi-environment design (Section on illustrative examples) is not stated explicitly (e.g., A-optimality for prediction error variance of breeding values versus a custom selection-accuracy metric). Without this, it is unclear how the reported designs achieve near-optimality or how sensitive the results are to the chosen criterion.

minor comments (2)

The abstract would be strengthened by naming the specific optimality criterion and briefly quantifying the simulation gains (e.g., relative improvement in selection accuracy).
Notation for the LMM variance components (G, R, etc.) should be introduced once in the methods and used consistently; currently some parameters appear only in the examples.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments. We address each major comment below and indicate the revisions made to strengthen the manuscript.

Referee: The in-silico simulation studies (described after the illustrative examples) generate data exclusively from the same LMM used for design optimization, including the assumed GxE covariance and additive relationship matrix. No results are shown for cases where the true data-generating process differs (e.g., altered GxE correlations or misspecified relatedness), which directly undermines the translation from model-based optimality to improved selection accuracy under realistic conditions.

Authors: The simulations were conducted under the design model to demonstrate the gains achievable when the model assumptions hold, which aligns with standard practice for assessing model-based optimal designs. We agree that evaluations under misspecification would better inform real-world performance. Due to the high computational demands of the optimization and simulation workflow, such analyses could not be completed for this revision. We have added a discussion paragraph noting this limitation and identifying robustness checks as future work.
revision: partial

Referee: The optimality criterion employed for the multi-environment design (Section on illustrative examples) is not stated explicitly (e.g., A-optimality for prediction error variance of breeding values versus a custom selection-accuracy metric). Without this, it is unclear how the reported designs achieve near-optimality or how sensitive the results are to the chosen criterion.

Authors: We appreciate the referee highlighting this lack of clarity. The multi-environment designs were constructed using the A-optimality criterion on the average prediction error variance of the breeding values, as defined in the general methodology. We have revised the illustrative examples section to state the criterion explicitly and to clarify its connection to selection accuracy.
revision: yes
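
To make the idea of an A-type criterion concrete, the following sketch scores two tiny layouts by the average pairwise prediction error variance of genotype differences. That form of the criterion is an assumption here, since the paper's exact definition is not reproduced on this page. The setup is hypothetical: four unreplicated genotypes in two fixed blocks of two plots, residual variance 1, and a relationship matrix G in which genotypes 0 and 1 are related, as are 2 and 3, with correlation 1/2. The helper functions are written for this illustration and use exact fractions.

```python
from fractions import Fraction as F


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse of a nonsingular matrix, exact arithmetic."""
    n = len(a)
    aug = [[F(v) for v in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)  # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def a_value(blocks, G, sigma2=F(1)):
    """Average pairwise PEV of genotype differences.

    blocks[i] is the block of genotype i's single plot, so Z = I.
    Blocks are fixed effects and residuals are independent.
    """
    m = len(blocks)
    nb = max(blocks) + 1
    X = [[F(int(b == k)) for k in range(nb)] for b in blocks]
    Xt = transpose(X)
    proj = matmul(matmul(X, inverse(matmul(Xt, X))), Xt)
    # With R = sigma2 * I, S = (I - proj) / sigma2 sweeps out block effects.
    S = [[(F(int(i == j)) - proj[i][j]) / sigma2 for j in range(m)]
         for i in range(m)]
    Ginv = inverse(G)
    pev = inverse([[S[i][j] + Ginv[i][j] for j in range(m)] for i in range(m)])
    trace = sum(pev[i][i] for i in range(m))
    total = sum(sum(row) for row in pev)
    return F(2, m - 1) * (trace - total / m)


h = F(1, 2)
G = [[1, h, 0, 0], [h, 1, 0, 0], [0, 0, 1, h], [0, 0, h, 1]]
relatives_together = a_value([0, 0, 1, 1], G)
relatives_apart = a_value([0, 1, 0, 1], G)
print(relatives_together, relatives_apart)  # 13/9 43/45
```

In this toy case the layout that places related genotypes in different blocks has the smaller value, so the criterion distinguishes layouts that a relatedness-blind criterion would treat as equivalent. A search algorithm would evaluate the same quantity over many candidate layouts and keep the smallest.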

Circularity Check

0 steps flagged

No circularity: model-based design optimization is independent of analysis data

The paper's central contribution is a method to construct experimental designs that are optimal or near-optimal with respect to a pre-specified linear mixed model (including GxE, relatedness, and non-genetic terms) that will later be used for analysis. This is a standard, non-circular workflow in optimal design theory: the design criterion is defined from the model structure before any data are collected, and the motivating examples, illustrative cases, and in-silico simulations simply evaluate performance under that same model. No equation or step reduces a claimed prediction or optimality result to a fitted quantity defined from the same data, nor does any load-bearing premise rest on a self-citation chain or imported uniqueness theorem. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the premise that optimizing experimental designs against a linear mixed model that matches the intended analysis will improve selection accuracy. No new entities are introduced. The approach assumes standard properties of linear mixed models hold in the breeding context.

free parameters (1)

variance components and covariance parameters of the linear mixed model
These parameters define the model against which designs are optimized and must be specified or estimated; they are inputs to the design construction.

axioms (2)

domain assumption: The linear mixed model used for design optimization accurately represents the true genotype-by-environment interactions, genetic relatedness, and non-genetic sources of variation in the experiments.
Invoked when stating that the design model 'reflects the models used for analysis' and will lead to maximal gains.
standard math: Standard assumptions of linear mixed models (normality, independence of residuals given the random effects structure) hold sufficiently for the optimality criterion to be meaningful.
Implicit in any linear mixed model optimization for design.

pith-pipeline@v0.9.0 · 5501 in / 1574 out tokens · 71192 ms · 2026-05-13T01:28:53.231179+00:00 · methodology
