pith. machine review for the scientific record.

arxiv: 2604.03028 · v1 · submitted 2026-04-03 · 🧬 q-bio.PE · q-bio.GN

Recognition: 1 theorem link

· Lean Theorem

Synonymous Codon Usage Bias Overrides Phylogeny to Reflect Convergent Frond Architecture in a Rapidly Radiating Fern Family Thelypteridaceae

Hanbin Yin, Haoliang Hu, Huan Li, Hui Shang, Jiangping Shu, Jun Yan, Kerui Huang, Lixuan Xiang, Ningyun Zhang, Peng Xie, Rongjie Huang, Wenyan Zhao, Xuan Tang, Yi Liu, Yulong Xiao, Yun Wang, Zui Yao

Pith reviewed 2026-05-13 18:41 UTC · model grok-4.3

classification 🧬 q-bio.PE q-bio.GN
keywords synonymous codon usage bias · convergent evolution · Thelypteridaceae · chloroplast genes · lamina base architecture · photosynthesis · fern phylogeny · adaptive molecular convergence

The pith

Codon usage bias in Thelypteridaceae chloroplast genes clusters species by convergent lamina base architecture instead of phylogeny.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that patterns of synonymous codon usage in chloroplast genes of this fern family fail to match the established species tree. Instead the usage frequencies sort taxa into groups that align closely with a shared leaf-base shape that evolved independently in separate lineages. This sorting is traced to a small set of photosynthesis genes whose third codon positions differ systematically between the morphological types. The authors date the spread of the convergent trait to the early Neogene and argue that the codon signal reflects selection acting on the efficiency of protein synthesis rather than neutral sequence history. If the claim holds, codon usage offers a measurable molecular signature for detecting adaptive convergence that standard phylogenetic markers can miss.

Core claim

Chloroplast codon usage bias patterns in Thelypteridaceae are incongruent with the family phylogeny but partition species into clusters that correspond to convergently evolved lamina base architecture; the signal originates from type-specific third-position substitutions concentrated in the photosynthesis genes ndhJ, psaA, and psbD.

What carries the argument

Dimensionality reduction applied to chloroplast codon usage frequencies, correlated against morphological data and divergence times, which isolates a convergent molecular signal in photosynthesis genes.
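
To make "codon usage frequencies" concrete, here is a minimal sketch that computes relative synonymous codon usage (RSCU) for one coding sequence. RSCU is a widely used metric, but its use here is an assumption: this summary does not say which codon usage measure the authors pass to the dimensionality reduction. The function name `rscu`, the toy sequence, and the choice of the standard genetic code are all illustrative.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
# Assumption: the standard code is adequate for this illustration.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict[str, float]:
    """Return RSCU for every codon of each amino acid present in `cds`.

    `cds` is assumed to be an in-frame coding sequence. Codons containing
    ambiguous bases and stop codons are ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(
        c for c in codons if c in CODON_TABLE and CODON_TABLE[c] != "*"
    )

    # Group codons into synonymous families, one family per amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent, so RSCU is undefined for it
        for c in family:
            # Observed count divided by the count expected under equal usage.
            values[c] = counts[c] * len(family) / total
    return values


if __name__ == "__main__":
    print(rscu("GCTGCTGCCGCA"))  # toy alanine-only sequence
```

An RSCU of 1 means a codon is used as often as expected under equal synonymous usage; values above 1 indicate preference. Under the same assumption, one RSCU vector per species (or per gene per species) would be the kind of table a method such as UMAP could embed.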

If this is right

  • Codon usage bias can serve as a quantifiable indicator of adaptive history in rapidly radiating plant lineages.
  • Selection on a small subset of photosynthesis genes is sufficient to produce a convergent codon signal that overrides phylogenetic history.
  • Lamina base architecture in Thelypteridaceae radiated during the early Neogene under pressures that also shaped codon preferences.
  • Third-position substitutions in ndhJ, psaA, and psbD are the primary drivers of the observed morphological correlation.
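
The counting behind "type-specific third-position substitutions" can be sketched as follows. The definition used here is an assumption, since this summary does not give the authors' criterion: a site counts as type-specific when every sequence of one morphological type shares one nucleotide, every sequence of the other type shares a different one, and neither is a gap or ambiguity code. The function name `type_specific_sites` and the toy alignments are hypothetical, and the alignment is assumed to be in frame starting at the first column.

```python
def type_specific_sites(group_a: list[str], group_c: list[str]) -> tuple[int, int]:
    """Return (total type-specific sites, those at third codon positions).

    Both groups hold aligned, in-frame sequences of one gene. The definition
    of "type-specific" is an illustrative assumption, not the paper's.
    """
    seqs = group_a + group_c
    if not group_a or not group_c:
        raise ValueError("both groups need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    total = third = 0
    for i in range(length):
        col_a = {s[i].upper() for s in group_a}
        col_c = {s[i].upper() for s in group_c}
        fixed_within = len(col_a) == 1 and len(col_c) == 1
        clean = (col_a | col_c) <= set("ACGT")  # no gaps, no ambiguity codes
        if fixed_within and clean and col_a != col_c:
            total += 1
            if i % 3 == 2:  # 0-based columns 2, 5, 8, ... are third positions
                third += 1
    return total, third


if __name__ == "__main__":
    type_a = ["GCTAAA", "GCTAAA"]
    type_c = ["GCCCAA", "GCCCAA"]
    print(type_specific_sites(type_a, type_c))
```

Comparing the third-position count with the total per gene gives the kind of ratio that would single out genes such as ndhJ, psaA, and psbD, although the threshold for "high density" is not specified in this summary.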

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same codon-reduction approach could be tested in other fern or angiosperm clades that display repeated leaf-shape convergence.
  • If codon bias is under selection for photosynthetic performance, environmental gradients in light or temperature should predict usage patterns across independent lineages.
  • Molecular phylogenies built from chloroplast data may need re-examination in groups where strong morphological convergence is already documented.

Load-bearing premise

The incongruence between codon usage and phylogeny is produced by selection on protein synthesis efficiency in photosynthesis genes rather than by mutational bias, GC content, or incomplete lineage sorting.
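
One conventional probe of this premise is the neutrality plot named in the Figure 6 caption, which regresses GC content at codon positions 1 and 2 (GC12) on GC content at position 3 (GC3) across genes. The sketch below is a minimal version under stated assumptions: the function names are hypothetical, the input sequences are assumed to be in-frame coding sequences, and the usual reading (a slope near 1 points to mutational bias, a slope near 0 to constraint on positions 1 and 2 that is often attributed to selection) is a rule of thumb rather than a formal test.

```python
def gc_by_position(cds: str) -> tuple[float, float]:
    """Return (GC12, GC3) for one in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence shorter than one codon")
    gc = [0, 0, 0]  # G/C counts at codon positions 1, 2, 3
    for i in range(usable):
        if cds[i] in "GC":
            gc[i % 3] += 1
    gc12 = (gc[0] + gc[1]) / (2 * n_codons)
    gc3 = gc[2] / n_codons
    return gc12, gc3


def neutrality_slope(genes: list[str]) -> float:
    """Ordinary least-squares slope of GC12 (y) on GC3 (x) across genes."""
    points = [gc_by_position(g) for g in genes]
    xs = [gc3 for _, gc3 in points]
    ys = [gc12 for gc12, _ in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("GC3 does not vary across genes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    toy_genes = ["AAAAAG", "GCGGCG", "ATAATA"]  # toy data, not from the paper
    print(neutrality_slope(toy_genes))
```

A neutrality plot alone does not separate selection from incomplete lineage sorting, so it addresses only part of the premise.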

What would settle it

Sequencing the same chloroplast genes from additional Thelypteridaceae species and finding that the codon-usage clusters either dissolve or realign with the phylogenetic tree rather than with lamina base morphology would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.03028 by Hanbin Yin, Haoliang Hu, Huan Li, Hui Shang, Jiangping Shu, Jun Yan, Kerui Huang, Lixuan Xiang, Ningyun Zhang, Peng Xie, Rongjie Huang, Wenyan Zhao, Xuan Tang, Yi Liu, Yulong Xiao, Yun Wang, Zui Yao.

Figure 1. Chloroplast Genome Assembly, Annotation, and Morphology of Three Newly Sequenced Christella Species. This figure provides a comprehensive overview of the chloroplast genomes for (a) Christella acuminatus, (b) Christella parasiticus, and (c) Christella latipinnus. The top row presents Bandage assembly graphs that confirm the complete circular structure of each plastome, with annotations indicating contig le…
Figure 2. Phylogenetic Relationships within Thelypteridaceae Based on 31 Chloroplast Protein-Coding Genes. The Maximum Likelihood (ML) phylogeny was inferred from a concatenated alignment of 31 shared protein-coding genes. Bootstrap support values are shown at the nodes. Two species from Woodsiaceae were used as outgroups. Calibration points F1, F2, and F3, used for divergence time analysis, are marked on the tree w…
Figure 4. UMAP Visualization of Codon Usage Bias in Comparison with Phylogenetic Relationships. (a) Simplified phylogenetic tree; (b) Global UMAP visualization; (c) Focused UMAP visualization. Due to the observed inconsistencies between codon preference patterns in the global UMAP analysis and phylogenetic relationships, especially the conflicts evident in the core members of the Thelypteridoideae subfamily represen…
Figure 5. Divergence time estimation based on cp genome sequences. The divergence times are shown at each node, and the green bars represent the 95% highest posterior density interval for each node age.
Figure 6. Analysis of Evolutionary Forces Shaping Codon Usage Bias in Thelypteridoideae. (a) Effective Number of Codons (ENC) plot. (b) PR2-plot. (c) Neutrality plot (GC12 vs. GC3).
Figure 7. Gene-by-gene UMAP visualization of codon usage bias among Thelypteridaceae species of Type A and Type C.
Figure 8. Sequence Alignment Result of Type-Specific Codon Bias Sites for genes shared by Type A and Type C species. This figure provides a quantitative analysis of the diagnostic nucleotide sites that differentiate Type A and Type C across key chloroplast genes. For each gene, the chart displays the absolute count (left axis) of total type-specific nucleotide sites (red bars) and the subset of those sites located a…
Original abstract

Convergent evolution provides powerful evidence for natural selection, yet its molecular basis is typically sought in protein-coding amino acid substitutions. Whether adaptive pressures can drive the convergent evolution of synonymous codon usage bias (CUB) to override phylogenetic history remains a fundamental question. Here, we investigate this within the rapidly radiating fern family Thelypteridaceae by establishing a comparative framework that integrates chloroplast phylogenomics with dimensionality reduction of codon usage, morphological data, and divergence time estimation. Our results reveal that chloroplast CUB patterns are strikingly incongruent with the phylogeny of this family. Instead, they partition species into distinct clusters that strongly correlate with a convergently evolved morphological trait, lamina base architecture, a key adaptation whose radiation we date to the early Neogene. This convergent molecular signal is driven by a specific subset of photosynthesis-related genes (ndhJ, psaA, and psbD), which exhibit a high density of type-specific, third-position codon substitutions. These findings demonstrate that CUB can serve as a powerful, quantifiable indicator of adaptive history, revealing a cryptic layer of molecular convergence linked to the regulation of protein synthesis. Our work providing a new framework for uncovering adaptive histories obscured by complex evolutionary processes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that in the fern family Thelypteridaceae, chloroplast codon usage bias (CUB) patterns are incongruent with the reconstructed phylogeny but instead form clusters that correlate strongly with convergent lamina base architecture. This signal is attributed to third-position substitutions in a subset of photosynthesis genes (ndhJ, psaA, psbD), dated to the early Neogene, and is interpreted as evidence that CUB can override phylogenetic history as a marker of adaptive convergence in protein synthesis regulation.

Significance. If the reported CUB-morphology correlation survives explicit controls for mutational bias and GC content, the result would establish synonymous codon usage as a quantifiable indicator of adaptive history in rapidly radiating lineages, extending the molecular basis of convergence beyond amino-acid changes and offering a new comparative framework for detecting cryptic selection on photosynthetic efficiency.

major comments (2)
  1. [Results (CUB clustering and gene-specific analysis)] The central claim that CUB clustering reflects adaptive pressure on protein synthesis (rather than mutational bias or GC variation) is load-bearing but unsupported by any reported regression of CUB metrics on GC content, partial correlation after bias correction, or comparison of CUB distances to phylogenetic distances; this omission directly undermines the causal interpretation in the abstract and results sections describing the ndhJ/psaA/psbD-driven clusters.
  2. [Methods (gene selection and dimensionality reduction)] The post-hoc identification of ndhJ, psaA, and psbD as the drivers of the morphology-correlated signal raises a multiple-testing concern; the manuscript does not demonstrate that the full chloroplast gene set fails to recover the same architecture-based clustering or provide an a priori criterion for restricting the analysis to these three genes.
minor comments (1)
  1. [Abstract] Abstract, final sentence: 'Our work providing a new framework' is grammatically incomplete and should read 'Our work provides a new framework'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. These have prompted us to strengthen the statistical support for our claims and clarify methodological choices. We provide point-by-point responses below.

  1. Referee: The central claim that CUB clustering reflects adaptive pressure on protein synthesis (rather than mutational bias or GC variation) is load-bearing but unsupported by any reported regression of CUB metrics on GC content, partial correlation after bias correction, or comparison of CUB distances to phylogenetic distances; this omission directly undermines the causal interpretation in the abstract and results sections describing the ndhJ/psaA/psbD-driven clusters.

    Authors: We agree that the absence of these controls weakens the interpretation. In the revision, we will add regressions of CUB metrics on GC content, partial correlations controlling for GC bias, and comparisons of CUB distances to phylogenetic distances (e.g., via Mantel tests). This will directly address the concern and support the adaptive interpretation of the clusters driven by ndhJ, psaA, and psbD. revision: yes

  2. Referee: The post-hoc identification of ndhJ, psaA, and psbD as the drivers of the morphology-correlated signal raises a multiple-testing concern; the manuscript does not demonstrate that the full chloroplast gene set fails to recover the same architecture-based clustering or provide an a priori criterion for restricting the analysis to these three genes.

    Authors: We recognize the validity of the multiple-testing concern arising from post-hoc gene selection. We will revise the methods and results to include a full analysis of all chloroplast genes, showing that the architecture-based clustering is not recovered in the complete set. We will also articulate an a priori rationale for prioritizing ndhJ, psaA, and psbD based on their photosynthetic functions and observed high rates of third-position substitutions in the dataset. revision: yes
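
The Mantel test mentioned in the first response compares two distance matrices over the same taxa by permuting the labels of one of them. A minimal standard-library sketch follows, under stated assumptions: the function names `mantel` and `_pearson` are hypothetical, Pearson correlation and a one-sided permutation p-value are illustrative choices, and the toy matrix stands in for real codon usage and phylogenetic (patristic) distances, which are not given in this summary.

```python
import random
from itertools import combinations


def _pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def mantel(d1, d2, permutations: int = 999, seed: int = 0):
    """Return (r, p) for two symmetric distance matrices over the same taxa.

    p is the one-sided share of permutations whose correlation is at least
    as large as the observed one, counting the observed arrangement itself.
    """
    n = len(d1)
    pairs = list(combinations(range(n), 2))  # upper triangle only
    x = [d1[i][j] for i, j in pairs]
    observed = _pearson(x, [d2[i][j] for i, j in pairs])

    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)  # relabel taxa in d2, rows and columns together
        y = [d2[order[i]][order[j]] for i, j in pairs]
        if _pearson(x, y) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


if __name__ == "__main__":
    toy = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
    print(mantel(toy, toy))
```

For the paper's argument, a weak or non-significant correlation between codon usage distances and phylogenetic distances would be consistent with the reported incongruence, but it would not by itself distinguish selection from mutational bias.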

Circularity Check

0 steps flagged

No significant circularity; CUB-morphology correlation treated as independent empirical observation

The derivation integrates chloroplast phylogenomics, dimensionality reduction on codon usage tables, and separate morphological scoring of lamina architecture as distinct inputs. No equation reduces a 'prediction' to a fitted parameter by construction, no self-citation supplies the load-bearing uniqueness or ansatz, and the reported incongruence is presented as a direct comparison of independently generated distance matrices rather than a renaming or self-definition. The central claim therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Central claim rests on standard phylogenomic assumptions and codon usage metrics; no new entities postulated.

free parameters (1)
  • dimensionality reduction hyperparameters
    Parameters used to reduce codon usage data into clusters that correlate with morphology.
axioms (1)
  • domain assumption: Chloroplast genome evolves without significant horizontal gene transfer or recombination
    Invoked implicitly for phylogenomic reconstruction in the abstract.

pith-pipeline@v0.9.0 · 5583 in / 1090 out tokens · 53646 ms · 2026-05-13T18:41:00.632553+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.
