Deep-time consistency in proteome elemental composition across cellular and viral life
Pith reviewed 2026-05-20 02:26 UTC · model grok-4.3
The pith
Proteomes maintain tightly constrained elemental composition across all cellular life, viruses, and LUCA reconstructions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that proteome elemental composition is a deeply conserved organizational feature of cellular and viral life. Thousands of proteomes spanning the tree of life show strikingly consistent proportions of carbon, hydrogen, nitrogen, oxygen, and sulfur. LUCA reconstructions fall inside this range. Synthetic proteomes built from reduced primordial alphabets fall outside it and change how elemental composition correlates with predicted protein structure. The pattern is more tightly constrained than amino acid frequencies alone. It also holds for viruses, which lack a single common ancestor, pointing to universal biochemical constraints that may have helped stabilize the modern amino acid set early in evolution.
What carries the argument
Statistical comparison of elemental stoichiometries (percentages of C, H, N, O, and S) in modern proteomes against LUCA reconstructions and synthetic reduced-alphabet proteomes. Of these, only proteomes built from the modern amino acid alphabet, including the LUCA reconstructions, stay inside the narrow observed range.
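A minimal Python sketch of the first step: computing a proteome's C, H, N, O, and S percentages from its protein sequences. It rests on several assumptions that the paper does not confirm. It uses atomic rather than mass percentages. It pools all residues, so longer proteins weigh more and there is no abundance weighting. It uses standard residue formulas (free amino acid minus one H2O) plus one H2O per chain. It also skips letters outside the 20 standard residues (e.g. X, U, O). The function names are illustrative.

```python
# Atoms (C, H, N, O, S) in each amino-acid residue: the free amino acid minus the
# H2O lost when a peptide bond forms.
RESIDUE_ATOMS = {
    "A": (3, 5, 1, 1, 0), "R": (6, 12, 4, 1, 0), "N": (4, 6, 2, 2, 0),
    "D": (4, 5, 1, 3, 0), "C": (3, 5, 1, 1, 1), "Q": (5, 8, 2, 2, 0),
    "E": (5, 7, 1, 3, 0), "G": (2, 3, 1, 1, 0), "H": (6, 7, 3, 1, 0),
    "I": (6, 11, 1, 1, 0), "L": (6, 11, 1, 1, 0), "K": (6, 12, 2, 1, 0),
    "M": (5, 9, 1, 1, 1), "F": (9, 9, 1, 1, 0), "P": (5, 7, 1, 1, 0),
    "S": (3, 5, 1, 2, 0), "T": (4, 7, 1, 2, 0), "W": (11, 10, 2, 1, 0),
    "Y": (9, 9, 1, 2, 0), "V": (5, 9, 1, 1, 0),
}
ELEMENTS = ("C", "H", "N", "O", "S")

def proteome_atoms(proteins):
    """Total C, H, N, O, S atoms over all proteins in a proteome."""
    totals = [0] * len(ELEMENTS)
    for seq in proteins:
        counted = 0
        for aa in seq.upper():
            atoms = RESIDUE_ATOMS.get(aa)
            if atoms is None:
                continue  # skip ambiguous or non-standard letters (X, U, O, ...)
            counted += 1
            for i, n in enumerate(atoms):
                totals[i] += n
        if counted:
            totals[1] += 2  # one H2O restores the free N- and C-termini of the chain
            totals[3] += 1
    return dict(zip(ELEMENTS, totals))

def elemental_percentages(proteins):
    """Atomic percentages of C, H, N, O, S (they sum to 100)."""
    atoms = proteome_atoms(proteins)
    total = sum(atoms.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {el: 100.0 * n / total for el, n in atoms.items()}
```

For example, a single glycine gives C 20%, H 50%, N 10%, O 20%, S 0% (C2H5NO2). If the paper reports mass percentages, multiply each atom count by its atomic mass before normalizing.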
If this is right
- The modern amino acid alphabet may have been selected in part to satisfy elemental constraints already present by the time of LUCA.
- Viral proteomes follow the same elemental rules despite the absence of a single viral common ancestor, implying universal biochemical limits.
- Reduced alphabets alter both elemental balance and the link between composition and predicted protein structural organization.
- The consistency exceeds what amino acid frequencies or shared ancestry would produce on their own.
Where Pith is reading between the lines
- Lab experiments with nonstandard amino acids could test whether evolved proteomes naturally return to the standard elemental ratios.
- The constraint may connect to why life selects specific elements in macromolecules beyond simple environmental availability.
- Early-Earth chemical models incorporating these ratios could predict which protein-like structures form under prebiotic conditions.
Load-bearing premise
That the observed elemental consistency cannot be explained by evolutionary relatedness, biological function, or amino acid usage patterns alone, and that synthetic reduced-alphabet proteomes fairly represent primordial conditions.
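A hedged Python sketch of what a synthetic reduced-alphabet proteome looks like: a deterministic, length-preserving, residue-by-residue substitution, plus ungapped percent identity to the source sequence. The ten-letter alphabet and the substitution map are hypothetical placeholders, chosen only so that every standard residue has a target. They are not the mapping the authors used (the simulated rebuttal below attributes theirs to Ilardo et al. 2015), and they are not chemically justified.

```python
# HYPOTHETICAL placeholders: a ten-letter "early" alphabet and a map sending every
# other standard residue into it. Not the authors' mapping; not chemically justified.
REDUCED_ALPHABET = frozenset("GADEVSPILT")
SUBSTITUTION = {"N": "D", "Q": "E", "K": "A", "R": "A", "H": "A",
                "F": "L", "Y": "L", "W": "L", "M": "L", "C": "S"}

def reduce_sequence(seq, mapping=SUBSTITUTION, alphabet=REDUCED_ALPHABET):
    """Deterministic, length-preserving substitution into the reduced alphabet."""
    out = []
    for aa in seq.upper():
        if aa in alphabet:
            out.append(aa)           # already allowed: keep it
        elif aa in mapping:
            out.append(mapping[aa])  # replace with its assigned counterpart
        else:
            raise ValueError(f"no substitution defined for residue {aa!r}")
    return "".join(out)

def percent_identity(a, b):
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)
```

With a fixed map like this, identity simply equals the share of residues already in the reduced alphabet. Whether a given identity threshold is reachable therefore depends entirely on the source proteome's amino-acid composition.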
What would settle it
Finding either of the following would falsify the claim: a natural proteome that falls outside the narrow modern elemental range, or a folded synthetic protein from a reduced primordial alphabet that keeps high sequence similarity to extant proteins yet also falls outside that range.
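One simple way to operationalize "falls outside the narrow modern elemental range" is a per-element min-max envelope over reference (modern) proteome compositions. The Python sketch below assumes compositions are dicts mapping element to percentage. The optional tolerance and the function names are hypothetical. The paper's own comparisons (Euclidean distances with permutation tests, per the methods excerpt in the reference graph) are more involved than this check.

```python
ELEMENTS = ("C", "H", "N", "O", "S")

def envelope(reference):
    """Per-element (min, max) over reference composition dicts (element -> percent)."""
    if not reference:
        raise ValueError("need at least one reference composition")
    return {el: (min(r[el] for r in reference), max(r[el] for r in reference))
            for el in ELEMENTS}

def elements_outside(candidate, env, tol=0.0):
    """Elements whose percentage lies outside [min - tol, max + tol] of the envelope."""
    return [el for el in ELEMENTS
            if not (env[el][0] - tol <= candidate[el] <= env[el][1] + tol)]
```

A non-empty result marks a candidate as outside the modern range. The falsification test above would still need evidence that the candidate is a natural proteome or that it folds.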
Original abstract
Proteins are constructed from a limited alphabet of ~20 amino acids, yet the origins and selection of this specific alphabet are unresolved. One largely overlooked aspect is whether elemental composition constrains the range of viable proteomes. Here, we analyze the elemental composition of thousands of proteomes spanning cellular domains and viral realms. Despite evolutionary divergence and orders-of-magnitude variation in proteome size and gene content, proteomes exhibit strikingly consistent elemental composition. This consistency is substantially more constrained than amino acid frequencies or physicochemical properties and is not explained by evolutionary relatedness, biological function, or amino acid usage alone. Viral proteomes occupy the same elemental composition space observed in cellular organisms despite the absence of a single viral common ancestor, suggesting common biochemical constraints shape proteome organization across life. To investigate the evolutionary origins of this pattern, we compare modern proteomes with multiple independent reconstructions of the Last Universal Common Ancestor (LUCA) and with synthetic reduced-alphabet proteomes generated from primordial amino acid alphabets. LUCA proteomes occupy the same constrained elemental composition space observed in modern Bacteria and Archaea, whereas reduced primordial-like alphabets systematically generated alternative elemental regimes outside the modern range despite retaining high sequence similarity to extant proteins. Reduced alphabets disrupt fold space and reorganize relationships between elemental composition and predicted protein structural organization. Our results suggest that constrained elemental composition represents a fundamental organizational property of proteomes, which emerged early in evolution and may have contributed to the selection and stabilization of the modern amino acid alphabet.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes elemental composition (C, H, N, O, S) across thousands of proteomes from Bacteria, Archaea, Eukarya, and viruses, reporting striking consistency that exceeds variation in amino-acid frequencies or physicochemical properties. LUCA reconstructions fall inside the observed modern range, while synthetic proteomes built from reduced primordial alphabets fall outside it despite retaining high sequence similarity; the authors conclude that constrained elemental composition is a fundamental organizational property that emerged early and may have contributed to stabilization of the modern amino-acid alphabet.
Significance. If the central comparisons hold after controls, the work identifies a potential pre-LUCA biochemical constraint on proteome organization that is independent of phylogeny or function and links modern observations to both ancestral reconstructions and hypothetical early alphabets. The scale of the dataset (thousands of proteomes spanning cellular and viral realms) and the use of multiple independent LUCA reconstructions are clear strengths that could open new avenues for testing deep-time hypotheses in molecular evolution.
major comments (2)
- [Abstract and Results (synthetic proteomes)] Abstract (final two paragraphs) and Results (synthetic proteomes comparison): the inference that constrained elemental composition 'may have contributed to the selection and stabilization of the modern amino acid alphabet' rests on synthetic reduced-alphabet sequences falling outside the modern elemental range. Because elemental composition is a direct linear function of amino-acid frequencies, any unstated differences in substitution rules, codon-bias preservation, or hydrophobicity matching could produce the observed shift without reflecting historical constraints; the manuscript does not specify the exact generation protocol or controls for these confounds.
- [Abstract] Abstract (first results paragraph): the claim that consistency 'is substantially more constrained than amino acid frequencies or physicochemical properties and is not explained by evolutionary relatedness, biological function, or amino acid usage alone' is load-bearing for the 'fundamental organizational property' conclusion, yet the abstract (and presumably the main text) provides no quantitative details on the statistical tests, multiple-comparison corrections, data-exclusion criteria, or error propagation used to establish this.
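The referee's first point can be made precise. Let f_a be the fraction of residues that are amino acid a, and let n_ak be the number of atoms of element k in residue a. Ignore the single extra H2O per chain, which is negligible for long proteins. Atom counts are then linear in the frequencies, while atomic percentages are a ratio of two linear functions:

```latex
N_k = \sum_a f_a\, n_{ak}, \qquad
p_k = 100\,\frac{N_k}{\sum_j N_j}
    = 100\,\frac{\sum_a f_a\, n_{ak}}{\sum_a f_a\, m_a}, \qquad
m_a = \sum_j n_{aj}
```

So p_k is strictly linear in f only when every residue has the same total atom count m_a. In either case the map from 20 frequencies to five percentages, which sum to 100, is many-to-one. Tightly constrained composition can therefore coexist with variable amino-acid usage. Conversely, any substitution rule that shifts f shifts p deterministically, which is why the generation protocol matters.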
minor comments (2)
- [Results] Add a supplementary table or figure panel that reports the numerical ranges (mean, SD, min-max) for each elemental ratio across domains and viruses to allow readers to assess the claimed 'striking consistency' quantitatively.
- [Methods] Clarify in Methods whether proteome elemental calculations weight by protein abundance or treat all predicted proteins equally, and whether any filtering was applied for incomplete genomes or annotation quality.
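A hedged Python sketch of the summary the first minor comment asks for: the mean, SD, minimum, and maximum of proteome percentages for each group and element. It assumes each group maps to a list of composition dicts (element to percent). The group names and table layout are illustrative. The weighting question in the second minor comment (pooled residues versus abundance weighting) must be settled before these numbers are computed.

```python
import statistics

ELEMENTS = ("C", "H", "N", "O", "S")

def summarize(groups):
    """groups: {name: [composition dict, ...]} -> rows (group, element, mean, sd, min, max)."""
    rows = []
    for name, comps in groups.items():
        for el in ELEMENTS:
            values = [c[el] for c in comps]
            sd = statistics.stdev(values) if len(values) > 1 else 0.0  # sample SD
            rows.append((name, el, statistics.mean(values), sd, min(values), max(values)))
    return rows

def format_table(rows):
    """Plain-text table with one row per group and element."""
    lines = [f"{'group':<10}{'element':<8}{'mean':>8}{'SD':>8}{'min':>8}{'max':>8}"]
    for name, el, mean, sd, lo, hi in rows:
        lines.append(f"{name:<10}{el:<8}{mean:>8.2f}{sd:>8.3f}{lo:>8.2f}{hi:>8.2f}")
    return "\n".join(lines)
```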
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. The comments highlight important areas where additional methodological transparency will strengthen the manuscript. We have revised the text to address both points and provide point-by-point responses below.
Point-by-point responses
-
Referee: [Abstract and Results (synthetic proteomes)] Abstract (final two paragraphs) and Results (synthetic proteomes comparison): the inference that constrained elemental composition 'may have contributed to the selection and stabilization of the modern amino acid alphabet' rests on synthetic reduced-alphabet sequences falling outside the modern elemental range. Because elemental composition is a direct linear function of amino-acid frequencies, any unstated differences in substitution rules, codon-bias preservation, or hydrophobicity matching could produce the observed shift without reflecting historical constraints; the manuscript does not specify the exact generation protocol or controls for these confounds.
Authors: We agree that explicit specification of the synthetic proteome protocol is required. In the revised manuscript we have added a dedicated Methods subsection describing the generation procedure in full. Reduced-alphabet sequences were produced by replacing each modern amino acid with its closest primordial counterpart according to the consensus prebiotic alphabet of Ilardo et al. (2015), using a deterministic mapping that preserves sequence length and achieves >85% identity to the original protein. Hydrophobicity distributions were matched to the source proteome using the Kyte-Doolittle scale with a tolerance of 0.1 units per residue. Codon bias was retained by resampling replacement codons from the original organism-specific codon table. We further generated two control sets: (i) random substitutions drawn from the same amino-acid frequency distribution and (ii) substitutions preserving hydrophobicity but ignoring primordial mapping. Only the primordial mapping produced elemental compositions outside the modern observed range. These controls are now reported in Results and referenced in the abstract. We therefore maintain that the elemental shift reflects the chemical properties of the reduced alphabet rather than unaccounted substitution artifacts. revision: yes
-
Referee: [Abstract] Abstract (first results paragraph): the claim that consistency 'is substantially more constrained than amino acid frequencies or physicochemical properties and is not explained by evolutionary relatedness, biological function, or amino acid usage alone' is load-bearing for the 'fundamental organizational property' conclusion, yet the abstract (and presumably the main text) provides no quantitative details on the statistical tests, multiple-comparison corrections, data-exclusion criteria, or error propagation used to establish this.
Authors: We accept that the abstract and main text should report the quantitative basis for the claim of greater constraint. In the revision we have inserted the following details into both the abstract and a new paragraph in Results: variance in elemental composition (atomic percentages of C, H, N, O, S) was compared with variance in amino-acid frequencies and in five physicochemical properties using Levene’s test; all elemental variances were significantly smaller (p < 0.001 after Bonferroni correction across 5 elements × 3 comparison classes). Proteomes were excluded if >10% of residues were unannotated or if GC content fell outside 20–80%. Confidence intervals on range boundaries were obtained by 1,000 bootstrap replicates of the full dataset. These statistics are now stated concisely in the abstract and fully documented in Methods and Results. The added information directly supports the assertion that elemental composition is more constrained than the other quantities examined. revision: yes
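The statistics described in this response can be sketched with the Python standard library. The block below computes three things. First, Levene's W, median-centred (the Brown-Forsythe variant, which is also the default center in SciPy's scipy.stats.levene). Second, the Bonferroni threshold for 5 elements × 3 comparison classes. Third, percentile bootstrap intervals for the range boundaries. Two choices here are assumptions not stated in the response. Variables are rescaled by their mean before comparison, because raw elemental percentages and amino-acid frequencies sit on different scales. The p-value step, which compares W with an F(k - 1, N - k) distribution, is left to a statistics library rather than reimplemented.

```python
import random
import statistics

def rescale_by_mean(values):
    """Assumed normalization: express values relative to their mean so scales are comparable."""
    m = statistics.mean(values)
    return [v / m for v in values]

def levene_w(*groups, center="median"):
    """Levene's W statistic; center='median' gives the Brown-Forsythe variant."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    centre = statistics.median if center == "median" else statistics.mean
    z = []
    for g in groups:
        c = centre(g)
        z.append([abs(x - c) for x in g])  # absolute deviations from the group centre
    z_means = [statistics.mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((x - m) ** 2 for zi, m in zip(z, z_means) for x in zi)
    if within == 0:
        return 0.0 if between == 0 else float("inf")
    return (n_total - k) / (k - 1) * between / within

def bonferroni_alpha(alpha=0.05, n_tests=5 * 3):
    """Per-test threshold for 5 elements x 3 comparison classes."""
    return alpha / n_tests

def bootstrap_range_ci(values, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap intervals for the sample minimum and maximum."""
    rng = random.Random(seed)
    mins, maxs = [], []
    for _ in range(n_boot):
        sample = rng.choices(values, k=len(values))  # resample with replacement
        mins.append(min(sample))
        maxs.append(max(sample))
    mins.sort()
    maxs.sort()
    lo_q, hi_q = (1 - level) / 2, (1 + level) / 2

    def pick(sorted_vals, q):
        return sorted_vals[round(q * (len(sorted_vals) - 1))]

    return (pick(mins, lo_q), pick(mins, hi_q)), (pick(maxs, lo_q), pick(maxs, hi_q))
```

The percentile bootstrap is known to behave poorly for sample extremes such as the minimum and maximum, so intervals on range boundaries are best read as rough indications.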
Circularity Check
No significant circularity; observational comparisons on external data and prior reconstructions
The paper's core results consist of direct computation of elemental compositions from public proteome sequence databases, comparison against independently published LUCA reconstructions, and generation of synthetic reduced-alphabet sequences whose compositions are then measured. No equation or claim reduces a fitted parameter to a prediction by construction, no self-citation is invoked as a uniqueness theorem that forces the result, and the synthetic-alphabet step is an external control rather than a self-referential definition. The derivation chain therefore remains self-contained against external sequence data and literature reconstructions.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math: Elemental composition of a proteome can be computed directly from its amino-acid sequence using fixed atomic counts per residue.
- domain assumption: Existing LUCA proteome reconstructions are sufficiently accurate to serve as a proxy for ancestral elemental composition.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
Relation between the paper passage and the cited Recognition theorem is unclear.
Passage: "constrained elemental composition represents a fundamental organizational property of proteomes, which emerged early in evolution"
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem is unclear.
Passage: "LUCA proteomes occupy the same constrained elemental composition space"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Deep-time consistency in proteome elemental composition across cellular and viral life. L. Felipe Benites (1), Louie Slocombe (1), Sara Imari Walker (1, 2, 3)*. (1) Beyond Center for Fundamental Concepts in Science, Arizona State University, Tempe AZ USA; (2) School of Earth and Space Exploration, Arizona State University, Tempe AZ USA; (3) Santa Fe Institute, Santa Fe, NM USA...
-
[2]
indicating these could have been present at the origin of life. Furthermore, thousands of amino acids are theoretically possible based on structural exploration of chemical space (Ilardo et al. 2015). These can be classified according to core functional groups as α-, β-, γ-, δ-, with most having one or more enantiomeric mirror image forms (L or D). Howev...
-
[3]
Relationships between amino acid composition, physicochemical properties, and proteome elemental composition across cellular and viral life. (A) Heatmap showing Spearman rank correlations (ρ) between amino acid frequencies and normalized elemental composition for carbon (C), hydrogen (H), nitrogen (N), oxygen (O), sulfur (S), and selenium (Se) across cell...
-
[4]
and elevated hydrogen fluxes (Moody et al. 2024), parameters that could influence proteome composition (Knight et al. 2004). To address this, we compared independent LUCA-consensus proteomes, built from deep homologous sequences (PFAM – Wehbi et al. 2024; COG – Crapitto et al. 2022; KO – Moody et al. 2024), including ancestrally reconstructed proteins ...
-
[5]
Elemental composition and compositional distances of LUCA-derived proteomes relative to modern cellular proteomes. Elemental composition of LUCA consensus and ancestral sequence reconstructed (ASR) proteomes compared with mean archaeal and bacterial proteomes. Values for carbon (C), hydrogen (H), nitrogen (N), oxygen (O), and sulfur (S) are shown as perce...
-
[6]
The Origin of Biological Homochirality
Reduced primordial-like amino acid alphabets reorganize relationships between proteome elemental composition and predicted protein structural organization. Scatterplots showing relationships between normalized proteome elemental composition and predicted structural propensities for α-helices, β-sheets, and disorder across WT, pLUCA, and Trifonov proteomes...
-
[7]
PMID: 16515719; PMCID: PMC1431706. Makarov M, Sanchez Rocha AC, Krystufek R, Cherepashuk I, Dzmitruk V, Charnavets T, Faustino AM, Lebl M, Fujishima K, Fried SD, Hlouchova K. Early Selection of the Amino Acid Alphabet Was Adaptively Shaped by Biophysical Constraints of Foldability. J Am Chem Soc. 2023 Mar 8;145(9):5320-5329. doi: 10.1021/jacs.2c12987. Epu...
-
[8]
Brown SM, Mayer-Bacon C, Freeland S
PMID: 36826345; PMCID: PMC10017022. Brown SM, Mayer-Bacon C, Freeland S. Xeno Amino Acids: A Look into Biochemistry as We Do Not Know It. Life (Basel). 2023 Nov 29;13(12):2281. doi: 10.3390/life13122281. PMID: 38137883; PMCID: PMC10744825. Baudouin-Cornu P, Schuerer K, Marlière P, Thomas D. Intimate evolution of proteins. Proteome atomic content correlate...
-
[9]
PMID: 14645368. Remick KA, Helmann JD. The elements of life: A biocentric tour of the periodic table. Adv Microb Physiol. 2023;82:1-127. doi: 10.1016/bs.ampbs.2022.11.001. Epub 2023 Jan
-
[10]
The new solar abundances—Part I: The observations. Shenhav L, Zeevi D. Resource conservation manifests in the genetic code. Science. 2020 Nov 6;370(6517):683-687. doi: 10.1126/science.aaz9642. PMID: 33154134. Hana Rozhoňová, Joshua L Payne, Little Evidence the Standard Genetic Code Is Optimized for Resource Conservation, Molecular Biology and Evolution...
-
[11]
Slesarev, Alexei I et al. “The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens.” Proceedings of the National Academy of Sciences of the United States of America vol. 99,7 (2002): 4644-9. doi:10.1073/pnas.032671499 Krupovic M, Koonin EV. Multiple origins of viral capsid proteins from cellular ancestors. ...
-
[12]
PMID: 28265094; PMCID: PMC5373398. Harris HMB, Hill C. A Place for Viruses on the Tree of Life. Front Microbiol. 2021 Jan 14;11:604048. doi: 10.3389/fmicb.2020.604048. PMID: 33519747; PMCID: PMC7840587. Weiss MC, Sousa FL, Mrnjavac N, Neukirchen S, Roettger M, Nelson-Sathi S, Martin WF. The physiology and habitat of the last universal common ancestor. Nat...
-
[13]
PMID: 15150418; PMCID: PMC420404. Wehbi, Sawsan et al. “Order of amino acid recruitment into the genetic code resolved by last universal common ancestor's protein domains.” Proceedings of the National Academy of Sciences of the United States of America vol. 121,52 (2024): e2410311121. doi:10.1073/pnas.2410311121 Crapitto, Andrew J et al. “A consensus ...
-
[14]
PMID: 26311124. Novoselov AA, Silva D, Schneider J, Abrevaya XC, Chaffin MS, Serrano P, Navarro MS, Conti MJ, Souza Filho CR. Geochemical constraints on the Hadean environment from mineral fingerprints of prokaryotes. Sci Rep. 2017 Jun 21;7(1):4008. doi: 10.1038/s41598-017-04161-2. Erratum in: Sci Rep. 2018 Mar 14;8(1):4790. doi: 10.1038/s41598-018-23130...
-
[15]
Taxonomic partitioning at the FASTA sequence level by viral realm was conducted with SeqKit v2.8.0 (Shen et al. 2016).
Random proteome datasets
Random proteomes (n =
-
[16]
Probable_and_sampling_threshold_met
were generated with parameters derived from the average number of proteins and the observed minimum and maximum protein lengths across all bacterial proteomes, to approximate realistic proteome architecture. Each random proteome consisted of 3,500 random protein sequences ranging from 30 to 4,200 amino acids in length, with randomly sampled 22 encoded amino...
-
[17]
and DSSP v4.6 (Hekkelman et al. 2025). Secondary structures were classified using DSSP annotations and grouped into sheets (B, E), helices (H, G, I, P), and disorder (C, S, T), where H represents α-helices, G 3₁₀-helices, I π-helices, P κ-helices (poly-proline II helices), B isolated β-bridges, E extended β-strands, T hydrogen-bonded turns, and S bends. R...
-
[18]
in R (v4.2.1). Analyses were performed on Euclidean distances calculated from elemental percentages. To reduce computational burden while maintaining balanced representation, datasets were randomly subsampled to up to 500 proteomes per group using a fixed seed for reproducibility. Significance was assessed using 999 permutations, and the proportion of...
-
[19]
Fast and sensitive protein alignment using DIAMOND
Best high-scoring segment pairs (HSPs) were retained for downstream analyses. Alignment coverage and bitscore values were calculated for matched proteins. To normalize sequence similarity across proteins of different lengths and compositions, we calculated relative bitscores by dividing the monster-versus-WT bitscore by the corresponding WT self-alignmen...
-
[20]
Kunzmann, P., Müller, T.D., Greil, M. et al. Biotite: new tools for a versatile Python bioinformatics library. BMC Bioinformatics 24, 236 (2023). https://doi.org/10.1186/s12859-023-05345-6 Hekkelman ML, Salmoral DÁ, Perrakis A, Joosten RP. DSSP 4: FAIR annotation of protein secondary structure. Protein Science. 2025;34(8):e70208. https://doi.org/10.100...
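Three of the methods excerpts above describe small, well-defined computations: the random-proteome null (entry [16]), the DSSP grouping into helix, sheet, and disorder (entry [17]), and relative bitscores (entry [19]). The Python sketch below is one reading of them, not the authors' code. It makes the following assumptions. Protein lengths and residues are drawn uniformly, because the excerpt is truncated before the sampling scheme. The 22 letters are the 20 standard residues plus selenocysteine (U) and pyrrolysine (O). DSSP codes other than those listed (such as a blank or "-" for loop, depending on the output format) are ignored. All function names are hypothetical.

```python
import random

# Assumption: 20 standard residues plus selenocysteine (U) and pyrrolysine (O).
ENCODED_22 = "ACDEFGHIKLMNPQRSTVWYUO"

def random_proteome(n_proteins=3500, min_len=30, max_len=4200, seed=0):
    """Random proteome: uniform lengths in [min_len, max_len], uniform residues."""
    rng = random.Random(seed)
    return ["".join(rng.choices(ENCODED_22, k=rng.randint(min_len, max_len)))
            for _ in range(n_proteins)]

# DSSP code -> structural class, following the grouping in the methods excerpt.
DSSP_CLASS = {"H": "helix", "G": "helix", "I": "helix", "P": "helix",
              "B": "sheet", "E": "sheet",
              "C": "disorder", "S": "disorder", "T": "disorder"}

def structure_fractions(dssp_codes):
    """Fraction of classified residues in each class; unlisted codes are skipped."""
    counts = {"helix": 0, "sheet": 0, "disorder": 0}
    for code in dssp_codes:
        cls = DSSP_CLASS.get(code)
        if cls is not None:
            counts[cls] += 1
    total = sum(counts.values())
    return {cls: (n / total if total else 0.0) for cls, n in counts.items()}

def relative_bitscore(monster_vs_wt, wt_self):
    """Bitscore of the variant-vs-WT alignment divided by the WT self-alignment bitscore."""
    if wt_self <= 0:
        raise ValueError("WT self-alignment bitscore must be positive")
    return monster_vs_wt / wt_self
```

A fixed seed makes each random proteome reproducible, in the same spirit as the fixed-seed subsampling described in entry [18].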