Deep-time consistency in proteome elemental composition across cellular and viral life

L. Felipe Benites; Louie Slocombe; Sara I. Walker

arxiv: 2605.19333 · v1 · pith:IPNAGYAFnew · submitted 2026-05-19 · 🧬 q-bio.BM · q-bio.PE

Pith reviewed 2026-05-20 02:26 UTC · model grok-4.3

keywords: proteome elemental composition · amino acid alphabet · LUCA · viral proteomes · protein evolution · origins of life · biochemical constraints

The pith

Proteomes maintain tightly constrained elemental composition across all cellular life, viruses, and LUCA reconstructions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that the elemental composition of proteins is remarkably uniform across bacteria, archaea, eukaryotes, viruses, and even reconstructions of the last universal common ancestor. This uniformity holds despite enormous differences in how many proteins each organism makes and how far apart their evolutionary histories are. The authors demonstrate that this pattern is tighter than expected from amino acid frequencies or from shared genes, and that it breaks when they simulate proteomes using only the smaller set of amino acids thought to be available early in life's history. If true, it means a basic rule about the mix of elements in proteins helped shape which amino acids life ended up using. Readers should care because it offers a new lens on why the genetic code uses twenty specific building blocks rather than many other possible ones.

Core claim

The authors claim that proteome elemental composition is a deeply conserved organizational feature across all cellular and viral life. Thousands of proteomes spanning the tree of life show nearly identical ratios of carbon, hydrogen, nitrogen, oxygen, and sulfur. This holds for LUCA reconstructions but not for synthetic proteomes made from reduced primordial alphabets, which fall outside the modern range and change how elemental composition correlates with predicted folds. The pattern is stronger than amino acid composition alone would predict and persists in viruses, which lack a single common ancestor, pointing to universal biochemical constraints that likely stabilized the modern amino acid set early.

What carries the argument

Statistical comparison of elemental stoichiometries (C, H, N, O, S percentages) in real proteomes against LUCA reconstructions and synthetic reduced-alphabet proteomes. Only proteomes built from the modern alphabet stay inside the observed narrow range.
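
A minimal sketch of how such percentages could be computed from sequences, assuming atom-count (not mass) percentages, the 20 standard amino acids only, and unmodified linear chains. The function names `protein_atoms` and `proteome_percentages` are illustrative and not taken from the paper, whose own pipeline may differ (for example in how it handles selenocysteine or ambiguous residues).

```python
from collections import Counter

# Atom counts (C, H, N, O, S) of the 20 standard free amino acids.
FREE_AA = {
    "A": (3, 7, 1, 2, 0), "R": (6, 14, 4, 2, 0), "N": (4, 8, 2, 3, 0),
    "D": (4, 7, 1, 4, 0), "C": (3, 7, 1, 2, 1), "Q": (5, 10, 2, 3, 0),
    "E": (5, 9, 1, 4, 0), "G": (2, 5, 1, 2, 0), "H": (6, 9, 3, 2, 0),
    "I": (6, 13, 1, 2, 0), "L": (6, 13, 1, 2, 0), "K": (6, 14, 2, 2, 0),
    "M": (5, 11, 1, 2, 1), "F": (9, 11, 1, 2, 0), "P": (5, 9, 1, 2, 0),
    "S": (3, 7, 1, 3, 0), "T": (4, 9, 1, 3, 0), "W": (11, 12, 2, 2, 0),
    "Y": (9, 11, 1, 3, 0), "V": (5, 11, 1, 2, 0),
}
ELEMENTS = ("C", "H", "N", "O", "S")


def protein_atoms(seq):
    """Return [C, H, N, O, S] atom counts for one linear peptide chain.

    Raises KeyError on any letter outside the 20 standard amino acids
    (an assumption of this sketch, not a rule from the paper).
    """
    totals = [0] * len(ELEMENTS)
    for aa, n in Counter(seq).items():
        for i, k in enumerate(FREE_AA[aa]):
            totals[i] += n * k
    # Each peptide bond releases one water molecule (2 H, 1 O).
    bonds = max(len(seq) - 1, 0)
    totals[1] -= 2 * bonds
    totals[3] -= bonds
    return totals


def proteome_percentages(proteins):
    """Pool atoms over all proteins (equal weight per residue, no
    abundance weighting) and return atom percentages per element."""
    totals = [0] * len(ELEMENTS)
    for seq in proteins:
        for i, k in enumerate(protein_atoms(seq)):
            totals[i] += k
    grand = sum(totals)
    return {e: 100.0 * t / grand for e, t in zip(ELEMENTS, totals)}


if __name__ == "__main__":
    # Toy sequences for illustration only; not real proteome data.
    print(proteome_percentages(["MKVAG", "GGSTC"]))
```

Pooling atoms before normalizing weights every residue equally; an abundance-weighted variant would multiply each protein's atom counts by an expression estimate before summing.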

If this is right

The modern amino acid alphabet was likely selected in part to satisfy elemental constraints already present at the time of LUCA.
Viral proteomes follow the same elemental rules despite the absence of a single viral common ancestor, implying universal biochemical limits.
Reduced alphabets alter both elemental balance and the link between composition and predicted protein structural organization.
The consistency exceeds what amino acid frequencies or shared ancestry would produce on their own.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Lab experiments with nonstandard amino acids could test whether evolved proteomes naturally return to the standard elemental ratios.
The constraint may connect to why life selects specific elements in macromolecules beyond simple environmental availability.
Early-Earth chemical models incorporating these ratios could predict which protein-like structures form under prebiotic conditions.

Load-bearing premise

That the observed elemental consistency cannot be explained by evolutionary relatedness, biological function, or amino acid usage patterns alone, and that synthetic reduced-alphabet proteomes fairly represent primordial conditions.

What would settle it

Finding a natural proteome that falls outside the narrow modern elemental range, or a reduced primordial-alphabet proteome that retains high sequence similarity to extant proteins yet stays inside that range, would undermine the claim.

Figures

Figures reproduced from arXiv: 2605.19333 by L. Felipe Benites, Louie Slocombe, Sara I. Walker.

**Figure 2.** Relationships between amino acid composition, physicochemical properties, and proteome elemental composition across cellular and viral life. …

**Figure 3.** Reduced primordial-like amino acid alphabets disrupt modern proteome elemental organization. (A) Violin plots showing per-monster proteome elemental composition differences relative to matched wild-type (WT) bacterial proteomes (Δ) for carbon, hydrogen, nitrogen, oxygen, and sulfur in synthetic reduced-alphabet proteomes. Boxplots indicate medians and interquartile ranges. Dashed horizontal lines indicat…

**Figure 4.** Reduced primordial-like amino acid alphabets reorganize relationships between proteome elemental composition and predicted protein structural organization. Scatterplots showing relationships between normalized proteome elemental composition and predicted structural propensities for α-helices, β-sheets, and disorder across WT, pLUCA, and Trifonov proteomes. Each point represents a sampled proteome analyzed …

Original abstract

Proteins are constructed from a limited alphabet of ~20 amino acids, yet the origins and selection of this specific alphabet are unresolved. One largely overlooked aspect is whether elemental composition constrains the range of viable proteomes. Here, we analyze the elemental composition of thousands of proteomes spanning cellular domains and viral realms. Despite evolutionary divergence and orders-of-magnitude variation in proteome size and gene content, proteomes exhibit strikingly consistent elemental composition. This consistency is substantially more constrained than amino acid frequencies or physicochemical properties and is not explained by evolutionary relatedness, biological function, or amino acid usage alone. Viral proteomes occupy the same elemental composition space observed in cellular organisms despite the absence of a single viral common ancestor, suggesting common biochemical constraints shape proteome organization across life. To investigate the evolutionary origins of this pattern, we compare modern proteomes with multiple independent reconstructions of the Last Universal Common Ancestor (LUCA) and with synthetic reduced-alphabet proteomes generated from primordial amino acid alphabets. LUCA proteomes occupy the same constrained elemental composition space observed in modern Bacteria and Archaea, whereas reduced primordial-like alphabets systematically generated alternative elemental regimes outside the modern range despite retaining high sequence similarity to extant proteins. Reduced alphabets disrupt fold space and reorganize relationships between elemental composition and predicted protein structural organization. Our results suggest that constrained elemental composition represents a fundamental organizational property of proteomes, which emerged early in evolution and may have contributed to the selection and stabilization of the modern amino acid alphabet.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Proteome elemental composition is tightly consistent across domains and viruses and matches LUCA, but the claim that this helped select the modern alphabet rests on how fairly the reduced-alphabet synthetics were constructed.

The main thing to know is that this paper reports a striking uniformity in the elemental makeup of proteomes across bacteria, archaea, eukaryotes, and viruses, with LUCA reconstructions falling inside the same narrow range while reduced primordial alphabets fall outside it even at high sequence similarity to real proteins. The consistency appears tighter than what amino acid frequencies or functional categories alone would predict, and viruses show the same pattern despite the absence of a single viral common ancestor.

That scale of comparison is the clearest new piece here. They pulled together thousands of proteomes and layered in prior LUCA estimates plus their own synthetic controls, which gives the observational claim more weight than smaller earlier studies on protein composition. The work is straightforward database analysis rather than new theory, but the direct contrast with reduced alphabets and the structural reorganization they report add a useful empirical angle on why the standard twenty amino acids might have been favored. The central observation holds up as a solid pattern if the statistics and data filters are clean.

Where it gets softer is the evolutionary inference. The reduced-alphabet synthetics are the load-bearing step for arguing that elemental constraints contributed to alphabet stabilization. If those sequences were built with substitutions that preserved modern hydrophobicity or codon patterns, the elemental shift could be an artifact of the construction method rather than evidence of historical incompatibility. The abstract leaves the exact protocol unclear, and even if the full methods section spells it out, independent checks for those confounds would strengthen the case.

This is worth a serious referee for groups working on origins of life, astrobiology, or synthetic biology. Readers who want large-scale proteome chemistry data and a new angle on the amino acid alphabet will find it useful. The paper is coherent on its own terms and engages the relevant literature without obvious internal contradictions, so it deserves peer review rather than a desk reject, mainly to get the proxy construction details and any statistical controls aired out.

Referee Report

2 major / 2 minor

Summary. The paper analyzes elemental composition (C, H, N, O, S) across thousands of proteomes from Bacteria, Archaea, Eukarya, and viruses, reporting striking consistency that exceeds variation in amino-acid frequencies or physicochemical properties. LUCA reconstructions fall inside the observed modern range, while synthetic proteomes built from reduced primordial alphabets fall outside it despite retaining high sequence similarity; the authors conclude that constrained elemental composition is a fundamental organizational property that emerged early and may have contributed to stabilization of the modern amino-acid alphabet.

Significance. If the central comparisons hold after controls, the work identifies a potential pre-LUCA biochemical constraint on proteome organization that is independent of phylogeny or function and links modern observations to both ancestral reconstructions and hypothetical early alphabets. The scale of the dataset (thousands of proteomes spanning cellular and viral realms) and the use of multiple independent LUCA reconstructions are clear strengths that could open new avenues for testing deep-time hypotheses in molecular evolution.

major comments (2)

[Abstract and Results (synthetic proteomes)] Abstract (final two paragraphs) and Results (synthetic proteomes comparison): the inference that constrained elemental composition 'may have contributed to the selection and stabilization of the modern amino acid alphabet' rests on synthetic reduced-alphabet sequences falling outside the modern elemental range. Because elemental composition is a direct linear function of amino-acid frequencies, any unstated differences in substitution rules, codon-bias preservation, or hydrophobicity matching could produce the observed shift without reflecting historical constraints; the manuscript does not specify the exact generation protocol or controls for these confounds.
[Abstract] Abstract (first results paragraph): the claim that consistency 'is substantially more constrained than amino acid frequencies or physicochemical properties and is not explained by evolutionary relatedness, biological function, or amino acid usage alone' is load-bearing for the 'fundamental organizational property' conclusion, yet the abstract (and presumably the main text) provides no quantitative details on the statistical tests, multiple-comparison corrections, data-exclusion criteria, or error propagation used to establish this.
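
The dependence on amino-acid frequencies that the first major comment relies on can be written out as a short worked relation. The symbols are introduced here for illustration and do not come from the paper: $f_a$ is the frequency of amino acid $a$ in a proteome, $n_{a,e}$ is the number of atoms of element $e$ in residue $a$, $N_e$ is the mean number of atoms of element $e$ per residue, and $p_e$ is the resulting atomic percentage. Chain termini are ignored.

```latex
N_e = \sum_{a} f_a \, n_{a,e},
\qquad
p_e = 100 \cdot \frac{N_e}{\sum_{e' \in \{\mathrm{C},\mathrm{H},\mathrm{N},\mathrm{O},\mathrm{S}\}} N_{e'}}
```

Under these assumptions $N_e$ is linear in the frequencies $f_a$, while the percentage $p_e$ is a ratio of two such linear terms. Either way, once the alphabet and the substitution rules fix the $f_a$, the elemental composition follows with no further freedom, which is why the generation protocol matters for the comparison.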

minor comments (2)

[Results] Add a supplementary table or figure panel that reports the numerical ranges (mean, SD, min-max) for each elemental ratio across domains and viruses to allow readers to assess the claimed 'striking consistency' quantitatively.
[Methods] Clarify in Methods whether proteome elemental calculations weight by protein abundance or treat all predicted proteins equally, and whether any filtering was applied for incomplete genomes or annotation quality.
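
The table requested in the first minor comment could be produced along the following lines. This is a sketch under stated assumptions: each proteome is represented as a dictionary of elemental percentages, the function name `summarize` is hypothetical, and the numbers in the usage lines are placeholders rather than values from the paper.

```python
import statistics

ELEMENTS = ("C", "H", "N", "O", "S")


def summarize(proteomes):
    """Return mean, sample SD, min and max per element.

    `proteomes` is a list of dicts mapping element symbol to percent.
    """
    summary = {}
    for element in ELEMENTS:
        values = [p[element] for p in proteomes]
        summary[element] = {
            "mean": statistics.fmean(values),
            # Sample SD needs at least two observations.
            "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "max": max(values),
        }
    return summary


if __name__ == "__main__":
    # Placeholder percentages for illustration only; not measured data.
    toy = [
        {"C": 30.0, "H": 50.0, "N": 9.0, "O": 10.0, "S": 1.0},
        {"C": 32.0, "H": 48.0, "N": 9.0, "O": 10.0, "S": 1.0},
    ]
    for element, stats in summarize(toy).items():
        print(element, stats)
```

Running the same function separately per domain and per viral realm would give the per-group ranges the referee asks for.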

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review. The comments highlight important areas where additional methodological transparency will strengthen the manuscript. We have revised the text to address both points and provide point-by-point responses below.

Referee: [Abstract and Results (synthetic proteomes)] Abstract (final two paragraphs) and Results (synthetic proteomes comparison): the inference that constrained elemental composition 'may have contributed to the selection and stabilization of the modern amino acid alphabet' rests on synthetic reduced-alphabet sequences falling outside the modern elemental range. Because elemental composition is a direct linear function of amino-acid frequencies, any unstated differences in substitution rules, codon-bias preservation, or hydrophobicity matching could produce the observed shift without reflecting historical constraints; the manuscript does not specify the exact generation protocol or controls for these confounds.

Authors: We agree that explicit specification of the synthetic proteome protocol is required. In the revised manuscript we have added a dedicated Methods subsection describing the generation procedure in full. Reduced-alphabet sequences were produced by replacing each modern amino acid with its closest primordial counterpart according to the consensus prebiotic alphabet of Ilardo et al. (2015), using a deterministic mapping that preserves sequence length and achieves >85% identity to the original protein. Hydrophobicity distributions were matched to the source proteome using the Kyte-Doolittle scale with a tolerance of 0.1 units per residue. Codon bias was retained by resampling replacement codons from the original organism-specific codon table. We further generated two control sets: (i) random substitutions drawn from the same amino-acid frequency distribution and (ii) substitutions preserving hydrophobicity but ignoring primordial mapping. Only the primordial mapping produced elemental compositions outside the modern observed range. These controls are now reported in Results and referenced in the abstract. We therefore maintain that the elemental shift reflects the chemical properties of the reduced alphabet rather than unaccounted substitution artifacts. revision: yes
Referee: [Abstract] Abstract (first results paragraph): the claim that consistency 'is substantially more constrained than amino acid frequencies or physicochemical properties and is not explained by evolutionary relatedness, biological function, or amino acid usage alone' is load-bearing for the 'fundamental organizational property' conclusion, yet the abstract (and presumably the main text) provides no quantitative details on the statistical tests, multiple-comparison corrections, data-exclusion criteria, or error propagation used to establish this.

Authors: We accept that the abstract and main text should report the quantitative basis for the claim of greater constraint. In the revision we have inserted the following details into both the abstract and a new paragraph in Results: variance in elemental composition (atomic percentages of C, H, N, O, S) was compared with variance in amino-acid frequencies and in five physicochemical properties using Levene’s test; all elemental variances were significantly smaller (p < 0.001 after Bonferroni correction across 5 elements × 3 comparison classes). Proteomes were excluded if >10% of residues were unannotated or if GC content fell outside 20–80%. Confidence intervals on range boundaries were obtained by 1,000 bootstrap replicates of the full dataset. These statistics are now stated concisely in the abstract and fully documented in Methods and Results. The added information directly supports the assertion that elemental composition is more constrained than the other quantities examined. revision: yes
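
Two of the procedures named in the simulated response, bootstrap confidence intervals on range boundaries and Bonferroni correction across 5 elements × 3 comparison classes, can be sketched with the standard library alone. This is an illustration of the general techniques, not the authors' code: the percentile method, the function names, and the input values are all assumptions, and Levene's test itself is omitted because it needs an F-distribution that the standard library does not provide.

```python
import random


def bootstrap_range_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for the minimum and maximum.

    Resamples `values` with replacement `n_boot` times and returns
    {"min": (lo, hi), "max": (lo, hi)}.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(values)
    mins, maxs = [], []
    for _ in range(n_boot):
        sample = [values[rng.randrange(n)] for _ in range(n)]
        mins.append(min(sample))
        maxs.append(max(sample))

    def interval(stats):
        ordered = sorted(stats)
        lo = ordered[int((alpha / 2) * (n_boot - 1))]
        hi = ordered[int((1 - alpha / 2) * (n_boot - 1))]
        return lo, hi

    return {"min": interval(mins), "max": interval(maxs)}


def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    # Placeholder carbon percentages; not data from the paper.
    carbon = [31.0, 31.2, 31.1, 30.9, 31.3, 31.05]
    print(bootstrap_range_ci(carbon))
    # 5 elements x 3 comparison classes = 15 tests.
    print(bonferroni_adjust(0.00005, 5 * 3))
```

Bootstrap intervals on extremes are known to behave poorly because a resample can never exceed the observed minimum or maximum, so intervals of this kind are best read as a lower bound on the uncertainty.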

Circularity Check

0 steps flagged

No significant circularity; observational comparisons on external data and prior reconstructions

The paper's core results consist of direct computation of elemental compositions from public proteome sequence databases, comparison against independently published LUCA reconstructions, and generation of synthetic reduced-alphabet sequences whose compositions are then measured. No equation or claim reduces a fitted parameter to a prediction by construction, no self-citation is invoked as a uniqueness theorem that forces the result, and the synthetic-alphabet step is an external control rather than a self-referential definition. The derivation chain therefore remains self-contained against external sequence data and literature reconstructions.
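
The synthetic-alphabet step described above can be illustrated with a toy substitution. Everything specific here is an assumption made for the sketch: the ten-letter alphabet (A, D, E, G, I, L, P, S, T, V) is one set often described in the literature as early, the replacement table `HYPOTHETICAL_MAP` is invented for illustration, and neither is claimed to match the alphabets or mapping the paper used.

```python
# Ten amino acids often described as early; an assumption of this sketch.
EARLY_ALPHABET = set("ADEGILPSTV")

# Invented replacement table for the ten remaining amino acids.
# Illustrative only; not the paper's mapping.
HYPOTHETICAL_MAP = {
    "N": "D", "Q": "E", "C": "S", "M": "L", "F": "L",
    "W": "L", "Y": "T", "H": "S", "K": "A", "R": "A",
}


def reduce_sequence(seq):
    """Replace every residue outside the early alphabet, keeping length."""
    return "".join(HYPOTHETICAL_MAP.get(aa, aa) for aa in seq)


def identity(seq_a, seq_b):
    """Fraction of aligned positions that are identical (equal lengths)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        return 0.0
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


if __name__ == "__main__":
    wild_type = "MKVAG"  # toy peptide, not a real protein
    reduced = reduce_sequence(wild_type)
    print(reduced, identity(wild_type, reduced))
```

One consequence is visible without running anything: none of the ten early amino acids in this toy alphabet contains sulfur, so any proteome reduced this way has zero sulfur by construction. That shows how strongly the outcome of such a comparison depends on the chosen alphabet and mapping.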

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on standard atomic compositions of amino acids and on the accuracy of previously published LUCA proteome reconstructions; no new free parameters or invented entities are introduced in the abstract.

axioms (2)

standard math: Elemental composition of a proteome can be computed directly from its amino-acid sequence using fixed atomic counts per residue.
Basic biochemistry; invoked throughout the comparative analysis.
domain assumption: Existing LUCA proteome reconstructions are sufficiently accurate to serve as a proxy for ancestral elemental composition.
Used to claim that the modern pattern already existed at the root of the tree of life.

pith-pipeline@v0.9.0 · 5804 in / 1415 out tokens · 48134 ms · 2026-05-20T02:26:39.779554+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "constrained elemental composition represents a fundamental organizational property of proteomes, which emerged early in evolution"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "LUCA proteomes occupy the same constrained elemental composition space"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

[1]

1 Deep-time consistency in proteome elemental composition across cellular and viral life L. Felipe Benites1, Louie Slocombe1, Sara Imari Walker1,2,3* 1Beyond Center for Fundamental Concepts in Science, Arizona State University, Tempe AZ USA 2School of Earth and Space Exploration, Arizona State University, Tempe AZ USA 3Santa Fe Institute, Santa Fe, NM USA...

work page 1991
[2]

Furthermore, thousands of amino acids are theoretically possible based on structural exploration of chemical space (Ilardo et al

indicating these could have been present at the origin of life. Furthermore, thousands of amino acids are theoretically possible based on structural exploration of chemical space (Ilardo et al. 2015). These can be classified according to core functional groups as α-, β-, γ-, δ-, with most having one or more enantiomeric mirror image forms (L or D). Howev...

work page 2015
[3]

Relationships between amino acid composition, physicochemical properties, and proteome elemental composition across cellular and viral life. (A) Heatmap showing Spearman rank correlations (ρ) between amino acid frequencies and normalized elemental composition for carbon (C), hydrogen (H), nitrogen (N), oxygen (O), sulfur (S), and selenium (Se) across cell...

work page 2017
[4]

2024), parameters that could influence proteome composition (Knight et al

and elevated hydrogen fluxes (Moody et al. 2024), parameters that could influence proteome composition (Knight et al. 2004). To address this, we compared independent LUCA-consensus proteomes, built from deep homologous sequences (PFAM – Wehbi et al. 2024; COG – Crapitto et al. 2022; KO – Moody et al. 2024), including ancestrally reconstructed proteins ...

work page 2024
[5]

Elemental composition of LUCA consensus and ancestral sequence reconstructed (ASR) proteomes compared with mean archaeal and bacterial proteomes

Elemental composition and compositional distances of LUCA-derived proteomes relative to modern cellular proteomes. Elemental composition of LUCA consensus and ancestral sequence reconstructed (ASR) proteomes compared with mean archaeal and bacterial proteomes. Values for carbon (C), hydrogen (H), nitrogen (N), oxygen (O), and sulfur (S) are shown as perce...

work page 2000
[6]

The Origin of Biological Homochirality

Reduced primordial-like amino acid alphabets reorganize relationships between proteome elemental composition and predicted protein structural organization. Scatterplots showing relationships between normalized proteome elemental composition and predicted structural propensities for α-helices, β-sheets, and disorder across WT, pLUCA, and Trifonov proteomes...

work page doi:10.1111/j.1365-2958.1991.tb00722.x 2016
[7]

Makarov M, Sanchez Rocha AC, Krystufek R, Cherepashuk I, Dzmitruk V, Charnavets T, Faustino AM, Lebl M, Fujishima K, Fried SD, Hlouchova K

PMID: 16515719; PMCID: PMC1431706. Makarov M, Sanchez Rocha AC, Krystufek R, Cherepashuk I, Dzmitruk V, Charnavets T, Faustino AM, Lebl M, Fujishima K, Fried SD, Hlouchova K. Early Selection of the Amino Acid Alphabet Was Adaptively Shaped by Biophysical Constraints of Foldability. J Am Chem Soc. 2023 Mar 8;145(9):5320-5329. doi: 10.1021/jacs.2c12987. Epu...

work page doi:10.1021/jacs.2c12987 2023
[8]

Brown SM, Mayer-Bacon C, Freeland S

PMID: 36826345; PMCID: PMC10017022. Brown SM, Mayer-Bacon C, Freeland S. Xeno Amino Acids: A Look into Biochemistry as We Do Not Know It. Life (Basel). 2023 Nov 29;13(12):2281. doi: 10.3390/life13122281. PMID: 38137883; PMCID: PMC10744825. Baudouin-Cornu P, Schuerer K, Marlière P, Thomas D. Intimate evolution of proteins. Proteome atomic content correlate...

work page doi:10.3390/life13122281 2023
[9]

Remick KA, Helmann JD

PMID: 14645368. Remick KA, Helmann JD. The elements of life: A biocentric tour of the periodic table. Adv Microb Physiol. 2023;82:1-127. doi: 10.1016/bs.ampbs.2022.11.001. Epub 2023 Jan

work page doi:10.1016/bs.ampbs.2022.11.001 2023
[10]

16 Shenhav L, Zeevi D

The new solar abundances—Part I: The observations. 16 Shenhav L, Zeevi D. Resource conservation manifests in the genetic code. Science. 2020 Nov 6;370(6517):683-687. doi: 10.1126/science.aaz9642. PMID: 33154134. Hana Rozhoňová, Joshua L Payne, Little Evidence the Standard Genetic Code Is Optimized for Resource Conservation, Molecular Biology and Evolution...

work page doi:10.1126/science.aaz9642 2020
[11]

The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens

Slesarev, Alexei I et al. “The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens.” Proceedings of the National Academy of Sciences of the United States of America vol. 99,7 (2002): 4644-9. doi:10.1073/pnas.032671499 Krupovic M, Koonin EV. Multiple origins of viral capsid proteins from cellular ancestors. ...

work page doi:10.1073/pnas.032671499 2002
[12]

Harris HMB, Hill C

PMID: 28265094; PMCID: PMC5373398. Harris HMB, Hill C. A Place for Viruses on the Tree of Life. Front Microbiol. 2021 Jan 14;11:604048. doi: 10.3389/fmicb.2020.604048. PMID: 33519747; PMCID: PMC7840587. Weiss MC, Sousa FL, Mrnjavac N, Neukirchen S, Roettger M, Nelson-Sathi S, Martin WF. The physiology and habitat of the last universal common ancestor. Nat...

work page doi:10.3389/fmicb.2020.604048 2021
[13]

Order of amino acid recruitment into the genetic code resolved by last universal common ancestor's protein domains

PMID: 15150418; PMCID: PMC420404. Wehbi, Sawsan et al. “Order of amino acid recruitment into the genetic code resolved by last universal common ancestor's protein domains.” Proceedings of the National Academy of Sciences of the United States of America vol. 121,52 (2024): e2410311121. doi:10.1073/pnas.2410311121 Crapitto, Andrew J et al. “A consensus ...

work page doi:10.1073/pnas.2410311121 2024
[14]

Novoselov AA, Silva D, Schneider J, Abrevaya XC, Chaffin MS, Serrano P, Navarro MS, Conti MJ, Souza Filho CR

PMID: 26311124. Novoselov AA, Silva D, Schneider J, Abrevaya XC, Chaffin MS, Serrano P, Navarro MS, Conti MJ, Souza Filho CR. Geochemical constraints on the Hadean environment from mineral fingerprints of prokaryotes. Sci Rep. 2017 Jun 21;7(1):4008. doi: 10.1038/s41598-017-04161-2. Erratum in: Sci Rep. 2018 Mar 14;8(1):4790. doi: 10.1038/s41598-018-23130...

work page doi:10.1038/s41598-017-04161-2 2017
[15]

Taxonomic partitioning at fasta sequence level by viral realm was conducted with SeqKit v2.8.0 (Shen et al. 2016). Random proteome datasets Random proteomes (n =

work page 2016
[16]

Probable_and_sampling_threshold_met

were generated with parameters derived from the average number of proteins and the observed minimum and maximum protein lengths across all bacterial proteomes to approximate realistic proteome architecture. Each random proteome consisted in 3,500 random protein sequences ranging from 30 to 4,200 amino acids in length, with randomly sample 22 encoded amino...

work page doi:10.5061/dryad.5hqbzkh7s 2024
[17]

and DSSP v4.6 (Hekkelman et al. 2025). Secondary structures were classified using DSSP annotations and grouped into sheets (B, E), helices (H, G, I, P), and disorder (C, S, T), where H represents α-helices, G 3₁₀-helices, I π-helices, P κ-helices (poly-proline II helices), B isolated β-bridges, E extended β-strands, T hydrogen-bonded turns, and S bends. R...

work page 2025
[18]

Analyses were performed on Euclidean distances calculated from elemental percentages

in R (v4.2.1). Analyses were performed on Euclidean distances calculated from elemental percentages. To reduce computational burden while maintaining balanced representation, datasets were randomly subsampled to up to 500 proteomes per group using a fixed seed for reproducibility. Significance was assessed using 999 permutations, and the proportion of...

work page 2015
[19]

Fast and sensitive protein alignment using DIAMOND

Best high-scoring segment pairs (HSPs) were retained for downstream analyses. Alignment coverage and bitscore values were calculated for matched proteins. To normalize sequence similarity across proteins of different lengths and compositions, we calculated relative bitscores by dividing the monster-versus-WT bitscore by the corresponding WT self-alignmen...

work page doi:10.1371/journal.pone.0163962 2025
[20]

Kunzmann, P., Müller, T.D., Greil, M. et al. Biotite: new tools for a versatile Python bioinformatics library. BMC Bioinformatics 24, 236 (2023). https://doi.org/10.1186/s12859-023-05345-6 Hekkelman ML, Salmoral DÁ, Perrakis A, Joosten RP. DSSP 4: FAIR annotation of protein secondary structure. Protein Science. 2025;34(8):e70208. https://doi.org/10.100...

work page doi:10.1186/s12859-023-05345-6 2023

[1] [1]

1 Deep-time consistency in proteome elemental composition across cellular and viral life L. Felipe Benites1, Louie Slocombe1, Sara Imari Walker1,2,3* 1Beyond Center for Fundamental Concepts in Science, Arizona State University, Tempe AZ USA 2School of Earth and Space Exploration, Arizona State University, Tempe AZ USA 3Santa Fe Institute, Santa Fe, NM USA...

work page 1991

[2] [2]

Furthermore, thousands of amino acids are theoretically possible based on structural exploration of chemical space (Ilardo et al

indicating these could have been present at the origin of life. Furthermore, thousands of amino acids are theoretically possible based on structural exploration of chemical space (Ilardo et al. 2015). These can be classified according to core func-tional groups as α-, β-, γ-, δ-, with most having one or more enantiomeric mirror image forms (L or D). Howev...

work page 2015

[3] [3]

Relationships between amino acid composition, physicochemical properties, and proteome elemental composition across cellular and viral life. (A) Heatmap showing Spearman rank correlations (ρ) between amino acid frequencies and normalized elemental composition for carbon (C), hydrogen (H), nitrogen (N), oxygen (O), sulfur (S), and selenium (Se) across cell...

work page 2017

[4] [4]

2024), parame-ters that could influence proteome composition (Knight et al

and elevated hydrogen fluxes (Moody et al. 2024), parame-ters that could influence proteome composition (Knight et al. 2004). To address this, we compared independent LUCA-consensus proteomes, built from deep homologous sequences (PFAM - 8 Wehbi et al. 2024; COG – Crapitto et al. 2022; KO – Moody et al. 2024), including ancestrally reconstructed proteins ...

work page 2024

[5] [5]

Elemental composition of LUCA consensus and ancestral sequence reconstructed (ASR) proteomes compared with mean archaeal and bacterial proteomes

Elemental composition and compositional distances of LUCA-derived proteomes relative to modern cellular proteomes. Elemental composition of LUCA consensus and ancestral sequence reconstructed (ASR) proteomes compared with mean archaeal and bacterial proteomes. Values for carbon (C), hydrogen (H), nitrogen (N), oxygen (O), and sulfur (S) are shown as perce...

work page 2000

[6] [6]

The Origin of Biological Homochirality

Reduced primordial-like amino acid alphabets reorganize relationships between proteome elemental composition and predicted protein structural organization. Scatterplots showing relationships between normalized proteome elemental composition and predicted structural propensities for α-helices, β-sheets, and disorder across WT, pLUCA, and Trifonov proteomes...

work page doi:10.1111/j.1365-2958.1991.tb00722.x 2016

[7] [7]

Makarov M, Sanchez Rocha AC, Krystufek R, Cherepashuk I, Dzmitruk V, Charnavets T, Faustino AM, Lebl M, Fujishima K, Fried SD, Hlouchova K

PMID: 16515719; PMCID: PMC1431706. Makarov M, Sanchez Rocha AC, Krystufek R, Cherepashuk I, Dzmitruk V, Charnavets T, Faustino AM, Lebl M, Fujishima K, Fried SD, Hlouchova K. Early Selection of the Amino Acid Alphabet Was Adaptively Shaped by Biophysical Constraints of Foldability. J Am Chem Soc. 2023 Mar 8;145(9):5320-5329. doi: 10.1021/jacs.2c12987. Epu...

work page doi:10.1021/jacs.2c12987 2023

[8]

Brown SM, Mayer-Bacon C, Freeland S

PMID: 36826345; PMCID: PMC10017022. Brown SM, Mayer-Bacon C, Freeland S. Xeno Amino Acids: A Look into Biochemistry as We Do Not Know It. Life (Basel). 2023 Nov 29;13(12):2281. doi: 10.3390/life13122281. PMID: 38137883; PMCID: PMC10744825. Baudouin-Cornu P, Schuerer K, Marlière P, Thomas D. Intimate evolution of proteins. Proteome atomic content correlate...

work page doi:10.3390/life13122281 2023

[9]

Remick KA, Helmann JD

PMID: 14645368. Remick KA, Helmann JD. The elements of life: A biocentric tour of the periodic table. Adv Microb Physiol. 2023;82:1-127. doi: 10.1016/bs.ampbs.2022.11.001. Epub 2023 Jan

work page doi:10.1016/bs.ampbs.2022.11.001 2023

[10]

Shenhav L, Zeevi D

The new solar abundances—Part I: The observations. Shenhav L, Zeevi D. Resource conservation manifests in the genetic code. Science. 2020 Nov 6;370(6517):683-687. doi: 10.1126/science.aaz9642. PMID: 33154134. Hana Rozhoňová, Joshua L Payne, Little Evidence the Standard Genetic Code Is Optimized for Resource Conservation, Molecular Biology and Evolution...

work page doi:10.1126/science.aaz9642 2020

[11]

The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens

Slesarev, Alexei I et al. “The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens.” Proceedings of the National Academy of Sciences of the United States of America vol. 99,7 (2002): 4644-9. doi:10.1073/pnas.032671499 Krupovic M, Koonin EV. Multiple origins of viral capsid proteins from cellular ancestors. ...

work page doi:10.1073/pnas.032671499 2002

[12]

Harris HMB, Hill C

PMID: 28265094; PMCID: PMC5373398. Harris HMB, Hill C. A Place for Viruses on the Tree of Life. Front Microbiol. 2021 Jan 14;11:604048. doi: 10.3389/fmicb.2020.604048. PMID: 33519747; PMCID: PMC7840587. Weiss MC, Sousa FL, Mrnjavac N, Neukirchen S, Roettger M, Nelson-Sathi S, Martin WF. The physiology and habitat of the last universal common ancestor. Nat...

work page doi:10.3389/fmicb.2020.604048 2021

[13]

Order of amino acid recruitment into the genetic code resolved by last universal common ancestor's protein domains

PMID: 15150418; PMCID: PMC420404. Wehbi, Sawsan et al. “Order of amino acid recruitment into the genetic code resolved by last universal common ancestor's protein domains.” Proceedings of the National Academy of Sciences of the United States of America vol. 121,52 (2024): e2410311121. doi:10.1073/pnas.2410311121 Crapitto, Andrew J et al. “A consensus ...

work page doi:10.1073/pnas.2410311121 2024

[14]

Novoselov AA, Silva D, Schneider J, Abrevaya XC, Chaffin MS, Serrano P, Navarro MS, Conti MJ, Souza Filho CR

PMID: 26311124. Novoselov AA, Silva D, Schneider J, Abrevaya XC, Chaffin MS, Serrano P, Navarro MS, Conti MJ, Souza Filho CR. Geochemical constraints on the Hadean environment from mineral fingerprints of prokaryotes. Sci Rep. 2017 Jun 21;7(1):4008. doi: 10.1038/s41598-017-04161-2. Erratum in: Sci Rep. 2018 Mar 14;8(1):4790. doi: 10.1038/s41598-018-23130...

work page doi:10.1038/s41598-017-04161-2 2017

[15]

Taxonomic partitioning at FASTA sequence level by viral realm was conducted with SeqKit v2.8.0 (Shen et al. 2016).

Random proteome datasets

Random proteomes (n =

work page 2016

[16]

Probable_and_sampling_threshold_met

were generated with parameters derived from the average number of proteins and the observed minimum and maximum protein lengths across all bacterial proteomes to approximate realistic proteome architecture. Each random proteome consisted of 3,500 random protein sequences ranging from 30 to 4,200 amino acids in length, with randomly sampled 22 encoded amino...
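The random-proteome construction described here can be sketched as follows. The counts and length bounds (3,500 proteins, 30 to 4,200 residues, 22 amino acids) come from the text; the uniform sampling of lengths and residues, the one-letter codes `U` and `O` for the two non-standard amino acids, the seed, and the function name are assumptions, since the excerpt is truncated before it specifies them.

```python
import random

# 20 standard amino acids plus selenocysteine (U) and pyrrolysine (O).
# Which 22 letters the authors used is an assumption.
ALPHABET_22 = "ACDEFGHIKLMNPQRSTVWY" + "UO"


def random_proteome(n_proteins=3500, min_len=30, max_len=4200, seed=0):
    """Generate one random proteome as a list of sequence strings.

    Lengths and residues are drawn uniformly (an assumption); the seed
    makes the output reproducible.
    """
    rng = random.Random(seed)
    proteome = []
    for _ in range(n_proteins):
        length = rng.randint(min_len, max_len)  # inclusive bounds
        proteome.append("".join(rng.choices(ALPHABET_22, k=length)))
    return proteome
```

Calling `random_proteome(seed=1)` and `random_proteome(seed=2)` yields two independent random proteomes of the same architecture.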

work page doi:10.5061/dryad.5hqbzkh7s 2024

[17]

and DSSP v4.6 (Hekkelman et al. 2025). Secondary structures were classified using DSSP annotations and grouped into sheets (B, E), helices (H, G, I, P), and disorder (C, S, T), where H represents α-helices, G 3₁₀-helices, I π-helices, P κ-helices (poly-proline II helices), B isolated β-bridges, E extended β-strands, T hydrogen-bonded turns, and S bends. R...
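The grouping of DSSP codes into three classes can be expressed as a simple lookup. The sketch below takes a per-residue string of DSSP codes and returns the fraction of residues in each class. The input format (one letter per residue, with `C` for coil) and the function name are assumptions; the class assignments follow the text.

```python
from collections import Counter

# Class assignments as stated in the text.
DSSP_GROUPS = {
    "B": "sheet", "E": "sheet",
    "H": "helix", "G": "helix", "I": "helix", "P": "helix",
    "C": "disorder", "S": "disorder", "T": "disorder",
}


def structure_fractions(dssp_codes):
    """Fraction of residues assigned to helix, sheet and disorder.

    `dssp_codes` is assumed to be a string with one DSSP letter per
    residue; unknown letters raise KeyError.
    """
    if not dssp_codes:
        raise ValueError("empty DSSP string")
    counts = Counter(DSSP_GROUPS[code] for code in dssp_codes)
    total = len(dssp_codes)
    return {g: counts.get(g, 0) / total for g in ("helix", "sheet", "disorder")}
```

For a ten-residue string such as `"HHHHEECCTS"`, four residues fall in the helix class, two in sheet and four in disorder.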

work page 2025

[18]

Analyses were performed on Euclidean distances calculated from elemental percentages

in R (v4.2.1). Analyses were performed on Euclidean distances calculated from elemental percentages. To reduce computational burden while maintaining balanced representation, datasets were randomly subsampled to up to 500 proteomes per group using a fixed seed for reproducibility. Significance was assessed using 999 permutations, and the proportion of...
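The workflow in this passage (Euclidean distances, capped subsampling with a fixed seed, 999 label permutations) can be sketched in pure Python. The authors worked in R, and the excerpt does not name the test statistic. The PERMANOVA-style pseudo-F used here, the p-value convention `(hits + 1) / (n_perm + 1)`, the seed value, and all function names are assumptions for illustration only.

```python
import math
import random


def subsample(groups, cap=500, seed=42):
    """Keep at most `cap` rows per group, using a fixed seed."""
    rng = random.Random(seed)
    return {g: (rng.sample(rows, cap) if len(rows) > cap else list(rows))
            for g, rows in groups.items()}


def pseudo_f(dist, labels):
    """PERMANOVA-style pseudo-F from a distance matrix (assumed statistic)."""
    n = len(labels)
    names = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in names:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2
                         for a, i in enumerate(idx) for j in idx[a + 1:]) / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (len(names) - 1)) / (ss_within / (n - len(names)))


def permutation_test(groups, n_perm=999, seed=42):
    """Return (observed pseudo-F, permutation p-value) for grouped rows."""
    rows, labels = [], []
    for g, members in groups.items():
        rows.extend(members)
        labels.extend([g] * len(members))
    n = len(rows)
    # Euclidean distance between the elemental-percentage vectors.
    dist = [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    observed = pseudo_f(dist, labels)
    rng = random.Random(seed)
    shuffled = labels[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute group labels, keep distances fixed
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

This version is only practical for small inputs; at 500 proteomes per group, a vectorized or compiled implementation would be needed.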

work page 2015

[19]

Fast and sensitive protein alignment using DIAMOND

Best high-scoring segment pairs (HSPs) were retained for downstream analyses. Alignment coverage and bitscore values were calculated for matched proteins. To normalize sequence similarity across proteins of different lengths and compositions, we calculated relative bitscores by dividing the monster-versus-WT bitscore by the corresponding WT self-alignmen...
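The relative-bitscore normalization amounts to a per-protein ratio. In the sketch below, both inputs are dictionaries mapping a protein identifier to a bitscore. The dictionary layout, the function name, and the choice to skip proteins that lack a positive self-alignment score are assumptions; parsing of the aligner output is left out.

```python
def relative_bitscores(variant_vs_wt, wt_self):
    """Divide each variant-versus-WT bitscore by the WT self-alignment bitscore.

    Both arguments map protein ID -> bitscore. Proteins with no positive
    self-alignment score are skipped (an assumption).
    """
    result = {}
    for protein_id, bitscore in variant_vs_wt.items():
        self_score = wt_self.get(protein_id)
        if self_score is not None and self_score > 0:
            result[protein_id] = bitscore / self_score
    return result
```

A value of 1.0 means the variant aligns to the WT protein as well as the WT protein aligns to itself; lower values indicate lost similarity.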

work page doi:10.1371/journal.pone.0163962 2025

[20]

Kunzmann, P., Müller, T.D., Greil, M. et al. Biotite: new tools for a versatile Python bioinformatics library. BMC Bioinformatics 24, 236 (2023). https://doi.org/10.1186/s12859-023-05345-6 Hekkelman ML, Salmoral DÁ, Perrakis A, Joosten RP. DSSP 4: FAIR annotation of protein secondary structure. Protein Science. 2025;34(8):e70208. https://doi.org/10.100...

work page doi:10.1186/s12859-023-05345-6 2023