pith · machine review for the scientific record

arXiv: 2604.11915 · v1 · submitted 2026-04-13 · 💻 cs.LG · cs.AI · cs.NE · q-bio.PE

Recognition: unknown

Can AI Detect Life? Lessons from Artificial Life

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:17 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.NE · q-bio.PE
keywords artificial life · machine learning · life detection · out-of-distribution · false positives · astrobiology · extraterrestrial samples · biomarkers

The pith

AI models trained to detect life on Earth can be fooled by artificial life into reporting it with near 100 percent confidence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that modern machine learning methods, trained to separate biotic from abiotic molecular mixtures using Earth samples, assign high life-detection scores to chemical mixtures generated by artificial life systems even though those mixtures cannot sustain life. This occurs because the artificial samples lie outside the distribution of the training data, and machine learning is known to fail on such out-of-distribution inputs. Extraterrestrial samples are expected to lie even farther outside the same terrestrial distribution, so the same methods will produce many false positives. Readers should care because planned space missions intend to use these AI tools to search for life, and the work indicates they will not work as hoped without new safeguards.

Core claim

Applying machine learning classifiers trained on terrestrial biotic and abiotic organic mixtures to samples produced by artificial life simulations shows that the models classify non-living artificial samples as biotic with near 100 percent confidence. Because extraterrestrial samples will almost certainly be out of the distribution spanned by Earth training data, AI-based life detection will yield significant false positives.
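The failure mode behind this claim can be reproduced in miniature: a classifier's confidence grows with distance from its decision boundary, not with similarity to its training data, so a point far outside both training clusters is scored with near certainty. A minimal sketch on hypothetical toy data (not the paper's pipeline or features):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-ins for "abiotic" (class 0) and "biotic" (class 1) feature vectors.
abiotic = rng.normal(loc=-2.0, scale=0.5, size=(200, 2))
biotic = rng.normal(loc=+2.0, scale=0.5, size=(200, 2))
X = np.vstack([abiotic, biotic])
y = np.array([0] * 200 + [1] * 200)

clf = LogisticRegression().fit(X, y)

# An out-of-distribution sample: nothing like either training cluster,
# but it still lands far to one side of the learned decision boundary.
ood = np.array([[40.0, 40.0]])
confidence = clf.predict_proba(ood).max()
print(f"OOD 'life detected' confidence: {confidence:.4f}")  # near 1.0
```

The same extrapolation behavior holds for any linear or softmax-output model, which is the property the ALife-generated mixtures exploit.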

What carries the argument

Artificial life systems that generate out-of-distribution chemical mixtures, used to expose the failure mode of machine-learning life detectors.

If this is right

  • AI life-detection systems will return many false positives when applied to extraterrestrial material.
  • Training sets built only from terrestrial biotic and abiotic samples are insufficient for reliable generalization.
  • Robust life detection will require methods explicitly designed to handle inputs far from the training distribution.
  • Artificial life simulations can be used to systematically reveal weaknesses in AI applied to scientific detection tasks.
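One concrete form of the safeguard named in the third bullet is a gate that abstains on inputs far from the training distribution instead of classifying them. A sketch using a Mahalanobis-distance threshold (my illustration under assumed toy features; the paper does not prescribe this mechanism):

```python
import numpy as np

class DistanceGate:
    """Abstain on inputs far from the training distribution (Mahalanobis distance)."""

    def fit(self, X, quantile=0.99):
        self.mean = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        self.prec = np.linalg.inv(cov)
        # Threshold calibrated so ~99% of training points count as in-distribution.
        self.threshold = np.quantile(self._dist(X), quantile)
        return self

    def _dist(self, X):
        diff = X - self.mean
        # Mahalanobis distance of each row to the training mean.
        return np.sqrt(np.einsum("ij,jk,ik->i", diff, self.prec, diff))

    def in_distribution(self, X):
        return self._dist(X) <= self.threshold

rng = np.random.default_rng(1)
train = rng.normal(size=(500, 2))        # stand-in for terrestrial training features
gate = DistanceGate().fit(train)

far_sample = np.array([[12.0, -9.0]])    # stand-in for an ALife or alien mixture
print(gate.in_distribution(far_sample))  # [False]: abstain rather than classify
```

A gate like this only flags distance from the training set; it says nothing about what the flagged sample is, which is exactly the honest behavior the bullets call for.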

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar out-of-distribution failures are likely in other scientific domains where AI is applied to previously unseen data types.
  • Life-detection pipelines could be strengthened by adding diverse non-terrestrial simulated chemistries to the training process.
  • Hybrid systems that combine machine learning with explicit chemical or physical constraints may prove more reliable than purely data-driven classifiers.
  • Validation against a broad range of artificial systems should become standard practice before any AI detector is deployed on a space mission.

Load-bearing premise

That chemical mixtures created by simulated artificial life accurately represent the kinds of out-of-distribution samples that would actually be found in extraterrestrial environments.

What would settle it

Testing the same machine learning models on a set of real extraterrestrial samples and finding that they do not produce high-confidence false positives, or showing that those samples fall inside the distribution of the original Earth training data.

Figures

Figures reproduced from arXiv: 2604.11915 by Ankit Gupta, Christoph Adami (Michigan State University).

Figure 1: Network representation of the largest cluster of …
Figure 2: (top) Cross-entropy loss as a function of epoch for …
Figure 3: Representative string evolution for a single run: …
Figure 4: Mean SPOOFing confidence over model queries …
Figure 7: Frequency of symbol (color bar on the right) as …
Figure 5: Frequency of final 9-mers evolved from 780 spoof …
Figure 6: Hamming distance of final 9-mers evolved from …
read the original abstract

Modern machine learning methods have been proposed to detect life in extraterrestrial samples, drawing on their ability to distinguish biotic from abiotic samples based on training models using natural and synthetic organic molecular mixtures. Here we show using Artificial Life that such methods are easily fooled into detecting life with near 100% confidence even if the analyzed sample is not capable of life. This is due to modern machine learning methods' propensity to be easily fooled by out-of-distribution samples. Because extra-terrestrial samples are very likely out of the distribution provided by terrestrial biotic and abiotic samples, using AI methods for life detection is bound to yield significant false positives.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that modern ML classifiers trained to distinguish biotic from abiotic terrestrial molecular mixtures assign near-100% life-detection confidence to outputs from artificial life (ALife) simulations that are not capable of life. It attributes this to the models' vulnerability to out-of-distribution (OOD) inputs and concludes that the same failure mode will produce significant false positives when the methods are applied to extraterrestrial samples, which are also expected to be OOD relative to terrestrial training data.

Significance. If the empirical demonstration is reproducible and the ALife systems are accepted as a reasonable proxy for the distributional shift expected in real extraterrestrial chemistry, the result would provide a concrete cautionary example of OOD generalization failure in a high-stakes scientific inference task. It would strengthen arguments for incorporating explicit OOD detection, physics-informed constraints, or uncertainty quantification into future astrobiology instrumentation and data pipelines.

major comments (2)
  1. [Abstract] The central empirical claim ('near 100% confidence') is stated without any description of the ML architectures, training data composition, exact ALife generative rules or parameters, statistical controls, or the precise metric used to quantify 'confidence.' These omissions make it impossible to assess whether the reported fooling effect is robust or an artifact of particular implementation choices.
  2. [Discussion/Conclusion] The extrapolation from ALife fooling to extraterrestrial false-positive risk rests on the untested premise that the specific molecular alphabets, reaction networks, and physical constraints inside the ALife simulator produce the same OOD failure modes that would arise from unknown planetary chemistries (different elemental abundances, mineral interactions, or reaction networks). No comparative analysis or sensitivity test is provided to support this representativeness assumption.
minor comments (1)
  1. [Abstract] The abstract and introduction could more precisely delimit the scope of 'such methods' (e.g., which families of ML models were tested) to avoid over-generalization.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback. We address each major comment below, indicating the changes we will make to the manuscript.

read point-by-point responses
  1. Referee: [Abstract] The central empirical claim ('near 100% confidence') is stated without any description of the ML architectures, training data composition, exact ALife generative rules or parameters, statistical controls, or the precise metric used to quantify 'confidence.' These omissions make it impossible to assess whether the reported fooling effect is robust or an artifact of particular implementation choices.

    Authors: We agree that the abstract is too concise and omits key methodological details needed to evaluate the central claim. In the revised manuscript we will expand the abstract to briefly specify the ML architectures (feed-forward neural networks operating on molecular feature vectors), the training data (terrestrial biotic and abiotic organic mixtures drawn from public databases), the ALife simulator (a standard reaction-network model with defined molecular alphabets and update rules), the statistical controls (multiple random seeds and baseline classifiers), and the confidence metric (maximum softmax probability). Full implementation details will remain in the methods section and supplementary material. revision: yes

  2. Referee: [Discussion/Conclusion] The extrapolation from ALife fooling to extraterrestrial false-positive risk rests on the untested premise that the specific molecular alphabets, reaction networks, and physical constraints inside the ALife simulator produce the same OOD failure modes that would arise from unknown planetary chemistries (different elemental abundances, mineral interactions, or reaction networks). No comparative analysis or sensitivity test is provided to support this representativeness assumption.

    Authors: The referee is correct that the paper treats the chosen ALife systems as a representative proxy for OOD extraterrestrial chemistry without performing explicit sensitivity tests across alternative generative models. We will revise the discussion to state this assumption explicitly, note that the ALife examples illustrate one class of chemically plausible yet non-biotic OOD inputs, and acknowledge that other planetary chemistries could induce different failure modes. We will also add a short paragraph outlining how future studies could vary elemental abundances or reaction constraints to test robustness. A full comparative analysis lies outside the scope of the present work, which is intended as a proof-of-concept demonstration rather than an exhaustive survey. revision: partial
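The confidence metric the authors name in their first response, the maximum softmax probability, is easy to state concretely. A minimal sketch, independent of the paper's actual implementation:

```python
import numpy as np

def max_softmax_confidence(logits):
    """Maximum softmax probability over the output classes."""
    z = logits - np.max(logits)          # subtract the max for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return float(p.max())

# Large-margin logits, as produced by a network extrapolating far outside its
# training distribution, translate into near-certain confidence:
print(max_softmax_confidence(np.array([12.0, -3.0])))  # > 0.999
```

Because the softmax saturates exponentially in the logit gap, even a modest margin on an out-of-distribution input reads as near-100% confidence, which is why the metric alone cannot distinguish a genuine detection from the fooling effect the referee asks about.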

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's central argument rests on an empirical demonstration: ML classifiers trained on terrestrial biotic/abiotic molecular mixtures assign high life-detection confidence to outputs from artificial life simulations, which are treated as out-of-distribution. This is then used to warn that extraterrestrial samples, also presumed OOD, will produce false positives. No load-bearing step reduces by construction to a fitted parameter renamed as prediction, a self-definition, or a self-citation chain. The OOD concept and the ALife simulation rules are external to the fitted model outputs; the extrapolation follows from standard ML generalization principles rather than internal re-derivation. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that extraterrestrial chemistry will be out-of-distribution relative to terrestrial training sets and that artificial life provides a valid proxy for such chemistry. No free parameters or new entities are introduced in the abstract.

axioms (1)
  • domain assumption Machine learning models trained on terrestrial biotic and abiotic organic mixtures will encounter out-of-distribution inputs when applied to extraterrestrial samples.
    This assumption drives the prediction of significant false positives and is stated directly in the abstract.

pith-pipeline@v0.9.0 · 5402 in / 1207 out tokens · 41773 ms · 2026-05-10T15:17:43.453833+00:00 · methodology


Reference graph

Works this paper leans on

22 extracted references · 3 canonical work pages · 2 internal anchors

  1. Adami, C. (1998). Introduction to Artificial Life. Springer Verlag, New York.
  2. Adami, C. (2006). Digital genetics: unravelling the genetic basis of evolution. Nature Reviews Genetics, 7(2):109–118.
  3. Adami, C. (2024). The Evolution of Biological Information. Princeton University Press, Princeton, N.J.
  4. Adami, C. and Brown, C. (1994). Evolutionary learning in the 2D Artificial Life system Avida. In Brooks, R. and Maes, P., editors, Proceedings of the 4th International Conference on the Synthesis and Simulation of Living Systems (Artificial Life 4), pages 377–381. MIT Press.
  5. Adami, C. and LaBar, T. (2017). From entropy to information: Biased typewriters and the origin of life. In Walker, S., Davies, P., and Ellis, G., editors, Information and Causality: From Matter to Life, pages 95–112. Cambridge University Press, Cambridge, MA.
  6. C G, N. and Adami, C. (2021). Information-theoretic characterization of the complete genotype-phenotype map of a complex pre-biotic world. Phys Life Rev, 38:111–114.
  7. C G, N., LaBar, T., Hintze, A., and Adami, C. (2017). Origin of life in a digital microcosm. Philos Trans R Soc Lond A, 375:20160350.
  8. Chan, M. A., Hinman, N. W., Potter-McIntyre, S. L., Schubert, K. E., Gillams, R. J., et al. (2019). Deciphering biosignatures in planetary contexts. Astrobiology, 19:1075–1102.
  9. Cleaves, 2nd, H. J., Hystad, G., Prabhu, A., Wong, M. L., Cody, G. D., et al. (2023). A robust, agnostic molecular biosignature based on machine learning. Proc Natl Acad Sci U S A, 120:e2307149120.
  10. Dorn, E. D. and Adami, C. (2011). Robust monomer-distribution biosignatures in evolving digital biota. Astrobiology, 11:959–68.
  11. Dorn, E. D., McDonald, G. D., Storrie-Lombardi, M. C., and Nealson, K. H. (2003). Principal component analysis and neural networks for detection of amino acid biosignatures. Icarus, 166:403–409.
  12. Dorn, E. D., Nealson, K. H., and Adami, C. (2011). Monomer abundance patterns as a universal biosignature: Examples from terrestrial and artificial life. Journal of Molecular Evolution, 72:283–295.
  13. Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.
  14. Goodfellow, I. J., Shlens, J., and Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
  15. Gupta, A., Adami, C., and Dolson, E. (2025). SPOOF: Simple pixel operations for out-of-distribution fooling. arXiv preprint arXiv:2512.06185.
  16. Hystad, G., Cleaves II, H. J., Garmon, C. A., Wong, M. L., Prabhu, A., et al. (2025). Detecting biosignatures in complex molecular mixtures from pyrolysis-gas chromatography-mass spectrometry data using machine learning. Journal of Geophysical Research: Machine Learning and Computation, 2:e2024JH000441.
  17. Nguyen, A., Yosinski, J., and Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In CVPR.
  18. Ofria, C., Bryson, D. M., and Wilke, C. O. (2009). Avida: A software platform for research in computational evolutionary biology. In Komosinski, M. and Adamatzky, A., editors, Artificial Life Models in Software, pages 3–35. Springer London.
  19. Smith, H. B. and Mathis, C. (2023). Life detection in a universe of false positives. BioEssays, 45:2300050.
  20. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., et al. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
  21. Walker, S. I., Bains, W., Cronin, L., DasSarma, S., Danielache, S., et al. (2018). Exoplanet biosignatures: Future directions. Astrobiology, 18:779–824.
  22. Wong, M. L., Prabhu, A., Alexander, C. O., Cleaves, 2nd, H. J., Cody, G. D., et al. (2025). Organic geochemical evidence for life in Archean rocks identified by pyrolysis-GC-MS and supervised machine learning. Proc Natl Acad Sci U S A, 122:e2514534122.