Building AI-Ready Data Systems for Space Life Sciences, Aerospace Medicine, and Deep Space Exploration
Pith reviewed 2026-06-30 08:41 UTC · model grok-4.3
The pith
Spaceflight biological data requires a three-tier progression from FAIR to AI-ready to space-ready forms to become usable by AI systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors state that a three-tier approach proceeding from FAIR to AI-ready to space-ready data, backed by a neutral international coordinating body, is required to systematically restructure heterogeneous spaceflight biological data into machine-actionable forms that close the AI access gap and enable trustworthy, agent-accessible infrastructure for deep space biological research.
What carries the argument
The three-tier data progression from FAIR to AI-ready to space-ready carries the argument: it defines successive levels of machine accessibility and space-specific optimization for biological datasets.
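One way to picture the progression is as an ordered set of gates, where a dataset reaches a tier only if it also passes every earlier one. The sketch below is a minimal illustration under stated assumptions: the abstract does not define criteria for any tier, so every field name in `DatasetRecord` and every check in `classify` is a hypothetical placeholder, not the authors' definition.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Ordered tiers; a higher value implies all lower tiers are satisfied."""
    NOT_FAIR = 0
    FAIR = 1
    AI_READY = 2
    SPACE_READY = 3


@dataclass
class DatasetRecord:
    # Hypothetical criteria, invented for illustration only.
    has_persistent_id: bool            # stand-in for a FAIR check
    has_open_license: bool             # stand-in for a FAIR check
    has_machine_readable_schema: bool  # stand-in for an AI-ready check
    has_programmatic_api: bool         # stand-in for an AI-ready check
    usable_offline_onboard: bool       # stand-in for a space-ready check


def classify(record: DatasetRecord) -> Tier:
    """Return the highest tier whose checks, and all earlier checks, pass."""
    if not (record.has_persistent_id and record.has_open_license):
        return Tier.NOT_FAIR
    if not (record.has_machine_readable_schema and record.has_programmatic_api):
        return Tier.FAIR
    if not record.usable_offline_onboard:
        return Tier.AI_READY
    return Tier.SPACE_READY
```

The point of the ordering is that a dataset cannot skip a tier: one that works offline but lacks a persistent identifier still classifies as `NOT_FAIR` in this sketch.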
If this is right
- Existing infrastructures can be improved to support AI access to diverse spaceflight datasets.
- AI systems gain the capacity to access and analyze heterogeneous scientific data from space missions.
- A trustworthy, agent-accessible infrastructure becomes available for deep space biological research.
- Systematic restructuring of data into machine-actionable forms is needed beyond current open-access practices.
Where Pith is reading between the lines
- Standardized data handling could emerge across international space agencies to support shared AI tools.
- Integrated datasets might allow AI to model combined effects of space environment and biology in real time.
- The approach could extend to other high-stakes domains like climate or medical research needing agent-accessible data.
Load-bearing premise
That existing open-access infrastructures cannot meet the distinct demands of growing AI approaches on data structure, metadata, and access interfaces.
What would settle it
Showing that multiple current FAIR-compliant space biology databases can be queried and analyzed accurately by diverse AI models with no added restructuring or new governance.
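The settling condition can be phrased as a simple check over a grid of results. The sketch below assumes accuracy scores have already been measured for each (database, model) pair on unrestructured data; the function name, the 0.9 threshold, and the minimum counts standing in for "multiple" and "diverse" are all hypothetical choices, not values from the paper.

```python
from typing import Mapping, Tuple


def premise_refuted(
    accuracy: Mapping[Tuple[str, str], float],
    threshold: float = 0.9,   # hypothetical bar for "analyzed accurately"
    min_databases: int = 2,   # hypothetical reading of "multiple"
    min_models: int = 2,      # hypothetical reading of "diverse"
) -> bool:
    """Return True if every database/model pair meets the accuracy threshold.

    Keys are (database, model) pairs; values are accuracies measured with
    no added restructuring. A missing pair counts as a failure.
    """
    databases = {db for db, _ in accuracy}
    models = {model for _, model in accuracy}
    if len(databases) < min_databases or len(models) < min_models:
        return False  # too little coverage to settle the question
    return all(
        accuracy.get((db, model), 0.0) >= threshold
        for db in databases
        for model in models
    )
```

Treating a missing pair as a failure keeps the test conservative: the premise is only considered refuted when the full grid has been evaluated and passes.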
original abstract
While AI holds the potential to revolutionize space life sciences, realizing this promise is contingent upon the systematic restructuring of heterogeneous spaceflight biological data into machine-actionable, AI-ready forms. Even though open access principles support human reuse and scientific reproducibility, this does not necessarily enable AI systems to access and analyze such a diverse set of scientific datasets. In addition, the growing array of AI approaches places distinct demands on data structure, metadata, and access interfaces. In order to respond to such growing changes we propose a three-tier approach, proceeding from FAIR to AI-ready to space-ready data. We discuss existing infrastructures and how they can be improved to close the AI access gap. We conclude by proposing a neutral international coordinating body as the governance backbone for the trustworthy, agent-accessible space biology infrastructure that deep space biological research will require.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that realizing the potential of AI in space life sciences requires restructuring heterogeneous spaceflight biological data into machine-actionable forms via a three-tier progression from FAIR to AI-ready to space-ready data. It asserts that open access and FAIR principles do not suffice for AI systems due to their distinct demands on data structure, metadata, and access interfaces, and proposes a neutral international coordinating body to provide governance for trustworthy, agent-accessible infrastructure.
Significance. Should the proposed three-tier approach and coordinating body be adopted and validated, this work would be significant in bridging the gap between current data infrastructures and the needs of AI for deep space exploration. It identifies a potentially critical limitation in existing open-access systems for supporting advanced AI applications in biology and medicine, offering a conceptual framework that could guide future infrastructure development in the field.
major comments (2)
- [Abstract] Abstract: The assertion that open access 'does not necessarily enable AI systems to access and analyze such a diverse set of scientific datasets' and that 'the growing array of AI approaches places distinct demands on data structure, metadata, and access interfaces' is load-bearing for justifying the three-tier proposal and new coordinating body, yet the text supplies no concrete examples of named AI methods, specific spaceflight datasets, or documented failure modes of existing systems such as NASA GeneLab or ESA archives.
- [Abstract] Abstract (final paragraph): The proposal for a 'neutral international coordinating body' as governance backbone rests on the unshown premise that existing bodies cannot meet AI demands; without differentiation from or analysis of current international data-sharing mechanisms, the necessity of a new entity over incremental improvements to existing infrastructures is not secured.
minor comments (2)
- The terms 'AI-ready' and 'space-ready' are introduced without explicit definitions or criteria in the abstract, which would improve clarity if provided with examples in the main text.
- The discussion of how existing infrastructures can be improved would benefit from at least one illustrative case study or table comparing current metadata standards to AI-specific requirements.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the justification for our proposed framework. We respond to each major comment below and indicate planned revisions to the manuscript.
point-by-point responses
- Referee: [Abstract] Abstract: The assertion that open access 'does not necessarily enable AI systems to access and analyze such a diverse set of scientific datasets' and that 'the growing array of AI approaches places distinct demands on data structure, metadata, and access interfaces' is load-bearing for justifying the three-tier proposal and new coordinating body, yet the text supplies no concrete examples of named AI methods, specific spaceflight datasets, or documented failure modes of existing systems such as NASA GeneLab or ESA archives.
  Authors: We agree that the abstract would be strengthened by concrete examples to support the load-bearing claims. The full manuscript discusses limitations of current infrastructures, but does not provide named AI methods or specific failure modes in the abstract. We will revise the abstract to include brief, specific examples (e.g., transformer-based models requiring standardized metadata schemas and challenges with heterogeneous omics data in GeneLab) while maintaining length constraints.
  revision: yes
- Referee: [Abstract] Abstract (final paragraph): The proposal for a 'neutral international coordinating body' as governance backbone rests on the unshown premise that existing bodies cannot meet AI demands; without differentiation from or analysis of current international data-sharing mechanisms, the necessity of a new entity over incremental improvements to existing infrastructures is not secured.
  Authors: The manuscript positions the new body as necessary for neutral, cross-agency coordination focused on agent-accessible infrastructure. We acknowledge that the abstract does not differentiate from existing mechanisms. In revision, we will expand the discussion section with a concise analysis of current bodies (e.g., NASA GeneLab governance, ESA data policies, and ISS international agreements) to highlight gaps in AI-specific requirements that incremental changes may not fully address.
  revision: yes
Circularity Check
No circularity: high-level conceptual proposal without derivations or reductions to inputs
full rationale
The manuscript is a policy-style proposal advocating a three-tier data progression (FAIR to AI-ready to space-ready) plus a coordinating body. It contains no equations, no fitted parameters, no predictions derived from data, and no mathematical derivations. The central argument rests on the stated premise that existing open-access systems fall short for AI demands, but this premise is presented as an assumption rather than derived from or reduced to any self-referential construction within the paper. No self-citation chains, ansatzes, or renamings function as load-bearing steps that collapse the claim back onto its own inputs. The text is therefore self-contained as an advocacy document and exhibits none of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Open access principles support human reuse and scientific reproducibility but do not necessarily enable AI systems to access and analyze diverse scientific datasets
- domain assumption: The growing array of AI approaches places distinct demands on data structure, metadata, and access interfaces
invented entities (1)
- neutral international coordinating body: no independent evidence