TerraMARS: A Domain-Adapted Small-Language-Model Pipeline for Mars Terraforming Literature
Pith reviewed 2026-06-26 18:02 UTC · model grok-4.3
The pith
A fine-tuned 1B-parameter model extracts structured Mars data from papers for use in habitability and terraforming models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present TerraMARS, an end-to-end information extraction pipeline that combines a domain-adapted Small Language Model to answer Mars terraforming-related questions and convert unstructured Mars science text into machine-readable structured outputs in JavaScript Object Notation (JSON) format. A corpus of open-access papers is collected and processed using a multistage retrieval and chunking framework. Google Gemma 3 1B was adapted to the domain using Quantized Low-Rank Adaptation (QLoRA) fine-tuning on Mars-specific question-answering and information extraction datasets. The resulting pipeline generates both types of output and provides a foundation for integrating knowledge from scientific
What carries the argument
The domain-adapted Gemma 3 1B model fine-tuned with QLoRA on Mars QA and extraction datasets, which performs both question answering and conversion of paper text into JSON.
If this is right
- The pipeline produces both natural-language answers to Mars questions and machine-readable JSON records from the same input papers.
- Quantitative constraints extracted from the literature become available for direct use in habitability assessment and terraforming studies.
- The structured outputs supply a ready input layer for digital-twin models of the Martian environment.
- Further accuracy gains would allow the same pipeline to handle larger volumes of new papers without additional manual curation.
Where Pith is reading between the lines
- Similar domain-adaptation steps could be applied to literature on other planetary bodies or Earth-analog environments.
- The JSON outputs could be versioned and linked to specific paper sections, creating an auditable knowledge graph for Mars research.
- Integration with simulation code might let new papers automatically update parameter ranges inside habitability models.
Load-bearing premise
The domain-adapted model produces accurate extractions and answers with high factual consistency on unseen Mars papers.
What would settle it
Running the pipeline on a held-out set of Mars papers, then measuring how often the JSON fields and answers match human-verified ground truth; accuracy or consistency below acceptable thresholds would falsify the claim.
Figures
read the original abstract
Researchers are interested in learning about Mars so that it may eventually become habitable for humans. To achieve this, there is a need for comprehensive knowledge of the planet's atmosphere, hydrology, surface chemistry, radiation environment, and spatial features through the scientific literature. These contain valuable information and meaningful quantitative constraints that can be used in other models and studies, such as habitability assessment and future terraforming studies. We present TerraMARS, an end-to-end information extraction pipeline that combines a domain-adapted Small Language Model to answer Mars terraforming-related questions and convert unstructured Mars science text into machine-readable structured outputs in JavaScript Object Notation (JSON) format. A corpus of open-access papers is collected and processed using a multistage retrieval and chunking framework. Google Gemma 3 1B was adapted to the domain using Quantized Low-Rank Adaptation (QLoRA) fine-tuning on Mars-specific question-answering and information extraction datasets. The resulting pipeline generates both types of output and provides a foundation for integrating knowledge from scientific literature into downstream applications like digital twins and habitability modeling for Mars. The output from this pipeline looks promising, but further improvements are needed to increase extraction accuracy and factual consistency.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents TerraMARS, an end-to-end information extraction pipeline for Mars terraforming literature. It collects open-access papers, applies a multistage retrieval and chunking framework, domain-adapts Google Gemma 3 1B via QLoRA fine-tuning on Mars-specific QA and IE datasets, and generates both structured JSON outputs and answers to terraforming-related questions. The work claims this pipeline provides a foundation for downstream applications such as digital twins and habitability modeling, while noting that outputs look promising but require further improvements in extraction accuracy and factual consistency.
Significance. If the domain-adapted 1B model were shown to produce high-accuracy extractions and factually consistent answers on held-out Mars papers, the pipeline would offer a practical, resource-efficient route to converting unstructured scientific literature into machine-readable constraints usable in computational models of planetary habitability and terraforming. The engineering choices—multistage chunking plus QLoRA on a small open model—are reproducible and lower the barrier for domain adaptation in specialized scientific corpora. However, the absence of any reported quantitative metrics means the significance cannot yet be assessed.
major comments (1)
- [Abstract] Abstract: The central claim that the pipeline 'provides a foundation for integrating knowledge from scientific literature into downstream applications like digital twins and habitability modeling' is unsupported. No quantitative metrics (entity F1, answer correctness rate, factual consistency on held-out papers, or baseline comparisons) are supplied, and the abstract itself states that 'further improvements are needed to increase extraction accuracy and factual consistency.' This directly undermines the load-bearing assertion that current outputs are usable for the claimed applications.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the major comment below.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claim that the pipeline 'provides a foundation for integrating knowledge from scientific literature into downstream applications like digital twins and habitability modeling' is unsupported. No quantitative metrics (entity F1, answer correctness rate, factual consistency on held-out papers, or baseline comparisons) are supplied, and the abstract itself states that 'further improvements are needed to increase extraction accuracy and factual consistency.' This directly undermines the load-bearing assertion that current outputs are usable for the claimed applications.
Authors: We agree that the abstract's claim is not supported by quantitative metrics, which are absent from the manuscript, and that the self-noted need for further improvements in accuracy and consistency makes the assertion about usability for downstream applications premature. We will revise the abstract to qualify or remove this claim, instead describing the pipeline's design, the domain adaptation approach, and its intended potential for future applications once evaluation metrics demonstrate sufficient performance. This will align the stated contributions with the evidence provided. revision: yes
Circularity Check
No circularity: standard engineering pipeline description
full rationale
The paper presents an applied pipeline for domain-adapting Gemma 3 1B via QLoRA on Mars QA/IE datasets, followed by multistage retrieval/chunking and JSON output generation. No equations, first-principles derivations, fitted parameters renamed as predictions, or uniqueness theorems appear. All steps rely on external open-access literature and standard fine-tuning techniques; the central claim (usable outputs for downstream modeling) is framed as an engineering result whose accuracy is explicitly caveated rather than derived from the method itself. This is a self-contained description with no load-bearing self-citation chains or self-definitional reductions.
Axiom & Free-Parameter Ledger
free parameters (1)
- QLoRA rank, alpha, and learning rate
axioms (1)
- domain assumption The fine-tuned model generalizes to unseen Mars papers with acceptable factual consistency
Reference graph
Works this paper leans on
-
[1]
TerraMARS: A Domain-Adapted Small-Language-Model Pipeline for Mars Terraforming Literature Jyotsna Singh∗1, Ash Black1, Jeff Larsen2, Scott R. Saleska3,4 1College of Information Science, University of Arizona, Tucson, AZ, USA 2Biosphere 2, University of Arizona, Tucson, AZ, USA 3Department of Ecology and Evolutionary Biology, University of Arizona, Tucson...
Pith/arXiv arXiv 2026
-
[2]
an organism, microbial property, or biological adaptation
Most of the chunks are in general category followed by water (88) and atmosphere (80) and least went into survival (3). We constructed the datasets using synthetic instruction generation pipeline. Across the pipeline, a larger set of synthetic samples was generated that passed through filtering and validation. It led to 1179 high quality examples across s...
arXiv 2019
-
[3]
The case for Mars terraforming research
DeBenedictis, Erika Alden, et al. “The case for Mars terraforming research.”Nature Astronomy9.5 (2025): 634–639. Dettmers, Tim, et al. “Qlora: Efficient finetuning of quantized llms.”Advances in Neural Information Processing Systems36 (2023): 10088–10115. Devlin, Jacob, et al. “Bert: Pre-training of deep bidirectional transformers for language understandi...
2025
-
[4]
Bacterial survival in Martian conditions
Galletta, Giuseppe, Giulio Bertoloni, and Maurizio D’Alessandro. “Bacterial survival in Martian conditions.”arXiv preprint arXiv:1002.4077(2010). Gu, Yu, et al. “Domain-specific language model pretraining for biomedical natural language processing.”ACM Transactions on Computing for Healthcare (HEALTH)3.1 (2021): 1–23. Hancock, David Y., et al. “Jetstream2...
Pith/arXiv arXiv 2010
-
[5]
The curious case of neural text degeneration
1–8. Holtzman, Ari, et al. “The curious case of neural text degeneration.”arXiv preprint arXiv:1904.09751(2019). Hu, Edward J., et al. “Lora: Low-rank adaptation of large language models.”ICLR1.2 (2022):
Pith/arXiv arXiv 1904
-
[6]
Hu, Renyu. “Predicted diurnal variation of the deuterium to hydrogen ratio in water at the surface of Mars caused by mass exchange with the regolith.”Earth and Planetary Science Letters519 (2019): 192–201. Kite, Edwin S., and Mohit Melwani Daswani. “Geochemistry constrains global hydrology on Early Mars.”Earth and Planetary Science Letters524 (2019): 1157...
Pith/arXiv arXiv 2019
-
[7]
An Introduction to Mars Terraforming, 2025 Workshop Summary
15 Stork, Devon, and Erika DeBenedictis. “An Introduction to Mars Terraforming, 2025 Workshop Summary.”arXiv preprint arXiv:2510.07344(2025). Tonmoy, S.M., et al. “A comprehensive survey of hallucination mitigation techniques in large language models.”arXiv preprint arXiv:2401.01313(2024). Tshitoyan, Vahe, et al. “Unsupervised word embeddings capture late...
arXiv 2025
-
[8]
Strong water isotopic anomalies in the martian atmosphere: Probing current and ancient reservoirs
Villanueva, G.L., et al. “Strong water isotopic anomalies in the martian atmosphere: Probing current and ancient reservoirs.”Science348.6231 (2015): 218–221. Zubrin, Robert, and Christopher McKay. “Technological requirements for terraforming Mars.”29th Joint Propulsion Conference and Exhibit
2015
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.