TerraMARS: A Domain-Adapted Small-Language-Model Pipeline for Mars Terraforming Literature

Ash Black; Jeff Larsen; Jyotsna Singh; Scott R. Saleska

arxiv: 2606.19700 · v1 · pith:RHMO655Mnew · submitted 2026-06-18 · 💻 cs.CL


Jyotsna Singh, Ash Black, Jeff Larsen, Scott R. Saleska

Pith reviewed 2026-06-26 18:02 UTC · model grok-4.3

classification 💻 cs.CL

keywords: Mars terraforming · information extraction · small language models · domain adaptation · QLoRA · scientific literature processing · JSON structured output · habitability modeling


The pith

A fine-tuned 1B-parameter model extracts structured Mars data from papers for use in habitability and terraforming models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds TerraMARS as an end-to-end pipeline that first retrieves and chunks open-access Mars science papers, then applies a domain-adapted version of Gemma 3 1B to answer terraforming questions and turn text into JSON records. The adaptation uses QLoRA fine-tuning on Mars-specific question-answering and information-extraction datasets. A sympathetic reader would care because the resulting outputs supply quantitative constraints that can feed directly into digital-twin simulations and habitability assessments without manual parsing of the literature. The work treats the pipeline as a foundation rather than a finished product.
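The paper only names a "multistage retrieval and chunking framework" without describing its stages here. As a minimal sketch of what a single chunking stage can look like, the code below splits a paper into overlapping fixed-size word windows so that a sentence cut at one boundary still appears whole in the neighbouring chunk. The `Chunk` type, `chunk_words` function, and the default `size`/`overlap` values are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str  # identifier of the source paper (hypothetical field)
    index: int   # position of the chunk within the paper
    text: str


def chunk_words(doc_id: str, text: str, size: int = 200, overlap: int = 50) -> list:
    """Split text into overlapping windows of `size` words (sizes are hypothetical)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous window's overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [Chunk(doc_id, i, " ".join(words[s:s + size])) for i, s in enumerate(starts)]
```

A later retrieval stage would then rank or filter these chunks before they reach the model; the overlap trades some duplicated tokens for fewer facts split across chunk boundaries.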

Core claim

We present TerraMARS, an end-to-end information extraction pipeline that combines a domain-adapted Small Language Model to answer Mars terraforming-related questions and convert unstructured Mars science text into machine-readable structured outputs in JavaScript Object Notation (JSON) format. A corpus of open-access papers is collected and processed using a multistage retrieval and chunking framework. Google Gemma 3 1B was adapted to the domain using Quantized Low-Rank Adaptation (QLoRA) fine-tuning on Mars-specific question-answering and information extraction datasets. The resulting pipeline generates both types of output and provides a foundation for integrating knowledge from scientific literature into downstream applications like digital twins and habitability modeling for Mars.

What carries the argument

The domain-adapted Gemma 3 1B model fine-tuned with QLoRA on Mars QA and extraction datasets, which performs both question answering and conversion of paper text into JSON.
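Part of why QLoRA suits a 1B-parameter model on modest hardware is that the frozen base weights are stored in 4-bit precision while only small adapter matrices are trained. The arithmetic below is a rough sketch of weight storage alone, assuming a nominal 1e9 parameters (the exact Gemma 3 1B count differs somewhat); it ignores quantization constants, activations, adapters, and optimizer state, so real memory use is higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Storage for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N = 1e9  # nominal "1B" parameter count (assumption; not the exact Gemma 3 1B figure)
print(f"16-bit base weights: {weight_memory_gb(N, 16):.2f} GB")  # 2.00 GB
print(f"4-bit base weights: {weight_memory_gb(N, 4):.2f} GB")    # 0.50 GB
```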

If this is right

The pipeline produces both natural-language answers to Mars questions and machine-readable JSON records from the same input papers.
Quantitative constraints extracted from the literature become available for direct use in habitability assessment and terraforming studies.
The structured outputs supply a ready input layer for digital-twin models of the Martian environment.
Further accuracy gains would allow the same pipeline to handle larger volumes of new papers without additional manual curation.
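Whether JSON records are "machine-readable" in practice depends on checking each model output before it is passed downstream, since a small model can emit malformed JSON or wrong types. A minimal validation sketch, assuming a hypothetical three-field record schema (`parameter`, `value`, `unit`); the paper's actual output schema is not described on this page.

```python
import json

# Hypothetical record schema: field name -> accepted Python type(s)
SCHEMA = {"parameter": str, "value": (int, float), "unit": str}


def parse_record(raw: str) -> dict:
    """Parse one model output and check it against SCHEMA; raise ValueError on failure."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"not valid JSON: {err}") from err
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    for field, expected in SCHEMA.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        value = record[field]
        # Reject booleans outright: bool subclasses int, so JSON true would pass a numeric check
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"bad type for {field}: {type(value).__name__}")
    return record
```

For example, `parse_record('{"parameter": "mean surface pressure", "value": 6.1, "unit": "mbar"}')` returns the dictionary, while an output missing `value` raises `ValueError`; the values here are illustrative only. Rejected outputs could be logged or re-generated rather than silently dropped.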

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar domain-adaptation steps could be applied to literature on other planetary bodies or Earth-analog environments.
The JSON outputs could be versioned and linked to specific paper sections, creating an auditable knowledge graph for Mars research.
Integration with simulation code might let new papers automatically update parameter ranges inside habitability models.
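Both extensions above can be sketched together: tag each extracted record with the paper, section, and a hash of the exact source chunk, then fold the record's value into a per-parameter range that a habitability model could read. Everything here is hypothetical: the provenance field names, the `update_ranges` helper, and the simple min/max widening rule are illustrations, not part of TerraMARS.

```python
import hashlib


def provenance_record(extraction: dict, paper_id: str, section: str,
                      chunk_text: str, version: int = 1) -> dict:
    """Wrap one extracted record with hypothetical provenance fields."""
    return {
        **extraction,
        "paper_id": paper_id,
        "section": section,
        # SHA-256 of the exact source chunk, so an auditor can confirm the span is unchanged
        "chunk_sha256": hashlib.sha256(chunk_text.encode("utf-8")).hexdigest(),
        "version": version,
    }


def update_ranges(ranges: dict, records: list) -> dict:
    """Return new (min, max) ranges widened to cover each record's value.

    Assumes one canonical unit per parameter; unit conversion is out of scope.
    The input mapping is not modified.
    """
    updated = dict(ranges)
    for rec in records:
        name, value = rec["parameter"], float(rec["value"])
        lo, hi = updated.get(name, (value, value))
        updated[name] = (min(lo, value), max(hi, value))
    return updated
```

Min/max widening is deliberately naive: a single wrong extraction stretches a range permanently, which is one reason the accuracy and consistency caveats in the paper matter before any automatic update is trusted.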

Load-bearing premise

The domain-adapted model produces accurate extractions and answers with high factual consistency on unseen Mars papers.
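One cheap, partial proxy for factual consistency is to check that an extracted number literally occurs in the chunk it came from; a value absent from the source text is a likely hallucination. This is a swapped-in heuristic for illustration, not the paper's evaluation method, and it misses unit conversions, scientific notation, and values the text states in words.

```python
import re

# Plain decimal numbers with an optional sign; "10-20" would parse as 10 and -20 (known limitation)
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def numbers_in(text: str) -> set:
    """All decimal numbers appearing in the text, as floats."""
    return {float(m) for m in NUMBER.findall(text)}


def value_is_grounded(record: dict, chunk_text: str) -> bool:
    """Crude check: the extracted numeric value must literally appear in the source chunk."""
    return float(record["value"]) in numbers_in(chunk_text)
```

For instance, with the chunk "The mean surface pressure is about 6.1 mbar.", a record with value 6.1 passes and one with value 61 fails.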

What would settle it

Running the pipeline on a held-out set of Mars papers, then measuring how often the JSON fields and answers match human-verified ground truth; accuracy or consistency below acceptable thresholds would falsify the claim.
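The test described above can be sketched as per-field exact-match accuracy against human-verified records. This assumes predicted and gold records are already aligned one-to-one; the threshold counted as "acceptable" is left open here, as it is on this page.

```python
def field_accuracy(predicted: list, gold: list, fields: tuple) -> dict:
    """Per-field exact-match accuracy over aligned (predicted, gold) record pairs."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty record lists")
    return {
        f: sum(p.get(f) == g.get(f) for p, g in zip(predicted, gold)) / len(gold)
        for f in fields
    }
```

Exact match is strict for numeric fields; a relative tolerance on `value` would be a reasonable alternative when papers round differently.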

Figures

Figures reproduced from arXiv: 2606.19700 by Ash Black, Jeff Larsen, Jyotsna Singh, Scott R. Saleska.

Original abstract

Researchers are interested in learning about Mars so that it may eventually become habitable for humans. To achieve this, there is a need for comprehensive knowledge of the planet's atmosphere, hydrology, surface chemistry, radiation environment, and spatial features through the scientific literature. These contain valuable information and meaningful quantitative constraints that can be used in other models and studies, such as habitability assessment and future terraforming studies. We present TerraMARS, an end-to-end information extraction pipeline that combines a domain-adapted Small Language Model to answer Mars terraforming-related questions and convert unstructured Mars science text into machine-readable structured outputs in JavaScript Object Notation (JSON) format. A corpus of open-access papers is collected and processed using a multistage retrieval and chunking framework. Google Gemma 3 1B was adapted to the domain using Quantized Low-Rank Adaptation (QLoRA) fine-tuning on Mars-specific question-answering and information extraction datasets. The resulting pipeline generates both types of output and provides a foundation for integrating knowledge from scientific literature into downstream applications like digital twins and habitability modeling for Mars. The output from this pipeline looks promising, but further improvements are needed to increase extraction accuracy and factual consistency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a standard QLoRA domain-adaptation pipeline applied to Mars terraforming papers, with no quantitative results to back the utility claim.


The paper describes TerraMARS, an end-to-end setup that gathers open Mars science papers, chunks them via multistage retrieval, fine-tunes Gemma 3 1B with QLoRA on domain QA and extraction examples, and then runs the model to produce answers plus JSON outputs. That is the main contribution.

They handle the corpus collection and the fine-tuning steps in a clear, practical way. The goal of feeding structured knowledge into habitability models or digital twins is reasonable for this narrow literature.

The technique itself follows routine practice for small-model adaptation, so nothing new appears in the method. The value would have to come from the specific corpus and the quality of the outputs.

The soft spot is the total absence of numbers. No entity F1, no answer accuracy on held-out papers, no baseline comparisons, and no error rates. The abstract itself says the results look promising but need work on accuracy and factual consistency. Without those measurements, the claim that the pipeline supplies a usable foundation for downstream modeling stays untested.
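Entity F1, the first missing number listed above, is straightforward to compute once gold annotations exist. A minimal sketch of micro-averaged precision, recall, and F1, assuming entities are compared as exact strings per document; partial-span or normalized matching would be a separate design choice.

```python
def micro_prf(predicted: list, gold: list) -> tuple:
    """Micro-averaged precision, recall and F1 over per-document entity sets."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same documents")
    tp = sum(len(p & g) for p, g in zip(predicted, gold))  # exact-match true positives
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Micro-averaging pools counts across papers, so long papers with many entities weigh more than short ones; macro-averaging per paper would be the alternative.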

The paper is aimed at planetary scientists or applied NLP people who already work with small models and want a ready template for a specialized scientific domain. A reader looking for a worked example of QLoRA on scientific text might get some procedural ideas from it.

I would send this to peer review only if the full manuscript adds proper quantitative validation and comparisons. On the current evidence it does not yet justify referee time.

Referee Report

1 major / 0 minor

Summary. The manuscript presents TerraMARS, an end-to-end information extraction pipeline for Mars terraforming literature. It collects open-access papers, applies a multistage retrieval and chunking framework, domain-adapts Google Gemma 3 1B via QLoRA fine-tuning on Mars-specific QA and IE datasets, and generates both structured JSON outputs and answers to terraforming-related questions. The work claims this pipeline provides a foundation for downstream applications such as digital twins and habitability modeling, while noting that outputs look promising but require further improvements in extraction accuracy and factual consistency.

Significance. If the domain-adapted 1B model were shown to produce high-accuracy extractions and factually consistent answers on held-out Mars papers, the pipeline would offer a practical, resource-efficient route to converting unstructured scientific literature into machine-readable constraints usable in computational models of planetary habitability and terraforming. The engineering choices—multistage chunking plus QLoRA on a small open model—are reproducible and lower the barrier for domain adaptation in specialized scientific corpora. However, the absence of any reported quantitative metrics means the significance cannot yet be assessed.

major comments (1)

[Abstract] The central claim that the pipeline 'provides a foundation for integrating knowledge from scientific literature into downstream applications like digital twins and habitability modeling' is unsupported. No quantitative metrics (entity F1, answer correctness rate, factual consistency on held-out papers, or baseline comparisons) are supplied, and the abstract itself states that 'further improvements are needed to increase extraction accuracy and factual consistency.' This directly undermines the load-bearing assertion that current outputs are usable for the claimed applications.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment below.


Referee: [Abstract] The central claim that the pipeline 'provides a foundation for integrating knowledge from scientific literature into downstream applications like digital twins and habitability modeling' is unsupported. No quantitative metrics (entity F1, answer correctness rate, factual consistency on held-out papers, or baseline comparisons) are supplied, and the abstract itself states that 'further improvements are needed to increase extraction accuracy and factual consistency.' This directly undermines the load-bearing assertion that current outputs are usable for the claimed applications.

Authors: We agree that the abstract's claim is not supported by quantitative metrics, which are absent from the manuscript, and that the self-noted need for further improvements in accuracy and consistency makes the assertion about usability for downstream applications premature. We will revise the abstract to qualify or remove this claim, instead describing the pipeline's design, the domain adaptation approach, and its intended potential for future applications once evaluation metrics demonstrate sufficient performance. This will align the stated contributions with the evidence provided.

Revision: yes

Circularity Check

0 steps flagged

No circularity: standard engineering pipeline description


The paper presents an applied pipeline for domain-adapting Gemma 3 1B via QLoRA on Mars QA/IE datasets, followed by multistage retrieval/chunking and JSON output generation. No equations, first-principles derivations, fitted parameters renamed as predictions, or uniqueness theorems appear. All steps rely on external open-access literature and standard fine-tuning techniques; the central claim (usable outputs for downstream modeling) is framed as an engineering result whose accuracy is explicitly caveated rather than derived from the method itself. This is a self-contained description with no load-bearing self-citation chains or self-definitional reductions.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the unverified assumption that fine-tuning on Mars-specific QA and IE data yields reliable factual outputs on new papers; no independent evidence or benchmarks are supplied in the abstract.

free parameters (1)

QLoRA rank, alpha, and learning rate
Standard fine-tuning hyperparameters whose specific values are not reported but are required for the adaptation step.
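To make these free parameters concrete: in the original LoRA formulation a frozen weight W0 is updated as W0 + (alpha / r) · B·A, where B is d_out × r and A is r × d_in, so the rank sets the adapter size and alpha / r scales its contribution. The helpers below sketch that arithmetic; the 1024 × 1024 projection and the r = 16, alpha = 32 values are hypothetical, since the paper does not report its settings.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_in + d_out)


def lora_scaling(alpha: float, rank: int) -> float:
    """Scale applied to the low-rank update B @ A in the original LoRA formulation."""
    return alpha / rank


# Hypothetical 1024 x 1024 projection: the adapter trains 32,768 weights
# instead of the 1,048,576 in the full matrix, about 3.1%.
print(lora_trainable_params(1024, 1024, 16))  # 32768
print(lora_scaling(32, 16))                   # 2.0
```

The learning rate, the third parameter listed, does not enter this arithmetic but interacts with alpha / r, since both scale the effective size of each adapter update.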

axioms (1)

Domain assumption: the fine-tuned model generalizes to unseen Mars papers with acceptable factual consistency.
Invoked when claiming the pipeline provides a foundation for downstream applications.

pith-pipeline@v0.9.1-grok · 5752 in / 1255 out tokens · 25506 ms · 2026-06-26T18:02:30.199452+00:00 · methodology


