A Registry-Bound LLM Pipeline for Evidence-Grounded Trait Extraction across Tropical Plants, Aquatic Species, and Exotic Pets

Jeff Wang

arxiv: 2606.00994 · v1 · pith:Y6WKPEHBnew · submitted 2026-05-31 · 💻 cs.CL


Pith reviewed 2026-06-28 17:39 UTC · model grok-4.3

classification 💻 cs.CL

keywords: LLM pipeline · trait extraction · evidence-grounded records · tropical plants · aquatic species · exotic pets · auditable extraction · structured data


The pith

Four mechanisms—a 39-key trait registry, verbatim quotes, confidence labels, and versioning—make LLM-derived species trait records auditable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a pipeline that extracts structured trait data from text descriptions of tropical plants, aquatic species, and exotic pets using large language models while adding mechanisms for traceability. It applies a closed set of 39 traits, requires each value to be backed by a direct quote from the source, assigns high or medium confidence, and retains versions of the data. On nearly 410,000 species the pipeline produced over 5.4 million records, with validation showing most quotes match source text verbatim and audits confirming support for the values. A sympathetic reader would care because the approach turns otherwise opaque LLM outputs into records that can be checked against originals without claiming the extractions are fully correct on their own.

Core claim

The contribution is the four-mechanism framework that renders LLM-derived rows auditable: a versioned 39-key closed-vocabulary trait registry constraining every admitted value to a typed schema; a per-row verbatim evidence quote tying each value to source text; a per-row confidence label (high or medium; low dropped pre-persist); and multi-version preservation. Applied to 409,880 species, the pipeline persisted 5,489,881 trait records with 81.57 percent at high confidence and three layers of validation showing high rates of quote support.
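A minimal sketch of how such an admission gate could look, assuming a hard exact-substring filter as the Figure 1 caption describes. The registry keys, enum values, class name, and function name below are hypothetical and not taken from the paper. The reported 90.12% verbatim rate among persisted rows suggests the real substring filter is more lenient than this strict version.

```python
from dataclasses import dataclass

# Hypothetical two-key excerpt. The paper's registry has 39 keys;
# these key names and enum values are invented for illustration.
REGISTRY = {
    "light_requirement": {"type": "enum", "allowed": {"low", "medium", "high"}},
    "max_height_cm": {"type": "number"},
}
PERSISTED_CONFIDENCE = {"high", "medium"}  # "low" is dropped before persistence


@dataclass(frozen=True)
class CandidateRow:
    trait_key: str
    value: str
    evidence_quote: str
    confidence: str


def admit(row: CandidateRow, source_text: str) -> tuple[bool, str]:
    """Return (admitted, reason) for one candidate trait row."""
    spec = REGISTRY.get(row.trait_key)
    if spec is None:
        return False, "registry-OOV"          # key is outside the closed vocabulary
    if spec["type"] == "enum" and row.value not in spec["allowed"]:
        return False, "enum-conformance"      # value is not an allowed enum member
    if spec["type"] == "number":
        try:
            float(row.value)
        except ValueError:
            return False, "value-type"        # value does not parse as a number
    if row.confidence not in PERSISTED_CONFIDENCE:
        return False, "low-confidence"
    # Assumption: strict, exact substring match against the source text.
    if not row.evidence_quote or row.evidence_quote not in source_text:
        return False, "substring"
    return True, "admitted"


source = "A slow-growing fern that tolerates low light and reaches 30 cm."
print(admit(CandidateRow("light_requirement", "low", "tolerates low light", "high"), source))
# (True, 'admitted')
print(admit(CandidateRow("light_requirement", "shade", "tolerates low light", "high"), source))
# (False, 'enum-conformance')
```

Returning a reason alongside the decision keeps rejection classes separable, which matters because the paper reports substring and enum rejections as distinct filter classes.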

What carries the argument

The four-mechanism auditability framework: versioned 39-key closed-vocabulary trait registry, per-row verbatim evidence quote, per-row confidence label, and multi-version preservation.

If this is right

The pipeline processed 409,880 species and produced records for 99.985 percent of them.
90.12 percent of 5,427,588 evidence-bearing rows have their quote as a verbatim source substring.
A quote-supports-value audit on 100 stratified rows yielded 100 out of 100 successes.
Face-validity review on 50 red-zone rows yielded 50 out of 50 acceptances.
Per-record correctness is not claimed and requires pending human curation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same four mechanisms could be tested on text corpora outside species descriptions, such as medical case reports or legal documents, to check if auditability transfers.
The fixed 39-key registry may systematically exclude traits that fall outside its vocabulary, creating a measurable coverage gap that future work could quantify by comparing against open-ended extractions.
Pairing the automated pipeline with targeted human review only on low-confidence or red-zone rows could form a hybrid workflow that scales while preserving verifiability.

Load-bearing premise

Source texts contain extractable verbatim evidence that the LLM can reliably quote and the 39-key registry adequately covers traits without significant information loss or bias.

What would settle it

A check on a large sample of rows finding many cases where the quoted evidence is absent from the source text or does not support the extracted value.

Figures

Figures reproduced from arXiv: 2606.00994 by Jeff Wang.

**Figure 1.** The extraction pipeline. Each species-level substrate record is passed with the subdomain-restricted 39-key registry to the mimo-v2.5 extractor; the structured response is admitted only after passing the substring-verification and enum-conformance filters before persistence. Per-run telemetry is written separately to species_traits_ai_runs. The registry-OOV / enum-conformance filter rejects two out-of-voca…

**Figure 2.** Red-zone routing. Registry-flagged red-zone keys (4 of 39) are persisted into the standard trait table but indexed for priority moderator review; the index pre-orders curator effort onto safety-bearing keys without altering the extraction or persistence path. Red-zone high-confidence rate (87.82%) exceeds the global rate (81.57%) by 6.25 pp. […] bearing deposit is auditability with disclosure rather than silen…

**Figure 3.** Extended star schema: core_taxon (P1 substrate, referenced via FK) and the two P2 trait tables species_traits_ai and species_traits_ai_runs. P2 publishes its own independent Zenodo record; substrate references are id-only. […] PK), species_id (BIGINT, FK → species(id) via ON DELETE CASCADE), trait_key (VARCHAR(64), one of 39 registry keys), value (VARCHAR(255), stored verbatim with type-specific parsing perfo…

**Figure 4.** Per-trait coverage and high-confidence share by trait_key, grouped by trait domain. Snapshot 2026-05-29 against canonical model_version full-v1-20260524. The trait-rows-per-species distribution is tight: 280,506 publishable species carry 11–15 trait rows, 80,125 carry 16–25, 47,952 carry 6–10, 1,120 carry 1–5, 117 carry 26–50, and 60 carry zero. The sum reconciles …

**Figure 5.** Three-layer validation schema for the deposit. […] 4.1 Schema and registry conformance: clean filter decomposition. By construction of the pipeline gates documented in §2.4, every persisted row passes the registry-OOV check, the value-type check, and the red-zone routing decision. Schema conformance is therefore not an empirical question for the deposit; it is a design invariant whose enforcement we report th…

**Figure 5.** Three-layer validation overview. Layer 1 (substring) is automated at full population; Layers 2-3 are manual single-author preliminary audits at n=100 and n=50. See §4.5. […] 37.47%). Three points merit careful reading. First, the substring rejections and the enum rejections are different filter classes and are reported separately here; collapsing them into a single “registry-OOV” count would hide the dominant …

**Figure 6.** Per-trait_key substring-verification rate across all 39 trait keys.

**Figure 6.** Per-trait substring-verification rate (39 keys), sorted descending. Population: 5,427,588 evidence-bearing rows. Outlier cites_appendix_in_bio (20.20%) references quick-card fields outside bio_sections by design. Median ≈94%. […] 455 of 546 divergences (83.3%) are soft: multi_enum element ordering or subset differences, and text paraphrase. Two worked examples: ornamental_value_type for Microsorum pteropu…
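The rows-per-species bins quoted in the Figure 4 caption can be reconciled against the headline counts with a few lines of arithmetic. The sketch below uses only numbers that appear on this page; the variable names are illustrative.

```python
# Trait-rows-per-species bins as quoted in the Figure 4 caption.
bins = {
    "0": 60,
    "1-5": 1_120,
    "6-10": 47_952,
    "11-15": 280_506,
    "16-25": 80_125,
    "26-50": 117,
}

total_species = sum(bins.values())          # all publishable species
with_records = total_species - bins["0"]    # species carrying at least one trait row
coverage = with_records / total_species

persisted_rows = 5_489_881                  # from the abstract
mean_rows = persisted_rows / with_records   # average rows per covered species

print(total_species, with_records)  # 409880 409820
print(f"{coverage:.3%}")            # 99.985%
print(f"{mean_rows:.1f}")           # 13.4
```

The bins sum to the 409,880 publishable species, the 60 zero-row species account for the gap to 409,820, and the mean of about 13.4 rows per covered species falls inside the modal 11–15 bin.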

Original abstract

We describe a registry-bound large-language-model extraction pipeline producing evidence-grounded structured trait records at scale, on cultivated tropical plant, aquatic, and pet species. Four mechanisms render LLM-derived rows auditable: a versioned 39-key closed-vocabulary trait registry constraining every admitted value to a typed schema; a per-row verbatim evidence quote tying each value to source text; a per-row confidence label (high or medium; low dropped pre-persist); and multi-version preservation. Applied to 409,880 publishable species from the Tropical Species Encyclopedia, the pipeline executed 706,220 runs and persisted 5,489,881 trait records across 409,820 species (99.985%), 81.57% at high confidence. We report three validation layers in descending evidentiary strength: at full population, 90.12% of 5,427,588 evidence-bearing rows have their quote as a verbatim source substring (93.49% excluding one compliance meta-trait); a quote-supports-value audit on n=100 stratified non-red-zone rows yielded 100/100 (lower bound 96.30%); face-validity on n=50 red-zone rows yielded 50/50 Accept (lower bound 92.86%). Per-record correctness is not claimed; 100% pending human curation. The contribution is the four-mechanism framework.
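The two lower bounds quoted in the abstract are consistent with a Wilson score interval (the reference list includes Wilson, 1927). The sketch below is a standard implementation of that bound. The choice of z = 1.96 is an assumption, though it reproduces both reported figures.

```python
import math


def wilson_lower_bound(successes: int, n: int, z: float = 1.96) -> float:
    """Lower limit of the two-sided Wilson score interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


print(f"{wilson_lower_bound(100, 100):.2%}")  # 96.30%
print(f"{wilson_lower_bound(50, 50):.2%}")    # 92.86%
```

With every sampled row passing, the bound reduces to n / (n + z²), so it is driven entirely by sample size: a clean 100/100 still leaves room for a true support rate a few points below 100%.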

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This is a practical engineering description of an LLM trait-extraction pipeline with four audit mechanisms, but the reliability evidence rests on a population substring check plus a tiny n=100 audit.


The paper walks through a pipeline that turns unstructured text on tropical plants, aquatic species, and exotic pets into structured trait records using a fixed 39-key registry, per-row verbatim quotes, high/medium confidence filtering, and version tracking. The author ran it on roughly 410k species and stored 5.5 million records, about 82% of them at high confidence.

The four mechanisms are a straightforward way to make LLM output traceable without claiming the values are correct. The closed registry and typed schema cut down on open-ended drift, and keeping the source quote lets a human spot-check later. The scale numbers and the 90% substring match on the full set show the system runs without obvious breakage.

The weak part is the validation. The substring check only confirms the quote exists in the source; it does not test whether the extracted value is supported by that quote. The direct test is a 100-row audit that passed, but 100 rows is small next to 5.5 million records, and it covered only non-red-zone rows (red-zone rows received a separate 50-row face-validity review). The author correctly notes that per-record correctness is not claimed, which keeps the claims honest but also caps how much the results can be used without further checking.
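To make the gap between quote presence and quote support concrete, here is a small constructed case. The source sentence, trait key, and value are hypothetical and not drawn from the paper's data; the function name is illustrative.

```python
def quote_is_verbatim(evidence_quote: str, source_text: str) -> bool:
    """Layer-1 style check: is the quote an exact substring of the source?"""
    return bool(evidence_quote) and evidence_quote in source_text


# Hypothetical source text and extracted row.
source = "Not suitable for low light; it needs bright, indirect light to thrive."
row = {
    "trait_key": "light_requirement",
    "value": "low",
    "evidence_quote": "low light",
}

print(quote_is_verbatim(row["evidence_quote"], source))  # True
# The quote is present verbatim, yet the sentence it comes from says the
# opposite of value="low". Only a support audit would catch this.
```

A row like this would count toward the substring-match rate while being wrong, which is why the population-level figure cannot stand in for the quote-supports-value audit.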

This is for groups that need to populate or update trait databases in ecology or conservation and are prepared to do human review afterward. It is incremental tooling rather than a new theoretical result, so it is most useful to readers already working on similar extraction tasks.

I would send it to peer review if the full methods section adds detail on registry coverage and error patterns across domains; otherwise it fits better as a tools note.

Referee Report

1 major / 0 minor

Summary. The manuscript presents a registry-bound LLM pipeline for extracting structured trait records from species descriptions across tropical plants, aquatic species, and exotic pets. It applies the system to 409,880 species from the Tropical Species Encyclopedia, generating 5,489,881 records via 706,220 runs. The core contribution is a four-mechanism framework for auditability: a versioned 39-key closed-vocabulary trait registry, per-row verbatim evidence quotes, high/medium confidence filtering (low dropped), and multi-version preservation. Three validation layers are reported: 90.12% quote-substring match at population scale, 100/100 on an n=100 stratified audit, and 50/50 on n=50 red-zone rows, with the explicit caveat that per-record correctness is not claimed.

Significance. If the four mechanisms reliably support auditability, the work offers a practical framework for scaling evidence-grounded LLM extraction in biodiversity data curation, where traceability and schema constraints are critical. The transparency around not claiming per-record correctness and the use of external validation metrics are positive elements. The approach could influence similar pipelines in applied NLP for scientific domains if the evidence-grounding holds beyond the reported checks.

major comments (1)

[Abstract] (validation layers paragraph): The population-level substring check (90.12% of 5,427,588 rows) only confirms quote presence in source text and does not verify entailment of the extracted value by that quote. The direct test of quote-supports-value is restricted to an n=100 stratified audit on non-red-zone rows (100/100 success, lower bound 96.30%), which is more than four orders of magnitude smaller than the 5.5M persisted records and excludes the 18.43% medium-confidence records; this sample size is insufficient to support the central claim that the four mechanisms render rows auditable at scale.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful review and for highlighting the distinction between the validation layers. We address the comment below and maintain that the manuscript's central claim concerns the design of the four-mechanism framework rather than a statistical guarantee of per-record correctness.


Referee: [Abstract] (validation layers paragraph): The population-level substring check (90.12% of 5,427,588 rows) only confirms quote presence in source text and does not verify entailment of the extracted value by that quote. The direct test of quote-supports-value is restricted to an n=100 stratified audit on non-red-zone rows (100/100 success, lower bound 96.30%), which is more than four orders of magnitude smaller than the 5.5M persisted records and excludes the 18.43% medium-confidence records; this sample size is insufficient to support the central claim that the four mechanisms render rows auditable at scale.

Authors: We agree that the population-level substring match verifies only verbatim quote presence and not entailment, and that the n=100 audit is small, excludes medium-confidence rows, and cannot support population-level statistical inference on correctness. The manuscript already states explicitly that 'Per-record correctness is not claimed; 100% pending human curation' and positions the contribution as the four-mechanism framework itself. The reported checks demonstrate that the mechanisms operate as specified (quote presence at full scale; support in the sampled non-red-zone cases), with lower-bound intervals provided to reflect sample limitations. We do not interpret the results as claiming statistical auditability at scale; the framework enables human audit rather than replacing it. The abstract wording is therefore consistent with the stated scope. No change to the manuscript is required.

Revision: no

Circularity Check

0 steps flagged

No circularity; descriptive applied system with independent empirical validations


The paper describes a four-mechanism pipeline for evidence-grounded trait extraction and reports three layers of validation (population-level substring match at 90.12%, n=100 stratified audit at 100/100, n=50 red-zone face-validity at 50/50). No derivation chain, fitted parameters presented as predictions, self-citations, or ansatzes exist in the provided text. The central claim (auditable rows via registry + quote + confidence + versioning) is supported by external checks rather than reducing to its own inputs by construction. The validations are statistically independent of the pipeline definition itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Paper describes an applied extraction method without mathematical derivations or fitted parameters; relies on standard assumptions about LLM prompting and text availability.

axioms (1)

domain assumption: LLMs can be prompted to produce structured outputs with verbatim quotes from input text.
Implicit foundation for the extraction pipeline.

pith-pipeline@v0.9.1-grok · 5774 in / 1169 out tokens · 30322 ms · 2026-06-28T17:39:10.571733+00:00 · methodology


Reference graph

Works this paper leans on

14 extracted references · 8 canonical work pages

[1] Wang, J. (2026). Tropicals.cn: Tropical Species Encyclopedia (v1.0.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.20377811

[2] Wang, J. (2026). A cross-domain tropical species dataset with Chinese vernacular names and CITES source links [Data Descriptor]. Zenodo. https://doi.org/10.5281/zenodo.20424981

[3] Kattge, J., Bönisch, G., Díaz, S., Lavorel, S., Prentice, I. C., Leadley, P., et al. (2020). TRY plant trait database – enhanced coverage and open access. Global Change Biology, 26(1), 119–188. https://doi.org/10.1111/gcb.14904. Database portal: https://www.try-db.org

[4] Maitner, B. S., Boyle, B., Casler, N., Condit, R., Donoghue, J., Durán, S. M., et al. (2018). The bien r package: A tool to access the Botanical Information and Ecology Network (BIEN) database. Methods in Ecology and Evolution, 9(2), 373–379. https://doi.org/10.1111/2041-210X.12861

[5] Weigelt, P., König, C., & Kreft, H. (2020). GIFT – A Global Inventory of Floras and Traits for macroecology and biogeography. Journal of Biogeography, 47(1), 16–43. https://doi.org/10.1111/jbi.13623

[6] Falster, D., Gallagher, R., Wenk, E. H., Wright, I. J., Indiarto, D., Andrew, S. C., et al. (2021). AusTraits, a curated plant trait database for the Australian flora. Scientific Data, 8, 254. https://doi.org/10.1038/s41597-021-01006-6

[7] GBIF Secretariat (2024). GBIF: The Global Biodiversity Information Facility. https://www.gbif.org

[8] iNaturalist (2024). iNaturalist – A joint initiative of the California Academy of Sciences and the National Geographic Society. https://www.inaturalist.org

[9] American Society for the Prevention of Cruelty to Animals (ASPCA) (2024). Toxic and Non-Toxic Plants. https://www.aspca.org/pet-care/animal-poison-control/toxic-and-non-toxic-plants

[10] UNEP-WCMC and CITES Secretariat (2024). Species+. https://www.speciesplus.net

[11] IUCN (2024). The IUCN Red List of Threatened Species. https://www.iucnredlist.org

[12] Royal Botanic Gardens, Kew (2024). Plants of the World Online (POWO). https://powo.science.kew.org

[13] Wieczorek, J., Bloom, D., Guralnick, R., Blum, S., Döring, M., Giovanni, R., Robertson, T., & Vieglais, D. (2012). Darwin Core: An evolving community-developed biodiversity data standard. PLoS ONE, 7(1), e29715. https://doi.org/10.1371/journal.pone.0029715

[14] Wilson, E. B. (1927). Probable Inference, the Law of Succession, and Statistical Inference. Journal of the American Statistical Association, 22(158), 209–212. https://doi.org/10.1080/01621459.1927.10502953
