pith. machine review for the scientific record.

arxiv: 2605.03537 · v1 · submitted 2026-05-05 · 💻 cs.DL · cs.AI

Recognition: unknown

A Skill-Based AI Agentic Pipeline for Library of Congress Subject Indexing


Pith reviewed 2026-05-09 16:20 UTC · model grok-4.3

classification 💻 cs.DL · cs.AI

keywords Library of Congress Subject Headings · subject indexing · AI agent pipeline · MARC21 records · cataloging automation · LCSH · skill-based agents · SHM instruction sheets

The pith

An AI pipeline decomposes Library of Congress subject indexing into four sequential skills based on official manuals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a modular system that breaks subject indexing into four discrete AI agent skills performed in sequence. Each skill draws directly from the Library of Congress Subject Headings Manual instruction sheets and standard subject analysis principles to handle conceptual analysis, term filtering, authority checking, and MARC record generation. When tested against existing headings for ten titles drawn from Harvard Library records, the pipeline produces output that aligns conceptually with how professional catalogers work. The results also reflect the 2026 policy shift away from form subdivisions toward separate LCGFT 655 fields. This approach targets a labor-intensive cataloging step by embedding domain rules into executable agent steps rather than relying on end-to-end generation.
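The four-stage decomposition described above can be sketched in miniature. This is a hypothetical skeleton, not the paper's implementation: every function name and the toy heuristics inside them are invented for illustration, and a real system would back each skill with SHM-derived prompts and an actual authority file.

```python
# Hypothetical sketch of a four-skill sequential pipeline for LCSH
# subject indexing. Skill internals are placeholders, not the
# paper's actual prompts or models.

def conceptual_analysis(title, summary):
    """Skill 1: identify the work's aboutness as candidate concepts."""
    # Placeholder heuristic: a real skill would call an LLM guided by
    # SHM instruction sheets and subject analysis theory.
    return [w.strip(".,").lower() for w in summary.split() if len(w) > 6]

def quantitative_filtering(concepts, max_terms=4):
    """Skill 2: keep only the most salient candidate concepts."""
    return concepts[:max_terms]

def authority_validation(concepts, authority_file):
    """Skill 3: keep concepts that have an authorized LCSH form."""
    return [authority_file[c] for c in concepts if c in authority_file]

def marc_field_synthesis(headings):
    """Skill 4: encode validated headings as MARC21 650 fields."""
    return [f"650 _0 $a {h}" for h in headings]

def pipeline(title, summary, authority_file):
    """Run the four skills strictly in sequence."""
    concepts = conceptual_analysis(title, summary)
    filtered = quantitative_filtering(concepts)
    validated = authority_validation(filtered, authority_file)
    return marc_field_synthesis(validated)
```

The strictly linear data flow is the point of contrast with end-to-end generation: each stage's output is inspectable before the next runs.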

Core claim

The skill-based pipeline, by encoding SHM-derived knowledge into four sequentially executed agent skills—conceptual analysis, quantitative filtering, authority validation, and MARC field synthesis—produces subject headings that demonstrate strong conceptual alignment with professional practice on a set of ten titles, while differing in specificity, subdivision application, and consistent use of LCGFT 655 fields in line with the 2026 LC policy.

What carries the argument

Four discrete, sequentially executed agent skills—conceptual analysis, quantitative filtering, authority validation, and MARC field synthesis—each encoding rules from the Subject Headings Manual and subject analysis theory.

If this is right

  • The pipeline generates headings that match professional conceptual choices on tested titles.
  • Output differs from existing records in levels of specificity and how subdivisions are applied.
  • The system adheres to the 2026 policy by routing form information to LCGFT 655 fields rather than form subdivisions.
  • Automation via modular skills can target the time-consuming analysis and encoding steps in cataloging workflows.
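The 2026 policy point in the list above has a concrete encoding consequence: form information that would previously have been a `$v` subdivision on the 650 field is emitted as a separate LCGFT 655 field. A minimal sketch, with invented example values:

```python
# Illustrative only: routing form information to a separate LCGFT 655
# genre/form field instead of a $v form subdivision on the 650 field,
# per the 2026 policy shift described above.

def encode_subject(topic, form=None):
    """Return MARC21 fields for a topical heading, with any form
    term encoded as an LCGFT 655 field."""
    fields = [f"650 _0 $a {topic}"]
    if form:
        # Pre-2026 practice would instead have appended "$v {form}"
        # to the 650 field itself.
        fields.append(f"655 _7 $a {form} $2 lcgft")
    return fields

print(encode_subject("Libraries", form="Periodicals"))
# → ['650 _0 $a Libraries', '655 _7 $a Periodicals $2 lcgft']
```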

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The modular skill structure allows individual components to be updated independently when LCSH policies or manuals change.
  • Differences in specificity observed on the test set could guide refinements to the quantitative filtering skill for better calibration.
  • If scaled beyond ten titles, the pipeline might surface recurring patterns in how AI handles complex or interdisciplinary aboutness that human catalogers resolve differently.
  • MARC-compatible output makes direct integration into systems such as Alma feasible without additional format conversion layers.

Load-bearing premise

That decomposing subject indexing into these four manual-derived skills fully captures professional practice, and that results from ten titles are enough to show meaningful alignment.

What would settle it

Running the pipeline on a larger corpus of titles across varied subjects and comparing outputs term-by-term against multiple expert catalogers' headings to check whether conceptual alignment holds or systematic differences in specificity and subdivisions persist.
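The term-by-term comparison proposed above is straightforward to operationalize. A minimal sketch, treating each heading string as one term and using invented example headings; a real evaluation would also need to handle subdivision-level matching and near-synonym credit:

```python
# Hypothetical term-level evaluation of pipeline output against an
# expert cataloger's headings. Heading lists are invented examples.

def precision_recall(predicted, reference):
    """Exact-match precision and recall over heading sets."""
    pred, ref = set(predicted), set(reference)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    return precision, recall

agent = ["Cataloging", "Subject headings", "Artificial intelligence"]
expert = ["Cataloging", "Machine learning"]
p, r = precision_recall(agent, expert)
# One heading in common: precision = 1/3, recall = 1/2
```

Systematic differences in specificity would show up here as depressed exact-match scores even when conceptual alignment is good, which is why multiple expert reference sets matter.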

Figures

Figures reproduced from arXiv: 2605.03537 by Eric H. C. Chow.

Figure 1
Figure 1. The four-skill sequential pipeline.
read the original abstract

This paper presents a modular AI agentic skill pipeline for automating subject indexing with Library of Congress Subject Headings (LCSH). Subject indexing - the process of analyzing a work's aboutness, selecting controlled vocabulary terms, and encoding them as MARC21 subject access fields - is one of the most time-consuming components of library cataloging. The system decomposes this process into four discrete, sequentially executed agent skills: conceptual analysis, quantitative filtering, authority validation, and MARC field synthesis. Each skill encodes domain knowledge drawn directly from Library of Congress Subject Headings Manual (SHM) instruction sheets and subject analysis theory. The pipeline was evaluated against a corpus of ten titles whose existing subject headings were captured from the Harvard Library bibliographic dataset (a snapshot of their Alma ILS). Results demonstrate strong conceptual alignment with professional subject indexing practice, with notable differences in specificity, subdivision practice, and the agent's adherence to the 2026 LC policy discontinuing form subdivisions in favor of LCGFT 655 fields.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a modular AI agentic pipeline for automating Library of Congress Subject Headings (LCSH) indexing. It decomposes the process into four sequential skills—conceptual analysis, quantitative filtering, authority validation, and MARC field synthesis—each drawing directly from the Subject Headings Manual (SHM) instruction sheets and subject analysis theory. The system is evaluated on ten titles from a Harvard Library Alma snapshot, with results claimed to show strong conceptual alignment to professional practice, including differences in specificity, subdivision usage, and adherence to the 2026 LC policy favoring LCGFT 655 fields over form subdivisions.

Significance. If the alignment claim holds under expanded testing, the work would offer a transparent, domain-grounded framework for AI-assisted cataloging that could reduce the labor intensity of subject indexing while improving consistency in digital libraries. The explicit encoding of SHM-derived rules into discrete agent skills provides a reproducible template that distinguishes this from black-box approaches and could support integration into library systems for scalable metadata generation.

major comments (2)
  1. The claim of 'strong conceptual alignment' (abstract) rests entirely on qualitative comparison of agent outputs to existing headings for a corpus of only ten titles. No quantitative metrics such as term-level precision, recall, or F1 are reported, nor are baselines (inter-indexer agreement, existing automated tools) or sampling rationale provided. This renders the noted differences in specificity and subdivision practice anecdotal and insufficient to support the central claim, as the small non-random sample cannot distinguish systematic alignment from prompt artifacts or selection effects.
  2. The pipeline design assumes that decomposing subject indexing into four discrete, sequentially executed skills (conceptual analysis through MARC synthesis) fully captures professional practice (abstract and pipeline description). The manuscript provides no validation or ablation showing that this sequential structure matches the iterative, holistic judgment of expert indexers, which is load-bearing for the claim that the agentic approach aligns with SHM-based workflows.
minor comments (2)
  1. The evaluation section would benefit from explicit description of the agent implementation (e.g., underlying LLM, prompt templates, temperature settings, and how 'quantitative filtering' is operationalized), as these details are necessary for reproducibility but are not addressed in the provided abstract or results summary.
  2. Clarify the exact criteria used to judge 'conceptual alignment' versus 'notable differences' in the ten-title comparison; without this, readers cannot assess the objectivity of the qualitative assessment.
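One baseline the referee asks for, inter-indexer agreement, has a standard set-overlap formulation in the indexing literature (often attributed to Hooper): headings in common divided by headings used by either indexer. A sketch with invented heading lists:

```python
# Sketch of an inter-indexer consistency baseline: overlap between two
# indexers' heading sets. Heading lists are invented examples.

def indexing_consistency(terms_a, terms_b):
    """Terms in common divided by distinct terms used by either indexer."""
    a, b = set(terms_a), set(terms_b)
    common = len(a & b)
    return common / (len(a) + len(b) - common) if (a or b) else 0.0

indexer_1 = ["Cataloging", "Subject headings", "Libraries"]
indexer_2 = ["Cataloging", "Libraries", "Classification"]
print(indexing_consistency(indexer_1, indexer_2))  # 2 common of 4 distinct
```

Reporting the agent's consistency with existing records alongside human-human consistency on the same titles would show whether the agent's differences exceed ordinary professional variation.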

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive comments on our manuscript. We have carefully considered each point and revised the paper to address the concerns regarding the strength of our claims and the validation of the pipeline structure.

read point-by-point responses
  1. Referee: The claim of 'strong conceptual alignment' (abstract) rests entirely on qualitative comparison of agent outputs to existing headings for a corpus of only ten titles. No quantitative metrics such as term-level precision, recall, or F1 are reported, nor are baselines (inter-indexer agreement, existing automated tools) or sampling rationale provided. This renders the noted differences in specificity and subdivision practice anecdotal and insufficient to support the central claim, as the small non-random sample cannot distinguish systematic alignment from prompt artifacts or selection effects.

    Authors: We acknowledge the validity of this critique. The evaluation was intentionally qualitative and limited to ten titles to facilitate detailed, title-by-title analysis of how the agentic pipeline aligns with SHM guidelines in terms of conceptual analysis and policy adherence. We have revised the abstract to replace 'strong conceptual alignment' with 'initial conceptual alignment' and added a new 'Limitations and Future Work' section that explicitly discusses the small sample size, the absence of quantitative metrics and baselines, and the non-random sampling from the Harvard Library dataset. We also include the sampling rationale: the titles were selected to represent a variety of subject domains for illustrative purposes. While we cannot retroactively add inter-indexer agreement data, we note in the revision that such comparisons would be valuable for future validation. revision: yes

  2. Referee: The pipeline design assumes that decomposing subject indexing into four discrete, sequentially executed skills (conceptual analysis through MARC synthesis) fully captures professional practice (abstract and pipeline description). The manuscript provides no validation or ablation showing that this sequential structure matches the iterative, holistic judgment of expert indexers, which is load-bearing for the claim that the agentic approach aligns with SHM-based workflows.

    Authors: We agree that the manuscript lacks explicit validation or ablation studies for the sequential decomposition. The four skills are modeled after the distinct phases outlined in the Library of Congress Subject Headings Manual and standard cataloging literature, which treat conceptual analysis, term selection, authority control, and encoding as sequential steps. To address this, we have expanded the 'Pipeline Design' section to provide more detailed justification drawn from SHM instruction sheets and added a discussion in the Limitations section noting that real-world indexing often involves iteration, which our current pipeline does not model. We have not performed ablation experiments in this initial study due to resource constraints but have outlined plans for such analyses in future work to test the necessity of each skill and potential for iterative refinement. revision: yes

Circularity Check

0 steps flagged

No significant circularity; pipeline and evaluation are externally grounded

full rationale

The paper decomposes subject indexing into four agent skills drawn directly from external Library of Congress Subject Headings Manual (SHM) instruction sheets and subject analysis theory. No equations, parameter fitting, statistical predictions, or self-citations appear in the derivation. The evaluation performs qualitative comparison of outputs against pre-existing headings from the Harvard Alma snapshot on ten titles; this is an observational check against independent data rather than a fitted input renamed as prediction or a self-referential claim. All load-bearing steps cite external domain sources and remain self-contained against those benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the assumption that the four-skill decomposition accurately models professional subject indexing; no free parameters are fitted, no new entities are postulated, and the axioms are standard domain assumptions from library standards.

axioms (1)
  • domain assumption The subject indexing process can be decomposed into four discrete, sequentially executed skills (conceptual analysis, quantitative filtering, authority validation, MARC field synthesis) that encode SHM instruction sheets and subject analysis theory.
    The pipeline design and evaluation rest directly on this decomposition as stated in the abstract.

pith-pipeline@v0.9.0 · 5466 in / 1329 out tokens · 69243 ms · 2026-05-09T16:20:37.387750+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

9 extracted references · 9 canonical work pages

  1. [1]

    Asula, M., Makke, J., Freienthal, L., Kuulmets, H.-A., & Sirel, R. (2021). Kratt: Developing an automatic subject indexing tool for the National Library of Estonia. Cataloging & Classification Quarterly, 59(8), 775–793. https://doi.org/10.1080/01639374.2021.1998283

  2. [2]

    Brzustowicz, R. (2023). From ChatGPT to CatGPT: The implications of artificial intelligence on library cataloging. Information Technology and Libraries, 42(3). https://doi.org/10.5860/ital.v42i3.16295

  3. [3]

    Chow, E. H. C., Kao, T. J., & Li, X. (2024). An experiment with the use of ChatGPT for LCSH subject assignment on electronic theses and dissertations. Cataloging & Classification Quarterly, 62(5), 574–588. https://doi.org/10.1080/01639374.2024.2394516

  4. [4]

    Harvard Library. (2022). Harvard Library bibliographic metadata [Data set]. Harvard Dataverse. https://doi.org/10.7910/DVN/I8L0ZZ

  5. [5]

    Holley, R. M., & Joudrey, D. N. (2021). Aboutness and conceptual analysis: A review. Cataloging & Classification Quarterly, 59(2–3), 159–185. https://doi.org/10.1080/01639374.2020.1856992

  6. [6]

    D'Souza, J., Sadruddin, S., Israel, H., Begoin, M., & Slawig, D. (2025). SemEval-2025 Task 5: LLMs4Subjects – LLM-based automated subject tagging for a national technical library's open-access catalog. arXiv preprint arXiv:2504.07199. https://arxiv.org/abs/2504.07199

  7. [7]

    D'Souza, J., Sadruddin, S., Kähler, M., Salfinger, A., Zaccagna, L., Incitti, F., Snidaro, L., & Suominen, O. (2026). An extreme multi-label text classification (XMTC) library dataset: What if we took "Use of Practical AI in Digital Libraries" seriously? arXiv preprint arXiv:2603.10876. https://arxiv.org/abs/2603.10876

  8. [8]

    Suominen, O., Inkinen, J., & Lehtinen, M. (2025). Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs. arXiv preprint arXiv:2504.19675. https://arxiv.org/abs/2504.19675

  9. [9]

    Tang, K.-L., & Jiang, Y. (2025). Better recommendations: Validating AI-generated subject terms through LOC Linked Data Service. arXiv preprint arXiv:2508.00867. https://arxiv.org/abs/2508.00867