pith. machine review for the scientific record.

arxiv: 2604.19777 · v1 · submitted 2026-03-28 · 💻 cs.CL · cs.AI · cs.IR

Recognition: no theorem link

Self-Describing Structured Data with Dual-Layer Guidance: A Lightweight Alternative to RAG for Precision Retrieval in Large-Scale LLM Knowledge Navigation

Authors on Pith · no claims yet

Pith reviewed 2026-05-14 22:36 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.IR
keywords SDSR · Dual-Layer Guidance · primacy bias · Lost-in-the-Middle · RAG alternative · structured data retrieval · LLM knowledge navigation · adversarial benchmark

The pith

Placing human-authored navigational metadata at the start of structured files enables 100% primary routing accuracy across 119 categories in LLM queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models lose attention to content in the middle of long contexts, which hinders direct use of large structured knowledge bases. The paper introduces Self-Describing Structured Retrieval, which embeds human-written navigational metadata at the start of each file to exploit the model's preference for early information. A dual-layer strategy adds explicit routing rules in the system prompt. Benchmarks on a library expanded from 36 to 119 categories via adversarial distractors show the combined approach reaching full accuracy on primary routing, far above the no-guidance baseline. The work also finds that primary routing can be solved by explicit rules, while secondary cross-category tasks need intent built into the data structure, and it extends the method to semi-structured data without vector databases.

Core claim

Self-Describing Structured Retrieval places human-authored navigational metadata at the primacy position inside each structured data file, thereby exploiting the LLM primacy bias rather than fighting the lost-in-the-middle effect. Dual-Layer Guidance combines this in-file metadata with explicit routing rules supplied in the system prompt. On a 190-skill library expanded to 119 categories through adversarial distractor injection, the full dual-layer version delivers 100% primary routing accuracy while the no-guidance baseline reaches only 65%. Primary routing proves solvable by explicit rules; secondary cross-category routing requires architectural intent encoded directly in the data layout.

What carries the argument

Self-Describing Structured Retrieval with Dual-Layer Guidance: human-authored navigational metadata placed at the primacy position in each file, paired with explicit routing rules in the system prompt.
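The paper's file format is not reproduced on this page, so the following is only a rough sketch under assumed names: the metadata fields (`category`, `covers`, `route_here_when`) and the `build_context` helper are invented for illustration, not the paper's schema. It shows how the two guidance layers might be wired together:

```python
# Sketch of Dual-Layer Guidance: in-file navigational metadata at the
# primacy position (layer 2) plus routing rules in the system prompt
# (layer 1). All field names here are hypothetical.

SKILL_FILE = """\
# --- NAVIGATIONAL METADATA (primacy position) ---
# category: data-visualization
# covers: charts, plots, dashboards
# route_here_when: the user asks to render or style a figure
# --- END METADATA ---
...skill definitions follow...
"""

ROUTING_RULES = (
    "Before answering, read only the metadata block at the top of each "
    "file and pick the single category whose route_here_when matches "
    "the query."
)

def build_context(files: list[str], query: str) -> list[dict]:
    """Assemble a chat payload: explicit rules in the system prompt,
    self-describing files (metadata first) in the user turn."""
    return [
        {"role": "system", "content": ROUTING_RULES},
        {"role": "user", "content": "\n\n".join(files) + f"\n\nQuery: {query}"},
    ]

msgs = build_context([SKILL_FILE], "make a bar chart of sales")
assert msgs[0]["role"] == "system"
```

The point of the layout is that a model skimming from the top of each file hits the metadata block before any payload, which is exactly where primacy bias concentrates attention.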

If this is right

  • Primary routing accuracy reaches 100% when both in-file metadata and prompt rules are used, even after expansion to 119 categories.
  • Secondary cross-category routing requires explicit architectural intent encoded in the data structure rather than rules alone.
  • The approach operates on semi-structured corpora by encoding cross-references, removing the need for vector databases when recoverable document structure exists.
  • It supplies a lightweight alternative to RAG for libraries whose semantic boundaries are defined by humans rather than statistical learning.
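The semi-structured extension is only summarized above. A minimal sketch of the no-vector-database idea, with an invented `see_also` field standing in for whatever cross-reference encoding the paper actually uses: secondary retrieval becomes a pointer walk over encoded links rather than a nearest-neighbor search.

```python
# Cross-reference retrieval over semi-structured files: each document's
# header names the documents it links to, so following references is a
# graph walk, not a vector lookup. The "see_also" field is hypothetical.

CORPUS = {
    "contracts/overview": {"see_also": ["contracts/termination"]},
    "contracts/termination": {"see_also": ["case-law/notice-periods"]},
    "case-law/notice-periods": {"see_also": []},
}

def walk_refs(start: str, depth: int = 2) -> list[str]:
    """Collect documents reachable from `start` via encoded
    cross-references, up to `depth` hops, in discovery order."""
    seen = [start]
    frontier = [start]
    for _ in range(depth):
        nxt = []
        for doc in frontier:
            for ref in CORPUS[doc]["see_also"]:
                if ref not in seen:
                    seen.append(ref)
                    nxt.append(ref)
        frontier = nxt
    return seen
```

This only works when the document structure is recoverable, which matches the paper's own caveat about domains with recoverable structure.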

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The primacy-position placement could be adapted for other known positional biases by testing different metadata locations in the same benchmark setup.
  • Accuracy at 119 categories suggests the method may scale to larger libraries provided distractor difficulty is controlled similarly.
  • Hybrid systems could combine SDSR primary routing with selective vector retrieval only for secondary cross-category steps.
  • Data structures that embed routing intent may reduce error propagation in multi-hop queries compared with purely prompt-driven approaches.

Load-bearing premise

That placing human-authored navigational metadata at the file's primacy position will reliably exploit the LLM's primacy bias across different models and that the adversarial 119-category benchmark is representative of real-world retrieval difficulty.

What would settle it

Running the same 20-query primary routing test on a model family whose measured primacy bias is weaker than that of the test models, or replacing the adversarial distractors with queries drawn from actual user logs, and observing whether primary routing accuracy drops below 95%.

Figures

Figures reproduced from arXiv: 2604.19777 by Hung Ming Liu.

Figure 1. Primary routing accuracy as a function of library scale for all four experimental conditions. [PITH_FULL_IMAGE:figures/full_fig_p009_1.png]
Figure 2. Round 3 (119 categories) results by metric and condition. Version D achieves perfect … [PITH_FULL_IMAGE:figures/full_fig_p011_2.png]
Figure 3. Per-question primary routing hit/miss matrix for Round 3 (119 categories). Green … [PITH_FULL_IMAGE:figures/full_fig_p012_3.png]
Figure 4. The SDSR two-tier retrieval pipeline. Python reads only the … [PITH_FULL_IMAGE:figures/full_fig_p015_4.png]
Figure 5. SDSR applied to legal judgment retrieval. A one-time structuring pass converts … [PITH_FULL_IMAGE:figures/full_fig_p017_5.png]
read the original abstract

Large Language Models (LLMs) exhibit a well-documented positional bias when processing long input contexts: information in the middle of a context window receives substantially less attention than content at the boundaries, a phenomenon termed the Lost-in-the-Middle effect (Liu et al., 2024). This limits knowledge-retrieval applications that embed large structured knowledge bases directly in the LLM context. Retrieval-Augmented Generation (RAG) addresses scalability by retrieving only relevant fragments, but introduces substantial infrastructure overhead and is ill-suited to libraries whose semantic boundaries are human-defined rather than statistically learned. We propose Self-Describing Structured Retrieval (SDSR), a lightweight framework in which structured data files embed human-authored navigational metadata at the file's primacy position, thereby exploiting rather than fighting the LLM's primacy bias. We further propose a Dual-Layer Guidance strategy combining in-file metadata with explicit routing rules in the system prompt. We validate SDSR through a four-round benchmark using a 190-skill library expanded from 36 to 119 categories via adversarial distractor injection. Four conditions are tested: (A) no guidance, (B) in-file summary only, (C) prompt hint only, (D) both combined. Version D achieves 100% primary routing accuracy (20/20) at 119 categories versus 65% for the no-guidance baseline. We identify a fundamental asymmetry: primary routing is solvable by explicit rules, while secondary cross-category routing requires architectural intent explicitly encoded in the data structure. We further extend SDSR to semi-structured corpora, showing how cross-reference encoding enables operation without vector databases in domains with recoverable document structure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Self-Describing Structured Retrieval (SDSR), a lightweight alternative to RAG in which structured data files embed human-authored navigational metadata at the primacy position to exploit LLMs' positional (Lost-in-the-Middle) bias. It introduces Dual-Layer Guidance that pairs in-file metadata with explicit routing rules in the system prompt. The central empirical claim is a four-condition benchmark on a 190-skill library expanded to 119 categories via adversarial distractors, where the combined guidance condition (D) reaches 100% primary routing accuracy (20/20) versus 65% for the no-guidance baseline (A). The paper further identifies an asymmetry between primary and secondary routing and sketches an extension to semi-structured corpora without vector databases.

Significance. If the accuracy gains prove robust under larger-scale testing, SDSR could offer a low-infrastructure approach to precise knowledge navigation for human-curated structured libraries by directly leveraging rather than mitigating LLM context biases, reducing reliance on external retrieval systems in domains where semantic boundaries are explicitly defined.

major comments (2)
  1. [Abstract] Abstract / Benchmark: The claim that Version D achieves 100% primary routing accuracy (20/20) at 119 categories rests on a test set of only 20 queries. No variance estimates, confidence intervals, or results on additional held-out queries are reported, so the perfect score may reflect query selection, prompt sensitivity, or limited diversity rather than reliable exploitation of primacy bias across models.
  2. [Abstract] Abstract: No details are supplied on the specific LLM model, exact prompt wording for each of the four conditions, controls for prompt length or token budget, or any statistical significance test for the 65% to 100% accuracy difference. These omissions prevent assessment of whether the result supports the central claim of a generalizable lightweight retrieval method.
minor comments (2)
  1. [Abstract] The four-round benchmark structure is referenced but not explained in relation to the four conditions (A–D); clarifying whether rounds test repeated queries, different sets, or progressive difficulty would improve reproducibility.
  2. [Abstract] The asymmetry between primary routing (solvable by explicit rules) and secondary cross-category routing (requiring architectural intent in the data structure) is asserted without concrete examples or quantitative support in the abstract, weakening the interpretive claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped us improve the clarity and rigor of our manuscript. We address each major comment below and have made revisions to incorporate additional details and analyses.

read point-by-point responses
  1. Referee: [Abstract] Abstract / Benchmark: The claim that Version D achieves 100% primary routing accuracy (20/20) at 119 categories rests on a test set of only 20 queries. No variance estimates, confidence intervals, or results on additional held-out queries are reported, so the perfect score may reflect query selection, prompt sensitivity, or limited diversity rather than reliable exploitation of primacy bias across models.

    Authors: We acknowledge that the test set of 20 queries is limited in scale. These queries were deliberately constructed to span the 119 categories while incorporating adversarial distractors that directly test the limits of primacy-based routing. The 100% accuracy under Dual-Layer Guidance (condition D) versus 65% baseline demonstrates the framework's effectiveness in this controlled setting. In the revised manuscript we will add an explicit description of the query selection and distractor generation process, report bootstrap confidence intervals for all accuracy figures, and include results on an expanded held-out set of 30 additional queries to provide stronger evidence of robustness. revision: yes
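For scale (our arithmetic, not the paper's): even a perfect 20/20 leaves considerable room at the low end of an exact binomial interval, which is the crux of the referee's sample-size objection. A short check using the Clopper–Pearson lower bound:

```python
from math import comb

def clopper_pearson_lower(k: int, n: int, alpha: float = 0.05) -> float:
    """Exact (Clopper-Pearson) lower confidence bound for a binomial
    proportion, found by bisection on the upper binomial tail."""
    def tail_ge_k(p: float) -> float:
        # P(X >= k) under Binomial(n, p); increasing in p
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisect until the tail equals alpha/2
        mid = (lo + hi) / 2
        if tail_ge_k(mid) < alpha / 2:
            lo = mid
        else:
            hi = mid
    return lo

# 20/20 correct: the 95% interval still reaches down to about 0.83
print(round(clopper_pearson_lower(20, 20), 2))  # -> 0.83
```

So the observed 20/20 is consistent with a true accuracy as low as roughly 83%, which is why the promised expanded held-out set matters.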

  2. Referee: [Abstract] Abstract: No details are supplied on the specific LLM model, exact prompt wording for each of the four conditions, controls for prompt length or token budget, or any statistical significance test for the 65% to 100% accuracy difference. These omissions prevent assessment of whether the result supports the central claim of a generalizable lightweight retrieval method.

    Authors: We agree these details are necessary for reproducibility and evaluation. The revised manuscript will specify the LLM used, reproduce the exact system and user prompt templates for conditions A–D, confirm that prompt lengths and token budgets were matched across conditions, and report a statistical significance test (McNemar’s test) on the accuracy difference between the no-guidance baseline and the combined-guidance condition. revision: yes
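The significance question is checkable from the abstract's numbers alone, under an assumption that is ours rather than the authors': if condition D is correct on all 20 queries and baseline A on 13, every baseline miss is a discordant pair, giving 7 pairs in D's favor and 0 against. An exact McNemar test on that split:

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value via the binomial distribution on
    the discordant pairs (b: only condition D correct, c: only A)."""
    n, k = b + c, min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) * 0.5**n
    return min(1.0, 2 * tail)

# Illustrative split consistent with 20/20 vs 13/20 paired outcomes:
print(mcnemar_exact(7, 0))  # -> 0.015625
```

Exact two-sided p = 0.016, so even at n = 20 the A-versus-D gap clears conventional significance, provided the assumed discordance pattern holds in the paired data.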

Circularity Check

0 steps flagged

No circularity; empirical benchmark with independent test results

full rationale

The paper proposes the SDSR framework and Dual-Layer Guidance strategy, then validates them via direct empirical measurement of primary routing accuracy across four conditions on an expanded 119-category benchmark (20 queries). No equations, fitted parameters, or derivations are present that could reduce to inputs by construction. The central claim (100% accuracy for Version D vs. 65% baseline) is a reported experimental outcome, not a self-referential calculation. The citation to Liu et al. (2024) provides background on positional bias but is not load-bearing for the performance results, which stand on the benchmark comparison itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the documented positional bias of LLMs and the assumption that human-authored metadata placed at primacy will be attended to; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption LLMs exhibit a Lost-in-the-Middle effect where middle-context information receives less attention
    Cited from Liu et al., 2024; invoked to justify placing metadata at the file's primacy position

pith-pipeline@v0.9.0 · 5608 in / 1234 out tokens · 31085 ms · 2026-05-14T22:36:18.968346+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 2 internal anchors

  1. Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12:157–173, 2024.
  2. Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems 33 (NeurIPS), 2020.
  3. Hermann Ebbinghaus. Memory: A Contribution to Experimental Psychology. Teachers College, Columbia University, 1913. Originally published in German, 1885.
  4. Zhiyuan He, Huiqiang Jiang, Zilong Wang, Yuqing Yang, Luna K. Qiu, and Lili Qiu. Position engineering: Boosting large language models through positional information manipulation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7333–7345, Miami, Florida, USA, 2024. Association for Computational Linguistics.
  5. Yao Fu, Rameswar Panda, Xinyao Niu, Xiang Yue, Hannaneh Hajishirzi, Yoon Kim, and Hao Peng. Data engineering for scaling language models to 128K context. arXiv preprint arXiv:2402.10171, 2024.
  6. Xinyi Wu, Yifei Wang, Stefanie Jegelka, and Ali Jadbabaie. On the emergence of position bias in transformers. In Proceedings of the 42nd International Conference on Machine Learning (ICML), 2025. arXiv:2502.01951.
  7. Nikolaus Salvatore, Hao Wang, and Qiong Zhang. Lost in the middle: An emergent property from information retrieval demands in LLMs. arXiv preprint arXiv:2510.10276, 2025.
  8. Yue Yu, Wei Ping, Zihan Liu, Boxin Wang, Jiaxuan You, Chao Zhang, Bryan Catanzaro, and Anima Anandkumar. RankRAG: Unifying context ranking with retrieval-augmented generation in LLMs. arXiv preprint arXiv:2407.02485, 2024.
  9. Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, and Jonathan Larson. From local to global: A graph RAG approach to query-focused summarization. arXiv preprint arXiv:2404.16130, 2024.
  10. Haoyu Guo et al. A survey on knowledge graph-enhanced RAG. arXiv preprint, 2024.
  11. Xinyue Chen, Pengyu Gao, Jiangjiang Song, and Xiaoyang Tan. HiQA: A hierarchical contextual augmentation RAG for massive documents QA. arXiv preprint arXiv:2402.01767, 2024.
  12. Sarah Packowski, Inge Halilovic, Jenifer Schlotfeldt, and Trish Smith. Optimizing and evaluating enterprise retrieval-augmented generation (RAG): A content design perspective. In Proceedings of the 2024 8th International Conference on Advances in Artificial Intelligence (ICAAI '24), New York, NY, USA, 2024. ACM. doi: 10.1145/3704137.3704181.
  13. Xiaobo Guo and Soroush Vosoughi. Serial position effects of large language models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 927–953, Vienna, Austria, 2025. Association for Computational Linguistics. doi: 10.18653/v1/2025.findings-acl.52.
  14. Mirac Suzgun and Adam Tauman Kalai. Meta-prompting: Enhancing language models with task-agnostic scaffolding. arXiv preprint arXiv:2401.12954, 2024.
  15. Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, HyoJung Han, Sevien Schulhoff, et al. The prompt report: A systematic survey of prompting techniques. arXiv preprint arXiv:2406.06608, 2024.
  16. Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, and Zhangyang Wang. Found in the middle: How language models use long contexts better via plug-and-play positional encoding. In Advances in Neural Information Processing Systems 37 (NeurIPS), 2024.
  17. Ziqi Wang, Hanlin Zhang, Xiner Li, Kuan-Hao Huang, Chi Han, Shuiwang Ji, Sham M. Kakade, Hao Peng, and Heng Ji. Eliminating position bias of language models: A mechanistic approach. In Proceedings of the 13th International Conference on Learning Representations (ICLR), 2025.
  18. Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, and Weizhu Chen. Make your LLM fully utilize the context. In Advances in Neural Information Processing Systems 37 (NeurIPS), 2024.