pith. machine review for the scientific record.

arxiv: 2604.23113 · v1 · submitted 2026-04-25 · 💻 cs.SI

Recognition: unknown

Reducing Detail Hallucinations in Long-Context Regulatory Understanding via Targeted Preference Optimization

Bin Chong, Chongyang Zhang, Hao Zheng, Jiayu Liang, Kefu Xu, Qian Li, Ran Ran, Yang Liu, Yuhan Lin, Ziyi Zhang

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 07:10 UTC · model grok-4.3

classification 💻 cs.SI
keywords detail hallucinations · preference optimization · regulatory documents · long-context LLMs · DetailDPO · DetailBench · error taxonomy

The pith

Targeted preference optimization on pairs differing in one detail cuts LLM hallucinations in regulatory texts by 42-61%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models often generate plausible but incorrect details, such as wrong thresholds, units, or conditions, when reading long regulatory documents. This work introduces DetailDPO, which trains models using contrastive pairs that differ in exactly one detail from a five-type error taxonomy. It supplies a new benchmark of 13,000 human-annotated pairs drawn from real and synthetic regulatory texts. Experiments across Qwen and Llama models at multiple sizes and context lengths show large, consistent reductions in these errors, including transfer to financial and medical domains.

Core claim

By constructing preference pairs that differ in exactly one detail dimension, DetailDPO concentrates the DPO gradient signal on detail-bearing tokens; theoretical analysis shows this occurs under mild assumptions, and experiments confirm a 42-61% relative drop in Detail Error Rate across five error types, three context-length tiers, and multiple model families with cross-domain transfer.

What carries the argument

Minimal detail perturbation contrastive pairs inside a targeted DPO loop, which isolate gradient updates to specific factual elements rather than broad response quality.
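The mechanism can be made concrete. Below is a minimal sketch of the pair construction and the standard DPO objective; the statement template, field names, perturbation rules, and log-probabilities are illustrative assumptions, not the paper's actual data or code.

```python
import math

# Hypothetical sketch: build a (chosen, rejected) pair that differs in exactly
# one detail dimension from the five-type taxonomy, then score it with the
# standard DPO objective. All templates and perturbations are illustrative.

ERROR_TYPES = ["threshold", "unit", "scope", "obligation", "condition"]

def make_pair(statement: dict, error_type: str):
    """Render a preference pair whose texts differ only in `error_type`."""
    template = "Firms {obligation} report within {threshold} {unit} {condition} under {scope}."
    chosen = template.format(**statement)
    perturb = {
        "threshold": lambda v: str(int(v) * 10),               # 30 -> 300
        "unit": lambda v: "months" if v == "days" else "days",
        "scope": lambda v: "Annex-III",                        # wrong scope
        "obligation": lambda v: "may" if v == "must" else "must",
        "condition": lambda v: "unless-notified",              # wrong condition
    }
    corrupted = dict(statement)
    corrupted[error_type] = perturb[error_type](statement[error_type])
    rejected = template.format(**corrupted)
    return chosen, rejected

def dpo_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = (logp_w - logp_l) - (ref_w - ref_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

statement = {"obligation": "must", "threshold": "30", "unit": "days",
             "condition": "if-breached", "scope": "Article-12"}
chosen, rejected = make_pair(statement, "threshold")
# Because the two sequences differ in a single token, the preference signal
# (and hence the DPO gradient on the rejected sequence) localizes there.
```

With sequences identical everywhere except one token, the per-token log-probability difference driving the loss is nonzero only at (and after) that position, which is the intuition behind the paper's concentration claim.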

If this is right

  • Detail Error Rate falls 42-61% relative to standard baselines on the same models and data.
  • Gains appear uniformly across all five error types in the taxonomy.
  • The same training produces measurable improvements on financial and medical documents without domain-specific retraining.
  • Benefits hold for models ranging from 7B to 72B parameters and contexts from 8K to 64K tokens.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The single-dimension perturbation idea could be adapted to other alignment methods to target specific factual failure modes rather than overall preference.
  • DetailBench-style annotation could become a template for building precision benchmarks in any domain where small factual slips carry high cost.
  • Controlling the exact difference between preferred and rejected responses may prove more efficient than scaling data volume alone for fixing narrow hallucination types.

Load-bearing premise

Contrastive pairs that differ in exactly one detail dimension will concentrate DPO gradient signal on detail-bearing tokens under the mild assumptions in the theoretical analysis.

What would settle it

Measuring DPO gradients on detail tokens when using minimal-perturbation pairs and finding no concentration relative to standard DPO pairs would falsify the claimed mechanism.
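That test is cheap to instrument. A hypothetical sketch of the measurement, assuming per-token gradient norms have already been extracted from a DPO backward pass (the norms below are made up for illustration):

```python
def concentration_ratio(grad_norms, detail_positions):
    """Mean gradient norm on detail tokens divided by the mean on all other
    tokens -- the quantity plotted in Figure 4(a). A ratio near 1.0 under
    minimal-perturbation pairs would falsify the concentration claim."""
    detail = [grad_norms[i] for i in detail_positions]
    other = [g for i, g in enumerate(grad_norms) if i not in detail_positions]
    return (sum(detail) / len(detail)) / (sum(other) / len(other))

# Illustrative per-token norms: position 2 is the lone detail token.
print(concentration_ratio([1.0, 1.0, 5.0, 1.0], {2}))  # → 5.0
```

Averaging this ratio over many pairs, separately for DetailDPO-style and generic DPO pairs, reproduces the comparison the paper's Figure 4 makes.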

Figures

Figures reproduced from arXiv: 2604.23113 by Bin Chong, Chongyang Zhang, Hao Zheng, Jiayu Liang, Kefu Xu, Qian Li, Ran Ran, Yang Liu, Yuhan Lin, Ziyi Zhang.

Figure 1. DetailDPO pipeline: (1) real + synthetic regulatory document collection, (2) compliance annotation …
Figure 2. DetailBench statistics. Left: document counts by source (GB: 65, CFR: 31, EUR-Lex: 76, Synthetic: 150). Right: distribution of detail elements across five error types and three context-length tiers.
Figure 3. Results overview. Left: compliance accuracy and evidence F1 across methods. Right: DER by error type, showing DetailDPO's consistent reduction across all five categories. Excerpted results (Compl. Acc. / DER↓ / Evid. Prec. / Evid. Rec. / Evid. F1 / Evid. Consist.): GPT-4o zero-shot 0.78 / 0.16 / 0.68 / 0.61 / 0.64 / 0.58; Claude-3.5-Sonnet zero-shot 0.76 / 0.18 / 0.65 / 0.59 / 0.62 / 0.55; Qwen2.5-7B-Instruct zero-shot 0.65 / 0.28 / 0.42 / …
Figure 4. Gradient concentration analysis. (a) Mean gradient norm ratio (detail vs. non-detail tokens); error bars show ±1 std over 500 samples. (b) Per-token gradient magnitude heatmap for a representative sample; dashed lines mark detail token positions P. DetailDPO concentrates gradient on detail tokens, while generic DPO distributes it uniformly.
Original abstract

Large language models (LLMs) frequently produce \emph{detail hallucinations} when processing long regulatory documents, including subtle errors in threshold values, units, scopes, obligation levels, and conditions that preserve surface plausibility while corrupting safety-critical parameters. We formalize this phenomenon through a fine-grained \emph{Detail Error Taxonomy} of five error types and introduce \textbf{DetailBench}, a benchmark built from 172 real regulatory documents and 150 synthetic documents spanning three jurisdictions, with human-annotated detail-level ground truth comprising 13,000 preference pairs. We propose \textbf{DetailDPO}, a targeted preference optimization framework that constructs contrastive pairs differing in exactly one detail dimension, concentrating DPO gradient signal on detail-bearing~tokens. We provide theoretical analysis showing why \emph{minimal detail perturbation} pairs yield gradient concentration under mild assumptions. Experiments on the Qwen2.5 family (7B, 14B, 72B) and Llama-3.1-8B across three context-length tiers (8K--64K tokens) show that DetailDPO reduces the Detail Error Rate by 42--61\% relative to baselines, with consistent gains across all five error types and cross-domain transfer to financial and medical documents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript claims that LLMs produce detail hallucinations in long regulatory documents, which it formalizes via a five-type Detail Error Taxonomy. It introduces DetailBench, a benchmark derived from 172 real and 150 synthetic regulatory documents across jurisdictions with 13,000 human-annotated preference pairs. The proposed DetailDPO method constructs contrastive pairs that differ in exactly one detail dimension to target DPO gradients on detail-bearing tokens, supported by theoretical analysis under mild assumptions. Experiments across Qwen2.5 (7B, 14B, 72B) and Llama-3.1-8B models at 8K-64K context lengths demonstrate 42-61% relative reductions in Detail Error Rate, with gains across error types and transfer to financial and medical domains.

Significance. If the results hold and the targeted mechanism is confirmed, this would represent a meaningful advance in mitigating a specific and safety-critical failure mode in LLMs for regulatory text processing. The fine-grained taxonomy and large-scale benchmark from real documents provide reusable resources for the community. The consistent performance across model scales and context lengths, plus cross-domain generalization, strengthens the practical value. The theoretical analysis, while not yet empirically validated, offers a principled starting point for targeted alignment techniques.

major comments (3)
  1. The theoretical analysis claims that contrastive pairs differing in exactly one detail dimension concentrate the DPO gradient on detail-bearing tokens under mild assumptions. However, no empirical verification (e.g., gradient attribution, token-level update analysis, or ablation on attention patterns) is provided to confirm this holds in long-context transformers (8K-64K), where attention diffusion or subword tokenization of numbers/units could spread updates. This is load-bearing for the central claim that gains stem from the targeted mechanism rather than generic preference optimization.
  2. The Experiments section reports 42--61% relative reductions in Detail Error Rate across models and context tiers but provides no error bars, statistical significance tests, full training hyperparameters, data splits, or ablations on contrastive pair construction (e.g., ensuring exactly one detail dimension differs). These omissions undermine assessment of robustness and the cross-domain transfer claims.
  3. The DetailBench construction (mentioned with 13,000 pairs) lacks reported inter-annotator agreement for human annotations of the five error types or validation details for the 150 synthetic documents against real regulatory structures. This affects the reliability of the ground truth used to support the taxonomy and evaluation.
minor comments (2)
  1. The abstract contains a formatting artifact ('detail-bearing~tokens'); ensure consistent notation throughout.
  2. Consider reporting absolute Detail Error Rates alongside relative reductions to improve interpretability of the 42--61% figures.
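The second minor point is a one-line computation once absolute rates are reported. A minimal sketch (Figure 3 lists a zero-shot DER of 0.28 for Qwen2.5-7B-Instruct; the post-training value here is hypothetical, chosen to match a 42% relative drop):

```python
def relative_reduction(baseline_der: float, method_der: float) -> float:
    """Percent relative drop in Detail Error Rate from baseline to method."""
    return 100.0 * (baseline_der - method_der) / baseline_der

# 0.28 is the zero-shot baseline from Figure 3; 0.1624 is a hypothetical
# post-DetailDPO rate implying a 42% relative reduction.
print(round(relative_reduction(0.28, 0.1624), 1))  # → 42.0
```

Reporting both numbers lets readers judge whether a large relative drop still leaves an absolute error rate that matters in safety-critical settings.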

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive feedback on our manuscript. We appreciate the positive assessment of the work's significance and the identification of areas for improvement. Below, we address each major comment in detail and outline the revisions we will make.

Point-by-point responses
  1. Referee: The theoretical analysis claims that contrastive pairs differing in exactly one detail dimension concentrate the DPO gradient on detail-bearing tokens under mild assumptions. However, no empirical verification (e.g., gradient attribution, token-level update analysis, or ablation on attention patterns) is provided to confirm this holds in long-context transformers (8K-64K), where attention diffusion or subword tokenization of numbers/units could spread updates. This is load-bearing for the central claim that gains stem from the targeted mechanism rather than generic preference optimization.

    Authors: We agree that empirical verification of the gradient concentration mechanism would strengthen the central claim. The theoretical analysis in the manuscript derives the concentration result under the stated mild assumptions of token independence and single-dimension perturbation. The experimental results show that DetailDPO outperforms standard DPO, which we attribute to the targeted construction, but we acknowledge the lack of direct attribution analysis. In the revised version, we will add an ablation study on contrastive pair construction (single vs. multiple dimension differences) and include a preliminary token-level gradient analysis for a subset of the models to provide supporting evidence for the mechanism. revision: yes

  2. Referee: The Experiments section reports 42--61% relative reductions in Detail Error Rate across models and context tiers but provides no error bars, statistical significance tests, full training hyperparameters, data splits, or ablations on contrastive pair construction (e.g., ensuring exactly one detail dimension differs). These omissions undermine assessment of robustness and the cross-domain transfer claims.

    Authors: We thank the referee for pointing out these reporting gaps. The full set of training hyperparameters and data splits are detailed in Appendix B of the manuscript, but we will move key information to the main Experiments section for better visibility. We will add error bars (standard deviation over 3 runs) and statistical significance tests (e.g., Wilcoxon signed-rank tests) to all reported results. Additionally, we will include an ablation on the contrastive pair construction to demonstrate the benefit of ensuring exactly one detail dimension differs, which will also bolster the cross-domain transfer claims by showing consistent gains. revision: yes

  3. Referee: The DetailBench construction (mentioned with 13,000 pairs) lacks reported inter-annotator agreement for human annotations of the five error types or validation details for the 150 synthetic documents against real regulatory structures. This affects the reliability of the ground truth used to support the taxonomy and evaluation.

    Authors: We agree that reporting inter-annotator agreement is essential for establishing the reliability of the annotations. Although the manuscript describes the annotation process, we did not include the specific agreement metrics in the main text. In the revision, we will add the inter-annotator agreement results (computed using standard metrics such as Cohen's kappa) for the five error types. Similarly, we will provide additional details on how the 150 synthetic documents were validated against real regulatory structures, including the expert review process used to ensure they reflect authentic regulatory formats. revision: yes
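The promised agreement metric is simple to compute over the five-way error-type labels. A minimal sketch of Cohen's kappa with made-up annotator labels (not the paper's annotation data):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n   # observed
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)  # chance
    return (p_o - p_e) / (1 - p_e)

# Toy labels over the five-type taxonomy for two hypothetical annotators.
a = ["threshold", "unit", "threshold", "scope"]
b = ["threshold", "unit", "unit", "scope"]
```

For five-way labels with plausible class imbalance, chance agreement is nontrivial, which is why raw percent agreement alone would overstate annotation reliability.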

Circularity Check

0 steps flagged

No circularity: the theoretical analysis and the empirical gains rest on independent inputs by construction.

Full rationale

The paper's derivation chain consists of (1) formalizing a Detail Error Taxonomy, (2) building DetailBench with human-annotated pairs, (3) constructing contrastive pairs that differ in one detail dimension, and (4) providing theoretical analysis under mild assumptions that such pairs concentrate DPO gradients on detail-bearing tokens. None of these steps reduce to a self-definition, a fitted parameter renamed as prediction, or a load-bearing self-citation. The reported 42-61% Detail Error Rate reductions are measured on external benchmarks (Qwen2.5 family, Llama-3.1, cross-domain transfer) rather than quantities defined by the method itself. The theoretical claim is an explanatory analysis, not a uniqueness theorem or ansatz smuggled via prior work that would force the outcome. This is the normal case of an empirical method with supporting analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the assumption that minimal detail perturbation pairs produce useful gradient concentration; no explicit free parameters, axioms, or invented entities are stated in the abstract.

pith-pipeline@v0.9.0 · 5549 in / 1173 out tokens · 28004 ms · 2026-05-08T07:10:48.826564+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

3 extracted references · 3 canonical work pages · 3 internal anchors

  1. [1] Rethinking Attention with Performers
  2. [2] YaRN: Efficient Context Window Extension of Large Language Models
  3. [3] Proximal Policy Optimization Algorithms