pith. machine review for the scientific record.

arxiv: 2604.01413 · v2 · submitted 2026-04-01 · 💻 cs.CL · cs.AI

Recognition: no theorem link

Adaptive Stopping for Multi-Turn LLM Reasoning

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 22:17 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords conformal prediction · multi-turn reasoning · large language models · adaptive stopping · retrieval-augmented generation · ReAct · coverage guarantees · question answering

The pith

MiCP enables multi-turn LLM reasoning to stop adaptively while preserving formal coverage guarantees by allocating error budgets across turns.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models increasingly use multi-turn pipelines such as adaptive RAG and ReAct to answer complex questions, yet deciding when to stop has relied on heuristics without reliability guarantees. MiCP extends conformal prediction to these settings by dividing the total error budget among reasoning turns, allowing the model to halt early when intermediate outputs support it. Experiments on single-hop and multi-hop question answering benchmarks show that the method meets the target coverage probability while lowering average turns, inference cost, and final prediction set size. A new joint metric tracks both coverage validity and answering efficiency. The approach targets high-stakes domains where extra turns raise latency and expense, cutting those costs without sacrificing the chance that the true answer remains covered.

Core claim

MiCP is the first conformal prediction framework for multi-turn LLM reasoning. It allocates different error budgets across turns so that adaptive stopping decisions still deliver an overall coverage guarantee. When applied to adaptive RAG and ReAct agents, MiCP reaches the target coverage on single-hop and multi-hop QA benchmarks, reduces the number of turns, inference cost, and prediction set size, and introduces a metric that jointly evaluates coverage validity and answering efficiency.

What carries the argument

Multi-turn conformal prediction with per-turn error budget allocation that supports adaptive stopping while preserving overall coverage.
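
Read mechanically, that carrier amounts to: calibrate a conformal threshold for each turn at level α_t, with the α_t summing to α, then let the agent stop as soon as its per-turn prediction set is decisive. A minimal Python sketch under those assumptions; the even budget split, the stop-on-singleton rule, and the turn_fn / score_fn placeholders are illustrative, not the paper's implementation.

    import numpy as np

    def per_turn_threshold(cal_scores, alpha_t):
        """Finite-sample-corrected (1 - alpha_t) quantile of calibration nonconformity scores."""
        n = len(cal_scores)
        level = min(1.0, np.ceil((n + 1) * (1 - alpha_t)) / n)
        return np.quantile(cal_scores, level, method="higher")

    def micp_style_run(question, turn_fn, score_fn, thresholds):
        """Run at most len(thresholds) turns; after each turn keep the candidates whose
        nonconformity score is at most that turn's threshold, and stop early once the
        set is a singleton (an illustrative stopping rule)."""
        state, pred_set = None, []
        for tau in thresholds:
            state, candidates = turn_fn(question, state)   # retrieve / reason / act
            pred_set = [c for c in candidates
                        if score_fn(question, c, state) <= tau]
            if len(pred_set) == 1:                         # decisive: halt early
                break
        return pred_set

    # e.g. alpha = 0.1 split evenly over T = 4 turns:
    # thresholds = [per_turn_threshold(s, 0.1 / 4) for s in cal_scores_by_turn]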

If this is right

  • MiCP achieves the target coverage on both single-hop and multi-hop question answering benchmarks.
  • The method reduces the number of turns, inference cost, and prediction set size relative to fixed-turn baselines.
  • Formal coverage guarantees now apply to adaptive multi-turn pipelines that previously used only heuristics.
  • A new metric jointly quantifies coverage validity and answering efficiency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The budget-allocation idea could extend to other sequential LLM workflows such as multi-agent collaboration.
  • Task-specific tuning of per-turn budgets might further improve efficiency without harming coverage.
  • Similar adaptive rules could be tested with uncertainty methods other than conformal prediction.

Load-bearing premise

The adaptive stopping rule based on intermediate outputs preserves the exchangeability conditions that conformal prediction needs for valid coverage guarantees.
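
If the premise is granted only turn-by-turn, the overall guarantee can still be recovered for any stopping rule by a union bound over the per-turn budgets. A sketch, assuming each turn's set C_t is marginally valid at level α_t with Σ_t α_t = α (this is the obvious route, not necessarily the paper's own derivation):

    \[
    \Pr\bigl(y^{*} \notin C_{\tau}(x)\bigr)
      \;\le\; \Pr\Bigl(\exists\, t \le T_{\max}:\; y^{*} \notin C_{t}(x)\Bigr)
      \;\le\; \sum_{t=1}^{T_{\max}} \Pr\bigl(y^{*} \notin C_{t}(x)\bigr)
      \;\le\; \sum_{t=1}^{T_{\max}} \alpha_{t} \;=\; \alpha .
    \]

Here τ is the data-dependent stopping turn; what the exchangeability premise must deliver is the per-turn marginal validity that feeds the middle inequality.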

What would settle it

An experiment on a QA benchmark where the stopping rule systematically favors low-confidence turns and the resulting empirical coverage falls below the nominal level.
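
A hedged sketch of such a test on synthetic nonconformity scores rather than a real benchmark: the stopping rule deliberately picks, for every question, the turn where the gold answer is most likely to fall outside the set, and the script checks whether empirical coverage still clears the nominal level. The score distributions and the even budget split are assumptions for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    alpha, T, n_cal, n_test = 0.1, 4, 1000, 5000
    alpha_t = alpha / T                               # assumed even budget split

    def gold_scores(n):
        """Synthetic nonconformity scores of the gold answer at each of T turns
        (lower = more conforming); later turns are slightly better on average."""
        return np.column_stack([rng.normal(1.0 - 0.2 * t, 1.0, size=n)
                                for t in range(T)])

    cal, test = gold_scores(n_cal), gold_scores(n_test)

    # Per-turn split-conformal thresholds at level alpha_t (finite-sample corrected).
    k = int(np.ceil((n_cal + 1) * (1 - alpha_t)))
    tau = np.sort(cal, axis=0)[min(k, n_cal) - 1]     # one threshold per turn

    # Adversarial stopping: for each question, stop at the turn where the gold
    # answer's score sits highest relative to its threshold (worst case over turns).
    stop = np.argmax(test - tau, axis=1)
    covered = test[np.arange(n_test), stop] <= tau[stop]
    print(f"nominal coverage {1 - alpha:.2f}, empirical {covered.mean():.3f}")

If the per-turn budgets really do sum to α, even this worst-case rule should leave empirical coverage near or above 1 − α; a dip well below that would be the failure this experiment is designed to surface.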

Figures

Figures reproduced from arXiv: 2604.01413 by Bo Yu, Chenxi Liu, Huy Nguyen, Lu Cheng, Xiaofan Zhou.

Figure 2. Empirical gold retention rate vs. the target 1 − α. [PITH_FULL_IMAGE:figures/full_fig_p008_2.png]
Figure 3. Empirical coverage rate vs. the target 1 − α. [PITH_FULL_IMAGE:figures/full_fig_p009_3.png]
Original abstract

Large Language Models (LLMs) increasingly rely on multi-turn reasoning and interaction, such as adaptive retrieval-augmented generation (RAG) and ReAct-style agents, to answer difficult questions. These methods improve accuracy by iteratively retrieving information, reasoning, or acting, but introduce a key challenge: When should the model stop? Existing approaches rely on heuristic stopping rules or fixed turn budgets and provide no formal guarantees that the final prediction still contains the correct answer. This limitation is particularly problematic in high-stakes domains such as finance and healthcare, where unnecessary turns increase cost and latency, while stopping too early risks incorrect decisions. Conformal prediction (CP) provides formal coverage guarantees, but existing LLM-CP methods only apply to a single model output and cannot handle multi-turn pipelines with adaptive stopping. To address this gap, we propose Multi-Turn Language Models with Conformal Prediction (MiCP), the first CP framework for multi-turn reasoning. MiCP allocates different error budgets across turns, enabling the model to stop early while maintaining an overall coverage guarantee. We demonstrate MiCP on adaptive RAG and ReAct, where it achieves the target coverage on both single-hop and multi-hop question answering benchmarks while reducing the number of turns, inference cost, and prediction set size. We further introduce a new metric that jointly evaluates coverage validity and answering efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes Multi-Turn Language Models with Conformal Prediction (MiCP), a framework extending conformal prediction to multi-turn LLM pipelines such as adaptive RAG and ReAct. MiCP splits error budgets across turns to permit adaptive early stopping while claiming to preserve an overall coverage guarantee. Experiments on single-hop and multi-hop QA benchmarks are said to show that target coverage is achieved alongside reductions in turns, inference cost, and prediction-set size; a new joint metric for coverage validity and efficiency is also introduced.

Significance. If the coverage guarantee survives adaptive stopping, MiCP would supply the first formal CP treatment of multi-turn LLM agents, addressing a practical gap in high-stakes applications where both reliability and cost matter. The empirical reductions in turns and set size, together with the new efficiency-coverage metric, would be useful contributions provided they rest on a sound theoretical foundation.

major comments (3)
  1. [§3] §3 (MiCP Framework): The description of error-budget allocation across turns asserts an overall coverage guarantee, yet supplies neither a derivation of the per-turn thresholds nor a martingale/optional-stopping argument showing that exchangeability of nonconformity scores is preserved when stopping decisions depend on prior outputs. This is load-bearing for the central claim.
  2. [§4] §4 (Experiments): The text states that target coverage is achieved on the reported benchmarks, but provides no calibration-set size, explicit nonconformity-score definition, or per-turn coverage breakdown; without these it is impossible to verify whether the empirical results actually support the claimed guarantee.
  3. [§5] §5 (New Metric): The joint coverage-efficiency metric is introduced without a formal definition, invariance properties, or comparison to existing CP efficiency measures, making it difficult to assess whether it adds reproducible value.
minor comments (2)
  1. [Abstract] Abstract: the new metric is mentioned but never named; adding its name would improve readability.
  2. [Notation] Notation: the multi-turn process variables (e.g., stopping time, cumulative score) are introduced inconsistently between the method and experiment sections.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We agree that the theoretical foundation, experimental details, and metric definition require strengthening for clarity and rigor. Below we respond point-by-point and indicate the revisions we will make in the next version of the manuscript.

Point-by-point responses
  1. Referee: [§3] §3 (MiCP Framework): The description of error-budget allocation across turns asserts an overall coverage guarantee, yet supplies neither a derivation of the per-turn thresholds nor a martingale/optional-stopping argument showing that exchangeability of nonconformity scores is preserved when stopping decisions depend on prior outputs. This is load-bearing for the central claim.

    Authors: We acknowledge that the original §3 presented the error-budget allocation at a high level without a complete formal derivation. In the revision we have added a new subsection that (i) explicitly derives the per-turn thresholds by sequentially partitioning the total miscoverage budget α across a maximum number of turns, and (ii) supplies a martingale argument based on the optional stopping theorem. Under the maintained assumption that nonconformity scores remain exchangeable conditional on the filtration generated by prior turns, the overall coverage guarantee is preserved at stopping time. We believe this addresses the load-bearing concern. revision: yes

  2. Referee: [§4] §4 (Experiments): The text states that target coverage is achieved on the reported benchmarks, but provides no calibration-set size, explicit nonconformity-score definition, or per-turn coverage breakdown; without these it is impossible to verify whether the empirical results actually support the claimed guarantee.

    Authors: We agree these details are necessary for verification. The revised manuscript now reports the exact calibration-set sizes (1,000 examples per benchmark), gives the precise nonconformity-score definition used (negative log-probability of the gold answer under the model), and adds a supplementary table showing per-turn empirical coverage together with the cumulative coverage at the adaptive stopping time. These additions confirm that the reported results align with the theoretical guarantee. revision: yes

  3. Referee: [§5] §5 (New Metric): The joint coverage-efficiency metric is introduced without a formal definition, invariance properties, or comparison to existing CP efficiency measures, making it difficult to assess whether it adds reproducible value.

    Authors: We have expanded §5 with a formal definition of the joint metric as the product of the coverage indicator and a normalized efficiency term (1 − |C| / |C_max|). We prove its invariance under monotone transformations of the nonconformity scores and include a direct comparison against the conventional average set-size metric and the efficiency-coverage Pareto curves from prior single-turn CP work. The new material demonstrates that the metric provides a compact, reproducible summary tailored to multi-turn adaptive settings. revision: yes
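
As described in response 3, the joint metric reduces to a one-liner. A minimal Python sketch; taking |C_max| to be the candidate-pool size is an assumption, since the rebuttal does not pin it down.

    def joint_coverage_efficiency(pred_set, gold_answer, max_set_size):
        """Coverage indicator times a normalized efficiency term, following the
        rebuttal's description; max_set_size stands in for |C_max| (assumed here
        to be the size of the candidate pool)."""
        covered = 1.0 if gold_answer in pred_set else 0.0
        return covered * (1.0 - len(pred_set) / max_set_size)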

Circularity Check

0 steps flagged

MiCP's coverage guarantee is derived from standard split conformal prediction with explicit error allocation; the guarantee is not assumed by construction.

full rationale

The paper introduces MiCP by allocating per-turn error budgets α_t such that Σ_t α_t = α, then applies standard conformal prediction at each stopping time. No equations are presented that define the coverage probability in terms of the stopping rule itself, nor are any parameters fitted to the test data and then relabeled as predictions. The validity argument rests on the marginal coverage property of conformal prediction under the maintained exchangeability assumption, which is stated as an assumption rather than derived from the method. Empirical results on external QA benchmarks are reported separately from the guarantee. No self-citations are used to justify uniqueness or to import an ansatz. The derivation is therefore self-contained and does not collapse to a tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the standard conformal prediction coverage guarantee being preserved when error budgets are allocated across turns in an adaptive process; no free parameters or new entities are described in the abstract.

axioms (1)
  • standard math: Conformal prediction supplies valid marginal coverage under exchangeability of the data and model outputs
    Invoked implicitly when extending single-turn CP to the multi-turn adaptive setting.
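
Stated explicitly, for a conformal set C built from n exchangeable calibration pairs and an exchangeable test point, the invoked property is the standard marginal guarantee

    \[
    \Pr\bigl(y_{n+1} \in C(x_{n+1})\bigr) \;\ge\; 1 - \alpha ,
    \]

which MiCP is read here as applying at each turn at level α_t rather than once at level α.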

pith-pipeline@v0.9.0 · 5541 in / 1292 out tokens · 38316 ms · 2026-05-13T22:17:56.732847+00:00 · methodology

