arxiv: 2604.13258 · v1 · submitted 2026-04-14 · 💻 cs.CL · cs.AI

Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs

Maisha Maliha, Nathaniel D. Bastian, Sumit Kumar Jha, Vishal Pramanik

Pith reviewed 2026-05-10 15:19 UTC · model grok-4.3

keywords: token attribution · autoregressive LLMs · model interpretability · Hessian sensitivity · KL divergence · decoder-only models · attribution faithfulness · generative benchmarks

The pith

HETA improves token attributions for autoregressive language models by combining semantic transition vectors, Hessian sensitivities, and KL divergence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces HETA to explain which input tokens shape the outputs of decoder-only language models. Most existing attribution methods were designed for encoder architectures and rely on linear approximations, so they miss the sequential causal dynamics of autoregressive generation. HETA adds a semantic transition vector to track token influence across layers, Hessian-based scores to capture second-order sensitivity effects, and KL divergence to quantify information loss from masking tokens. Across multiple models and datasets, the authors report higher faithfulness scores and stronger agreement with human annotations than prior techniques. This matters because clearer explanations can help users understand and verify model behavior in text generation tasks.
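The masking-based KL term can be illustrated with a minimal sketch. It assumes a stand-in predictor, `toy_next_token_probs`, which is hypothetical and only mimics a model whose next-token distribution depends on a few context words; the mask token string and the three-way vocabulary are also illustrative choices, not details from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def toy_next_token_probs(tokens):
    """Hypothetical stand-in for a language model (not the paper's model)."""
    logits = [2.0 * tokens.count("sunny"), 2.0 * tokens.count("rainy"), 0.5]
    return softmax(logits)

def masking_impact(predict, tokens, mask_token="[MASK]"):
    """KL between the unmasked prediction and the prediction with each token masked."""
    base = predict(tokens)
    scores = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [mask_token] + tokens[i + 1:]
        scores.append(kl_divergence(base, predict(masked)))
    return scores

tokens = ["the", "sky", "is", "sunny"]
impact = masking_impact(toy_next_token_probs, tokens)
```

In this toy setting only "sunny" changes the prediction when masked, so it receives the only non-zero score. A real model would need one forward pass per masked token, which is part of the cost question raised later in the review.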

Core claim

HETA is a unified attribution framework for decoder-only language models that integrates a semantic transition vector capturing token-to-token influence across layers, Hessian-based sensitivity scores modeling second-order effects, and KL divergence measuring information loss when tokens are masked. This produces context-aware, causally faithful, and semantically grounded attributions. Empirical tests across multiple models and datasets show consistent outperformance over existing methods on faithfulness metrics and alignment with human annotations, while also introducing a curated benchmark dataset for generative attribution evaluation.

What carries the argument

The HETA framework, which unifies a semantic transition vector, Hessian-based sensitivity scores, and KL divergence measurements to quantify each token's contribution during autoregressive generation.
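For intuition on how token influence can be traced across layers, here is a sketch of plain attention rollout, a well-known technique in which per-layer attention matrices are mixed with the identity (to account for residual connections) and multiplied together. This is a swapped-in illustration, not the paper's semantic transition vector: the source describes "attention–value flows", whereas this sketch ignores value vectors entirely. The two causal attention matrices are made-up toy inputs.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def attention_rollout(layers, residual_weight=0.5):
    """Compose head-averaged attention matrices, first layer first.

    Each layer is mixed with the identity so that the residual stream
    keeps part of every token's own representation.
    """
    n = len(layers[0])
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in layers:
        mixed = [[(1.0 - residual_weight) * attn[i][j]
                  + (residual_weight if i == j else 0.0)
                  for j in range(n)] for i in range(n)]
        result = matmul(mixed, result)
    return result

# Toy causal (lower-triangular) attention for a 3-token sequence, 2 layers.
layer1 = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.2, 0.3, 0.5]]
layer2 = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.6, 0.3, 0.1]]

rollout = attention_rollout([layer1, layer2])
gate = rollout[-1]  # influence of each input token on the final position
```

Because every mixed matrix is row-stochastic and lower-triangular, the last row is a probability-like weighting over earlier tokens and no position receives influence from the future.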

If this is right

Attributions produced by HETA align more closely with human judgments on generated text than prior methods.
The framework generalizes across multiple decoder-only models and evaluation datasets.
A new benchmark dataset enables systematic comparison of attribution quality in generative settings.
HETA addresses shortcomings of encoder-focused linear techniques for causal autoregressive processes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

HETA attributions could help trace which tokens trigger specific outputs such as factual errors or biased responses.
The Hessian component might extend to second-order analysis in other neural network interpretability tasks.
The introduced benchmark could become a reference standard for testing future attribution methods on generative models.

Load-bearing premise

That the combination of semantic transition vectors, Hessian sensitivities, and KL divergence captures the causal and semantic complexities of autoregressive generation more effectively than linear approximations.

What would settle it

Direct head-to-head tests on the paper's benchmark dataset where HETA fails to exceed baseline methods on faithfulness metrics or human annotation agreement would falsify the central claim.
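A head-to-head faithfulness comparison of this kind is often run with a deletion curve: tokens are masked in descending order of attribution, and a faithful method should make the model's score drop quickly. The sketch below assumes a hypothetical scorer, `toy_score`, and two made-up attribution vectors; none of the numbers come from the paper or its benchmark.

```python
def deletion_curve(score_fn, tokens, attributions, mask_token="[MASK]"):
    """Model score after masking tokens one by one, most-attributed first."""
    order = sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)
    current = list(tokens)
    curve = [score_fn(current)]
    for i in order:
        current[i] = mask_token
        curve.append(score_fn(current))
    return curve

def area_under_curve(curve):
    """Trapezoidal area with unit spacing, normalized by the number of steps."""
    steps = len(curve) - 1
    return sum((curve[k] + curve[k + 1]) / 2.0 for k in range(steps)) / steps

def toy_score(tokens):
    """Hypothetical target-token probability driven by two context words."""
    weights = {"capital": 0.3, "France": 0.6}
    return 0.1 + sum(weights.get(t, 0.0) for t in tokens)

tokens = ["the", "capital", "of", "France", "is"]
aligned = [0.0, 0.3, 0.0, 0.6, 0.0]     # illustrative: ranks the right words first
misaligned = [0.6, 0.0, 0.3, 0.0, 0.0]  # illustrative: ranks filler words first

auc_aligned = area_under_curve(deletion_curve(toy_score, tokens, aligned))
auc_misaligned = area_under_curve(deletion_curve(toy_score, tokens, misaligned))
```

Lower area is better for deletion. If HETA's curve did not sit below the baselines' curves on the benchmark, that would count against the central claim.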

Figures

Figures reproduced from arXiv: 2604.13258 by Maisha Maliha, Nathaniel D. Bastian, Sumit Kumar Jha, Vishal Pramanik.

**Figure 1.** Overview of HETA. The pipeline (a) rolls out attention–value flows that end at the target token to form a causal gate over input tokens, (b) estimates token-level curvature via scalable Hessian–vector products to capture nonlinear interactions, and (c) measures KL-based information impact under token masking. The final attribution combines causal gating, curvature sensitivity, and information gain to produ…

**Figure 2.** (a)-(c) Analysis of HETA components. Each bar plot shows the effect of ablating key components of HETA. The full HETA model achieves the highest attribution faithfulness and alignment across all metrics, while removing individual components consistently degrades performance. (d) Input importance distributions for a generative task using our proposed HETA method.

**Figure 3.** (a)-(c) Analysis of robustness of HETA vs baseline methods. (Left) Sensitivity under Gaussian perturbations (lower is better), where HETA maintains the lowest variance across input noise. (Center) Active/Passive robustness (higher is better), reflecting attribution consistency across syntactic rephrasings. (Right) Alignment F1 score against annotated tokens (higher is better). HETA outperforms all baselines…

**Figure 4.** Word-level attribution visualization for predicting the final word “slice.” Each word is …

**Figure 5.** Word-level attribution visualization for predicting the final word “friends.” Bounding boxes …

**Figure 6.** Word-level attribution visualization for predicting the final word “bush.” Attribution scores …

Original abstract

Attribution methods seek to explain language model predictions by quantifying the contribution of input tokens to generated outputs. However, most existing techniques are designed for encoder-based architectures and rely on linear approximations that fail to capture the causal and semantic complexities of autoregressive generation in decoder-only models. To address these limitations, we propose Hessian-Enhanced Token Attribution (HETA), a novel attribution framework tailored for decoder-only language models. HETA combines three complementary components: a semantic transition vector that captures token-to-token influence across layers, Hessian-based sensitivity scores that model second-order effects, and KL divergence to measure information loss when tokens are masked. This unified design produces context-aware, causally faithful, and semantically grounded attributions. Additionally, we introduce a curated benchmark dataset for systematically evaluating attribution quality in generative settings. Empirical evaluations across multiple models and datasets demonstrate that HETA consistently outperforms existing methods in attribution faithfulness and alignment with human annotations, establishing a new standard for interpretability in autoregressive language models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

HETA combines Hessian sensitivity, semantic transition vectors, and KL divergence for token attribution in decoder-only LLMs, and adds a new benchmark. The abstract gives no numbers or compute costs, so the outperformance claim is hard to judge yet.

Hi, about the HETA paper. The main thing is a three-part attribution method built specifically for autoregressive decoder-only models: a semantic transition vector to track token influence across layers, Hessian-based scores to pick up second-order effects, and KL divergence to quantify what gets lost when a token is masked. They also release a benchmark dataset aimed at generative settings rather than just classification.

That focus on decoder-only architectures fills a real gap, since most older attribution work targets encoders and relies on linear approximations that miss the sequential causal structure here. The design looks like a reasonable way to get more context-aware and semantically grounded scores than plain gradients or attention alone. The abstract states that HETA beats existing methods on faithfulness and human alignment across several models and datasets. If the full experiments include proper baselines, ablations, and controls, that would be a useful incremental result for interpretability work.

The soft spots are the missing details. No quantitative results, error bars, or baseline names appear in the abstract, and there is no discussion of the computational cost of Hessian calculations on large models. It is also unclear exactly how the three components are weighted or combined, or whether the benchmark avoids common evaluation pitfalls like leaking future tokens. These are fixable, but they leave the central claim resting on unspecified empirical work.

This is for researchers who need better token-level explanations for debugging or safety auditing of current LLMs. A reader already working on attribution methods would find the tailored design and the benchmark worth looking at, even if the gains turn out modest. It deserves peer review because the problem is timely and the proposal is concrete enough to evaluate once the experiments are fully reported.

Referee Report

2 major / 2 minor

Summary. The paper proposes Hessian-Enhanced Token Attribution (HETA) for decoder-only autoregressive LLMs. HETA integrates a semantic transition vector capturing token-to-token influence across layers, Hessian-based second-order sensitivity scores, and KL divergence for information loss under masking. The method is evaluated on a newly curated benchmark dataset, with claims that it yields more context-aware, causally faithful attributions than prior linear-approximation techniques and aligns better with human annotations across multiple models and datasets.

Significance. If the empirical superiority holds under rigorous controls, HETA would address a clear gap in interpretability methods for generative decoder-only models, moving beyond encoder-centric linear approximations. The introduction of a dedicated generative benchmark is a positive contribution that could facilitate future standardized comparisons.

major comments (2)

The central empirical claim (consistent outperformance in faithfulness and human alignment) is load-bearing yet unsupported by any quantitative results, tables, error bars, or explicit baseline comparisons in the abstract or visible structure. The manuscript must supply these details (e.g., specific faithfulness metrics, statistical significance tests, and ablation results) in the experimental evaluation section to substantiate the claim that the three-component design outperforms existing methods.
No discussion of the computational cost of Hessian computation appears, despite its known expense for large models. This omission affects the practicality claim; the paper should quantify runtime/memory overhead relative to baselines (e.g., in the section on experiments or implementation details) and discuss approximations if used.
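To make the cost concern concrete: a Hessian-vector product never requires the full n × n Hessian, only gradient evaluations. The sketch below uses a central finite difference of gradients on the log-sum-exp function, whose gradient is the softmax and whose exact Hessian-vector product is known in closed form. This is an illustrative assumption, not the paper's implementation; autodiff frameworks typically compute the same product exactly with a second backward pass.

```python
import math

def softmax(x):
    """Numerically stable softmax; also the gradient of log-sum-exp."""
    m = max(x)
    exps = [math.exp(xi - m) for xi in x]
    total = sum(exps)
    return [e / total for e in exps]

def hvp_finite_difference(grad_fn, x, v, eps=1e-5):
    """Approximate H(x) @ v as (g(x + eps*v) - g(x - eps*v)) / (2*eps)."""
    x_plus = [xi + eps * vi for xi, vi in zip(x, v)]
    x_minus = [xi - eps * vi for xi, vi in zip(x, v)]
    g_plus, g_minus = grad_fn(x_plus), grad_fn(x_minus)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]

def hvp_exact_logsumexp(x, v):
    """Closed form: H = diag(p) - p p^T with p = softmax(x)."""
    p = softmax(x)
    pv = sum(pi * vi for pi, vi in zip(p, v))
    return [pi * (vi - pv) for pi, vi in zip(p, v)]

x = [0.2, -1.0, 1.5, 0.3]
v = [1.0, 0.0, -0.5, 2.0]

approx = hvp_finite_difference(softmax, x, v)
exact = hvp_exact_logsumexp(x, v)
```

Each product costs two gradient evaluations here, so the overhead scales with the number of probe vectors rather than with the square of the input dimension. Reporting that count alongside wall-clock time would answer most of this comment.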

minor comments (2)

Notation for the semantic transition vector and Hessian sensitivity scores should be defined explicitly with equations early in the method section to improve readability.
The abstract asserts 'new standard' status; this phrasing should be tempered to 'promising results' pending peer validation and broader replication.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. We address the two major comments point by point below and will incorporate revisions to improve the clarity and completeness of the empirical presentation and practical discussion.

Referee: The central empirical claim (consistent outperformance in faithfulness and human alignment) is load-bearing yet unsupported by any quantitative results, tables, error bars, or explicit baseline comparisons in the abstract or visible structure. The manuscript must supply these details (e.g., specific faithfulness metrics, statistical significance tests, and ablation results) in the experimental evaluation section to substantiate the claim that the three-component design outperforms existing methods.

Authors: We appreciate the referee drawing attention to the need for explicit quantitative support. The experimental evaluation section contains the relevant results, including faithfulness metrics (insertion/deletion and human correlation scores), tables comparing HETA to baselines such as Integrated Gradients and attention rollout across models, error bars from repeated runs, and component ablations. To address visibility concerns and strengthen substantiation of the three-component design, we will revise the manuscript to add a consolidated summary table with statistical significance tests (e.g., paired t-tests) directly in the main experimental section.
Revision: yes
Referee: No discussion of the computational cost of Hessian computation appears, despite its known expense for large models. This omission affects the practicality claim; the paper should quantify runtime/memory overhead relative to baselines (e.g., in the section on experiments or implementation details) and discuss approximations if used.

Authors: We agree that computational overhead is an important practical consideration that was not addressed. In the revised manuscript we will add a dedicated paragraph (or short subsection) in the experimental or implementation details section that reports runtime and peak memory usage of the Hessian component relative to the baselines, along with any approximations (such as diagonal or layer-wise Hessian estimates) employed to ensure scalability.
Revision: yes
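One way a diagonal Hessian estimate can be obtained from Hessian-vector products alone is Hutchinson's estimator, which averages z ⊙ (Hz) over random sign vectors z. This is an assumed, illustrative choice: neither the abstract nor the response states which estimator the authors use. The 3 × 3 matrix `H` and the helper `toy_hvp` are hypothetical stand-ins for a model's curvature.

```python
import itertools
import random

def hutchinson_diagonal(hvp, dim, probes):
    """Estimate diag(H) as the mean of z * (H @ z) over probe vectors z."""
    total = [0.0] * dim
    count = 0
    for z in probes:
        hz = hvp(z)
        for i in range(dim):
            total[i] += z[i] * hz[i]
        count += 1
    return [t / count for t in total]

def rademacher_probes(dim, num, seed=0):
    """Random +/-1 probe vectors from a seeded generator."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(num)]

# Hypothetical symmetric curvature matrix, accessed only through products.
H = [[2.0, 0.5, 0.0],
     [0.5, 1.0, -0.3],
     [0.0, -0.3, 3.0]]

def toy_hvp(v):
    return [sum(H[i][j] * v[j] for j in range(3)) for i in range(3)]

# Averaging over every sign pattern cancels the off-diagonal terms exactly.
all_signs = [list(z) for z in itertools.product((-1.0, 1.0), repeat=3)]
exact_diag = hutchinson_diagonal(toy_hvp, 3, all_signs)

# In practice only a modest number of random probes is affordable.
approx_diag = hutchinson_diagonal(toy_hvp, 3, rademacher_probes(3, 200))
```

The number of probes is the knob that trades accuracy for runtime, so a cost table in the revision would ideally report it alongside memory and wall-clock figures.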

Circularity Check

0 steps flagged

No significant circularity identified

The paper proposes HETA by combining a semantic transition vector, Hessian-based sensitivity scores, and KL divergence to produce attributions for decoder-only models. No equations appear in the abstract or description that define any output quantity in terms of itself or reduce a claimed prediction to a fitted input by construction. The central claims rest on empirical outperformance across models and datasets plus a new benchmark, which are external to the method definition itself. No self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked to bear the load of the derivation. The approach therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the method extends standard attribution techniques (Hessian, KL divergence) without introducing new postulated objects or fitted constants visible at this level.
