{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2019:OPGYAPR56IPDRF2EMMZ7A63QA2","short_pith_number":"pith:OPGYAPR5","schema_version":"1.0","canonical_sha256":"73cd803e3df21e3897446333f07b70068294fe92c9ce665b8a05f0ab1626feef","source":{"kind":"arxiv","id":"1906.03731","version":1},"attestation_state":"computed","paper":{"title":"Is Attention Interpretable?","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Noah A. Smith, Sofia Serrano","submitted_at":"2019-06-09T22:46:12Z","abstract_excerpt":"Attention mechanisms have recently boosted performance on a range of NLP tasks. Because attention layers explicitly weight input components' representations, it is also often assumed that attention can be used to identify information that models found important (e.g., specific contextualized word tokens). We test whether that assumption holds by manipulating attention weights in already-trained text classification models and analyzing the resulting differences in their predictions. While we observe some ways in which higher attention weights correlate with greater impact on model predictions, "},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":false},"canonical_record":{"source":{"id":"1906.03731","kind":"arxiv","version":1},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CL","submitted_at":"2019-06-09T22:46:12Z","cross_cats_sorted":[],"title_canon_sha256":"29dde621e02b97367e63d0905714815e47937b9d40c278a0ab6f27957852ef13","abstract_canon_sha256":"d60fc0eb30fa6ac48114541bafda990e6638f20e945aa9d7592cbd2abcd260fc"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:43:45.126618Z","signature_b64":"DmnGsbXP+tst8mp+PMFZQeZ1oF6wHFbBIiKpSMStRZqEfUasYdqswpggNhZkObsoDrvdoHxTMauXaH0SIgmvAg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"73cd803e3df21e3897446333f07b70068294fe92c9ce665b8a05f0ab1626feef","last_reissued_at":"2026-05-17T23:43:45.125962Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:43:45.125962Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Is Attention Interpretable?","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Noah A. Smith, Sofia Serrano","submitted_at":"2019-06-09T22:46:12Z","abstract_excerpt":"Attention mechanisms have recently boosted performance on a range of NLP tasks. Because attention layers explicitly weight input components' representations, it is also often assumed that attention can be used to identify information that models found important (e.g., specific contextualized word tokens). We test whether that assumption holds by manipulating attention weights in already-trained text classification models and analyzing the resulting differences in their predictions. While we observe some ways in which higher attention weights correlate with greater impact on model predictions, "},"claims":{"count":0,"items":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"1906.03731","kind":"arxiv","version":1},"verdict":{"id":null,"model_set":{},"created_at":null,"strongest_claim":"","one_line_summary":"","pipeline_version":null,"weakest_assumption":"","pith_extraction_headline":""},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"1906.03731","created_at":"2026-05-17T23:43:45.126066+00:00"},{"alias_kind":"arxiv_version","alias_value":"1906.03731v1","created_at":"2026-05-17T23:43:45.126066+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.1906.03731","created_at":"2026-05-17T23:43:45.126066+00:00"},{"alias_kind":"pith_short_12","alias_value":"OPGYAPR56IPD","created_at":"2026-05-18T12:33:24.271573+00:00"},{"alias_kind":"pith_short_16","alias_value":"OPGYAPR56IPDRF2E","created_at":"2026-05-18T12:33:24.271573+00:00"},{"alias_kind":"pith_short_8","alias_value":"OPGYAPR5","created_at":"2026-05-18T12:33:24.271573+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":6,"internal_anchor_count":4,"sample":[{"citing_arxiv_id":"2503.16771","citing_title":"Enabling Global, Human-Centered Explanations for LLMs:From Tokens to Interpretable Code and Test Generation","ref_index":55,"is_internal_anchor":true},{"citing_arxiv_id":"2602.16608","citing_title":"Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2508.03793","citing_title":"AttnTrace: Contextual Attribution of Prompt Injection and Knowledge Corruption","ref_index":49,"is_internal_anchor":true},{"citing_arxiv_id":"2508.04427","citing_title":"Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models","ref_index":120,"is_internal_anchor":true},{"citing_arxiv_id":"2605.05668","citing_title":"Large Vision-Language Models Get Lost in Attention","ref_index":13,"is_internal_anchor":false},{"citing_arxiv_id":"2605.06032","citing_title":"Does Synthetic Data Help? Empirical Evidence from Deep Learning Time Series Forecasters","ref_index":298,"is_internal_anchor":false}]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/OPGYAPR56IPDRF2EMMZ7A63QA2","json":"https://pith.science/pith/OPGYAPR56IPDRF2EMMZ7A63QA2.json","graph_json":"https://pith.science/api/pith-number/OPGYAPR56IPDRF2EMMZ7A63QA2/graph.json","events_json":"https://pith.science/api/pith-number/OPGYAPR56IPDRF2EMMZ7A63QA2/events.json","paper":"https://pith.science/paper/OPGYAPR5"},"agent_actions":{"view_html":"https://pith.science/pith/OPGYAPR56IPDRF2EMMZ7A63QA2","download_json":"https://pith.science/pith/OPGYAPR56IPDRF2EMMZ7A63QA2.json","view_paper":"https://pith.science/paper/OPGYAPR5","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=1906.03731&json=true","fetch_graph":"https://pith.science/api/pith-number/OPGYAPR56IPDRF2EMMZ7A63QA2/graph.json","fetch_events":"https://pith.science/api/pith-number/OPGYAPR56IPDRF2EMMZ7A63QA2/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/OPGYAPR56IPDRF2EMMZ7A63QA2/action/timestamp_anchor","attest_storage":"https://pith.science/pith/OPGYAPR56IPDRF2EMMZ7A63QA2/action/storage_attestation","attest_author":"https://pith.science/pith/OPGYAPR56IPDRF2EMMZ7A63QA2/action/author_attestation","sign_citation":"https://pith.science/pith/OPGYAPR56IPDRF2EMMZ7A63QA2/action/citation_signature","submit_replication":"https://pith.science/pith/OPGYAPR56IPDRF2EMMZ7A63QA2/action/replication_record"}},"created_at":"2026-05-17T23:43:45.126066+00:00","updated_at":"2026-05-17T23:43:45.126066+00:00"}