Pith Number

pith:VS7FSX64

pith:2026:VS7FSX64WHQIURZFLXCWM6ZMSF

Status: not attested · not anchored · not stored · refs resolved

Neural Activation Patterns Across Language Model Architectures: A Comprehensive Analysis of Cognitive Task Performance

Faezeh Ghaderi, Mahdi Naser-Moghadasi

Mathematical reasoning produces the highest attention entropy across language model architectures.

arxiv:2605.15436 v1 · 2026-05-14 · cs.CL · cs.LG


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{VS7FSX64WHQIURZFLXCWM6ZMSF}

The `pith` package prints a linked badge after your title and injects PDF metadata. It compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
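The merge algorithm itself is not specified on this page. To illustrate why a deterministic merge lets independent mirrors agree, here is a hypothetical sketch: every event carries a sortable key, events are applied in key order regardless of the order they arrived in, and later events overwrite earlier ones (last-writer-wins). The `Event` fields, the tie-breaking rule, and `merge` are assumptions for illustration, not Pith's actual schema or algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape, not Pith's schema.
    timestamp: str  # ISO 8601 UTC in one fixed format, so string order == time order
    event_id: str   # tie-breaker: equal timestamps still get a total order
    field: str
    value: str


def merge(canonical: dict, events: list) -> dict:
    """Fold events onto the canonical record in a fixed total order.

    Signature verification is omitted in this sketch.
    """
    state = dict(canonical)  # never mutate the canonical record itself
    for ev in sorted(events, key=lambda e: (e.timestamp, e.event_id)):
        state[ev.field] = ev.value  # last writer wins
    return state


events = [
    Event("2026-05-21T00:00:00Z", "b", "status", "open"),
    Event("2026-05-20T00:00:00Z", "a", "status", "closed"),
    Event("2026-05-21T00:00:00Z", "a", "note", "claimed"),
]
base = {"id": "2605.15436"}
# Two mirrors receiving the same events in different orders reach the same state.
assert merge(base, events) == merge(base, list(reversed(events)))
print(merge(base, events))  # {'id': '2605.15436', 'status': 'open', 'note': 'claimed'}
```

A real merge would also verify each event's signature before applying it and would likely resolve conflicts more carefully than last-writer-wins; the point of the sketch is only that sorting by a total order makes the result independent of arrival order.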

Claims

C1 · strongest claim

Our analysis of 144 task-model combinations demonstrates that mathematical reasoning consistently produces the highest attention entropy across all architectures, while decoder models exhibit significantly higher sparsity patterns compared to encoder models.

C2 · weakest assumption

The twelve cognitive task categories and the chosen measurement definitions (final activation values, attention entropy, sparsity) are assumed to capture meaningful and comparable computational differences without substantial confounding from task formulation or model-specific tokenization effects.

C3 · one-line summary

Analysis of 144 task-model pairs finds mathematical reasoning produces the highest attention entropy in all architectures while decoder models show significantly higher sparsity than encoders.
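The claims rest on attention entropy and sparsity, which this record names but does not define. As a minimal sketch, assuming the common definitions (Shannon entropy of each query's softmax-normalized attention row, averaged over queries; sparsity as the fraction of values below a magnitude threshold), the two quantities can be computed as follows. The function names, the natural-log base, and the 1e-3 threshold are hypothetical choices, not taken from the paper.

```python
import math


def attention_entropy(row, eps=1e-12):
    """Shannon entropy (in nats) of one query's attention distribution.

    `row` is assumed to be softmax output: non-negative and summing to 1.
    Entries at or below `eps` are skipped, treating 0 * log 0 as 0.
    """
    return sum(-p * math.log(p) for p in row if p > eps)


def mean_attention_entropy(attn):
    """Average entropy over all query rows of one attention matrix."""
    return sum(attention_entropy(row) for row in attn) / len(attn)


def sparsity(values, threshold=1e-3):
    """Fraction of entries whose magnitude falls below a (hypothetical) cutoff."""
    return sum(1 for v in values if abs(v) < threshold) / len(values)


uniform = [0.25, 0.25, 0.25, 0.25]  # attention spread evenly over 4 tokens
peaked = [1.0, 0.0, 0.0, 0.0]       # all attention on a single token
print(attention_entropy(uniform))   # ln 4, about 1.386
print(attention_entropy(peaked))    # 0.0
print(mean_attention_entropy([uniform, peaked]))  # about 0.693
print(sparsity([0.0, 0.5, 0.0005, -0.2]))         # 0.5
```

Under these definitions a uniform row over n tokens reaches the maximum entropy ln n and a one-hot row scores 0, so "highest attention entropy" would mean attention spread most evenly across tokens.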

References

50 extracted · 50 resolved · 10 Pith anchors

[1] “Llama 2: Open Foundation and Fine-Tuned Chat Models,” arXiv preprint arXiv:2307.09288, 2023.

[2] A. Q. Jiang et al., “Mistral 7B,” arXiv preprint arXiv:2310.06825, 2023.

[3] J. Devlin, M. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of NAACL-HLT, 2019, pp. 4171–4186.

[4] A. Radford et al., “Language models are unsupervised multitask learners,” OpenAI blog, 2019.

[5] “Qwen Technical Report,” arXiv preprint arXiv:2309.16609, 2023.

Receipt and verification

First computed	2026-05-20T00:00:58.510920Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

acbe595fdcb1e08a47255dc5667b2c914dacfcd6cb65d6dd1705b2e6f51d185a

Aliases

arxiv: 2605.15436 · arxiv_version: 2605.15436v1 · doi: 10.48550/arxiv.2605.15436 · pith_short_12: VS7FSX64WHQI · pith_short_16: VS7FSX64WHQIURZF · pith_short_8: VS7FSX64
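The identifiers on this page fit a pattern that the page itself does not document: the full Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash's raw bytes, and each `pith_short_N` alias is the first N characters of the full number. The sketch below checks that inferred relationship against the values shown; `pith_from_hash` is a hypothetical helper, and the relationship should be read as an observation about this record rather than a specification.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "acbe595fdcb1e08a47255dc5667b2c914dacfcd6cb65d6dd1705b2e6f51d185a"
PITH_NUMBER = "VS7FSX64WHQIURZFLXCWM6ZMSF"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Inferred, not documented: Base32 of the raw digest bytes, truncated."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


full = pith_from_hash(CANONICAL_HASH)
# Short aliases as prefixes of the full number.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}
print(full == PITH_NUMBER)  # True
print(aliases)
```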

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/VS7FSX64WHQIURZFLXCWM6ZMSF \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: acbe595fdcb1e08a47255dc5667b2c914dacfcd6cb65d6dd1705b2e6f51d185a
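The Python step in this pipeline can be lifted into a reusable function, for example to check a record you have already downloaded. This is a minimal sketch that assumes the canonical form is exactly what the one-liner produces (sorted keys, compact `,` and `:` separators, non-ASCII characters kept literally and encoded as UTF-8); `canonical_sha256` is a hypothetical helper name.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """SHA-256 hex digest of a record serialized the same way as the one-liner."""
    payload = json.dumps(
        record,
        sort_keys=True,         # key order, including nested keys, must not affect the hash
        separators=(",", ":"),  # no insignificant whitespace
        ensure_ascii=False,     # keep non-ASCII characters as-is...
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()  # ...then encode as UTF-8


a = {"source": {"kind": "arxiv", "id": "2605.15436"}, "schema_version": "1.0"}
b = {"schema_version": "1.0", "source": {"id": "2605.15436", "kind": "arxiv"}}
assert canonical_sha256(a) == canonical_sha256(b)  # same content, different key order
print(canonical_sha256(a))
```

Passing the parsed `canonical_record` object from the resolver JSON to this function should reproduce the `# expect:` value, provided the assumption about the canonical form holds.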

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "323ee04d9b2ad9d88a7635b939cf925e81804fdeac22020879fc7707e7109867",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-14T21:31:19Z",
    "title_canon_sha256": "45db951a6c79865d598c4d8bd1179df722501ed8c78f2d4a3dec3ec011210a1c"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15436",
    "kind": "arxiv",
    "version": 1
  }
}