{"state_type":"pith_open_graph_state","state_version":"1.0","pith_number":"pith:2026:TWV2C5X66W4PZLZE2BGYUL7XZ3","merge_version":"pith-open-graph-merge-v1","event_count":2,"valid_event_count":2,"invalid_event_count":0,"equivocation_count":0,"current":{"canonical_record":{"metadata":{"abstract_canon_sha256":"55d280e5ff2c9ff860bb26d96a76c6a5e20046c66f90111525e574df15e7e675","cross_cats_sorted":[],"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.AI","submitted_at":"2026-04-14T01:07:15Z","title_canon_sha256":"a851621559ef56051cf7edbbffbc879b3072759f2dd251310457fead204c9472"},"schema_version":"1.0","source":{"id":"2604.12176","kind":"arxiv","version":2}},"source_aliases":[{"alias_kind":"arxiv","alias_value":"2604.12176","created_at":"2026-06-03T01:05:13Z"},{"alias_kind":"arxiv_version","alias_value":"2604.12176v2","created_at":"2026-06-03T01:05:13Z"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2604.12176","created_at":"2026-06-03T01:05:13Z"},{"alias_kind":"pith_short_12","alias_value":"TWV2C5X66W4P","created_at":"2026-06-03T01:05:13Z"},{"alias_kind":"pith_short_16","alias_value":"TWV2C5X66W4PZLZE","created_at":"2026-06-03T01:05:13Z"},{"alias_kind":"pith_short_8","alias_value":"TWV2C5X6","created_at":"2026-06-03T01:05:13Z"}],"graph_snapshots":[{"event_id":"sha256:d0753b134338111bae06d4495d45c82657724262e78fb93ba3d3bfcb46b2fe6e","target":"graph","created_at":"2026-06-03T01:05:13Z","signer":{"key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","signer_id":"pith.science","signer_type":"pith_registry"},"payload":{"graph_snapshot":{"author_claims":{"count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","strong_count":0},"builder_version":"pith-number-builder-2026-05-17-v1","claims":{"count":4,"items":[{"attestation":"unclaimed","claim_id":"C1","kind":"strongest_claim","source":"verdict.strongest_claim","status":"machine_extracted","text":"Across frontier LLMs, performance degrades consistently and monotonically as RC increases, even when the total number of entities is held fixed. This failure mode persists with increased test-time compute and in-context learning, suggesting a limitation tied to the arity of the required relational binding rather than to insufficient inference steps or lack of exposure to examples."},{"attestation":"unclaimed","claim_id":"C2","kind":"weakest_assumption","source":"verdict.weakest_assumption","status":"machine_extracted","text":"That the generative tasks in REL truly isolate relational complexity (arity of binding) without introducing uncontrolled confounders in input structure, vocabulary, or task framing that could explain the performance drop instead."},{"attestation":"unclaimed","claim_id":"C3","kind":"one_line_summary","source":"verdict.one_line_summary","status":"machine_extracted","text":"LLMs show consistent performance degradation on higher-arity relational reasoning tasks in a new benchmark REL that isolates relational complexity across scientific domains."},{"attestation":"unclaimed","claim_id":"C4","kind":"headline","source":"verdict.pith_extraction.headline","status":"machine_extracted","text":"Frontier LLMs show steady performance drops on relational tasks as the number of entities that must bind together increases, even with fixed total entities and extra compute."}],"snapshot_sha256":"0a7456ae9206a6eb9b751cdc374c638c1026cc074397fa786be4bf2888b708d6"},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"integrity":{"available":true,"clean":true,"detectors_run":[],"endpoint":"/pith/2604.12176/integrity.json","findings":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938","summary":{"advisory":0,"by_detector":{},"critical":0,"informational":0}},"paper":{"abstract_excerpt":"Relational reasoning is the ability to infer relations that jointly bind multiple entities, attributes, or variables. This ability is central to scientific reasoning, but existing evaluations of relational reasoning in large language models often focus on structured inputs such as tables, graphs, or synthetic tasks, and do not isolate the difficulty introduced by higher-arity relational binding. We study this problem through the lens of Relational Complexity (RC), which we define as the minimum number of independent entities or operands that must be simultaneously bound to apply a relation. RC","authors_text":"Ada Fang, Lukas Fesser, Marinka Zitnik, Sham M. Kakade, Yasha Ektefaie","cross_cats":[],"headline":"Frontier LLMs show steady performance drops on relational tasks as the number of entities that must bind together increases, even with fixed total entities and extra compute.","license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.AI","submitted_at":"2026-04-14T01:07:15Z","title":"Evaluating Relational Reasoning in LLMs with REL"},"references":{"count":0,"internal_anchors":0,"resolved_work":0,"sample":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"2604.12176","kind":"arxiv","version":2},"verdict":{"created_at":"2026-05-10T16:21:12.936530Z","id":"6af3fe3b-af5a-4716-874f-9125dd78251c","model_set":{"reader":"grok-4.3"},"one_line_summary":"LLMs show consistent performance degradation on higher-arity relational reasoning tasks in a new benchmark REL that isolates relational complexity across scientific domains.","pipeline_version":"pith-pipeline@v0.9.0","pith_extraction_headline":"Frontier LLMs show steady performance drops on relational tasks as the number of entities that must bind together increases, even with fixed total entities and extra compute.","strongest_claim":"Across frontier LLMs, performance degrades consistently and monotonically as RC increases, even when the total number of entities is held fixed. This failure mode persists with increased test-time compute and in-context learning, suggesting a limitation tied to the arity of the required relational binding rather than to insufficient inference steps or lack of exposure to examples.","weakest_assumption":"That the generative tasks in REL truly isolate relational complexity (arity of binding) without introducing uncontrolled confounders in input structure, vocabulary, or task framing that could explain the performance drop instead."}},"verdict_id":"6af3fe3b-af5a-4716-874f-9125dd78251c"}}],"author_attestations":[],"timestamp_anchors":[],"storage_attestations":[],"citation_signatures":[],"replication_records":[],"corrections":[],"mirror_hints":[],"record_created":{"event_id":"sha256:09f2cde581289dad9f3d7bd861abe10535d9f96698efc4765d1aa47bea69278d","target":"record","created_at":"2026-06-03T01:05:13Z","signer":{"key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","signer_id":"pith.science","signer_type":"pith_registry"},"payload":{"attestation_state":"computed","canonical_record":{"metadata":{"abstract_canon_sha256":"55d280e5ff2c9ff860bb26d96a76c6a5e20046c66f90111525e574df15e7e675","cross_cats_sorted":[],"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.AI","submitted_at":"2026-04-14T01:07:15Z","title_canon_sha256":"a851621559ef56051cf7edbbffbc879b3072759f2dd251310457fead204c9472"},"schema_version":"1.0","source":{"id":"2604.12176","kind":"arxiv","version":2}},"canonical_sha256":"9daba176fef5b8fcaf24d04d8a2ff7cef68d180e20b89c23839d79d1829eb18e","receipt":{"algorithm":"ed25519","builder_version":"pith-number-builder-2026-05-17-v1","canonical_sha256":"9daba176fef5b8fcaf24d04d8a2ff7cef68d180e20b89c23839d79d1829eb18e","first_computed_at":"2026-06-03T01:05:13.536512Z","key_id":"pith-v1-2026-05","kind":"pith_receipt","last_reissued_at":"2026-06-03T01:05:13.536512Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","receipt_version":"0.3","signature_b64":"xaJZTlrMzV/uPRHI/c4tQISHawAtJbf9D3Ipe+ETApNNyvni3AmQLuiKY7n3GzlPW27tfK49MMy1qa83YEJyDw==","signature_status":"signed_v1","signed_at":"2026-06-03T01:05:13.537113Z","signed_message":"canonical_sha256_bytes"},"source_id":"2604.12176","source_kind":"arxiv","source_version":2}}},"equivocations":[],"invalid_events":[],"applied_event_ids":["sha256:09f2cde581289dad9f3d7bd861abe10535d9f96698efc4765d1aa47bea69278d","sha256:d0753b134338111bae06d4495d45c82657724262e78fb93ba3d3bfcb46b2fe6e"],"state_sha256":"5104836a3531fb2e35c994c79732ab672a54dab6bb1038c1ac030812ec611723"}