

Arc prize 2024: Technical report

Canonical reference. 100% of citing Pith papers cite this work as background.

25 Pith papers cite this work; background accounts for 100% of classified citations.


citation-role summary: background (5)

citation-polarity summary: background (5)

years: 2026 (19), 2025 (6)


representative citing papers

Knowledge Index of Noah's Ark

cs.AI · 2026-06-03 · unverdicted · novelty 7.0

Introduces the KINA benchmark, with 899 items spanning 261 disciplines, a formal (1-1/e) coverage guarantee, and a bonus-on-bar tournament theorem, plus evaluations of 42 models (top score 53.17%).

Modality-Driven Search with Holistic Trace Judging for ARC-AGI-2

cs.AI · 2026-06-30 · unverdicted · novelty 6.0

A modality-driven search system with holistic trace judging for ARC-AGI-2 reaches 72.9% on the semi-private set and 76.1% on the public set, outperforming GPT-5.2 Pro and Gemini 3 Pro by 18.7 points while releasing full code.

Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models

cs.AI · 2025-10-09 · unverdicted · novelty 6.0

Introduces a group matching score for better evaluation of compositional reasoning and a Test-Time Matching (TTM) algorithm for unsupervised self-improvement in multimodal models, achieving SOTA gains that include surpassing GPT-4.1 and estimated human performance.

Language-Guided Abstraction for Visual Reasoning

cs.CV · 2026-06-11 · unverdicted · novelty 5.0

L-VARC is a learning-using-privileged-information (LUPI) framework that refines crowd-sourced language descriptions with an LLM and uses cross-attention to guide visual ARC models during training only, yielding SOTA results with a lightweight 18M-parameter network.

Hierarchical Reasoning Model

cs.AI · 2025-06-26 · unverdicted · novelty 5.0

HRM is a recurrent architecture with high-level planning and low-level execution modules that reaches near-perfect accuracy on complex Sudoku, maze navigation, and ARC benchmarks using 27M parameters and 1000 samples without pre-training or CoT supervision.

Humanity's Last Exam

cs.LG · 2025-01-24 · unverdicted · novelty 5.0

Humanity's Last Exam is a new 2,500-question benchmark at the frontier of human knowledge where state-of-the-art LLMs show low accuracy.

A Compositional Framework for Open-ended Intelligence

cs.LG · 2026-06-13 · unverdicted · novelty 4.0

Open-ended intelligence is formalized as the compositional closure L(P,C) of primitives P under operators C, with next primitive prediction proposed as an objective to acquire reusable primitives and grammar for lifelong adaptation.

Customizing an LLM for Enterprise Software Engineering

cs.SE · 2026-05-15 · unverdicted · novelty 4.0 · 2 refs

Gemini for Google, customized via continued pre-training on proprietary Google engineering data, delivers measurable productivity gains in a large internal developer study.

Measuring AI Reasoning: A Guide for Researchers

cs.AI · 2026-05-04 · unverdicted · novelty 4.0

Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.
