pith. sign in

hub Canonical reference

Gpt-4 technical report

Canonical reference. 71% of citing Pith papers cite this work as background.

51 Pith papers citing it
Background 71% of classified citations

hub tools

citation-role summary

background 19 baseline 4 method 1

citation-polarity summary

clear filters

representative citing papers

Mind2Web: Towards a Generalist Agent for the Web

cs.CL · 2023-06-09 · accept · novelty 8.0

Mind2Web is the first large-scale dataset of real-world web tasks for developing generalist language-guided agents that complete complex actions on diverse websites.

GAIA: a benchmark for General AI Assistants

cs.CL · 2023-11-21 · unverdicted · novelty 7.0

GAIA benchmark shows humans at 92% accuracy on simple real-world questions far outperform current AI systems at 15%, proposing this gap as a key milestone for general AI.

Detecting Pretraining Data from Large Language Models

cs.CL · 2023-10-25 · conditional · novelty 7.0

Min-K% Prob detects pretraining data in LLMs by flagging outlier low-probability words in text, achieving 7.4% better performance than prior methods on the new WIKIMIA benchmark.

Learning Interactive Real-World Simulators

cs.AI · 2023-10-09 · conditional · novelty 7.0

UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.

LIMA: Less Is More for Alignment

cs.CL · 2023-05-18 · conditional · novelty 7.0

Fine-tuning a 65B model on 1,000 high-quality examples produces output that humans rate as good as or better than GPT-4 in 43% of cases, indicating most capabilities come from pretraining.

EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems

cs.AI · 2026-05-09 · unverdicted · novelty 6.0

EvoMAS trains a workflow adapter with policy gradients to dynamically instantiate stage-specific multi-agent workflows from a fixed agent pool, using explicit task-state construction and terminal success signals, and outperforms static baselines on GAIA, HLE, and DeepResearcher.

DeepSeek-OCR: Contexts Optical Compression

cs.CV · 2025-10-21 · unverdicted · novelty 6.0

DeepSeek-OCR compresses text contexts up to 20x via 2D optical mapping while achieving 97% OCR accuracy below 10x and 60% at 20x, outperforming prior OCR tools with fewer vision tokens.

ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution

cs.CL · 2025-09-17 · unverdicted · novelty 6.0

ShinkaEvolve improves sample efficiency in LLM-driven program evolution via parent sampling, code novelty rejection-sampling, and bandit LLM ensemble selection, achieving new SOTA circle packing with 150 samples and gains on math reasoning and competitive programming tasks.

citing papers explorer

Showing 1 of 1 citing paper after filters.