pith. sign in

hub

Mlagentbench: Evaluating language agents on ma- chine learning experimentation

18 Pith papers cite this work. Polarity classification is still indexing.

18 Pith papers citing it

hub tools

citation-role summary

background 4

citation-polarity summary

roles

background 4

polarities

background 4

representative citing papers

Neurodata Without Boredom: Benchmarking Agentic AI for Data Reuse

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

AI agents handle individual data-loading and reformatting steps on neuroscience datasets but rarely complete fully error-free end-to-end pipelines, and AI judges are unreliable without ground-truth references.

How Far Are We From True Auto-Research?

cs.AI · 2026-05-18 · unverdicted · novelty 6.0

ResearchArena shows that agent-generated papers fail top-tier acceptance standards primarily due to fabricated results, underpowered experiments, and plan-execution mismatches that vary sharply by agent.

Pioneer Agent: Continual Improvement of Small Language Models in Production

cs.AI · 2026-04-10 · unverdicted · novelty 6.0

Pioneer Agent automates the full lifecycle of adapting and continually improving small language models via diagnosis-driven data synthesis and regression-constrained retraining, delivering gains of 1.6-83.8 points on benchmarks and large lifts in production-style tasks.

Can We Predict Before Executing Machine Learning Agents?

cs.CL · 2026-01-09 · unverdicted · novelty 6.0

LLMs primed with verified data reports predict agent solution quality at 61.5% accuracy, powering a Predict-then-Verify agent that converges 6x faster than execution-only baselines.

citing papers explorer

Showing 18 of 18 citing papers.