DataSciBench: An LLM Agent Benchmark for Data Science

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1, dataset 1

citation-polarity summary

years: 2026 (5), 2025 (3)

polarities: background 2

representative citing papers

AgenticDataBench: A Comprehensive Benchmark for Data Agents

cs.DB · 2026-07-02 · unverdicted · novelty 5.0

AgenticDataBench is a new benchmark covering realistic data science tasks across 15 domains using extracted skills and LLM-generated workflows to evaluate data agents at fine granularity.

Business Utility of Large Language Models as Exploratory Data Analysis Agents

cs.CY · 2026-05-08 · unverdicted · novelty 5.0

An evaluation of 15 LLM configurations across four conditions in a supply chain EDA benchmark finds that most lack sufficient repeatability for autonomous deployment. GPT-5.4 at extra-high reasoning effort achieves the highest mean score (0.8748) and the highest score on the proposed Business utility metric (0.6952).
