DiscoveryWorld: A virtual environment for developing and evaluating automated scientific discovery agents

Peter Jansen, Marc-Alexandre Côté, Tushar Khot, Erin Bransom, Bhavana Dalvi Mishra, Bodhisattwa Prasad Majumder, Oyvind Tafjord, Peter Clark · 2024 · DOI 10.52202/079017-0324

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

AutoMedBench: Towards Medical AutoResearch with Agentic AI Models

cs.AI · 2026-06-01 · conditional · novelty 7.0

AutoMedBench evaluates AI agents on long-horizon medical workflows across five stages and finds that validation and submission are the dominant failure points, based on thousands of runs.

Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation

cs.CL · 2026-04-30 · conditional · novelty 7.0

LLMs in multi-turn ideation reliably increase structural complexity while violating original constraints despite preserved declarative recall, with KBV rates ranging 8-99% across models.

citing papers explorer

Showing 2 of 2 citing papers.

AutoMedBench: Towards Medical AutoResearch with Agentic AI Models · cs.AI · 2026-06-01 · conditional · none · ref 31
Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation · cs.CL · 2026-04-30 · conditional · none · ref 2
