DiscoverPhysics is a new benchmark with 22 on-demand N-body simulated worlds where LLM agents design experiments to infer non-standard physics, evaluated via held-out trajectory MSE and LLM-judged explanation quality.
Anoop Cherian, Radu Corcodel, Siddarth Jain, and Diego Romeres
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
HExA is a training-free agent framework that improves LLM performance on novel physics tasks from 2% to 77% by iteratively designing experiments and composing learned skills.
citing papers explorer
- DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking
- Hierarchical Experimentalist Agents