PhySciBench benchmark shows current AI models achieve at most 33.5% accuracy on physical science tasks; DelveAgent framework improves accuracy by up to 7.5 points and cuts costs to one-third.
Szymanski, Bernardus Rendy, Yuxing Fei, Rishi E
15 Pith papers cite this work. Polarity classification is still indexing.
fields
cond-mat.mtrl-sci 3 cs.AI 3 physics.comp-ph 3 cs.CY 1 cs.DC 1 eess.SY 1 physics.app-ph 1 physics.ed-ph 1 physics.optics 1
years
2026 15
verdicts
UNVERDICTED 15
representative citing papers
LAP is a new protocol extending A2A and MCP with four physical-world primitives for agent-to-instrument interaction in autonomous laboratories.
LLM agents run closed-loop design of photonic components and a full modulator by proposing, simulating, and refining against acceptance criteria.
ColPackAgent integrates a custom colpack Python package wrapping HOOMD-blue with MCP tools and an agent skill to enable reliable autonomous workflows for colloidal packing simulations across interactive, prompt-driven, and autoresearch modes.
ShardTensor is a domain-parallelism system for SciML that enables flexible scaling of extreme-resolution spatial datasets by removing the constraint of batch size one per device.
A two-layer certification framework decouples knowledge validity from human authorship to accommodate AI-enabled research in existing publication systems.
QMP-Bench supplies a realistic test set for AI on quantum many-body problems while PhysVEC uses integrated verifiers to turn unreliable LLM generations into code that passes both syntax and physics checks, outperforming baselines.
The paper introduces Experiment-as-Code Labs as a declarative stack synthesizing AI agents, systems orchestration, and physical lab control for AI-driven discovery.
An affordable Arduino-based IoT setup generates real-time optical data for students to compare traversal, Bayesian, and deep learning methods in a self-driving experimental workflow.
A trust-region Bayesian optimization framework integrates LEED multiple scattering models to jointly optimize structural and experimental parameters for automated surface reconstruction.
AIMBio-Mat is a conceptual blueprint for an AI-native, FAIR, governance-aware decision layer that formulates biomedical-materials discovery as constrained multi-objective optimization under uncertainty.
Spectra-Scope is a new AutoML framework that trains interpretable machine learning models on spectral data to characterize material properties while enabling users to understand which spectral features drive the predictions.
Proposes a regional data-centric materials science ecosystem for the Great Plains, identifying five barriers to data sharing and outlining a staged roadmap illustrated by a high-purity germanium pilot.
Infrastructure is the primary obstacle to embodied AI for science in the Global South, and addressing it turns automation into essential capacity rather than a luxury.
A survey of generative crystal modeling, multimodal learning, and closed-loop inverse design pipelines for crystalline solids, including failure modes and evaluation practices.