NuclearQA: A human-made benchmark for language models for the nuclear domain
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
LLM-Guided Planning for Multi-hop Reasoning over Multimodal Nuclear Regulatory Documents
An LLM planning agent with dynamic knowledge-graph (KG) state achieves 81.5% accuracy on 200 multi-hop questions from NuScale FSAR documents, outperforming non-planning RAG baselines by up to 38 percentage points.