{"paper":{"title":"Offline Reinforcement Learning for Rotation Profile Control in Tokamaks","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Offline reinforcement learning policies trained on historical tokamak data can control plasma rotation profiles when deployed on a real device.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Andrew Rothstein, Egemen Kolemen, Hiro Josep Farre Kaga, Ian Char, Jeff Schneider, Jiayu Chen, Ricardo Shousha, Rohit Sonker","submitted_at":"2026-05-07T08:26:59Z","abstract_excerpt":"Tokamaks remain leading candidates for achieving practical fusion energy, yet many important control problems inside these devices are still difficult or unsolved. One such challenge is controlling the plasma rotation profile, which strongly influences stability, confinement, and transport. While the average rotation can be controlled, controlling the full profile is challenging due to high dimensionality, response to multiple actuators and dependence on plasma condition. Learning-based control methods, such as reinforcement learning (RL), provide a potential solution to this challenging probl"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Our final method uses probabilistic models of plasma dynamics to generate rollouts for RL training. We deploy this policy on the DIII-D Tokamak and observe promising real-world results.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That historical data from past plasma conditions is representative enough for the learned policy to generalize safely to new operating points and that the probabilistic models capture the relevant dynamics without dangerous extrapolation errors.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Offline RL policies trained solely on DIII-D historical data were deployed on the tokamak and produced promising real-world control of the plasma rotation profile.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Offline reinforcement learning policies trained on historical tokamak data can control plasma rotation profiles when deployed on a real device.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"16ba1d0a1ec85bd4aa409de4ffb24a64b75918ab0bc4da8452e28ddc363da370"},"source":{"id":"2605.05857","kind":"arxiv","version":2},"verdict":{"id":"d27e6347-dd2b-4916-8640-80268399ed61","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-08T14:44:04.810191Z","strongest_claim":"Our final method uses probabilistic models of plasma dynamics to generate rollouts for RL training. We deploy this policy on the DIII-D Tokamak and observe promising real-world results.","one_line_summary":"Offline RL policies trained solely on DIII-D historical data were deployed on the tokamak and produced promising real-world control of the plasma rotation profile.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That historical data from past plasma conditions is representative enough for the learned policy to generalize safely to new operating points and that the probabilistic models capture the relevant dynamics without dangerous extrapolation errors.","pith_extraction_headline":"Offline reinforcement learning policies trained on historical tokamak data can control plasma rotation profiles when deployed on a real device."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.05857/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"claim_evidence","ran_at":"2026-05-20T13:42:04.577426Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-20T08:41:41.252168Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_title_agreement","ran_at":"2026-05-19T19:31:19.556176Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T13:10:23.532499Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"f4756a38ae2a6726ed243c55bb67e29871a2c2c26a7b9339eaea9eee7ff41425"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"
}