{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:J3QNN4CBHU24AABEYRYZQM7ACA","short_pith_number":"pith:J3QNN4CB","schema_version":"1.0","canonical_sha256":"4ee0d6f0413d35c00024c4719833e010346918cf67ed4bd7a227883874f28f7b","source":{"kind":"arxiv","id":"2604.22794","version":1},"attestation_state":"computed","paper":{"title":"Accelerating Reinforcement Learning for Wind Farm Control via Expert Demonstrations","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Pretraining with expert demonstrations lets reinforcement learning wind farm controllers start at baseline performance instead of lagging by 12 percent.","cross_cats":["cs.LG","cs.SY"],"primary_cat":"eess.SY","authors_text":"Julian Quick, Marcus Binder Nilsen, Nikolay Dimitrov, Pierre-Elouan Réthoré, Tuhfe Göçmen","submitted_at":"2026-04-13T12:25:43Z","abstract_excerpt":"Reinforcement learning (RL) offers a promising approach for adaptive wind farm flow control, yet its practical deployment is hindered by slow training convergence and poor initial performance, factors that could translate to years of reduced power output if an untrained agent were deployed directly. This work investigates whether domain knowledge from steady-state wake models can accelerate RL training and improve initial controller performance. 
We propose a pretraining methodology in which expert demonstrations are generated by deploying a PyWake-based steady-state optimizer within a dynamic "},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":false},"canonical_record":{"source":{"id":"2604.22794","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"eess.SY","submitted_at":"2026-04-13T12:25:43Z","cross_cats_sorted":["cs.LG","cs.SY"],"title_canon_sha256":"354f22270abbd1f919f0a5d1f61a6fe2dfb7ac45020c9ba7611fc927b2e38384","abstract_canon_sha256":"dab427759cb036f5f65ec21193c9e00c4c2b1d9d117e1e8e281f345542bbf941"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-06-01T01:02:40.526671Z","signature_b64":"rZmuM2/f6ac1WC/uyuunnT3LTFPRqodkRbu37w5boDrIKBZvlhW/cFqYVoP18HgYaPjpDIn7aEtMtIWykL7LDg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"4ee0d6f0413d35c00024c4719833e010346918cf67ed4bd7a227883874f28f7b","last_reissued_at":"2026-06-01T01:02:40.525474Z","signature_status":"signed_v1","first_computed_at":"2026-06-01T01:02:40.525474Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Accelerating Reinforcement Learning for Wind Farm Control via Expert Demonstrations","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Pretraining with expert demonstrations lets reinforcement learning wind farm controllers start at baseline performance instead of lagging by 12 percent.","cross_cats":["cs.LG","cs.SY"],"primary_cat":"eess.SY","authors_text":"Julian Quick, Marcus Binder Nilsen, 
Nikolay Dimitrov, Pierre-Elouan Réthoré, Tuhfe Göçmen","submitted_at":"2026-04-13T12:25:43Z","abstract_excerpt":"Reinforcement learning (RL) offers a promising approach for adaptive wind farm flow control, yet its practical deployment is hindered by slow training convergence and poor initial performance, factors that could translate to years of reduced power output if an untrained agent were deployed directly. This work investigates whether domain knowledge from steady-state wake models can accelerate RL training and improve initial controller performance. We propose a pretraining methodology in which expert demonstrations are generated by deploying a PyWake-based steady-state optimizer within a dynamic "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Experiments on a 2x2 wind farm show that pretraining eliminates the costly initial learning phase: while an untrained agent underperforms the greedy zero-yaw baseline by approximately 12%, pretraining raises initial performance to near-baseline levels. 
During online fine-tuning, all configurations converge within 250,000 environment steps to achieve similar performance, ultimately exceeding that of a lookup-table controller, which reaches approximately 7% power gain after 500,000 steps.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That expert demonstrations generated by a steady-state PyWake optimizer inside the dynamic WindGym simulator transfer effectively to initialize both actor and critic networks of a Soft Actor-Critic agent for online fine-tuning.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Pretraining Soft Actor-Critic agents via behavior cloning on PyWake-generated expert trajectories in WindGym simulations eliminates the initial learning phase for 2x2 wind farm control and yields final performance exceeding a lookup-table baseline.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Pretraining with expert demonstrations lets reinforcement learning wind farm controllers start at baseline performance instead of lagging by 12 percent.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"8b57fce5662701dfecc3ab23406f2b6b048adae1497bd9933a08f9136494d3b4"},"source":{"id":"2604.22794","kind":"arxiv","version":1},"verdict":{"id":"19f5d9f4-7d0c-44f1-9f38-f73dc18884ac","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-10T15:51:11.547725Z","strongest_claim":"Experiments on a 2x2 wind farm show that pretraining eliminates the costly initial learning phase: while an untrained agent underperforms the greedy zero-yaw baseline by approximately 12%, pretraining raises initial performance to near-baseline levels. 
During online fine-tuning, all configurations converge within 250,000 environment steps to achieve similar performance, ultimately exceeding that of a lookup-table controller, which reaches approximately 7% power gain after 500,000 steps.","one_line_summary":"Pretraining Soft Actor-Critic agents via behavior cloning on PyWake-generated expert trajectories in WindGym simulations eliminates the initial learning phase for 2x2 wind farm control and yields final performance exceeding a lookup-table baseline.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That expert demonstrations generated by a steady-state PyWake optimizer inside the dynamic WindGym simulator transfer effectively to initialize both actor and critic networks of a Soft Actor-Critic agent for online fine-tuning.","pith_extraction_headline":"Pretraining with expert demonstrations lets reinforcement learning wind farm controllers start at baseline performance instead of lagging by 12 percent."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2604.22794/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":19,"sample":[{"doi":"","year":2019,"title":"Veers P, Dykes K, Lantz E, Barth S, Bottasso C L, Carlson O, Clifton A, Green J, Green P, Holttinen H et al.2019 Grand challenges in the science of wind energyScience366eaau2027","work_id":"8d04299e-4393-40be-be3c-5adc2ab1924e","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"Meyers J, Bottasso C, Dykes K, Fleming P, Gebraad P, Giebel G, Göçmen T and Van Wingerden J W 2022 Wind farm flow control: prospects and challengesWind Energy Science Discussions20221–56","work_id":"e8d1d2d4-9eef-4eb1-80b3-9279df00ad8d","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"Howland M F and 
Dabiri J O 2020 Influence of wake model superposition and secondary steering on model- based wake steering control with SCADA data assimilationEnergies","work_id":"eccdf91f-c6ae-4e2e-9831-e4bdd04073ec","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Abkar M, Zehtabiyan-Rezaie N and Iosifidis A 2023 Reinforcement learning for wind-farm flow control: Current state and future actionsTheoretical and Applied Mechanics Letters100475","work_id":"d2caba03-6c23-4bcc-b75c-8a7ccaf94b9c","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Göçmen T, Liew J, Kadoche E, Dimitrov N, Riva R, Andersen S J, Lio A W, Quick J, Réthoré P E and Dykes K 2024 Data-driven wind farm flow control and challenges towards field implementationRenewable an","work_id":"1eb188f5-ee4f-4199-ad15-236380e4477d","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":19,"snapshot_sha256":"1365980856c1dac217be8f63528453b3a148a3bc332e4161c059fd9ee704d827","internal_anchors":1},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2604.22794","created_at":"2026-06-01T01:02:40.525640+00:00"},{"alias_kind":"arxiv_version","alias_value":"2604.22794v1","created_at":"2026-06-01T01:02:40.525640+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2604.22794","created_at":"2026-06-01T01:02:40.525640+00:00"},{"alias_kind":"pith_short_12","alias_value":"J3QNN4CBHU24","created_at":"2026-06-01T01:02:40.525640+00:00"},{"alias_kind":"pith_short_16","alias_value":"J3QNN4CBHU24AABE","created_at":"2026-06-01T01:02:40.525640+00:00"},{"alias_kind":"pith_short_8","alias_value":"J3QNN4CB","created_at":"2026-06-01T01:02:40
.525640+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/J3QNN4CBHU24AABEYRYZQM7ACA","json":"https://pith.science/pith/J3QNN4CBHU24AABEYRYZQM7ACA.json","graph_json":"https://pith.science/api/pith-number/J3QNN4CBHU24AABEYRYZQM7ACA/graph.json","events_json":"https://pith.science/api/pith-number/J3QNN4CBHU24AABEYRYZQM7ACA/events.json","paper":"https://pith.science/paper/J3QNN4CB"},"agent_actions":{"view_html":"https://pith.science/pith/J3QNN4CBHU24AABEYRYZQM7ACA","download_json":"https://pith.science/pith/J3QNN4CBHU24AABEYRYZQM7ACA.json","view_paper":"https://pith.science/paper/J3QNN4CB","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2604.22794&json=true","fetch_graph":"https://pith.science/api/pith-number/J3QNN4CBHU24AABEYRYZQM7ACA/graph.json","fetch_events":"https://pith.science/api/pith-number/J3QNN4CBHU24AABEYRYZQM7ACA/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/J3QNN4CBHU24AABEYRYZQM7ACA/action/timestamp_anchor","attest_storage":"https://pith.science/pith/J3QNN4CBHU24AABEYRYZQM7ACA/action/storage_attestation","attest_author":"https://pith.science/pith/J3QNN4CBHU24AABEYRYZQM7ACA/action/author_attestation","sign_citation":"https://pith.science/pith/J3QNN4CBHU24AABEYRYZQM7ACA/action/citation_signature","submit_replication":"https://pith.science/pith/J3QNN4CBHU24AABEYRYZQM7ACA/action/replication_record"}},"created_at":"2026-06-01T01:02:40.525640+00:00","updated_at":"2026-06-01T01:02:40.525640+00:00"}