GeoDisaster: Benchmarking Orchestrated Agents for Operational Disaster Geo-Intelligence
Pith reviewed 2026-06-27 03:45 UTC · model grok-4.3
The pith
The GeoDisaster benchmark requires tool-grounded spatial reasoning for disaster tasks, which current RS-VLMs cannot deliver, while RCEA alignment improves agent tool use and consistency.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GeoDisaster supplies 2,921 verified instances across 43 question types that integrate heterogeneous EO/GIS evidence and require hazard detection, damage assessment, exposure estimation, and report generation; ground-truth labels derive directly from executable geospatial workflows and consistency checks. The Role-Contract Expectation Alignment method aligns role-specialized agents through failure-aware supervised fine-tuning and contract-grounded reinforcement learning over dense step-level signals, producing measurable improvements in tool use, evidence grounding, state consistency, and decision generation over prior RS-VLMs and agentic baselines.
What carries the argument
Role-Contract Expectation Alignment (RCEA), a training procedure that combines failure-aware supervised fine-tuning with contract-grounded reinforcement learning to enforce explicit execution contracts among 18 disaster-oriented tools coordinated by role-specialized agents.
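As a rough illustration of what an explicit execution contract might look like in code, the sketch below models the three fields visible in the paper's supplementary traces (task, evidence, stop criteria) as a Python dataclass. The class name, the `required_outputs` field, and the `is_satisfied` rule are assumptions made for illustration; the paper does not specify how contracts are represented or checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionContract:
    """Hypothetical contract an orchestrator hands to a role agent."""
    agent: str                         # role label, e.g. "GA", "VRA", "PA"
    task: str                          # natural-language subtask
    evidence: tuple[str, ...]          # inputs the agent may rely on
    required_outputs: tuple[str, ...]  # handles that must exist at stop time

    def is_satisfied(self, produced: set[str]) -> bool:
        # Assumed rule: the contract is met once every required handle exists.
        return set(self.required_outputs) <= produced

    def missing(self, produced: set[str]) -> list[str]:
        # Handles the agent still owes the orchestrator, in sorted order.
        return sorted(set(self.required_outputs) - produced)


contract = ExecutionContract(
    agent="GA",
    task="Prepare xBD evidence",
    evidence=("post-disaster RGB", "mask"),
    required_outputs=("work_1",),
)
print(contract.is_satisfied({"scene_1"}))            # False
print(contract.is_satisfied({"scene_1", "work_1"}))  # True
```

A check of this kind is one plausible source for the dense step-level signals the method relies on, since each agent call either meets or misses its stop criteria.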
If this is right
- Existing RS-VLMs and agentic systems will continue to underperform on tasks that require chaining heterogeneous geospatial tools and producing evidence-backed outputs.
- Benchmarks grounded in executable workflows eliminate reliance on language-model-generated labels for disaster-related spatial reasoning.
- Multi-agent coordination via explicit contracts and step-level reinforcement signals improves reliability in state tracking and decision generation.
- The five task families provide standardized evaluation for deforestation monitoring, multi-hazard analysis, building-damage assessment, flood-safe routing, and SAR flood monitoring.
Where Pith is reading between the lines
- Similar contract-based alignment could be tested on non-disaster geospatial tasks such as infrastructure monitoring or agricultural yield estimation.
- The benchmark design suggests that future agent systems may need native integration with GIS execution environments rather than text-only interfaces.
- If RCEA gains hold, operational centers could adopt role-specialized agent teams for rapid disaster assessment instead of single monolithic models.
Load-bearing premise
Executable geospatial workflows and deterministic consistency checks can serve as complete, unbiased ground truth for operational disaster reasoning without missing subjective or contextual factors.
What would settle it
Run the 18-tool RCEA agents and the baseline systems on the full 2,921 instances and test whether RCEA achieves statistically significantly higher accuracy in tool selection, evidence citation, state consistency, or final decision correctness; the absence of such gains would falsify the improvement claim.
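One way to operationalize this test, assuming per-instance correctness (0 or 1) is available for both systems on the same instances, is a paired bootstrap on the accuracy difference. The function name, the resample count, and the toy data are illustrative assumptions; the paper does not state which significance test it uses.

```python
import random


def paired_bootstrap_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Return (mean difference, lower, upper) for accuracy(a) - accuracy(b).

    a, b: equal-length lists of 0/1 correctness on the same instances.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty, equal-length score lists")
    rng = random.Random(seed)
    n = len(a)
    diffs = [x - y for x, y in zip(a, b)]
    observed = sum(diffs) / n
    samples = []
    for _ in range(n_resamples):
        # Resample instances with replacement, keeping each pair together.
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        samples.append(sum(resample) / n)
    samples.sort()
    lo = samples[int((alpha / 2) * n_resamples)]
    hi = samples[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return observed, lo, hi


# Toy data only: 200 instances, not results from the paper.
rcea = [1] * 150 + [0] * 50
base = [1] * 110 + [0] * 90
diff, lo, hi = paired_bootstrap_ci(rcea, base)
print(f"delta={diff:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Under this sketch, a gain would count as supported when the interval excludes zero; the same routine could be run separately for each of the four metrics.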
Original abstract
Remote-sensing vision-language models (RS-VLMs) have advanced Earth-observation analysis toward visual interpretation and instruction-following, yet fall short of operational geo-intelligence, which demands tool-grounded spatial reasoning and structured, evidence-backed decisions. We introduce GeoDisaster, an operational geospatial disaster reasoning benchmark with 2,921 verified instances across 43 question types and five task families: deforestation monitoring, multi-hazard analysis, building-damage assessment, flood-safe routing, and Sentinel-1 SAR flood monitoring. Instances integrate heterogeneous EO/GIS evidence (optical and SAR imagery, raster masks, vector geometries, road networks, and exposure layers), spanning hazard detection, damage assessment, exposure estimation, and diagnostic report generation. Ground-truth answers are grounded in executable geospatial workflows and deterministic consistency checks, removing the need for language-model annotation. We further propose an orchestrated multi-agent framework with 18 disaster-oriented tools, where role-specialized agents coordinate through explicit execution contracts, aligned via Role-Contract Expectation Alignment (RCEA): failure-aware supervised fine-tuning combined with contract-grounded reinforcement learning over dense step-level signals. Experiments show that GeoDisaster challenges existing RS-VLMs and agentic systems, while RCEA improves tool use, evidence grounding, state consistency, and decision generation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces GeoDisaster, a benchmark with 2,921 verified instances across 43 question types and five task families (deforestation monitoring, multi-hazard analysis, building-damage assessment, flood-safe routing, Sentinel-1 SAR flood monitoring) for operational geospatial disaster reasoning. Ground truth is derived from executable geospatial workflows and deterministic consistency checks rather than LM annotation. It proposes an orchestrated multi-agent system with 18 disaster-oriented tools coordinated via explicit execution contracts, trained with Role-Contract Expectation Alignment (RCEA) that combines failure-aware supervised fine-tuning and contract-grounded reinforcement learning. The central claim is that the benchmark challenges existing RS-VLMs and agentic systems while RCEA yields improvements in tool use, evidence grounding, state consistency, and decision generation.
Significance. If the experimental results hold, the benchmark's construction from executable workflows and deterministic checks represents a methodological strength that could reduce annotation bias in geo-intelligence evaluation. The RCEA regime for multi-agent coordination addresses a practical gap in deploying agents for structured, evidence-backed decisions on heterogeneous EO/GIS data. This could support more reliable operational systems if the claimed gains are reproducible.
major comments (1)
- [Abstract] The claim that 'Experiments show that GeoDisaster challenges existing RS-VLMs and agentic systems, while RCEA improves tool use, evidence grounding, state consistency, and decision generation' is unsupported by any quantitative results, baseline comparisons, ablation studies, metrics, or error analysis. This absence is load-bearing for the central claim, as the data-to-claim link cannot be evaluated from the manuscript.
Simulated Author's Rebuttal
We thank the referee for identifying this critical issue with the abstract. We address the comment point-by-point below.
- Referee: [Abstract] The claim that 'Experiments show that GeoDisaster challenges existing RS-VLMs and agentic systems, while RCEA improves tool use, evidence grounding, state consistency, and decision generation' is unsupported by any quantitative results, baseline comparisons, ablation studies, metrics, or error analysis. This absence is load-bearing for the central claim, as the data-to-claim link cannot be evaluated from the manuscript.
  Authors: We agree that the abstract currently asserts experimental outcomes without including, referencing, or summarizing any supporting quantitative evidence, baselines, metrics, or analysis from the manuscript. This renders the claim unevaluable as written. In the revised manuscript we will either (a) add a concise summary of key results (e.g., tool-use accuracy, consistency scores, and comparative deltas versus baselines) directly into the abstract or (b) rephrase the final sentence to describe the benchmark and RCEA framework without asserting unsupported performance gains. The Experiments section will remain the primary location for all quantitative details, ablations, and error analysis.
  Revision: yes
Circularity Check
No significant circularity identified
The paper presents an empirical benchmark (GeoDisaster) and an agent framework (RCEA) whose central claims rest on experimental results and on deterministic ground-truth construction from executable geospatial workflows, not on a derivation chain, fitted parameters renamed as predictions, or load-bearing self-citations. No equations, uniqueness theorems, or ansatzes appear in the provided text, and the ground-truth mechanism is explicitly designed to be independent of language-model annotation. Because nothing in the argument reduces to its own inputs by construction, the derivation is self-contained.
SUPPLEMENTARY MATERIAL
S1 Dataset Details and Statistics
GeoDisaster contains 2,921 verified instances across five disaster task families and 43 question types. Beyond dataset size, its ...
- ORC plan: {"goal":"xBD summary", "steps":[{"agent":"GA", "subtask":"prepare evidence"}, {"agent":"VRA", "subtask":"assess damage"}, {"agent":"PA", "subtask":"compute statistics"}]}
- ORC→GA: CallAgent(agent=GA, contract="{task: Prepare xBD evidence, evidence: [post-disaster RGB, mask], stop criteria: work_1 prepared with polygons, severe mask, and quadrant grid}")
- GA tool: ReadData(inputs=image_1); PrepareScene(mode=xbd_scene_summary, grid=2x2)
- GA Tool Output: scene_1; 271 polygons and 288k severe pixels loaded. work_1 prepared with 2×2 quadrant grid
- ORC→VRA: CallAgent(agent=VRA, contract="{task: Assess scene-level building damage, evidence: [work_1, building polygons, post-disaster RGB, target mask], stop criteria: damage_1 produced with per-building and scene-level damage class}")
- VRA tool: AssessDamage(work=work_1, aggregation=scene)
- VRA Tool Output: damage_1; 271/271 buildings destroyed, corresponding to 100% severe damage
- ORC→PA: CallAgent(agent=PA, contract="{task: Compute area, mask validation, and spatial concentration, evidence: [work_1, damage_1, severe mask, quadrant grid], stop criteria: stats_1 produced with area, dispersion score, and worst quadrant}")
- PA tool: ComputeStats(work=work_1, metrics=[area, mask_validation, dispersion, worst_quadrant])
- PA Tool Output: stats_1; severe area = 954,478 m²; dispersion score = 0.34; worst quadrant = Q0. Scene is spatially dispersed with complete severe-damage dominance
- ORC: Terminate: {"destroyed":271, "severe_pct":100, "severe_m2":954478, "dispersion":"dispersed", "worst_Q":"Q0"}
Figure S7: Example GeoDisaster for scene-level building-damage assessment. The task combines pre/post imagery, damage polygons, quadrant overlays, and role-specialized tool calls to produce a structured damage summary with counts, severe-area es...
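For a trace like the building-damage example, a deterministic consistency check could amount to re-deriving the summary fields from the tool outputs. The sketch below is an assumption about what such a check might cover: the field names follow the Terminate payload in the trace, while the function, its tolerance, and the reading of `severe_pct` as destroyed buildings over total buildings are hypothetical.

```python
def check_damage_summary(summary, destroyed, total_buildings, severe_area_m2, tol=0.5):
    """Return a list of inconsistencies between a final summary and tool outputs."""
    if total_buildings <= 0:
        return ["total_buildings must be positive"]
    problems = []
    if summary["destroyed"] != destroyed:
        problems.append("destroyed count differs from damage tool output")
    # Assumed definition: severe_pct is the share of destroyed buildings.
    expected_pct = 100.0 * destroyed / total_buildings
    if abs(summary["severe_pct"] - expected_pct) > tol:
        problems.append("severe_pct does not match destroyed / total")
    if abs(summary["severe_m2"] - severe_area_m2) > tol:
        problems.append("severe_m2 differs from stats tool output")
    return problems


# Values copied from the trace: 271/271 buildings destroyed, 954,478 m² severe area.
summary = {"destroyed": 271, "severe_pct": 100, "severe_m2": 954478,
           "dispersion": "dispersed", "worst_Q": "Q0"}
print(check_damage_summary(summary, destroyed=271, total_buildings=271,
                           severe_area_m2=954478))  # []
```

An empty list means the final answer agrees with the intermediate tool outputs on the fields checked; the dispersion label and worst quadrant are left out because the trace does not define how they are computed.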
- ORC plan: {"goal":"estimate burned area", "steps":[{"agent":"GA", "subtask":"prepare burned-area context"}, {"agent":"PA", "subtask":"measure footprint area"}, {"agent":"PA", "subtask":"validate agreement"}]}
- ORC→GA: CallAgent(agent=GA, contract="{task: Open burned-area product for AOI, evidence: [pre-fire image, post-fire image, burn product, AOI boundary], stop criteria: work_1 prepared with burn mask and scene context}")
- GA tool: LoadCachedRaster(aoi_id=dixie_fire_aoi, handle=mcd64a1_event); PrepareScene(inputs={burned_area:scene_1}, mode=burned_area_assessment)
- GA Tool Output: scene_1; MODIS MCD64A1 BurnDate loaded for 2021-07-01 to 2021-09-29; modality = burned_area_mask; scale = 500m. work_1 prepared with binary burn mask
- ORC→PA: CallAgent(agent=PA, contract="{task: Compute burned footprint area, evidence: [work_1, burn mask, AOI boundary, post-fire overlay], stop criteria: burned-area estimate produced in hectares with source agreement}")
- PA tool: ComputeArea(work=work_1, mask=burned_area_mask, unit=ha); CompareAcrossSources(handles=[mcd64a1_event], metric=area_agreement)
- PA Tool Output: burned_area_1; one source used. MODIS MCD64A1 area = 17,874.54 ha; agreement class = high
- ORC: Terminate: {"value":17874.54, "units":"ha", "source":"MODIS MCD64A1", "evidence":"binary burn mask + AOI boundary"}
- ORC: Final Answer: Burned area = 17,874.54 ha, evidence: MODIS MCD64A1.
Figure S8: Example GeoDisaster for wildfire burned-area estimation. The task uses pre/post fire imagery and a burn-mask product to compute the burned footprint area, with role-specialized tool calls for evidence loading, area measurement, and final grounded reporting.
GeoDisaster LLM-a...
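The burned-area step in the wildfire trace can be approximated as counting mask pixels and multiplying by pixel area. The function below is a simplified sketch under stated assumptions (square pixels at a nominal 500 m scale, no reprojection, no partial-pixel clipping at the AOI boundary); it is not the paper's ComputeArea tool and is not expected to reproduce the 17,874.54 ha figure.

```python
def burned_area_ha(mask, pixel_size_m=500.0):
    """Area in hectares of truthy cells in a 2D list mask, assuming square pixels."""
    burned_pixels = sum(1 for row in mask for cell in row if cell)
    return burned_pixels * pixel_size_m ** 2 / 10_000.0  # 1 ha = 10,000 m²


# Toy 3x3 mask with three burned pixels, for illustration only.
toy_mask = [
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]
print(burned_area_ha(toy_mask))  # 75.0
```

At a 500 m scale each pixel covers 25 ha, so the three burned pixels in the toy mask give 75 ha.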