arxiv: 2605.00167 · v1 · submitted 2026-04-30 · ❄️ cond-mat.dis-nn · physics.ao-ph


Data-Driven Modelling to predict forest fire spread in the Patagonian region in Argentina

Lucas Becerra, Monica Malen Denham, Alejandro B. Kolton, Karina Laneri


Pith reviewed 2026-05-09 19:54 UTC · model grok-4.3

classification ❄️ cond-mat.dis-nn physics.ao-ph

keywords: wildfire modeling · genetic algorithm · XGBoost · Patagonia · reaction-diffusion-convection · parameter estimation · burned area overlap


The pith

A genetic algorithm combined with XGBoost recovers reference parameters for a wildfire spread model by maximizing overlap with observed burned areas.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper implements a reaction-diffusion-convection model of fire propagation that incorporates maps of slope, wind, and vegetation in the Steffen and Martin Lakes region of Patagonia. It tests this model through three scenarios of increasing landscape complexity and uses a genetic algorithm to search for parameter values that produce simulated burned regions most closely matching reference data. The resulting parameter estimates are then refined with XGBoost to boost accuracy. If the approach holds, it supplies a repeatable way to infer hard-to-measure quantities such as effective fuel consumption rates or ignition thresholds directly from spatial fire records rather than from separate field campaigns.
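To make the phrase "reaction-diffusion-convection" concrete, here is a minimal sketch of one textbook form of such a model, integrated with explicit Euler steps. The equations are an assumption for illustration, not the paper's: a burning fraction diffuses with coefficient D, drifts with a uniform velocity v, ignites fuel at rate β, and extinguishes at rate γ. The symbols D, β, and γ echo the figure captions, but the functional form, the uniform drift, the periodic boundaries, the grid size, and the function names `rdc_step` and `simulate` are all hypothetical. Slope, wind, and vegetation maps are left out.

```python
import numpy as np

def rdc_step(fuel, fire, D, v, beta, gamma, dx=1.0, dt=0.05):
    """Advance a toy SIR-like reaction-diffusion-convection model by one Euler step.

    fuel, fire: 2-D arrays of unburned and burning fractions (assumed state variables).
    v: (vx, vy) uniform drift velocity, a crude stand-in for wind and slope effects.
    """
    # Five-point Laplacian; np.roll gives periodic boundaries, chosen only for brevity.
    lap = (np.roll(fire, 1, 0) + np.roll(fire, -1, 0)
           + np.roll(fire, 1, 1) + np.roll(fire, -1, 1) - 4.0 * fire) / dx**2
    # Centred differences for the gradient (axis 1 is x, axis 0 is y).
    grad_x = (np.roll(fire, -1, 1) - np.roll(fire, 1, 1)) / (2.0 * dx)
    grad_y = (np.roll(fire, -1, 0) - np.roll(fire, 1, 0)) / (2.0 * dx)
    ignition_rate = beta * fuel * fire  # reaction term: burning cells ignite fuel
    new_fire = fire + dt * (D * lap - v[0] * grad_x - v[1] * grad_y
                            + ignition_rate - gamma * fire)
    new_fuel = fuel - dt * ignition_rate
    return np.clip(new_fuel, 0.0, 1.0), np.clip(new_fire, 0.0, None)

def simulate(D, v, beta, gamma, shape=(64, 64), ignition=(32, 32), steps=200,
             dx=1.0, dt=0.05):
    """Run the toy model from a single ignition cell and return the burned mask."""
    fuel = np.ones(shape)
    fire = np.zeros(shape)
    fire[ignition] = 0.5  # ignition is given as (row, column)
    for _ in range(steps):
        fuel, fire = rdc_step(fuel, fire, D, v, beta, gamma, dx, dt)
    return fuel < 0.5  # a cell counts as burned once half of its fuel is gone

# Illustrative, non-dimensional values; not the reference values of the paper.
burned = simulate(D=1.0, v=(0.5, 0.0), beta=1.5, gamma=0.5)
print(burned.sum(), "burned cells out of", burned.size)
```

The explicit scheme is only stable when D·dt/dx² stays at or below about 1/4, which the default values satisfy. The boolean mask returned by `simulate` is the kind of object an overlap-based fitness would compare against a reference burned area.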

Core claim

The genetic algorithm recovers the reference parameters of the reaction-diffusion-convection wildfire model across all three tested scenarios by maximizing the spatial overlap between simulated and observed burned areas. Subsequent application of XGBoost improves the accuracy of these estimates, with the largest gains occurring in the simpler scenarios. The combined procedure therefore constitutes a practical method for estimating difficult-to-measure wildfire parameters from existing burned-area data.

What carries the argument

The genetic algorithm that evolves candidate sets of reaction-diffusion-convection parameters to maximize the spatial overlap (match) between the model's simulated burned region and the reference burned area.
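The search loop described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the forward model is replaced by a hypothetical two-parameter ellipse (`toy_burned_area`) so the example runs in a fraction of a second, the overlap measure is assumed to be the Jaccard index, and the operators (binary tournament selection, blend crossover, Gaussian mutation, two-individual elitism) are generic choices that the source does not specify.

```python
import numpy as np

def toy_burned_area(params, n=64):
    """Hypothetical stand-in for a fire simulator: an ellipse with semi-axes (a, b)."""
    a, b = params
    y, x = np.mgrid[0:n, 0:n]
    return ((x - n / 2) / a) ** 2 + ((y - n / 2) / b) ** 2 <= 1.0

def jaccard(sim, ref):
    """Intersection over union of two boolean burned-area masks."""
    union = np.logical_or(sim, ref).sum()
    return np.logical_and(sim, ref).sum() / union if union else 1.0

def run_ga(reference, bounds, pop_size=40, generations=60, seed=0):
    """Evolve parameter vectors that maximize overlap with the reference mask."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))

    def fitness(population):
        return np.array([jaccard(toy_burned_area(ind), reference)
                         for ind in population])

    for _ in range(generations):
        fit = fitness(pop)
        elite = pop[np.argsort(fit)[-2:]]  # elitism: carry over the two best
        # Binary tournaments: each parent is the fitter of two random individuals.
        idx = rng.integers(0, pop_size, size=(pop_size - 2, 2, 2))
        winners = np.where(fit[idx[..., 0]] >= fit[idx[..., 1]],
                           idx[..., 0], idx[..., 1])
        parent_a, parent_b = pop[winners[:, 0]], pop[winners[:, 1]]
        w = rng.uniform(size=parent_a.shape)  # blend crossover weights
        children = w * parent_a + (1.0 - w) * parent_b
        children += rng.normal(0.0, 0.05 * (hi - lo), size=children.shape)  # mutation
        pop = np.vstack([elite, np.clip(children, lo, hi)])

    fit = fitness(pop)
    return pop[np.argmax(fit)], float(fit.max())

reference = toy_burned_area((20.0, 10.0))  # synthetic "observed" burned area
best, score = run_ga(reference, bounds=[(2.0, 30.0), (2.0, 30.0)])
print("best parameters:", best, "overlap:", score)
```

Because the reference mask is generated by the same toy function that the search evaluates, a high final overlap here only shows that the optimizer can invert that function, which is the same caveat the referee report raises about the paper's experiments.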

Load-bearing premise

That the parameter values producing the greatest overlap with past burned areas correctly capture the underlying physical spread rates and will continue to do so for new fires under different conditions.

What would settle it

Using the fitted parameters to simulate an independent wildfire event in the same region and observing substantially lower spatial overlap with the actual burned area than achieved on the training cases.

Figures

Figures reproduced from arXiv: 2605.00167 by Alejandro B. Kolton, Karina Laneri, Lucas Becerra, Monica Malen Denham.

**Figure 1.** Reference wildfire for the first experiment, simulated with D = 10 m² h⁻¹, A = 1 × 10⁻⁴, B = 15 m h⁻¹, and ignition point at (x, y) = (400, 600). Each green area corresponds to a different fuel type (Forest A, Forest B, exotic forest, pasture, and shrubland), with spatially heterogeneous β and γ values listed in …

**Figure 2.** Reference wildfire for the second experiment (Steffen-Martin landscape). The same reference values as in the first experiment were used (D = 10 m² h⁻¹, A = 1 × 10⁻⁴, B = 15 m h⁻¹, and ignition point at (x, y) = (400, 600)), while homogeneous β and γ were introduced as additional tunable parameters with reference values β = 1.5 h⁻¹ and γ = 0.5 h⁻¹, respectively. Red indicates burned area (final state …

**Figure 3.** Reference wildfire used in Experiment 3 (Steffen-Martin landscape). The simulation uses D = 10 m² h⁻¹, A = 1 × 10⁻⁴, and B = 15 m h⁻¹, with three fixed ignition points at (1130, 290), (1300, 150), and (620, 280). Each green area corresponds to a different fuel type (Forest A, Forest B, exotic forest, pasture, and shrubland), with heterogeneous and tunable βᵢ and γᵢ values (…

**Figure 4.** Evolution of the fitness in the three experiments. The dashed line shows the mean …

**Figure 5.** Fitness landscape with respect to β and γ in the second experiment. The surface exhibits a highly irregular structure, with many points of similar fitness and no clear decreasing direction, which makes accurate estimation difficult. This behavior is consistent with the correlation matrix for the second experiment (…

**Figure 6.** Correlation matrix of the model parameters found in the second experiment. The correlation was calculated using the best 10,000 individuals across all generations. There is a strong correlation between β and γ, which leads to an almost linear relation between the two parameters and hinders the optimization process. By contrast, …

**Figure 7.** Fitness landscape in relation to the first (left) and second (right) experiment. In …

**Figure 8.** Difference between the reference wildfire and the best-estimated wildfire in the …

Original abstract

Wildfires are among the most severe disturbances affecting forest ecosystems, with over 50,000 hectares burned in Patagonia, Argentina, during 2025 alone. This study implements a Reaction-Diffusion-Convection (RDC) model to simulate wildfire spread in the Steffen and Martin Lakes area, a region severely impacted by fires. By integrating high-resolution maps of slope, wind velocity, and vegetation, we conducted three computational experiments of increasing complexity to simulate fire propagation across heterogeneous landscapes. We employed a Genetic Algorithm (GA) to recover reference model parameters by maximizing the spatial overlap between simulated and reference burned areas. Subsequently, parameter estimates were refined using XGBoost to improve accuracy. Results demonstrate that the GA accurately recovers reference parameters across all scenarios, while the XGBoost fine-tuning significantly enhances accuracy in simpler cases. This integrated framework offers a systematic approach for estimating difficult-to-measure wildfire parameters, demonstrating the potential of hybrid computational methods for wildfire modeling and forest management.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript implements a Reaction-Diffusion-Convection (RDC) model for wildfire spread simulation in the Steffen and Martin Lakes area of Patagonia using high-resolution maps of slope, wind velocity, and vegetation. Three computational experiments of increasing complexity are performed; a Genetic Algorithm (GA) recovers reference model parameters by maximizing spatial overlap between simulated and reference burned areas, after which XGBoost is applied to refine the estimates. The central claim is that the GA accurately recovers the reference parameters across scenarios and that the hybrid GA+XGBoost framework improves accuracy (especially in simpler cases) while providing a systematic method for estimating difficult-to-measure wildfire parameters.

Significance. If the recovered parameters were shown to generalize beyond the fitting data and to match independent physical measurements, the hybrid approach could offer a practical route for calibrating complex simulation models in regions with limited direct observations. The use of GA for global search followed by XGBoost fine-tuning is a reasonable combination, but the current evaluation on synthetic references generated from the same forward model provides only a test of optimizer invertibility rather than physical validity or predictive power for new events.

major comments (3)

[Abstract] The statement that 'the GA accurately recovers reference parameters across all scenarios' and that 'XGBoost fine-tuning significantly enhances accuracy' supplies no quantitative overlap scores (e.g., Jaccard or Dice coefficients), error bars, cross-validation statistics, or description of how the reference burned-area maps were generated. Without these numbers the data-to-claim link cannot be evaluated.
[Computational Experiments and Results] The reference burned areas used both to drive the GA objective and to assess success appear to be synthetic outputs generated from the identical RDC model with known parameters. In this setup, successful 'recovery' only verifies that the optimizer can invert the forward model on calibration data; it does not demonstrate that the recovered values correspond to real physical processes or will generalize to unseen fire events.
[Abstract and Methods] No independent test set, held-out real fire perimeters, satellite validation, or comparison against field-measured parameters (fuel moisture, wind thresholds, etc.) is described. The absence of such a check makes the claim that the framework estimates 'difficult-to-measure wildfire parameters' for forest management unsupported.
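For reference, the two overlap scores named in the first major comment have standard definitions on boolean burned-area masks: the Jaccard index is |S ∩ R| / |S ∪ R| and the Dice coefficient is 2|S ∩ R| / (|S| + |R|). The sketch below computes both. The function name `overlap_scores` and the convention of returning 1.0 for two empty masks are assumptions made for this example, and the source does not say which overlap measure the paper actually uses.

```python
import numpy as np

def overlap_scores(simulated, reference):
    """Return (Jaccard, Dice) for two boolean burned-area masks of equal shape."""
    sim = np.asarray(simulated, dtype=bool)
    ref = np.asarray(reference, dtype=bool)
    intersection = np.logical_and(sim, ref).sum()
    union = np.logical_or(sim, ref).sum()
    total = sim.sum() + ref.sum()
    # Two empty masks agree perfectly; this convention is an assumption.
    jaccard = intersection / union if union else 1.0
    dice = 2.0 * intersection / total if total else 1.0
    return float(jaccard), float(dice)

# Tiny worked example on a 4 x 4 grid: two 8-cell bands sharing one row of 4 cells.
reference = np.zeros((4, 4), dtype=bool)
reference[0:2] = True
simulated = np.zeros((4, 4), dtype=bool)
simulated[1:3] = True
print(overlap_scores(simulated, reference))  # Jaccard = 4/12, Dice = 8/16
```

The two scores are monotonically related (Dice = 2J / (1 + J)), so they rank candidate simulations identically; reporting either one with its spread across runs would address the comment.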

minor comments (2)

[Abstract] The phrase 'over 50,000 hectares burned in Patagonia, Argentina, during 2025 alone' should be clarified with a citation or data source, as the manuscript context suggests the work predates or is contemporaneous with that period.
Notation: The RDC model parameters (diffusion, convection, reaction rates) are referred to collectively as 'reference model parameters' without an explicit list or table of their symbols and units in the main text.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important issues regarding quantitative reporting, the synthetic nature of the experiments, and the scope of claims about real-world parameter estimation. We agree that revisions are needed to strengthen the manuscript by adding specific metrics, clarifying limitations, and moderating overstatements. We will submit a revised version addressing these points.

Point-by-point responses

Referee: [Abstract] The statement that 'the GA accurately recovers reference parameters across all scenarios' and that 'XGBoost fine-tuning significantly enhances accuracy' supplies no quantitative overlap scores (e.g., Jaccard or Dice coefficients), error bars, cross-validation statistics, or description of how the reference burned-area maps were generated. Without these numbers the data-to-claim link cannot be evaluated.

Authors: We will revise the abstract to report average Jaccard and Dice coefficients with standard deviations for each of the three scenarios, along with a brief statement that the reference burned-area maps were generated by forward simulation of the RDC model using known ground-truth parameters on the provided high-resolution slope, wind, and vegetation maps.
Revision: yes

Referee: [Computational Experiments and Results] The reference burned areas used both to drive the GA objective and to assess success appear to be synthetic outputs generated from the identical RDC model with known parameters. In this setup, successful 'recovery' only verifies that the optimizer can invert the forward model on calibration data; it does not demonstrate that the recovered values correspond to real physical processes or will generalize to unseen fire events.

Authors: We acknowledge that the experiments rely on synthetic data generated from the same RDC model. This controlled setup was chosen to quantify recovery accuracy with known ground truth. We will add an explicit limitations subsection in the Discussion stating that the current results test invertibility rather than physical validity, and we will outline planned future work using real fire perimeters.
Revision: partial

Referee: [Abstract and Methods] No independent test set, held-out real fire perimeters, satellite validation, or comparison against field-measured parameters (fuel moisture, wind thresholds, etc.) is described. The absence of such a check makes the claim that the framework estimates 'difficult-to-measure wildfire parameters' for forest management unsupported.

Authors: We agree the manuscript lacks real-world validation. We will revise the abstract and conclusions to state that the hybrid GA+XGBoost framework offers a systematic method for recovering parameters in simulation models when direct measurements are unavailable, while clearly noting that application to operational forest management requires additional validation against observed fire events and field data.
Revision: yes

Circularity Check

1 step flagged

GA 'recovery' of reference parameters reduces to maximizing overlap on the same synthetic burned areas used to define success

specific steps

fitted input called prediction [Abstract]
"We employed a Genetic Algorithm (GA) to recover reference model parameters by maximizing the spatial overlap between simulated and reference burned areas. ... Results demonstrate that the GA accurately recovers reference parameters across all scenarios, while the XGBoost fine-tuning significantly enhances accuracy in simpler cases."

Reference burned areas are produced by the identical RDC forward model with known parameters; GA recovery is defined as maximizing overlap with those same areas. Declaring 'accurate recovery' therefore reports that the optimizer succeeded at inverting its own training data rather than demonstrating independent predictive power or physical validity on unseen events.

full rationale

The paper generates reference burned areas from the RDC model itself using known parameters, then uses GA to maximize spatial overlap with those exact areas to 'recover' the parameters. Success is declared when the recovered values match the known inputs (or overlap is high). This is a direct inversion test on calibration data with no held-out real fire perimeters, no external physical measurements, and no independent test set. The subsequent XGBoost step refines the same fitted values. The claimed 'prediction' of difficult-to-measure parameters and generalization therefore collapses to the fitting procedure by construction.
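The circularity argument can be illustrated with a small toy comparison. The sketch below is an assumption-laden illustration, not a reproduction of the paper: the "forward model" is a hypothetical centred ellipse, the optimizer is an exhaustive grid search rather than a GA, and the "other process" is the same ellipse shifted sideways, a feature the fitted model cannot express. When the reference comes from the fitted model itself, perfect overlap is reachable by construction; when it comes from a different process, the best achievable overlap drops even though the search procedure is unchanged.

```python
import numpy as np

def ellipse(a, b, n=64, shift=0.0):
    """Hypothetical burned-area generator: ellipse with semi-axes (a, b), offset in x."""
    y, x = np.mgrid[0:n, 0:n]
    return ((x - n / 2 - shift) / a) ** 2 + ((y - n / 2) / b) ** 2 <= 1.0

def jaccard(sim, ref):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(sim, ref).sum()
    return np.logical_and(sim, ref).sum() / union if union else 1.0

def best_fit(reference, grid):
    """Exhaustive search over centred ellipses (shift is NOT a fitted parameter)."""
    scores = {(a, b): jaccard(ellipse(a, b), reference) for a in grid for b in grid}
    best = max(scores, key=scores.get)
    return best, float(scores[best])

grid = range(4, 31, 2)

# Case 1: reference produced by the same model that is being fitted.
same_params, same_score = best_fit(ellipse(20, 10), grid)

# Case 2: reference produced by a process the fitted model cannot represent.
other_params, other_score = best_fit(ellipse(20, 10, shift=6), grid)

print("same model:   ", same_params, same_score)
print("other process:", other_params, other_score)
```

In the first case the search returns the generating parameters with an overlap of exactly 1.0, which says nothing about how the fitted model would fare on the second kind of data. That gap is what a held-out real fire perimeter would measure.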

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the suitability of the RDC model for fire spread and on the premise that overlap-maximizing parameter search produces physically meaningful values; both are taken as given without further justification in the abstract.

free parameters (1)

RDC model parameters (diffusion, convection, reaction rates)
Recovered by genetic algorithm to maximize spatial overlap with reference burned areas; no specific values or ranges given in abstract.

axioms (1)

domain assumption: The Reaction-Diffusion-Convection model accurately captures wildfire propagation across heterogeneous terrain
The study implements the RDC framework as the base simulator without discussing its assumptions or limitations.

