pith. machine review for the scientific record.

arxiv: 2604.26634 · v1 · submitted 2026-04-29 · cs.LG · econ.GN · q-fin.EC · stat.AP


Electricity price forecasting across Norway's five bidding zones in the post-crisis era

Dat Thanh Nguyen, Hoai Phuong Ha, My Thi Diem Phan, Trung Tuyen Truong


Pith reviewed 2026-05-07 13:35 UTC · model grok-4.3

keywords: electricity price forecasting, Norway bidding zones, LightGBM, feature ablation, regime analysis, post-crisis market, hydropower, Nord Pool

The pith

LightGBM achieves the lowest errors in forecasting Norway's post-crisis electricity prices, and models using only lagged prices and calendar variables often match models trained on the full feature set.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a multimodal hourly dataset from 2019 to 2025 covering all five Norwegian bidding zones and tests eight model families with causal rolling backtesting. It shows LightGBM delivers the best accuracy everywhere, with mean absolute errors between 1.64 and 5.74 EUR per MWh, while a simple ridge ARX model stays competitive in the northern zones. Feature ablation finds that using only past prices and calendar variables frequently equals or beats models that add reservoir levels, gas prices, and other external inputs. Conditional regime analysis nevertheless shows those external variables help identify periods when forecast errors rise sharply. This matters because the 2021-2022 energy crisis and deeper European integration have changed how prices form in a hydropower-dominated market, rendering pre-crisis models unreliable for current operations and planning.
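The causal rolling backtesting described above can be sketched as an expanding-window split generator. This is a minimal illustration and not the authors' code: the name `rolling_origin_splits`, the eight-week initial training window, and the one-week horizon and step are assumptions, chosen so that the toy series yields 52 weekly origins as in the Figure 4 caption.

```python
from typing import Iterator, Tuple

HOURS_PER_WEEK = 7 * 24


def rolling_origin_splits(
    n_obs: int,
    first_origin: int,
    horizon: int = HOURS_PER_WEEK,
    step: int = HOURS_PER_WEEK,
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges for an expanding-window backtest.

    The training range always ends exactly where the test block starts,
    so no observation from the forecast period can leak into the fit.
    """
    origin = first_origin
    while origin + horizon <= n_obs:
        yield range(0, origin), range(origin, origin + horizon)
        origin += step


# Toy usage (hypothetical sizes): 60 weeks of hourly data,
# with the first 8 weeks reserved for the initial fit.
splits = list(
    rolling_origin_splits(n_obs=60 * HOURS_PER_WEEK,
                          first_origin=8 * HOURS_PER_WEEK)
)
print(len(splits), "forecast origins")
```

In a real pipeline each model would be refitted on the `train` range and scored on the `test` range at every origin, and the per-origin errors would then be aggregated.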

Core claim

A comprehensive evaluation across Norway's five Nord Pool bidding zones using a 2019-2025 multimodal dataset, rolling-origin backtesting, leave-one-group-out ablation, and conditional regime splits demonstrates that LightGBM outperforms other families in every zone while ridge ARX remains a strong linear baseline in northern areas; models limited to lagged prices and calendar features reach comparable accuracy, yet external covariates remain essential for separating forecast performance across stressed versus normal market regimes.

What carries the argument

Leave-one-group-out feature ablation paired with conditional regime analysis on the post-crisis multimodal dataset to isolate feature contributions and error behavior.
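A leave-one-group-out ablation can be sketched as follows. The helper names, the two toy feature groups, and the one-nearest-neighbour stand-in model are all assumptions made for illustration; the paper uses LightGBM and other model families, and its sign convention for the MAE delta is not stated here. In this sketch a positive delta means that removing the group makes the forecast worse.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]
FitPredict = Callable[[Matrix, List[float], Matrix], List[float]]


def mae(y_true: List[float], y_pred: List[float]) -> float:
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def take(X: Matrix, cols: List[int]) -> Matrix:
    """Keep only the listed feature columns."""
    return [[row[c] for c in cols] for row in X]


def logo_delta_mae(X_train: Matrix, y_train: List[float],
                   X_test: Matrix, y_test: List[float],
                   groups: Dict[str, List[int]],
                   fit_predict: FitPredict) -> Dict[str, float]:
    """MAE without each group minus MAE of the full model."""
    all_cols = sorted(c for cols in groups.values() for c in cols)

    def score(cols: List[int]) -> float:
        preds = fit_predict(take(X_train, cols), y_train, take(X_test, cols))
        return mae(y_test, preds)

    full = score(all_cols)
    return {name: score([c for c in all_cols if c not in dropped]) - full
            for name, dropped in groups.items()}


def nearest_neighbour(X_train: Matrix, y_train: List[float],
                      X_test: Matrix) -> List[float]:
    """Stand-in model (assumption): predict the target of the closest row."""
    preds = []
    for row in X_test:
        dists = [sum((a - b) ** 2 for a, b in zip(row, ref)) for ref in X_train]
        preds.append(y_train[dists.index(min(dists))])
    return preds


# Toy data: the target equals the lag feature; the calendar flag adds nothing.
groups = {"lags": [0], "calendar": [1]}
X_train = [[float(i), float(i % 2)] for i in range(10)]
y_train = [float(i) for i in range(10)]
X_test = [[i + 0.2, float(i % 2)] for i in range(10)]
y_test = [i + 0.2 for i in range(10)]

deltas = logo_delta_mae(X_train, y_train, X_test, y_test,
                        groups, nearest_neighbour)
print(deltas)
```

On this toy data, dropping the lag group raises the error sharply while dropping the calendar group leaves it unchanged, which is the kind of contrast the ablation is designed to expose.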

If this is right

  • Forecasting systems can often rely on a narrow set of lagged prices and calendar variables without losing accuracy.
  • Error spikes remain predictable once market regimes are identified, allowing targeted risk adjustments.
  • Linear benchmarks like ridge ARX retain practical value in hydro-rich northern zones where external drivers matter less.
  • Regime-aware monitoring becomes necessary for decision makers as integration with Continental Europe continues.
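The regime-aware monitoring mentioned in the points above can be illustrated with a small error-stratification helper. This is a sketch under stated assumptions: the function name `regime_mae`, the two-way split at the median of the driver, and the toy numbers are all hypothetical, and the paper's actual regime definitions may differ.

```python
from statistics import median
from typing import Dict, List, Optional


def regime_mae(y_true: List[float], y_pred: List[float],
               driver: List[float],
               threshold: Optional[float] = None) -> Dict[str, float]:
    """Mean absolute error split by whether `driver` exceeds a threshold.

    `driver` is an external variable such as a gas price or a reservoir
    level; the median split used by default is an assumption.
    """
    if threshold is None:
        threshold = median(driver)
    buckets: Dict[str, List[float]] = {"low": [], "high": []}
    for actual, forecast, value in zip(y_true, y_pred, driver):
        key = "high" if value > threshold else "low"
        buckets[key].append(abs(actual - forecast))
    return {k: sum(v) / len(v) for k, v in buckets.items() if v}


# Toy numbers (fabricated for illustration only, EUR/MWh).
prices = [30.0, 32.0, 31.0, 90.0, 120.0, 60.0]
forecasts = [29.0, 33.0, 31.0, 80.0, 100.0, 70.0]
gas_price = [20.0, 22.0, 21.0, 80.0, 95.0, 70.0]

by_regime = regime_mae(prices, forecasts, gas_price)
print(by_regime)
```

A monitoring system could compare the two values over a trailing window and raise an alert when the high-regime error moves well above the low-regime error.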

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same pattern of minimal features sufficing until regimes shift may appear in other hydro-dominated markets facing similar integration pressures.
  • Embedding real-time regime detection into operational systems could reduce exposure during high-error periods without increasing model complexity.
  • Extending the ablation to quantify the marginal value of each external variable under different hydro inflow conditions would clarify when multimodal data is worth the cost.

Load-bearing premise

The 2019-2025 multimodal dataset fully captures post-crisis price formation dynamics without unmodeled structural breaks or data quality problems.

What would settle it

A sustained rise in out-of-sample errors on 2026 data or during a fresh market shock that the current regime splits do not anticipate would show the models fail to generalize.

Figures

Figures reproduced from arXiv: 2604.26634 by Dat Thanh Nguyen, Hoai Phuong Ha, My Thi Diem Phan, Trung Tuyen Truong.

Figure 1: Price overview across all five Norwegian bidding zones (NO1–NO5), 2019–2025.
Figure 2: Model comparison across all zones (test set, 2025).
Figure 3: Diebold-Mariano pairwise test results (HLN-corrected, one-sided, …
Figure 4: Walk-forward rolling-origin backtest for NO1 (52 weekly steps, 2025).
Figure 5: Feature group ablation across NO1–NO5. (A) LOGO ΔMAE relative to the full model. (B) LOGO ΔR² relative to the full model. (C) Lags-plus-one-group ΔMAE relative to the full model. (D) Lags-plus-one-group ΔR² relative to the full model.
… the southern zones, whereas forecast errors are slightly higher under high-reservoir conditions in NO3 and NO4. Panel B shows that high TTF prices consistently increase sMAPE …
Figure 6: Conditional regime analysis across the five Norwegian bidding zones.
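The Diebold-Mariano test with the Harvey-Leybourne-Newbold (HLN) correction named in the Figure 3 caption can be sketched as below. Several choices here are assumptions rather than the paper's settings: absolute error as the loss, a forecast horizon `h` of 1 by default, and a standard normal approximation for the p-value (HLN recommend comparing against a Student t distribution with n - 1 degrees of freedom, which the Python standard library does not provide).

```python
from math import sqrt
from statistics import NormalDist, mean
from typing import List, Tuple


def dm_hln(errors_a: List[float], errors_b: List[float],
           h: int = 1) -> Tuple[float, float]:
    """HLN-corrected Diebold-Mariano statistic and one-sided p-value.

    Alternative hypothesis: model A has lower expected absolute error
    than model B, so strongly negative statistics favour A.
    """
    d = [abs(a) - abs(b) for a, b in zip(errors_a, errors_b)]
    n = len(d)
    d_bar = mean(d)

    def autocov(k: int) -> float:
        return sum((d[t] - d_bar) * (d[t - k] - d_bar)
                   for t in range(k, n)) / n

    # Long-run variance of the loss differential with h - 1 autocovariances.
    # It can be non-positive in small samples; this sketch does not guard it.
    long_run_var = autocov(0) + 2 * sum(autocov(k) for k in range(1, h))
    dm = d_bar / sqrt(long_run_var / n)

    # HLN small-sample correction factor.
    hln = sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    stat = hln * dm
    p_value = NormalDist().cdf(stat)  # normal approximation (assumption)
    return stat, p_value


# Toy forecast errors (fabricated for illustration only).
errors_a = [1.0, -1.0, 2.0, -1.0, 1.0, -2.0, 1.0, -1.0]
errors_b = [3.0, -2.0, 4.0, -4.0, 2.0, -5.0, 3.0, -2.0]
stat, p_value = dm_hln(errors_a, errors_b)
print(stat, p_value)
```

Swapping the two error series flips the sign of the statistic, so a pairwise table such as the one in Figure 3 needs one test per ordered pair of models.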
Original abstract

Norway's electricity market is heavily dominated by hydropower, but the 2021–2022 energy crisis and stronger integration with Continental Europe have fundamentally altered price formation, reducing the reliability of forecasting models calibrated on historical data. Despite the critical need for updated models, a unified benchmark evaluating feature contributions across all structurally diverse Norwegian bidding zones remains lacking. Here we present a comprehensive evaluation of electricity price forecasting across all five Norwegian Nord Pool bidding zones. We constructed a multimodal hourly dataset spanning 2019–2025 and evaluated eight forecasting model families including LightGBM, ARX, and advanced deep learning architectures using a strictly causal test set. We implemented robust rolling-origin backtesting, leave-one-group-out feature ablation, and conditional regime analysis to dissect model performance and feature utility. Our results show that LightGBM achieves the best performance in every zone with MAE ranging from 1.64 to 5.74 EUR/MWh, while the ridge ARX model remains a highly competitive linear benchmark in northern zones. Feature ablation reveals that models relying solely on lagged prices and calendar variables achieve high accuracy and often match or exceed full multimodal integration. However, conditional regime analysis demonstrates that external features like reservoir levels and gas prices remain crucial to stratify forecast errors, which consistently increase under stressed market regimes. This highlights the practical value of model interpretability and regime awareness for decision makers facing structural changes in market dynamics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript presents a comprehensive empirical evaluation of electricity price forecasting across Norway's five Nord Pool bidding zones using a multimodal hourly dataset spanning 2019-2025. It compares eight model families (including LightGBM, ridge ARX, and deep learning architectures) via strictly causal rolling-origin backtesting, leave-one-group-out feature ablation, and conditional regime analysis, claiming that LightGBM achieves the best performance in every zone (MAE 1.64-5.74 EUR/MWh), that lagged prices plus calendar variables are often sufficient, and that external features (e.g., reservoir levels, gas prices) are primarily useful for stratifying forecast errors under stressed regimes.

Significance. If the central results hold, the work supplies a timely post-crisis benchmark for a hydropower-dominated market undergoing structural integration with Continental Europe. The combination of causal backtesting, systematic ablation, and regime-stratified error analysis provides actionable guidance on feature selection and model robustness for practitioners. The construction of the 2019-2025 multimodal dataset and the emphasis on interpretability under varying market conditions are clear strengths.

major comments (1)
  1. §3.2 (rolling-origin backtesting): The procedure applies rolling-origin evaluation over the entire 2019-2025 span without restricting training windows to post-2022 observations for later test periods. This risks mixing pre-crisis and post-crisis regimes, which directly undermines the claims that results isolate post-crisis dynamics and that external features are needed only for error stratification rather than core prediction.
minor comments (2)
  1. Abstract: The MAE range 1.64-5.74 EUR/MWh is reported without per-zone attribution or accompanying variability measures, reducing interpretability of the 'best performance' claim.
  2. Results: Feature ablation and regime analysis would be strengthened by explicit tables or figures showing per-zone performance deltas and error distributions under each regime.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the single major comment below and indicate the planned revisions.

Point-by-point responses
  1. Referee: §3.2 (rolling-origin backtesting): The procedure applies rolling-origin evaluation over the entire 2019-2025 span without restricting training windows to post-2022 observations for later test periods. This risks mixing pre-crisis and post-crisis regimes, which directly undermines the claims that results isolate post-crisis dynamics and that external features are needed only for error stratification rather than core prediction.

    Authors: We appreciate the referee's observation on the backtesting design. Our rolling-origin procedure follows standard practice for causal time-series evaluation by retraining models on all data available up to each forecast origin. This reflects operational reality in which forecasters retain historical observations rather than discarding pre-crisis data. The regime-stratified error analysis already isolates performance differences under stressed versus normal conditions. Nevertheless, we acknowledge that the current setup does not fully isolate post-2022 dynamics. In the revised manuscript we will add a sensitivity experiment that restricts training windows to post-2022 observations for all test periods after 2022 and reports the resulting MAE, feature-ablation outcomes, and regime-stratified errors. This addition will directly test whether pre-crisis data materially alters the conclusions on model ranking and external-feature utility.

    Revision: partial
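The sensitivity experiment proposed in the rebuttal amounts to adding a lower bound on the training window. A minimal sketch follows; the function name `training_timestamps`, the daily toy index, and the exact cutoff of 1 January 2023 for "post-2022" are assumptions, since neither the referee nor the authors specify a date.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def training_timestamps(timestamps: List[datetime],
                        origin: datetime,
                        train_start: Optional[datetime] = None) -> List[datetime]:
    """Timestamps usable for fitting a forecast issued at `origin`.

    With `train_start=None` this is the expanding window that uses all
    history; with a cutoff it keeps only observations from the cutoff on.
    Observations at or after `origin` are always excluded.
    """
    return [t for t in timestamps
            if t < origin and (train_start is None or t >= train_start)]


# Toy daily index covering 2019-01-01 to 2025-12-31 (the paper is hourly).
timestamps = [datetime(2019, 1, 1) + timedelta(days=i) for i in range(2557)]
origin = datetime(2025, 1, 1)
cutoff = datetime(2023, 1, 1)  # assumed start of the post-crisis window

full_history = training_timestamps(timestamps, origin)
post_crisis = training_timestamps(timestamps, origin, train_start=cutoff)
print(len(full_history), len(post_crisis))
```

Running the same model and metric on both windows, origin by origin, would show whether the pre-crisis observations change the model ranking or the ablation results.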

Circularity Check

0 steps flagged

No circularity: purely empirical model comparison on held-out data

Full rationale

The paper reports an empirical evaluation of eight model families (LightGBM, ARX, deep learning) on a 2019-2025 multimodal dataset for Norwegian electricity prices. It uses standard rolling-origin backtesting, leave-one-group-out ablation, and regime-stratified error analysis to compare MAE and feature utility. No mathematical derivation chain, first-principles predictions, or fitted parameters renamed as out-of-sample results exist; all reported metrics are computed directly on strictly causal held-out test windows. The central claims (LightGBM best across zones, lagged prices+calendar often sufficient) are therefore falsifiable against external benchmarks and do not reduce to the inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

Claims rest on the representativeness of the 2019-2025 multimodal dataset for post-crisis dynamics and standard ML assumptions that historical patterns plus external features enable causal forecasting.

free parameters (1)
  • Hyperparameters of LightGBM, ARX, and deep learning models
    Fitted during training on the dataset; not specified in abstract but central to reported performance.
axioms (2)
  • domain assumption: Electricity prices follow patterns predictable from lagged values, calendars, and external features under causal evaluation
    Core premise enabling all model comparisons and ablation studies.
  • domain assumption: The 2019-2025 period sufficiently represents post-crisis market regimes without major unaccounted shifts
    Basis for dataset construction and regime-stratified error analysis.

pith-pipeline@v0.9.0 · 5569 in / 1507 out tokens · 53399 ms · 2026-05-07T13:35:49.148231+00:00 · methodology

