arxiv: 2605.05270 · v1 · submitted 2026-05-06 · 📊 stat.ML · cs.LG · stat.AP

Forecasting Oncology Demand Trends with Boosting-Based Bayesian Conjugate Models

Ademir Batista dos Santos Neto, Paulo Renato Alves Firmino, Tiago Alessandro Espinola Ferreira

Pith reviewed 2026-05-08 16:47 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · stat.AP

keywords oncology demand forecasting, Bayesian conjugate models, residual boosting, trend detection, Poisson process, healthcare time series, Gamma prior, directional accuracy

The pith

A residual-boosting Bayesian conjugate model forecasts oncology demand trends more accurately than standard methods by tracking directional shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a Bayesian model for predicting weekly oncology appointments by treating demand as a Poisson process with a Gamma prior on the rate parameter. It adds a residual-based boosting layer using a Gamma-Log-Normal conjugate pair to adapt to changing trends over short and long periods while keeping updates analytically tractable. The approach was tested on real data from a Brazilian oncology service and shown to predict the direction of demand changes better than linear regression, ARIMA, naive methods, LSTM, and XGBoost, with improvements up to 38 percent in directional accuracy. Accurate forecasts matter because they support better planning of medical resources and staffing in healthcare settings.
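
The base layer described above can be sketched in a few lines. This is a minimal illustration of the standard Gamma-Poisson conjugate update only, not the authors' boosting layer; the class name `GammaPosterior`, the prior values, and the weekly counts are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaPosterior:
    """Gamma(shape, rate) belief about the weekly Poisson demand rate."""

    shape: float
    rate: float

    @property
    def mean(self) -> float:
        # Mean of a Gamma distribution in the shape-rate parameterization.
        return self.shape / self.rate

    def update(self, weekly_count: int) -> "GammaPosterior":
        # Conjugate update for one week of Poisson data:
        # the shape gains the observed count, the rate gains one exposure unit.
        return GammaPosterior(self.shape + weekly_count, self.rate + 1.0)


# Hypothetical prior (mean of 20 appointments per week) and hypothetical counts.
belief = GammaPosterior(shape=2.0, rate=0.1)
for count in [18, 25, 31]:
    belief = belief.update(count)

print(f"posterior shape={belief.shape}, rate={belief.rate:.1f}, mean={belief.mean:.2f}")
```

Because each update only adds to two numbers, new weekly data can be absorbed without refitting, which is the tractability property the summary refers to.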

Core claim

The authors establish that incorporating a residual-based boosting mechanism within a Gamma-Log-Normal conjugate Bayesian structure for Poisson demand rates allows the model to track both short- and long-term trend shifts in oncology service data, resulting in superior directional forecast accuracy relative to conventional and machine learning baselines on the evaluated Brazilian dataset.

What carries the argument

Residual-based boosting mechanism grounded in a Gamma-Log-Normal conjugate structure that iteratively corrects the demand rate prior to capture persistent directional patterns while preserving conjugate Bayesian tractability.

If this is right

More reliable predictions enable better scheduling and resource planning for oncology services.
The approach supports continuous updating as new weekly data arrives without retraining from scratch.
It provides a balance between adaptability to trend changes and avoidance of overfitting through the conjugate prior structure.
Directional accuracy gains suggest reduced errors in anticipating increases or decreases in patient volume.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be tested on other healthcare count series such as emergency visits or prescription demands.
Incorporating additional predictors like holidays or disease outbreaks might enhance performance further.
Its computational efficiency makes it practical for smaller hospitals lacking advanced computing resources.
Cross-validation on datasets from different countries would test the robustness beyond the Cariri region.

Load-bearing premise

The residual-based boosting mechanism grounded in the Gamma-Log-Normal conjugate structure can track both short- and long-term trend shifts without introducing bias or overfitting to the specific dataset.

What would settle it

If the proposed model shows lower directional accuracy than at least one baseline method when evaluated on a fresh collection of oncology appointment records not used in the original study, the claimed superiority would be falsified.

Figures

Figures reproduced from arXiv: 2605.05270 by Ademir Batista dos Santos Neto, Paulo Renato Alves Firmino, Tiago Alessandro Espinola Ferreira.

**Figure 1.** Gamma-Poisson Bayesian net to model demand count processes. The likelihood …

**Figure 2.** Gamma–Log-Normal Bayesian net graph of the multiplicative error with respect …

**Figure 3.** Boxplot of POCID across models: ARIMA, Linear Regression, Naïve forecasting, …

Original abstract

Accurate trend forecasting in healthcare time series is essential for planning and resource allocation. This paper proposes a Bayesian framework for predicting oncology demand trends, modeling weekly appointments as a Poisson process with a Gamma prior to the demand rate. To enhance adaptability and capture persistent directional patterns, we incorporate a residual-based boosting mechanism grounded in a Gamma-Log-Normal conjugate structure. This boosting approach allows the model to track both short- and long-term trend shifts while maintaining the analytical tractability of conjugate Bayesian updating. The methodology was evaluated on real oncology service data from Cariri, Ceara, Brazil, and compared against established baselines, including linear regression, ARIMA, naive forecasting, LSTM neural networks, and XGBoost. Results showed that the proposed model outperforms competing methods in trend detection accuracy, with gains in terms of percentage of correct direction of 38.25% in relation to the second best approach in some cases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper proposes a conjugate Bayesian boosting model for oncology forecasting but its performance claims lack supporting experimental details.

The paper proposes a Bayesian conjugate model for forecasting oncology demand, using a Poisson process with a Gamma prior and adding a residual-based boosting mechanism built on a Gamma-Log-Normal structure. The boosting layer is meant to capture trend shifts while keeping the updates analytical. What stands out is the attempt to combine boosting with exact conjugate Bayesian inference for count data. The authors test it on real weekly appointment data from Cariri, Brazil, and report that it outperforms linear regression, ARIMA, naive forecasting, LSTM, and XGBoost in detecting the correct direction of trends, with a gain of up to 38.25% over the second-best method in some cases.

The approach has some appeal for applied work because conjugate models allow fast updates without MCMC. The boosting part is intended to handle both short- and long-term patterns without losing that tractability.

However, the abstract provides almost no information on the experimental setup. There are no details on data splitting, how boosting hyperparameters were chosen, whether cross-validation was used, or any statistical significance tests. The single dataset also limits how far the results can be generalized.

On the technical side, I share the stress-test concern: adding residuals to the boosting step could easily disrupt the exact conjugacy of the Gamma-Log-Normal prior, forcing approximations that the paper claims to avoid. Without the full equations or code, it is hard to tell whether the authors achieved exact conjugacy or whether the tractability is only approximate.

This paper is aimed at people working on healthcare resource planning who want a Bayesian alternative to black-box ML for time series. A reader looking for new methods in applied forecasting might find the idea useful, but would need to implement and test it independently. It has enough of a novel angle to deserve peer review, though the methods section would need major revisions and the results more validation.

I would recommend sending it to referees with specific requests for the full model equations, the evaluation protocol, and additional datasets or simulations to check the conjugacy claim.

Referee Report

3 major / 2 minor

Summary. The paper proposes a Bayesian framework for forecasting oncology demand trends by modeling weekly appointments as a Poisson process with a Gamma prior on the demand rate. It augments this with a residual-based boosting mechanism grounded in a Gamma-Log-Normal conjugate structure to capture short- and long-term trend shifts while preserving analytical tractability of conjugate Bayesian updating. Evaluated on real oncology service data from Cariri, Ceara, Brazil, the model is claimed to outperform baselines including linear regression, ARIMA, naive forecasting, LSTM, and XGBoost in trend detection accuracy, with gains of up to 38.25% in the percentage of correct direction predictions relative to the second-best method.

Significance. If the conjugacy is exactly preserved under the boosting updates and the empirical gains are shown to be robust under proper evaluation controls, the approach could provide an efficient, closed-form Bayesian method for adaptive healthcare demand forecasting that combines interpretability with flexibility for trend shifts. This would be particularly useful for resource planning in oncology services where data are count-based and updates need to remain tractable.

major comments (3)

[Abstract / Model formulation] Abstract and model description: The central claim that the residual-based boosting 'maintains the analytical tractability of conjugate Bayesian updating' is load-bearing for the contribution, yet no explicit equations are provided showing how residuals (on rate or log-rate scale) are defined and incorporated so that the posterior remains exactly Gamma-Log-Normal after each boosting iteration. If the residual step requires approximation or iterative numerical adjustment, the claimed closed-form property fails and the 38.25% gain may reflect bias rather than genuine improvement.
[Evaluation / Results] Evaluation section: The reported outperformance (38.25% gain in correct direction) is presented without any description of the data partitioning protocol, train/test split sizes, number of time periods in the Cariri dataset, cross-validation procedure, or whether boosting hyperparameters (learning rate, number of iterations) were tuned on held-out data. This omission leaves the headline empirical result only partially supported and vulnerable to overfitting or selection bias.
[Results] Results comparison: No error bars, confidence intervals, or statistical significance tests are reported for the direction-accuracy metric against baselines (ARIMA, LSTM, XGBoost). Without these, it is impossible to determine whether the observed gains are reliable or could arise from random variation on a single regional dataset.

minor comments (2)

[Abstract] The phrase 'percentage of correct direction' should be formally defined (e.g., sign of predicted change matching sign of observed change) and the exact formula given, as it is the primary performance metric.
[Methodology] Clarify whether the Poisson-Gamma base model is updated exactly at each time step or whether the boosting residuals are applied in a batch manner; this affects the claimed online adaptability.
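
The first minor comment asks for a formal definition of the direction metric. The sketch below encodes the definition suggested in that comment (sign of predicted change matching sign of observed change). It is an assumption about the metric, since the paper's exact formula is not reproduced here; the function name `pocid` and the choice to count flat steps as misses are illustrative.

```python
from typing import Sequence


def pocid(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Percentage of steps where predicted and observed changes share a sign."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) < 2:
        raise ValueError("need at least two time steps")
    hits = 0
    for t in range(1, len(actual)):
        observed_change = actual[t] - actual[t - 1]
        predicted_change = predicted[t] - predicted[t - 1]
        # A strictly positive product means both changes have the same sign.
        # Assumption: steps with zero change on either side count as misses.
        if observed_change * predicted_change > 0:
            hits += 1
    return 100.0 * hits / (len(actual) - 1)


# Hypothetical weekly counts and forecasts, for illustration only.
weekly_actual = [10, 12, 11, 15]
print(pocid(weekly_actual, [10, 13, 12, 14]))  # every direction matches
print(pocid(weekly_actual, [10, 9, 12, 14]))   # only the last step matches
```

Some variants compare the forecast with the previous observed value rather than the previous forecast, which is one reason an explicit formula in the paper would help.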

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major comment below and indicate the specific revisions planned for the manuscript.

Referee: [Abstract / Model formulation] Abstract and model description: The central claim that the residual-based boosting 'maintains the analytical tractability of conjugate Bayesian updating' is load-bearing for the contribution, yet no explicit equations are provided showing how residuals (on rate or log-rate scale) are defined and incorporated so that the posterior remains exactly Gamma-Log-Normal after each boosting iteration. If the residual step requires approximation or iterative numerical adjustment, the claimed closed-form property fails and the 38.25% gain may reflect bias rather than genuine improvement.

Authors: We agree that explicit equations are required to substantiate the conjugacy preservation claim. In the revised manuscript we will insert a dedicated subsection (new Section 3.3) that defines the residuals explicitly on the log-rate scale and derives the exact multiplicative update rules. The derivation will show that each boosting step maps the current Gamma-Log-Normal posterior to a new Gamma-Log-Normal posterior of the same functional form, thereby retaining closed-form Bayesian updating without numerical approximation or iterative adjustment.

Revision: yes

Referee: [Evaluation / Results] Evaluation section: The reported outperformance (38.25% gain in correct direction) is presented without any description of the data partitioning protocol, train/test split sizes, number of time periods in the Cariri dataset, cross-validation procedure, or whether boosting hyperparameters (learning rate, number of iterations) were tuned on held-out data. This omission leaves the headline empirical result only partially supported and vulnerable to overfitting or selection bias.

Authors: We acknowledge that the experimental protocol was described too briefly. The revised paper will add a new subsection (Section 4.1) that specifies: (i) the chronological train/test partitioning used to respect temporal order, (ii) the exact number of weeks in the Cariri dataset and the resulting split sizes, (iii) the cross-validation scheme applied exclusively to the training portion, and (iv) confirmation that boosting hyperparameters were selected via grid search on held-out training folds only, with final evaluation performed on the untouched test set.

Revision: yes

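
A protocol of the kind promised in this response could look like the following sketch. It is one possible reading, assuming an expanding-window scheme; the function name `expanding_window_splits`, the series length of 100 weeks, and the fold sizes are all hypothetical, since the actual dataset size and split are not stated.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    n_obs: int, n_folds: int, horizon: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, valid_idx) pairs where validation always follows training."""
    first_cut = n_obs - n_folds * horizon
    if first_cut < 1:
        raise ValueError("not enough observations for the requested folds")
    for k in range(n_folds):
        cut = first_cut + k * horizon
        # Training grows with each fold; validation is the next `horizon` weeks.
        yield list(range(cut)), list(range(cut, cut + horizon))


# Hypothetical sizes: 100 weeks in total, the last 20 kept as an untouched test set.
n_weeks, n_test = 100, 20
train_weeks = n_weeks - n_test
folds = list(expanding_window_splits(train_weeks, n_folds=4, horizon=10))
test_idx = list(range(train_weeks, n_weeks))

for train_idx, valid_idx in folds:
    print(f"train 0..{train_idx[-1]}, validate {valid_idx[0]}..{valid_idx[-1]}")
```

Hyperparameters would be chosen using the validation folds only, and the test indices would be scored once at the end. scikit-learn's `TimeSeriesSplit` provides a comparable scheme for readers who prefer a library implementation.
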
Referee: [Results] Results comparison: No error bars, confidence intervals, or statistical significance tests are reported for the direction-accuracy metric against baselines (ARIMA, LSTM, XGBoost). Without these, it is impossible to determine whether the observed gains are reliable or could arise from random variation on a single regional dataset.

Authors: We will strengthen the Results section by adding bootstrap-derived error bars and 95% confidence intervals for the direction-accuracy metric across all methods. In addition, we will report the results of a paired statistical test (McNemar’s test for binary direction predictions) to assess whether the observed improvements over the second-best baseline are statistically significant. These additions will be included in the revised Tables 2 and 3 and accompanying text.

Revision: yes
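
The two checks named in this response can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not the authors' analysis: it uses the exact (binomial) form of McNemar's test and a plain i.i.d. percentile bootstrap, which ignores serial dependence between weeks. The function names and the example outcomes are hypothetical.

```python
import random
from math import comb
from typing import Sequence, Tuple


def mcnemar_exact(hits_a: Sequence[bool], hits_b: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value from paired correct/incorrect outcomes."""
    if len(hits_a) != len(hits_b):
        raise ValueError("both models must be scored on the same time steps")
    # Only discordant pairs (one model right, the other wrong) carry information.
    only_a = sum(1 for a, b in zip(hits_a, hits_b) if a and not b)
    only_b = sum(1 for a, b in zip(hits_a, hits_b) if b and not a)
    n = only_a + only_b
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def bootstrap_ci(
    hits: Sequence[bool], n_resamples: int = 2000, level: float = 0.95, seed: int = 0
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the proportion of correct directions."""
    rng = random.Random(seed)
    data = list(hits)
    n = len(data)
    rates = []
    for _ in range(n_resamples):
        # Assumption: resampling individual steps, i.e. treating them as independent.
        rates.append(sum(rng.choice(data) for _ in range(n)) / n)
    rates.sort()
    lo = rates[int((1 - level) / 2 * n_resamples)]
    hi = rates[int((1 + level) / 2 * n_resamples) - 1]
    return lo, hi


# Hypothetical per-week direction outcomes for two models.
model_a = [True, True, True, True, True, False, True, False]
model_b = [False, False, False, False, False, False, True, False]
print(mcnemar_exact(model_a, model_b))
print(bootstrap_ci(model_a))
```

For weekly series with autocorrelated errors, a block bootstrap may be the more defensible choice; the rebuttal does not say which variant is planned.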

Circularity Check

0 steps flagged

No circularity: proposed conjugate boosting model evaluated empirically against baselines

The paper introduces a new residual-based boosting mechanism on a Gamma-Log-Normal conjugate structure for Poisson demand modeling, then reports empirical outperformance on held-out oncology appointment data from Cariri against linear regression, ARIMA, naive, LSTM, and XGBoost baselines. No equations or self-citations are present that define the boosting residuals in terms of the target predictions, rename a fitted quantity as a forecast, or invoke an author-specific uniqueness theorem to force the architecture. The claimed tractability and trend-direction gains are presented as consequences of the model choice rather than tautological re-expressions of the training fit, leaving the derivation self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on standard conjugate properties and a domain assumption about Poisson arrivals, plus an ad-hoc boosting mechanism whose parameters are not detailed in the abstract.

free parameters (1)

Boosting hyperparameters (learning rate, iterations)
Residual boosting requires tuning parameters to control correction size and number of steps, which are typically fitted or chosen on data.

axioms (2)

domain assumption: Weekly oncology appointments follow a Poisson process.
Explicitly stated as the base model for demand counts.
standard math: Gamma prior is conjugate to Poisson likelihood.
Invoked to ensure analytical tractability of Bayesian updates.
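
For reference, the standard-math axiom can be written out as a short derivation. The symbols $\alpha$, $\beta$ (shape and rate of the prior), $\lambda$ (weekly demand rate), and $y_t$ (count in week $t$) are notation chosen here and may differ from the paper's.

```latex
\begin{align*}
y_t \mid \lambda &\sim \mathrm{Poisson}(\lambda), \qquad
\lambda \sim \mathrm{Gamma}(\alpha, \beta) \\
p(\lambda \mid y_{1:n})
  &\propto \Bigl[\prod_{t=1}^{n} \lambda^{y_t} e^{-\lambda}\Bigr]\,
     \lambda^{\alpha - 1} e^{-\beta \lambda}
   = \lambda^{\alpha + \sum_{t=1}^{n} y_t - 1}\, e^{-(\beta + n)\lambda} \\
\lambda \mid y_{1:n} &\sim \mathrm{Gamma}\Bigl(\alpha + \sum_{t=1}^{n} y_t,\; \beta + n\Bigr)
\end{align*}
```

The posterior has the same Gamma form as the prior, which is what makes the base update closed-form. Whether the same holds after each boosting step is the separate question raised in the referee report.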

