Night-Window Batching versus Carbon-Aware Scheduling for Clinical AI GPU Workloads

Nishi Doshi; Shrey Shah

arxiv: 2606.01766 · v1 · pith:UC63R647new · submitted 2026-06-01 · 💻 cs.DC · cs.ET

Pith reviewed 2026-06-28 12:56 UTC · model grok-4.3

keywords: GPU scheduling, carbon-aware computing, clinical AI, night batching, deadline compliance, simulation study, hospital workloads, urgency tiers

The pith

Night-window batching closes 78% of the carbon gap to a mixed urgency-carbon scheduler while missing fewer urgent deadlines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper uses computer simulation to compare thirteen scheduling rules for GPU workloads on mixed hardware with synthetic patient-style jobs, urgency tiers, and time-of-day carbon traces. It tests whether a simple overnight batching rule for non-urgent jobs performs nearly as well as a richer rule that mixes urgency and carbon at weight 0.45. The overnight rule captures most of the modeled carbon reduction while better protecting urgent jobs from missing deadlines. This matters because hospitals run increasing amounts of machine learning on GPUs and want to lower electricity emissions without compromising time-sensitive clinical tasks. All reported percentages are simulator queue statistics, not clinical outcomes.

Core claim

On an eight-GPU baseline, the overnight rule closes about 78% of the carbon gap between urgency-only and CUCA0.45 while missing fewer urgent deadlines than either. At 48 jobs per hour the carbon footprints nearly tie, yet the overnight rule still misses fewer urgent deadlines. CarbonShift lets about 46% of the most urgent jobs miss their deadline. A geography test where regions share one daily carbon shape with only timezone shifts trims under one percentage point of average carbon. A twelve-hour routine window saves a little carbon for CUCA0.45 but raises overall missed deadlines.
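
The gap-closure figure can be read as a ratio of carbon differences. The sketch below shows that arithmetic under one natural reading: the carbon saved by the overnight rule relative to urgency-only, divided by the carbon saved by CUCA0.45 relative to urgency-only. The function name `gap_closed` and all input values are hypothetical; they are not the paper's data.

```python
def gap_closed(baseline_kg: float, reference_kg: float, candidate_kg: float) -> float:
    """Fraction of the baseline-to-reference carbon gap that the candidate closes."""
    gap = baseline_kg - reference_kg
    if gap == 0:
        raise ValueError("baseline and reference emit the same carbon; the gap is undefined")
    return (baseline_kg - candidate_kg) / gap


# Illustrative values only (hypothetical, NOT taken from the paper).
urgency_only_kg = 100.0   # baseline policy
cuca_kg = 90.0            # reference policy
night_window_kg = 92.2    # candidate policy

fraction = gap_closed(urgency_only_kg, cuca_kg, night_window_kg)
print(f"gap closed: {fraction:.0%}")
```

Under this reading, a value near 1 means the candidate roughly ties the reference on carbon, and a value above 1 means it emits less than the reference.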

What carries the argument

Night-window batching: a rule that defers non-urgent jobs to overnight periods of lower grid carbon intensity. The paper compares its performance against CUCA0.45 and urgency-only policies.
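
A deferral rule of this kind can be sketched in a few lines. The version below is a minimal illustration, not the paper's implementation: the 22:00 to 06:00 window, the `Job` fields, and the fallback that runs a non-urgent job immediately when waiting would break its deadline are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed window bounds; the source does not state the hours used.
NIGHT_START_HOUR = 22
NIGHT_END_HOUR = 6


@dataclass
class Job:
    job_id: str
    urgent: bool
    arrival_hour: float   # hours since midnight of day 0
    deadline_hour: float  # same clock as arrival_hour
    runtime_hours: float


def in_night_window(hour: float) -> bool:
    """True when the clock hour falls inside the overnight window."""
    h = hour % 24
    return h >= NIGHT_START_HOUR or h < NIGHT_END_HOUR


def next_window_start(hour: float) -> float:
    """Earliest time at or after `hour` that lies inside the overnight window."""
    if in_night_window(hour):
        return hour
    return hour - (hour % 24) + NIGHT_START_HOUR


def earliest_start(job: Job) -> float:
    """Urgent jobs are eligible at arrival; non-urgent jobs wait for the window."""
    if job.urgent:
        return job.arrival_hour
    deferred = next_window_start(job.arrival_hour)
    latest_feasible = job.deadline_hour - job.runtime_hours
    # Assumed safeguard: do not defer past the point where the deadline is lost.
    return deferred if deferred <= latest_feasible else job.arrival_hour


print(earliest_start(Job("routine-scan", False, 10.0, 40.0, 2.0)))
print(earliest_start(Job("stat-read", True, 10.0, 11.0, 0.5)))
```

The rule needs only a clock, which is why it can run without a live carbon-intensity feed.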

If this is right

Carbon-only rules such as CarbonShift cause about 46% of the most urgent jobs to miss deadlines.
At 48 jobs per hour the carbon footprints of the overnight rule and CUCA0.45 nearly tie while the overnight rule misses fewer urgent deadlines.
A geography test with shared daily carbon shapes but timezone shifts changes average carbon by under one percentage point.
A twelve-hour routine window improves carbon for CUCA0.45 but increases overall missed deadlines.
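
The geography test lends itself to a small illustration. If regions share one daily carbon shape and differ only by a timezone offset, each regional trace is a circular shift of the same 24 values, so its daily mean is unchanged; only the alignment between load and low-carbon hours moves. The sketch below assumes a sinusoidal shape and invented magnitudes purely for illustration; none of the numbers come from the paper.

```python
import math


def daily_shape(base: float = 400.0, swing: float = 100.0) -> list[float]:
    """Hypothetical hourly carbon intensity (gCO2e/kWh), peaking at 18:00."""
    return [base + swing * math.cos(2 * math.pi * (h - 18) / 24) for h in range(24)]


def shift_trace(trace: list[float], offset_hours: int) -> list[float]:
    """Circularly shift a daily trace to model a timezone offset."""
    n = len(trace)
    return [trace[(h + offset_hours) % n] for h in range(n)]


def load_weighted_intensity(trace: list[float], load: list[float]) -> float:
    """Average intensity seen by a load profile (same hourly grid as the trace)."""
    return sum(c * l for c, l in zip(trace, load)) / sum(load)


home = daily_shape()
remote = shift_trace(home, 3)  # assumed 3-hour timezone offset

# A load that runs only overnight (22:00 to 06:00 local clock).
night_load = [1.0 if (h >= 22 or h < 6) else 0.0 for h in range(24)]

print(sum(home) / 24, sum(remote) / 24)  # daily means match
print(load_weighted_intensity(home, night_load),
      load_weighted_intensity(remote, night_load))  # alignment differs
```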

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Hospitals could achieve most modeled carbon reductions with a fixed overnight window without needing real-time carbon data feeds.
The same fixed-window approach may apply to other 24/7 computing environments where loads can shift to low-emission periods.
The wide run-to-run spread in the simulator suggests that real job-arrival variance could require adjustable window thresholds.

Load-bearing premise

The synthetic patient-style jobs, urgency tiers, time-of-day carbon traces, and mixed GPU hardware in the simulator accurately represent real clinical AI GPU workloads and their scheduling constraints.

What would settle it

Running the overnight batching rule, urgency-only policy, and CUCA0.45 side-by-side on actual hospital GPU job logs paired with real-time grid carbon intensity data would show whether the 78% gap closure and deadline improvements hold.
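
Such a replay would need a way to attribute emissions to each logged job. A minimal sketch, assuming an hourly carbon-intensity trace that repeats daily and a constant power draw per job (both simplifications, with hypothetical names and values), is:

```python
import math


def job_emissions_kg(start_hour: float, runtime_hours: float,
                     power_kw: float, intensity_g_per_kwh: list[float]) -> float:
    """Integrate constant power against an hourly intensity trace.

    `intensity_g_per_kwh` holds one value per clock hour and repeats daily.
    """
    grams = 0.0
    t = start_hour
    remaining = runtime_hours
    while remaining > 1e-12:
        hour_index = math.floor(t) % len(intensity_g_per_kwh)
        # Advance to the next hour boundary or the end of the job.
        step = min(remaining, math.floor(t) + 1 - t)
        grams += power_kw * step * intensity_g_per_kwh[hour_index]
        t += step
        remaining -= step
    return grams / 1000.0


# Hypothetical two-level trace: cleaner overnight, dirtier by day.
trace = [300.0 if (h >= 22 or h < 6) else 500.0 for h in range(24)]

print(job_emissions_kg(12.0, 2.0, 0.3, trace))  # daytime run
print(job_emissions_kg(23.0, 2.0, 0.3, trace))  # overnight run
```

Real logs would replace the constant `power_kw` with measured draw, and the repeating trace with timestamped grid data.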

Figures

Figures reproduced from arXiv: 2606.01766 by Nishi Doshi, Shrey Shah.

**Figure 1.** (a) Grid-mean kg CO2e vs. simulator critical-tier miss (%); (b)–(d) mean kg CO2e vs. arrival rate, carbon scenario, and critical fraction for UrgencyOnly, NightWindowDefer, and CUCA 0.45 (other factors averaged within each panel; see caption).

**Figure 2.** CUCA sweep. Gate-grid means: (a) kg CO2e vs. α; (b) simulator critical-tier miss rate vs. α.

[Extracted plot text, truncated in the source: panel "(a) Geo validation (ranked means)", axis "Mean total carbon (kg CO2e)".]

Original abstract

Hospitals run more machine learning on GPUs while the carbon footprint of grid electricity rises and falls through the day. Using a computer simulation, we compare $13$ scheduling rules on mixed GPU hardware, with synthetic patient-style jobs, urgency tiers, and time-of-day carbon traces. We do not study patient outcomes; every percentage we report is a simulator queue number, not a clinical finding. We ask whether running non-urgent jobs overnight is almost as good as a richer rule that mixes urgency and carbon (CUCA at weight 0.45, written CUCA$_{0.45}$). The comparison keeps carbon reduction secondary to clinical priority and deadline compliance, so each policy is judged on both average kg CO$_2$e and missed-deadline behavior. CarbonGreedy and CarbonShift are carbon-first stress tests that demonstrate how poorly wrong vendor presets can disrupt clinical priorities, and are not meant for production. Numbers are averages over many test settings, with wide run-to-run spread and no statistical adjustment, so headline ratios are exploratory. On an eight-GPU baseline, the overnight rule closes about $78\%$ of the carbon gap between urgency-only and CUCA$_{0.45}$ while missing fewer urgent deadlines than either. CarbonShift lets about $46\%$ of the most urgent jobs miss their deadline; this is simulated queueing, not bedside harm. At $48$ jobs per hour, the carbon footprints almost tie, yet the overnight rule still misses fewer urgent deadlines. A geography test, where regions share one daily carbon shape with only timezone shifts, trims under one percentage point of average carbon; a twelve-hour routine window saves a little carbon for CUCA$_{0.45}$ but raises overall missed deadlines. Overnight batching stays competitive on average modelled carbon; carbon-only rules belong only in stress tests.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Simulation shows a simple overnight batching rule captures most carbon savings of a mixed scheduler while missing fewer urgent deadlines, but the comparison lives entirely in unvalidated synthetic data.

The main thing here is the simulation result that overnight batching closes roughly 78% of the carbon gap to CUCA0.45 on an eight-GPU setup and still misses fewer urgent deadlines than either the urgency-only or the mixed rule. At 48 jobs per hour the carbon numbers nearly tie but the overnight rule keeps the edge on deadlines.

They run thirteen policies on synthetic patient-style jobs with urgency tiers, mixed GPU hardware, and time-of-day carbon traces. The work keeps clinical priority first and treats carbon reduction as secondary, which is the right framing. They also run stress tests with pure carbon rules to show how badly those break deadlines. The geography and window-length checks are small add-ons that don't change the picture much.

Nothing in the approach is new at the algorithmic level; batch windows and carbon-aware scheduling are established. The contribution is the narrow application to clinical AI workloads and the direct head-to-head on deadline compliance versus carbon.

The soft spot is the simulator itself. The abstract is upfront that the numbers are exploratory queue statistics with wide run-to-run spread and no statistical adjustment. There is no calibration to real hospital logs, no sensitivity analysis on job-size or deadline distributions, and no external validation. If the synthetic arrival process or carbon variability is off, the ranking between overnight batching and CUCA0.45 can move without any change to the policies.

This is for operations researchers or hospital IT teams who want a concrete policy comparison in simulation before building something real. It deserves a serious referee to check whether the authors can add validation data or at least fuller sensitivity sweeps, but the current version stays preliminary.

Referee Report

2 major / 1 minor

Summary. The paper uses discrete-event simulation to compare 13 scheduling policies (including urgency-only, CUCA0.45, overnight batching, and carbon-first stress tests) for synthetic clinical AI GPU jobs on mixed hardware with time-of-day carbon traces. It reports that, on an 8-GPU baseline, overnight batching closes ~78% of the carbon gap between urgency-only and CUCA0.45 while missing fewer urgent deadlines than either, and that at 48 jobs/h the carbon footprints nearly tie while overnight batching still misses fewer urgent deadlines. Results are framed as exploratory simulator queue statistics with wide run-to-run variance and no statistical adjustment.

Significance. If the synthetic workload assumptions hold, the comparison demonstrates that a simple overnight window can capture most modeled carbon savings of a richer urgency-carbon policy while preserving better deadline compliance for urgent jobs; the explicit stress tests for carbon-first rules and the geography/routine-window variants add useful boundary cases. The work's direct policy comparison and acknowledgment that all percentages are simulator outputs (not clinical findings) are strengths.

major comments (2)

[Results] Results section (headline 78% gap-closure claim): the quantitative ranking of overnight batching versus CUCA0.45 is produced entirely inside the simulator; the synthetic patient-style job generator, urgency tiers, deadline model, carbon-intensity traces, and mixed-GPU hardware are not calibrated against hospital logs or subjected to sensitivity sweeps on job-size or arrival distributions, which is load-bearing for the central comparative claim.
[Methods] Simulation setup (Methods): the abstract notes wide run-to-run spread and absence of statistical adjustment, yet the headline ratios are still presented as the primary finding; without confidence intervals or robustness checks on the free parameters (CUCA weight 0.45, 48 jobs/h arrival rate), the reported 78% figure and deadline-miss ordering remain sensitive to untested modeling choices.
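
One inexpensive way to attach intervals to the headline numbers is a percentile bootstrap over per-run results. The sketch below is a generic illustration of that technique, not something the paper reports: the per-seed carbon differences are invented, and the function name and defaults are assumptions.

```python
import random
import statistics


def bootstrap_ci(samples: list[float], n_resamples: int = 2000,
                 level: float = 0.95, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `samples`."""
    if not samples:
        raise ValueError("need at least one sample")
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(samples, k=len(samples)))
        for _ in range(n_resamples)
    )
    lo_idx = round((1 - level) / 2 * (n_resamples - 1))
    hi_idx = round((1 + level) / 2 * (n_resamples - 1))
    return means[lo_idx], means[hi_idx]


# Hypothetical paired differences per simulator seed:
# (overnight rule kg CO2e) minus (CUCA0.45 kg CO2e). NOT the paper's data.
diff_kg = [0.12, -0.05, 0.20, 0.08, -0.11, 0.15, 0.02, 0.09, -0.03, 0.17]

low, high = bootstrap_ci(diff_kg)
print(f"mean diff {statistics.fmean(diff_kg):.3f} kg, 95% CI [{low:.3f}, {high:.3f}]")
```

If the interval for a paired difference straddles zero, the ordering of the two policies on that metric is not resolved by the available runs.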

minor comments (1)

[Abstract] Abstract: the notation CUCA$_{0.45}$ is introduced without an earlier inline definition of the weighting scheme, which could be clarified on first use.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review. We agree the work is exploratory simulation under synthetic assumptions and will revise to further emphasize limitations, add sensitivity analysis where feasible, and clarify that headline figures are not robust statistical claims.

Referee: [Results] Results section (headline 78% gap-closure claim): the quantitative ranking of overnight batching versus CUCA0.45 is produced entirely inside the simulator; the synthetic patient-style job generator, urgency tiers, deadline model, carbon-intensity traces, and mixed-GPU hardware are not calibrated against hospital logs or subjected to sensitivity sweeps on job-size or arrival distributions, which is load-bearing for the central comparative claim.

Authors: We concur that the 78% figure and policy ranking rest entirely on uncalibrated synthetic parameters and are not validated against hospital logs. The manuscript already states that all reported percentages are simulator queue statistics, not clinical findings, and frames the study as exploratory. We will add an expanded limitations subsection explicitly noting the absence of real-data calibration and the load-bearing role of the synthetic generator. We will also run and report additional sensitivity sweeps on arrival rate and job-size distributions in the revision.

Revision: partial

Referee: [Methods] Simulation setup (Methods): the abstract notes wide run-to-run spread and absence of statistical adjustment, yet the headline ratios are still presented as the primary finding; without confidence intervals or robustness checks on the free parameters (CUCA weight 0.45, 48 jobs/h arrival rate), the reported 78% figure and deadline-miss ordering remain sensitive to untested modeling choices.

Authors: The abstract and main text already flag the wide run-to-run variance and lack of statistical adjustment, describing the ratios as exploratory. We will revise the results and methods sections to include explicit discussion of sensitivity to the listed free parameters (CUCA weight and arrival rate) and report ranges across additional runs. Full confidence intervals or formal robustness testing would require extending the simulation framework beyond the current scope; we can partially address this by documenting parameter sensitivity in the revision.

Revision: partial

Circularity Check

0 steps flagged

No significant circularity: direct simulation comparison with no fitted predictions or self-referential derivations

The paper performs a simulation study comparing 13 scheduling rules on synthetic workloads. All reported metrics (carbon footprints, missed deadlines, gap closures) are direct outputs from the simulator runs, with no mathematical derivations, parameter fitting, or predictions that reduce to inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The work explicitly labels results as exploratory simulator queue numbers rather than derived claims. This matches the default case of a self-contained empirical comparison without circular reduction.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

Abstract-only review; simulation parameters (CUCA weight 0.45, job arrival rate of 48 per hour, eight-GPU baseline, carbon trace shapes) are introduced without independent justification or external benchmarks.

free parameters (2)

CUCA weight 0.45
Blending weight between urgency and carbon chosen for the main comparison rule.
job arrival rate (48 per hour)
Rate used in one of the reported scenarios.
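
To make the role of the blending weight concrete, here is one possible form such a rule could take. The abstract does not give the CUCA formula, so everything below is an assumption for illustration: the convex combination, the choice that the weight multiplies the carbon term, the [0, 1] normalisation, and the name `blended_priority`.

```python
def blended_priority(urgency: float, carbon_intensity: float, alpha: float = 0.45) -> float:
    """Hypothetical urgency-carbon score; higher means run sooner.

    urgency:          0.0 (routine) to 1.0 (most urgent), assumed normalised.
    carbon_intensity: 0.0 (cleanest hour) to 1.0 (dirtiest hour), assumed normalised.
    alpha:            assumed weight on the carbon term; 0 ignores carbon entirely.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    carbon_benefit = 1.0 - carbon_intensity
    return (1.0 - alpha) * urgency + alpha * carbon_benefit


# A routine job in a clean hour versus an urgent job in a dirty hour.
routine_clean = blended_priority(urgency=0.2, carbon_intensity=0.1)
urgent_dirty = blended_priority(urgency=1.0, carbon_intensity=0.9)
print(routine_clean, urgent_dirty)

# Raising alpha far enough lets the carbon term outrank urgency.
print(blended_priority(0.2, 0.1, alpha=0.9), blended_priority(1.0, 0.9, alpha=0.9))
```

In this toy form, a large weight on carbon lets a routine job outrank an urgent one, which is the kind of inversion the carbon-first stress tests are meant to expose.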

axioms (1)

domain assumption: Synthetic jobs and time-of-day carbon traces sufficiently represent real hospital GPU workloads
Central modeling choice stated in abstract.

