Greening AI Inference with Accuracy and Latency-aware User Incentives

Adamantia Stamou; George D. Stamoulis; Konstantinos Varsos; Ramin Khalili; Vasilios A. Siris

arxiv: 2605.27309 · v1 · pith:PWHGWTCRnew · submitted 2026-05-26 · 💻 cs.LG · cs.OH

Pith reviewed 2026-06-29 19:00 UTC · model grok-4.3

keywords: AI inference, carbon emissions, user incentives, quality of experience, two-tier subscription, environmental sustainability, latency, accuracy

The pith

A framework designs AI inference incentives using user valuations for quality, latency, and environmental consciousness to enable two-tier subscriptions that cut carbon emissions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces a framework for designing AI inference incentives based on the users' valuation for inference quality and latency, together with their environmental consciousness, while accounting for the tradeoff between carbon emissions and the two QoE parameters. The approach accommodates different tradeoffs that depend on the size and complexity of the AI models and the allocation of resources to serve inference requests. Incentives are offered through a practical two-tier service subscription that gives users a discount in exchange for reduced carbon emissions. The discounted option allows the AI provider flexibility to serve some percentage of requests at lower quality and higher latency during high carbon intensity periods. A sympathetic reader would care because it provides a market-based mechanism to address the major contributor to AI's environmental impact without requiring uniform performance sacrifices.

Core claim

The paper claims that incentives for AI inference can be designed based on users' valuation for inference quality and latency, together with their environmental consciousness, while accounting for the tradeoff between carbon emissions and the two QoE parameters. This framework can be implemented via a two-tier service subscription offering users a discount in exchange for reduced carbon emissions, giving the provider flexibility to serve some percentage of inference requests at a lower quality and higher latency during periods of high carbon intensity. The approach accommodates different tradeoffs depending on model size, complexity, and resource allocation.

What carries the argument

The two-tier subscription model offering discounts for allowing reduced quality and increased latency during high carbon intensity periods.
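
As a rough illustration of how such a mechanism might behave at serving time, the sketch below routes requests for one period. It is not the paper's algorithm: the tier names `"standard"` and `"green"`, the `route` function, the intensity threshold, and the `max_degraded_share` parameter are all hypothetical, and the only idea taken from the source is that some percentage of discounted-tier requests may be served at reduced quality and higher latency when carbon intensity is high.

```python
from dataclasses import dataclass


@dataclass
class Request:
    user_tier: str  # "standard" or "green" (hypothetical tier names)


def route(requests, carbon_intensity, threshold, max_degraded_share):
    """Return one service level ("full" or "reduced") per request.

    Assumption: only discounted ("green") tier requests may be degraded,
    and only while carbon_intensity exceeds the threshold.
    """
    green_total = sum(1 for r in requests if r.user_tier == "green")
    if carbon_intensity > threshold:
        # Cap the number of degraded requests at the agreed share.
        budget = int(max_degraded_share * green_total)
    else:
        budget = 0

    levels = []
    for r in requests:
        if r.user_tier == "green" and budget > 0:
            levels.append("reduced")  # e.g. a smaller model or fewer resources
            budget -= 1
        else:
            levels.append("full")
    return levels
```

In this sketch the standard tier is never degraded, which reflects the claim that the scheme avoids uniform performance sacrifices.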

If this is right

Providers gain flexibility to allocate resources differently based on carbon intensity without losing all users.
The framework works across varying model sizes and complexities by adjusting the quality-latency tradeoff.
Users with stronger environmental consciousness are positioned to select the discounted tier.
Carbon emissions from AI inference can be lowered through voluntary user choices rather than uniform restrictions.
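
The third implication can be made concrete with a toy opt-in rule. The linear form below is an assumption made here for illustration, not a model from the paper: a user accepts the discounted tier when the discount plus the value they place on the carbon saved outweighs what they lose in quality and latency. All parameter names are hypothetical.

```python
def chooses_discounted_tier(discount, quality_loss, extra_latency, carbon_saved,
                            v_quality, v_latency, v_green):
    """Toy linear opt-in rule (illustrative assumption, not the paper's model).

    v_quality, v_latency and v_green are the user's per-unit valuations of
    quality, latency and avoided carbon, all expressed in the same currency.
    """
    disutility = v_quality * quality_loss + v_latency * extra_latency
    benefit = discount + v_green * carbon_saved
    return benefit >= disutility


# Two users who differ only in environmental consciousness.
offer = dict(discount=2.0, quality_loss=0.05, extra_latency=0.5, carbon_saved=1.0)
less_green = chooses_discounted_tier(**offer, v_quality=40.0, v_latency=2.0, v_green=0.5)
more_green = chooses_discounted_tier(**offer, v_quality=40.0, v_latency=2.0, v_green=1.5)
```

With these made-up numbers the same offer is declined by the first user and accepted by the second, so a higher `v_green` is what tips the decision.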

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Real-world testing could reveal whether measurable valuations translate into subscription uptake at scale.
The model might extend to other resource-constrained services where environmental costs vary over time.
If valuations prove hard to elicit, hybrid approaches combining surveys with observed behavior could be needed.

Load-bearing premise

User valuations for quality, latency, and environmental consciousness exist in measurable forms that can be used to design effective two-tier subscriptions.

What would settle it

A deployment measuring whether users opt into the discounted tier at rates that produce a statistically significant drop in total carbon emissions from inference while maintaining acceptable overall user retention.
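
One way such a deployment could be analysed is a one-sided permutation test on daily emissions before and after the discounted tier is introduced. This is a sketch of a generic statistical check, not a procedure from the paper; the function name, the choice of test, and the use of daily totals are assumptions.

```python
import random


def permutation_p_value(before, after, n_perm=10000, seed=0):
    """One-sided permutation test: is mean(before) greater than mean(after)?

    Returns an estimated p-value; small values suggest the observed drop
    is unlikely under random relabelling of the days.
    """
    rng = random.Random(seed)
    n_before = len(before)
    observed = sum(before) / n_before - sum(after) / len(after)
    pooled = list(before) + list(after)
    n_after = len(pooled) - n_before

    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_before]) / n_before - sum(pooled[n_before:]) / n_after
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimate strictly positive.
    return (count + 1) / (n_perm + 1)
```

A real evaluation would also need to control for demand changes and track retention, which this sketch does not attempt.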

Figures

Figures reproduced from arXiv: 2605.27309 by Adamantia Stamou, George D. Stamoulis, Konstantinos Varsos, Ramin Khalili, Vasilios A. Siris.

**Figure 3.** Carbon reduction and corresponding inference accuracy and latency

**Figure 4.** Incentives for different days of a month to cap the total carbon

Original abstract

The widespread use of AI services has raised concerns for its environmental sustainability, towards which recent studies have identified carbon emissions of AI inference as the major contributor. This paper introduces a framework for designing AI inference incentives based on the users' valuation for inference quality and latency, together with their environmental consciousness, while accounting for the tradeoff between carbon emissions and the two QoE parameters. Our approach can accommodate different tradeoffs, that depend on the size and complexity of the AI models and the allocation of resources to serve inference requests. The incentives can be offered through a practical two-tier service subscription that offers users a discount in exchange for reduced carbon emissions. The discounted service option gives the AI provider the flexibility to serve some percentage of inference requests at a lower quality and higher latency during periods of high carbon intensity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper sketches a two-tier subscription scheme to cut AI inference emissions via user discounts for lower quality or higher latency, but supplies no models or evidence for the required user valuations.

The paper's main contribution is a framework that lets AI providers offer discounted subscriptions in exchange for the right to serve some requests at reduced quality or increased latency when carbon intensity is high, incorporating users' valuations for those QoE factors plus their environmental awareness.

It correctly flags inference as the dominant emissions source and notes that tradeoffs vary with model size and resource allocation. The two-tier subscription is a concrete, deployable mechanism rather than a purely technical fix.

The soft spots are central. The whole approach rests on users having measurable valuations over quality, latency, and carbon consciousness that can be elicited and turned into incentive-compatible parameters. No utility model, elicitation procedure, or analysis of revenue neutrality or heterogeneity appears. Without those pieces the claimed flexibility to shift load cannot be checked or implemented. The abstract states that different tradeoffs can be accommodated but gives no derivations or examples.

This is for researchers working on economic mechanisms for sustainable computing services. A reader wanting a worked model, simulations, or empirical grounding will not find it.

I would not recommend sending this to peer review until the valuation and compatibility issues are addressed with at least a formal model and some analysis.

Referee Report

2 major / 1 minor

Summary. The paper introduces a framework for designing AI inference incentives based on the users' valuation for inference quality and latency, together with their environmental consciousness, while accounting for the tradeoff between carbon emissions and the two QoE parameters. The incentives are offered through a two-tier service subscription that provides users a discount in exchange for reduced carbon emissions, giving the provider flexibility to serve some requests at lower quality and higher latency during high carbon intensity periods. The approach is said to accommodate different tradeoffs depending on model size, complexity, and resource allocation.

Significance. If the framework can be made operational with a concrete mechanism for valuation elicitation and incentive design, it would address a timely problem in sustainable computing by providing an economic lever to shift AI inference load away from high-carbon periods. The explicit incorporation of environmental consciousness alongside traditional QoE metrics offers a potentially extensible direction for incentive-compatible resource management in ML serving systems.

major comments (2)

[Abstract] Abstract: the central claim that incentives can be designed from user valuations for quality, latency, and environmental consciousness requires a utility model or optimization formulation that maps these valuations to subscription parameters (discounts, quality/latency reductions) and load-shifting decisions; no such model, equations, or algorithm appears in the manuscript.
[Abstract] Abstract: the assertion that the two-tier subscription gives the provider flexibility to serve a percentage of requests at lower quality/higher latency during high-carbon periods is load-bearing for the claimed practicality, yet the manuscript supplies neither a procedure for determining that percentage nor any analysis of incentive compatibility or revenue neutrality under heterogeneous users.
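
For orientation, one hypothetical shape the requested formulation could take is sketched below. None of it comes from the manuscript. The symbols are assumptions introduced here, with all quantities normalized per request: $d$ is the discount, $\alpha$ the fraction of discounted-tier requests served at reduced level, $\Delta q$ and $\Delta \ell$ the resulting accuracy loss and added latency, $\Delta c$ the carbon saved per degraded request, $\theta_i^{q}, \theta_i^{\ell}, \theta_i^{g}$ user $i$'s valuations of quality, latency and avoided carbon, $r$ the number of requests per user, and $S$ a carbon-saving target.

```latex
% Hypothetical sketch, not the authors' model.
% Opt-in utility of user i for the discounted tier:
u_i(d,\alpha) = d + \theta_i^{g}\,\alpha\,\Delta c
              - \alpha\left(\theta_i^{q}\,\Delta q + \theta_i^{\ell}\,\Delta \ell\right),
\qquad \text{user } i \text{ opts in iff } u_i(d,\alpha) \ge 0 .

% Provider's problem: cheapest discount scheme meeting the carbon-saving target S:
\min_{d \ge 0,\ \alpha \in [0,1]} \; d \cdot N(d,\alpha)
\quad \text{s.t.} \quad \alpha\,\Delta c\, r\, N(d,\alpha) \ge S,
\qquad N(d,\alpha) = \bigl|\{\, i : u_i(d,\alpha) \ge 0 \,\}\bigr| .
```

A formulation of this kind would let a reader check incentive compatibility (each user's opt-in condition) and the provider's cost separately, which is what the two major comments ask for.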

minor comments (1)

[Abstract] The abstract refers to 'the two QoE parameters' without an explicit enumeration or definition of the quality and latency metrics used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We agree that the abstract and manuscript would benefit from greater explicitness on the utility model and analyses. We will revise accordingly and address each point below.

Referee: [Abstract] Abstract: the central claim that incentives can be designed from user valuations for quality, latency, and environmental consciousness requires a utility model or optimization formulation that maps these valuations to subscription parameters (discounts, quality/latency reductions) and load-shifting decisions; no such model, equations, or algorithm appears in the manuscript.

Authors: We agree the abstract does not contain the equations. The manuscript presents the framework conceptually; to make the central claim operational we will add the explicit utility model (combining valuations for quality, latency and environmental consciousness) and the optimization formulation that maps these to subscription parameters and load-shifting decisions in the revised version. revision: yes

Referee: [Abstract] Abstract: the assertion that the two-tier subscription gives the provider flexibility to serve a percentage of requests at lower quality/higher latency during high-carbon periods is load-bearing for the claimed practicality, yet the manuscript supplies neither a procedure for determining that percentage nor any analysis of incentive compatibility or revenue neutrality under heterogeneous users.

Authors: We acknowledge the manuscript currently describes the flexibility at a conceptual level without the requested procedure or analyses. In revision we will supply an optimization-based procedure for setting the percentage and include analysis of incentive compatibility together with revenue neutrality under heterogeneous users. revision: yes
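
To give a sense of what a procedure for setting the percentage might look like in its simplest form, the sketch below computes the smallest fraction of discounted-tier requests that must be served at reduced level to keep that tier's emissions under a cap for one period. This is an illustrative assumption, not the procedure promised in the rebuttal: the function name, the per-request energy figures, and the idea of a per-period cap on the discounted tier are all hypothetical.

```python
def min_degraded_fraction(n_requests, e_full, e_reduced, carbon_intensity, carbon_cap):
    """Smallest share of requests to degrade so that emissions <= carbon_cap.

    n_requests       : discounted-tier requests in the period
    e_full/e_reduced : energy per request (kWh) at full / reduced level
    carbon_intensity : grid intensity for the period (gCO2 per kWh)
    carbon_cap       : allowed emissions for the period (gCO2)

    Returns None when the cap cannot be met even if every request is degraded.
    """
    full_emissions = n_requests * e_full * carbon_intensity
    if full_emissions <= carbon_cap:
        return 0.0  # no degradation needed in a low-carbon period

    saving_per_request = (e_full - e_reduced) * carbon_intensity
    if saving_per_request <= 0:
        return None  # the reduced level saves nothing

    fraction = (full_emissions - carbon_cap) / (n_requests * saving_per_request)
    return fraction if fraction <= 1.0 else None
```

For example, with 1000 requests, 0.004 kWh versus 0.001 kWh per request, an intensity of 400 gCO2/kWh and a cap of 1000 gCO2, about half of the requests would need to be degraded. The sketch ignores user opt-in behaviour and revenue effects, which the referee's comment treats as essential.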

Circularity Check

0 steps flagged

No circularity: conceptual framework with no derivations or self-referential reductions

The provided abstract and context describe a high-level incentive framework based on user valuations for quality, latency, and environmental factors, implemented via two-tier subscriptions. No equations, parameter fittings, predictions, or derivation chains are shown. No self-citations are referenced as load-bearing for uniqueness or ansatzes. The approach is presented as accommodating different tradeoffs without reducing any claimed result to its inputs by construction. This is a standard non-finding for a framework proposal paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are specified or derivable from the provided text.

