pith. machine review for the scientific record.

arxiv: 2605.06482 · v1 · submitted 2026-05-07 · 💰 econ.EM · cs.CY

Recognition: unknown

Scaling the Queue: Reinforcement Learning for Equitable Call Classification Capacity in NYC Municipal Complaint Systems

Akhil Fernando-Bell, Ali Hasan, Ammar Syed, Bella Ge, Ellie Bae, Farzaan Naeem, Haoying Wang, Imran Isa-Dutse, Irene Aldridge, Ishita Gupta, Jiwon Jeong, Kai Maeda, Karl Muller, Michael Twersky, Nadav Yochman, Nathan Tai, Neha Konduru, Nicholas Donat, Nicholas Goguen-Compagnoni, Nolan McKenna, Pierce Hoenigman, Rishabh Patel, Siddhesh Darak, Tishya Khanna, Yixuan Liu, Zachary Sheldon, Zening Wang, Zexun Yao

Pith reviewed 2026-05-08 03:31 UTC · model grok-4.3

classification 💰 econ.EM cs.CY
keywords reinforcement learning · equity · 311 calls · municipal services · New York City · complaint routing · Markov Decision Process · service disparities

The pith

Reinforcement learning agents can route NYC 311 complaints to boost throughput and narrow equity gaps across income and racial lines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Municipal complaint systems face a mismatch between high incoming volumes and limited staff capacity for triage and routing, which produces uneven service quality that tracks demographic patterns. The paper models six operational domains at New York City's Department of Buildings as Markov Decision Processes and trains reinforcement learning agents to assign each complaint to one of four actions: escalate, batch, defer, or inspect now. Equity in classification coverage is placed directly in the reward function alongside throughput and misclassification cost, so the learned policies aim to augment human workers rather than replace them. Post-training analysis finds that complaint recurrence and neighborhood statistics predict real violations more reliably than raw call volume alone.

Core claim

The paper claims that formalizing each of the six DOB domains as a Markov Decision Process, with equitable classification coverage included as a first-class component of the reward, lets reinforcement learning agents learn routing policies that increase overall throughput, reduce misclassification costs, and actively reduce historical disparities in service delivery.

What carries the argument

Equity-augmented Markov Decision Processes (MDPs) for each domain, in which states include complaint features and neighborhood statistics, actions are the four routing choices, and the reward function balances operational goals with narrowing service gaps.
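The abstract gives no reward specification, but the stated objective (maximize throughput, minimize misclassification cost, narrow coverage gaps) can be sketched as a weighted sum. Everything below is an illustrative assumption, not the paper's formulation: the weights, the max-minus-min disparity measure, and all names are hypothetical.

```python
# Illustrative equity-augmented routing reward. The weights and the
# disparity measure are assumptions; the paper publishes no reward spec.

ACTIONS = ("escalate", "batch", "defer", "inspect_now")

def coverage_disparity(coverage_by_group):
    """Gap between the best- and worst-served groups' coverage rates."""
    rates = list(coverage_by_group.values())
    return max(rates) - min(rates)

def reward(throughput_gain, misclass_cost, coverage_by_group,
           w_throughput=1.0, w_misclass=1.0, w_equity=2.0):
    """Weighted sum: reward throughput, penalize misclassification
    and the coverage gap across demographic groups."""
    return (w_throughput * throughput_gain
            - w_misclass * misclass_cost
            - w_equity * coverage_disparity(coverage_by_group))
```

Under this toy formulation, a policy that raises throughput while widening the coverage gap can score worse than one that processes slightly fewer complaints evenly, which is the trade-off the reward is meant to encode.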

If this is right

  • Agents can augment rather than replace human classifiers while increasing total complaint processing capacity.
  • Recurrence and neighborhood-level statistics become stronger signals for routing decisions than complaint volume.
  • The same MDP structure applies across the six listed domains including boiler safety and heat complaints.
  • Routing policies can be learned that simultaneously pursue throughput and equity objectives.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar MDP-based routers could be tested in other cities' 311 systems that face volume-capacity mismatches.
  • Shifting triage emphasis toward recurrence patterns could change how complaint systems collect and use neighborhood data.
  • Real-world rollout would need ongoing audits to detect whether the equity term in the reward produces unintended effects on specific groups.

Load-bearing premise

The MDP formulation and reward function can be specified so that maximizing the equity-augmented objective actually narrows real-world service gaps without creating new unintended disparities or violating operational constraints.

What would settle it

After deployment, if measured resolution times or complaint outcomes across income or racial neighborhoods show no narrowing of gaps or show new disparities, or if overall misclassification rates increase, the central claim would not hold.
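That falsification test could be operationalized as a simple post-deployment audit. The sketch below, with hypothetical group labels and a median-gap metric chosen for illustration, checks whether the cross-group spread in resolution times shrank:

```python
# Hypothetical post-deployment audit: did the gap in median resolution
# time (hours) between neighborhood groups narrow? Group labels and the
# gap metric are illustrative, not from the paper.
from statistics import median

def resolution_gap(times_by_group):
    """Spread of median resolution times across groups."""
    medians = [median(t) for t in times_by_group.values()]
    return max(medians) - min(medians)

def gap_narrowed(before, after):
    """True if the cross-group gap shrank after deployment."""
    return resolution_gap(after) < resolution_gap(before)
```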

read the original abstract

Municipal 311 call centers and complaint intake systems face a structural mismatch between incoming volume and classification capacity. The staff and heuristics available to triage, route, and prioritize complaints cannot scale with demand. This bottleneck produces differential service quality that follows income and racial lines (\cite{liu2024sla}). We develop an equity-centered reinforcement learning (RL) framework that augments call classification capacity across six New York City Department of Buildings (DOB) operational domains: boiler safety, crane and derrick oversight, heat and hot water complaints, housing complaint triage, scaffold safety, and Natural Area District (SNAD) protection. Rather than replacing human classifiers, our agents act as intelligent intake routers: learning to assign incoming complaints to action categories: escalate, batch, defer, inspect now. The proposed technique is designed to maximize throughput, minimize misclassification cost, and actively narrow historical equity gaps in service delivery. We formalize each domain as a Markov Decision Process (MDP) in which equitable classification coverage is a first-class reward objective. Post-hoc SHAP attribution reveals that complaint recurrence and neighborhood-level statistics are stronger predictors of actionable violations than raw complaint volume. This finding has direct implications for complaint routing given the demographic correlates of those features.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes an equity-centered reinforcement learning (RL) framework to augment call classification capacity in New York City Department of Buildings (DOB) 311 systems across six domains (boiler safety, crane oversight, heat/hot water, housing triage, scaffold safety, SNAD protection). Each domain is formalized as a Markov Decision Process (MDP) in which agents route complaints to actions (escalate, batch, defer, inspect now) while treating equitable classification coverage as a first-class reward objective alongside throughput and misclassification cost. A post-hoc SHAP analysis identifies complaint recurrence and neighborhood-level statistics as stronger predictors of violations than raw volume, with implications for routing given demographic correlations.

Significance. If the proposed MDP formulation and equity-augmented reward can be shown to produce measurable reductions in service disparities without violating operational constraints or creating new inequities, the work could offer a practical template for applying RL to equitable public administration. The explicit inclusion of equity in the reward structure and the SHAP-based predictor insight represent conceptual strengths. However, the absence of any reported implementation, training outcomes, baselines, or validation metrics substantially limits the current significance of the contribution.

major comments (1)
  1. Abstract: the central claim that the equity-centered RL framework augments capacity and narrows historical equity gaps rests entirely on an unexecuted description; no MDP state space, transition dynamics, reward function specification, training results, baseline comparisons, or empirical validation of equity improvements are provided, making it impossible to assess whether the approach achieves its stated objectives.
minor comments (1)
  1. The abstract references a citation (liu2024sla) but provides no corresponding reference list entry or details on how the cited SLA disparities inform the MDP design.

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for their constructive and detailed review. The feedback correctly identifies that our manuscript proposes a conceptual framework without full empirical execution. We respond point-by-point below and clarify the intended scope of the contribution.

read point-by-point responses
  1. Referee: Abstract: the central claim that the equity-centered RL framework augments capacity and narrows historical equity gaps rests entirely on an unexecuted description; no MDP state space, transition dynamics, reward function specification, training results, baseline comparisons, or empirical validation of equity improvements are provided, making it impossible to assess whether the approach achieves its stated objectives.

    Authors: We agree that the current manuscript presents a high-level proposal and formalization rather than a fully implemented RL system with training results or validation. The abstract describes the intended MDP structure and equity-augmented reward but does not include the detailed specifications or empirical outcomes. We will revise the manuscript to add explicit definitions: the state space will incorporate complaint features (type, recurrence, location), neighborhood demographics, and historical violation rates; the action space is escalate/batch/defer/inspect-now; transitions will be derived from empirical complaint-to-outcome mappings; and the reward will be a weighted sum of throughput, misclassification penalty, and an equity term (e.g., negative disparity in coverage across income/racial groups). The post-hoc SHAP analysis on historical data is already performed and supports the predictor insights. However, no RL training, baselines, or equity-impact validation have been conducted in this work, as the paper focuses on the design template. We will update the abstract and add a dedicated MDP specification section to make the proposal fully evaluable while accurately reflecting the absence of execution results. revision: partial
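The specification the rebuttal promises (complaint features plus neighborhood statistics as state, four routing actions, transitions estimated from logged complaint-to-outcome records) could be sketched minimally as follows. All field names and outcome labels are hypothetical; nothing below comes from the paper.

```python
# Sketch of the MDP pieces the rebuttal proposes to specify.
# Field names and outcome labels are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class ComplaintState:
    complaint_type: str               # e.g. "heat", "boiler"
    recurrence_count: int             # prior complaints at this address
    neighborhood_id: str
    historical_violation_rate: float  # share of past complaints upheld

ACTIONS = ("escalate", "batch", "defer", "inspect_now")

def empirical_transition(history, state, action):
    """Estimate P(outcome | state, action) from logged
    (state, action, outcome) records, as the rebuttal proposes."""
    matches = [o for (s, a, o) in history if s == state and a == action]
    if not matches:
        return {}
    return {o: matches.count(o) / len(matches) for o in set(matches)}
```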

standing simulated objections not resolved
  • Absence of training results, baseline comparisons, and empirical validation of equity improvements or capacity augmentation, as the RL agents have not been implemented or trained in the current study.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The provided manuscript text consists of an abstract that describes formalizing operational domains as MDPs with equitable classification coverage as a first-class reward objective, but contains no equations, derivations, fitted parameters, or performance metrics. No load-bearing steps reduce any claimed result to its inputs by construction, self-citation, or renaming. The equity reward is presented as a modeling choice rather than a derived quantity, and the central claim remains a specification of an RL framework without visible internal reductions. This is the most common honest non-finding for papers whose technical sections are not supplied.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the framework implicitly relies on standard MDP assumptions (Markov property, reward additivity) and the existence of historical complaint data with demographic labels, but none are stated as novel.

pith-pipeline@v0.9.0 · 5643 in / 1178 out tokens · 50339 ms · 2026-05-08T03:31:31.701790+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

22 extracted references · 1 canonical work page

  1. [1]

    Gabriel Agostini, Emma Pierson, and Nikhil Garg. 2024. A Bayesian Spatial Model to Correct Under-Reporting in Urban Crowdsourcing. In Proceedings of the 38th AAAI Conference on Artificial Intelligence. AAAI Press

  2. [2]

    Anna Brown, Alexandra Chouldechova, Emily Putnam-Hornstein, Andrew Tailor, and Rhema Vaithianathan. 2019. Toward Algorithmic Accountability in Public Services: A Qualitative Study of Affected Community Perspectives on Algorithmic Decision-Making in Child Welfare Services. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, 1–12

  3. [3]

    Jeffrey A. Burke, Deborah Estrin, Mark Hansen, Andrew Parker, Nithya Ramanathan, Sasank Reddy, and Mani B. Srivastava. 2006. Participatory Sensing. In Proceedings of the ACM SenSys Workshop on World-Sensor-Web. 1–6

  4. [4]

    Alexandra Chouldechova. 2017. Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Risk Scores. Big Data 5, 2 (2017), 153–163

  5. [5]

    Matthew Desmond. 2016. Evicted: Poverty and Profit in the American City. Crown Publishers, New York

  6. [6]

    Catherine D’Ignazio and Lauren F. Klein. 2023. Data Feminism. MIT Press, Cambridge, MA

  7. [7]

    Deborah Estrin. 2014. Small Data, Where n = Me. Commun. ACM 57, 4 (2014), 32–34

  8. [8]

    Virginia Eubanks. 2018. Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. St. Martin’s Press, New York

  9. [9]

    Edward L. Glaeser, Andrew Hillis, Scott D. Kominers, and Michael Luca. 2016. Crowdsourcing City Government: Using Tournaments to Improve Inspection Accuracy. American Economic Review: Papers & Proceedings 106, 5 (2016), 114–118

  10. [10]

    Ben Green. 2021. The Contestation of Tech Ethics: A Sociotechnical Approach to Technology Ethics in Practice. Journal of Social Computing 2, 3 (2021), 209–225

  11. [11]

    Jackson A. Killian, Bryan Wilder, Amit Sharma, Vinod Choudhary, Bistra Dilkina, and Milind Tambe. 2019. Learning to Prescribe Interventions for Tuberculosis Patients Using Digital Adherence Data. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1359–1369

  12. [12]

    Zhi Liu, Uma Bhandaram, and Nikhil Garg. 2023. Quantifying Spatial Underreporting Disparities in Resident Crowdsourcing. Nature Computational Science 3 (2023), 1037–1048. doi:10.1038/s43588-023-00572-6

  13. [13]

    Zhi Liu and Nikhil Garg. 2024. Redesigning Service Level Agreements: Equity and Efficiency in City Government Operations. In Proceedings of the 25th ACM Conference on Economics and Computation (EC’24). ACM

  14. [14]

    Scott M. Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. Advances in Neural Information Processing Systems 30 (2017), 4765–4774

  15. [15]

    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. 2015. Human-Level Control through Deep Reinforcement Learning. Nature 518 (2015), 529–533

  16. [16]

    Daniel T. O’Brien, Reginald R. Sampson, and Christopher Winship. 2017. Ecometrics in the Age of Big Data: Measuring and Assessing “Broken Windows” Using Large-Scale Administrative Records. Sociological Methodology 45, 1 (2017), 101–147

  17. [17]

    Geoffrey Pettet, Sayyed Mohsen Vazirizade, Ayan Mukhopadhyay, Said AlMistarihi, and Abhishek Dubey. 2021. Incident-Driven Dispatch of Emergency Services. In Proceedings of the Thirty-Third Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-21)

  18. [18]

    Diederik M. Roijers, Peter Vamplew, Shimon Whiteson, and Richard Dazeley. 2013. A Survey of Multi-Objective Sequential Decision-Making. Journal of Artificial Intelligence Research 48 (2013), 67–113

  20. [20]

    Paul R. Rosenbaum and Donald B. Rubin. 1983. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika 70, 1 (1983), 41–55

  21. [21]

    Richard Rothstein. 2017. The Color of Law: A Forgotten History of How Our Government Segregated America. Liveright Publishing, New York

  22. [22]

    Ronald J. Williams. 1992. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning 8, 3–4 (1992), 229–256