Learning Under Moral Hazard with Instrumental Regression and Generalized Method of Moments

Shiliang Zuo

arxiv: 2405.20642 · v3 · submitted 2024-05-31 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-24 01:02 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords: moral hazard, principal-agent problem, contract design, instrumental regression, generalized method of moments, multitasking, observational learning


The pith

Instrumental regression and GMM can estimate good contracts when actions are hidden under moral hazard.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies the multitasking principal-agent contract design problem where individual actions cannot be perfectly observed. It shows that instrumental regression combined with the generalized method of moments estimator can recover or learn effective contracts from observational signals alone. A sympathetic reader would care because this supplies a data-driven route to policy design in settings where direct monitoring fails, such as employment incentives or regulatory contracts. The work also supplies a uniformity characterization of the shape taken by the optimal contract.

Core claim

In the multitasking principal-agent setting with moral hazard, instrumental regression and the generalized method of moments estimator can be applied to observational data to estimate or learn a good contract; as a side result the optimal contract admits a uniform characterization of its shape.

What carries the argument

Instrumental regression paired with the GMM estimator applied to signals from hidden actions in the principal-agent contract problem.

If this is right

Observational signals suffice to learn contracts that induce desired behavior without direct action monitoring.
The optimal contract in the multitasking setting has a uniform shape that can be characterized independently of specific parameter values.
Machine-learning policy design extends to economic environments previously blocked by moral hazard.
Contract parameters become recoverable from equilibrium data generated by hidden effort choices.
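One way a parameter-free contract shape can arise is sketched below. The model is our simplifying assumption for illustration, not a restatement of the paper's theorem: a risk-neutral agent, a linear contract with no fixed transfer (so the principal keeps θᵀa − βᵀa in expectation), a signal whose mean is the hidden action a, a quadratic cost ½aᵀCa with C symmetric positive definite, and a principal who values output at θᵀa.

```latex
\begin{aligned}
a^*(\beta) &= C^{-1}\beta, \\
U_P(\beta) &= \theta^\top a^*(\beta) - \beta^\top a^*(\beta) = (\theta - \beta)^\top C^{-1}\beta, \\
\nabla_\beta U_P(\beta) &= C^{-1}\theta - 2\,C^{-1}\beta = C^{-1}(\theta - 2\beta) = 0
\;\Longrightarrow\; \beta^* = \tfrac{1}{2}\,\theta .
\end{aligned}
```

The Hessian is −2C⁻¹, which is negative definite, so β* is the maximizer, and it is the same for every admissible cost matrix C. The paper's own uniformity characterization may be stated under different assumptions.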

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same instrumental-variable approach could be tested on real employment or insurance datasets where effort is only partially observed.
If the uniformity result holds, it may simplify numerical search for contracts in higher-dimensional multitasking problems.
The method opens the possibility of online learning of contracts by repeatedly applying GMM updates as new observational batches arrive.
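The online idea in the last point can be sketched with running sums. Assume a scalar, just-identified setup (hypothetical, not the paper's algorithm): a randomized contract coefficient z is the instrument, x is a noisy signal of the hidden action, and r is the principal's reward. In the just-identified case GMM reduces to the IV ratio cov(z, r) / cov(z, x), and both covariances can be updated batch by batch without storing past data. The class name `OnlineIV` is illustrative.

```python
class OnlineIV:
    """Running just-identified IV estimate theta_hat = cov(z, r) / cov(z, x).

    Hypothetical setup: z is the randomized contract coefficient (instrument),
    x a noisy signal of the hidden action, r the principal's reward.
    """

    def __init__(self):
        self.n = 0
        self.sz = self.sx = self.sr = 0.0  # running sums of z, x, r
        self.szx = self.szr = 0.0          # running cross-product sums

    def update(self, batch):
        """Fold a new batch of (z, x, r) observations into the running sums."""
        for z, x, r in batch:
            self.n += 1
            self.sz += z
            self.sx += x
            self.sr += r
            self.szx += z * x
            self.szr += z * r

    def estimate(self):
        """Current IV estimate from all batches seen so far."""
        if self.n < 2:
            raise ValueError("need at least two observations")
        # Raw-sum covariances are simple but can lose precision for large n;
        # a Welford-style centered update would be numerically safer.
        cov_zx = self.szx / self.n - (self.sz / self.n) * (self.sx / self.n)
        cov_zr = self.szr / self.n - (self.sz / self.n) * (self.sr / self.n)
        if cov_zx == 0:
            raise ZeroDivisionError("instrument is irrelevant in this sample")
        return cov_zr / cov_zx
```

Because only six running sums are kept, the estimate after any sequence of batches equals the full-sample estimate on the pooled data.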

Load-bearing premise

Valid instruments exist that allow the GMM estimator to identify the contract parameters from observational signals even though the agent's actions remain hidden.
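To make the two instrument conditions concrete, consider a hypothetical errors-in-variables version of the problem (our assumption for illustration, not necessarily the paper's model): reward r = θᵀa + η, observed signal x = a + ε, hidden action a = C⁻¹β + u (the agent's first-order condition plus an unobserved shock u), mean-zero noises ε and η, and contracts β randomized independently of (u, ε, η). The moment condition, the relevance condition, and the resulting identification formula are then:

```latex
\begin{aligned}
&\text{exogeneity: } \mathbb{E}\big[\beta\,(r - \theta^\top x)\big] = \mathbb{E}\big[\beta\,(\eta - \theta^\top \varepsilon)\big] = 0,\\
&\text{relevance: } \operatorname{Cov}(\beta, x) = \operatorname{Cov}(\beta)\,C^{-1} \text{ is invertible},\\
&\text{identification: } \theta = \operatorname{Cov}(\beta, x)^{-1}\operatorname{Cov}(\beta, r).
\end{aligned}
```

The action shock u cancels out of r − θᵀx, which is why hiding the action does not by itself break identification in this toy model, even when u is correlated with η. Relevance holds whenever the randomized contracts vary in every task direction.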

What would settle it

A simulation or empirical dataset in which no valid instruments can be constructed, or in which GMM estimates fail to recover known optimal contract parameters under controlled moral hazard, would show that the method does not work.
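A minimal version of that test is easy to simulate. The sketch assumes a scalar toy model chosen for illustration, not the paper's simulation design: the contract β is drawn uniformly, the hidden action is a = β/c + u (the best response to a quadratic cost c·a²/2 plus an agent shock), the signal is x = a + ε, and the reward is r = θa + η with η correlated with u. Regressing r on x (OLS) is biased by both the measurement noise and the confounding, while IV with β as the instrument should recover the known θ up to sampling error. All function names and parameter values are hypothetical.

```python
import random


def simulate(n, theta=2.0, c=1.0, seed=0):
    """Draw (beta, x, r) triples from a hypothetical scalar moral-hazard model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        beta = rng.uniform(0.0, 1.0)           # randomized contract (instrument)
        u = rng.gauss(0.0, 0.5)                # unobserved agent shock
        a = beta / c + u                       # hidden action: IC best response plus shock
        x = a + rng.gauss(0.0, 0.5)            # noisy observational signal of the action
        eta = 0.8 * u + rng.gauss(0.0, 0.1)    # reward noise correlated with the shock
        data.append((beta, x, theta * a + eta))
    return data


def cov(xs, ys):
    """Population-style sample covariance of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)


def ols_and_iv(data):
    """Return (OLS slope of r on x, IV slope of r on x instrumented by beta)."""
    z, x, r = zip(*data)
    ols = cov(x, r) / cov(x, x)   # biased: x is a noisy, confounded proxy for a
    iv = cov(z, r) / cov(z, x)    # consistent if beta is independent of the noises
    return ols, iv
```

With `ols_and_iv(simulate(20_000))`, the variances above imply an OLS probability limit of roughly 1.5, while the IV estimate should land close to the true θ = 2; the exact values depend on the seed.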

Figures

Figures reproduced from arXiv: 2405.20642 by Shiliang Zuo.

**Figure 2.** The estimation error.

Original abstract

Machine learning has become increasingly popular in informing data-driven policy-making. Policies influence behavior in individuals or populations, and ideally, through observational signals, policy-makers learn which policies are effective. However, in many settings, individual actions cannot be perfectly observed. This issue, known in economics as moral hazard, poses a significant challenge. In this work, we study the foundational multitasking principal-agent contract design problem and demonstrate how instrumental regression and the generalized method of moments (GMM) estimator can be used to estimate or learn a good contract. As a bonus result, we also give a uniformity characterization of the shape of the optimal contract.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper derives moment conditions from incentive compatibility to apply IV regression and GMM for recovering linear contracts in a multitasking moral hazard model, plus a uniformity result on optimal contract shape under quadratic costs.


The core contribution here is showing that standard econometric tools can identify contract parameters from observational data when the agent's action is hidden, by constructing moments directly from the incentive-compatibility constraints. Under the maintained assumptions of linear contracts and quadratic effort costs, the instruments satisfy the usual relevance and exogeneity conditions, and the uniformity characterization drops out of the first-order conditions without extra restrictions. That part looks clean and self-contained on the page. The work is narrow but technically precise; it stays within one class of principal-agent problems and does not claim broader applicability. The main limitation is the parametric restrictions themselves (linear sharing rules and quadratic costs), which make identification tractable but also limit how far the results carry over to more general contract spaces or cost functions. No obvious circularity or post-hoc fitting appears in the derivations. This is the kind of paper that belongs in an econometrics or algorithmic economics venue rather than a general ML conference. A reader already working on observational contract design or on GMM applications in mechanism design would get direct value from the identification argument. It is coherent on its own terms and formally grounded enough to warrant referee time, even if the assumptions keep the scope modest.
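The incentive-compatibility step can be made concrete with a small sketch. The assumptions are ours, not necessarily the paper's: a risk-neutral agent, a linear contract paying βᵀx on a signal x whose mean is the hidden action a, and a quadratic effort cost ½aᵀCa with C symmetric positive definite. The agent's first-order condition Ca = β then gives a = C⁻¹β, a linear map from contract to action, which is the kind of structure that lets randomized contracts serve as relevant instruments. The function name `best_response` is hypothetical, and the sketch is restricted to two tasks so the matrix inverse can be written out by hand.

```python
def best_response(beta, cost):
    """Hidden action of a risk-neutral agent facing a linear contract (2 tasks).

    Hypothetical model: the agent maximizes beta . a - 0.5 * a^T C a, so the
    first-order (IC) condition is C a = beta, i.e. a = C^{-1} beta.
    `cost` is the 2x2 cost matrix C, assumed symmetric positive definite.
    """
    (c11, c12), (c21, c22) = cost
    det = c11 * c22 - c12 * c21
    if det == 0:
        raise ValueError("cost matrix must be invertible")
    # Apply the closed-form 2x2 inverse of C to the contract vector beta.
    a1 = (c22 * beta[0] - c12 * beta[1]) / det
    a2 = (-c21 * beta[0] + c11 * beta[1]) / det
    return (a1, a2)
```

For example, with separable costs `best_response((1, 2), ((2, 0), (0, 4)))` returns `(0.5, 0.5)`, and with coupled costs `best_response((3, 3), ((2, 1), (1, 2)))` returns `(1.0, 1.0)`.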

Referee Report

0 major / 3 minor

Summary. The paper studies the multitasking principal-agent contract design problem under moral hazard and shows that instrumental variable regression combined with the generalized method of moments (GMM) can recover a good contract from observational data. It derives moment conditions from the agent's incentive-compatibility constraints and the principal's objective, establishes instrument relevance and exogeneity under linear-contract and quadratic-cost assumptions, and provides a uniformity characterization of the optimal contract shape that follows from the first-order conditions.

Significance. If the identification and uniformity results hold, the work supplies a concrete econometric route to data-driven contract learning in settings where actions are hidden, bridging contract theory with standard IV/GMM tools. The explicit derivation of moments from IC constraints and the parameter-free uniformity claim are strengths that could support reproducible applications in policy design.

minor comments (3)

§3: the statement that instruments satisfy exogeneity could be accompanied by an explicit statement of the maintained assumptions on the error term and the agent's type distribution to make the relevance/exogeneity argument fully self-contained.
The uniformity characterization (bonus result) is presented without a dedicated theorem number; numbering it and stating the precise domain of the uniformity (e.g., over all linear contracts) would improve readability.
Simulation section: the reported GMM standard errors appear to be computed under the assumption of homoskedasticity; a brief robustness check under heteroskedasticity would strengthen the empirical illustration.
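The heteroskedasticity point can be illustrated in the scalar, just-identified case with an intercept (a hypothetical setup; the paper's simulation may differ). Writing z̃, x̃, r̃ for demeaned variables and ê for the IV residuals, the estimate is θ̂ = Σz̃r̃ / Σz̃x̃, the classical variance is σ̂²Σz̃² / (Σz̃x̃)², and the heteroskedasticity-robust (White/HC0 sandwich) variance is Σz̃²ê² / (Σz̃x̃)². The two diverge when the error variance moves with the instrument. The function name is illustrative.

```python
import math


def iv_with_standard_errors(z, x, r):
    """Just-identified IV slope with classical and HC0-robust standard errors."""
    n = len(z)
    zb, xb, rb = sum(z) / n, sum(x) / n, sum(r) / n
    zt = [zi - zb for zi in z]   # demeaned instrument
    xt = [xi - xb for xi in x]   # demeaned regressor
    rt = [ri - rb for ri in r]   # demeaned outcome
    szx = sum(a * b for a, b in zip(zt, xt))
    theta = sum(a * b for a, b in zip(zt, rt)) / szx
    resid = [ri - theta * xi for ri, xi in zip(rt, xt)]
    # Classical SE assumes a constant error variance (n - 2 for slope and intercept).
    sigma2 = sum(e * e for e in resid) / (n - 2)
    se_homo = math.sqrt(sigma2 * sum(a * a for a in zt) / szx ** 2)
    # HC0 sandwich SE lets the error variance differ across observations.
    se_robust = math.sqrt(sum((a * e) ** 2 for a, e in zip(zt, resid)) / szx ** 2)
    return theta, se_homo, se_robust
```

HC0 is the simplest robust variant; small-sample corrections such as HC1 to HC3 rescale the squared residuals.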

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work and the recommendation for minor revision. The referee's description accurately reflects the paper's contributions on applying IV regression and GMM to contract learning under moral hazard, along with the uniformity characterization.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

full rationale

No load-bearing step reduces by construction to its inputs. Moment conditions are obtained directly from the agent's incentive-compatibility constraints and principal's objective under the maintained linear-contract and quadratic-cost assumptions. Instruments are shown to satisfy relevance and exogeneity from the model primitives. The uniformity characterization of optimal-contract shape follows from the first-order conditions without additional parametric restrictions or self-citation chains. The GMM estimator is applied to these independently derived moments rather than fitting a parameter and relabeling the fit as a prediction. No equations or self-citations in the provided text create a definitional loop or imported uniqueness result.
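When there are more moment conditions than parameters, GMM must weight them. The sketch below is textbook two-step GMM, not a transcription of the paper's estimator, for one scalar parameter and a two-dimensional instrument, under the hypothetical assumption that all variables are mean zero (no intercept). With sample moments ḡ(θ) = m_r − θ·m_x, where m_x = mean(z·x) and m_r = mean(z·r), the minimizer of ḡᵀWḡ is θ̂(W) = m_xᵀW m_r / m_xᵀW m_x. Step 1 uses the identity weight; step 2 re-weights with the inverse of the estimated moment covariance.

```python
def gmm_scalar(z, x, r):
    """Two-step GMM for r = theta * x + error with a 2-dimensional instrument z.

    Hypothetical setup: variables are mean zero (no intercept) and the
    moment conditions are E[z_i * (r_i - theta * x_i)] = 0.
    Returns (step-1 estimate, step-2 estimate).
    """
    n = len(x)
    # Sample moments m_x = mean(z * x) and m_r = mean(z * r), each a 2-vector.
    mx = [sum(zi[k] * xi for zi, xi in zip(z, x)) / n for k in range(2)]
    mr = [sum(zi[k] * ri for zi, ri in zip(z, r)) / n for k in range(2)]

    def solve(w):
        # Closed-form minimizer of (mr - theta*mx)^T W (mr - theta*mx), W symmetric.
        wmx = [w[0][0] * mx[0] + w[0][1] * mx[1],
               w[1][0] * mx[0] + w[1][1] * mx[1]]
        return (wmx[0] * mr[0] + wmx[1] * mr[1]) / (wmx[0] * mx[0] + wmx[1] * mx[1])

    # Step 1: identity weight gives a consistent but generally inefficient estimate.
    theta1 = solve([[1.0, 0.0], [0.0, 1.0]])

    # Step 2: weight by the inverse of the estimated covariance of the moments.
    g = [(zi[0] * (ri - theta1 * xi), zi[1] * (ri - theta1 * xi))
         for zi, xi, ri in zip(z, x, r)]
    s11 = sum(a * a for a, _ in g) / n
    s12 = sum(a * b for a, b in g) / n
    s22 = sum(b * b for _, b in g) / n
    det = s11 * s22 - s12 * s12
    w_opt = [[s22 / det, -s12 / det], [-s12 / det, s11 / det]]
    return theta1, solve(w_opt)
```

Because the weight matrix is estimated from step-1 residuals rather than chosen to fit θ, this follows the same logic as the audit above: the estimator is applied to moments fixed in advance.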

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This review was based on the abstract only, which provides no equations, parameters, or modeling choices, so the ledger is empty by necessity.

pith-pipeline@v0.9.0 · 5621 in / 1091 out tokens · 19940 ms · 2026-05-24T01:02:57.534406+00:00 · methodology


Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages

[1] Ashrafian, H. and Darzi, A. (2018). Transforming health policy through machine learning. PLoS Medicine, 15(11):e1002692.

[2] Bastani, H., Bayati, M., and Khosravi, K. (2021). Mostly exploration-free algorithms for contextual bandits. Management Science, 67(3):1329–1349.

[3] Bhattacharyya, S. and Lafontaine, F. (1995). Double-sided moral hazard and the nature of share contracts. The RAND Journal of Economics, pages 761–781.

[4] Carroll, G. (2015). Robustness and linear contracts. American Economic Review, 105(2):536–563.

[5] Della Vecchia, R. and Basu, D. (2023). Online instrumental variable regression: Regret analysis and bandit feedback. arXiv preprint arXiv:2302.09357.

[6] Dong, J., Roth, A., Schutzman, Z., Waggoner, B., and Wu, Z. S. (2018). Strategic classification from revealed preferences. In Proceedings of the 2018 ACM Conference on Economics and Computation, pages 55–70.

[7] Duetting, P., Ezra, T., Feldman, M., and Kesselheim, T. (2024). Multi-agent combinatorial contracts. arXiv preprint arXiv:2405.08260.

[8] Dütting, P., Ezra, T., Feldman, M., and Kesselheim, T. (2022). Combinatorial contracts. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 815–826. IEEE.

[9] Dütting, P., Roughgarden, T., and Talgam-Cohen, I. (2019). Simple versus optimal contracts. In Proceedings of the 2019 ACM Conference on Economics and Computation, pages 369–387.

[10] Fuller, W. A. (2009). Measurement Error Models. John Wiley & Sons.

[11] Fuster, A., Goldsmith-Pinkham, P., Ramadorai, T., and Walther, A. (2022). Predictably unequal? The effects of machine learning on credit markets. The Journal of Finance, 77(1):5–47.

[12] Google (2024). YouTube partner earnings overview.

[13] Greene, W. H. (2008). Econometric Analysis. Pearson/Prentice Hall, Upper Saddle River, NJ, 6th edition.

[14] Guruganesh, G., Kolumbus, Y., Schneider, J., Talgam-Cohen, I., Vlatakis-Gkaragkounis, E.-V., Wang, J. R., and Weinberg, S. M. (2024). Contracting with a learning agent. arXiv preprint arXiv:2401.16198.

[15] Hardt, M., Megiddo, N., Papadimitriou, C., and Wootters, M. (2016). Strategic classification. In Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, pages 111–122.

[16] Harris, K., Ngo, D. D. T., Stapleton, L., Heidari, H., and Wu, S. (2022). Strategic instrumental variable regression: Recovering causal relationships from strategic responses. In International Conference on Machine Learning, pages 8502–8522. PMLR.

[17] Hilbert, S., Coors, S., Kraus, E., Bischl, B., Lindl, A., Frei, M., Wild, J., Krauss, S., Goretzko, D., and Stachl, C. (2021). Machine learning for the educational sciences. Review of Education, 9(3):e3310.

[18] Ho, C.-J., Slivkins, A., and Vaughan, J. W. (2014). Adaptive contract design for crowdsourcing markets: Bandit algorithms for repeated principal-agent problems. In Proceedings of the Fifteenth ACM Conference on Economics and Computation, pages 359–376.

[19] Holmström, B. (1979). Moral hazard and observability. The Bell Journal of Economics, pages 74–91.

[20] Holmström, B. (1982). Moral hazard in teams. The Bell Journal of Economics, pages 324–340.

[21] Holmström, B. and Milgrom, P. (1991). Multitask principal-agent analyses: Incentive contracts, asset ownership, and job design. The Journal of Law, Economics, and Organization, 7(special issue):24–52.

[22] Horowitz, G. and Rosenfeld, N. (2023). Causal strategic classification: A tale of two shifts. In International Conference on Machine Learning, pages 13233–13253. PMLR.

[23] Hurley, M. and Adebayo, J. (2016). Credit scoring in the era of big data. Yale JL & Tech., 18:148.

[24] Jain, S., Pattanayak, K., Krishnamurthy, V., and Berry, C. (2023). Adaptive ECCM for mitigating smart jammers. In ICASSP 2023, 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE.

[25] Kannan, S., Morgenstern, J. H., Roth, A., Waggoner, B., and Wu, Z. S. (2018). A smoothed analysis of the greedy algorithm for the linear contextual bandit problem. Advances in Neural Information Processing Systems, 31.

[26] Kleinberg, R. (2004). Nearly tight bounds for the continuum-armed bandit problem. Advances in Neural Information Processing Systems, 17.

[27] Lattimore, T. and Szepesvári, C. (2020). Bandit Algorithms. Cambridge University Press.

[28] Miller, J., Milli, S., and Hardt, M. (2020). Strategic classification is causal modeling in disguise. In International Conference on Machine Learning, pages 6917–6926. PMLR.

[29] Perdomo, J., Zrnic, T., Mendler-Dünner, C., and Hardt, M. (2020). Performative prediction. In International Conference on Machine Learning, pages 7599–7609. PMLR.

[30] Qin, Z. T., Zhu, H., and Ye, J. (2022). Reinforcement learning for ridesharing: An extended survey. Transportation Research Part C: Emerging Technologies, 144:103852.

[31] Roth, A., Ullman, J., and Wu, Z. S. (2016). Watch and learn: Optimizing from revealed preferences feedback. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, pages 949–962.

[32] Shamir, O. (2013). On the complexity of bandit and derivative-free stochastic convex optimization. In Conference on Learning Theory, pages 3–24. PMLR.

[33] Shavit, Y., Edelman, B., and Axelrod, B. (2020). Causal strategic linear regression. In International Conference on Machine Learning, pages 8676–8686. PMLR.

[34] Sivakumar, V., Zuo, S., and Banerjee, A. (2022). Smoothed adversarial linear contextual bandits with knapsacks. In International Conference on Machine Learning, pages 20253–20277. PMLR.

[35] Thiele, V. (2010). Task-specific abilities in multi-task principal-agent relationships. Labour Economics, 17(4):690–698.

[36] Tropp, J. A. (2012). User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12:389–434.

[37] Yu, M., Yang, Z., and Fan, J. (2022). Strategic decision-making in the presence of information asymmetry: Provably efficient RL with algorithmic instruments. arXiv preprint arXiv:2208.11040.

[38] Zhu, B., Bates, S., Yang, Z., Wang, Y., Jiao, J., and Jordan, M. I. (2022). The sample complexity of online contract design. Proceedings of the 24th ACM Conference on Economics and Computation.

[39] Zuo, S. (2024). Harnessing the continuous structure: Utilizing the first-order approach in online contract design. arXiv preprint arXiv:2403.07143.
