pith. machine review for the scientific record.

arxiv: 2605.08964 · v1 · submitted 2026-05-09 · 💻 cs.LG

Recognition: 2 Lean theorem links

Trustworthy AI: Ensuring Reliability and Accountability from Models to Agents

Carol Xuan Long

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:17 UTC · model grok-4.3

classification 💻 cs.LG
keywords trustworthy AI · LLM watermarking · multiaccuracy · predictive multiplicity · autonomous agents · supply chain simulation · information theory · optimal transport

The pith

The thesis develops theoretically grounded algorithms to ensure reliability and accountability as machine learning systems advance from predictive models to generative models and autonomous agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This work introduces a kernel-based method that achieves multiaccuracy across complex subpopulations overlooked by standard demographic categories, alongside techniques to resolve predictive multiplicity where equally accurate models produce conflicting individual predictions. It then characterizes the information-theoretic trade-off between watermark detection and text distortion in large language models, deriving optimal strategies via optimal transport and coding theory that yield superior empirical detection-quality balances on language generation and coding tasks. Finally, the thesis evaluates fully LLM-driven agents through a supply chain simulator, documenting performance gains over human teams with cost reductions up to 67 percent while also exposing systemic risks such as costly tail events. A sympathetic reader cares because these methods aim to make AI systems fairer, more traceable, and safer at each stage of their increasing autonomy.

Core claim

The central claim is that tools grounded in information theory, optimization, and statistical learning can mitigate bias and arbitrariness in traditional models, ensure content provenance in generative models, and evaluate the performance and risks of autonomous agents. A kernel method delivers multiaccuracy beyond conventional groups; predictive multiplicity is addressed by methods that reduce arbitrary individual decisions; watermarking strategies derived from optimal transport achieve an improved detection-distortion frontier across tasks; and the supply-chain simulator shows LLM agents outperforming humans at lower cost while surfacing tail-event vulnerabilities.
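To make the kernel machinery behind the multiaccuracy claim concrete, here is a minimal witness-function sketch. This is our reconstruction for illustration, not the thesis's KMAcc implementation: the function names, RBF bandwidth, and toy data are all invented, and the thesis's Definition 3.10 may differ in normalization.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Pairwise RBF kernel matrix between the rows of X and Z."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def witness_values(X_eval, X_cal, residuals, gamma=1.0):
    """Empirical RKHS witness: kernel-weighted average residual
    around each evaluation point. Large |values| flag regions where
    the predictor is systematically mis-scored, i.e. a multiaccuracy
    violation on an implicit (kernel-defined) subpopulation."""
    K = rbf_kernel(X_eval, X_cal, gamma)   # shape (n_eval, n_cal)
    return K @ residuals / len(residuals)

# Toy audit: a predictor that ignores the second feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # true labels
f = (X[:, 0] > 0).astype(float)             # biased predictor
w = witness_values(X, X, y - f)
```

Because the kernel matrix is positive semidefinite, the witness is positively aligned with the residuals (w · (y − f) ≥ 0), which is why its level sets can track a model's error regions as in the thesis's Figures 3.1 and 3.3.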

What carries the argument

Information-theoretic characterization of watermark detection versus distortion, optimized via optimal transport and coding theory, together with kernel-based multiaccuracy and a full LLM-agent supply-chain simulator.
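For orientation on the hypothesis-testing framing, detection in the red-green family of schemes the thesis benchmarks against (e.g. [90]) reduces to a one-sided z-test on green-token counts. The sketch below is a generic stand-in, not the thesis's correlated-channel (CC) construction: the SHA-256 seeding and function names are our invention, with the hash of the previous token playing the role of side information.

```python
import hashlib

def green_list(prev_token: int, vocab_size: int) -> set[int]:
    """Pseudorandomly split the vocabulary in half, seeded by the
    previous token (a stand-in for the scheme's side information)."""
    greens = set()
    for t in range(vocab_size):
        h = hashlib.sha256(f"{prev_token}:{t}".encode()).digest()
        if h[0] % 2 == 0:
            greens.add(t)
    return greens

def detect_z(tokens: list[int], vocab_size: int) -> float:
    """z-score against the null hypothesis 'unwatermarked', under
    which each token lands in its green list with probability 1/2."""
    hits = sum(
        tok in green_list(prev, vocab_size)
        for prev, tok in zip(tokens, tokens[1:])
    )
    n = len(tokens) - 1
    return (hits - 0.5 * n) / (0.25 * n) ** 0.5
```

A watermarker biases generation toward green tokens; the detector then sees a z-score far above what unwatermarked text produces, and the strength of that bias is exactly the detection-distortion knob the thesis characterizes.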

If this is right

  • Kernel-based multiaccuracy improves fairness across subpopulations that standard demographic partitions miss.
  • Methods for predictive multiplicity reduce conflicting individual predictions among equally accurate models.
  • Optimal-transport watermarking delivers a superior detection-quality trade-off on language and coding tasks.
  • LLM agents in the supply-chain simulator outperform human teams and cut costs by up to 67 percent.
  • The same agents introduce systemic risks including costly tail events.
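The optimal-transport step behind the watermark derivation can be sketched with a generic entropy-regularized Sinkhorn solver, the algorithm the figure notes say is used to solve the OT problem [35, 155]. The marginals and cost matrix below are toy stand-ins, not the thesis's actual score function or coupling.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, iters=500):
    """Entropy-regularized optimal transport: returns a coupling P
    with row marginals a and (approximately) column marginals b that
    minimizes <P, C> + eps * entropy penalty."""
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):        # alternating marginal projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Toy instance: couple a 2-symbol side-information marginal with a
# 4-token next-token distribution under a hypothetical 0/1 cost.
a = np.array([0.5, 0.5])               # side-information marginal
b = np.array([0.4, 0.3, 0.2, 0.1])     # token marginal
C = np.array([[0.0, 1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0, 0.0]])   # illustrative cost/score
P = sinkhorn(a, b, C)
```

In the watermarking setting, the coupling P plays the role of a joint distribution over side information and tokens that maximizes a detection score while leaving both marginals (and hence text quality) untouched.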

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The watermarking bounds could guide standards for provenance in other generative modalities such as images or structured data.
  • The supply-chain simulator offers a template for testing agent behavior in additional high-stakes domains like logistics or finance.
  • Integrating the multiaccuracy kernel with agent evaluation frameworks might produce fairness guarantees that extend to autonomous decision systems.
  • The information-theoretic watermark trade-off might inform regulatory requirements for content traceability in deployed language models.

Load-bearing premise

Theoretical guarantees from information theory and optimization will carry over to real-world performance without major degradation when applied to complex, high-dimensional data or multi-agent interactions.

What would settle it

A deployment in which the proposed watermarks fail to maintain claimed detection rates at low text distortion levels, or in which LLM agents running in an actual supply chain neither achieve the reported cost reductions nor expose the predicted tail risks.

Figures

Figures reproduced from arXiv: 2605.08964 by Carol Xuan Long.

Figure 2
Figure 2. Illustrates how fairness interventions can increase predictive multiplicity. view at source ↗
Figure 2.1
Figure 2.1. Accuracy-fairness frontier does not reveal arbitrariness in competing models. Left: fairness-accuracy frontier of baseline and fair models corrected by 5 fairness interventions; point clouds generated by different random seed choices. Middle: the cumulative distribution functions (CDF) of per-sample score std. across classifiers at different intervention levels (see Definition 2.4). For each sample, std.… view at source ↗
Figure 2.2
Figure 2.2. Illustration of two models being fair/unfair and exhibiting high/low predictive multiplicity through the models' error regions in each of the 4 cases. The metrics for fairness and predictive multiplicity are Overall Accuracy Equality (Definition 2.7) and ambiguity (Definition 2.3), respectively. We prove the orthogonality of OAE and Statistical Parity (SP) from ambiguity formally in the proposition below, … view at source ↗
Figure 2.3
Figure 2.3. Data distribution of a population with two groups used in Example 2 (left). Right: without the Mean EO constraint (2.6) (green line), there is a unique optimal classifier (with threshold 0) that attains the smallest probability of error (blue line). Adding the Mean EO constraint enlarges the set of optimal threshold classifiers to two classifiers (red and blue dots) with indistinguishable accuracy and… view at source ↗
Figure 2.4
Figure 2.4. Quantile plot on the high-fairness bin for various fairness interventions vs. baseline on ENEM. Left: fairness-accuracy frontier. Right: fair models produce larger score std. at top percentiles compared to the baseline model (horizontal axis computed via (2.6)). (Rejection and Leveraging output thresholded scores directly.) …models that agree on 90% of the samples, thereby not inducing concerns of arbitrarin… view at source ↗
Figure 2.5
Figure 2.5. Standard deviation of ensembled models trained on ENEM and HSLS with baseline random forest classifiers. We fix the high-fairness bin and vary the number of models m in each ensemble. As we increase the number of ensembles, score std. (on 10 ensembles) drops and meets the score std. of 10 baseline RFCs when m = 30 on ENEM and m = 17 on HSLS. (Mean EO is computed using (2.6).) …frontiers are insufficient for… view at source ↗
Figure 3.1
Figure 3.1. Witness function values are highly correlated with errors of the model. Left: visualization of the moon dataset, with the logistic regression classifier decision boundaries displayed. Middle: the witness function value (Definition 3.10, with RBF kernel) c⋆_{k,D0,f} is plotted as a contour under the error of the classifiers on test samples y − f(x). The colored dots denote the error for each test sample y − f… view at source ↗
Figure 3.2
Figure 3.2. Multiaccuracy error (KME, Definition 3.14) vs. calibration error (MSCE, Definition 3.19) for KMAcc (our method), competing methods (LSBoost and MCBoost), and KMAcc with isotonic calibration, a standard score quantization technique. Predictor performances are measured as AUC and labeled next to each method. KMAcc achieves improved or comparable KME and AUC to the baselines and MCBoost (with the exception o… view at source ↗
Figure 3.3
Figure 3.3. Test errors over witness value contours using the RBF kernel. First column: visualization of the moon, concentric-circle, blob, and needle datasets. Red and blue represent the true labels. Second column: classification via a logistic regression classifier. The witness function value (Definition 3.10) c⋆_{k,D0,f} is plotted as a contour under the error of the classifiers on test samples y − f(x). The witness… view at source ↗
Figure 4.1
Figure 4.1. Watermarking problem as a hypothesis test with side information. …for bounded Type-I error is analyzed by comparing watermarking schemes to the uniformly most powerful watermark with knowledge of QX. The authors of He et al. [70] characterize the universal Type-II error while controlling the worst-case Type-I error by optimizing the watermarking scheme and detector. While these works operate on a token-le… view at source ↗
Figure 4.2
Figure 4.2. Optimal coupling between side information S and random partition Y = f(X, B^m) for p_e1 ≤ 0.5 (left), p_e0 ≤ 0.5 (right), with β(p) = (2p − 1)/(2p). …samples C ∼ Ber(1/2). If C = 0, she samples a ∼ QX and sends it. Otherwise, she samples and sends a ∼ Q̃_{X|S=s}, which is given by the CC: Q̃_{X|s,b^m}(x) = QX(x) · P_{S|Y}(s | f(x, b^m)) / P_S(s). (4.8) Bob performs the detection test by declaring that a is watermarked if… view at source ↗
Figure 4.3
Figure 4.3. Optimal detection probability of CC in one-shot on the adversarial token distribution (Eq. 4.6) is plotted against the inf-norm constraint λ (or, equivalently, an entropy constraint) on QX. When λ = 1 (entropy H(QX) = 0), QX is deterministic, and detection is random. As the entropy of QX grows (smaller λ values), single-token optimal detection probability reaches a maximum of around 0.75 for bina… view at source ↗
Figure 4.4
Figure 4.4. One-shot watermark detection results on QX = Unif(X). For αp = 0, CC achieves a detection probability of 0.75 and 0.7 with balanced and Bernoulli partitions, respectively. CC Balanced achieves the optimal detection (Eq. 4.4 with γ = 1 and |S| = 2). Standard deviations plotted as two-sided bars. view at source ↗
Figure 4.5
Figure 4.5. Detection probability vs. k for two values of m and a uniform token distribution QX. Sequential watermarking: we now present the performance of the CC watermark on a sequence-level scheme. We present preliminary results on synthetically generated data, with the purpose of demonstrating the applicability of our method to a sequence-level test. To that end, we consider the generation of n tokens A^n, whic… view at source ↗
Figure 4.6
Figure 4.6. ROC of the sequence-level watermarking scheme. We compare the red-green method [90] with the CC scheme (Section 4.1.4). We consider a range of δ. An increase of δ increases detection at the expense of higher perception (lower textual quality), while the CC method has fixed zero perception. Finally, we analyze the effect of k on performance in the sequential setting by observing the ROC for a range of k… view at source ↗
Figure 4.7
Figure 4.7. ROC of the sequence-level watermarking scheme under the CC method for a range of k values. 4.1.7 Conclusion: this section presents a rigorous analysis of text watermarking in a one-shot setting through the lens of hypothesis testing with side information. We analyze the fundamental trade-off… view at source ↗
Figure 4.8
Figure 4.8. HeavyWater and SimplexWater demonstrate favorable detection performance (measured by p-values) with minimal distortion to the base unwatermarked model (measured by cross-entropy). See Section 4.2.5 for details. …transport (OT) problem [155] that maximizes the average score across all couplings between the side information and the next-token distributions. We efficiently solve the OT problem using Sinkhorn… view at source ↗
Figure 4.9
Figure 4.9. Visualization of the components of watermarking design. …was extensively evaluated in real-world user tests, and we also select it as a competing benchmark. More recently, [108] proposed a binary-score watermark based on partitioning the token vocabulary into an arbitrary number of sets, followed by a simple binary test for watermark detection. The method in [108] is simple to implement and incurs little… view at source ↗
Figure 4.10
Figure 4.10. Left: tradeoff between detection (measured by p-value) and distortion (measured by cross-entropy): SimplexWater and HeavyWater achieve higher detection rates while preserving token distributions close to the base unwatermarked model. Right: detection gained by employing our watermark under various randomness generation schemes and several sliding window sizes h. Both SimplexWater and HeavyWater provide… view at source ↗
Figure 4.11
Figure 4.11. Our watermarks require fewer tokens to reach a given detection strength (p-value) with zero distortion. Gain and performance under hashing: we illustrate how SimplexWater and HeavyWater can be coupled with different side-information generation methods to boost watermark detection. We consider an experiment in which we replace the red-green watermark cost function and watermarked distribution with ou… view at source ↗
Figure 5
Figure 5. Summarizes the effects of the four factors that determine the success or failure… view at source ↗
Figure 5.1
Figure 5.1. Performance gains of gen-AI agents via model selection and inference-time techniques. Non-reasoning models (top, GPT-4o mini) required policy constraints, orchestration, and prompt engineering to close the performance gap with humans. In contrast, reasoning models (bottom, GPT-5 mini and Llama 4 Maverick 17B) started above human-level performance and, when optimized with the same techniques, achieved up… view at source ↗
read the original abstract

In this thesis, we develop algorithms with theoretical guarantees for ensuring reliability and accountability of Machine Learning (ML) systems. As ML systems evolve from predictive models to generative models and autonomous agents, the landscape of trustworthy AI has shifted. This thesis introduces tools grounded in information theory, optimization, and statistical learning to mitigate bias, reduce arbitrary decisions, ensure content provenance, and evaluate LLM-driven agents in autonomous settings. Towards mitigating bias and arbitrariness in traditional ML models, we introduce a kernel-based method to achieve multiaccuracy across complex subpopulations that traditional demographic categories may overlook. We also develop methods to address predictive multiplicity, where equally accurate models yield conflicting individual predictions. We ensure the accountability in generative AI through watermarking large language models (LLMs). We characterize the information-theoretic trade-off between watermark detection and text distortion and derive optimal watermarking strategies by leveraging optimal transport and coding theory. Empirical evaluations show our watermarks achieve a superior detection-quality tradeoff across language generation and coding tasks. Finally, we evaluate autonomous LLM agents in multi-agent environments through the first simulator of a fully LLM-driven supply chain. LLM agents offer significant performance gains, outperforming human teams and reducing costs by up to 67%, but also introduce systemic risks, including costly tail events.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. This thesis develops algorithms with theoretical guarantees for trustworthy AI as ML systems progress from predictive models to generative models and autonomous agents. It introduces a kernel-based method for achieving multiaccuracy across complex subpopulations overlooked by standard demographics, methods to handle predictive multiplicity in equally accurate models, and information-theoretic watermarking for LLMs that characterizes the detection-distortion tradeoff and derives optimal strategies via optimal transport and coding theory. Empirical results are reported for superior detection-quality tradeoffs on language generation and coding tasks. The work concludes with a simulator for fully LLM-driven supply chains, claiming performance gains over human teams (including up to 67% cost reduction) alongside systemic risks such as costly tail events.

Significance. If the theoretical derivations and empirical results hold, the thesis offers a coherent pipeline of tools grounded in information theory, optimization, and statistical learning for bias mitigation, arbitrariness reduction, content provenance, and agent evaluation. The explicit pairing of guarantees with experiments on practical tasks (watermarking, multi-agent simulation) and the first reported LLM supply-chain simulator represent concrete advances that could inform deployment standards, provided the claimed performance margins and risk characterizations prove robust.

major comments (2)
  1. [Empirical evaluations (watermarking and supply-chain simulator)] Abstract and empirical sections: the claim of a 'superior detection-quality tradeoff' for the proposed watermarks and the 'up to 67% cost reduction' for LLM agents require explicit baselines, variance estimates, and statistical tests; without these, the superiority and risk claims cannot be evaluated as load-bearing contributions.
  2. [Watermarking characterization and optimal strategies] Theoretical sections on watermarking: the derivation of optimal strategies via optimal transport and coding theory is presented as yielding parameter-free or tight bounds, but the translation to high-dimensional LLM outputs and multi-agent interactions is asserted without a concrete robustness argument or counterexample analysis, which is central to the accountability claims.
minor comments (2)
  1. [Introduction / Abstract] The abstract and introduction would benefit from a short roadmap explicitly mapping each contribution to a chapter or section number.
  2. [Multiaccuracy and predictive multiplicity sections] Ensure consistent terminology for 'multiaccuracy' and 'predictive multiplicity' when first introduced, and provide a brief comparison table of the kernel method against standard fairness baselines.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback on our thesis. We have carefully considered the major comments and will make revisions to address the concerns regarding empirical evaluations and theoretical robustness. Below we respond point by point.

read point-by-point responses
  1. Referee: Abstract and empirical sections: the claim of a 'superior detection-quality tradeoff' for the proposed watermarks and the 'up to 67% cost reduction' for LLM agents require explicit baselines, variance estimates, and statistical tests; without these, the superiority and risk claims cannot be evaluated as load-bearing contributions.

    Authors: We agree that the empirical sections require explicit baselines, variance estimates, and statistical tests to substantiate the claims. In the revised version, we will include direct comparisons to established watermarking baselines (such as probability-shift and synonym-substitution methods), report means with standard deviations across multiple runs with different random seeds, and apply statistical significance tests (e.g., paired t-tests or Wilcoxon signed-rank tests with p-values) for the detection-quality improvements on language and coding tasks. For the supply-chain simulator, we will add variance across repeated simulation episodes, explicit multi-trial human-team baselines, and statistical analysis supporting the cost-reduction figures and tail-event characterizations. revision: yes

  2. Referee: Theoretical sections on watermarking: the derivation of optimal strategies via optimal transport and coding theory is presented as yielding parameter-free or tight bounds, but the translation to high-dimensional LLM outputs and multi-agent interactions is asserted without a concrete robustness argument or counterexample analysis, which is central to the accountability claims.

    Authors: The optimal-transport and coding-theoretic derivations yield tight bounds in the idealized information-theoretic setting. We acknowledge that the manuscript asserts applicability to high-dimensional LLM outputs and multi-agent contexts without a dedicated robustness argument or counterexample analysis. In revision, we will expand the theoretical sections to explicitly state the assumptions (e.g., perfect token-level control and i.i.d. sampling), discuss potential looseness arising from discretization and approximation errors in high dimensions, and include a counterexample section illustrating degradation cases under realistic LLM constraints and agent interaction noise. This will strengthen the accountability claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The thesis structure grounds its contributions in external, established fields (information theory, optimal transport, coding theory, kernel methods, statistical learning) without evident self-referential loops. Watermarking derives optimal strategies from information-theoretic trade-offs and optimal transport, then validates via separate empirical evaluations on detection-quality tradeoffs. The LLM-agent simulator is presented as an empirical assessment of performance gains and risks, not a closed theoretical derivation. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided abstract or high-level argument that reduce the central claims to their own inputs by construction. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides insufficient detail to enumerate specific free parameters, axioms, or invented entities; the work references standard tools from information theory and optimization without listing ad-hoc choices.

pith-pipeline@v0.9.0 · 5511 in / 1109 out tokens · 40617 ms · 2026-05-12T02:17:28.600369+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

199 extracted references · 199 canonical work pages · 5 internal anchors

  1. [1]

    (2024). Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act).Official Journal of the European Union, L 1689, 13 June 2024. Accessed: 2025-05-14

  2. [2]

    Aaronson, S. (2023). Watermarking of large language models. https://simons. berkeley.edu/talks/scott-aaronson-ut-austin-openai-2023-08-17 . Ac- cessed: 2025-01-1-

  3. [3]

    Agarwal, A., Beygelzimer, A., Dudík, M., Langford, J., and Wallach, H. (2018). A reductions approach to fair classification. InInternational conference on machine learning, pages 60–69. PMLR

  4. [4]

    Alghamdi, W., Hsu, H., Jeong, H., Wang, H., Michalak, P ., Asoodeh, S., and Calmon, F. (2022). Beyond adult and compas: Fair multi-class prediction via information projection. Advances in Neural Information Processing Systems, 35:38747–38760

  5. [5]

    Asoodeh, S., Diaz, M., and Calmon, F. P . (2020). Contraction of eγ-divergence and its applications to privacy.arXiv preprint arXiv:2012.11035

  6. [6]

    and Jiang, H

    Bahri, D. and Jiang, H. (2021). Locally adaptive label smoothing for predictive churn. arXiv preprint arXiv:2102.05140

  7. [7]

    and Wieting, J

    Bahri, D. and Wieting, J. (2024). A watermark for black-box language models.arXiv preprint arXiv:2410.02099

  8. [8]

    Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., et al. (2022). Constitutional ai: Harmlessness from ai feedback.arXiv preprint arXiv:2212.08073

  9. [9]

    (2023).Fairness and machine learning: Limitations and opportunities

    Barocas, S., Hardt, M., and Narayanan, A. (2023).Fairness and machine learning: Limitations and opportunities. MIT Press

  10. [10]

    K., Dey, K., Hind, M., Hoffman, S

    Bellamy, R. K., Dey, K., Hind, M., Hoffman, S. C., Houde, S., Kannan, K., Lohia, P ., Martino, J., Mehta, S., Mojsilovi´ c, A., et al. (2019). Ai fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias.IBM Journal of Research and Development, 63(4/5):4–1

  11. [11]

    and Thomas-Agnan, C

    Berlinet, A. and Thomas-Agnan, C. (2004).Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, 1 edition. 105

  12. [12]

    On the reproducibility of neural network predictions.arXiv preprint arXiv: 2102.03349,

    Bhojanapalli, S., Wilber, K., Veit, A., Rawat, A. S., Kim, S., Menon, A., and Ku- mar, S. (2021). On the reproducibility of neural network predictions.arXiv preprint arXiv:2102.03349

  13. [13]

    Black, E., Leino, K., and Fredrikson, M. (2021). Selective ensembles for consistent predictions.arXiv preprint arXiv:2111.08230

  14. [14]

    Black, E., Raghavan, M., and Barocas, S. (2022). Model multiplicity: Opportunities, con- cerns, and solutions. In2022 ACM Conference on Fairness, Accountability, and Transparency, pages 850–863

  15. [15]

    and Michaeli, T

    Blau, Y. and Michaeli, T. (2019). Rethinking lossy compression: The rate-distortion- perception tradeoff. InInternational Conference on Machine Learning, pages 675–685. PMLR

  16. [16]

    Boyd, S. P . and Vandenberghe, L. (2004).Convex optimization. Cambridge university press

  17. [17]

    Breiman, L. (1996). Bagging predictors.Machine learning, 24:123–140

  18. [18]

    Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author).Statistical science, 16(3):199–231

  19. [19]

    and Haas, C

    Caton, S. and Haas, C. (2020). Fairness in machine learning: A survey.arXiv preprint arXiv:2010.04053

  20. [20]

    Chandra, B., Dunietz, J., and Roberts, K. (2024). Reducing risks posed by synthetic content: An overview of technical approaches to digital content transparency. Technical Report NIST.AI.100-4, National Institute of Standards and Technology, Gaithersburg, MD

  21. [21]

    Chang, Y., Krishna, K., Houmansadr, A., Wieting, J., and Iyyer, M. (2024). Postmark: A robust blackbox watermark for large language models.arXiv preprint arXiv:2406.14517

  22. [22]

    Chao, P ., Sun, Y., Dobriban, E., and Hassani, H. (2024). Watermarking language models with error correcting codes.arXiv preprint arXiv:2406.10281

  23. [23]

    (2000).Design and analysis of digital watermarking, information embedding, and data hiding systems

    Chen, B. (2000).Design and analysis of digital watermarking, information embedding, and data hiding systems. PhD thesis, Massachusetts Institute of Technology

  24. [24]

    Chen, J., Yu, L., Wang, J., Shi, W., Ge, Y., and Tong, W. (2022). On the rate-distortion- perception function.IEEE Journal on Selected Areas in Information Theory, 3(4):664–673

  25. [25]

    Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P . D. O., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., et al. (2021). Evaluating large language models trained on code.arXiv preprint arXiv:2107.03374

  26. [26]

    Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments.Big data, 5(2):153–163

  27. [27]

    Christ, M., Gunn, S., and Zamir, O. (2024). Undetectable watermarks for language models. InThe Thirty Seventh Annual Conference on Learning Theory, pages 1125–1139. PMLR. 106

  28. [28]

    Chzhen, E., Denis, C., Hebiri, M., Oneto, L., and Pontil, M. (2019). Leveraging labeled and unlabeled data for consistent fair binary classification.Advances in Neural Information Processing Systems, 32

  29. [29]

    F., Barocas, S., De Sa, C., and Sen, S

    Cooper, A. F., Barocas, S., De Sa, C., and Sen, S. (2023). Variance, self-consistency, and arbitrariness in fair classification.arXiv preprint arXiv:2301.11562

  30. [30]

    Coston, A., Rambachan, A., and Chouldechova, A. (2021). Characterizing fairness over the set of good models under selective labels. InInternational Conference on Machine Learning, pages 2144–2155. PMLR

  31. [31]

    Cover, T. M. and Thomas, A. J. (2006).Elements of Information Theory. Wiley, New-York, 2nd edition

  32. [32]

    and Hellman, D

    Creel, K. and Hellman, D. (2022). The algorithmic leviathan: arbitrariness, fairness, and opportunity in algorithmic decision-making systems.Canadian Journal of Philosophy, 52(1):26–43

  33. [33]

    Cui, P ., Hu, W., and Zhu, J. (2020). Calibrated reliable regression using maximum mean discrepancy. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H., editors, Advances in Neural Information Processing Systems, volume 33, pages 17164–17175. Curran Associates, Inc

  34. [34]

    Cury, C. R. J. (2022). Instituto nacional de estudos e pesquisas educacionais anísio teixeira: uma trajetória em busca de uma educação de qualidade

  35. [35]

    Cuturi, M. (2013). Sinkhorn distances: Lightspeed computation of optimal transport. Advances in neural information processing systems, 26

  36. [36]

    D., et al

    D’Amour, A., Heller, K., Moldovan, D., Adlam, B., Alipanahi, B., Beutel, A., Chen, C., Deaton, J., Eisenstein, J., Hoffman, M. D., et al. (2022). Underspecification presents challenges for credibility in modern machine learning.The Journal of Machine Learning Research, 23(1):10237–10297

  37. [37]

    Dathathri, S., See, A., Ghaisas, S., Huang, P .-S., McAdam, R., Welbl, J., Bachani, V ., Kaskasoli, A., Stanforth, R., Matejovicova, T., et al. (2024). Scalable watermarking for identifying large language model outputs.Nature, 634(8035):818–823

  38. [38]

    Dawid, A. P . (1982). The well-calibrated bayesian.Journal of the American Statistical Association, 77(379):605–610

  39. [39]

    Deng, Z., Dwork, C., and Zhang, L. (2023). Happymap: A generalized multi-calibration method.arXiv preprint arXiv:2303.04379

  40. [40]

    and Freedman, D

    Diaconis, P . and Freedman, D. (1980). Finite exchangeable sequences.The Annals of Probability, pages 745–764

  41. [41]

    Dwork, C., Feldman, V ., Hardt, M., Pitassi, T., Reingold, O., and Roth, A. L. (2015). Preserving statistical validity in adaptive data analysis. InProceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 117–126. 107

  42. [42]

    Dwork, C., Hardt, M., Pitassi, T., Reingold, O., and Zemel, R. (2012). Fairness through awareness. InProceedings of the 3rd Innovations in Theoretical Computer Science Conference, ITCS ’12, page 214–226, New York, NY, USA. Association for Computing Machinery

  43. [43]

    P ., Reingold, O., Rothblum, G

    Dwork, C., Kim, M. P ., Reingold, O., Rothblum, G. N., and Yona, G. (2021). Outcome indistinguishability. InProceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1095–1108

  44. [44]

    Dwork, C., Lee, D., Lin, H., and Tankala, P . (2023). From pseudorandomness to multi- group fairness and back. InThe Thirty Sixth Annual Conference on Learning Theory, pages 3566–3614. PMLR

  45. [45]

    Fabbri, A. R., Li, I., She, T., Li, S., and Radev, D. R. (2019). Multi-news: A large-scale multi-document summarization dataset and abstractive hierarchical model. arXiv preprint arXiv:1906.01749

  46. [46]

    Fairoze, J., Garg, S., Jha, S., Mahloujifar, S., Mahmoody, M., and Wang, M. (2025). Publicly-detectable watermarking for language models. IACR Communications in Cryptology, 1(4)

  47. [47]

    Fan, A., Jernite, Y., Perez, E., Grangier, D., Weston, J., and Auli, M. (2019). ELI5: Long form question answering. arXiv preprint arXiv:1907.09190

  48. [48]

    Feldman, V. (2009). Distribution-specific agnostic boosting. In International Conference on Supercomputing

  49. [49]

    Fernandez, P., Chaffin, A., Tit, K., Chappelier, V., and Furon, T. (2023). Three bricks to consolidate watermarks for large language models. In 2023 IEEE International Workshop on Information Forensics and Security (WIFS), pages 1–6. IEEE

  50. [50]

    Fisher, A., Rudin, C., and Dominici, F. (2019). All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. J. Mach. Learn. Res., 20(177):1–81

  51. [51]

    Flamary, R., Courty, N., Gramfort, A., Alaya, M. Z., Boisbunon, A., Chambon, S., Chapel, L., Corenflos, A., Fatras, K., Fournier, N., et al. (2021). POT: Python Optimal Transport. Journal of Machine Learning Research, 22(78):1–8

  52. [52]

    Friedler, S. A., Scheidegger, C., Venkatasubramanian, S., Choudhary, S., Hamilton, E. P., and Roth, D. (2019). A comparative study of fairness-enhancing interventions in machine learning. In Proceedings of the conference on fairness, accountability, and transparency, pages 329–338

  53. [53]

    Fu, J., Zhao, X., Yang, R., Zhang, Y., Chen, J., and Xiao, Y. (2024). Gumbelsoft: Diversified language model watermarking via the GumbelMax-trick. arXiv preprint arXiv:2402.12948

  54. [54]

    Ganesh, P., Chang, H., Strobel, M., and Shokri, R. (2023). On the impact of machine learning randomness on group fairness. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’23, pages 1789–1800, New York, NY, USA. Association for Computing Machinery

  55. [55]

    Garg, S., Jung, C., Reingold, O., and Roth, A. (2024). Oracle efficient online multicalibration and omniprediction. In Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2725–2792. SIAM

  56. [56]

    Gel’fand, I. and Pinsker, M. (1980). Coding for channels with random parameters. Probl. Contr. Inform. Theory, 9(1):19–31

  57. [57]

    Geva, M., Schuster, R., Berant, J., and Levy, O. (2021). Transformer feed-forward layers are key-value memories. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5484–5495

  58. [58]

    Giboulot, E. and Furon, T. (2024). Watermax: breaking the LLM watermark detectability-robustness-quality trade-off. arXiv preprint arXiv:2403.04808

  59. [59]

    Globus-Harris, I., Gupta, V., Jung, C., Kearns, M., Morgenstern, J., and Roth, A. (2022). Multicalibrated regression for downstream fairness. arXiv preprint arXiv:2209.07312

  60. [60]

    Globus-Harris, I., Harrison, D., Kearns, M., Roth, A., and Sorrell, J. (2023). Multicalibration as boosting for regression. arXiv preprint arXiv:2301.13767

  61. [61]

    Goodrich, R. K. (1970). A Riesz representation theorem. In Proc. Amer. Math. Soc., volume 24, pages 629–636

  62. [62]

    Gopalan, P., Hu, L., Kim, M. P., Reingold, O., and Wieder, U. (2022). Loss minimization through the lens of outcome indistinguishability. arXiv preprint arXiv:2210.08649

  63. [63]

    Gopalan, P., Kalai, A. T., Reingold, O., Sharan, V., and Wieder, U. (2021). Omnipredictors. arXiv preprint arXiv:2109.05389

  64. [65]

    Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., and Smola, A. (2006). A kernel method for the two-sample-problem. Advances in neural information processing systems, 19

  65. [66]

    Gurobi Optimization, LLC (2024). Gurobi Optimizer Reference Manual

  66. [67]

    Haghtalab, N., Jordan, M., and Zhao, E. (2024). A unifying perspective on multi-calibration: Game dynamics for multi-objective learning. Advances in Neural Information Processing Systems, 36

  67. [68]

    Hajek, B. and Raginsky, M. (2019). Statistical learning theory. Lecture Notes, 387

  68. [69]

    Hardt, M., Price, E., and Srebro, N. (2016). Equality of opportunity in supervised learning. Advances in neural information processing systems, 29

  69. [70]

    He, H., Liu, Y., Wang, Z., Mao, Y., and Bu, Y. (2024). Universally optimal watermarking schemes for LLMs: from theory to practice. arXiv preprint arXiv:2410.02890

  70. [71]

    He, H., Liu, Y., Wang, Z., Mao, Y., and Bu, Y. (2025). Theoretically grounded framework for LLM watermarking: A distribution-adaptive approach. In The 1st Workshop on GenAI Watermarking

  71. [72]

    Hébert-Johnson, U., Kim, M., Reingold, O., and Rothblum, G. (2018). Multicalibration: Calibration for the (computationally-identifiable) masses. In International Conference on Machine Learning, pages 1939–1948. PMLR

  72. [73]

    Hine, E. and Floridi, L. (2023). The blueprint for an AI bill of rights: in search of enaction, at risk of inaction. Minds and Machines, pages 1–8

  73. [74]

    Hort, M., Chen, Z., Zhang, J. M., Sarro, F., and Harman, M. (2022). Bias mitigation for machine learning classifiers: A comprehensive survey. arXiv preprint arXiv:2207.07068

  74. [75]

    Hou, A., Zhang, J., He, T., Wang, Y., Chuang, Y.-S., Wang, H., Shen, L., Van Durme, B., Khashabi, D., and Tsvetkov, Y. (2024). Semstamp: A semantic watermark with paraphrastic robustness for text generation. InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Vol...

  75. [76]

    Hsu, H. and Calmon, F. d. P. (2022). Rashomon capacity: A metric for predictive multiplicity in probabilistic classification. arXiv preprint arXiv:2206.01295

  76. [77]

    Hu, Z., Chen, L., Wu, X., Wu, Y., Zhang, H., and Huang, H. (2024). Unbiased watermark for large language models. In The Twelfth International Conference on Learning Representations

  77. [78]

    Huang, B. and Wan, X. (2024). Waterpool: A watermark mitigating trade-offs among imperceptibility, efficacy and robustness. arXiv preprint arXiv:2405.13517

  78. [79]

    Huang, B., Zhu, H., Zhu, B., Ramchandran, K., Jordan, M. I., Lee, J. D., and Jiao, J. (2023). Towards optimal statistical watermarking. arXiv preprint arXiv:2312.07930

  79. [80]

    Huang, Y., Sun, L., Wang, H., Wu, S., Zhang, Q., Li, Y., Gao, C., Huang, Y., Lyu, W., Zhang, Y., et al. (2024). TrustLLM: Trustworthiness in large language models. arXiv preprint arXiv:2401.05561

  80. [81]

    Huijben, I. A., Kool, W., Paulus, M. B., and Van Sloun, R. J. (2022). A review of the Gumbel-max trick and its extensions for discrete stochasticity in machine learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(2):1353–1371
