pith. machine review for the scientific record.

arxiv: 2605.08964 · v1 · submitted 2026-05-09 · 💻 cs.LG

Recognition: 2 Lean theorem links

Trustworthy AI: Ensuring Reliability and Accountability from Models to Agents

Carol Xuan Long

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:17 UTC · model grok-4.3

classification 💻 cs.LG
keywords trustworthy AI · LLM watermarking · multiaccuracy · predictive multiplicity · autonomous agents · supply chain simulation · information theory · optimal transport

The pith

The thesis develops theoretically grounded algorithms to ensure reliability and accountability as machine learning systems advance from predictive models to generative models and autonomous agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This work introduces a kernel-based method that achieves multiaccuracy across complex subpopulations overlooked by standard demographic categories, alongside techniques to resolve predictive multiplicity where equally accurate models produce conflicting individual predictions. It then characterizes the information-theoretic trade-off between watermark detection and text distortion in large language models, deriving optimal strategies via optimal transport and coding theory that yield superior empirical detection-quality balances on language generation and coding tasks. Finally, the thesis evaluates fully LLM-driven agents through a supply chain simulator, documenting performance gains over human teams with cost reductions up to 67 percent while also exposing systemic risks such as costly tail events. A sympathetic reader cares because these methods aim to make AI systems fairer, more traceable, and safer at each stage of their increasing autonomy.

Core claim

The central claim is that tools grounded in information theory, optimization, and statistical learning can mitigate bias and arbitrariness in traditional models, ensure content provenance in generative models, and evaluate the performance and risks of autonomous agents. A kernel method delivers multiaccuracy beyond conventional groups; predictive multiplicity is addressed by methods that reduce arbitrary individual decisions; watermarking strategies derived from optimal transport achieve an improved detection-distortion frontier across tasks; and the supply-chain simulator shows LLM agents outperforming humans at lower cost while surfacing tail-event vulnerabilities.
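To make the kernel machinery behind the multiaccuracy claim concrete, here is a minimal witness-function sketch. This is our reconstruction for illustration, not the thesis's KMAcc implementation: the function names, RBF bandwidth, and toy data are all invented, and the thesis's Definition 3.10 may differ in normalization.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Pairwise RBF kernel matrix between the rows of X and Z."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def witness_values(X_eval, X_cal, residuals, gamma=1.0):
    """Empirical RKHS witness: kernel-weighted average residual
    around each evaluation point. Large |values| flag regions where
    the predictor is systematically mis-scored, i.e. a multiaccuracy
    violation on an implicit (kernel-defined) subpopulation."""
    K = rbf_kernel(X_eval, X_cal, gamma)   # shape (n_eval, n_cal)
    return K @ residuals / len(residuals)

# Toy audit: a predictor that ignores the second feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # true labels
f = (X[:, 0] > 0).astype(float)             # biased predictor
w = witness_values(X, X, y - f)
```

Because the kernel matrix is positive semidefinite, the witness is positively aligned with the residuals (w · (y − f) ≥ 0), which is why its level sets can track a model's error regions as in the thesis's Figures 3.1 and 3.3.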

What carries the argument

Information-theoretic characterization of watermark detection versus distortion, optimized via optimal transport and coding theory, together with kernel-based multiaccuracy and a full LLM-agent supply-chain simulator.
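For orientation on the hypothesis-testing framing, detection in the red-green family of schemes the thesis benchmarks against (e.g. [90]) reduces to a one-sided z-test on green-token counts. The sketch below is a generic stand-in, not the thesis's correlated-channel (CC) construction: the SHA-256 seeding and function names are our invention, with the hash of the previous token playing the role of side information.

```python
import hashlib

def green_list(prev_token: int, vocab_size: int) -> set[int]:
    """Pseudorandomly split the vocabulary in half, seeded by the
    previous token (a stand-in for the scheme's side information)."""
    greens = set()
    for t in range(vocab_size):
        h = hashlib.sha256(f"{prev_token}:{t}".encode()).digest()
        if h[0] % 2 == 0:
            greens.add(t)
    return greens

def detect_z(tokens: list[int], vocab_size: int) -> float:
    """z-score against the null hypothesis 'unwatermarked', under
    which each token lands in its green list with probability 1/2."""
    hits = sum(
        tok in green_list(prev, vocab_size)
        for prev, tok in zip(tokens, tokens[1:])
    )
    n = len(tokens) - 1
    return (hits - 0.5 * n) / (0.25 * n) ** 0.5
```

A watermarker biases generation toward green tokens; the detector then sees a z-score far above what unwatermarked text produces, and the strength of that bias is exactly the detection-distortion knob the thesis characterizes.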

If this is right

  • Kernel-based multiaccuracy improves fairness across subpopulations that standard demographic partitions miss.
  • Methods for predictive multiplicity reduce conflicting individual predictions among equally accurate models.
  • Optimal-transport watermarking delivers a superior detection-quality trade-off on language and coding tasks.
  • LLM agents in the supply-chain simulator outperform human teams and cut costs by up to 67 percent.
  • The same agents introduce systemic risks including costly tail events.
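The optimal-transport step behind the watermark derivation can be sketched with a generic entropy-regularized Sinkhorn solver, the algorithm the figure notes say is used to solve the OT problem [35, 155]. The marginals and cost matrix below are toy stand-ins, not the thesis's actual score function or coupling.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, iters=500):
    """Entropy-regularized optimal transport: returns a coupling P
    with row marginals a and (approximately) column marginals b that
    minimizes <P, C> + eps * entropy penalty."""
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):        # alternating marginal projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Toy instance: couple a 2-symbol side-information marginal with a
# 4-token next-token distribution under a hypothetical 0/1 cost.
a = np.array([0.5, 0.5])               # side-information marginal
b = np.array([0.4, 0.3, 0.2, 0.1])     # token marginal
C = np.array([[0.0, 1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0, 0.0]])   # illustrative cost/score
P = sinkhorn(a, b, C)
```

In the watermarking setting, the coupling P plays the role of a joint distribution over side information and tokens that maximizes a detection score while leaving both marginals (and hence text quality) untouched.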

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The watermarking bounds could guide standards for provenance in other generative modalities such as images or structured data.
  • The supply-chain simulator offers a template for testing agent behavior in additional high-stakes domains like logistics or finance.
  • Integrating the multiaccuracy kernel with agent evaluation frameworks might produce fairness guarantees that extend to autonomous decision systems.
  • The information-theoretic watermark trade-off might inform regulatory requirements for content traceability in deployed language models.

Load-bearing premise

Theoretical guarantees from information theory and optimization will carry over to real-world performance without major degradation when applied to complex, high-dimensional data or multi-agent interactions.

What would settle it

A deployment in which the proposed watermarks fail to maintain claimed detection rates at low text distortion levels, or in which LLM agents running in an actual supply chain neither achieve the reported cost reductions nor expose the predicted tail risks.

Figures

Figures reproduced from arXiv: 2605.08964 by Carol Xuan Long.

Figure 2
Figure 2. Illustrates how fairness interventions can increase predictive multiplicity. view at source ↗
Figure 2.1
Figure 2.1. Accuracy-fairness frontier does not reveal arbitrariness in competing models. Left: fairness-accuracy frontier of baseline and fair models corrected by 5 fairness interventions; point clouds generated by different random seed choices. Middle: the cumulative distribution functions (CDF) of per-sample score std. across classifiers at different intervention levels (see Definition 2.4). For each sample, std.… view at source ↗
Figure 2.2
Figure 2.2. Illustration of two models being fair/unfair and exhibiting high/low predictive multiplicity through the models' error regions in each of the 4 cases. The metrics for fairness and predictive multiplicity are Overall Accuracy Equality (Definition 2.7) and ambiguity (Definition 2.3), respectively. We prove the orthogonality of OAE and Statistical Parity (SP) from ambiguity formally in the proposition below, … view at source ↗
Figure 2.3
Figure 2.3. Data distribution of a population with two groups used in Example 2 (left). Right: without the Mean EO constraint (2.6) (green line), there is a unique optimal classifier (with threshold 0) that attains the smallest probability of error (blue line). Adding the Mean EO constraint enlarges the set of optimal threshold classifiers to two classifiers (red and blue dots) with indistinguishable accuracy and… view at source ↗
Figure 2.4
Figure 2.4. Quantile plot on the high-fairness bin for various fairness interventions vs. baseline on ENEM. Left: fairness-accuracy frontier. Right: fair models produce larger score std. at top percentiles compared to the baseline model (horizontal axis computed via (2.6)). (Rejection and Leveraging output thresholded scores directly.) …models that agree on 90% of the samples, thereby not inducing concerns of arbitrarin… view at source ↗
Figure 2.5
Figure 2.5. Standard deviation of ensembled models trained on ENEM and HSLS with baseline random forest classifiers. We fix the high-fairness bin and vary the number of models m in each ensemble. As we increase the number of ensembles, score std. (on 10 ensembles) drops and meets the score std. of 10 baseline RFCs when m = 30 on ENEM and m = 17 on HSLS. (Mean EO is computed using (2.6).) …frontiers are insufficient for… view at source ↗
Figure 3.1
Figure 3.1. Witness function values are highly correlated with errors of the model. Left: visualization of the moon dataset, with the logistic regression classifier decision boundaries displayed. Middle: the witness function value (Definition 3.10, with RBF kernel) c⋆_{k,D0,f} is plotted as a contour under the error of the classifiers on test samples y − f(x). The colored dots denote the error for each test sample y − f… view at source ↗
Figure 3.2
Figure 3.2. Multiaccuracy error (KME, Definition 3.14) vs. calibration error (MSCE, Definition 3.19) for KMAcc (our method), competing methods (LSBoost and MCBoost), and KMAcc with isotonic calibration, a standard score quantization technique. Predictor performances are measured as AUC and labeled next to each method. KMAcc achieves improved or comparable KME and AUC to the baselines and MCBoost (with the exception o… view at source ↗
Figure 3.3
Figure 3.3. Test errors over witness value contours using the RBF kernel. First column: visualization of the moon, concentric-circle, blob, and needle datasets. Red and blue represent the true labels. Second column: classification via a logistic regression classifier. The witness function value (Definition 3.10) c⋆_{k,D0,f} is plotted as a contour under the error of the classifiers on test samples y − f(x). The witness… view at source ↗
Figure 4.1
Figure 4.1. Watermarking problem as a hypothesis test with side information. …for bounded Type-I error is analyzed by comparing watermarking schemes to the uniformly most powerful watermark with knowledge of QX. The authors of He et al. [70] characterize the universal Type-II error while controlling the worst-case Type-I error by optimizing the watermarking scheme and detector. While these works operate on a token-le… view at source ↗
Figure 4.2
Figure 4.2. Optimal coupling between side information S and random partition Y = f(X, B^m) for p_e1 ≤ 0.5 (left), p_e0 ≤ 0.5 (right), with β(p) = (2p − 1)/(2p). …samples C ∼ Ber(1/2). If C = 0, she samples a ∼ QX and sends it. Otherwise, she samples and sends a ∼ Q̃_{X|S=s}, which is given by the CC: Q̃_{X|s,b^m}(x) = QX(x) · P_{S|Y}(s | f(x, b^m)) / P_S(s). (4.8) Bob performs the detection test by declaring that a is watermarked if… view at source ↗
Figure 4.3
Figure 4.3. Optimal detection probability of CC in one-shot on the adversarial token distribution (Eq. 4.6) is plotted against the inf-norm constraint λ (or, equivalently, an entropy constraint) on QX. When λ = 1 (entropy H(QX) = 0), QX is deterministic, and detection is random. As the entropy of QX grows (smaller λ values), single-token optimal detection probability reaches a maximum of around 0.75 for bina… view at source ↗
Figure 4.4
Figure 4.4. One-shot watermark detection results on QX = Unif(X). For αp = 0, CC achieves a detection probability of 0.75 and 0.7 with balanced and Bernoulli partitions, respectively. CC Balanced achieves the optimal detection (Eq. 4.4 with γ = 1 and |S| = 2). Standard deviations plotted as two-sided bars. view at source ↗
Figure 4.5
Figure 4.5. Detection probability vs. k for two values of m and a uniform token distribution QX. Sequential watermarking: we now present the performance of the CC watermark on a sequence-level scheme. We present preliminary results on synthetically generated data, with the purpose of demonstrating the applicability of our method to a sequence-level test. To that end, we consider the generation of n tokens A^n, whic… view at source ↗
Figure 4.6
Figure 4.6. ROC of the sequence-level watermarking scheme. We compare the red-green method [90] with the CC scheme (Section 4.1.4). We consider a range of δ. An increase of δ increases detection at the expense of higher perception (lower textual quality), while the CC method has fixed zero perception. Finally, we analyze the effect of k on performance in the sequential setting by observing the ROC for a range of k… view at source ↗
Figure 4.7
Figure 4.7. ROC of the sequence-level watermarking scheme under the CC method for a range of k values. 4.1.7 Conclusion: this section presents a rigorous analysis of text watermarking in a one-shot setting through the lens of hypothesis testing with side information. We analyze the fundamental trade-off… view at source ↗
Figure 4.8
Figure 4.8. HeavyWater and SimplexWater demonstrate favorable detection performance (measured by p-values) with minimal distortion to the base unwatermarked model (measured by cross-entropy). See Section 4.2.5 for details. …transport (OT) problem [155] that maximizes the average score across all couplings between the side information and the next-token distributions. We efficiently solve the OT problem using Sinkhorn… view at source ↗
Figure 4.9
Figure 4.9. Visualization of the components of watermarking design. …was extensively evaluated in real-world user tests, and we also select it as a competing benchmark. More recently, [108] proposed a binary-score watermark based on partitioning the token vocabulary into an arbitrary number of sets, followed by a simple binary test for watermark detection. The method in [108] is simple to implement and incurs little… view at source ↗
Figure 4.10
Figure 4.10. Left: tradeoff between detection (measured by p-value) and distortion (measured by cross-entropy): SimplexWater and HeavyWater achieve higher detection rates while preserving token distributions close to the base unwatermarked model. Right: detection gained by employing our watermark under various randomness generation schemes and several sliding window sizes h. Both SimplexWater and HeavyWater provide… view at source ↗
Figure 4.11
Figure 4.11. Our watermarks require fewer tokens to reach a given detection strength (p-value) with zero distortion. Gain and performance under hashing: we illustrate how SimplexWater and HeavyWater can be coupled with different side-information generation methods to boost watermark detection. We consider an experiment in which we replace the red-green watermark cost function and watermarked distribution with ou… view at source ↗
Figure 5
Figure 5. Summarizes the effects of the four factors that determine the success or failure… view at source ↗
Figure 5.1
Figure 5.1. Performance gains of gen-AI agents via model selection and inference-time techniques. Non-reasoning models (top, GPT-4o mini) required policy constraints, orchestration, and prompt engineering to close the performance gap with humans. In contrast, reasoning models (bottom, GPT-5 mini and Llama 4 Maverick 17B) started above human-level performance and, when optimized with the same techniques, achieved up… view at source ↗
read the original abstract

In this thesis, we develop algorithms with theoretical guarantees for ensuring reliability and accountability of Machine Learning (ML) systems. As ML systems evolve from predictive models to generative models and autonomous agents, the landscape of trustworthy AI has shifted. This thesis introduces tools grounded in information theory, optimization, and statistical learning to mitigate bias, reduce arbitrary decisions, ensure content provenance, and evaluate LLM-driven agents in autonomous settings. Towards mitigating bias and arbitrariness in traditional ML models, we introduce a kernel-based method to achieve multiaccuracy across complex subpopulations that traditional demographic categories may overlook. We also develop methods to address predictive multiplicity, where equally accurate models yield conflicting individual predictions. We ensure the accountability in generative AI through watermarking large language models (LLMs). We characterize the information-theoretic trade-off between watermark detection and text distortion and derive optimal watermarking strategies by leveraging optimal transport and coding theory. Empirical evaluations show our watermarks achieve a superior detection-quality tradeoff across language generation and coding tasks. Finally, we evaluate autonomous LLM agents in multi-agent environments through the first simulator of a fully LLM-driven supply chain. LLM agents offer significant performance gains, outperforming human teams and reducing costs by up to 67%, but also introduce systemic risks, including costly tail events.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. This thesis develops algorithms with theoretical guarantees for trustworthy AI as ML systems progress from predictive models to generative models and autonomous agents. It introduces a kernel-based method for achieving multiaccuracy across complex subpopulations overlooked by standard demographics, methods to handle predictive multiplicity in equally accurate models, and information-theoretic watermarking for LLMs that characterizes the detection-distortion tradeoff and derives optimal strategies via optimal transport and coding theory. Empirical results are reported for superior detection-quality tradeoffs on language generation and coding tasks. The work concludes with a simulator for fully LLM-driven supply chains, claiming performance gains over human teams (including up to 67% cost reduction) alongside systemic risks such as costly tail events.

Significance. If the theoretical derivations and empirical results hold, the thesis offers a coherent pipeline of tools grounded in information theory, optimization, and statistical learning for bias mitigation, arbitrariness reduction, content provenance, and agent evaluation. The explicit pairing of guarantees with experiments on practical tasks (watermarking, multi-agent simulation) and the first reported LLM supply-chain simulator represent concrete advances that could inform deployment standards, provided the claimed performance margins and risk characterizations prove robust.

major comments (2)
  1. [Empirical evaluations (watermarking and supply-chain simulator)] Abstract and empirical sections: the claim of a 'superior detection-quality tradeoff' for the proposed watermarks and the 'up to 67% cost reduction' for LLM agents require explicit baselines, variance estimates, and statistical tests; without these, the superiority and risk claims cannot be evaluated as load-bearing contributions.
  2. [Watermarking characterization and optimal strategies] Theoretical sections on watermarking: the derivation of optimal strategies via optimal transport and coding theory is presented as yielding parameter-free or tight bounds, but the translation to high-dimensional LLM outputs and multi-agent interactions is asserted without a concrete robustness argument or counterexample analysis, which is central to the accountability claims.
minor comments (2)
  1. [Introduction / Abstract] The abstract and introduction would benefit from a short roadmap explicitly mapping each contribution to a chapter or section number.
  2. [Multiaccuracy and predictive multiplicity sections] Ensure consistent terminology for 'multiaccuracy' and 'predictive multiplicity' when first introduced, and provide a brief comparison table of the kernel method against standard fairness baselines.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback on our thesis. We have carefully considered the major comments and will make revisions to address the concerns regarding empirical evaluations and theoretical robustness. Below we respond point by point.

read point-by-point responses
  1. Referee: Abstract and empirical sections: the claim of a 'superior detection-quality tradeoff' for the proposed watermarks and the 'up to 67% cost reduction' for LLM agents require explicit baselines, variance estimates, and statistical tests; without these, the superiority and risk claims cannot be evaluated as load-bearing contributions.

    Authors: We agree that the empirical sections require explicit baselines, variance estimates, and statistical tests to substantiate the claims. In the revised version, we will include direct comparisons to established watermarking baselines (such as probability-shift and synonym-substitution methods), report means with standard deviations across multiple runs with different random seeds, and apply statistical significance tests (e.g., paired t-tests or Wilcoxon signed-rank tests with p-values) for the detection-quality improvements on language and coding tasks. For the supply-chain simulator, we will add variance across repeated simulation episodes, explicit multi-trial human-team baselines, and statistical analysis supporting the cost-reduction figures and tail-event characterizations. revision: yes

  2. Referee: Theoretical sections on watermarking: the derivation of optimal strategies via optimal transport and coding theory is presented as yielding parameter-free or tight bounds, but the translation to high-dimensional LLM outputs and multi-agent interactions is asserted without a concrete robustness argument or counterexample analysis, which is central to the accountability claims.

    Authors: The optimal-transport and coding-theoretic derivations yield tight bounds in the idealized information-theoretic setting. We acknowledge that the manuscript asserts applicability to high-dimensional LLM outputs and multi-agent contexts without a dedicated robustness argument or counterexample analysis. In revision, we will expand the theoretical sections to explicitly state the assumptions (e.g., perfect token-level control and i.i.d. sampling), discuss potential looseness arising from discretization and approximation errors in high dimensions, and include a counterexample section illustrating degradation cases under realistic LLM constraints and agent interaction noise. This will strengthen the accountability claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The thesis structure grounds its contributions in external, established fields (information theory, optimal transport, coding theory, kernel methods, statistical learning) without evident self-referential loops. Watermarking derives optimal strategies from information-theoretic trade-offs and optimal transport, then validates via separate empirical evaluations on detection-quality tradeoffs. The LLM-agent simulator is presented as an empirical assessment of performance gains and risks, not a closed theoretical derivation. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided abstract or high-level argument that reduce the central claims to their own inputs by construction. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides insufficient detail to enumerate specific free parameters, axioms, or invented entities; the work references standard tools from information theory and optimization without listing ad-hoc choices.

pith-pipeline@v0.9.0 · 5511 in / 1109 out tokens · 40617 ms · 2026-05-12T02:17:28.600369+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

199 extracted references · 199 canonical work pages · 5 internal anchors

  1. [1]

    (2024). Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act).Official Journal of the European Union, L 1689, 13 June 2024. Accessed: 2025-05-14

  2. [2]

    Aaronson, S. (2023). Watermarking of large language models. https://simons. berkeley.edu/talks/scott-aaronson-ut-austin-openai-2023-08-17 . Ac- cessed: 2025-01-1-

  3. [3]

    Agarwal, A., Beygelzimer, A., Dudík, M., Langford, J., and Wallach, H. (2018). A reductions approach to fair classification. InInternational conference on machine learning, pages 60–69. PMLR

  4. [4]

    Alghamdi, W., Hsu, H., Jeong, H., Wang, H., Michalak, P ., Asoodeh, S., and Calmon, F. (2022). Beyond adult and compas: Fair multi-class prediction via information projection. Advances in Neural Information Processing Systems, 35:38747–38760

  5. [5]

    Asoodeh, S., Diaz, M., and Calmon, F. P . (2020). Contraction of eγ-divergence and its applications to privacy.arXiv preprint arXiv:2012.11035

  6. [6]

    and Jiang, H

    Bahri, D. and Jiang, H. (2021). Locally adaptive label smoothing for predictive churn. arXiv preprint arXiv:2102.05140

  7. [7]

    and Wieting, J

    Bahri, D. and Wieting, J. (2024). A watermark for black-box language models.arXiv preprint arXiv:2410.02099

  8. [8]

    Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., et al. (2022). Constitutional ai: Harmlessness from ai feedback.arXiv preprint arXiv:2212.08073

  9. [9]

    (2023).Fairness and machine learning: Limitations and opportunities

    Barocas, S., Hardt, M., and Narayanan, A. (2023).Fairness and machine learning: Limitations and opportunities. MIT Press

  10. [10]

    K., Dey, K., Hind, M., Hoffman, S

    Bellamy, R. K., Dey, K., Hind, M., Hoffman, S. C., Houde, S., Kannan, K., Lohia, P ., Martino, J., Mehta, S., Mojsilovi´ c, A., et al. (2019). Ai fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias.IBM Journal of Research and Development, 63(4/5):4–1

  11. [11]

    and Thomas-Agnan, C

    Berlinet, A. and Thomas-Agnan, C. (2004).Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, 1 edition. 105

  12. [12]

    On the reproducibility of neural network predictions.arXiv preprint arXiv: 2102.03349,

    Bhojanapalli, S., Wilber, K., Veit, A., Rawat, A. S., Kim, S., Menon, A., and Ku- mar, S. (2021). On the reproducibility of neural network predictions.arXiv preprint arXiv:2102.03349

  13. [13]

    Black, E., Leino, K., and Fredrikson, M. (2021). Selective ensembles for consistent predictions.arXiv preprint arXiv:2111.08230

  14. [14]

    Black, E., Raghavan, M., and Barocas, S. (2022). Model multiplicity: Opportunities, con- cerns, and solutions. In2022 ACM Conference on Fairness, Accountability, and Transparency, pages 850–863

  15. [15]

    and Michaeli, T

    Blau, Y. and Michaeli, T. (2019). Rethinking lossy compression: The rate-distortion- perception tradeoff. InInternational Conference on Machine Learning, pages 675–685. PMLR

  16. [16]

    Boyd, S. P . and Vandenberghe, L. (2004).Convex optimization. Cambridge university press

  17. [17]

    Breiman, L. (1996). Bagging predictors.Machine learning, 24:123–140

  18. [18]

    Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author).Statistical science, 16(3):199–231

  19. [19]

    and Haas, C

    Caton, S. and Haas, C. (2020). Fairness in machine learning: A survey.arXiv preprint arXiv:2010.04053

  20. [20]

    Chandra, B., Dunietz, J., and Roberts, K. (2024). Reducing risks posed by synthetic content: An overview of technical approaches to digital content transparency. Technical Report NIST.AI.100-4, National Institute of Standards and Technology, Gaithersburg, MD

  21. [21]

    Chang, Y., Krishna, K., Houmansadr, A., Wieting, J., and Iyyer, M. (2024). Postmark: A robust blackbox watermark for large language models.arXiv preprint arXiv:2406.14517

  22. [22]

    Chao, P ., Sun, Y., Dobriban, E., and Hassani, H. (2024). Watermarking language models with error correcting codes.arXiv preprint arXiv:2406.10281

  23. [23]

    (2000).Design and analysis of digital watermarking, information embedding, and data hiding systems

    Chen, B. (2000).Design and analysis of digital watermarking, information embedding, and data hiding systems. PhD thesis, Massachusetts Institute of Technology

  24. [24]

    Chen, J., Yu, L., Wang, J., Shi, W., Ge, Y., and Tong, W. (2022). On the rate-distortion- perception function.IEEE Journal on Selected Areas in Information Theory, 3(4):664–673

  25. [25]

    Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P . D. O., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., et al. (2021). Evaluating large language models trained on code.arXiv preprint arXiv:2107.03374

  26. [26]

    Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments.Big data, 5(2):153–163

  27. [27]

    Christ, M., Gunn, S., and Zamir, O. (2024). Undetectable watermarks for language models. InThe Thirty Seventh Annual Conference on Learning Theory, pages 1125–1139. PMLR. 106

  28. [28]

    Chzhen, E., Denis, C., Hebiri, M., Oneto, L., and Pontil, M. (2019). Leveraging labeled and unlabeled data for consistent fair binary classification.Advances in Neural Information Processing Systems, 32

  29. [29]

    F., Barocas, S., De Sa, C., and Sen, S

    Cooper, A. F., Barocas, S., De Sa, C., and Sen, S. (2023). Variance, self-consistency, and arbitrariness in fair classification.arXiv preprint arXiv:2301.11562

  30. [30]

    Coston, A., Rambachan, A., and Chouldechova, A. (2021). Characterizing fairness over the set of good models under selective labels. InInternational Conference on Machine Learning, pages 2144–2155. PMLR

  31. [31]

    Cover, T. M. and Thomas, A. J. (2006).Elements of Information Theory. Wiley, New-York, 2nd edition

  32. [32]

    and Hellman, D

    Creel, K. and Hellman, D. (2022). The algorithmic leviathan: arbitrariness, fairness, and opportunity in algorithmic decision-making systems.Canadian Journal of Philosophy, 52(1):26–43

  33. [33]

    Cui, P ., Hu, W., and Zhu, J. (2020). Calibrated reliable regression using maximum mean discrepancy. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H., editors, Advances in Neural Information Processing Systems, volume 33, pages 17164–17175. Curran Associates, Inc

  34. [34]

    Cury, C. R. J. (2022). Instituto nacional de estudos e pesquisas educacionais anísio teixeira: uma trajetória em busca de uma educação de qualidade

  35. [35]

    Cuturi, M. (2013). Sinkhorn distances: Lightspeed computation of optimal transport. Advances in neural information processing systems, 26

  36. [36]

    D., et al

    D’Amour, A., Heller, K., Moldovan, D., Adlam, B., Alipanahi, B., Beutel, A., Chen, C., Deaton, J., Eisenstein, J., Hoffman, M. D., et al. (2022). Underspecification presents challenges for credibility in modern machine learning.The Journal of Machine Learning Research, 23(1):10237–10297

  37. [37]

    Dathathri, S., See, A., Ghaisas, S., Huang, P .-S., McAdam, R., Welbl, J., Bachani, V ., Kaskasoli, A., Stanforth, R., Matejovicova, T., et al. (2024). Scalable watermarking for identifying large language model outputs.Nature, 634(8035):818–823

  38. [38]

    Dawid, A. P . (1982). The well-calibrated bayesian.Journal of the American Statistical Association, 77(379):605–610

  39. [39]

    Deng, Z., Dwork, C., and Zhang, L. (2023). Happymap: A generalized multi-calibration method.arXiv preprint arXiv:2303.04379

  40. [40]

    and Freedman, D

    Diaconis, P . and Freedman, D. (1980). Finite exchangeable sequences.The Annals of Probability, pages 745–764

  41. [41]

    Dwork, C., Feldman, V ., Hardt, M., Pitassi, T., Reingold, O., and Roth, A. L. (2015). Preserving statistical validity in adaptive data analysis. InProceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 117–126. 107

  42. [42]

    Dwork, C., Hardt, M., Pitassi, T., Reingold, O., and Zemel, R. (2012). Fairness through awareness. InProceedings of the 3rd Innovations in Theoretical Computer Science Conference, ITCS ’12, page 214–226, New York, NY, USA. Association for Computing Machinery

  43. [43]

    P ., Reingold, O., Rothblum, G

    Dwork, C., Kim, M. P ., Reingold, O., Rothblum, G. N., and Yona, G. (2021). Outcome indistinguishability. InProceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1095–1108

  44. [44]

    Dwork, C., Lee, D., Lin, H., and Tankala, P . (2023). From pseudorandomness to multi- group fairness and back. InThe Thirty Sixth Annual Conference on Learning Theory, pages 3566–3614. PMLR

  45. [45]

    Fabbri, A. R., Li, I., She, T., Li, S., and Radev, D. R. (2019). Multi-news: A large-scale multi-document summarization dataset and abstractive hierarchical model. arXiv preprint arXiv:1906.01749

  46. [46]

    Fairoze, J., Garg, S., Jha, S., Mahloujifar, S., Mahmoody, M., and Wang, M. (2025). Publicly-detectable watermarking for language models. IACR Communications in Cryptology, 1(4)

  47. [47]

    Fan, A., Jernite, Y., Perez, E., Grangier, D., Weston, J., and Auli, M. (2019). ELI5: Long form question answering. arXiv preprint arXiv:1907.09190

  48. [48]

    Feldman, V. (2009). Distribution-specific agnostic boosting. In International Conference on Supercomputing

  49. [49]

    Fernandez, P., Chaffin, A., Tit, K., Chappelier, V., and Furon, T. (2023). Three bricks to consolidate watermarks for large language models. In 2023 IEEE International Workshop on Information Forensics and Security (WIFS), pages 1–6. IEEE

  50. [50]

    Fisher, A., Rudin, C., and Dominici, F. (2019). All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. J. Mach. Learn. Res., 20(177):1–81

  51. [51]

    Flamary, R., Courty, N., Gramfort, A., Alaya, M. Z., Boisbunon, A., Chambon, S., Chapel, L., Corenflos, A., Fatras, K., Fournier, N., et al. (2021). POT: Python Optimal Transport. Journal of Machine Learning Research, 22(78):1–8

  52. [52]

    Friedler, S. A., Scheidegger, C., Venkatasubramanian, S., Choudhary, S., Hamilton, E. P., and Roth, D. (2019). A comparative study of fairness-enhancing interventions in machine learning. In Proceedings of the conference on fairness, accountability, and transparency, pages 329–338

  53. [53]

    Fu, J., Zhao, X., Yang, R., Zhang, Y., Chen, J., and Xiao, Y. (2024). Gumbelsoft: Diversified language model watermarking via the GumbelMax-trick. arXiv preprint arXiv:2402.12948

  54. [54]

    Ganesh, P., Chang, H., Strobel, M., and Shokri, R. (2023). On the impact of machine learning randomness on group fairness. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’23, pages 1789–1800, New York, NY, USA. Association for Computing Machinery

  55. [55]

    Garg, S., Jung, C., Reingold, O., and Roth, A. (2024). Oracle efficient online multicalibration and omniprediction. In Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2725–2792. SIAM

  56. [56]

    Gel’fand, I. and Pinsker, M. (1980). Coding for channels with random parameters. Probl. Contr. Inform. Theory, 9(1):19–31

  57. [57]

    Geva, M., Schuster, R., Berant, J., and Levy, O. (2021). Transformer feed-forward layers are key-value memories. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5484–5495

  58. [58]

    Giboulot, E. and Furon, T. (2024). Watermax: breaking the LLM watermark detectability-robustness-quality trade-off. arXiv preprint arXiv:2403.04808

  59. [59]

    Globus-Harris, I., Gupta, V., Jung, C., Kearns, M., Morgenstern, J., and Roth, A. (2022). Multicalibrated regression for downstream fairness. arXiv preprint arXiv:2209.07312

  60. [60]

    Globus-Harris, I., Harrison, D., Kearns, M., Roth, A., and Sorrell, J. (2023). Multicalibration as boosting for regression. arXiv preprint arXiv:2301.13767

  61. [61]

    Goodrich, R. K. (1970). A Riesz representation theorem. In Proc. Amer. Math. Soc., volume 24, pages 629–636

  62. [62]

    Gopalan, P., Hu, L., Kim, M. P., Reingold, O., and Wieder, U. (2022). Loss minimization through the lens of outcome indistinguishability. arXiv preprint arXiv:2210.08649

  63. [63]

    Gopalan, P., Kalai, A. T., Reingold, O., Sharan, V., and Wieder, U. (2021). Omnipredictors. arXiv preprint arXiv:2109.05389

  64. [65]

    Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., and Smola, A. (2006). A kernel method for the two-sample-problem. Advances in neural information processing systems, 19

  65. [66]

    Gurobi Optimization, LLC (2024). Gurobi Optimizer Reference Manual

  66. [67]

    Haghtalab, N., Jordan, M., and Zhao, E. (2024). A unifying perspective on multi-calibration: Game dynamics for multi-objective learning. Advances in Neural Information Processing Systems, 36

  67. [68]

    Hajek, B. and Raginsky, M. (2019). Statistical learning theory. Lecture Notes, 387

  68. [69]

    Hardt, M., Price, E., and Srebro, N. (2016). Equality of opportunity in supervised learning. Advances in neural information processing systems, 29

  69. [70]

    He, H., Liu, Y., Wang, Z., Mao, Y., and Bu, Y. (2024). Universally optimal watermarking schemes for LLMs: from theory to practice. arXiv preprint arXiv:2410.02890

  70. [71]

    He, H., Liu, Y., Wang, Z., Mao, Y., and Bu, Y. (2025). Theoretically grounded framework for LLM watermarking: A distribution-adaptive approach. In The 1st Workshop on GenAI Watermarking

  71. [72]

    Hébert-Johnson, U., Kim, M., Reingold, O., and Rothblum, G. (2018). Multicalibration: Calibration for the (computationally-identifiable) masses. In International Conference on Machine Learning, pages 1939–1948. PMLR

  72. [73]

    Hine, E. and Floridi, L. (2023). The blueprint for an AI bill of rights: in search of enaction, at risk of inaction. Minds and Machines, pages 1–8

  73. [74]

    Hort, M., Chen, Z., Zhang, J. M., Sarro, F., and Harman, M. (2022). Bias mitigation for machine learning classifiers: A comprehensive survey. arXiv preprint arXiv:2207.07068

  74. [75]

    Hou, A., Zhang, J., He, T., Wang, Y., Chuang, Y.-S., Wang, H., Shen, L., Van Durme, B., Khashabi, D., and Tsvetkov, Y. (2024). Semstamp: A semantic watermark with paraphrastic robustness for text generation. InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Vol...

  75. [76]

    Hsu, H. and Calmon, F. d. P. (2022). Rashomon capacity: A metric for predictive multiplicity in probabilistic classification. arXiv preprint arXiv:2206.01295

  76. [77]

    Hu, Z., Chen, L., Wu, X., Wu, Y., Zhang, H., and Huang, H. (2024). Unbiased watermark for large language models. In The Twelfth International Conference on Learning Representations

  77. [78]

    Huang, B. and Wan, X. (2024). Waterpool: A watermark mitigating trade-offs among imperceptibility, efficacy and robustness. arXiv preprint arXiv:2405.13517

  78. [79]

    Huang, B., Zhu, H., Zhu, B., Ramchandran, K., Jordan, M. I., Lee, J. D., and Jiao, J. (2023). Towards optimal statistical watermarking. arXiv preprint arXiv:2312.07930

  79. [80]

    Huang, Y., Sun, L., Wang, H., Wu, S., Zhang, Q., Li, Y., Gao, C., Huang, Y., Lyu, W., Zhang, Y., et al. (2024). TrustLLM: Trustworthiness in large language models. arXiv preprint arXiv:2401.05561

  80. [81]

    Huijben, I. A., Kool, W., Paulus, M. B., and Van Sloun, R. J. (2022). A review of the Gumbel-max trick and its extensions for discrete stochasticity in machine learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(2):1353–1371
