pith. machine review for the scientific record.

arxiv: 2605.09716 · v1 · submitted 2026-05-10 · 💻 cs.AI

Recognition: 2 theorem links


Medical Model Synthesis Architectures: A Case Study

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 03:18 UTC · model grok-4.3

classification 💻 cs.AI
keywords medical AI · differential diagnosis · probabilistic models · language models · uncertainty quantification · transparent AI · clinical decision support · model synthesis

The pith

MedMSA is a framework that retrieves medical knowledge via language models and constructs formal probabilistic models to support transparent and calibrated inferences under uncertainty.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper is trying to establish a framework called MedMSA for building AI systems that make clinical predictions under uncertainty in a way that is both practically useful and formally transparent. It does this by having language models retrieve relevant prior medical knowledge for a clinical situation and then using that knowledge to construct a formal probabilistic model. A sympathetic reader would care because medicine involves many unknowns, and current AI struggles with calibrated reasoning and explainability. If the approach works, it could produce things like lists of possible diagnoses, each with an associated uncertainty level. The authors show an initial version for differential diagnosis and suggest it can apply more widely to safe clinical decision support.

Core claim

Given a clinical situation, the MedMSA framework uses language models to retrieve relevant prior knowledge, but constructs a formal probabilistic model to support calibrated and verifiable inferences under uncertainty. An initial proof-of-concept applies this to differential diagnosis by producing an uncertainty-weighted list of potential diagnoses that could explain a patient's symptoms.

What carries the argument

The MedMSA framework, which retrieves prior knowledge using language models and then constructs a formal probabilistic model from it to enable transparent reasoning under uncertainty.
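The abstract gives no worked example, but the pipeline it describes, a synthesized probabilistic program queried by rejection sampling (as in Figure 2), can be sketched. Everything below is illustrative: the condition names, priors, and symptom likelihoods are invented placeholders, not values from MedMSA.

```python
import random

random.seed(0)  # fixed seed for reproducibility

# Hypothetical priors and symptom likelihoods -- placeholder numbers,
# not anything retrieved or synthesized by the paper's pipeline.
PRIOR = {"heart_attack": 0.02, "costochondritis": 0.10, "anxiety": 0.15, "other": 0.73}
P_CHEST_PAIN = {"heart_attack": 0.9, "costochondritis": 0.8, "anxiety": 0.5, "other": 0.1}

def sample_patient():
    """Forward-sample a condition from the prior, then the symptom it generates."""
    r, acc = random.random(), 0.0
    for cond, p in PRIOR.items():
        acc += p
        if r < acc:
            break
    chest_pain = random.random() < P_CHEST_PAIN[cond]
    return cond, chest_pain

def differential(observed_chest_pain=True, n=100_000):
    """Rejection sampling: keep only samples consistent with the observation,
    then report the posterior over conditions as an uncertainty-weighted list."""
    counts = {c: 0 for c in PRIOR}
    kept = 0
    for _ in range(n):
        cond, chest_pain = sample_patient()
        if chest_pain == observed_chest_pain:
            counts[cond] += 1
            kept += 1
    return {c: k / kept for c, k in counts.items()}

posterior = differential()
for cond, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{cond}: {p:.3f}")
```

The output is exactly the artifact the core claim promises: a ranked list of candidate diagnoses with posterior probabilities, where every number is traceable to an inspectable prior or likelihood rather than an opaque model activation.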

If this is right

  • It can generate uncertainty-weighted lists of potential diagnoses for a patient's symptoms.
  • It supports calibrated and verifiable inferences rather than opaque predictions.
  • The approach can be extended to other clinical tasks such as treatment decisions under uncertainty.
  • It enables safer collaborations between AI systems and clinicians by making reasoning inspectable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This could be tested by comparing the framework's diagnostic lists to those produced by expert physicians on the same cases.
  • Integration with electronic health records might allow automatic checking of the knowledge retrieved by the language model.
  • The same structure could naturally extend to modeling sequences of clinical actions like ordering tests or choosing treatments.

Load-bearing premise

Language models can reliably retrieve relevant prior medical knowledge and a formal probabilistic model can be constructed from it to support calibrated and verifiable inferences under uncertainty.

What would settle it

A test set of patient cases where the uncertainty-weighted diagnosis lists from the framework fail to include the actual cause or assign probabilities that are poorly calibrated against observed outcomes.
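Such a settling test is, concretely, a calibration measurement. A minimal version computes the expected calibration error of the framework's predicted probabilities against observed outcomes; the predictions and outcomes below are invented toy data, and the binning scheme is one common choice, not anything specified by the paper.

```python
def expected_calibration_error(preds, outcomes, n_bins=5):
    """preds: predicted probabilities for an event; outcomes: 1 if it occurred.
    Groups predictions into probability bins and compares each bin's average
    predicted probability with the empirical frequency of the event."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(preds, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_p = sum(p for p, _ in b) / len(b)
        freq = sum(y for _, y in b) / len(b)
        ece += (len(b) / len(preds)) * abs(avg_p - freq)
    return ece

# Toy data: five 10%-confidence and five 90%-confidence predictions.
preds = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1]
outcomes = [0, 0, 1, 1, 1, 1, 0, 0, 0, 1]
print(f"ECE: {expected_calibration_error(preds, outcomes):.3f}")
```

A well-calibrated system drives this number toward zero; diagnosis lists that systematically omit the true cause, or assign it miscalibrated probability, would show up here.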

Figures

Figures reproduced from arXiv: 2605.09716 by Adrian Weller, Ayman Ali, Ilia Sucholutsky, Joshua B. Tenenbaum, Katherine M. Collins, Lionel Wong, Marlene Berke, Timothy J. O'Donnell, Tyler Brooke-Wilson.

Figure 1
Figure 1: MedMSA overview. MedMSA takes as input a patient vignette expressed in natural language (A) and synthesizes a causal diagnosis model (B), over which probabilistic inference can be run (C) to estimate the likelihood of various conditions.
Figure 2
Figure 2: Vignettes and differential inferences. (A) Example vignettes, varying in observations provided about patient Sean. (B-C) Probabilities are computed as the number of samples drawn via rejection sampling, aggregated over all runs that compiled (9, 15, 8, and 10 of 20 sampled models resulted in compilable programs).
Figure 3
Figure 3: Example synthesized program. Excerpts from a probabilistic model synthesized by MedMSA for the fourth vignette, wherein Sean has a clicking/crunching noise coming from his chest. Code and comments are generated by the pipeline.
Figure 4
Figure 4: Model intervention. Example single-point edit that a clinician could make on the output of one of the programs synthesized by MedMSA: changing the condition statement for a model synthesized for the second vignette. Inference is rerun to "imagine" the likelihood that Sean is having a heart attack if it turns out he had exercised.
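The single-point edit in Figure 4 amounts to clamping one variable in the generative program and rerunning inference. That pattern can be mimicked on a toy model; the structure and probabilities below are hypothetical placeholders, not the program MedMSA actually synthesized.

```python
import random

random.seed(0)

# Toy generative model: exercise history lowers heart-attack risk, which in
# turn raises the chance of chest pain. All probabilities are illustrative.
def model(exercised=None):
    # Clinician's edit point: pass exercised=True/False to clamp the variable.
    ex = random.random() < 0.5 if exercised is None else exercised
    p_ha = 0.01 if ex else 0.05
    heart_attack = random.random() < p_ha
    chest_pain = random.random() < (0.9 if heart_attack else 0.2)
    return ex, heart_attack, chest_pain

def p_heart_attack_given_pain(exercised=None, n=200_000):
    """Rejection sampling: P(heart attack | chest pain), optionally under the edit."""
    hits = total = 0
    for _ in range(n):
        _, ha, pain = model(exercised)
        if pain:
            total += 1
            hits += ha
    return hits / total

print("baseline:", round(p_heart_attack_given_pain(), 3))
print("edited (exercised=True):", round(p_heart_attack_given_pain(True), 3))
```

Because the model is an explicit program, the clinician's edit is one changed condition and a rerun, exactly the inspectable workflow the figure describes.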
Original abstract

Medicine is rife with high-stakes uncertainty. Doctors routinely make clinical judgments and decisions that juggle many fundamental unknowns, like predictions about what might be causing a patients' symptoms or decisions about what treatment to try next. Despite increasing interest in developing AI systems that aid or even replace doctors in clinical settings, current systems struggle with calibrated reasoning under uncertainty, and are often deeply opaque about their reasoning. We propose a framework for AI systems that can make practically useful but formally transparent clinical predictions under uncertainty. Given a clinical situation, our framework (MedMSA) uses language models to retrieve relevant prior knowledge, but constructs a formal probabilistic model to support calibrated and verifiable inferences under uncertainty. We show how an initial proof-of-concept of this framework can be used for differential diagnosis, producing an uncertainty-weighted list of potential diagnoses that could explain a patients' symptoms, and discuss future applications and directions for applying this framework more generally for safe clinical collaborations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes MedMSA, a framework for AI-assisted clinical reasoning that uses language models to retrieve relevant prior medical knowledge and then constructs a formal probabilistic model to support calibrated, transparent inferences under uncertainty. It presents an initial proof-of-concept application to differential diagnosis, in which the system generates an uncertainty-weighted list of potential diagnoses consistent with a patient's symptoms, and discusses extensions to other clinical tasks.

Significance. If the framework can be realized with the claimed properties, it would address a key limitation of current medical AI systems by combining the broad knowledge access of language models with the verifiability of probabilistic models, potentially enabling safer human-AI collaboration in high-stakes settings. The approach is conceptually aligned with needs for calibrated uncertainty and transparency in medicine, but the manuscript supplies no concrete implementation, data, metrics, or worked examples, so its practical significance cannot yet be evaluated.

major comments (2)
  1. The abstract and proof-of-concept section claim that MedMSA produces an uncertainty-weighted list of diagnoses, yet no symptom set, retrieval procedure, probabilistic construction steps, or output list is provided. Without these details the central claim that the framework supports calibrated and verifiable inferences cannot be assessed.
  2. No equations or algorithmic description define how LM-retrieved knowledge is turned into a formal probabilistic model (e.g., how priors, likelihoods, or uncertainty weights are instantiated). This step is load-bearing for the paper's promise of formal transparency.
minor comments (2)
  1. Abstract: 'patients'' should be 'patient's'.
  2. The manuscript would benefit from explicit comparison to existing hybrid neuro-symbolic or probabilistic medical reasoning systems.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript describing the MedMSA framework. The points raised correctly identify areas where the proof-of-concept requires greater specificity to allow evaluation of the central claims. We address each comment below and will incorporate revisions to strengthen the paper.

Point-by-point responses
  1. Referee: The abstract and proof-of-concept section claim that MedMSA produces an uncertainty-weighted list of diagnoses, yet no symptom set, retrieval procedure, probabilistic construction steps, or output list is provided. Without these details the central claim that the framework supports calibrated and verifiable inferences cannot be assessed.

    Authors: We agree that the manuscript presents the differential-diagnosis proof-of-concept at a conceptual level without a concrete worked example. In the revised version we will add a dedicated subsection containing a specific symptom set, the exact language-model retrieval steps used to obtain relevant medical knowledge, the sequence of operations that instantiate the probabilistic model from that knowledge, and the resulting uncertainty-weighted diagnosis list with explicit posterior probabilities. This addition will make the claims directly assessable. revision: yes

  2. Referee: No equations or algorithmic description define how LM-retrieved knowledge is turned into a formal probabilistic model (e.g., how priors, likelihoods, or uncertainty weights are instantiated). This step is load-bearing for the paper's promise of formal transparency.

    Authors: We acknowledge that the original text does not supply the requested equations or pseudocode. The revision will include a new formal-description subsection that defines the mapping from retrieved knowledge to the probabilistic model. This will contain (i) the prior distribution over diagnoses derived from the retrieved epidemiological and literature data, (ii) the likelihood function P(symptoms | diagnosis) constructed from symptom-diagnosis associations in the retrieved sources, (iii) the uncertainty-weighting mechanism (via Bayesian updating), and (iv) the algorithmic steps in pseudocode. These additions will substantiate the transparency claim. revision: yes
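The mapping promised in (i)-(iii) is standard Bayesian updating over a discrete diagnosis space. A one-cell worked instance, with every prior and likelihood invented purely for illustration (these are not the authors' values):

```python
# Worked Bayesian update: post(d) ∝ prior(d) * P(symptom | d), normalized.
# All numbers are invented placeholders standing in for (i) retrieved priors
# and (ii) retrieved symptom-diagnosis likelihoods.
prior_d = {"pneumothorax": 0.01, "costochondritis": 0.09, "muscle_strain": 0.90}
lik = {"pneumothorax": 0.70, "costochondritis": 0.40, "muscle_strain": 0.05}  # P(clicking chest noise | d)

unnorm = {d: prior_d[d] * lik[d] for d in prior_d}
z = sum(unnorm.values())          # normalizing constant
post = {d: p / z for d, p in unnorm.items()}

for d, p in sorted(post.items(), key=lambda kv: -kv[1]):
    print(f"{d}: {p:.3f}")
```

Note how a rare-but-symptom-matching condition (pneumothorax) gains probability relative to its prior while still trailing the common explanations; this interaction between (i) and (ii) is what the promised formal subsection would have to pin down.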

Circularity Check

0 steps flagged

No significant circularity detected in framework proposal

full rationale

The manuscript describes a high-level conceptual framework (MedMSA) that retrieves prior knowledge via language models and then constructs a separate formal probabilistic model for uncertainty-calibrated clinical inferences such as differential diagnosis. No equations, fitted parameters, or derivation steps are exhibited that reduce any claimed output to the inputs by construction, self-definition, or load-bearing self-citation. The proof-of-concept claim rests on external LM retrieval and standard probabilistic construction rather than internal fitting or renaming of results, rendering the argument self-contained against the provided description.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The proposal rests on the assumption that language models provide accurate retrieval of medical knowledge; no free parameters or invented entities are specified in the abstract.

axioms (1)
  • domain assumption Language models can retrieve relevant prior medical knowledge accurately enough to support construction of a formal probabilistic model.
    Central to the MedMSA framework as described in the abstract.

pith-pipeline@v0.9.0 · 5484 in / 1099 out tokens · 42786 ms · 2026-05-12T03:18:20.410719+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

298 extracted references · 298 canonical work pages · 5 internal anchors

  1. Hospital 'Boarding' of Patients in the Emergency Department Increasingly Common, 2017-24. Health Affairs.
  2. Spontaneous pneumothorax: epidemiology, pathophysiology and cause. European Respiratory Review.
  3. Toward the eradication of medical diagnostic errors. Science, 2024.
  4. Hopkins, A. M. and Cornelisse, E. Science, 2026. https://www.science.org/doi/pdf/10.1126/science.aeg8766
  5. Bayesian teaching enables probabilistic reasoning in large language models. Nature Communications.
  6. Reasoning models don't always say what they think. arXiv:2505.05410.
  7. Brodeur, P. G., Buckley, T. A., Kanjee, Z., Goh, E., Ling, E. B., Jain, P., Cabral, S., Abdulnour, R.-E., Haimovich, A. D., Freed, J. A., Olson, A., Morgan, D. J., Hom, J., Gallo, R., McCoy, L. G., Mombini, H., Lucas, C., Fotoohi, M., et al.
  8. Computer-aided diagnosis of acute abdominal pain. Br Med J, 1972.
  9. Complications: A Surgeon's Notes on an Imperfect Science. 2010.
  10. Does biology constrain culture? American Anthropologist.
  11. Reasoning foundations of medical diagnosis: symbolic logic, probability, and value theory aid our understanding of how physicians reason. Science, 1959.
  12. Using large-scale experiments and machine learning to discover theories of human decision-making. Science, 2021.
  13. The pursuit of happiness: A reinforcement learning perspective on habituation and comparisons. PLoS Computational Biology, 2022.
  14. Language-Informed Synthesis of Rational Agent Models for Grounded Theory-of-Mind Reasoning On-The-Fly. arXiv:2506.16755.
  15. "Are you really sure?" Understanding the effects of human self-confidence calibration in AI-assisted decision making. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems.
  16. Large language models assume people are more rational than we really are. arXiv:2406.17055.
  17. Concept alignment. arXiv:2401.08672.
  18. Testing theory of mind in large language models and humans. Nature Human Behaviour, 2024.
  19. Rethinking theory of mind benchmarks for LLMs: Towards a user-centered perspective. arXiv:2504.10839.
  20. Mutual theory of mind in human-AI collaboration: An empirical study with LLM-driven AI agents in a real-time shared workspace task. arXiv:2409.08811.
  21. Cultural transmission and the diffusion of innovations: Adoption dynamics indicate that biased cultural transmission is the predominate force in behavioral change. American Anthropologist, 2001.
  22. Innateness and culture in the evolution of language. Proceedings of the National Academy of Sciences, 2007.
  23. Understanding trust and reliance development in AI advice: Assessing model accuracy, model explanations, and experiences from previous interactions. ACM Transactions on Interactive Intelligent Systems, 2024.
  24. AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind. ICLR 2025 Workshop on Foundation Models in the Wild.
  25. The Promise of Artificial Intelligence: Reckoning and Judgment. 2019.
  26. From tool to teammate in a randomized controlled trial of clinician-AI collaborative workflows for diagnosis. npj Digital Medicine.
  27. The Laws of Medicine: Field Notes from an Uncertain Science. 2015.
  28. Collins, K. The Study and Design of Human-AI Thought Partnerships. doi:10.17863/CAM.124836.
  29. Large language models encode clinical knowledge. Nature, 2023.
  30. Toward expert-level medical question answering with large language models. Nature Medicine, 2025.
  31. Enhancing the reliability and accuracy of AI-enabled diagnosis via complementarity-driven deferral to clinicians. Nature Medicine, 2023.
  32. Mitigating LLM biases toward spurious social contexts using direct preference optimization. arXiv:2604.02585.
  33. Large language model performance and clinical reasoning tasks. JAMA Network Open, 2026.
  34. Meaningful Long-Term Thought Partnerships of Minds and Machines. Current Directions in Psychological Science.
  35. Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People. The Fourteenth International Conference on Learning Representations.
  36. Uncertainty and the shaping of medical decisions. Hastings Center Report, 1991.
  37. Teaching machines to doubt. Nature Medicine, 2025.
  38. Modeling open-world cognition as on-demand synthesis of probabilistic models. arXiv:2507.12547.
  39. Bounded Rationality as a Strategy for Cognitive Science.
  40. Algorithmic monoculture and social welfare. Proceedings of the National Academy of Sciences.
  41. Kirk, H. R., Whitefield, A., et al. The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track.
  42. Cognitive culture: theoretical and empirical insights into social learning strategies. Trends in Cognitive Sciences.
  43. Towards a science of human-AI decision making: An overview of design space in empirical human-subject studies. Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency.
  44. AI models collapse when trained on recursively generated data. Nature.
  45. An algorithmic approach to reducing unexplained pain disparities in underserved populations. Nature Medicine.
  46. Learning personalized decision support policies.
  47. Accuracy-Time Tradeoffs in AI-Assisted Decision Making under Time Pressure. Proceedings of the 29th International Conference on Intelligent User Interfaces.
  48. A Decision Theoretic Framework for Measuring AI Reliance. The 2024 ACM Conference on Fairness, Accountability, and Transparency.
  49. Algorithm appreciation: People prefer algorithmic to human judgment. Organizational Behavior and Human Decision Processes.
  50. Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General.
  51. Biden, J. R., 2023.
  52. The unequal opportunities of large language models: Examining demographic biases in job recommendations by ChatGPT and LLaMA. Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization.
  53. Gender bias and stereotypes in large language models. Proceedings of the ACM Collective Intelligence Conference.
  54. Deskilling, upskilling, and reskilling: a case for hybrid intelligence. Morals & Machines.
  55. Bhatt, U. and Sargeant, H., 2024.
  56. Loafing in the era of generative AI. Organizational Dynamics.
  57. The paradox of choice. In Positive Psychology in Practice: Promoting Human Flourishing in Work, Health, Education, and Everyday Life.
  58. A multistage learning model for cultural transmission: Evidence from three indigenous societies. In Social Learning and Innovation in Contemporary Hunter-Gatherers: Evolutionary and Ethnographic Perspectives.
  59. The optimal timing of teaching and learning across the life course. Philosophical Transactions of the Royal Society B.
  60. Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function.
  61. Dynamic social learning in temporally and spatially variable environments. Royal Society Open Science.
  62. When does selection favor learning from the old? Social learning in age-structured populations. PLoS ONE.
  63. What makes inventions become traditions? Annual Review of Anthropology.
  64. The scientist as child. Philosophy of Science.
  65. The child as hacker. Trends in Cognitive Sciences.
  66. Local search and the evolution of world models. Topics in Cognitive Science.
  67. Why copy others? Insights from the social learning strategies tournament. Science.
  68. Adaptive strategies for cumulative cultural learning. Journal of Theoretical Biology.
  69. The elephant in the room: What matters cognitively in cumulative technological culture. Behavioral and Brain Sciences.
  70. Cultural evolutionary perspectives on creativity and human innovation. Trends in Ecology & Evolution.
  71. Flexible learning, rather than inveterate innovation or copying, drives cumulative knowledge gain. Science Advances.
  72. Invention as a combinatorial process: evidence from US patents. Journal of the Royal Society Interface.
  73. Machine culture. Nature Human Behaviour.
  74. AI can help humans find common ground in democratic deliberation. Science.
  75. How human-AI feedback loops alter human perceptual, emotional and social judgements. Nature Human Behaviour.
  76. On the informativeness of supervision signals. Uncertainty in Artificial Intelligence.
  77. Risks and benefits of large language models for the environment. Environmental Science & Technology.
  78. Harms from increasingly agentic algorithmic systems. Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency.
  79. Updates in human-AI teams: Understanding and addressing the performance/compatibility tradeoff. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33.
  80. The debate over understanding in AI's large language models. Proceedings of the National Academy of Sciences.

Showing first 80 references.