When Agents Shop for You: Role Coherence in AI-Mediated Markets
Pith reviewed 2026-05-07 12:56 UTC · model grok-4.3
The pith
Delegating purchases to AI agents with natural-language profiles lets sellers infer willingness to pay from dialogue alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When language-model buyer agents shop on behalf of verbal consumer profiles, seller-side inference from dialogue alone recovers willingness to pay nearly one-for-one. Comparing this outcome to a numeric-budget condition with confidentiality instructions shows that the leakage arises from role coherence rather than from instruction-following failure. Because the channel is created by the act of delegation to an agent using natural language, it cannot be closed at the prompt level. Architectural interventions are therefore needed to trade off personalization against preference privacy.
What carries the argument
Role coherence, the information channel through which sellers infer willingness to pay from natural-language buyer dialogues without explicit disclosure.
If this is right
- Seller agents can estimate consumer willingness to pay accurately from dialogue even when the buyer agent is instructed not to disclose budgets.
- Prompt-level confidentiality instructions fail to block the leakage in AI-mediated shopping.
- Preference privacy requires interventions at the architectural level rather than through buyer-agent prompts.
- Delegation of purchase decisions to language-model agents inherently alters information flows between consumers and sellers.
Where Pith is reading between the lines
- Consumers may respond by giving AI agents less detailed verbal profiles, which could reduce the personalization benefits the agents are meant to provide.
- Market platforms might need new protocols that limit how much identity information is passed through agent communications.
- The same leakage pattern could appear in other domains where AI agents represent users through natural language, such as negotiation or service requests.
Load-bearing premise
That the difference in seller inference between the natural-language profile condition and the numeric-budget condition with confidentiality instructions is caused only by role coherence and not by other differences in agent behavior or task interpretation.
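One way to make this premise checkable is to build both buyer prompts from a single shared skeleton and confirm mechanically that they differ only in the profile slot. The sketch below is illustrative only: the skeleton text, the two profile strings, and the helper names `build_prompt` and `differing_lines` are hypothetical, since the paper's actual prompts are not reproduced here.

```python
import difflib

# Hypothetical shared skeleton; the paper's real prompt text is not shown in the source.
SKELETON = (
    "You are a shopping agent acting for a consumer.\n"
    "{profile}\n"
    "Negotiate with the seller and buy the item if the terms are acceptable.\n"
)

# Hypothetical profile slots for the two buyer conditions.
VERBAL_PROFILE = "Consumer profile: a graduate student who cooks at home and watches spending."
NUMERIC_PROFILE = "Budget: 40 USD. Keep the budget confidential."


def build_prompt(profile: str) -> str:
    """Fill the single profile slot of the shared skeleton."""
    return SKELETON.format(profile=profile)


def differing_lines(a: str, b: str) -> list[str]:
    """Return the lines that appear in only one of the two prompts."""
    diff = difflib.ndiff(a.splitlines(), b.splitlines())
    # Keep removed ("- ") and added ("+ ") lines; skip unchanged and "? " hint lines.
    return [line for line in diff if line.startswith(("- ", "+ "))]


verbal_prompt = build_prompt(VERBAL_PROFILE)
numeric_prompt = build_prompt(NUMERIC_PROFILE)
changed = differing_lines(verbal_prompt, numeric_prompt)
```

If `changed` contains anything beyond the two profile lines, the conditions differ in more than the role description, and a gap in recovered willingness to pay could not be attributed to role coherence alone. Note that this checks structure only; the two profile strings still differ in length, which a stricter control might also need to balance.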
What would settle it
An experiment in which buyer agents in the verbal-profile condition receive stronger instructions to mask budget details and seller inference accuracy then falls to the level seen in the numeric-budget condition.
Original abstract
Consumers are increasingly delegating purchase decisions to AI agents, providing natural-language descriptions of their preferences and identity. We argue that these representations constitute an information channel, role coherence, through which sellers can infer willingness to pay without explicit disclosure by the buyer agent, leading to preference leakage. In an experiment where a language-model buyer agent shops on behalf of a verbal consumer profile, we show that seller-side inference from dialogue alone recovers willingness to pay nearly one-for-one. Comparing this setting to a numeric-budget condition with confidentiality instructions cleanly isolates role coherence as distinct from instruction-following failure. Because this leakage arises from delegation itself, it cannot be mitigated at the prompt level. Instead, we propose architectural interventions that trade off personalization against preference privacy.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript argues that delegating purchase decisions to language-model buyer agents supplied with natural-language consumer profiles creates an information channel termed 'role coherence,' enabling seller agents to infer willingness-to-pay (WTP) from dialogue alone. In an experiment contrasting a verbal-profile condition against a numeric-budget condition that includes explicit confidentiality instructions, the authors claim seller-side inference recovers WTP nearly one-for-one in the verbal case while the numeric case does not, thereby isolating role coherence from generic instruction-following failure. The paper concludes that this leakage is inherent to delegation and cannot be mitigated by prompt engineering, proposing instead architectural interventions that trade personalization against privacy.
Significance. If the experimental isolation holds, the result identifies a structural privacy risk in AI-mediated markets that arises directly from the use of natural-language roles rather than from prompt leakage. The controlled LM-to-LM setup provides a reproducible testbed for measuring preference inference, and the distinction between role coherence and instruction failure is a useful conceptual contribution. However, the strength of the claim depends on whether the two buyer conditions are prompt-matched and whether the seller inference procedure is applied identically; absent those controls the result risks being an artifact of experimental design rather than a general property of delegation.
major comments (2)
- [Experimental Setup] The central isolation claim (verbal-profile leakage vs. numeric-budget non-leakage) is load-bearing for the paper's argument that role coherence is distinct from instruction-following failure. The abstract states the numeric condition uses 'confidentiality instructions,' but without explicit confirmation in the experimental section that the two buyer prompts are otherwise identical in structure, length, and formatting, any difference in recovered WTP could be driven by prompt template differences rather than the presence/absence of a natural-language role.
- [Results] The claim that seller inference 'recovers willingness to pay nearly one-for-one' requires quantitative support (e.g., regression slope, R², or correlation coefficient) together with sample size, number of trials, and controls for seller-model variation. If these details appear only in an appendix or are omitted from the main results table, the strength of the one-for-one recovery cannot be evaluated.
minor comments (2)
- [Abstract] The abstract introduces 'role coherence' without a concise definition; a one-sentence gloss in the abstract would help readers before the full elaboration.
- [Notation] Notation for willingness-to-pay (WTP) and recovered estimates should be introduced consistently (e.g., WTP vs. inferred WTP) to avoid ambiguity when the same symbols appear in both conditions.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback. The comments highlight important aspects of experimental controls and result presentation that we will address to strengthen the manuscript. We respond to each major comment below.
- Referee: [Experimental Setup] The central isolation claim (verbal-profile leakage vs. numeric-budget non-leakage) is load-bearing for the paper's argument that role coherence is distinct from instruction-following failure. The abstract states the numeric condition uses 'confidentiality instructions,' but without explicit confirmation in the experimental section that the two buyer prompts are otherwise identical in structure, length, and formatting, any difference in recovered WTP could be driven by prompt template differences rather than the presence/absence of a natural-language role.
Authors: We agree that explicit confirmation of prompt equivalence is essential to support the isolation of role coherence from generic instruction-following effects. The experimental section was written with the intent that the two buyer conditions differ solely in profile format (natural-language role versus numeric budget plus confidentiality clause), with all other elements (including overall structure, length, and additional instructions) held constant. To eliminate any ambiguity, we will revise the manuscript to include the complete prompt templates for both conditions in a new table within the experimental setup section and add an explicit statement confirming that the prompts were matched on all dimensions except the role description itself.
Revision: yes
- Referee: [Results] The claim that seller inference 'recovers willingness to pay nearly one-for-one' requires quantitative support (e.g., regression slope, R², or correlation coefficient) together with sample size, number of trials, and controls for seller-model variation. If these details appear only in an appendix or are omitted from the main results table, the strength of the one-for-one recovery cannot be evaluated.
Authors: We acknowledge that the quantitative backing for the near one-for-one recovery should be more immediately visible in the main text. The results section and associated figures present the regression-based evidence for WTP recovery in the verbal-profile condition (contrasted with the numeric condition), along with details on the number of dialogues and seller-model controls. These supporting statistics are currently summarized in the main results and expanded in the appendix. We will revise by adding a concise table to the main results section that reports the regression slope, R², sample size, number of trials, and seller-model controls, ensuring the strength of the claim can be assessed without reference to the appendix.
Revision: yes
Circularity Check
No circularity: empirical claim rests on experimental comparison, not derivation or self-reference
The paper advances an empirical argument from an experiment in which a language-model buyer agent shops on behalf of a verbal consumer profile, with seller-side inference from dialogue recovering willingness-to-pay nearly one-for-one. The abstract and provided text contain no equations, fitted parameters, predictions derived from inputs, or self-citations that bear the central load. The comparison to a numeric-budget confidentiality condition is presented as isolating role coherence, but this is an experimental design choice whose validity is external to any internal reduction; the claim does not reduce to its own inputs by construction. No self-definitional, fitted-input, or uniqueness-imported patterns appear. The result is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLM buyer agents will follow role instructions in a way that preserves coherence with the provided consumer profile.
invented entities (1)
- role coherence: no independent evidence