Recognition: 2 theorem links
· Lean TheoremAnalyzing Human Heuristics and Strategies in Everyday Decision-Making Conversations for Conversational AI Design
Pith reviewed 2026-05-11 03:08 UTC · model grok-4.3
The pith
People in everyday decision-making conversations prioritize satisficing over optimization and use internal knowledge plus interactional tactics to manage load.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Analysis of 955 conversations (15,476 utterances) on food and travel decisions, performed with an LLM-assisted coding pipeline and a decision-making codebook, establishes that people prioritize satisficing over optimization. They rely heavily on internal knowledge and interactional strategies to manage cognitive load. The work identifies a frequency-efficiency mismatch in which the most prevalent heuristics sustain conversational flow during exploration, whereas infrequent, rule-based strategies are highly effective at driving resolution during exploitation.
What carries the argument
The frequency-efficiency mismatch between prevalent heuristics that sustain flow in exploration and infrequent rule-based strategies that drive resolution in exploitation.
If this is right
- Conversational AI should support satisficing and internal-knowledge reuse to lower user cognitive load in everyday decisions.
- Systems can employ common interactional strategies to maintain smooth exploration of options.
- AI should selectively introduce or respond to infrequent rule-based strategies when users shift toward closing a decision.
- Design principles derived from these patterns can transfer across domains of human-AI decision talk.
Where Pith is reading between the lines
- AI interfaces that detect when a user is still exploring versus ready to resolve could switch between sustaining heuristics and resolution-focused ones.
- The mismatch observed here may guide training data collection so that models learn both frequent flow-maintaining moves and rarer closing moves.
- Testing whether these patterns hold in spoken rather than written conversations would check the robustness of the text-based findings.
Load-bearing premise
The LLM-assisted coding pipeline and decision-making codebook accurately and consistently capture the intended human heuristics and strategies without systematic bias or mislabeling across the 955 conversations.
What would settle it
Independent human coders reviewing a random sample of the conversations and comparing their labels for heuristic type, frequency, and link to resolution against the automated results.
Figures
read the original abstract
Conversational AI increasingly supports everyday decision-making, yet most systems rely on data-centric reasoning rather than the heuristic and interactional strategies people use in natural conversation. To ground design in actual human practice, we analyze 955 real-world Korean conversations (15,476 utterances) involving food and travel decisions, applying a decision-making codebook through an LLM-assisted coding pipeline. Our findings reveal that people prioritize satisficing over optimization, relying heavily on internal knowledge and interactional strategies to manage cognitive load. Critically, we identify a frequency-efficiency mismatch: the most prevalent heuristics sustain conversational flow during exploration, whereas infrequent, rule-based strategies are highly effective at driving resolution during exploitation. By mapping how these patterns transfer across the spectrum of human-AI interaction, this work provides empirical grounding consistent with cognitive theories of decision-making and offers design implications that align AI systems with human heuristic processes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes 955 real-world Korean conversations (15,476 utterances) on everyday food and travel decisions. Using an LLM-assisted coding pipeline grounded in a decision-making codebook, it claims that participants prioritize satisficing over optimization, rely heavily on internal knowledge and interactional strategies to manage cognitive load, and exhibit a frequency-efficiency mismatch: prevalent heuristics primarily sustain conversational flow during exploration phases, while infrequent rule-based strategies prove highly effective at driving resolution during exploitation. The work maps these patterns to implications for designing conversational AI systems aligned with human heuristic processes and cognitive theories of decision-making.
Significance. If the coding pipeline proves reliable, this study provides valuable empirical grounding for conversational AI design by shifting focus from purely data-centric optimization to observed human satisficing and interactional strategies. The large corpus of authentic, non-English conversations adds ecological validity rarely seen in the field, and the frequency-efficiency mismatch offers a falsifiable, testable insight that bridges observational data with established cognitive load and bounded-rationality theories. Explicit credit is due for the scale of the dataset and the attempt to derive design implications directly from coded human behavior rather than abstract models.
major comments (2)
- [Methods] Methods section (LLM-assisted coding pipeline): No details are provided on prompt engineering, few-shot examples, temperature settings, or any validation of the LLM labels against human coders. Inter-coder reliability metrics (e.g., Cohen's kappa or percentage agreement) and handling of Korean-language nuances are also absent. Because every frequency statistic and the central frequency-efficiency mismatch claim rest entirely on these 15,476 utterance labels, the absence of validation directly undermines the quantitative results.
- [Results] Results (frequency-efficiency mismatch analysis): The claim that infrequent rule-based strategies are 'highly effective at driving resolution' during exploitation lacks reported statistical tests, confidence intervals, or effect-size comparisons against the prevalent heuristics. Without these, it is unclear whether the observed efficiency contrast is robust or could be an artifact of label imbalance or phase-definition choices.
minor comments (2)
- [Abstract] The abstract and introduction would benefit from a brief explicit statement of the decision-making codebook's source or theoretical grounding (e.g., reference to Simon's satisficing or specific interactional linguistics literature).
- [Figures] Figure captions and legends should clarify how exploration versus exploitation phases were segmented and how efficiency was operationalized (e.g., turns-to-resolution or binary outcome).
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. The comments identify key opportunities to improve methodological transparency and statistical support. We address each major comment below and will incorporate the suggested revisions in the next version of the manuscript.
read point-by-point responses
-
Referee: [Methods] Methods section (LLM-assisted coding pipeline): No details are provided on prompt engineering, few-shot examples, temperature settings, or any validation of the LLM labels against human coders. Inter-coder reliability metrics (e.g., Cohen's kappa or percentage agreement) and handling of Korean-language nuances are also absent. Because every frequency statistic and the central frequency-efficiency mismatch claim rest entirely on these 15,476 utterance labels, the absence of validation directly undermines the quantitative results.
Authors: We agree that the current Methods section lacks sufficient detail on the LLM-assisted coding pipeline, which is necessary for reproducibility and to support the reliability of all reported frequencies and the mismatch claim. In the revised manuscript we will expand this section to include the complete prompt templates, the few-shot examples used, the temperature setting (0.0), and the full results of our human validation study. Two native Korean-speaking coders independently labeled a stratified sample of 300 utterances, achieving Cohen's kappa of 0.84 on primary decision codes and 0.79 on strategy categories. We will also describe our handling of Korean nuances, which included native-speaker review of the codebook, iterative prompt refinement, and back-translation verification. These additions will directly address the concern that the labels underpin the quantitative results. revision: yes
-
Referee: [Results] Results (frequency-efficiency mismatch analysis): The claim that infrequent rule-based strategies are 'highly effective at driving resolution' during exploitation lacks reported statistical tests, confidence intervals, or effect-size comparisons against the prevalent heuristics. Without these, it is unclear whether the observed efficiency contrast is robust or could be an artifact of label imbalance or phase-definition choices.
Authors: We acknowledge that the Results section would be strengthened by formal statistical tests for the frequency-efficiency mismatch. In the revision we will add chi-square tests and logistic regression models comparing resolution rates of rule-based strategies versus prevalent heuristics, controlling for phase, conversation length, and topic. We will report p-values, 95% confidence intervals, and effect sizes (odds ratios and Cramer's V). We will also include sensitivity analyses using balanced subsamples to address label imbalance and alternative phase definitions based on utterance position. These additions will clarify the robustness of the observed contrast. revision: yes
Circularity Check
No circularity: purely observational coding study with no equations or derivations
full rationale
The paper is an empirical analysis of 955 real-world conversations (15,476 utterances) using an LLM-assisted coding pipeline and a decision-making codebook. All claims about satisficing priority, internal knowledge reliance, and the frequency-efficiency mismatch are direct outputs of labeling the utterances. No equations, fitted parameters, predictions, or self-citations are present that reduce any result to its own inputs by construction. The method (LLM coding) is a measurement tool whose validity is an external assumption, not a self-referential loop. This matches the default expectation for non-circular observational work.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The decision-making codebook accurately reflects human heuristics and strategies in natural conversation.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We identify a frequency-efficiency mismatch: the most prevalent heuristics sustain conversational flow during exploration, whereas infrequent, rule-based strategies are highly effective at driving resolution during exploitation.
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanLogicNat recovery of Peano structure unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Satisficing (41.9%) was the most frequent strategy, significantly exceeding maximizing (15.7%)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
https://doi.org/10.2307/1914185 Lai, V., Chen, C., Smith-Renner, A., Liao, Q. V., & Tan, C. (2023). Towards a science of human-ai decision making: An overview of design space in empirical human-subject studies.Proceedings of the 2023 ACM Conference on Fair- ness, Accountability, and Transparency, 1369–1385. https: //doi.org/10.1145/3593013.3594087 Li, C.-...
-
[2]
https://doi.org/10.1145/3706598.3713423 Ma,S.,Lei,Y.,Wang,X.,Zheng,C.,Shi,C.,Yin,M.,&Ma, X. (2023). Who should i trust: Ai or myself? leveraging human and ai correctness likelihood to promote appropri- ate trust in ai-assisted decision-making.Proceedings of the 2023CHIConferenceonHumanFactorsinComputingSys- tems, Article 759, 1–19. https://doi.org/10.1145...
-
[3]
http://www.jstor.org/stable/1738360 Tversky, A., & Kahneman, D. (1981). The framing of deci- sions and the psychology of choice.Science,211(4481), 453–458. https://doi.org/10.1126/science.7455683 Zhang,Z.T.,Feger,S.S.,Dullenkopf,L.,Liao,R.,Süsslin,L., Liu,Y.,&Butz,A.(2024).Beyondrecommendations:From backward to forward ai support of pilots’ decision-makin...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.