The Decision to Verify: How Warmth and User Characteristics Shape Reliance on Conversational Agents for Information Search

Frederik Bungaran Ishak Situmeang; Mert Yazan; Suzan Verberne

arxiv: 2605.28498 · v1 · pith:36AVOJBSnew · submitted 2026-05-27 · 💻 cs.HC · cs.AI

Pith reviewed 2026-06-29 10:04 UTC · model grok-4.3

keywords: overreliance · conversational AI · information verification · user trust · conversational warmth · hybrid search · fact-checking behavior · digital literacy

The pith

Reliance on conversational AI persists even when web search is available for checking answers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether access to both a chatbot and web search reduces users' tendency to accept AI answers without checking. It finds that reliance continues, and the choice to verify depends mainly on users' existing perceptions such as prior trust in chatbots rather than on whether the answer looks correct. Warm chatbot style makes users more likely to agree with incorrect answers. Checking with other AI sources improves accuracy while web search does not. The work shows that overreliance is shaped by stable user traits more than by the immediate interaction.

Core claim

In a mixed-subjects experiment, participants answered questions with either a warm or neutral chatbot and could optionally use web search to verify. Reliance on the chatbot persisted despite the verification option. Verification behavior was driven by user characteristics such as prior trust, resulting in some participants fact-checking consistently and others rarely doing so. Warm conversational style had an indirect effect by increasing agreement with incorrect chatbot responses. Consulting additional AI sources predicted higher accuracy, but traditional web search did not.

What carries the argument

A mixed-subjects experiment that tracks verification decisions and reliance when participants can choose between chatbot answers and web search. Conversational warmth is manipulated, and user traits are measured as predictors.
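
As a rough illustration of the quantities such a design tracks, the sketch below computes a verification rate, an agreement-with-incorrect-answers rate (one common way to operationalize overreliance), and accuracy per chatbot style. The `Trial` record, its field names, and the example rows are hypothetical and not taken from the paper; the authors' actual measures and analysis may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One answered question (hypothetical schema, not the paper's)."""
    style: str                 # "warm" or "neutral" chatbot condition
    chatbot_correct: bool      # whether the chatbot's answer was correct
    verified: bool             # participant used web search before answering
    agreed: bool               # final answer matched the chatbot's answer
    participant_correct: bool  # final answer was correct


def summarize(trials):
    """Return per-style verification rate, overreliance rate, and accuracy."""
    groups = defaultdict(list)
    for t in trials:
        groups[t.style].append(t)

    summary = {}
    for style, rows in groups.items():
        # Overreliance is measured only on trials where the chatbot was wrong.
        incorrect = [t for t in rows if not t.chatbot_correct]
        summary[style] = {
            "verification_rate": sum(t.verified for t in rows) / len(rows),
            "overreliance_rate": (
                sum(t.agreed for t in incorrect) / len(incorrect)
                if incorrect else None
            ),
            "accuracy": sum(t.participant_correct for t in rows) / len(rows),
        }
    return summary


# Made-up rows purely to exercise the function; NOT data from the paper.
example = [
    Trial("warm", False, False, True, False),
    Trial("warm", True, False, True, True),
    Trial("neutral", False, True, False, True),
    Trial("neutral", False, False, True, False),
]
result = summarize(example)
```

Restricting the overreliance rate to incorrect-chatbot trials keeps it separate from appropriate reliance, where agreeing with a correct answer is the right call.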

If this is right

Reliance on AI answers remains high even in hybrid setups that combine conversational agents with web search.
Verification behavior is predicted more by stable user perceptions like prior trust than by properties of the specific answer.
Warm conversational style increases the likelihood of accepting incorrect information.
Consulting additional AI sources is associated with higher answer accuracy.
Traditional web search does not show the same association with improved accuracy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Designers could personalize verification prompts based on measured user trust levels rather than applying them uniformly.
Warmth in chatbots may require counterbalancing mechanisms to limit error acceptance in information tasks.
The user-dependent pattern of verification could appear in other AI tools that offer multiple information sources.
Training interventions aimed at verification habits might reduce overreliance for users who default to trust.

Load-bearing premise

The controlled experiment with optional web search mirrors how people decide whether to verify information during everyday searches.

What would settle it

A field study in which users facing real information needs switch to web search and show sharply lower reliance on chatbot answers when the answer conflicts with their own knowledge would challenge the persistence of reliance.

Figures

Figures reproduced from arXiv: 2605.28498 by Frederik Bungaran Ishak Situmeang, Mert Yazan, Suzan Verberne.

Figure 2: An overview of the survey flow. Chatbot style and answer correctness were ...

Figure 3: The platform we created for the experiment, embedded in question pages.

Figure 4: Correlation matrix for post-experiment survey items and demographic variables.

Figure 5: Average accuracy of the participants, depending on the correctness condition.

Figure 6: Histograms showing the distribution of usage frequency across three behavioral ...

Original abstract

Conversational artificial intelligence (AI) provides an efficient and convenient gateway to information access. However, it can cause overreliance when users blindly trust AI and accept its answers without fact-checking. Information search increasingly follows a hybrid interaction paradigm that combines conversational AI with web search, making fact-checking easier. In this paper, we examine whether this interaction paradigm is effective in curbing reliance. We further investigate the underlying factors (e.g., digital literacy and conversation warmth) that drive users to verify AI answers. We conduct a mixed-subjects question-answering experiment where participants interact with either a warm or a neutral chatbot. Our findings reveal that reliance persists despite users having access to both conversational and web search. The decision to verify is driven primarily by existing user perceptions (e.g., prior trust in chatbots) rather than answer properties, with some users fact-checking regardless of the context and others trusting chatbots by default. Warm conversational style has an indirect yet critical influence on reliance by increasing agreement with the chatbot when it is incorrect. Consulting additional AI sources predicts higher accuracy, while traditional web search does not. Our study extends overreliance research by: (a) demonstrating its persistence despite access to fact-checking, (b) identifying verification behavior as user-dependent, and (c) revealing conversational warmth's indirect effect on overreliance with implications for designing trustworthy conversational search systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Reliance on chatbots persists in hybrid setups mainly because of users' prior trust levels, with warmth playing only an indirect role, though the abstract leaves the supporting stats and methods opaque.

The main thing to know is that this paper finds overreliance on conversational agents does not disappear just because web search is also available. Verification behavior turns out to be mostly a property of the user—some people check everything, others accept chatbot answers by default—rather than a response to the answer quality or the chatbot's style. Warmth matters indirectly by raising agreement with incorrect answers.

The experiment is a mixed-subjects design with participants using either a warm or neutral chatbot plus optional web search. The new angle is testing the hybrid paradigm explicitly and showing that other AI sources improve accuracy while traditional web search does not. That distinction is worth noting for anyone studying information access tools.

The work is straightforward in its claims and stays within the existing literature on AI trust without overreaching. It gives credit to prior overreliance studies and positions the hybrid setting as an extension.

The soft spots are real but not fatal. The abstract supplies no sample size, no statistical tests, no effect sizes, and no description of controls or analysis methods, which makes it impossible to judge how well the data support the conclusions. The lab setup with separate optional search also leaves open whether the results would hold when verification costs or tool integration differ in actual use. If the full paper has those details and they are sound, the findings gain weight; otherwise they stay preliminary.

This is for HCI researchers and system designers working on conversational search and trust. A reader who follows behavioral studies on AI reliance will find the user-dependent verification result useful. It deserves peer review because the topic is practical and the hybrid experiment is a reasonable next step, even if the methods section will need close scrutiny and possible expansion.

Referee Report

2 major / 1 minor

Summary. The paper reports results from a mixed-subjects question-answering experiment in which participants interacted with either a warm or neutral chatbot and had optional access to web search. It claims that reliance on the chatbot persists despite verification tools being available, that the decision to verify is driven primarily by pre-existing user perceptions (e.g., prior trust) rather than answer properties, that warm conversational style indirectly increases overreliance by raising agreement with incorrect answers, and that consulting additional AI sources (but not web search) predicts higher accuracy.

Significance. If the empirical patterns hold under more realistic conditions, the work would usefully extend overreliance research by showing persistence in a hybrid search setting and by isolating user-dependent verification behavior and an indirect warmth pathway, with direct implications for the design of conversational information systems.

major comments (2)

[Experimental Design / Methods] The central claim that reliance persists and is driven by user perceptions rather than answer properties rests on a mixed-subjects lab experiment with optional separate web search. This design does not manipulate or measure perceived verification effort, time pressure, information stakes, or seamless tool integration, which directly limits the ability to conclude that the observed dominance of prior trust would generalize to real-world conditions with higher friction.
[Results / Abstract] The abstract states that 'the decision to verify is driven primarily by existing user perceptions rather than answer properties' and that warmth has an 'indirect yet critical influence,' yet no statistical details, sample characteristics, controls, effect sizes, or mediation analysis are provided in the summary of results, making it impossible to evaluate whether the data actually support the primacy of user perceptions over answer properties.

minor comments (1)

[Abstract] The abstract would benefit from explicit mention of sample size, key statistical tests, and effect sizes to allow readers to assess the strength of the reported conclusions.
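
To make concrete what an effect size for the warmth claim could look like, the sketch below computes an odds ratio with a Wald 95% confidence interval from a 2x2 table of agreement with incorrect chatbot answers by chatbot style. The counts are placeholders invented for illustration, not results from the paper, and the authors may report a different statistic.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: warm, agreed with incorrect answer     b: warm, did not agree
    c: neutral, agreed with incorrect answer  d: neutral, did not agree
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log)
    upper = math.exp(log_or + z * se_log)
    return odds_ratio, lower, upper


# Placeholder counts, NOT data from the paper.
or_value, ci_low, ci_high = odds_ratio_wald(a=30, b=20, c=20, d=30)
```

An odds ratio above 1 with an interval that excludes 1 would indicate more agreement with incorrect answers under the warm style. A repeated-measures design like the one described would normally call for a mixed-effects model rather than a single pooled table.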

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We respond to each major comment below and indicate planned revisions where appropriate.

Referee: [Experimental Design / Methods] The central claim that reliance persists and is driven by user perceptions rather than answer properties rests on a mixed-subjects lab experiment with optional separate web search. This design does not manipulate or measure perceived verification effort, time pressure, information stakes, or seamless tool integration, which directly limits the ability to conclude that the observed dominance of prior trust would generalize to real-world conditions with higher friction.

Authors: We agree that the lab-based mixed-subjects design with optional web search does not manipulate or measure perceived verification effort, time pressure, information stakes, or seamless integration. This is a genuine limitation for generalizing the dominance of prior trust to higher-friction real-world settings. We will add an explicit limitations paragraph in the Discussion section addressing these boundary conditions and outlining how future studies could test the findings under more realistic conditions.

Revision: yes

Referee: [Results / Abstract] The abstract states that 'the decision to verify is driven primarily by existing user perceptions rather than answer properties' and that warmth has an 'indirect yet critical influence,' yet no statistical details, sample characteristics, controls, effect sizes, or mediation analysis are provided in the summary of results, making it impossible to evaluate whether the data actually support the primacy of user perceptions over answer properties.

Authors: Abstracts are space-constrained and conventionally omit detailed statistics; the full manuscript reports sample characteristics (N and demographics), regression models with controls for user perceptions, effect sizes, and the mediation analysis supporting the indirect warmth pathway in the Results section. These details allow evaluation of the claims. To address the concern, we will revise the abstract to include a brief clause noting that the patterns were supported by mediation and regression analyses controlling for user characteristics.

Revision: partial

Circularity Check

0 steps flagged

No circularity: empirical behavioral study with no derivations

This paper reports results from a mixed-subjects question-answering experiment measuring reliance, verification decisions, and the effects of chatbot warmth and user traits. All central claims (persistence of reliance, dominance of prior perceptions over answer properties, indirect warmth effect) are grounded in observed participant behavior and statistical analysis of experimental data. There are no equations, no fitted parameters presented as predictions, no load-bearing self-citations or uniqueness theorems, and no ansatzes that reduce the findings to their own inputs by construction. The study is self-contained: its conclusions follow directly from the collected data rather than from any internal definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

As an empirical behavioral study, the central claims rest on assumptions about the validity of self-reported measures, the representativeness of the participant sample, and the ecological validity of the task. No free parameters or invented entities are introduced.

axioms (1)

Domain assumption: Participants' behavior in the lab task with optional web search reflects their behavior in natural information search scenarios.
This assumption is required to generalize the finding that reliance persists despite access to fact-checking tools.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Personalized to Persuade: The Effects of Contextualization and Warmth on Trust and Reliance in Conversational AI
cs.HC · 2026-05 · unverdicted · novelty 4.0

A 2x2 between-subjects experiment finds contextualization lowers AI persuasiveness but warmth restores it through crossover interaction, with reliance invariant to design, trust predicting outcomes independently, and ...

