Towards an Appropriate Level of Reliance on AI: A Preliminary Reliance-Control Framework for AI in Software Engineering
Pith reviewed 2026-05-10 16:04 UTC · model grok-4.3
The pith
The level of control developers exercise over AI outputs can mark overreliance or underreliance on the technology.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
From twenty-two interviews with software developers about their LLM use in development work, the authors derive a reliance-control framework. In the framework, the degree of control developers retain over AI outputs functions as an indicator that distinguishes overreliance, underreliance, and appropriate reliance. Developers who make few changes to AI-generated code or text tend toward overreliance, those who extensively revise or ignore outputs tend toward underreliance, and the framework positions appropriate reliance at intermediate control levels. The authors recommend future research that maps the control options supported by existing tools and by emerging ones.
What carries the argument
The reliance-control framework, which classifies developer interactions with AI outputs according to the amount of human editing, verification, or rejection applied.
Load-bearing premise
The level of control developers keep over AI outputs is a valid and generalizable proxy for distinguishing overreliance from underreliance.
What would settle it
A longitudinal study that assigns developers to overreliance, underreliance, or appropriate-reliance groups using the framework's control-level criteria and then measures actual skill retention and productivity changes over six months finds no consistent differences across groups.
Original abstract
How software developers interact with Artificial Intelligence (AI)-powered tools, including Large Language Models (LLMs), plays a vital role in how these AI-powered tools impact them. While overreliance on AI may lead to long-term negative consequences (e.g., atrophy of critical thinking skills), underreliance might deprive software developers of potential gains in productivity and quality. Based on twenty-two interviews with software developers on using LLMs for software development, we propose a preliminary reliance-control framework where the level of control can be used as a way to identify AI overreliance and underreliance. We also use it to recommend future research to further explore the different control levels supported by the current and emergent LLM-driven tools. Our paper contributes to the emerging discourse on AI overreliance and provides an understanding of the appropriate degree of reliance as essential to developers making the most of these powerful technologies. Our findings can help practitioners, educators, and policymakers promote responsible and effective use of AI tools.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a preliminary reliance-control framework for AI in software engineering, derived from 22 interviews with software developers using LLMs. It claims that the level of control developers exercise over AI outputs can identify overreliance (low control) versus underreliance (high control), and uses the framework to recommend future research on control levels supported by current and emerging LLM tools, while contributing to discourse on appropriate AI reliance.
Significance. If the control-level proxy can be validated against objective outcomes, the framework would offer a practical conceptual tool for balancing productivity gains from AI tools against risks such as skill atrophy in software development. It provides a novel lens for human-AI collaboration in SE that could inform tool design, developer training, and policy, building on existing work on overreliance. The preliminary framing appropriately signals limited generalizability, but the contribution remains primarily conceptual until the proxy is tested.
major comments (2)
- [Abstract / Framework section] The central claim that control level serves as a valid proxy for distinguishing overreliance (low control) from underreliance (high control) rests on interview-derived categories without reported validation against objective measures such as code correctness, task completion time, suggestion acceptance rates, or post-AI skill retention. This proxy assumption is load-bearing for the framework's utility but lacks cross-validation or falsification tests.
- [Methods / Study design] No details are provided on participant selection criteria, interview protocol, data analysis method (e.g., thematic coding scheme, inter-rater reliability), or how the 22 interviews were used to construct the specific control levels in the framework. These omissions prevent assessment of the framework's empirical grounding and replicability.
minor comments (1)
- [Abstract] The abstract could explicitly note the absence of quantitative validation data to better set reader expectations for a preliminary framework.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below, indicating where we agree revisions are warranted to improve clarity, transparency, and the manuscript's positioning as preliminary work.
Point-by-point responses
- Referee: [Abstract / Framework section] The central claim that control level serves as a valid proxy for distinguishing overreliance (low control) from underreliance (high control) rests on interview-derived categories without reported validation against objective measures such as code correctness, task completion time, suggestion acceptance rates, or post-AI skill retention. This proxy assumption is load-bearing for the framework's utility but lacks cross-validation or falsification tests.
Authors: We agree that the control-level proxy is derived from qualitative interview data and has not been validated against objective outcomes. The manuscript already describes the work as preliminary to signal this exploratory scope. In revision we will update the abstract and framework section to more explicitly state the proxy's basis in developer perceptions, note the absence of quantitative validation, and outline specific directions for future empirical testing. This clarification strengthens the conceptual contribution without overstating current evidence. Revision: partial
- Referee: [Methods / Study design] No details are provided on participant selection criteria, interview protocol, data analysis method (e.g., thematic coding scheme, inter-rater reliability), or how the 22 interviews were used to construct the specific control levels in the framework. These omissions prevent assessment of the framework's empirical grounding and replicability.
Authors: We acknowledge the methods section requires greater detail. In the revised manuscript we will expand it to describe participant recruitment and selection criteria, the semi-structured interview protocol, the thematic analysis procedure including the coding scheme, any inter-rater reliability checks performed, and the iterative process by which the control levels were derived from the 22 interviews. These additions will improve transparency and replicability. Revision: yes
Circularity Check
No circularity: framework constructed from external interview data
Full rationale
The paper derives its preliminary reliance-control framework directly from twenty-two interviews with software developers, treating the interview insights as the empirical foundation for identifying overreliance (low control) and underreliance (high control). No equations, fitted parameters, self-definitional loops, or load-bearing self-citations appear in the provided text. The central claim rests on external qualitative data rather than reducing to its own inputs by construction, so the framework is grounded in evidence outside itself rather than being circular.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Interviews with software developers yield valid and transferable insights into real-world AI reliance behaviors.
invented entities (1)
- Reliance-Control Framework (no independent evidence)
Reference graph
Works this paper leans on
- [1] Jessica Y. Bo, Sophia Wan, and Ashton Anderson. 2025. To rely or not to rely? Evaluating interventions for appropriate reliance on large language models. In Proceedings of the Conference on Human Factors in Computing Systems. 1–23.
- [2]
- [3] Derek DeBellis et al. 2025. 2025 DORA State of AI-Assisted Software Development. https://dora.dev/research/ai/#state-of-ai-assisted-software-development
- [4] Aaron Drapkin. 2025. AI Gone Wrong: AI Hallucinations & Errors. https://tech.co/news/list-ai-failures-mistakes-errors
- [5] Samuel Ferino et al. 2026. Towards an Appropriate Level of Reliance on AI: A Preliminary Reliance-Control Framework for AI in Software Engineering – Supplementary Information Package. https://doi.org/10.5281/zenodo.18616305
- [6] Samuel Ferino, Rashina Hoda, John Grundy, and Christoph Treude. 2025. Novice developers’ perspectives on adopting LLMs for software development: A systematic literature review. ACM Transactions on Software Engineering and Methodology (2025).
- [7] Gaole He et al. 2023. Knowing about knowing: An illusion of human competence can hinder appropriate reliance on AI systems. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–18.
- [8] Rashina Hoda. 2024. Qualitative Research with Socio-Technical Grounded Theory. Springer.
- [9] Xinyi Hou et al. 2024. Large language models for software engineering: A systematic literature review. ACM Transactions on Software Engineering and Methodology 33, 8 (2024), 1–79.
- [10] Kori Inkpen et al. 2023. Advancing human-AI complementarity: The impact of user expertise and algorithmic tuning on joint decision making. ACM Transactions on Computer-Human Interaction 30, 5 (2023), 1–29.
- [11] Ranim Khojah et al. 2024. Beyond code generation: An observational study of ChatGPT usage in software engineering practice. Proceedings of the ACM on Software Engineering 1, FSE (2024), 1819–1840.
- [12] Sunnie S. Y. Kim et al. 2025. Fostering appropriate reliance on large language models: The role of explanations, sources, and inconsistencies. In Proceedings of the Conference on Human Factors in Computing Systems. 1–19.
- [13]
- [14] Shuai Ma et al. 2023. Who Should I Trust: AI or Myself? Leveraging Human and AI Correctness Likelihood to Promote Appropriate Trust in AI-Assisted Decision-Making. In Proceedings of the Conference on Human Factors in Computing Systems. Article 759, 19 pages. doi:10.1145/3544548.3581058
- [15] Hannah Mayer, L. Yee, M. Chui, and R. Roberts. 2025. Superagency in the workplace: Empowering people to unlock AI’s full potential. McKinsey Digital 28 (2025).
- [16] Fairuz Meem et al. 2025. Why Do Software Practitioners Use ChatGPT for Software Development Tasks? In Proceedings of the International Conference on the Foundations of Software Engineering. 1508–1514. doi:10.1145/3696630.3731667
- [17] Qodo AI. 2025. State of AI Code Quality 2025. https://www.qodo.ai/reports/state-of-ai-code-quality/. Accessed 2025-10-15.
- [18] Cornelia Sindermann et al. 2021. Assessing the attitude towards artificial intelligence: Introduction of a short measure in German, Chinese, and English language. KI-Künstliche Intelligenz 35, 1 (2021), 109–118.
- [19] Christoph Treude and Marco Gerosa. 2025. How developers interact with AI: A taxonomy of human-AI collaboration in software engineering. In 2nd International Conference on AI Foundation Models and Software Engineering. IEEE, 236–240.
- [20] Thomas Weber et al. 2024. Significant productivity gains through programming with large language models. Proceedings of the ACM on Human-Computer Interaction 8, EICS (2024), 1–29.
- [21] Ziyao Zhang et al. 2025. LLM hallucinations in practical code generation: Phenomena, mechanism, and mitigation. Proceedings of the ACM on Software Engineering 2, ISSTA (2025), 481–503.
discussion (0)