arxiv: 2604.22604 · v2 · submitted 2026-04-24 · 💻 cs.HC

Vibe coding for clinicians: democratising bespoke software development for digital health innovation

Ariel Yuhan Ong, Iain Livingstone, Caroline Kilduff, Mertcan Sevgi, David A Merle, Eden Ruffell, Pearse A Keane, Fares Antaki

Pith reviewed 2026-05-08 10:32 UTC · model grok-4.3

classification 💻 cs.HC

keywords vibe coding, clinicians, digital health, large language models, software prototyping, clinical workflows, AI-assisted development, health innovation

The pith

Vibe coding lets clinicians prototype bespoke digital health tools by prompting large language models in natural language.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Clinicians regularly encounter workflow problems too specific or low-volume to interest commercial developers. The paper claims that vibe coding, where clinicians describe needs in plain language to large language models, equips them to build or prototype their own solutions. This bridges the gap between front-line clinical insight and technical execution without requiring prior programming expertise. The authors supply foundational guidance, a step-by-step playbook, real examples, and explicit guardrails to keep the process practical and safe. They present the method as a complement to professional developers rather than a substitute.

Core claim

Vibe coding refers to the co-development of software using natural language prompts to large language models. The paper argues that this approach democratises bespoke software development for clinicians, enabling them to rapidly create simple tools or prototypes that address real-world pain points and produce digital health solutions most reflective of clinical realities.

What carries the argument

Vibe coding, defined as the co-development of software through natural language prompts to large language models, carries the argument by turning clinical descriptions directly into functional prototypes.

If this is right

Clinicians without coding experience gain the ability to address bespoke workflow problems that commercial software ignores.
Rapid prototyping becomes accessible, allowing tools to be built that more closely match clinical realities than off-the-shelf options.
A shared playbook and case examples lower the barrier for early adopters to begin creating their own solutions.
Explicit caveats and guardrails are required to ensure safe deployment alongside professional developers.
The method positions clinicians as active participants in digital health innovation rather than passive users.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Widespread adoption would require new training on prompt engineering and code validation specific to regulated health environments.
Integration challenges with existing hospital IT systems and data privacy rules may limit how far prototypes can move into production.
Over-reliance on generated code without human oversight could create new liability questions for clinicians who deploy the results.
The approach may scale best for low-stakes tools and serve mainly as a discovery step before handing off to professional teams.

Load-bearing premise

Large language models can reliably generate functional, safe, and maintainable code for clinical applications from natural language prompts without introducing critical errors or compliance issues.

What would settle it

A documented case in which code generated through vibe coding produced a functional failure, patient safety issue, data breach, or regulatory violation in an actual clinical workflow would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.22604 by Ariel Yuhan Ong, Caroline Kilduff, David A Merle, Eden Ruffell, Fares Antaki, Iain Livingstone, Mertcan Sevgi, Pearse A Keane.

**Figure 1.** The vibe coding workflow illustrating the iterative cycle of software development with a…

**Figure 2.** The user interface for a simple intravitreal injection interval calculator featuring two date…

**Figure 3.** The “Cataract Calendar” tool featuring the online user interface which can generate a…

Original abstract

Clinicians often face workflow problems that are perceived as either too bespoke or low stakes to attract commercial attention. Historically, most do not have the technical knowledge to address these problems, but the recent emergence of "vibe coding" presents a transformative opportunity. Vibe coding refers to the co-development of software using natural language prompts to large language models. It offers a pathway to create simple tools that address these real-world pain points, or to prototype more complex ideas. In this review, written by a group of early adopter clinicians with a range of programming expertise, we introduce vibe coding for clinicians (especially those with no or minimal coding experience) as a way of democratising innovation from the front lines. We discuss foundational skills, outline some common challenges, provide a practical step-by-step playbook, and illustrate this approach with some case examples, taking care to consider caveats and guardrails for deployment. We propose that vibe coding is more than a technical shortcut for beginners and is not a replacement for professional software developers. Instead, it can bridge the gap between clinical insight and technical execution, equipping clinicians with the ability to rapidly prototype digital health solutions most reflective of clinical realities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives clinicians a practical playbook for using LLMs to prototype simple tools but provides no data to show the generated code is reliable or safe enough for clinical settings.

The main thing to know is that this is a review introducing 'vibe coding' as a way for clinicians to prompt large language models for custom software. It targets workflow problems too small or specific for commercial products and supplies steps, challenges, and examples drawn from the authors' own experience as early adopters with mixed coding backgrounds.

The authors do a solid job keeping the scope narrow: they stress that this works best for simple prototypes, not production systems, and they list guardrails around testing, privacy, and accuracy. That framing feels useful for readers who actually work in clinics and see these gaps daily. The playbook and caveats sections are the clearest parts, showing they have thought through real hurdles like prompt iteration and debugging without overpromising.

The soft spot is exactly where the stress-test note flags it. The central claim that this approach bridges clinical insight and technical execution rests on the untested premise that natural-language prompts will reliably produce functional, maintainable, and compliant code. There are no success rates, bug counts, validation steps, or regulatory checks reported, only illustrative cases. Without that evidence the practical value stays speculative rather than demonstrated.

This paper is for clinicians who want an accessible entry point into digital health prototyping and for researchers in medical informatics who need a starting reference on generative AI use cases. It is not for anyone expecting quantitative results or formal methods. I would send it for peer review because the topic is timely and the practical advice could be sharpened with reviewer input on validation methods and clearer limits, even though the current version is mostly descriptive.

Referee Report

2 major / 2 minor

Summary. The paper introduces 'vibe coding'—the co-development of software via natural-language prompts to large language models—as a means for clinicians (including those with minimal coding experience) to create bespoke digital health tools addressing workflow problems overlooked by commercial developers. Drawing on the authors' experience as early adopters, it covers foundational skills, common challenges, a practical step-by-step playbook, illustrative case examples, and caveats/guardrails, arguing that the approach can bridge clinical insight and technical execution to enable rapid prototyping of solutions most reflective of clinical realities.

Significance. If the core premise holds, the work could have meaningful significance for human-computer interaction and digital health by lowering barriers to clinician-led innovation in niche or low-stakes workflow areas. The manuscript's practical playbook, attention to guardrails, and grounding in real-world clinician experience provide a useful starting point for the community; however, its conceptual nature and lack of empirical validation mean the significance remains prospective rather than demonstrated.

major comments (2)

[Abstract] The central claim that vibe coding 'equips clinicians with the ability to rapidly prototype digital health solutions most reflective of clinical realities' is load-bearing for the paper's contribution yet rests on the untested assumption that LLM-generated code will be functional, safe, and maintainable; the manuscript provides no quantitative metrics on prompt success rates, error incidence, usability, or regulatory compliance, and the case examples remain qualitative anecdotes.
[Case examples and guardrails] While caveats are acknowledged, the absence of any controlled evaluation, post-deployment validation, or discussion of how clinicians would verify LLM outputs for clinical safety undermines the proposal's readiness for the digital health domain, where errors carry high stakes.

minor comments (2)

[Playbook] The step-by-step playbook would be strengthened by including concrete prompt templates, example LLM responses, and failure modes to improve actionability for readers with no coding background.
[Challenges] Additional citations to empirical studies on LLM code-generation reliability (especially in safety-critical or regulated domains) would better contextualize the challenges section.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their thoughtful and constructive review. We appreciate the acknowledgment of the manuscript's potential significance as a practical starting point for clinician-led innovation. The paper is explicitly framed as a conceptual review and experiential guide by early adopters, not as an empirical study providing quantitative validation. We address the major comments below and outline targeted revisions to clarify scope and strengthen caveats without altering the core contribution.

Referee: [Abstract] The central claim that vibe coding 'equips clinicians with the ability to rapidly prototype digital health solutions most reflective of clinical realities' is load-bearing for the paper's contribution yet rests on the untested assumption that LLM-generated code will be functional, safe, and maintainable; the manuscript provides no quantitative metrics on prompt success rates, error incidence, usability, or regulatory compliance, and the case examples remain qualitative anecdotes.

Authors: We agree that the manuscript offers no quantitative metrics or controlled data, as it draws from the authors' qualitative experiences as early adopters rather than experimental evaluation. The central claim is presented prospectively, based on observed patterns in real-world prototyping. We will revise the abstract to explicitly state that the described benefits are grounded in experiential insights and subject to the guardrails detailed in the main text, while removing any implication of demonstrated functionality or safety. The case examples will be reframed more clearly as illustrative anecdotes to demonstrate workflow applicability.

revision: partial

Referee: [Case examples and guardrails] While caveats are acknowledged, the absence of any controlled evaluation, post-deployment validation, or discussion of how clinicians would verify LLM outputs for clinical safety undermines the proposal's readiness for the digital health domain, where errors carry high stakes.

Authors: We concur that the paper lacks controlled evaluations or post-deployment data, which would be required to claim readiness for clinical deployment. The manuscript already limits its scope to low-stakes prototyping and explicitly states that vibe coding is not a substitute for professional software development or regulatory processes. The guardrails section discusses verification steps including manual code review, unit testing, and clinician oversight. We will expand this section with additional practical guidance on output verification (e.g., iterative prompting for edge cases and integration with existing clinical review workflows) and add a dedicated limitations paragraph calling for future empirical studies on safety and efficacy.

revision: partial
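
To make the unit-testing and edge-case verification mentioned in this response concrete, here is a minimal Python sketch. It assumes a hypothetical helper, `interval_between`, loosely modelled on the two-date interval calculator in Figure 2. The function name, its signature, and the weeks-plus-days output are illustrative assumptions and are not taken from the paper. The checks use dates whose answers a clinician could confirm by hand on a calendar.

```python
from datetime import date


def interval_between(previous: date, current: date) -> tuple[int, int]:
    """Return (whole_weeks, remaining_days) between two visit dates.

    Hypothetical helper: the name, signature, and output format are
    assumptions for illustration, not the paper's implementation.
    """
    if current < previous:
        # Fail loudly rather than silently report a negative interval.
        raise ValueError("current date precedes previous date")
    return divmod((current - previous).days, 7)


def run_edge_case_checks() -> int:
    """Run hand-verifiable checks and return how many passed."""
    passed = 0

    # Same-day visits: zero interval.
    assert interval_between(date(2025, 1, 6), date(2025, 1, 6)) == (0, 0)
    passed += 1

    # Exactly eight weeks apart (56 days).
    assert interval_between(date(2025, 1, 6), date(2025, 3, 3)) == (8, 0)
    passed += 1

    # Leap year: February 2024 has 29 days, so this span is 7 days.
    assert interval_between(date(2024, 2, 26), date(2024, 3, 4)) == (1, 0)
    passed += 1

    # Non-leap year: the same calendar dates in 2025 span only 6 days.
    assert interval_between(date(2025, 2, 26), date(2025, 3, 4)) == (0, 6)
    passed += 1

    # Reversed dates must be rejected, not quietly accepted.
    try:
        interval_between(date(2025, 3, 3), date(2025, 1, 6))
    except ValueError:
        passed += 1
    else:
        raise AssertionError("reversed dates were accepted")

    return passed


if __name__ == "__main__":
    print(f"{run_edge_case_checks()} edge-case checks passed")
```

Checks like these cover only date arithmetic. They do not replace the manual code review and clinician oversight that the response also lists.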

standing simulated objections not resolved

Provision of quantitative metrics on prompt success rates, error incidence, usability, or regulatory compliance, as these require a separate empirical study outside the scope of this conceptual review.

Circularity Check

0 steps flagged

No circularity: conceptual review with no derivations or self-referential reductions

The manuscript is a descriptive review and practical guide on vibe coding. It advances no mathematical models, equations, predictions, fitted parameters, or derivation chains. Claims rest on authors' stated experiences as early adopters and general discussion of LLM capabilities, without any step that reduces by construction to its own inputs or to a self-citation chain. No instances of self-definitional logic, fitted-input predictions, or ansatz smuggling appear. The paper is self-contained as advisory content.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the untested assumption that current large language models can produce usable clinical software from natural language prompts and that clinicians can safely deploy the outputs with minimal oversight.

axioms (1)

Domain assumption: Large language models can generate functional and safe code for clinical tools from natural language descriptions provided by non-programmers.
This premise underpins the entire proposal that vibe coding democratizes development; it is stated implicitly throughout the abstract.
