Prompt Governance? On Governing Technologies Governed by Natural Language

Anna Neumann; Holli Sargeant; Jatinder Singh

arxiv: 2606.07539 · v1 · pith:GRQZ63B5new · submitted 2026-04-29 · 💻 cs.CY


Pith reviewed 2026-07-01 08:14 UTC · model grok-4.3

classification 💻 cs.CY

venue FAccT ’26, June 25–28, 2026, Montreal, QC, Canada

keywords prompt governance · system prompts · AI policy · large language models · generative AI · natural language instructions · regulatory frameworks · behavioral control


The pith

Divergent claims in research about what system prompts can achieve complicate their use as stable governance tools in AI policy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews how the research literature describes the effects of system-level instructions on large language models and compares those descriptions to the assumptions in two policy documents. Researchers advance conflicting accounts of the goals and reliability of such prompts, which the authors organize into a typology. Policymakers treat these same instructions as accessible points for imposing behavioral constraints and compliance rules. The mismatch raises questions about whether natural language can serve as a dependable mechanism for shaping model outputs in regulatory settings.

Core claim

The literature on system-level instructions advances varying and contradictory claims about what goals those instructions can achieve; these claims are distilled into a typology. Policy frameworks position the same instructions as stable, interpretable control mechanisms. The resulting misalignments indicate that prompt governance approaches require careful consideration before they can reliably support regulatory objectives.

What carries the argument

Typology of claims drawn from the literature on system-level instructions, used to surface divergences from policy assumptions that treat prompts as behavioral controls.

If this is right

Policy frameworks that rely on system prompts to enforce constraints or compliance may encounter unpredictable outcomes across different contexts.
Natural language instructions cannot be assumed to function reliably enough to serve as primary intervention points for governing generative AI.
The viability of using prompts for governance must be examined before extending such approaches to other technical systems controlled by natural language.
Misalignments between research claims and policy positions call for closer inspection of prompt-based regulatory strategies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Empirical measurements of prompt consistency across multiple models and prompt variations could test the typology's categories.
Similar governance challenges may appear wherever natural language serves as the interface for directing complex technical systems.
Regulators might need to develop supplementary oversight methods that do not depend on textual instructions.
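
As a rough illustration of the first extension above, the following minimal sketch scores how similar a model's outputs stay when only the wording of the system prompt changes. It is an assumption-laden toy, not a method from the paper: the `ModelFn` signature, the `measure` helper, and the choice of token-level Jaccard overlap as the similarity metric are all hypothetical, and the stub lambdas stand in for real LLM calls.

```python
from itertools import combinations
from typing import Callable, Dict, List

# Hypothetical signature (assumption): (system_prompt, user_query) -> output text.
ModelFn = Callable[[str, str], str]


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two outputs; 1.0 means identical token sets."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def consistency(outputs: List[str]) -> float:
    """Mean pairwise Jaccard similarity; 1.0 if there are fewer than two outputs."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def measure(models: Dict[str, ModelFn], prompt_variants: List[str], query: str) -> Dict[str, float]:
    """Score each model by how similar its outputs stay across system-prompt variants."""
    return {
        name: consistency([model(variant, query) for variant in prompt_variants])
        for name, model in models.items()
    }


if __name__ == "__main__":
    # Stub models standing in for real LLM calls (assumption: no real API is used).
    stable = lambda system_prompt, query: "I cannot help with that"
    sensitive = lambda system_prompt, query: f"{system_prompt.split()[0]} answer"
    variants = [
        "Refuse unsafe requests.",
        "Decline unsafe requests.",
        "Never answer unsafe requests.",
    ]
    print(measure({"stable": stable, "sensitive": sensitive}, variants, "example query"))
```

A real study would likely replace the lexical overlap with a behavioural measure (for example, whether a constraint was honoured), since two outputs can differ in wording while complying equally.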

Load-bearing premise

The selected literature and the two examined policy frameworks capture enough of the relevant researcher and policymaker perspectives to support conclusions about misalignments.

What would settle it

A broader review that finds largely consistent rather than contradictory claims across the literature on system prompt capabilities would weaken the case for misalignment with policy approaches.

Figures

Figures reproduced from arXiv: 2606.07539 by Anna Neumann, Holli Sargeant, Jatinder Singh.

**Figure 1.** Hierarchical authority levels for different AI instructions (‘prompt stack’), with stakeholders and override abilities.

**Figure 2.** PRISMA flowchart describing our article selection procedure and results.

**Figure 3.** Typology of system-level instruction goals derived through thematic analysis of claims.

**Figure 4.** Search string for literature review.

Annotator collaboration. Two annotators conducted the full-text review of 373 papers. The two annotators independently piloted the extraction procedure on ten papers each, aligned on definitions and recoding conventions, and then split the remaining corpus. Disagreements and edge cases were resolved through discussion, with consensus meetings used to clarify inclusion …

Original abstract

Generative artificial intelligence (GenAI) is increasingly operated by natural language instructions (prompts). Across the pipeline, stakeholders designate various forms, e.g. end-user guidelines, developer specifications, or system prompts, as prompt governance instruments. These textual artifacts are intended to shape model behaviour by specifying constraints, priorities, and compliance rules. Policymakers and regulators have begun to treat system-level instructions as accessible prompt-based GenAI intervention points, assuming they function (directly or indirectly) as behavioural control. Yet whether these instructions operate reliably and predictably enough across contexts to support such governance frameworks remains underexplored. Towards this, we systematically evaluate (i) how researchers discuss and treat system-level instructions in the literature, focusing on large language models (LLMs) as they isolate language effects; (ii) how policymakers position system-level instructions as governance objects, incorporating analysis of two policy frameworks (US Exec. Order on Preventing Woke AI, and EU General-Purpose AI Code of Practice); and (iii) whether misalignments between these perspectives warrant closer inspection of the viability of governing AI through natural language. We identify a fragmented literature advancing varying and contradictory claims about what goals system-level instructions can achieve, which we distil into a typology of claims. Further, we show how divergent claims complicate policy approaches that treat system-level instructions as stable, interpretable control mechanisms. We argue that given such misalignments, careful consideration must be given to prompt governance approaches. Our findings have broad implications, extending from an LLM policy context to the use of natural language as a control mechanism in technical systems more generally.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper maps researcher disagreements on what system prompts can control in LLMs and contrasts them with two policies, but the fragmentation claim hinges on how the sources were picked.


The main thing here is that the authors pull together different researcher positions on system-level instructions for LLMs and show they do not line up neatly with how two specific policies treat those instructions as control points. That mismatch is the central observation.

They create a typology that groups the claims in the literature about what prompts can achieve, then compare it against the US Executive Order on Preventing Woke AI and the EU General-Purpose AI Code of Practice. The synthesis is straightforward and makes the policy angle concrete by sticking to actual documents rather than general statements.

The work is mostly a literature synthesis with no new experiments or datasets. It does surface the range of views on prompt reliability without forcing a single narrative, which is useful for anyone tracking how governance ideas travel from technical papers into regulation.

The soft spot is the lack of detail on how the literature was collected and why these two policies were chosen as representatives. If the included papers were selected to emphasize contradictions, or if other national strategies and standards sit closer to one side of the typology, the claim that this creates broad policy complications does not follow as strongly. The analysis stays conceptual, so there is no independent check on whether the misalignments are as widespread as presented.

This is for readers who work on AI policy design or regulatory approaches to generative models. Someone already following prompt engineering debates or governance frameworks will get the most out of the typology and the side-by-side comparison.

It is worth sending to peer review so referees can test the source selection and see whether the typology holds up against a wider set of documents.

Referee Report

2 major / 2 minor

Summary. The paper claims that both researchers and policymakers increasingly treat system-level instructions (prompts) as governance instruments for GenAI/LLMs, yet the literature advances fragmented and contradictory claims about what those instructions can do. A conceptual synthesis distills these claims into a typology. Analysis of the US Executive Order on Preventing Woke AI and the EU General-Purpose AI Code of Practice then reveals misalignments that complicate treating prompts as stable, interpretable control mechanisms, which the authors take to warrant caution on prompt governance more broadly.

Significance. If the typology accurately captures the literature and the policy misalignment holds, the work usefully flags a practical obstacle for natural-language-based AI governance approaches, with implications for regulatory design beyond LLMs. The interdisciplinary bridge between technical claims and specific policy texts is a strength, though the absence of a reproducible review protocol limits how far the fragmentation claim can be taken as evidence.

major comments (2)

[Abstract; literature synthesis section] Abstract and the section describing the literature analysis: the paper states it 'systematically evaluate[s]' researcher perspectives on system-level instructions and distills a typology of claims, yet provides no search strategy, databases, keywords, inclusion/exclusion criteria, or coding protocol for identifying contradictions. This is load-bearing for the central claim that the literature is fragmented, because without these details it is impossible to assess whether the typology reflects the full range of views or selected examples.
[Policy frameworks analysis] Policy analysis section (analysis of US Exec. Order and EU Code of Practice): the claim that divergent literature claims 'complicate policy approaches' rests on these two documents being representative of policymaker perspectives. No justification is given for their selection over other national strategies or standards documents, nor is there discussion of how the typology maps onto them beyond the two cases; this weakens the policy-implication conclusion.

minor comments (2)

[Abstract; Introduction] The abstract refers to 'system-level instructions' and 'prompt governance' without an early, explicit definition or scope (e.g., whether this includes only system prompts or also developer specs and end-user guidelines).
[Typology section] The paper would benefit from a table or figure summarizing the typology of claims with representative citations from the literature for each category.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which identify important opportunities to enhance the transparency of our methods and the justification for our case selection. We address each major comment below and will make corresponding revisions to the manuscript.


Referee: [Abstract; literature synthesis section] Abstract and the section describing the literature analysis: the paper states it 'systematically evaluate[s]' researcher perspectives on system-level instructions and distills a typology of claims, yet provides no search strategy, databases, keywords, inclusion/exclusion criteria, or coding protocol for identifying contradictions. This is load-bearing for the central claim that the literature is fragmented, because without these details it is impossible to assess whether the typology reflects the full range of views or selected examples.

Authors: We agree that the current description of the literature synthesis lacks sufficient methodological detail to support the claim of fragmentation. The typology was developed through close reading of prominent works on system prompts rather than a formal systematic review with predefined protocols. In revision, we will add a dedicated subsection describing the sources consulted, the criteria used to identify relevant claims, and the process for surfacing contradictions. We will also adjust the abstract language from 'systematically evaluate' to 'evaluate' to better reflect the scope of the synthesis. revision: yes
Referee: [Policy frameworks analysis] Policy analysis section (analysis of US Exec. Order and EU Code of Practice): the claim that divergent literature claims 'complicate policy approaches' rests on these two documents being representative of policymaker perspectives. No justification is given for their selection over other national strategies or standards documents, nor is there discussion of how the typology maps onto them beyond the two cases; this weakens the policy-implication conclusion.

Authors: The two documents were selected because they are recent, high-profile policy instruments that directly reference system-level instructions or equivalent mechanisms in major AI regulatory jurisdictions. We will revise the policy section to include explicit selection criteria, note their illustrative rather than exhaustive character, and provide a clearer, point-by-point mapping of the typology onto specific provisions in each document to demonstrate the misalignments more rigorously. revision: yes

Circularity Check

0 steps flagged

No circularity: conceptual synthesis of external literature and policies


The paper performs a literature synthesis and policy comparison without equations, fitted parameters, derivations, or self-referential loops. It distills a typology from external researcher claims and contrasts it with two policy documents (US Exec. Order, EU Code of Practice). No self-citation is load-bearing for the central claim, and the analysis relies on external sources rather than reducing to its own inputs by construction. This is a standard non-circular conceptual paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a qualitative policy and literature analysis paper with no mathematical derivations, data fitting, or technical modeling.



Reference graph

Works this paper leans on

298 extracted references · 228 canonical work pages · 9 internal anchors

[1]

Adetayo Adebimpe, Helmut Neukirchen, and Thomas Welsh. 2025. SBASH: a Framework for Designing and Evaluating RAG vs. Prompt-Tuned LLM Honeypots. doi:10.48550/arXiv.2510.21459

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2510.21459 2025
[2]

Andrew Adiletta, Zane Weissman, Fatemeh Khojasteh Dana, Berk Sunar, and Shahin Tajik. 2025. Rubber Mallet: A Study of High Frequency Localized Bit Flips and Their Impact on Security. doi:10.48550/arXiv.2505.01518

work page doi:10.48550/arxiv.2505.01518 2025
[3]

Divyansh Agarwal, Alexander Fabbri, Ben Risher, Philippe Laban, Shafiq Joty, and Chien-Sheng Wu. 2024. Prompt Leakage effect and mitigation strategies for multi-turn LLM Applications. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, Franck Dernoncourt, Daniel Preoţiuc-Pietro, and Anastasia Shimorina...

work page doi:10.18653/v1/2024.emnlp-industry.94 2024
[4]

Liberal, Miren Arrese, and Helena Matute

Ujué Agudo, Karlos G. Liberal, Miren Arrese, and Helena Matute. 2024. The impact of AI errors in a human-in-the-loop process. Cognitive Research: Principles and Implications9, 1 (Jan. 2024), 1. doi:10.1186/s41235-023-00529-3

work page doi:10.1186/s41235-023-00529-3 2024
[5]

Thea Lovise Ahlgren, Helene Fønstelien Sunde, Kai-Kristian Kemell, and Anh Nguyen-Duc. 2025. Assisting early-stage software startups with LLMs: Effective prompt engineering and system instruction design.Information and Software Technology187 (Nov. 2025), 107832. doi:10.1016/j.infsof.2025.107832

work page doi:10.1016/j.infsof.2025.107832 2025
[6]

Ferit Akaybicen, Aaron Cummings, Lota Iwuagwu, Xinyue Zhang, and Modupe Akintomide. 2026. A Machine Learning Approach for Emergency Detection in Medical Scenarios Using Large Language Models. InProceedings of the International Symposium on Intelligent Computing and Networking 2025, Manuel Rodriguez Martinez, Kejie Lu, Feng Ye, and Yi Qian (Eds.). Springer...

2026
[7]

Ahmet Yusuf Alan, Enis Karaarslan, and Omer Aydin. 2025. Improving LLM Reliability with RAG in Religious Question-Answering: MufassirQAS. doi:10.48550/arXiv.2401.15378

work page doi:10.48550/arxiv.2401.15378 2025
[8]

Maimounah Alhujaili and Ruqayya Abdulrahman. 2025. Fine-Tuning OpenAI GPT Chatbot in Western Saudi Dialect: A Case Study of Taibah University.International Journal of Advanced Computer Science and Applications16, 6 (2025). doi:10.14569/IJACSA.2025.0160632

work page doi:10.14569/ijacsa.2025.0160632 2025
[9]

Muhammad Ali, Bixia Chen, and Gary Wong. 2025. Developing Alice: A Scaffolding Agent for AI-Mediated Computational Thinking. Proceedings of the 9th International Conference on Computational Thinking and STEM Education (CTE-STEM 2025), 9 (June 2025), 26–31. doi:10.5281/zenodo.15769853

work page doi:10.5281/zenodo.15769853 2025
[10]

Ali, Angèle Christin, Andrew Smart, and Riitta Katila

Sanna J. Ali, Angèle Christin, Andrew Smart, and Riitta Katila. 2023. Walking the Walk of AI Ethics: Organizational Challenges and the Individualization of Risk among Ethics Entrepreneurs. In2023 ACM Conference on Fairness Accountability and Transparency. ACM, Chicago IL USA, 217–226. doi:10.1145/3593013.3593990

work page doi:10.1145/3593013.3593990 2023
[11]

Masoud, Alaa Alzahrani, Deema Alnuhait, Emad A

Mohammed Alkhowaiter, Norah Alshahrani, Saied Alshahrani, Reem I. Masoud, Alaa Alzahrani, Deema Alnuhait, Emad A. Alghamdi, and Khalid Almubarak. 2025. Mind the Gap: A Review of Arabic Post-Training Datasets and Their Limitations. InProceedings of The Third Arabic Natural Language Processing Conference, Kareem Darwish, Ahmed Ali, Ibrahim Abu Farha, Samia ...

work page doi:10.18653/v1/2025.arabicnlp-main.26 2025
[12]

Mina Almasi and Ross Deans Kristensen-McLachlan. 2025. Alignment Drift in CEFR-prompted LLMs for Interactive Spanish Tutoring. doi:10.48550/arXiv.2505.08351

work page doi:10.48550/arxiv.2505.08351 2025
[13]

Ayesha Amjad, Saurav Sthapit, and Tahir Qasim Syed. 2026. An Agentic System with Reinforcement-Learned Subsystem Improvements for Parsing Form-Like Documents. InEngineering Multi-Agent Systems, Sebastian Rodriguez, Lu Feng, and Jörg P. Müller (Eds.). Springer Nature Switzerland, Cham, 27–44

2026
[14]

Anthropic. 2025. Configuring and Using Styles | Claude Help Center — support.claude.com. https://support.claude.com/en/articles/ 10181068-configuring-and-using-styles [Accessed 26-11-2025]. Prompt Governance? On Governing Technologies Governed by Natural Language FAccT ’26, June 25–28, 2026, Montreal, QC, Canada

2025
[15]

Anthropic. 2025. Effective context engineering for AI agents — anthropic.com. https://www.anthropic.com/engineering/effective- context-engineering-for-ai-agents [Accessed 11-01-2026]

2025
[16]

Anthropic. 2025. Giving Claude a role with a system prompt - Anthropic — docs.anthropic.com. https://docs.anthropic.com/en/docs/ build-with-claude/prompt-engineering/system-prompts [Accessed 07-09-2025]

2025
[17]

Anthropic. 2026. Statement on the comments from Secretary of War Pete Hegseth — anthropic.com. https://www.anthropic.com/ news/statement-comments-secretary-war [Accessed 17-03-2026]

2026
[18]

Anthropic. 2026. Where things stand with the Department of War — anthropic.com. https://www.anthropic.com/news/where-stand- department-war [Accessed 17-03-2026]

2026
[19]

Paula Akemi Aoyagui, Kelsey Stemmler, Sharon A Ferguson, Young-Ho Kim, and Anastasia Kuzminykh. 2025. A Matter of Perspective(s): Contrasting Human and LLM Argumentation in Subjective Decision-Making on Subtle Sexism. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Machinery, New York, NY...

work page doi:10.1145/3706598.3713248 2025
[20]

Tariq Arif and Md Rahim. 2025. Agentic AI for Real-Time Adaptive PID Control of a Servo Motor.Actuators14, 9 (Sept. 2025), 459. doi:10.3390/act14090459

work page doi:10.3390/act14090459 2025
[21]

Rauno Arike, Elizabeth Donoway, Henning Bartsch, and Marius Hobbhahn. 2025. Technical Report: Evaluating Goal Drift in Language Model Agents. doi:10.48550/arXiv.2505.02709

work page doi:10.48550/arxiv.2505.02709 2025
[22]

Suriya Ganesh Ayyamperumal and Limin Ge. 2024. Current state of LLM Risks and AI Guardrails. doi:10.48550/arXiv.2406.12934

work page doi:10.48550/arxiv.2406.12934 2024
[23]

Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, ...

work page internal anchor Pith review Pith/arXiv arXiv 2022
[24]

Agathe Balayn, Lorenzo Corti, Fanny Rancourt, Fabio Casati, and Ujwal Gadiraju. 2024. Understanding Stakeholders’ Perceptions and Needs Across the LLM Supply Chain. arXiv:2405.16311 [cs.HC] https://arxiv.org/abs/2405.16311

work page arXiv 2024
[25]

Rick Battle and Teja Gollapudi. 2024. The Unreasonable Effectiveness of Eccentric Automatic Prompts. doi:10.48550/arXiv.2402.10949

work page doi:10.48550/arxiv.2402.10949 2024
[26]

Álvaro Guglielmin Becker, Gabriel Bauer de Oliveira, Lana Bertoldo Rossato, and Anderson Rocha Tavares. 2025. Boardwalk: Towards a Framework for Creating Board Games with LLMs. InAnais do XXIV Simpósio Brasileiro de Jogos e Entretenimento Digital (SBGames 2025). 655–667. doi:10.5753/sbgames.2025.10222

work page doi:10.5753/sbgames.2025.10222 2025
[27]

Rebecca Bellan. 2025. OpenAI adds new teen safety rules to ChatGPT as lawmakers weigh AI standards for minors | TechCrunch — techcrunch.com. https://techcrunch.com/2025/12/19/openai-adds-new-teen-safety-rules-to-models-as-lawmakers-weigh-ai- standards-for-minors/ [Accessed 10-01-2026]

2025
[28]

Ziv Ben-Zion, Paul Raffelhüschen, Max Zettl, Antonia Lüönd, Achim Burrer, Philipp Homan, and Tobias R. Spiller. 2025. Detecting and Preventing Harmful Behaviors in AI Companions: Development and Evaluation of the SHIELD Supervisory System. doi:10.48550/ arXiv.2510.15891

work page arXiv 2025
[29]

Shir Bernstein, David Beste, Daniel Ayzenshteyn, Lea Schonherr, and Yisroel Mirsky. 2025. Trust Me, I Know This Function: Hijacking LLM Static Analysis using Bias. doi:10.48550/arXiv.2508.17361

work page doi:10.48550/arxiv.2508.17361 2025
[30]

Mazal Bethany, Nishant Vishwamitra, Cho-Yu Jason Chiang, and Peyman Najafirad. 2025. CAMOUFLAGE: Exploiting Misinformation Detection Systems Through LLM-driven Adversarial Claim Transformation. doi:10.48550/arXiv.2505.01900

work page doi:10.48550/arxiv.2505.01900 2025
[31]

Abeba Birhane, Pratyusha Kalluri, Dallas Card, William Agnew, Ravit Dotan, and Michelle Bao. 2022. The Values Encoded in Machine Learning Research. InProceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22). Association for Computing Machinery, New York, NY, USA, 173–184. doi:10.1145/3531146.3533083

work page doi:10.1145/3531146.3533083 2022
[32]

Abeba Birhane, Ryan Steed, Victor Ojewale, Briana Vecchione, and Inioluwa Deborah Raji. 2024. AI auditing: The Broken Bus on the Road to AI Accountability. In2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). 612–643. doi:10.1109/ SaTML59370.2024.00037

work page arXiv 2024
[33]

Bo, Harsh Kumar, Michael Liut, and Ashton Anderson

Jessica Y. Bo, Harsh Kumar, Michael Liut, and Ashton Anderson. 2024. Disclosures & Disclaimers: Investigating the Impact of Transparency Disclosures and Reliability Disclaimers on Learner-LLM Interactions.Proceedings of the AAAI Conference on Human Computation and Crowdsourcing12 (Oct. 2024), 23–32. doi:10.1609/hcomp.v12i1.31597

work page doi:10.1609/hcomp.v12i1.31597 2024
[34]

Sebastian Daniel Boie, Esther Glastetter, Michael Patrick Lux, Felix Balzer, Christof Von Kalle, Christian Lenz, and Ulrike Müller
[35]

2025), e68426–e68426

Evaluating a Chatbot as a Companion for Patients With Breast Cancer: Collaborative Pilot Study.JMIR Cancer11 (Aug. 2025), e68426–e68426. doi:10.2196/68426

work page doi:10.2196/68426 2025
[36]

2020.The Brussels effect: How the European Union rules the world

Anu Bradford. 2020.The Brussels effect: How the European Union rules the world. Oxford University Press

2020
[37]

Christian Braun, Alexander Lilienbeck, and Daniel Mentjukov. 2025. The Hidden Structure – Improving Legal Document Understanding Through Explicit Text Formatting. doi:10.48550/arXiv.2505.12837

work page doi:10.48550/arxiv.2505.12837 2025
[38]

Maarten Buyl, Yousra Fettach, Guillaume Bied, and Tijl De Bie. 2025. Building and Measuring Trust between Large Language Models. doi:10.48550/arXiv.2508.15858 FAccT ’26, June 25–28, 2026, Montreal, QC, Canada Neumann et al

work page doi:10.48550/arxiv.2508.15858 2025
[39]

Maarten Buyl, Alexander Rogiers, Sander Noels, Guillaume Bied, Iris Dominguez-Catena, Edith Heiter, Iman Johary, Alexandru- Cristian Mara, Raphaël Romero, Jefrey Lijffijt, and Tijl De Bie. 2025. Large Language Models Reflect the Ideology of their Creators. arXiv:2410.18417 [cs.CL] https://arxiv.org/abs/2410.18417

work page arXiv 2025
[40]

Maarten Buyl, Alexander Rogiers, Sander Noels, Guillaume Bied, Iris Dominguez-Catena, Edith Heiter, Iman Johary, Alexandru-Cristian Mara, Raphaël Romero, Jefrey Lijffijt, and Tijl De Bie. 2026. Large language models reflect the ideology of their creators.npj Artificial Intelligence2, 1 (Jan. 2026), 7. doi:10.1038/s44387-025-00048-0

work page doi:10.1038/s44387-025-00048-0 2026
[41]

Jinyu Cai, Yusei Ishimizu, Mingyue Zhang, Munan Li, Jialong Li, and Kenji Tei. 2025. Simulation of Language Evolution under Regulated Social Media Platforms: A Synergistic Approach of Large Language Models and Genetic Algorithms. doi:10.48550/arXiv.2502.19193

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2502.19193 2025
[42]

Bochuan Cao, Changjiang Li, Yuanpu Cao, Yameng Ge, Ting Wang, and Jinghui Chen. 2025. You Can’t Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors. InProceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security (Taipei, Taiwan)(CCS ’25). Association for Computing Machinery, New York, NY, USA, 4423–4437. doi:10.11...

work page doi:10.1145/3719027.3765124 2025
[43]

Jeshwanth Challagundla, Mantek Singh, Siddharth Raina, Smarth Behl, FNU Harsh, and Jasmin Jarsania. 2025. SI-Agent: An Agentic Framework for Feedback-Driven Generation and Tuning of Human-Readable System Instructions for Large Language Models. In2025 16th International Conference on Information, Intelligence, Systems & Applications (IISA). 1–9. doi:10.110...

work page doi:10.1109/iisa66859.2025.11311216 2025
[44]

Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, and Markus Anderljung. 2024. Visibility into AI Agents. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency(Rio de Janeiro, Brazil)(FAccT ’24). Association for Computi...

work page doi:10.1145/3630106.3658948 2024
[45]

Chun Fai Chan, Daniel Wankit Yip, and Aysan Esmradi. 2023. Detection and Defense Against Prominent Attacks on Preconditioned LLM-Integrated Virtual Assistants. In2023 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE). 1–5. doi:10.1109/CSDE59766.2023.10487759

work page doi:10.1109/csde59766.2023.10487759 2023
[46]

Pantid Chantangphol, Pornchanan Balee, Kantapong Sucharitpongpan, Chanatip Saetia, and Tawunrat Chalothorn. 2025. FinMind- Y-Me at the Regulations Challenge Task: Financial Mind Your Meaning based on THaLLE. InProceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing ...

2025
[47]

Pappas, Florian Tramèr, Hamed Hassani, and Eric Wong

Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramèr, Hamed Hassani, and Eric Wong. 2024. JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models. InAdvances in Neural Information Processing Systems, A. Glo...

work page doi:10.52202/079017-1745 2024
[48]

Shreya Chappidi, Jatinder Singh, and Andra V Krauze. 2026. Who Does What? Archetypes of Roles Assigned to LLMs During Human-AI Decision-Making. InProceedings of the 2026 CHI Conference on Human Factors in Computing Systems. ACM, Barcelona, Spain. doi:10.1145/3772318.3791428

work page doi:10.1145/3772318.3791428 2026
[49]

Alex Chen, Renato Geh, Aditya Grover, Guy Van den Broeck, and Daniel Israel. 2025. The Pitfalls of KV Cache Compression. doi:10.48550/arXiv.2510.00231

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2510.00231 2025
[50]

Bocheng Chen, Hanqing Guo, and Qiben Yan. 2024. FlexLLM: Exploring LLM Customization for Moving Target Defense on Black-Box LLMs Against Jailbreak Attacks. doi:10.48550/arXiv.2412.07672

work page doi:10.48550/arxiv.2412.07672 2024
[51]

Bocheng Chen, Nikolay Ivanov, Guangjing Wang, and Qiben Yan. 2024. Multi-Turn Hidden Backdoor in Large Language Model-powered Chatbot Models. InProceedings of the 19th ACM Asia Conference on Computer and Communications Security. ACM, Singapore Singapore, 1316–1330. doi:10.1145/3634737.3656289

work page doi:10.1145/3634737.3656289 2024
[52]

Kedi Chen, Qin Chen, Jie Zhou, He Yishen, and Liang He. 2024. DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models. InFindings of the Association for Computational Linguistics: EMNLP 2024. Association for Computational Linguistics, Miami, Florida, USA, 9057–9079. doi:10.18653/v1/2024.findings-emnlp.529

work page doi:10.18653/v1/2024.findings-emnlp.529 2024
[53]

Tong Chen, Faeze Brahman, Jiacheng Liu, Niloofar Mireshghallah, Weijia Shi, Pang Wei Koh, Luke Zettlemoyer, and Hannaneh Hajishirzi
[54]

doi:10.48550/arXiv.2504.14452

ParaPO: Aligning Language Models to Reduce Verbatim Reproduction of Pre-training Data. doi:10.48550/arXiv.2504.14452

work page doi:10.48550/arxiv.2504.14452
[55]

Zhangquan Chen, Chunjiang Liu, and Haobin Duan. 2024. A Three-Phases-LORA Finetuned Hybrid LLM Integrated with Strong Prior Module in the Education Context. InArtificial Neural Networks and Machine Learning – ICANN 2024, Michael Wand, Kristína Malinovská, Jürgen Schmidhuber, and Igor V. Tetko (Eds.). Vol. 15020. Springer Nature Switzerland, Cham, 235–250....

work page doi:10.1007/978-3- 2024
[56]

Xiang Cheng, Raveesh Mayya, and João Sedoc. 2025. To Err Is Human; To Annotate, SILICON? Reducing Measurement Error in LLM Annotation. doi:10.48550/arXiv.2412.14461

[57]

Simon Chesterman, Lyria Bennett Moses, and Ugo Pagallo. 2023. All Rise for the Honourable Robot Judge? Using Artificial Intelligence to Regulate AI: a debate. Technology and Regulation 2023 (Oct. 2023), 45–57. doi:10.71265/0p137y60

[58]

Cheng-Han Chiang, Wei-Chih Chen, Chun-Yi Kuan, Chienchou Yang, and Hung-yi Lee. 2024. Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association ... doi:10.18653/v1/2024.emnlp-main.146

[59]

Jeffrey Yang Fan Chiang, Seungjae Lee, Jia-Bin Huang, Furong Huang, and Yizheng Chen. 2025. Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis. doi:10.48550/arXiv.2502.20383

[60]

Yu Ying Chiu, Liwei Jiang, and Yejin Choi. 2025. DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life. doi:10.48550/arXiv.2410.02683

[61]

Yumin Choi, Jinheon Baek, and Sung Ju Hwang. 2025. System Prompt Optimization with Meta-Learning. doi:10.48550/arXiv.2505.09666

[62]

Sora Chon, Jaehoon Kim, and Jaeho Kim. 2025. Multifaceted variability in LLM-driven stock recommendations. Finance Research Letters 86 (Dec. 2025), 108923. doi:10.1016/j.frl.2025.108923

[63]

Chrome. 2025. The Prompt API | AI on Chrome | Chrome for Developers — developer.chrome.com. https://developer.chrome.com/docs/ai/prompt-api [Accessed 13-01-2026]

[64]

Gabriel Chua, Shing Yee Chan, and Shaun Khoo. 2025. A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection. doi:10.48550/arXiv.2411.12946

[65]

Peter Cihon, Jonas Schuett, and Seth D. Baum. 2021. Corporate Governance of Artificial Intelligence in the Public Interest. Information 12, 7 (2021). doi:10.3390/info12070275

[66]

Victoria Clarke and Virginia Braun. 2017. Thematic analysis. The Journal of Positive Psychology 12, 3 (May 2017), 297–298. doi:10.1080/17439760.2016.1262613

[67]

Claude. 2025. System Prompts — platform.claude.com. https://platform.claude.com/docs/en/release-notes/system-prompts [Accessed 07-01-2026]

[68]

Jennifer Cobbe, Michelle Seng Ah Lee, and Jatinder Singh. 2021. Reviewable Automated Decision-Making: A Framework for Accountable Algorithmic Systems. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21). Association for Computing Machinery, New York, NY, USA, 598–609. doi:10.1145/3442188.3445921

[69]

Jennifer Cobbe, Michael Veale, and Jatinder Singh. 2023. Understanding accountability in algorithmic supply chains. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (Chicago, IL, USA) (FAccT ’23). Association for Computing Machinery, New York, NY, USA, 1186–1197. doi:10.1145/3593013.3594073

[70]

Kwesi Adu Cobbina and Tianyi Zhou. 2025. Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng (Eds.). Association for Computational Linguistics, Suzhou, China, 295...

[71]

Ofir Cohen, Gil Ari Agmon, Asaf Shabtai, and Rami Puzis. 2025. The Information Security Awareness of Large Language Models. doi:10.48550/arXiv.2411.13207

[72]

Luca Collini, Siddharth Garg, and Ramesh Karri. 2025. C2HLSC: Leveraging Large Language Models to Bridge the Software-to-Hardware Design Gap. ACM Trans. Des. Autom. Electron. Syst. 30, 6, Article 96 (Oct. 2025), 24 pages. doi:10.1145/3734524

[73]

A. Feder Cooper, Emanuel Moss, Benjamin Laufer, and Helen Nissenbaum. 2022. Accountability in an Algorithmic Society: Relationality, Responsibility, and Robustness in Machine Learning. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22). Association for Computing Machinery, New York, NY, USA, 864–876. doi:10.1145/3531146.3533150

[74]

Rimom Costa. 2025. Instruction-Level Weight Shaping: A Framework for Self-Improving AI Agents. doi:10.48550/arXiv.2509.00251

[75]

Yuhao Dan, Zhikai Lei, Yiyang Gu, Yong Li, Jianghao Yin, Jiaju Lin, Linhao Ye, Zhiyan Tie, Yougen Zhou, Yilei Wang, Aimin Zhou, Ze Zhou, Qin Chen, Jie Zhou, Liang He, and Xipeng Qiu. 2023. EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education. doi:10.48550/arXiv.2308.02773

[76]

Johan S. Daniel and Anand Pal. 2024. Impact of Non-Standard Unicode Characters on Security and Comprehension in Large Language Models. doi:10.48550/arXiv.2405.14490

[77]

Badhan Chandra Das, M. Hadi Amini, and Yanzhao Wu. 2025. System Prompt Extraction Attacks and Defenses in Large Language Models. doi:10.48550/arXiv.2505.23817

[78]

Kevin E. Davis and Florencia Marotta-Wurgler. 2024. Filling the Void: How E.U. Privacy Law Spills Over to the U.S. Journal of Law & Empirical Analysis 1, 1 (2024), 77–97. doi:10.1177/2755323X241237619

[79]

Íñigo de Troya, Jacqueline Kernahan, Neelke Doorn, Virginia Dignum, and Roel Dobbe. 2025. Misabstraction in Sociotechnical Systems. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’25). Association for Computing Machinery, New York, NY, USA, 1829–1842. doi:10.1145/3715275.3732122

[80]

Edoardo Debenedetti, Javier Rando, Daniel Paleka, Fineas Silaghi, Dragos Albastroiu, Niv Cohen, Yuval Lemberg, Reshmi Ghosh, Rui Wen, Ahmed Salem, Giovanni Cherubin, Santiago Zanella-Beguelin, Robin Schmid, Victor Klemm, Takahiro Miki, Chenhao Li, Stefan Kraft, Mario Fritz, Florian Tramèr, Sahar Abdelnabi, and Lea Schönherr. 2024. Dataset and Lessons Lear... doi:10.52202/079017-1164
