pith. machine review for the scientific record.

arxiv: 2604.23065 · v1 · submitted 2026-04-24 · 💻 cs.CY · cs.SE

What Should Frontier AI Developers Disclose About Internal Deployments?

Jacob Charnock, Justin Miller, Raja Mehta Moreno, William L. Anderson

Pith reviewed 2026-05-08 09:30 UTC · model grok-4.3

classification 💻 cs.CY cs.SE
keywords frontier AI · internal deployments · AI safety · transparency · disclosure framework · AI governance · model system cards · AI regulation

The pith

Frontier AI developers should disclose information about internal model deployments in four categories to enable external safety oversight.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Frontier AI developers are increasingly using advanced models internally to speed up their own research, but these uses receive little external review. The paper identifies the specific information developers should share about these internal deployments to demonstrate safety, organized into four categories: the models' capabilities, how they are being used, the safety mitigations in place, and the governance of deployment decisions. The authors weigh the benefits and drawbacks of disclosing each type of detail and propose ways to limit risks such as leaking competitively sensitive information. If adopted, the framework would inform both public transparency reports and compliance with emerging AI regulation.

Core claim

The paper's central claim is that companies should disclose key information about internally deployed frontier models across capabilities, usage, safety mitigations, and governance. For each category the authors analyze the benefits for oversight, the limitations including competitive risks, and strategies to mitigate those risks. This framework is intended to guide both public transparency documents like model system cards and private reports under emerging regulation.

What carries the argument

The four-category disclosure framework (capabilities, usage, safety mitigations, governance) for internally deployed frontier AI models.
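
To make the framework concrete, here is a minimal sketch, in Python, of what a disclosure report structured around the four categories might look like. The schema, field names, and example values are illustrative assumptions, not anything proposed by the paper, which prescribes categories of information rather than a data format.

    from dataclasses import dataclass, field

    # Hypothetical sketch only: all field names and example values below are
    # illustrative assumptions, not a schema from the paper.
    @dataclass
    class InternalDeploymentDisclosure:
        model_id: str
        capabilities: dict = field(default_factory=dict)        # e.g. evaluation results
        usage: list = field(default_factory=list)               # how the model is used internally
        safety_mitigations: list = field(default_factory=list)  # monitoring, access controls
        governance: dict = field(default_factory=dict)          # who decides, under what process

    report = InternalDeploymentDisclosure(
        model_id="frontier-model-x",  # hypothetical identifier
        capabilities={"ai_r_and_d_eval": "above internal threshold"},
        usage=["automated code review", "research-agent pipelines"],
        safety_mitigations=["agent action monitoring", "tiered internal access"],
        governance={"decision_owner": "deployment safety board",
                    "review_cadence": "quarterly"},
    )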

Load-bearing premise

That the disclosures can be made in enough detail to allow real external oversight while avoiding the release of sensitive competitive information.

What would settle it

A case where the proposed disclosures are implemented but external reviewers still cannot adequately evaluate the safety of the internal deployments due to gaps in the information provided.

read the original abstract

Frontier AI developers are increasingly deploying highly capable models internally to automate AI R&D, but these deployments currently face limited external oversight. It is essential, therefore, that developers provide evidence that internally deployed models are safe. While recent work has highlighted the risks of internal deployments and proposed broad approaches to transparency and governance, there remains little guidance on the specific information developers should disclose about them. We address this gap by identifying key information that companies should disclose about internally deployed models across four categories: capabilities, usage, safety mitigations, and governance. For each category, we analyse the key benefits and limitations of disclosure and consider how disclosure-related risks can be mitigated. Our framework could be used by developers to inform both public transparency documents, such as model system cards, and private periodic reports required under emerging frontier AI regulation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 0 minor

Summary. The paper claims that frontier AI developers should disclose specific information about internally deployed models across four categories—capabilities, usage, safety mitigations, and governance—to enable external oversight of these deployments, which currently lack transparency. For each category, it analyzes benefits and limitations of disclosure and proposes ways to mitigate associated risks such as competitive harm, with the framework intended to inform both public model system cards and private reports under emerging regulation.

Significance. If the proposed disclosures prove workable, the framework would offer timely, structured policy guidance that directly links identified risks of internal frontier model use (e.g., for automating R&D) to concrete transparency measures. It builds usefully on prior literature by moving from broad calls for governance to category-specific recommendations, potentially aiding both voluntary industry practices and regulatory implementation.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive and accurate summary of the manuscript. The assessment correctly identifies the paper's focus on category-specific disclosure recommendations for internal frontier model deployments and its potential utility in both voluntary and regulatory contexts. The report raised no specific major or minor comments requiring a response.

Circularity Check

0 steps flagged

No significant circularity; conceptual policy framework

full rationale

The paper proposes a disclosure framework across four categories (capabilities, usage, safety mitigations, governance) by analyzing benefits, limitations, and risk mitigations. No equations, derivations, fitted parameters, or self-referential claims appear. All steps draw from externally identified risks in prior literature rather than reducing to the paper's own inputs or self-citations. The work is scoped as policy guidance and remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that internal deployments require external oversight via disclosure and that balanced disclosure is feasible; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption: Internal deployments of frontier AI models pose risks that necessitate external oversight through specific disclosures.
    Explicitly stated in the abstract as essential for safety.
  • domain assumption: Disclosure risks in the four categories can be mitigated without undermining the value of the disclosures.
    The paper states it considers how disclosure-related risks can be mitigated.

pith-pipeline@v0.9.0 · 5437 in / 1269 out tokens · 35392 ms · 2026-05-08T09:30:37.010068+00:00 · methodology

Reference graph

Works this paper leans on

55 extracted references · 17 canonical work pages · 1 internal anchor

  2. [2]

    Managing risks from internal AI systems

    Acharya, A. and Delaney, O. Managing risks from internal AI systems. https://www.iaps.ai/research/managing-risks-from-internal-ai-systems, 2025

  3. [3]

    2025 AI Forecasting Survey

    AI Digest. 2025 AI Forecasting Survey. https://ai2025.org/, 2025

  4. [4]

    A grading rubric for AI safety frameworks, 2024

    Alaga, J., Schuett, J., and Anderljung, M. A grading rubric for AI safety frameworks, 2024. URL https://arxiv.org/abs/2409.08751

  5. [5]

    Strengthening our safeguards through collaboration with US CAISI and UK AISI

    Anthropic. Strengthening our safeguards through collaboration with US CAISI and UK AISI. https://www.anthropic.com/news/strengthening-our-safeguards-through-collaboration-with-us-caisi-and-uk-aisi, 2025

  6. [6]

    Claude's constitution

    Anthropic. Claude's constitution. https://www.anthropic.com/constitution, 2026a

  7. [7]

    Alignment risk update: Claude Mythos preview

    Anthropic. Alignment risk update: Claude Mythos preview. https://www.anthropic.com/claude-mythos-preview-risk-report, April 2026b

  8. [8]

    System card: Claude Mythos Preview

    Anthropic. System card: Claude Mythos Preview. https://cdn.sanity.io/files/4zrzovbb/website/7624816413e9b4d2e3ba620c5a5e091b98b190a5.pdf, 2026c

  9. [9]

    System card: Claude Opus 4.6

    Anthropic. System card: Claude Opus 4.6. https://www-cdn.anthropic.com/6a5fa276ac68b9aeb0c8b6af5fa36326e0e166dd.pdf, 2026d

  10. [10]

    Sabotage risk report: Claude Opus 4.6

    Anthropic. Sabotage risk report: Claude Opus 4.6. https://www-cdn.anthropic.com/f21d93f21602ead5cdbecb8c8e1c765759d9e232.pdf, 2026e

  11. [11]

    Risk report: February 2026

    Anthropic. Risk report: February 2026. https://www-cdn.anthropic.com/08eca2757081e850ed2ad490e5253e940240ca4f.pdf, 2026f

  12. [12]

    Bengio, Y., Mindermann, S., Privitera, D., Besiroglu, T., Bommasani, R., Casper, S., Choi, Y., Fox, P., Garfinkel, B., Goldfarb, D., Heidari, H., Ho, A., Kapoor, S., Khalatbari, L., Longpre, S., Manning, S., Mavroudis, V., Mazeika, M., Michael, J., Newman, J., Ng, K. Y., Okolo, C. T., Raji, D., Sastry, G., Seger, E., Skeadas, T., South, T., Strubell, E., ...

  13. [13]

    Bengio, Y., Clare, S., Prunkl, C., Murray, M., Andriushchenko, M., Bucknall, B., Bommasani, R., Casper, S., Davidson, T., Douglas, R., Duvenaud, D., Fox, P., Gohar, U., Hadshar, R., Ho, A., Hu, T., Jones, C., Kapoor, S., Kasirzadeh, A., Manning, S., Maslej, N., Mavroudis, V., McGlynn, C., Moulange, R., Newman, J., Ng, K. Y., Paskov, P., Rismani, S., Sastr...

  14. [14]

    Brundage, M., Avin, S., Wang, J., Belfield, H., Krueger, G., Hadfield, G., Khlaaf, H., Yang, J., Toner, H., Fong, R., Maharaj, T., Koh, P. W., Hooker, S., Leung, J., Trask, A., Bluemke, E., Lebensold, J., O'Keefe, C., Koren, M., Ryffel, T., Rubinovitz, J., Besiroglu, T., Carugati, F., Clark, J., Eckersley, P., de Haas, S., Johnson, M., Laurie, B., Ingerma...

  15. [15]

    Brundage, M., Dreksler, N., Homewood, A., McGregor, S., Paskov, P., Stosz, C., Sastry, G., Cooper, A. F., Balston, G., Adler, S., Casper, S., Anderljung, M., Werner, G., Mindermann, S., Mavroudis, V., Bucknall, B., Stix, C., Freund, J., Pacchiardi, L., Hernandez-Orallo, J., Pistillo, M., Chen, M., Painter, C., Ball, D. W., O'Keefe, C., Weil, G., Harack, B...

  16. [16]

    Senate bill no. 53: Transparency in frontier artificial intelligence act

    California State Legislature. Senate bill no. 53: Transparency in frontier artificial intelligence act. https://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=202520260SB53, 2025

  17. [17]

    Measuring AI R&D automation

    Chan, A., Padarath, R., Kwon, J., Greaves, H., and Anderljung, M. Measuring AI R&D automation, 2026. URL https://arxiv.org/abs/2603.03992

  18. [18]

    AI models can be dangerous before public deployment

    Chan, L. AI models can be dangerous before public deployment. https://metr.org/blog/2025-01-17-ai-models-dangerous-before-public-deployment/, January 2025

  19. [19]

    Subliminal learning: Language models transmit behavioral traits via hidden signals in data

    Cloud, A., Le, M., Chua, J., Betley, J., Sztyber-Betley, A., Hilton, J., Marks, S., and Evans, O. Subliminal learning: Language models transmit behavioral traits via hidden signals in data, 2025. URL https://arxiv.org/abs/2507.14805

  20. [20]

    AI-enabled coups: How a small group could use AI to seize power, 2025

    Davidson, T., Finnveden, L., and Hadshar, R. AI-enabled coups: How a small group could use AI to seize power, 2025. URL https://www.forethought.org/research/ai-enabled-coups-how-a-small-group-could-use-ai-to-seize-power. Accessed: 2026-04-22

  21. [21]

    SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

    Deng, X., Da, J., Pan, E., He, Y. Y., Ide, C., Garg, K., Lauffer, N., Park, A., Pasari, N., Rane, C., Sampath, K., Krishnan, M., Kundurthy, S., Hendryx, S., Wang, Z., Bharadwaj, V., Holm, J., Aluri, R., Zhang, C. B. C., Jacobson, N., Liu, B., and Kenstler, B. SWE-Bench Pro: Can AI agents solve long-horizon software engineering tasks?, 2025. URL https://ar...

  22. [22]

    Will AI R&D automation cause a software intelligence explosion?

    Eth, D. and Davidson, T. Will AI R&D automation cause a software intelligence explosion?, 2025. URL https://www.forethought.org/research/will-ai-r-and-d-automation-cause-a-software-intelligence-explosion. Accessed: 2026-04-24

  23. [23]

    The general-purpose AI code of practice: Safety and security chapter

    European Commission. The general-purpose AI code of practice: Safety and security chapter. https://digital-strategy.ec.europa.eu/en/policies/contents-code-gpai, 2025

  24. [24]

    AI researchers' views on automating AI R&D and intelligence explosions, 2026

    Field, S., Douglas, R., and Krueger, D. AI researchers' views on automating AI R&D and intelligence explosions, 2026. URL https://arxiv.org/abs/2603.03338

  25. [25]

    SUP 5.6: Confidential information and privilege, 2016

    Financial Conduct Authority. SUP 5.6: Confidential information and privilege, 2016. URL https://www.handbook.fca.org.uk/handbook/SUP/5/6.html?date=2016-03-07. Last amended 7 March 2016. Section within Chapter 5 (Reports by skilled persons) of the Supervision Manual, FCA Handbook

  26. [26]

    Accelerating mathematical and scientific discovery with Gemini Deep Think

    Google DeepMind. Accelerating mathematical and scientific discovery with Gemini Deep Think. https://deepmind.google/blog/accelerating-mathematical-and-scientific-discovery-with-gemini-deep-think/, 2026a

  27. [27]

    Gemini 3.1 Pro model card

    Google DeepMind. Gemini 3.1 Pro model card. https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Pro-Model-Card.pdf, 2026b

  28. [28]

    Monitoring monitorability

    Guan, M. Y., Wang, M., Carroll, M., Dou, Z., Wei, A. Y., Williams, M., Arnav, B., Huizinga, J., Kivlichan, I., Glaese, M., Pachocki, J., and Baker, B. Monitoring monitorability, 2025. URL https://arxiv.org/abs/2512.18311

  29. [29]

    Third-party compliance reviews for frontier AI safety frameworks

    Homewood, A., Williams, S., Dreksler, N., Lidiard, J., Murray, M., Heim, L., Ziosi, M., hÉigeartaigh, S. Ó., Chen, M., Wei, K., Winter, C., Brundage, M., Garfinkel, B., and Schuett, J. Third-party compliance reviews for frontier AI safety frameworks, 2025. URL https://arxiv.org/abs/2505.01643

  30. [30]

    Information security, cybersecurity and privacy protection — Information security management systems — Requirements, 2022

    ISO/IEC. Information security, cybersecurity and privacy protection — Information security management systems — Requirements, 2022. URL https://www.iso.org/standard/27001

  31. [31]

    Jagadeeswari, M., Karthi, P., Nitish Kumar, V., and Ram, S. S. A secure file sharing and audit trail tracking platform with advanced encryption standard for cloud-based environments. In 2023 4th International Conference on Electronics and Sustainable Communication Systems (ICESC), pp. 540–547, 2023. doi:10.1109/ICESC57686.2023.10193389

  32. [32]

    Early work on monitorability evaluations

    Kinniment, M., Nix, S., Broadley, T., Wijk, H., and Parikh, N. Early work on monitorability evaluations. https://metr.org/blog/2026-01-19-early-work-on-monitorability-evaluations/, January 2026

  33. [33]

    Responsible reporting for frontier AI development

    Kolt, N., Anderljung, M., Barnhart, J., Brass, A., Esvelt, K., Hadfield, G. K., Heim, L., Rodriguez, M., Sandbrink, J. B., and Woodside, T. Responsible reporting for frontier AI development. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7(1): 768–783, Oct. 2024. doi:10.1609/aies.v7i1.31678. URL https://ojs.aaai.org/index.php/AIE...

  34. [34]

    Kutasov, J., Sun, Y., Colognese, P., van der Weij, T., Petrini, L., Zhang, C. B. C., Hughes, J., Deng, X., Sleight, H., Tracy, T., Shlegeris, B., and Benton, J. SHADE-Arena: Evaluating sabotage and monitoring in LLM agents, 2025. URL https://arxiv.org/abs/2506.15740

  35. [35]

    Internal deployment gaps in AI regulation

    Kwon, J. and Casper, S. Internal deployment gaps in AI regulation, 2026. URL https://arxiv.org/abs/2601.08005

  36. [36]

    Review of the Anthropic summer 2025 pilot sabotage risk report

    METR. Review of the Anthropic summer 2025 pilot sabotage risk report. https://alignment.anthropic.com/2025/sabotage-risk-report/2025_pilot_risk_report_metr_review.pdf, 2025

  37. [37]

    What should companies share about risks from frontier AI models?

    METR. What should companies share about risks from frontier AI models? https://metr.org/blog/2025-06-27-risk-transparency/, June 2025

  38. [38]

    Responsible AI safety and education (RAISE) act, 6453-A

    New York State Legislature. Responsible AI safety and education (RAISE) act, 6453-A. https://www.nysenate.gov/legislation/bills/2025/A6453/amendment/A, 2025

  39. [39]

    Bank supervision process

    Office of the Comptroller of the Currency. Bank supervision process. Comptroller's handbook booklet, examination process series, Office of the Comptroller of the Currency, September 2019. URL https://www.occ.gov/publications-and-resources/publications/comptrollers-handbook/files/bank-supervision-process/pub-ch-bank-supervision-process.pdf. Transmitted vi...

  40. [40]

    Working with US CAISI and UK AISI to build more secure AI systems

    OpenAI. Working with US CAISI and UK AISI to build more secure AI systems. https://openai.com/index/us-caisi-uk-aisi-ai-update/, 2025

  41. [41]

    Introducing GPT-5.3 Codex

    OpenAI. Introducing GPT-5.3 Codex. https://openai.com/index/introducing-gpt-5-3-codex/, 2026a

  42. [42]

    GPT-5.3 Codex system card

    OpenAI. GPT-5.3 Codex system card. https://deploymentsafety.openai.com/gpt-5-3-codex/gpt-5-3-codex.pdf, 2026b

  43. [43]

    How we monitor internal coding agents for misalignment

    OpenAI. How we monitor internal coding agents for misalignment. https://openai.com/index/how-we-monitor-internal-coding-agents-misalignment/, 2026c

  44. [44]

    Introducing GPT-5.4 mini and nano

    OpenAI. Introducing GPT-5.4 mini and nano. https://openai.com/index/introducing-gpt-5-4-mini-and-nano/, 2026d

  45. [45]

    OpenAI's Raising Concerns Policy

    OpenAI. OpenAI's Raising Concerns Policy. https://openai.com/index/openai-raising-concerns-policy/, 2026e

  46. [46]

    Inference scaling reshapes AI governance, 2025

    Ord, T. Inference scaling reshapes AI governance, 2025. URL https://arxiv.org/abs/2503.05705

  47. [47]

    Red-teaming Anthropic's internal agent monitoring systems

    Rein, D. Red-teaming Anthropic's internal agent monitoring systems. https://metr.org/blog/2026-03-25-red-teaming-anthropic-agent-monitoring/, March 2026

  48. [48]

    The Thinking Machines Tinker API is good news for AI control and security

    Shlegeris, B. The Thinking Machines Tinker API is good news for AI control and security. https://blog.redwoodresearch.org/p/the-thinking-machines-tinker-api, 2025. Redwood Research blog

  49. [49]

    AI behind closed doors: A primer on the governance of internal deployment

    Stix, C., Pistillo, M., Sastry, G., Hobbhahn, M., Ortega, A., Balesni, M., Hallensleben, A., Goldowsky-Dill, N., and Sharkey, L. AI behind closed doors: A primer on the governance of internal deployment, 2025. URL https://arxiv.org/abs/2504.12170

  50. [50]

    When AI builds AI: Findings from a workshop on automation of AI R&D

    Toner, H., Beers, K., Newman, S., Khan, S. M., Shea-Blymyer, C., Yee, E., Acharya, A., Fisher, K., Scholl, K., Wildeford, P., Greenblatt, R., Albanie, S., Ballard, S., and Larsen, T. When AI builds AI: Findings from a workshop on automation of AI R&D. Technical report, Center for Security and Emerging Technology, January 2026. URL https://cset.georgetow...

  51. [51]

    21 CFR 20.61: Trade secrets and commercial or financial information which is privileged or confidential

    U.S. Food and Drug Administration. 21 CFR 20.61: Trade secrets and commercial or financial information which is privileged or confidential. https://www.ecfr.gov/current/title-21/chapter-I/subchapter-A/part-20/subpart-D/section-20.61, 2026. Electronic Code of Federal Regulations

  52. [52]

    Audit anticipation: does it impact job performance?

    Vinten, G., Neidermeyer, A. A., and Neidermeyer, P. E. Audit anticipation: does it impact job performance? Managerial Auditing Journal, 20(1): 19–29, January 2005. ISSN 0268-6902. doi:10.1108/02686900510570669. URL https://doi.org/10.1108/02686900510570669

  53. [53]

    RE-Bench: Evaluating frontier AI R&D capabilities of language model agents

    Wijk, H., Lin, T. R., Becker, J., Jawhar, S., Parikh, N., Broadley, T., Chan, L., Chen, M., Clymer, J. M., Dhyani, J., Ericheva, E., Garcia, K., Goodrich, B., Jurkovic, N., Kinniment, M., Lajko, A., Nix, S., Koba Sato, L. J., Saunders, W., Taran, M., West, B., and Barnes, E. RE-Bench: Evaluating frontier AI R&D capabilities of language model agents again...

  54. [54]

    Williams, M., Raymond, C., Carroll, M., and Team, S. O. Sidestepping evaluation awareness and anticipating misalignment with production evaluations. https://alignment.openai.com/prod-evals/, 2025a. OpenAI Alignment Blog

  55. [55]

    Assessing risk relative to competitors: An analysis of current AI company policies

    Williams, S., Dreksler, N., Homewood, A., Anderljung, M., and Freund, J. Assessing risk relative to competitors: An analysis of current AI company policies. https://www.governance.ai/research-paper/assessing-risk-relative-to-competitors-an-analysis-of-current-ai-company-policies, 2025b. GovAI research paper