Prioritizing High-Consequence Biological Capabilities in Evaluations of Artificial Intelligence Models
Pith reviewed 2026-05-24 01:08 UTC · model grok-4.3
The pith
AI model evaluations should prioritize high-consequence biological risks like pandemics and assess them before deployment.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Experience and study by scientists and policy professionals of dual-use capabilities in the life sciences can inform risk evaluations of AI models with biological capabilities. AI model evaluations should prioritize addressing high-consequence risks that could cause large-scale harm to the public, such as pandemics, and these risks should be evaluated prior to model deployment so as to allow potential biosafety and/or biosecurity measures. Identifying which AI capabilities pose the greatest biosecurity and biosafety concerns is necessary in order to establish targeted AI safety evaluation methods, secure these tools against accident and misuse, and avoid impeding immense potential benefits.
What carries the argument
Dual-use identification and mitigation approaches from the life sciences, transferred to evaluate AI models for biological capabilities.
If this is right
- Targeted AI safety evaluation methods can be established for the capabilities of greatest concern.
- AI tools can be secured against accident and misuse through pre-deployment biosafety measures.
- Beneficial applications of AI in biology can proceed without broad impediments.
- Biosecurity measures can be applied where high-consequence risks are identified.
Where Pith is reading between the lines
- The same prioritization logic could be tested on AI capabilities in other high-risk domains such as chemistry.
- Developers might incorporate these pre-deployment checks into standard release pipelines for frontier models.
- International standards bodies could adopt the dual-use criteria as a baseline for AI governance.
- Early integration of these evaluations during model training could reduce the cost of later mitigation.
Load-bearing premise
Dual-use identification and mitigation approaches developed in the life sciences can be directly and effectively transferred to AI models to identify and mitigate high-consequence biological capabilities without excessive false positives or stifling beneficial applications.
What would settle it
An empirical test showing that life-sciences dual-use criteria applied to AI models produce widespread false positives that block safe models or miss genuine high-consequence biological capabilities.
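A minimal sketch of how such a test could be scored, assuming each evaluated model carries two labels: whether the dual-use criteria flagged it, and a reference judgment of whether it truly has a high-consequence biological capability. The `EvalOutcome` type, the `error_rates` helper, and the toy labels are hypothetical illustrations, not anything defined in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalOutcome:
    """One hypothetical model evaluation (illustrative, not from the paper)."""
    flagged: bool           # did the dual-use criteria flag the model?
    high_consequence: bool  # reference judgment of genuine high-consequence risk


def error_rates(outcomes):
    """Return (false_positive_rate, miss_rate) for a list of EvalOutcome."""
    safe = [o for o in outcomes if not o.high_consequence]
    risky = [o for o in outcomes if o.high_consequence]
    # Share of safe models that the criteria would wrongly block.
    fpr = sum(o.flagged for o in safe) / len(safe) if safe else 0.0
    # Share of genuinely risky models that the criteria fail to flag.
    miss = sum(not o.flagged for o in risky) / len(risky) if risky else 0.0
    return fpr, miss


# Toy, made-up labels used only to exercise the function.
toy = [
    EvalOutcome(flagged=True, high_consequence=True),
    EvalOutcome(flagged=False, high_consequence=True),
    EvalOutcome(flagged=True, high_consequence=False),
    EvalOutcome(flagged=False, high_consequence=False),
    EvalOutcome(flagged=False, high_consequence=False),
    EvalOutcome(flagged=False, high_consequence=False),
]
fpr, miss = error_rates(toy)
print(f"false positive rate: {fpr:.2f}, miss rate: {miss:.2f}")
```

High values of either rate on real evaluation data would count against the load-bearing premise. What threshold counts as "widespread" is left open here, as it is in the text above.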
Original abstract
As a result of rapidly accelerating AI capabilities, over the past year, national governments and multinational bodies have announced efforts to address safety, security and ethics issues related to AI models. One high priority among these efforts is the mitigation of misuse of AI models. Many biologists have for decades sought to reduce the risks of scientific research that could lead, through accident or misuse, to high-consequence disease outbreaks. Scientists have carefully considered what types of life sciences research have the potential for both benefit and risk (dual-use), especially as scientific advances have accelerated our ability to engineer organisms and create novel variants of pathogens. Here we describe how previous experience and study by scientists and policy professionals of dual-use capabilities in the life sciences can inform risk evaluations of AI models with biological capabilities. We argue that AI model evaluations should prioritize addressing high-consequence risks (those that could cause large-scale harm to the public, such as pandemics), and that these risks should be evaluated prior to model deployment so as to allow potential biosafety and/or biosecurity measures. Scientists' experience with identifying and mitigating dual-use biological risks can help inform new approaches to evaluating biological AI models. Identifying which AI capabilities pose the greatest biosecurity and biosafety concerns is necessary in order to establish targeted AI safety evaluation methods, secure these tools against accident and misuse, and avoid impeding immense potential benefits.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that evaluations of AI models should prioritize high-consequence biological risks (those with potential for large-scale public harm such as pandemics) by drawing on established dual-use research practices from the life sciences; it recommends conducting such evaluations prior to model deployment to enable biosafety and biosecurity measures, while arguing that life-sciences experience can inform targeted AI evaluation methods without impeding beneficial applications.
Significance. If the central recommendation holds, the manuscript provides a policy framework that focuses AI biosecurity efforts on the most serious risks by explicit analogy to decades of life-sciences dual-use precedents. This is a strength: the paper draws on external, long-established literature rather than self-referential results, and its directional recommendation for prioritization could help avoid both excessive false positives and under-regulation of high-stakes capabilities.
minor comments (2)
- [Abstract] The abstract and introduction could more precisely delineate the class of AI models under discussion (e.g., general-purpose LLMs versus specialized biological design tools) to sharpen the scope of the recommended evaluations.
- A short table or bullet list summarizing the key dual-use criteria from the life-sciences literature (with citations) would improve readability when the analogy is applied to AI capabilities.
Simulated Author's Rebuttal
We thank the referee for their positive assessment and recommendation to accept the manuscript. The referee's summary correctly identifies the paper's core argument that AI biological capability evaluations should draw on established life-sciences dual-use frameworks to prioritize high-consequence risks and enable pre-deployment biosafety measures.
Circularity Check
No significant circularity
The paper advances a policy recommendation to prioritize high-consequence biological risks in AI evaluations by drawing on decades of external life-sciences dual-use research and established biosafety practices. No equations, fitted parameters, predictions, or derivations are presented whose validity reduces to self-citation chains, self-definitional constructs, or inputs renamed as outputs. The central argument is an independent directional recommendation grounded in external precedent rather than any load-bearing internal reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Dual-use research experience in the life sciences provides a valid and transferable framework for identifying and prioritizing high-consequence risks in AI biological capabilities.