Pith · machine review for the scientific record

arxiv: 2605.01611 · v1 · submitted 2026-05-02 · 💻 cs.CY · cs.AI · cs.LG

Recognition: unknown

The Case for ESM3 as a General-Purpose AI Model with Systemic Risk Under the EU AI Act

Jacob Griffith, Koen Holtman, Marcel Mir Teijeiro, Rokas Gipiškis, Taro Qureshi, Ze Shen Chin

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 13:42 UTC · model grok-4.3

classification 💻 cs.CY · cs.AI · cs.LG
keywords esm3 · models · biological · conclude · general-purpose · obligations · risk · subject

The pith

ESM3 maps onto the biorisk chain, yet does not currently qualify as a general-purpose AI model with systemic risk under the EU AI Act; the paper proposes regulatory remedies to close this gap.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The EU AI Act sets rules for powerful AI systems that could cause widespread harm. ESM3 is a large model trained on biological data to understand proteins and design molecules. The authors trace how ESM3 could be part of a chain leading to biological risks, such as helping create harmful agents. They compare ESM3's features like its scale and capabilities against the Act's specific thresholds for 'systemic risk' models. These thresholds focus on things like training compute or certain high-risk uses. The analysis finds that ESM3 falls short of triggering the obligations. The paper then suggests updates to the law so that providers of similar biological models must evaluate and reduce dual-use risks before release.
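The compute-based route the summary mentions can be made concrete with a small sketch. Under the Act, a general-purpose model is presumed to have systemic risk when its cumulative training compute exceeds 10^25 FLOPs; the ESM3 figure below is an illustrative order-of-magnitude placeholder, not an official number, and the function name is ours, not the paper's.

```python
# Sketch of the EU AI Act's compute-based presumption for systemic risk.
# The 1e25 FLOP threshold is the presumption criterion in the Act;
# the ESM3 estimate is a hypothetical placeholder for illustration only.

AI_ACT_FLOP_THRESHOLD = 1e25  # cumulative training compute threshold


def presumed_systemic_risk(training_flops: float) -> bool:
    """Return True if training compute alone triggers the presumption."""
    return training_flops >= AI_ACT_FLOP_THRESHOLD


# A placeholder estimate roughly an order of magnitude below the threshold,
# matching the paper's finding that ESM3 falls short of triggering it.
esm3_flops_estimate = 1.1e24

print(presumed_systemic_risk(esm3_flops_estimate))  # False
```

On this criterion alone, a model below the threshold escapes the presumption regardless of its capabilities, which is the gap the paper's qualitative biorisk-chain argument targets.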

Core claim

We conclude that at this time, ESM3 does not appear to be meaningfully regulated by the Act. We then propose remedies to correct the situation.

Load-bearing premise

The assumption that the EU AI Act's current classification criteria and supporting material provide a complete and accurate basis for determining whether biological models like ESM3 pose systemic risks that would trigger the Act's obligations.

Original abstract

Due to ambiguity in the wording of the EU AI Act, we examine the question of to what extent frontier biological foundation models such as ESM3 are subject to obligations for general-purpose AI models with systemic risk under the EU AI Act. In this paper, we map ESM3 to the biorisk chain, and conclude that it would be desirable if the providers of ESM3 and similar biological models were subject to these obligations, which would require them to assess and mitigate dual-use risks from their models. We then perform an analysis, comparing the attributes of ESM3 to the classification criteria in the AI Act and the supporting material. We conclude that at this time, ESM3 does not appear to be meaningfully regulated by the Act. We then propose remedies to correct the situation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper examines the extent to which the biological foundation model ESM3 is subject to obligations for general-purpose AI models with systemic risk under the EU AI Act. It maps ESM3 capabilities onto the biorisk chain, argues that subjecting such models to these obligations would be desirable for dual-use risk assessment and mitigation, compares ESM3 attributes against the Act's classification criteria and supporting material (including quantitative thresholds), concludes that ESM3 is not meaningfully regulated at present, and proposes remedies to address the identified gap.

Significance. If the legal mapping and attribute comparison hold, the paper identifies a concrete regulatory gap for advanced biological AI models under the EU AI Act, with potential implications for biosecurity governance. The biorisk-chain mapping and direct comparison to Act criteria provide a structured, falsifiable framework for evaluating similar models that could inform policy amendments or enforcement guidance. The work is strengthened by its explicit separation of the desirability argument from the regulatory-status conclusion.

major comments (2)
  1. [attribute comparison and classification criteria analysis] Analysis of classification criteria (post-biorisk mapping section): The conclusion that ESM3 fails to meet systemic-risk criteria rests on an interpretation that the Act (Art. 51, Annex XIII, and GPAI Code of Practice) relies exclusively on quantitative thresholds such as training compute or parameter count, with no binding qualitative route for dual-use biological design capabilities. However, the paper's own biorisk-chain mapping demonstrates high dual-use potential; the analysis does not cite or refute specific language in the supporting material that might permit capability-based triggers, leaving the 'not meaningfully regulated' claim load-bearing but incompletely tested against alternative readings of the Act.
  2. [remedies proposal] Remedies section: The proposed remedies to bring biological foundation models under the Act's obligations are outlined at a high level without specifying concrete mechanisms (e.g., amendments to Annex XIII or updates to the Code of Practice) or assessing their compatibility with the Act's existing quantitative framework. This weakens the actionability of the central policy recommendation.
minor comments (2)
  1. [abstract and introduction] The abstract and introduction could more explicitly distinguish the desirability argument from the regulatory-status finding to avoid any appearance of conflating policy preference with legal analysis.
  2. [throughout classification criteria section] Several citations to EU AI Act articles and annexes are referenced by number but lack pinpoint page or paragraph references in the supporting material, which would aid readers in verifying the attribute comparisons.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The analysis depends on interpretive assumptions about the EU AI Act's scope and criteria rather than mathematical derivations or empirical data fitting.

axioms (1)
  • domain assumption The EU AI Act's classification criteria for general-purpose AI models with systemic risk, as described in the Act and supporting material, are the appropriate and sufficient standard for assessing regulatory obligations for biological foundation models.
    Invoked when mapping ESM3 to the criteria and concluding it does not trigger obligations.

pith-pipeline@v0.9.0 · 5456 in / 1244 out tokens · 84444 ms · 2026-05-09T13:42:16.274082+00:00 · methodology

discussion (0)
