pith.

arxiv: 2605.24632 · v1 · pith:FOFGTXJH · new · submitted 2026-05-23 · 💻 cs.CR · cs.AI · cs.LG

Demystifying the Mythos or Disrupting Bugonomics? From Zero-Day Asymmetry to Defender Remediation Throughput

Pith reviewed 2026-06-30 12:54 UTC · model grok-4.3

classification 💻 cs.CR cs.AI cs.LG
keywords LLM vulnerability discovery · bugonomics · zero-day economics · remediation throughput · open source security · vulnerability triage · exploit market prices · maintainer capacity

The pith

LLM-assisted discovery makes low-signal vulnerability candidates cheaper while shifting the bottleneck to defender remediation throughput.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how large language models change the economics of vulnerability discovery and fixing by lowering the cost of candidate generation, code analysis, and report preparation at scale. It draws on public data from Anthropic's Mythos Preview and its Firefox collaboration with Mozilla, plus exploit-market prices, to argue that the outcome is not simply more high-value zero-days but greater pressure on validation, triage, and patching capacity. A sympathetic reader would care because the argument reframes the security effects of AI around how defenders scale their operations rather than around offensive capability alone. The effect is most acute for open-source projects whose maintainer resources are fixed.

Core claim

Using public data from Anthropic's Mythos Preview and Mozilla Firefox collaborations, along with exploit-market price anchors and vulnerability reward programs, the paper claims that the near-term shift from LLM-driven discovery is not simply more zero-days. It is a move toward broader defender remediation throughput: low-signal candidates become cheaper, evidence-rich remediation becomes more important, and scarce capacity moves toward maintainer review and release work.
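The arithmetic behind this claim can be illustrated with a toy cost model. Every number below is a hypothetical placeholder, not a figure from the paper, and the model assumes that each candidate costs the defender the same to triage whether or not it turns out to be real. Under those assumptions, cutting generation cost while precision also falls lowers the total cost per confirmed bug but moves most of that cost onto the triage side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscoveryScenario:
    """Hypothetical per-candidate costs (arbitrary units) and signal rate."""

    generation_cost: float  # cost to produce one candidate report
    triage_cost: float      # defender cost to validate and triage one candidate
    precision: float        # fraction of candidates that are real, fixable bugs

    def __post_init__(self) -> None:
        if not 0 < self.precision <= 1:
            raise ValueError("precision must be in (0, 1]")

    def cost_per_confirmed_bug(self) -> float:
        # Every candidate is generated and triaged, but only `precision`
        # of them turn out to be confirmed bugs.
        return (self.generation_cost + self.triage_cost) / self.precision

    def triage_cost_per_confirmed_bug(self) -> float:
        # The defender-side portion of the cost per confirmed bug.
        return self.triage_cost / self.precision


# Placeholder numbers for illustration only (assumptions, not paper data).
manual = DiscoveryScenario(generation_cost=2000.0, triage_cost=200.0, precision=0.5)
llm_assisted = DiscoveryScenario(generation_cost=20.0, triage_cost=200.0, precision=0.1)

if __name__ == "__main__":
    for name, s in (("manual", manual), ("llm-assisted", llm_assisted)):
        print(
            f"{name}: total per confirmed bug = {s.cost_per_confirmed_bug():.0f}, "
            f"triage per confirmed bug = {s.triage_cost_per_confirmed_bug():.0f}"
        )
```

With these placeholders, total cost per confirmed bug halves (4400 to 2200 units) while triage cost per confirmed bug rises fivefold (400 to 2000 units): a cheaper stream of candidates that is more expensive for defenders to absorb.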

What carries the argument

The bugonomics lens that tracks the operational economics of producing, proving, prioritizing, and fixing defects, applied to the transition from zero-day asymmetry to defender remediation throughput.
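One way to read the lens operationally is as a serial pipeline with four stages (produce, prove, prioritize, fix), where sustained end-to-end throughput cannot exceed the slowest stage. The sketch below uses hypothetical weekly capacities and ignores losses between stages; it only shows how speeding up the front of the pipeline moves the bottleneck rather than removing it.

```python
from typing import Mapping


def pipeline_throughput(capacity: Mapping[str, float]) -> tuple[str, float]:
    """Return (bottleneck stage, end-to-end throughput) for a serial pipeline.

    Simplifying assumption: every item that enters a stage passes to the next,
    so sustained throughput equals the capacity of the slowest stage.
    """
    stage = min(capacity, key=capacity.__getitem__)
    return stage, capacity[stage]


# Hypothetical weekly capacities in defects per week (not figures from the paper).
before = {"produce": 5, "prove": 8, "prioritize": 20, "fix": 6}
# LLM assistance is assumed to speed up producing and proving candidates.
after = {**before, "produce": 50, "prove": 30}

if __name__ == "__main__":
    print("before:", pipeline_throughput(before))
    print("after: ", pipeline_throughput(after))
```

With these placeholder numbers, a tenfold increase in production capacity raises end-to-end throughput only from 5 to 6 defects per week, and the binding stage moves from produce to fix.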

If this is right

  • Low-signal candidates become cheaper to produce at codebase scale.
  • Evidence-rich remediation work gains relative importance over raw discovery.
  • Scarce capacity shifts from bug hunting toward maintainer review and release processes.
  • The pressure is most visible in open source where funding and staffing do not automatically expand with report volume.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Projects may need new automated pre-filters that score reports on evidence quality before human triage begins.
  • Reward programs could move from flat bounties to tiered payouts that reward proof-of-impact quality over candidate novelty.
  • A two-tier reporting system could emerge in which automated low-evidence submissions receive minimal response while high-evidence ones compete for limited maintainer time.
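The first and third extensions can be sketched together as an evidence-weighted pre-filter that orders reports before any human looks at them. The field names, weights, and threshold below are hypothetical and are not drawn from the paper or from any real bug tracker; a project would need to calibrate them against its own report history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """A vulnerability report with hypothetical evidence flags."""

    report_id: str
    has_reproducer: bool = False         # input or script that triggers the bug
    has_sanitizer_trace: bool = False    # e.g. an AddressSanitizer crash report
    names_affected_version: bool = False
    has_candidate_patch: bool = False


# Hypothetical weights: heavier weight means stronger proof of impact.
EVIDENCE_WEIGHTS = {
    "has_reproducer": 4,
    "has_sanitizer_trace": 3,
    "has_candidate_patch": 2,
    "names_affected_version": 1,
}


def evidence_score(report: Report) -> int:
    """Sum the weights of the evidence fields present in the report."""
    return sum(w for field, w in EVIDENCE_WEIGHTS.items() if getattr(report, field))


def route(report: Report, threshold: int = 5) -> str:
    """Send high-evidence reports to maintainers; park the rest for batch review."""
    return "human-triage" if evidence_score(report) >= threshold else "low-evidence-queue"


reports = [
    Report("R1", has_reproducer=True, has_sanitizer_trace=True),
    Report("R2", names_affected_version=True),
    Report("R3", has_reproducer=True, has_candidate_patch=True),
]

if __name__ == "__main__":
    # Highest-evidence reports first, so scarce maintainer time goes to them.
    for r in sorted(reports, key=evidence_score, reverse=True):
        print(r.report_id, evidence_score(r), route(r))
```

Here R1 (reproducer plus sanitizer trace, score 7) and R3 (reproducer plus candidate patch, score 6) go to human triage, while R2, which only names an affected version, lands in the low-evidence queue.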

Load-bearing premise

LLM-assisted discovery will substantially increase report volume while maintainer-side validation, triage, funding, and release capacity will not scale accordingly, especially in open source settings.

What would settle it

A sustained rise in LLM-generated reports accompanied by stable or declining average time-to-patch and no growth in backlogs across major open-source projects would falsify the claim that remediation throughput is the binding constraint.

Original abstract

Recent demonstrations of large language models producing candidate and confirmed vulnerabilities in production software have renewed the narrative that AI will reshape offensive and defensive security. Headlines emphasize capability; they rarely interrogate costs and incentives. This paper examines LLM-driven vulnerability discovery through a bugonomics lens: the operational economics of producing, proving, prioritizing, and fixing security-relevant defects. Historically, the most visible high-end bugonomics was offense-priced because production-grade zero-days and exploit chains were expensive specialist outputs for governments, brokers, and offensive vendors. Defender-side bugonomics already existed in vulnerability research, reward programs, and vendor remediation work; LLM-assisted systems change its scale and distribution. They make candidate generation, code comprehension, harness construction, proof-of-impact drafting, and report preparation cheaper at codebase scale. Exploits and proofs of concept remain important, but in defender workflows they primarily prove impact, guide prioritization, and justify remediation. The resulting bottleneck is not only finding more bugs; it is absorbing, validating, triaging, patching, and shipping a larger stream of reports. Using public data from Anthropic's Mythos Preview and Mozilla Firefox collaborations, along with public exploit-market price anchors and vulnerability reward programs, we argue that the near-term shift is not simply more zero-days. It is a move toward broader defender remediation throughput: low-signal candidates become cheaper, evidence-rich remediation becomes more important, and scarce capacity shifts toward maintainer review and release work. The effect is acute in open source, where LLM-assisted discovery can increase report volume while maintainer-side validation, triage, funding, and release capacity may not scale.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that LLM-assisted vulnerability discovery shifts the economics of bug finding ('bugonomics') away from offense-dominated zero-day markets toward defender remediation throughput, with the new bottleneck being the absorption, validation, triage, patching, and release of higher report volumes. It invokes public data from Anthropic's Mythos Preview, Mozilla Firefox collaborations, exploit-market prices, and vulnerability reward programs to argue that this effect is especially acute in open source, where discovery costs fall but maintainer capacity does not scale accordingly.

Significance. If the argument holds, the paper supplies a conceptual framework for analyzing how AI tools redistribute costs and incentives between discovery and remediation in security. It draws attention to open-source maintainer constraints as a potential limiting factor and could inform the design of vulnerability programs and triage processes.

major comments (1)
  1. [Abstract] The central claim that 'maintainer-side validation, triage, funding, and release capacity may not scale' with LLM-driven report volume is load-bearing for the predicted shift from zero-day asymmetry to remediation throughput, yet the manuscript presents this non-scaling as a structural feature of open-source settings without quantitative comparison of historical scaling rates, adoption of LLM triage tools, or modeling of capacity elasticity.
minor comments (1)
  1. The 'bugonomics' framework is introduced to organize costs and incentives, but the abstract does not supply explicit definitions, independent external benchmarks, or falsifiable predictions that would allow readers to test the framework separately from the conclusions.
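The capacity-elasticity modeling the referee asks for can be sketched as a discrete-time backlog in which maintainer capacity responds to report volume with a constant elasticity. The functional form, the doubling of volume in week 5, and all parameter values are hypothetical choices for illustration, not the paper's model or data.

```python
def simulate_backlog(
    arrivals: list[float],
    base_arrival: float,
    base_capacity: float,
    elasticity: float,
    backlog: float = 0.0,
) -> list[float]:
    """Return the open backlog at the end of each period.

    Assumed constant-elasticity capacity response:
        capacity_t = base_capacity * (arrivals_t / base_arrival) ** elasticity
    elasticity = 0 means capacity is fixed; 1 means it scales one-for-one.
    """
    history = []
    for arriving in arrivals:
        capacity = base_capacity * (arriving / base_arrival) ** elasticity
        # Unprocessed reports carry over; the backlog cannot go negative.
        backlog = max(0.0, backlog + arriving - capacity)
        history.append(backlog)
    return history


# Hypothetical: 10 reports/week matched by capacity, then volume doubles in week 5.
weekly_reports = [10.0] * 4 + [20.0] * 8

if __name__ == "__main__":
    for eps in (0.0, 0.5, 1.0):
        final = simulate_backlog(weekly_reports, 10.0, 10.0, eps)[-1]
        print(f"elasticity={eps}: backlog after 12 weeks = {final:.1f}")
```

With these placeholders the backlog after 12 weeks is 80.0 reports at elasticity 0, about 46.9 at elasticity 0.5, and 0.0 at elasticity 1. In this toy framing, the non-scaling premise corresponds to an open-source elasticity well below 1, which is the quantity the referee notes is not measured.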

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for highlighting the load-bearing nature of the non-scaling claim. Our response clarifies the paper's scope as a conceptual framework supported by cited public data rather than a quantitative model, while acknowledging where additional context could strengthen the presentation.

Point-by-point responses
  1. Referee: [Abstract] The central claim that 'maintainer-side validation, triage, funding, and release capacity may not scale' with LLM-driven report volume is load-bearing for the predicted shift from zero-day asymmetry to remediation throughput, yet the manuscript presents this non-scaling as a structural feature of open-source settings without quantitative comparison of historical scaling rates, adoption of LLM triage tools, or modeling of capacity elasticity.

    Authors: The manuscript frames the argument as an economic and incentive analysis rather than an empirical econometric study. The non-scaling premise draws from established characteristics of open-source maintenance (limited maintainer time, volunteer structures, and fixed release cadences) documented in prior OSS literature, combined with the observed drop in candidate-generation costs from the cited Anthropic Mythos Preview and Mozilla data. These sources illustrate increased report volume without corresponding expansion in triage and patching throughput. We do not claim to have modeled elasticity or performed new historical scaling comparisons; the contribution is the identification of the resulting bottleneck shift. We can expand the related-work section to reference existing studies on OSS maintainer capacity constraints, but we maintain that the current evidence base suffices for the conceptual claim. revision: partial

Circularity Check

0 steps flagged

No circularity: argument uses external public data without self-referential reduction

Full rationale

The paper introduces a 'bugonomics' framework as an analytical lens but does not define its terms or conclusions in terms of each other by construction. Claims rest on cited public datasets (Anthropic Mythos Preview, Mozilla Firefox collaborations, exploit-market prices, vulnerability reward programs) rather than fitted parameters renamed as predictions or self-citations. No equations, uniqueness theorems, or ansatzes are invoked that reduce the throughput-shift argument to the framework's own inputs. The non-scaling of defender capacity is stated as a structural observation about open-source settings, not derived from the framework itself. This is a self-contained argumentative analysis with independent external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The analysis rests on the domain assumption that LLM systems reduce candidate generation costs at scale and introduces the new conceptual term 'bugonomics' without independent prior evidence or validation.

axioms (1)
  • Domain assumption: LLM-assisted systems make candidate generation, code comprehension, harness construction, proof-of-impact drafting, and report preparation cheaper at codebase scale.
    This premise is stated directly in the abstract as the basis for the claimed change in bugonomics scale and distribution.
invented entities (1)
  • bugonomics (no independent evidence)
    purpose: A lens for the operational economics of producing, proving, prioritizing, and fixing security-relevant defects.
    New term coined in the paper to structure the discussion of costs and incentives.

pith-pipeline@v0.9.1-grok · 5831 in / 1391 out tokens · 46190 ms · 2026-06-30T12:54:52.149137+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Antaeus: Hunting Repository-Level Logic Vulnerabilities via Context-Grounded LLM Reasoning

    cs.CR · 2026-07 · unverdicted · novelty 6.0

    Antaeus detects 15 logic vulnerabilities across 28 repositories via a pipeline of function prioritization, repository-level LLM reasoning, and comparative validation, outperforming baselines at similar cost.

Reference graph

Works this paper leans on

26 extracted references · 1 canonical work pages · cited by 1 Pith paper

  [1] G. Evron, R. T. Lee, R. Mogull, et al., “The AI Vulnerability Storm: Building a Mythos-ready Security Program,” Cloud Security Alliance CISO Community, SANS Institute, [un]prompted, OWASP Gen AI Security Project, Apr. 18, 2026.

  [2] DARPA, “AI Cyber Challenge marks pivotal inflection point for cyber defense,” Aug. 8, 2025. [Online]. Available: https://www.darpa.mil/news/2025/aixcc-results

  [3] Bynar.io, “The idea behind BynarIO,” 2025. [Online]. Available: https://bynar.io/blog/the-idea-behind-bynario

  [4] OpenAI, “Introducing Trusted Access for Cyber,” Feb. 5, 2026. [Online]. Available: https://openai.com/index/trusted-access-for-cyber/

  [5] OpenAI, “Trusted access for the next era of cyber defense,” Apr. 14, 2026. [Online]. Available: https://openai.com/index/scaling-trusted-access-for-cyber-defense/

  [6] Anthropic Frontier Red Team, “Assessing Claude Mythos Preview’s cybersecurity capabilities,” Apr. 2026. [Online]. Available: https://red.anthropic.com/2026/mythos-preview/

  [7] Anthropic, “Partnering with Mozilla to improve Firefox’s security,” Mar. 2026. [Online]. Available: https://www.anthropic.com/news/mozilla-firefox-security

  [8] B. Grinstead, C. Holler, and F. Braun, “Behind the Scenes Hardening Firefox with Claude Mythos Preview,” Mozilla Hacks, May 7, 2026. [Online]. Available: https://hacks.mozilla.org/2026/05/behind-the-scenes-hardening-firefox/

  [9] Anthropic, “Claude API Pricing,” 2026. [Online]. Available: https://platform.claude.com/docs/en/about-claude/pricing

  [10] L. Ablon and A. Bogart, “Zero Days, Thousands of Nights: The Life and Times of Zero-Day Vulnerabilities and Their Exploits,” RAND Corporation, 2017. [Online]. Available: https://www.rand.org/pubs/research_reports/RR1751.html

  [11] L. Franceschi-Bicchierai, “Price of zero-day exploits rises as companies harden products against hackers,” TechCrunch, Apr. 6, 2024. [Online]. Available: https://techcrunch.com/2024/04/06/price-of-zero-day-exploits-rises-as-companies-harden-products-against-hackers/

  [12] Google Project Zero, “About 0-days In-the-Wild.” [Online]. Available: https://googleprojectzero.github.io/0days-in-the-wild/about.html

  [13] Google Project Zero, “Root Cause Analyses,” 0-days In-the-Wild. [Online]. Available: https://googleprojectzero.github.io/0days-in-the-wild/rca.html

  [14] Google Vulnerability Rewards Program Team, “VRP 2025 Year in Review,” Google Security Blog, Mar. 31, 2026. [Online]. Available: https://blog.google/security/vrp-2025-year-in-review/

  [15] Google Bug Hunters, “Evolving the Android & Chrome VRPs for the AI Era,” Apr. 30, 2026. [Online]. Available: https://bughunters.google.com/blog/evolving-the-android-chrome-vrps-for-the-ai-era

  [16] Verizon, “2025 Data Breach Investigations Report: Executive Summary.” [Online]. Available: https://www.verizon.com/business/resources/reports/2025-dbir-executive-summary.pdf

  [17] Mandiant, “M-Trends 2025,” 2025. [Online]. Available: https://services.google.com/fh/files/misc/m-trends-2025-en.pdf

  [18] Google Threat Intelligence Group, “Look What You Made Us Patch: 2025 Zero-Days in Review,” Mar. 2026. [Online]. Available: https://cloud.google.com/blog/topics/threat-intelligence/2025-zero-day-review

  [19] P. Garrity, “VulnCheck State of Exploitation 2026,” VulnCheck, Jan. 21, 2026. [Online]. Available: https://www.vulncheck.com/blog/state-of-exploitation-2026

  [20] C. Condon, “Introducing the 2026 VulnCheck Exploit Intelligence Report,” VulnCheck, Feb. 25, 2026. [Online]. Available: https://www.vulncheck.com/blog/2026-vulncheck-exploit-intelligence-report

  [21] M. Zalewski, “American Fuzzy Lop,” 2013. [Online]. Available: https://lcamtuf.coredump.cx/afl/

  [22] LLVM Project, “libFuzzer: a library for coverage-guided fuzz testing.” [Online]. Available: https://llvm.org/docs/LibFuzzer.html

  [23] K. Serebryany, D. Bruening, A. Potapenko, and D. Vyukov, “AddressSanitizer: A Fast Address Sanity Checker,” USENIX ATC, 2012.

  [24] C. Cadar, D. Dunbar, and D. Engler, “KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs,” OSDI, 2008.

  [25] T. Dullien, “Weird Machines, Exploitability, and Provable Unexploitability,” IEEE Transactions on Emerging Topics in Computing, vol. 8, no. 2, pp. 391–403, 2020, doi: 10.1109/TETC.2017.2785299.