
arxiv: 2605.15084 · v1 · pith:MCSNCEBKnew · submitted 2026-05-14 · 💻 cs.CR

PickleFuzzer: A Case Study in Fuzzing for Discrepancies Between Python Pickle Implementations

Pith reviewed 2026-06-30 20:08 UTC · model grok-4.3

classification 💻 cs.CR
keywords pickle · fuzzing · differential testing · serialization · security · Python · discrepancies · virtual machine

The pith

A grammar-based fuzzer using differential testing detects 14 discrepancies between Python pickle implementations, four of which bypass security scanners.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

PickleFuzzer generates pickle objects from a custom grammar that models the protocol's opcodes and feeds identical inputs to each of the three native Python implementations. It flags a discrepancy whenever the implementations differ in the exceptions raised or in changes to internal states, without relying on an external specification oracle. The tool identified 14 previously unreported discrepancies, four of which allow malicious payloads to evade detection by security scanners. The findings establish that implementation differences can undermine defenses that depend on precise interpretation of pickle virtual-machine behavior. All results were reported to the maintainers for remediation.
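The comparison step described above can be sketched in a few lines of Python. This is an illustration, not the authors' harness: the mapping of the three native implementations to `pickle._loads` (pure Python), `pickle.loads` (C accelerator when it is available) and `pickletools.genops` (disassembler) is an assumption, the helper names are hypothetical, and only exception types are compared, whereas the paper also compares internal state changes.

```python
import pickle
import pickletools


def outcome(run, payload):
    """Run one implementation and reduce its behavior to a comparable label."""
    try:
        run(payload)
        return "ok"
    except Exception as exc:  # the exception type is the observed behavior
        return type(exc).__name__


def parse_only(payload):
    """Walk the opcode stream with pickletools without executing anything."""
    for _ in pickletools.genops(payload):
        pass


# Assumed entry points for the three native implementations.
IMPLEMENTATIONS = {
    "pickle.py": pickle._loads,    # pure-Python unpickler (private helper)
    "_pickle.c": pickle.loads,     # C accelerator when available
    "pickletools.py": parse_only,  # parses opcodes, never executes them
}


def observe(payload):
    """Feed the same bytes to every implementation and record each outcome."""
    return {name: outcome(run, payload) for name, run in IMPLEMENTATIONS.items()}


def is_discrepancy(observations):
    """Flag the input when the implementations do not all agree."""
    return len(set(observations.values())) > 1


benign = pickle.dumps([1, 2, 3], protocol=2)
print(observe(benign), is_discrepancy(observe(benign)))
```

Because the two unpicklers execute whatever they are given, a harness like this should only ever see generated or benign inputs inside a sandbox.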

Core claim

PickleFuzzer generates pickle objects using a grammar developed to account for the missing pickle specification. It determines discrepancies by comparing the execution behaviors of each test implementation, rather than requiring a specification-derived oracle. PickleFuzzer detected 14 new discrepancies between the pickle implementations. Four discrepancies are critical and can be used to bypass security-critical scanning tools.

What carries the argument

PickleFuzzer, a generation-based fuzzer that applies differential testing to inputs produced by a custom grammar covering pickle opcodes.
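As a rough picture of what grammar-based generation of pickle programs can look like, the sketch below expands a tiny hand-written grammar into opcode bytes. The grammar, its rule names and the depth limit are assumptions made for illustration and cover only a handful of opcodes; the paper's grammar is far broader and is not reproduced here. Every program this toy grammar emits is well formed, which a real fuzzer would not restrict itself to.

```python
import pickle
import random


def gen_value(rng, depth=0):
    """Emit opcodes that leave exactly one object on the PVM stack."""
    leaf_rules = ["none", "true", "int", "str"]
    # Past the depth limit only leaf rules are allowed, so expansion terminates.
    rules = leaf_rules if depth >= 3 else leaf_rules + ["tuple2", "list"]
    rule = rng.choice(rules)
    if rule == "none":
        return pickle.NONE
    if rule == "true":
        return pickle.NEWTRUE
    if rule == "int":
        return pickle.BININT1 + bytes([rng.randrange(256)])
    if rule == "str":
        text = rng.choice(["a", "pickle", "fuzz"]).encode("utf-8")
        return pickle.SHORT_BINUNICODE + bytes([len(text)]) + text
    if rule == "tuple2":
        return gen_value(rng, depth + 1) + gen_value(rng, depth + 1) + pickle.TUPLE2
    # list: EMPTY_LIST followed by zero or more (value APPEND) pairs
    out = pickle.EMPTY_LIST
    for _ in range(rng.randrange(3)):
        out += gen_value(rng, depth + 1) + pickle.APPEND
    return out


def gen_program(rng):
    """program := PROTO 4, one value, STOP"""
    return pickle.PROTO + bytes([4]) + gen_value(rng) + pickle.STOP


rng = random.Random(0)  # fixed seed so a run can be replayed
payload = gen_program(rng)
print(payload, pickle.loads(payload))
```

Seeding the generator keeps each test case reproducible, which matters when a discrepancy has to be replayed and reported.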

If this is right

  • Inconsistent exception behavior or state changes across implementations can produce incorrect classification of malicious payloads.
  • Security tools that scan pickle data may fail to detect attacks that succeed on one implementation but not another.
  • Differential testing without a full specification is sufficient to surface security-relevant inconsistencies in pickle.
  • The same method can support more directed fuzzing aimed at deeper bugs in the pickle virtual machine.
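To make the scanner concern concrete, here is a minimal sketch of a static scanner built on `pickletools.genops`. It is a toy, not picklescan or any deployed tool: the denylist is hypothetical, `STACK_GLOBAL` is not handled, and the `"parse-error"` verdict is an invented label. The point it illustrates is that the scanner's verdict depends entirely on how its own parser reads the opcode stream.

```python
import pickletools

# Hypothetical denylist for illustration; real scanners ship much longer lists.
UNSAFE_GLOBALS = {"os system", "posix system", "builtins eval", "builtins exec"}


def scan(payload):
    """Return (verdict, imports) from a purely static walk over the opcodes."""
    imports = []
    try:
        for opcode, arg, _pos in pickletools.genops(payload):
            if opcode.name in ("GLOBAL", "INST"):
                # pickletools reports "module name" as one space-separated string
                imports.append(arg)
    except Exception:
        # The scanner's parser rejected the stream. If a runtime unpickler
        # would still accept the same bytes, reporting nothing here fails open.
        return "parse-error", imports
    verdict = "unsafe" if UNSAFE_GLOBALS.intersection(imports) else "clean"
    return verdict, imports


print(scan(b"cbuiltins\nlen\n."))  # GLOBAL builtins.len, then STOP
print(scan(b"cos\nsystem\n."))     # GLOBAL os.system, then STOP
print(scan(b"cos\nsystem\n"))      # same import, but no STOP opcode
```

Nothing is executed here; the payloads are only parsed. How a tool treats the parse-error branch is a design choice, and any input that one implementation rejects while another accepts lands exactly on that branch.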

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same grammar-driven differential approach could be applied to other serialization protocols that lack complete public specifications.
  • A formal, public specification of the pickle opcode set would reduce the need for post-hoc grammar engineering to catch implementation drift.
  • Extending the fuzzer to additional third-party or language-specific pickle variants would test whether the observed discrepancies are Python-specific.
  • Serialization choices in machine-learning pipelines may warrant systematic differential testing as part of their security review.

Load-bearing premise

The custom grammar developed to account for the missing pickle specification is sufficiently complete to generate inputs that expose security-relevant discrepancies rather than only superficial differences.

What would settle it

An independent check showing that none of the four reported critical discrepancies actually allow a malicious payload to bypass the security scanners in practice.

Figures

Figures reproduced from arXiv: 2605.15084 by Andreas Kellas, Justin Applegate.

Figure 1: Each iteration of PICKLEFUZZER has three phases. In the Generation Phase, a pickle program is generated from a grammar and combined with randomly selected encoding and buffer metadata to create a test payload. In the Execution Phase, the test payload is run by each of the test modules, and execution trace information is collected. In the Evaluation Phase, the execution information is compared to identify d…
Figure 2: Picklescan fails to detect a malicious pickle file (…
Original abstract

Python's native serialization protocol, pickle, is a powerful but insecure format for transferring untrusted data. It is frequently used, especially for saving machine learning models, despite known security challenges. While developers sometimes mitigate this risk by restricting imports during unpickling or using static and dynamic analysis tools, these approaches are error-prone and depend heavily on accurate interpretations of the Pickle Virtual Machine (PVM) opcodes. Discrepancies across Python's three native PVM modules can lead to incorrect detection of malicious payloads and undermine existing defenses. To efficiently and scalably identify discrepancies, we present PickleFuzzer, a custom generation-based fuzzer that identifies inconsistencies across pickle implementations. PickleFuzzer generates pickle objects, passes them to each implementation, and detects differences in thrown exceptions or changes to key internal states. It generates pickle objects using a grammar, which we developed to account for the missing pickle specification. It determines discrepancies by comparing the execution behaviors of each test implementation, rather than requiring a specification-derived oracle. PickleFuzzer detected 14 new discrepancies between the pickle implementations. Four discrepancies are critical and can be used to bypass security-critical scanning tools like those deployed on the popular model hosting platform, Hugging Face. We disclosed all findings to the Python Software Foundation for remediation, and additionally disclosed the security issues to a bug bounty platform and were awarded a $750 bounty. We demonstrate that differential testing is a viable approach for identifying security-relevant discrepancies in important pickle implementations, and our work can lead to promising future directions for finding deeper pickle bugs with more directed fuzzing.
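For readers unfamiliar with the import-restriction mitigation the abstract mentions, the usual pattern is to subclass `pickle.Unpickler` and override `find_class`, as the standard library documentation describes. The sketch below follows that pattern; the allowlist contents and the helper names `restricted_loads` and `is_blocked` are illustrative choices, not anything taken from the paper.

```python
import builtins
import io
import pickle

# Illustrative allowlist; a real deployment would choose its own.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle asks to import a global.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Analogue of pickle.loads() that only resolves allowlisted globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


def is_blocked(data):
    """True when the restricted unpickler refuses the payload."""
    try:
        restricted_loads(data)
    except pickle.UnpicklingError:
        return True
    return False


print(restricted_loads(pickle.dumps([1, 2, range(15)])))
print(is_blocked(b"cos\nsystem\n."))  # refused before os.system is resolved
```

The defense is only as good as the allowlist and the unpickler's own reading of the opcode stream, which is why the abstract describes such approaches as error-prone.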

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces PickleFuzzer, a generation-based fuzzer that uses a hand-crafted grammar (developed in the absence of a formal pickle specification) to generate inputs and detect behavioral discrepancies across Python's pickle implementations by comparing thrown exceptions and internal PVM state changes. It reports 14 new discrepancies, four of which are characterized as critical because they enable bypass of security scanners including those deployed on Hugging Face; the findings were disclosed to the PSF and yielded a $750 bounty.

Significance. If the discrepancies are reproducible and the bypass claims hold, the work provides concrete evidence that differential testing can surface security-relevant inconsistencies in a widely used but underspecified serialization format, with direct implications for ML model hosting and import-restriction defenses. The external bounty award supplies independent corroboration that at least some reported issues are actionable.

major comments (2)
  1. [§3] §3 (Grammar Construction): the central claim that four discrepancies enable scanner bypass rests on the grammar producing inputs that reach security-critical PVM opcode sequences and state transitions; however, the manuscript supplies no coverage metrics, no enumeration of exercised opcodes against the full set, and no comparison of generated inputs to the opcode patterns present in known malicious pickles.
  2. [§5.2] §5.2 (Critical Discrepancies): the assertion that the four discrepancies bypass Hugging Face-style scanners is load-bearing for the security contribution, yet the evaluation provides only high-level descriptions of the differing behaviors without concrete reproduction steps, the exact opcode/state combinations involved, or evidence that the fuzzer inputs are dense in the relevant subspace rather than only triggering superficial exception differences.
minor comments (2)
  1. [Table 2] Table 2: the column headers for the three PVM implementations are not repeated on continuation pages, reducing readability.
  2. [Abstract and §5] The abstract states '14 new discrepancies' while the body occasionally refers to 'previously unreported' without clarifying overlap with prior public reports; a short clarification sentence would help.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their positive assessment of the significance of our work and the independent validation provided by the bounty award. We address each of the major comments below.

  1. Referee: [§3] §3 (Grammar Construction): the central claim that four discrepancies enable scanner bypass rests on the grammar producing inputs that reach security-critical PVM opcode sequences and state transitions; however, the manuscript supplies no coverage metrics, no enumeration of exercised opcodes against the full set, and no comparison of generated inputs to the opcode patterns present in known malicious pickles.

    Authors: We agree that the manuscript would benefit from more explicit details on the grammar's coverage. Although no formal pickle specification exists, making traditional coverage metrics challenging to define, we will add an enumeration of the opcodes and key state transitions exercised by the grammar to the revised §3. We will also include a comparison of our generated inputs against opcode patterns from publicly known malicious pickles to demonstrate that the grammar reaches security-critical sequences. This addresses the concern directly. revision: yes

  2. Referee: [§5.2] §5.2 (Critical Discrepancies): the assertion that the four discrepancies bypass Hugging Face-style scanners is load-bearing for the security contribution, yet the evaluation provides only high-level descriptions of the differing behaviors without concrete reproduction steps, the exact opcode/state combinations involved, or evidence that the fuzzer inputs are dense in the relevant subspace rather than only triggering superficial exception differences.

    Authors: We acknowledge that the current presentation in §5.2 is high-level. In the revision, we will provide concrete reproduction steps for the four critical discrepancies, including the exact opcode sequences and state changes that lead to the scanner bypasses. Regarding the density of fuzzer inputs in the relevant subspace, our differential testing approach prioritizes finding any discrepancies, and the fact that four were deemed critical enough for a bounty suggests they are not superficial. We can add further analysis or examples from our test cases to show the inputs target the relevant behaviors. revision: yes
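The opcode enumeration discussed in the first exchange could be approximated with the standard library alone. The sketch below counts which opcode names appear in a corpus and compares them with the full table in `pickletools.opcodes`. The corpus here is a stand-in built with `pickle.dumps`, since the fuzzer's generated inputs are not available, and the function name is hypothetical.

```python
import pickle
import pickletools

# Full opcode table as known to the running interpreter's pickletools.
ALL_OPCODES = {op.name for op in pickletools.opcodes}


def exercised_opcodes(corpus):
    """Collect the names of all opcodes that appear in a corpus of payloads."""
    seen = set()
    for payload in corpus:
        try:
            for opcode, _arg, _pos in pickletools.genops(payload):
                seen.add(opcode.name)
        except Exception:
            continue  # keep whatever was parsed before the failure
    return seen


# Stand-in corpus: a few simple objects serialized under every protocol.
samples = [None, True, 7, "text", (1, 2), [1, 2], {"k": 1}]
corpus = [
    pickle.dumps(obj, protocol=proto)
    for obj in samples
    for proto in range(pickle.HIGHEST_PROTOCOL + 1)
]

seen = exercised_opcodes(corpus)
missing = sorted(ALL_OPCODES - seen)
print(f"{len(seen)} of {len(ALL_OPCODES)} opcodes exercised")
print("not exercised:", missing)
```

Opcode presence is a coarse metric: it says nothing about which opcode sequences or stack states were reached, which is the part of the referee's request that a simple count cannot answer.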

Circularity Check

0 steps flagged

No significant circularity; purely empirical differential testing

The paper describes a generation-based fuzzer that uses a hand-crafted grammar (developed because no authoritative pickle specification exists) to produce inputs, then compares runtime behaviors across three Python pickle implementations to surface discrepancies. No equations, fitted parameters, predictions, or derivations appear in the provided text. The central claims rest on observed differences in exceptions and internal state rather than on any self-referential reduction, self-citation chain, or renaming of known results. The grammar's completeness is an explicit modeling assumption whose adequacy is evaluated by the discrepancies found, not by construction from the outputs themselves.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces no mathematical free parameters, axioms, or invented entities; the contribution is an empirical testing tool and the discrepancies it surfaced.


APPENDIX

root@29bc390197e5:~# xxd mal.pkl
00000000: 4930 7831 3333 370a 8c05 706f 7369 788c  I0x1337...posix.
00000010: 0673 7973 7465 6d93 8c06 7768 6f61 6d69  .system...whoami
00000020: 8552 2e                                  .R.
root@29bc390197e5:~# picklescan -p mal.pkl
ERROR: parsi...