VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense
Pith reviewed 2026-05-14 17:42 UTC · model grok-4.3
The pith
Embeddings can hide stolen data via small rotations that evade detectors, but per-embedding signatures make any such change detectable.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Post-embedding perturbations such as small-angle orthogonal rotations allow an attacker to encode payload bits inside vectors without shifting the distributional properties that anomaly detectors monitor, so retrieval behavior for legitimate users remains unchanged. Real manifolds limit usable capacity below the theoretical floor(d/2) * b bits from a disjoint-Givens encoder. VectorPin counters the attack by producing an Ed25519 signature over a canonical byte representation of the embedding, its originating content, and the model; any alteration invalidates the signature.
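To make the rotation mechanism concrete, here is a minimal sketch. It is not the paper's encoder: it assumes one payload bit per disjoint coordinate pair (b = 1), a hypothetical angle `THETA`, and a decoder that can see the clean vector. The names `rotate_pair`, `encode`, and `decode` are illustrative only.

```python
import math

THETA = 0.01  # hypothetical small rotation angle in radians (assumption)


def rotate_pair(x, y, angle):
    """Apply a 2-D Givens rotation to one coordinate pair."""
    c, s = math.cos(angle), math.sin(angle)
    return c * x - s * y, s * x + c * y


def encode(vec, bits, theta=THETA):
    """Hide one bit per disjoint pair (2i, 2i+1): +theta for 1, -theta for 0."""
    if len(bits) > len(vec) // 2:
        raise ValueError("payload exceeds floor(d/2) rotation planes")
    out = list(vec)
    for i, bit in enumerate(bits):
        angle = theta if bit else -theta
        out[2 * i], out[2 * i + 1] = rotate_pair(vec[2 * i], vec[2 * i + 1], angle)
    return out


def decode(clean, stego, n_bits):
    """Recover bits from the direction of rotation in each plane."""
    bits = []
    for i in range(n_bits):
        x0, y0 = clean[2 * i], clean[2 * i + 1]
        x1, y1 = stego[2 * i], stego[2 * i + 1]
        # The 2-D cross product equals sin(angle) * (x0^2 + y0^2).
        cross = x0 * y1 - y0 * x1
        bits.append(1 if cross > 0 else 0)
    return bits


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))
```

Because each plane is rotated rather than scaled, the vector norm is unchanged, and when every coordinate pair is rotated by plus or minus `THETA` the cosine similarity to the clean vector is cos(`THETA`). This toy decoder assumes no coordinate pair is exactly zero.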
What carries the argument
VectorPin, the protocol that computes an Ed25519 signature over a canonical byte representation of each embedding together with its source content and producing model.
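The following sketch illustrates why any post-embedding change invalidates a pin. It builds the bytes a signer would sign and stops short of the Ed25519 step, which the Python standard library does not provide. The float32 little-endian packing and the `key=value` message layout are assumptions made for illustration, not the VectorPin wire format.

```python
import hashlib
import struct
import unicodedata


def hash_text(source: str) -> str:
    """SHA-256 over the UTF-8 bytes of the NFC-normalized source text."""
    normalized = unicodedata.normalize("NFC", source)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def hash_vector(vec) -> str:
    """SHA-256 over little-endian float32 bytes (byte layout is an assumption)."""
    raw = struct.pack(f"<{len(vec)}f", *vec)
    return hashlib.sha256(raw).hexdigest()


def canonical_message(vec, source: str, model: str) -> bytes:
    """Hypothetical canonical byte string that an Ed25519 signer would sign."""
    fields = [
        ("model", model),
        ("source_hash", hash_text(source)),
        ("vec_dim", str(len(vec))),
        ("vec_hash", hash_vector(vec)),
    ]
    return "\n".join(f"{key}={value}" for key, value in fields).encode("utf-8")
```

A change to any stored coordinate that survives float32 rounding alters `vec_hash` and therefore the signed message, so a signature made over the original bytes no longer verifies.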
If this is right
- Vector stores without provenance checks remain open to steganographic exfiltration from any party that can modify embeddings at ingestion time.
- Distribution-based anomaly detectors fail against rotation-based hiding on every tested model and corpus pair.
- A capacity-detectability trade-off appears in practice: usable hidden bits per vector sit well below the theoretical maximum once retrieval must stay unchanged.
- Embedding-level cryptographic pinning becomes a deployable control that closes the entire class of post-creation modifications.
Where Pith is reading between the lines
- The same rotation technique could be tested on image or multimodal embeddings to see whether retrieval-preserving hiding generalizes beyond text.
- Signature verification at query time would add a fixed per-vector cost but would detect tampering regardless of how the change was made.
- Manifold geometry appears to set a hard limit on how much data can be hidden without moving the vector enough to change retrieval rankings.
Load-bearing premise
Perturbations exist that preserve identical top-k retrieval results for the same queries after the embedding is changed.
What would settle it
Compare top-k retrieval sets on a held-out query set using original versus small-angle-rotated embeddings to check whether the sets match exactly, or attempt signature verification on a single modified vector to see whether it fails.
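The retrieval half of that check can be sketched as follows. This is a toy harness under stated assumptions: brute-force cosine search stands in for a real vector store, random Gaussian vectors stand in for embeddings, and `rotate_pairs` applies one fixed angle to every disjoint coordinate pair. None of these choices come from the paper, and results on random vectors say nothing about real embedding manifolds.

```python
import math
import random


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def top_k(query, store, k):
    """Indices of the k stored vectors most similar to the query."""
    order = sorted(range(len(store)), key=lambda i: cosine(query, store[i]), reverse=True)
    return order[:k]


def rotate_pairs(vec, theta):
    """Rotate every disjoint pair (2i, 2i+1) by theta; an odd last coordinate is untouched."""
    c, s = math.cos(theta), math.sin(theta)
    out = list(vec)
    for i in range(0, len(vec) - 1, 2):
        out[i] = c * vec[i] - s * vec[i + 1]
        out[i + 1] = s * vec[i] + c * vec[i + 1]
    return out


def topk_overlap(queries, clean, modified, k):
    """Mean fraction of top-k ids shared between the clean and modified stores."""
    total = 0.0
    for q in queries:
        shared = set(top_k(q, clean, k)) & set(top_k(q, modified, k))
        total += len(shared) / k
    return total / len(queries)


def random_vectors(n, d, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    store = random_vectors(200, 32, rng)
    queries = random_vectors(20, 32, rng)
    for theta in (0.0, 0.01, 0.3):
        rotated = [rotate_pairs(v, theta) for v in store]
        print(theta, topk_overlap(queries, store, rotated, k=5))
```

An overlap of 1.0 means the top-k sets match exactly for every query. Anything lower marks a rotation angle at which legitimate users could notice changed results.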
Original abstract
Modern retrieval-augmented generation (RAG) systems convert sensitive content into high-dimensional embeddings and store them in vector databases that treat the resulting numerical artifacts as opaque. Major vector-store products do not provide native controls for embedding integrity, ingestion-time distributional anomaly detection, or cryptographic provenance attestation. We show this opens a class of steganographic exfiltration attacks: an attacker with write access to the ingestion pipeline can hide payload data inside embeddings using simple post-embedding perturbations (noise injection, rotation, scaling, offset, fragmentation, and combinations thereof) while preserving the surface-level retrieval behavior the RAG system exposes to legitimate users. We evaluate these techniques across a synthetic-PII corpus on text-embedding-3-large, four locally hosted open embedding models, a cross-corpus replication on BEIR NFCorpus and a Quora subset (over 26,000 chunks combined), seven vector-store configurations, an adaptive-attacker variant of the detector evaluation, and a paraphrased-query retrieval benchmark. Distribution-shifting perturbations are often caught by simple anomaly detectors; small-angle orthogonal rotation defeats distribution-based detection across every (model, corpus) pair tested. A disjoint-Givens rotation encoder gives a closed-form per-vector capacity ceiling of floor(d/2) * b bits, but real embedding manifolds impose a capacity-detectability trade-off, and the retrieval-preserving operating point sits well below it. We propose VectorPin, a cryptographic provenance protocol that pins each embedding to its source content and producing model via an Ed25519 signature over a canonical byte representation. Any post-embedding modification breaks signature verification. Embedding-level integrity is a deployable, standardizable control that closes this attack class.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that attackers with write access to RAG ingestion pipelines can perform steganographic exfiltration by applying post-embedding perturbations (noise, rotation, scaling, etc.) to hide payloads in vector stores while preserving surface retrieval behavior. It reports that small-angle orthogonal rotations defeat distribution-based detectors across all tested models and corpora (including a cross-corpus replication on over 26,000 chunks and a paraphrased-query retrieval benchmark), derives a closed-form capacity ceiling for disjoint-Givens rotations, and proposes VectorPin: an Ed25519 signature protocol over canonical byte representations that detects any post-embedding modification.
Significance. If the empirical claims hold, the work is significant for RAG security: it identifies a practical attack class on opaque embedding stores and supplies a deployable, parameter-free cryptographic control (standard Ed25519) that closes the attack vector. The breadth of evaluation (multiple models, BEIR/Quora corpora, seven stores, adaptive attacker) and the explicit capacity-detectability trade-off discussion are strengths; the defense requires no new primitives and directly addresses the identified gap.
major comments (2)
- [Evaluation (paraphrased-query retrieval benchmark and cross-corpus replication)] The central usability claim—that small-angle orthogonal rotations preserve surface-level retrieval behavior across every (model, corpus) pair—rests on unquantified assertions. The paraphrased-query benchmark on >26k chunks is described, but no top-k overlap, MRR delta, recall@5 change, or similar metrics are reported between clean and rotated embeddings. This metric gap is load-bearing for both the 'defeats detection while remaining usable' conclusion and the practical stealth assessment.
- [Abstract and Evaluation sections] The statement that 'small-angle orthogonal rotation defeats distribution-based detection across every (model, corpus) pair tested' is presented as a universal result, yet the abstract and evaluation summary provide no per-pair detector scores, false-positive rates, or adaptive-attacker success rates. Without these numbers or the exact detector implementations, the scope of the 'defeats' claim cannot be verified.
minor comments (1)
- [Capacity analysis] The disjoint-Givens rotation capacity formula (floor(d/2) * b bits) is stated without an accompanying derivation or reference to the underlying linear-algebra construction; a short appendix or inline proof sketch would improve clarity.
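One plausible reading of the bound is sketched below. It is offered as an illustration, not as the paper's derivation, and it assumes that d is the embedding dimension and that b is the number of bits carried by the quantized angle of each rotation plane.

```latex
% Assumption: plane i acts on the coordinate pair (2i-1, 2i).
\[
G_i(\theta_i) =
\begin{pmatrix}
\cos\theta_i & -\sin\theta_i \\
\sin\theta_i & \cos\theta_i
\end{pmatrix},
\qquad i = 1, \dots, \lfloor d/2 \rfloor .
\]
% The planes share no coordinates, so the rotations commute and each
% angle can be chosen independently. If every \theta_i takes one of
% 2^b distinguishable values, the per-vector capacity is
\[
C \;=\; \sum_{i=1}^{\lfloor d/2 \rfloor} \log_2 2^{b}
  \;=\; \lfloor d/2 \rfloor \cdot b \ \text{bits}.
\]
```

For example, d = 1024 and b = 2 would give 512 planes and a ceiling of 1024 bits (128 bytes) per vector. The abstract notes that the retrieval-preserving operating point sits well below this ceiling.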
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The two major comments correctly identify gaps in the quantitative reporting of our evaluation results. We will address both by expanding the evaluation section with the requested metrics and tables in the revised manuscript.
Point-by-point responses
Referee: [Evaluation (paraphrased-query retrieval benchmark and cross-corpus replication)] The central usability claim—that small-angle orthogonal rotations preserve surface-level retrieval behavior across every (model, corpus) pair—rests on unquantified assertions. The paraphrased-query benchmark on >26k chunks is described, but no top-k overlap, MRR delta, recall@5 change, or similar metrics are reported between clean and rotated embeddings. This metric gap is load-bearing for both the 'defeats detection while remaining usable' conclusion and the practical stealth assessment.
Authors: We agree that the manuscript describes the paraphrased-query benchmark and cross-corpus replication but does not report the explicit numerical deltas. In the revision we will add a table (and accompanying text) that reports top-k overlap, MRR, recall@5, and nDCG deltas between clean and rotated embeddings for every model-corpus pair. These numbers will directly quantify the retrieval-preservation claim and support the practical stealth assessment.
Revision: yes
Referee: [Abstract and Evaluation sections] The statement that 'small-angle orthogonal rotation defeats distribution-based detection across every (model, corpus) pair tested' is presented as a universal result, yet the abstract and evaluation summary provide no per-pair detector scores, false-positive rates, or adaptive-attacker success rates. Without these numbers or the exact detector implementations, the scope of the 'defeats' claim cannot be verified.
Authors: We acknowledge that the abstract and high-level evaluation summary omit the per-pair numerical results. The evaluation section describes the detector implementations and the adaptive-attacker protocol, but the concrete scores are only summarized. We will revise both the abstract and the evaluation section to include detailed tables listing detector scores, false-positive rates, and adaptive-attacker success rates for each (model, corpus) pair, together with the precise detector configurations used.
Revision: yes
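Two of the retrieval metrics named in the first response, MRR and recall@k, could be computed as deltas along the following lines. This is a sketch under assumptions: the run format (query id mapped to a ranked list of document ids), the relevance format (query id mapped to a set of relevant ids), and the function names are all hypothetical and not taken from the paper.

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the first k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def metric_deltas(clean_runs, rotated_runs, qrels, k=5):
    """Mean (rotated minus clean) MRR and recall@k over all judged queries."""
    mrr_delta = 0.0
    recall_delta = 0.0
    for qid, relevant in qrels.items():
        clean, rotated = clean_runs[qid], rotated_runs[qid]
        mrr_delta += reciprocal_rank(rotated, relevant) - reciprocal_rank(clean, relevant)
        recall_delta += recall_at_k(rotated, relevant, k) - recall_at_k(clean, relevant, k)
    n = len(qrels)
    return {"mrr_delta": mrr_delta / n, "recall_delta": recall_delta / n}
```

Deltas near zero for every (model, corpus) pair would support the retrieval-preservation claim, while a negative delta quantifies how much the rotation costs legitimate users.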
Circularity Check
No significant circularity: empirical evaluations and standard Ed25519 application
The paper's core claims rest on empirical measurements of perturbation effects (noise, rotation, scaling) across multiple models, corpora, and benchmarks, with no fitted parameters or self-referential predictions. The VectorPin defense is a direct, parameter-free application of the standard Ed25519 signature scheme over canonical byte representations of embeddings, source content, and model identifiers. No equations or derivations reduce to their inputs by construction, no self-citations form load-bearing uniqueness arguments, and no ansatzes are smuggled via prior work. The attack results are falsifiable via retrieval and detection metrics on the described benchmarks; the defense is independently verifiable against the Ed25519 specification. This yields a self-contained, non-circular contribution.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Small post-embedding perturbations can preserve retrieval behavior while embedding hidden data
- standard math: Ed25519 signatures over canonical byte representations provide integrity against post-creation modification
invented entities (1)
- VectorPin protocol: no independent evidence
Reference graph
Works this paper leans on
- [1] Yossi Adi, Carsten Baum, Moustapha Cisse, Benny Pinkas, and Joseph Keshet. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In USENIX Security Symposium, 2018.
- [2] Daniel J. Bernstein, Niels Duif, Tanja Lange, Peter Schwabe, and Bo-Yin Yang. High-speed high-security signatures. Journal of Cryptographic Engineering, 2, 2012.
- [3] Nicholas Carlini, Florian Tramer, Eric Wallace, et al. Extracting training data from large language models. In USENIX Security Symposium, 2021.
- [4] Coalition for Content Provenance and Authenticity. C2PA technical specification, version 2.0. https://c2pa.org/specifications/, 2024.
- [5] Ingemar J. Cox, Joe Kilian, F. Thomson Leighton, and Talal Shamoon. Secure spread spectrum watermarking for multimedia. IEEE Transactions on Image Processing, 6(12), 1997.
- [6] European Parliament and Council. Regulation (EU) 2024/1689 on artificial intelligence. https://eur-lex.europa.eu/eli/reg/2024/1689/oj, 2024.
- [7] Jessica Fridrich. Steganography in Digital Media: Principles, Algorithms, and Applications. Cambridge University Press, 2009.
- [8] Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. BadNets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733, 2017.
- [9] Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3), 2021.
- [10] Michael B. Jones, John Bradley, and Nat Sakimura. RFC 7515: JSON web signature (JWS). https://datatracker.ietf.org/doc/html/rfc7515, 2015.
- [11] Simon Josefsson and Ilari Liusvaara. RFC 8032: Edwards-curve digital signature algorithm (EdDSA). https://datatracker.ietf.org/doc/html/rfc8032, 2017.
- [12] Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Empirical Methods in Natural Language Processing (EMNLP), 2020.
- [13] Graham Klyne and Chris Newman. RFC 3339: Date and time on the Internet: Timestamps. https://datatracker.ietf.org/doc/html/rfc3339, 2002.
- [14] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
- [15] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. In IEEE International Conference on Data Mining (ICDM), 2008.
- [16] Yury A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(4), 2020.
- [17] National Institute of Standards and Technology. AI risk management framework (AI RMF 1.0). https://www.nist.gov/itl/ai-risk-management-framework, 2023.
- [18] Zachary Newman, John Speed Meyers, and Santiago Torres-Arias. sigstore: Software signing for everybody. In ACM Conference on Computer and Communications Security (CCS), 2022.
- [19] Open Source Security Foundation. SLSA: Supply-chain levels for software artifacts. https://slsa.dev/, 2023.
- [20] OpenAI. New embedding models and API updates. https://openai.com/index/new-embedding-models-and-api-updates/, 2024. Announcement of text-embedding-3-large and related models.
- [21] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In ACM Asia Conference on Computer and Communications Security (ASIACCS), 2017.
- [22] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2011.
- [23] Niels Provos and Peter Honeyman. Hide and seek: An introduction to steganography. IEEE Security & Privacy, 1(3), 2003.
- [24] Qdrant Team. Qdrant documentation: Vector quantization. https://qdrant.tech/documentation/guides/quantization/, 2024.
- [25] Jim Schaad. RFC 8392: CBOR object signing and encryption (COSE). https://datatracker.ietf.org/doc/html/rfc8392, 2018.
- [26] Bernhard Schölkopf, John C. Platt, John Shawe-Taylor, Alex J. Smola, and Robert C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7), 2001.
- [27] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In IEEE Symposium on Security and Privacy, 2017.
- [28] Santiago Torres-Arias, Hammad Afzali, Trishank Karthik Kuppusamy, Reza Curtmola, and Justin Cappos. in-toto: Providing farm-to-table guarantees for bits and bytes. In USENIX Security Symposium, 2019.
- [29] Florian Tramèr, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. The space of transferable adversarial examples. In arXiv preprint arXiv:1704.03453, 2017.
- [30] Jascha Wanger. SchemaPin: Cryptographic provenance for tool schemas. https://github.com/ThirdKeyAI/SchemaPin, 2025. Apache-2.0.
- [31] Jascha Wanger. Symbiont: Policy-governed agent runtime. https://github.com/ThirdKeyAI/Symbiont, 2025. Apache-2.0.
- [32] Jascha Wanger. VectorPin: Verifiable integrity for AI embedding stores. https://github.com/ThirdKeyAI/VectorPin, 2025. Apache-2.0.
- [33] Jascha Wanger. VectorSmuggle: A research framework for vector-based data exfiltration. https://github.com/jaschadub/VectorSmuggle, 2025. Apache-2.0.
- [34] Andreas Westfeld. F5—a steganographic algorithm: High capacity despite better steganalysis. In Information Hiding (IH '01), 2001.
- [35] Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia. PoisonedRAG: Knowledge corruption attacks to retrieval-augmented generation of large language models. https://arxiv.org/abs/2402.07867, 2024.
A Protocol Specification (v1)
This appendix is a self-contained reproduction of the VectorPin protocol specification at version
∥ hex(SHA256(UTF8(NFC(s)))). Text MUST be normalized to Unicode NFC before encoding. Implementations MUST reject input that cannot be normalized.
Vector. hash_vector(v, dtype) :=
A separate-language implementation that follows this appendix should produce signatures and verifications byte-for-byte compatible with the Python and Rust reference implementations [32].
A.1 Goals
A VectorPin Pin is a compact attestation that travels with an embedding through a vector database. It guarantees that:
- The embedding matches a specific source...
- Reject pins whose v field is unknown to it (UNSUPPORTED_VERSION).
- Reject pins whose kid is not in its key registry (UNKNOWN_KEY).
- Reconstruct the canonical byte sequence and verify sig against the registered public key for kid (SIGNATURE_INVALID on failure).
- If ground-truth source was supplied, recompute hash_text(source) and compare to source_hash (SOURCE_MISMATCH on mismatch).
- If a ground-truth vector was supplied, recompute hash_vector(vector, vec_dtype) and compare to vec_hash; also check the supplied vector's shape matches vec_dim (VECTOR_TAMPERED or SHAPE_MISMATCH on mismatch).
- If an expected model identifier was supplied, compare to model (MODEL_MISMATCH on mismatch).
Verifiers MUST distinguish at least these failure modes. Other implementations MAY use different identifiers for the modes but MUST distinguish the cases.
A.7 Storage conventions
Adapter implementations SHOULD store pins under the metadata key vectorpin. Backends wi...
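The verification checks listed above (UNSUPPORTED_VERSION through MODEL_MISMATCH) can be sketched as a single function. This is an illustration under several assumptions: the sorted-key JSON canonicalization, the `"f32"` and `"f64"` dtype labels, the float packing, and the unprefixed text hash are all hypothetical, because the corresponding parts of the specification are cut off in this excerpt. Signature checking is delegated to a caller-supplied `check_sig` callable, which in a real deployment would wrap an Ed25519 verification.

```python
import hashlib
import json
import struct
import unicodedata

SUPPORTED_VERSIONS = {1}  # assumption: only protocol v1 is known
DTYPE_FORMATS = {"f32": "f", "f64": "d"}  # hypothetical dtype labels


def hash_text(source):
    """SHA-256 hex digest of the NFC-normalized, UTF-8 encoded text."""
    normalized = unicodedata.normalize("NFC", source)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def hash_vector(vec, dtype):
    """SHA-256 hex digest of little-endian packed floats (layout is an assumption)."""
    raw = struct.pack(f"<{len(vec)}{DTYPE_FORMATS[dtype]}", *vec)
    return hashlib.sha256(raw).hexdigest()


def canonical_bytes(pin):
    """Hypothetical canonicalization: sorted-key JSON of every field except sig."""
    body = {k: v for k, v in pin.items() if k != "sig"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def verify_pin(pin, key_registry, check_sig, source=None, vector=None, model=None):
    """Return "OK" or the first failure code, checked in the order listed above."""
    if pin.get("v") not in SUPPORTED_VERSIONS:
        return "UNSUPPORTED_VERSION"
    public_key = key_registry.get(pin.get("kid"))
    if public_key is None:
        return "UNKNOWN_KEY"
    if not check_sig(public_key, canonical_bytes(pin), pin.get("sig")):
        return "SIGNATURE_INVALID"
    if source is not None and hash_text(source) != pin.get("source_hash"):
        return "SOURCE_MISMATCH"
    if vector is not None:
        if len(vector) != pin.get("vec_dim"):
            return "SHAPE_MISMATCH"
        if hash_vector(vector, pin.get("vec_dtype")) != pin.get("vec_hash"):
            return "VECTOR_TAMPERED"
    if model is not None and model != pin.get("model"):
        return "MODEL_MISMATCH"
    return "OK"
```

Returning a distinct code per failure follows the requirement that verifiers distinguish the failure modes. Checking the signature before the optional ground-truth comparisons means a forged pin is rejected even when no source or vector is supplied.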