pith. sign in

arxiv: 2606.03330 · v1 · pith:YDSLZXEJnew · submitted 2026-06-02 · 💻 cs.LG · cs.AI· cs.CR

FLIPS: Instance-Fingerprinting for LLMs via Pseudo-random Sequences

Pith reviewed 2026-06-28 11:21 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.CR
keywords instance fingerprintingLLM identificationpseudo-random sequencesAI regulationmodel configurationsbinary sequence biasclosed-set identificationopen-set identification
0
0 comments X

The pith

Instance-level parameters create stable biases in LLMs' pseudo-random binary outputs that allow identification of specific configurations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models behave differently depending on instance-level parameters such as instructional prompts, sampling configurations, and quantization levels. Regulators need tools to verify actual deployed behaviors rather than model provenance alone, yet existing fingerprinting methods are built to ignore these variations. The paper introduces FLIPS, which detects statistical biases in the binary random sequences an LLM produces to distinguish different configurations of the same model. Across 237 model instances it reaches 96 percent accuracy when all targets are known and 90 percent in open-set cases where some targets are unknown, compared with 35 percent for an adapted baseline. This establishes that instance-level fingerprinting is both required for compliance work and practically achievable.

Core claim

FLIPS exploits biases in generated binary random sequences to distinguish configurations of the same LLM, reaching 96 percent closed-set and 90 percent open-set identification accuracy across 237 model instances versus 35 percent for the adapted baseline. The work shows that instance-level fingerprinting is necessary for regulation and practically feasible.

What carries the argument

FLIPS, the method that identifies LLM instances by exploiting statistical biases in the pseudo-random binary sequences they generate under varying instance parameters.

If this is right

  • Regulators can target actual deployed behaviors of LLMs rather than model provenance alone.
  • Identification accuracy holds in open-set scenarios where some instances are unknown.
  • The approach outperforms adapted existing techniques by a wide margin on hundreds of instances.
  • Compliance checks can focus on specific configurations that may produce unsafe outputs under certain settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same bias signals could be tested for persistence when models receive updates or additional fine-tuning after initial deployment.
  • Combining sequence biases with other observable outputs might increase resistance to attempts to mask the fingerprint.
  • Extending the queries beyond pure random-sequence generation to ordinary tasks would show whether the distinguishing patterns survive in typical use.

Load-bearing premise

Instance-level parameters produce stable, distinguishable statistical biases in pseudo-random binary sequences that are not erased by normal usage or prompt variation.

What would settle it

A test showing that different LLM instances produce statistically indistinguishable binary sequence biases, or that repeated queries to the same instance under varied prompts cause identification accuracy to fall below usable levels.

Figures

Figures reproduced from arXiv: 2606.03330 by Erwan Le Merrer, Gilles Tredan, Gohar Dashyan, Gurvan Richardeau.

Figure 1
Figure 1. Figure 1: We evaluate the ability of the state-of-the-art IPP finger￾printing scheme LLMmap taken off the shelf (Pasquini et al., 2025) and FLIPS (ours) to distinguish an ”abliterated” (uncensored) in￾stance from its original safe counterpart. LLMmap generally iden￾tifies the unsafe abliterated version as the original model, which is the primary goal of the method: being robust to alteration of the original model. F… view at source ↗
Figure 2
Figure 2. Figure 2: Querying Strategy FLIPS employs a fixed prompt tem￾plate q0 (formalized in Appendix C) that instructs the in￾stance to generate a random binary sequence. The base￾line prompt requests symbols 0 and 1 (yielding sequences 6 Stochasticity arises from sampling procedures (e.g., top-k, nu￾cleus sampling) and hardware non-determinism (Atil et al., 2024) [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Calculation of the Pseudo-random Conditions Rate. For each token pair (tA, tB), we evaluate every generated to￾ken across all sequences and models by comparing their log￾probabilities. A binary score is assigned based on whether the log-probability of tA is significantly higher than tB or if they are relatively similar. This score is then averaged, yielding the Pseudo￾random Conditions Rate. Token-Level An… view at source ↗
Figure 4
Figure 4. Figure 4: Relationship between randomness quality and finger￾printing accuracy. Each point corresponds to a model instance evaluated on a single 0-1 token pair or averaged over the 30 token pairs sampled in Section 5. The NIST score is a sequence-level average of all NIST test success rates, and fingerprinting accuracy is measured in the closed-set setting with Nt = 1. The 0-1 Token Pair The 0-1 token pair shows par… view at source ↗
Figure 5
Figure 5. Figure 5: presents the overall classification accuracy as a function of Nt, the number of queries in the verification stage. In the open-set setting, it shows the good performance of FLIPS as it achieves 90% accuracy, with a precision/re￾call on unseen of 54%/83%, using Nt = 8 verification queries. The performance is already strong using a single request: accuracy of 64%, and steadily improves with ad￾ditional sampl… view at source ↗
Figure 6
Figure 6. Figure 6: Open-set precision/recall trade-off vs. confidence threshold (micro-averaged across classes, Nt = 8). Global pre￾cision/recall (all instances) and Unseen-only precision/recall are shown jointly as the confidence threshold sweeps. plete results in both closed and open settings). Surpris￾ingly, we observe greater homogeneity across rows than columns in [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: FLIPS per-instance recall in the open-set setting with Nt = 8 (shortlist of representative instances, full heatmap in Appendix F, [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Per LLM closed-set classification confusion matrix with Nt = 1. A logarithmic scale is applied to highlight infrequent confusions. of license plates: an easy way for the regulator to identify an instance. License plates can be forged, but this criminal behavior is considerably more involved and risky. Like￾wise, active fingerprinting deception is a considerably more challenging endeavor. Despite their frag… view at source ↗
Figure 9
Figure 9. Figure 9: Classifier comparison at Nt = 1 (blue) and Nt = 8 (orange). Each bar reports mean closed-set classification accu￾racy across cross-validation splits, with error bars indicating one standard deviation. Both regimes use Nr = 40 training samples per class. The dashed red line marks the random-chance baseline (1/|E|). C. Prompt q0(tA, tB) Random Binary Sequence Generation Prompt Give a sequence of 500 symbols … view at source ↗
Figure 10
Figure 10. Figure 10: The top 25 most important features reported by the XGBoost classification. This ranking is made on averaged feature importance over all tested token pairs. The interquartile range is also reported as error bars. normalization method auto-selected from the training dis￾tribution: a heavy-tailed/highly-skewed feature receives a power or log transform, a feature with a large outlier ratio is normalized robus… view at source ↗
Figure 11
Figure 11. Figure 11: serves two purposes. First, it presents the open-set ROC curve characterizing FLIPS’ Unseen-detection ability. Second, and more importantly for deployment, it empiri￾cally validates that our α-thresholding heuristic operates without requiring test data: the operating point selected by the α-quantile rule (α = 0.05, computed from training data only) sits essentially on top of the oracle threshold that an o… view at source ↗
Figure 12
Figure 12. Figure 12: (a) Illustration of an example of maximum probability distribution that needs to be separated by a threshold in Open-set. (b) Distribution of all the thresholds obtained with the thresholding procedure. Classification is made within a single-query (Nt = 1) here. Algorithm 1 Open-setEval (Evaluation of S) Require: Model set M, maximum number of training sam￾ples per model Nr, number of test samples per mod… view at source ↗
Figure 13
Figure 13. Figure 13: (Closed-set) FLIPS per-instance recall, in a closed-set setting with Nt = 8 (5-fold outer CV). Each tile represents the recall for an instance (original LLM plus variation). temp and sp refer to temperature and system prompt, int4 and fp8 to the quantizations; the system prompts are cataloged in Appendix I. 1 2 3 4 5 6 7 8 Number of queries at verification (Nt) 0.5 0.6 0.7 0.8 0.9 1.0 Accuracy Token Pair … view at source ↗
Figure 14
Figure 14. Figure 14: (Closed-set) Accuracy as a function of the verification budget Nt for two querying strategies: Same Token Pair (K = 1, repeat the same token pair Nt times) versus Multi Token Pair (K = Nt different token pairs, each one queried once). Both strategies share the same trained per-pair classifiers and the same total query budget; the only difference is whether queries are diversified across pairs. Mixing yiel… view at source ↗
Figure 16
Figure 16. Figure 16: Histogram of closed-set fingerprinting accuracies across the 30 token pairs sampled from T. The corresponding experi￾mental setup is consistent with the one of [PITH_FULL_IMAGE:figures/full_fig_p018_16.png] view at source ↗
Figure 18
Figure 18. Figure 18: compares the resulting open-set and closed-set accuracies against the main experimental regime (sequence lengths in [100, 500], capped by MaxTokens = 500). Trun￾cating to a constant 100 bits induces only a modest accuracy drop (about 7 points open-set, 4 points closed-set), confirm￾ing that the bulk of FLIPS’ discriminative power comes from the bias-based NIST features rather than from length shortcuts. W… view at source ↗
Figure 19
Figure 19. Figure 19: (Open-set) FLIPS per-instance recall, in an open-set setting with Nt = 8 (3-fold outer CV). Each tile represents the recall for an instance (original LLM plus variation). temp and sp refer to temperature and system prompt, int4 and fp8 to the quantizations; the system prompts are cataloged in Appendix I. 19 [PITH_FULL_IMAGE:figures/full_fig_p019_19.png] view at source ↗
Figure 20
Figure 20. Figure 20: Average generated sequence lengths per LLM after bit extraction, computed over all token pairs and all collected samples across variations. The black dotted vertical line indicates MaxTokens. 20 [PITH_FULL_IMAGE:figures/full_fig_p020_20.png] view at source ↗
read the original abstract

Literature reveals that a Large Language Model's (LLM) behavior is not only conditioned by its original weights but also its instance-level parameters, such as instructional prompt, sampling configuration or quantization. A model that generates safe outputs under one configuration may produce toxic content under another. However, current LLM identification techniques (such as fingerprinting) focus on intellectual property protection, and their design favors robustness to changes in these instance-level parameters. This poses a critical challenge for AI regulation in which compliance assessments target actual deployed behaviors, not model provenance. In this paper, we introduce instance-level fingerprinting, a regulator-oriented paradigm that distinguishes configurations of the same LLM. Our method FLIPS, exploits biases in generated binary random sequences to reach 96% (closed-set) and 90% (open-set, where some targets are unknown) identification accuracy across 237 model instances, versus 35% for the adapted LLMmap baseline. This shows that instance-level fingerprinting is both necessary for regulation and practically feasible. Code available at https://github.com/GurvanR/FLIPS-LLM-Instance-Fingerprinting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces FLIPS, a regulator-oriented instance-level fingerprinting method for LLMs that exploits statistical biases in pseudo-random binary sequences generated under different instance configurations (prompts, sampling, quantization). It reports 96% closed-set and 90% open-set identification accuracy across 237 model instances, substantially outperforming an adapted LLMmap baseline at 35%, and argues this enables compliance checks on deployed behaviors rather than model provenance alone.

Significance. If the reported accuracies prove robust, the work would be significant for AI regulation by demonstrating a practical, high-accuracy approach to distinguishing instance-specific behaviors that affect safety and compliance. The open-set result and code release are notable strengths that could support further development of falsifiable, regulator-usable fingerprints.

major comments (2)
  1. Abstract: the 90% open-set accuracy claim is load-bearing for the regulator-oriented contribution, yet the text supplies no quantitative results on bias stability under prompt rephrasing, continued interaction, or sampling variation; without such controls the accuracies may reflect fixed experimental conditions rather than persistent instance fingerprints.
  2. Abstract / Experiments section: the open-set protocol requires a rejection mechanism for unknown instances, but no description is given of the decision threshold, feature representation of the bit sequences, or how the 237-instance dataset was partitioned, preventing assessment of whether the 90% figure generalizes beyond the reported protocol.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback emphasizing the need for robustness evidence and methodological transparency in the open-set results. We respond to each major comment below.

read point-by-point responses
  1. Referee: Abstract: the 90% open-set accuracy claim is load-bearing for the regulator-oriented contribution, yet the text supplies no quantitative results on bias stability under prompt rephrasing, continued interaction, or sampling variation; without such controls the accuracies may reflect fixed experimental conditions rather than persistent instance fingerprints.

    Authors: We agree that explicit stability quantification under prompt rephrasing and continued interaction would strengthen the regulatory claim. The current experiments already incorporate variation in sampling parameters (temperature, top-p) and multiple base prompts across the 237 instances, but dedicated ablations on paraphrased prompts and multi-turn settings are not reported. We will add these quantitative stability results as a new subsection in the revised experiments. revision: yes

  2. Referee: Abstract / Experiments section: the open-set protocol requires a rejection mechanism for unknown instances, but no description is given of the decision threshold, feature representation of the bit sequences, or how the 237-instance dataset was partitioned, preventing assessment of whether the 90% figure generalizes beyond the reported protocol.

    Authors: We acknowledge these details were omitted and agree they are required for evaluation. The feature representation is the vector of per-bit bias statistics extracted from the generated sequences; the rejection threshold is selected via validation-set cross-validation to control false-positive rate on unknowns; and the 237 instances were partitioned 70/30 for training/testing with a disjoint subset of instances held out entirely as unknowns. We will insert a dedicated paragraph describing the full open-set protocol, threshold selection, and partitioning in the revised experiments section. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical accuracies measured against external baseline

full rationale

The paper presents FLIPS as an empirical fingerprinting method that generates pseudo-random binary sequences from LLM instances and measures identification accuracy (96% closed-set, 90% open-set) across 237 configurations against an adapted LLMmap baseline (35%). No equations, parameters, or claims are shown to reduce the reported accuracies to fitted inputs, self-definitions, or self-citation chains by construction. The derivation consists of experimental measurements of statistical biases rather than any mathematical or definitional loop. This is the normal case of a self-contained empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on free parameters, axioms, or invented entities; all such elements are unknown.

pith-pipeline@v0.9.1-grok · 5739 in / 989 out tokens · 26064 ms · 2026-06-28T11:21:48.921722+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

66 extracted references · 35 canonical work pages · 4 internal anchors

  1. [1]

    arXiv preprint arXiv:2309.07875 , year=

    Safety-tuned llamas: Lessons from improving the safety of large language models that follow instructions , author=. arXiv preprint arXiv:2309.07875 , year=

  2. [2]

    arXiv preprint arXiv:2508.19843 , year=

    Sok: Large language model copyright auditing via fingerprinting , author=. arXiv preprint arXiv:2508.19843 , year=

  3. [3]

    International workshop on digital watermarking , pages=

    Watermarking is not cryptography , author=. International workshop on digital watermarking , pages=. 2006 , organization=

  4. [4]

    2001 IEEE Fourth Workshop on Multimedia Signal Processing (Cat

    Considerations on watermarking security , author=. 2001 IEEE Fourth Workshop on Multimedia Signal Processing (Cat. No. 01TH8564) , pages=. 2001 , organization=

  5. [5]

    Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

    Fingerprinting deep neural networks globally via universal adversarial perturbations , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

  6. [6]

    Proceedings of the 2021 ACM asia conference on computer and communications security , pages=

    IPGuard: Protecting intellectual property of deep neural networks via fingerprinting the classification boundary , author=. Proceedings of the 2021 ACM asia conference on computer and communications security , pages=

  7. [7]

    Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining , pages=

    Metav: A meta-verifier approach to task-agnostic model fingerprinting , author=. Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining , pages=

  8. [8]

    IEEE Transactions on Information Forensics and Security , volume=

    Fingerprinting classifiers with benign inputs , author=. IEEE Transactions on Information Forensics and Security , volume=. 2023 , publisher=

  9. [9]

    2025 , eprint=

    LLMmap: Fingerprinting For Large Language Models , author=. 2025 , eprint=

  10. [10]

    arXiv preprint arXiv:2508.09021 , year=

    Attacks and defenses against llm fingerprinting , author=. arXiv preprint arXiv:2508.09021 , year=

  11. [11]

    arXiv preprint arXiv:2505.16530 , year=

    DuFFin: A Dual-Level Fingerprinting Framework for LLMs IP Protection , author=. arXiv preprint arXiv:2505.16530 , year=

  12. [12]

    arXiv preprint arXiv:2410.20247 , year=

    Model Equality Testing: Which Model Is This API Serving? , author=. arXiv preprint arXiv:2410.20247 , year=

  13. [13]

    Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test

    Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test , author=. arXiv preprint arXiv:2506.06975 , year=

  14. [14]

    arXiv preprint arXiv:2402.12991 , year=

    Trap: Targeted random adversarial prompt honeypot for black-box identification , author=. arXiv preprint arXiv:2402.12991 , year=

  15. [15]

    2024 IEEE Conference on Communications and Network Security (CNS) , pages=

    Proflingo: A fingerprinting-based intellectual property protection scheme for large language models , author=. 2024 IEEE Conference on Communications and Network Security (CNS) , pages=. 2024 , organization=

  16. [16]

    arXiv preprint arXiv:2505.12682 , year=

    RoFL: Robust Fingerprinting of Language Models , author=. arXiv preprint arXiv:2505.12682 , year=

  17. [17]

    SRAF: Stealthy and Robust Adversarial Fingerprint for Copyright Verification of Large Language Models

    SRAF: Stealthy and Robust Adversarial Fingerprint for Copyright Verification of Large Language Models , author=. arXiv preprint arXiv:2505.06304 , year=

  18. [18]

    Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect) , pages=

    Your large language models are leaving fingerprints , author=. Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect) , pages=

  19. [19]

    arXiv preprint arXiv:2407.01235 , year=

    A fingerprint for large language models , author=. arXiv preprint arXiv:2407.01235 , year=

  20. [20]

    arXiv preprint arXiv:2505.16785 , year=

    CoTSRF: Utilize Chain of Thought as Stealthy and Robust Fingerprint of Large Language Models , author=. arXiv preprint arXiv:2505.16785 , year=

  21. [21]

    arXiv preprint arXiv:2306.05540 , year=

    Detectllm: Leveraging log rank information for zero-shot detection of machine-generated text , author=. arXiv preprint arXiv:2306.05540 , year=

  22. [22]

    International conference on machine learning , pages=

    Detectgpt: Zero-shot machine-generated text detection using probability curvature , author=. International conference on machine learning , pages=. 2023 , organization=

  23. [23]

    Spotting llms with binoculars: Zero-shot detection of machine-generated text.arXiv preprint arXiv:2401.12070, 2024

    Spotting llms with binoculars: Zero-shot detection of machine-generated text , author=. arXiv preprint arXiv:2401.12070 , year=

  24. [24]

    Fast-detectgpt: Efficient zero-shot detection of machine-generated text via conditional probability curvature

    Fast-detectgpt: Efficient zero-shot detection of machine-generated text via conditional probability curvature , author=. arXiv preprint arXiv:2310.05130 , year=

  25. [25]

    Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model!

    Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model! , author=. arXiv preprint arXiv:2507.03014 , year=

  26. [26]

    Reef: Representation encoding fingerprints for large language models.arXiv preprint arXiv:2410.14273,

    Reef: Representation encoding fingerprints for large language models , author=. arXiv preprint arXiv:2410.14273 , year=

  27. [27]

    Advances in Neural Information Processing Systems , volume=

    WaterMax: breaking the LLM watermark detectability-robustness-quality trade-off , author=. Advances in Neural Information Processing Systems , volume=

  28. [28]

    International Conference on Machine Learning , pages=

    A watermark for large language models , author=. International Conference on Machine Learning , pages=. 2023 , organization=

  29. [29]

    arXiv preprint arXiv:2502.07760 , year=

    Scalable fingerprinting of large language models , author=. arXiv preprint arXiv:2502.07760 , year=

  30. [30]

    Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique

    Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique , author=. arXiv preprint arXiv:2407.10887 , year=

  31. [31]

    arXiv preprint arXiv:2401.12255 , year=

    Instructional fingerprinting of large language models , author=. arXiv preprint arXiv:2401.12255 , year=

  32. [32]

    arXiv preprint arXiv:2505.16723 , year=

    Robust LLM Fingerprinting via Domain-Specific Watermarks , author=. arXiv preprint arXiv:2505.16723 , year=

  33. [33]

    Scientific reports , volume=

    A cognitive fingerprint in human random number generation , author=. Scientific reports , volume=. 2021 , publisher=

  34. [34]

    2023 , organization=

    Can llms generate random numbers? evaluating llm sampling in controlled domains , author=. 2023 , organization=

  35. [35]

    arXiv preprint arXiv:2408.09656 , year=

    A Comparison of Large Language Model and Human Performance on Random Number Generation Tasks , author=. arXiv preprint arXiv:2408.09656 , year=

  36. [36]

    Deterministic or probabilistic? the psychology of llms as random number generators, 2025

    Deterministic or probabilistic? The psychology of LLMs as random number generators , author=. arXiv preprint arXiv:2502.19965 , year=

  37. [37]

    How random is random? evaluating the random- ness and humaness of llms’ coin flips, 2024

    How Random is Random? Evaluating the Randomness and Humaness of LLMs' Coin Flips , author=. arXiv preprint arXiv:2406.00092 , year=

  38. [38]

    Applied Network Science , year=

    LLMs prompted for graphs: hallucinations and generative capabilities , author=. Applied Network Science , year=

  39. [39]

    arXiv preprint arXiv:2502.00873 , year=

    Language models use trigonometry to do addition , author=. arXiv preprint arXiv:2502.00873 , year=

  40. [40]

    Neural Networks , volume=

    Arithmetic with language models: From memorization to computation , author=. Neural Networks , volume=. 2024 , publisher=

  41. [41]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume=

    Queries, Representation & Detection: The Next 100 Model Fingerprinting Schemes , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

  42. [42]

    ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=

    Model fingerprinting with benign inputs , author=. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pages=. 2023 , organization=

  43. [43]

    Decision Support Systems , volume=

    Transparency and accountability in AI decision support: Explaining and visualizing convolutional neural networks for text information , author=. Decision Support Systems , volume=. 2020 , publisher=

  44. [44]

    arXiv preprint arXiv:2505.04796 , year=

    Robust ML Auditing using Prior Knowledge , author=. arXiv preprint arXiv:2505.04796 , year=

  45. [45]

    AI , volume=

    Local ai governance: Addressing model safety and policy challenges posed by decentralized ai , author=. AI , volume=. 2025 , publisher=

  46. [46]

    Non-determinism of

    Non-determinism of" deterministic" llm settings , author=. arXiv preprint arXiv:2408.04667 , year=

  47. [47]

    2010 , number=

    A statistical test suite for random and pseudorandom number generators for cryptographic applications , author=. 2010 , number=

  48. [48]

    Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining , pages=

    Xgboost: A scalable tree boosting system , author=. Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining , pages=

  49. [49]

    Proceedings of the 29th symposium on operating systems principles , pages=

    Efficient memory management for large language model serving with pagedattention , author=. Proceedings of the 29th symposium on operating systems principles , pages=

  50. [50]

    Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency , pages=

    Position is Power: System Prompts as a Mechanism of Bias in Large Language Models (LLMs) , author=. Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency , pages=

  51. [51]

    Advances in Neural Information Processing Systems , volume=

    Sg-bench: Evaluating llm safety generalization across diverse tasks and prompt types , author=. Advances in Neural Information Processing Systems , volume=

  52. [52]

    arXiv preprint arXiv:2405.20947 , year =

    Or-bench: An over-refusal benchmark for large language models , author=. arXiv preprint arXiv:2405.20947 , year=

  53. [53]

    arXiv preprint arXiv:2502.05234 , year=

    Optimizing temperature for language models with multi-sample inference , author=. arXiv preprint arXiv:2502.05234 , year=

  54. [54]

    arXiv preprint arXiv:2502.18389 , year=

    Monte Carlo Temperature: a robust sampling strategy for LLM's uncertainty quantification methods , author=. arXiv preprint arXiv:2502.18389 , year=

  55. [55]

    Forty-second International Conference on Machine Learning , year=

    Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models , author=. Forty-second International Conference on Machine Learning , year=

  56. [56]

    arXiv preprint arXiv:2411.06835 , year=

    Harmlevelbench: Evaluating harm-level compliance and the impact of quantization on model alignment , author=. arXiv preprint arXiv:2411.06835 , year=

  57. [57]

    arXiv preprint arXiv:2407.03211 , year=

    How does quantization affect multilingual LLMs? , author=. arXiv preprint arXiv:2407.03211 , year=

  58. [58]

    arXiv preprint arXiv:2502.15799 , year=

    Investigating the impact of quantization methods on the safety and reliability of large language models , author=. arXiv preprint arXiv:2502.15799 , year=

  59. [59]

    arXiv preprint arXiv:2508.18088 , year=

    How Quantization Shapes Bias in Large Language Models , author=. arXiv preprint arXiv:2508.18088 , year=

  60. [60]

    How benchmark prediction from fewer data misses the mark.arXiv preprint arXiv:2506.07673, 2025

    How Benchmark Prediction from Fewer Data Misses the Mark , author=. arXiv preprint arXiv:2506.07673 , year=

  61. [61]

    2025 , eprint=

    A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives , author=. 2025 , eprint=

  62. [62]

    13th International Conference on Learning Representations, ICLR 2025 , pages=

    CAN WATERMARKS BE USED TO DETECT LARGE LANGUAGE MODEL INTELLECTUAL PROPERTY INFRINGEMENT FOR FREE? , author=. 13th International Conference on Learning Representations, ICLR 2025 , pages=. 2025 , organization=

  63. [63]

    Advances in Neural Information Processing Systems , volume=

    Refusal in language models is mediated by a single direction , author=. Advances in Neural Information Processing Systems , volume=

  64. [64]

    arXiv preprint arXiv:2407.14981 , year=

    Open problems in technical ai governance , author=. arXiv preprint arXiv:2407.14981 , year=

  65. [65]

    Internet Policy Review , volume=

    The European approach to regulating AI through technical standards , author=. Internet Policy Review , volume=. 2024 , publisher=

  66. [66]

    Internet Policy Review , volume=

    Brussels effect or experimentalism? The EU AI Act and global standard-setting , author=. Internet Policy Review , volume=. 2025 , publisher=