arXiv preprint arXiv:2402.07841

Michael Duan et al. · 2024 · arXiv 2402.07841

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Learning the Signature of Memorization in Autoregressive Language Models

cs.CL · 2026-04-03 · accept · novelty 8.0

A classifier trained only on transformer fine-tuning data detects an invariant memorization signature that transfers to Mamba, RWKV-4, and RecurrentGemma with AUCs of 0.963, 0.972, and 0.936.

MRMMIA: Membership Inference Attacks on Memory in Chat Agents

cs.CR · 2026-05-27 · unverdicted · novelty 7.0

MRMMIA is a multi-recall-probe membership inference attack that extracts signals from chat agent memory and outperforms baselines in black-, gray-, and white-box settings.

DistractMIA: Black-Box Membership Inference on Vision-Language Models via Semantic Distraction

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

DistractMIA performs output-only black-box membership inference on vision-language models by inserting semantic distractors and measuring shifts in generated text responses.

Decomposing Memorization Reduction in Privacy-Preserving Fine-Tuning of SLMs for CSIRTs

cs.CR · 2026-06-26 · unverdicted · novelty 6.0

Controlled experiments across 96 LoRA adapters show that reduced optimizer updates explain nearly all observed memorization drops in DP-SGD fine-tuning, HMAC pseudonymization cuts exposure 40-61% without creating new targets, and 1-3B models achieve only 0.19-0.28 F1 under the tested budget.

Auditing Training Data in Generative Music Models via Black-Box Membership Inference

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

Black-box membership inference on text-to-music models reaches up to 98.6% accuracy by training an auditor on semantic alignment patterns extracted from shadow-model generations.

TRACER: A Semantic-Aware Framework for Fine-Grained Contamination Detection in Code LLMs

cs.SE · 2026-05-22 · unverdicted · novelty 6.0

TRACER presents a semantic-aware framework and the first benchmark for fine-grained code contamination detection across three levels of overlap, reporting F1 scores of 0.91-0.92 and large gains over prior methods.

Distinguishable Deletion: Unifying Knowledge Erasure and Refusal for Large Language Model Unlearning

cs.LG · 2026-05-16 · unverdicted · novelty 6.0

Distinguishable Deletion unifies knowledge erasure and refusal for LLM unlearning via an energy index that enforces boundaries during training and enables refusal at inference.

Black-box model classification under the discriminative factorization

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Discriminative factorization distinguishes high-quality query sets for black-box model classification, with chance-level error decaying exponentially in query budget and parameters predicting empirical decay rates on auditing tasks.

Pop Quiz Attack: Black-box Membership Inference Attacks Against Large Language Models

cs.CR · 2026-05-07 · unverdicted · novelty 6.0

PopQuiz Attack infers LLM training data membership by turning examples into quiz questions and measuring answer accuracy, reaching 0.873 average ROC-AUC across six models and outperforming prior methods by 20.6%.

CoLA: A Choice Leakage Attack Framework to Expose Privacy Risks in Subset Training

cs.CR · 2026-04-14 · unverdicted · novelty 6.0

CoLA reveals that subset training creates new privacy leakage surfaces via side-channel metadata and model outputs, enabling training-membership and selection-participation membership inference attacks.

From Rookie to Expert: Manipulating LLMs for Automated Vulnerability Exploitation in Enterprise Software

cs.SE · 2025-12-28 · unverdicted · novelty 6.0

RSA prompting enables LLMs to automatically create functional exploits for CVEs in Odoo ERP, succeeding on all tested cases in 3-5 rounds and removing the need for manual effort.

Hey, That's My Data! Token-Only Dataset Inference in Large Language Models

cs.CL · 2025-06-06 · unverdicted · novelty 6.0

CatShift detects training data membership in LLMs by comparing output shifts induced by fine-tuning on member versus non-member data, relying on catastrophic forgetting without requiring logit access.

citing papers explorer

TRACER: A Semantic-Aware Framework for Fine-Grained Contamination Detection in Code LLMs

cs.SE · 2026-05-22 · unverdicted · none · ref 14

TRACER presents a semantic-aware framework and the first benchmark for fine-grained code contamination detection across three levels of overlap, reporting F1 scores of 0.91-0.92 and large gains over prior methods.

From Rookie to Expert: Manipulating LLMs for Automated Vulnerability Exploitation in Enterprise Software

cs.SE · 2025-12-28 · unverdicted · none · ref 6

RSA prompting enables LLMs to automatically create functional exploits for CVEs in Odoo ERP, succeeding on all tested cases in 3-5 rounds and removing the need for manual effort.
