{"total":14,"items":[{"citing_arxiv_id":"2606.29175","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Direct Causation in International Humanitarian Law and the Challenge of AI-Mediated Civilian Cyber Operations","primary_cat":"cs.AI","submitted_at":"2026-06-28T03:43:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Autonomous AI cyber systems deployed by civilians fail the one-causal-step and integral-part requirements of the IHL direct participation test because harm arises from post-disengagement system decisions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21948","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SCI-Defense: Defending Manipulation Attacks from Generative Engine Optimization","primary_cat":"cs.LG","submitted_at":"2026-05-21T03:28:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SCI-Defense combines perplexity detection, semantic integrity scoring across four manipulation dimensions, and inter-candidate detection to counter GEO attacks, reporting perfect precision on Amazon product data but domain-limited recall on web passages.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20761","ref_index":17,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Findings of the Counter Turing Test: AI-Generated Text Detection","primary_cat":"cs.CL","submitted_at":"2026-05-20T06:01:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"Shared task findings show near-perfect binary detection of AI-generated text but greater difficulty in attributing outputs to particular language 
models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16462","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Asking Back: Interaction-Layer Antidistillation Watermarks","primary_cat":"cs.CR","submitted_at":"2026-05-15T08:28:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Interaction-layer antidistillation watermarks use system-prompt-induced behavioral markers like explicit follow-up questions that transfer to distilled student models at 45-89% relative fidelity and can be audited via black-box LLM-as-judge queries.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12456","ref_index":12,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection","primary_cat":"cs.CR","submitted_at":"2026-05-12T17:44:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TextSeal provides a localized, distortion-free LLM watermark that outperforms baselines in detection strength, remains effective in mixed human-AI text, preserves model performance, and transfers through distillation for provenance tracking.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":", $p_V$) and $V = |\\mathcal{V}|$ random variables $R = (R_1, \\ldots, R_V)$ s.t. $R_v \\overset{\\text{iid}}{\\sim} \\mathcal{U}[0,1]$. Let $V^\\star = \\arg\\max_v R_v^{1/p_v}$. Then: $P(V^\\star = v) = p_v$. Proof of Proposition 1. For any $v \\in \\mathcal{V}$, $R_v \\overset{\\text{iid}}{\\sim} \\mathcal{U}[0,1]$ so $-\\ln(R_v)$ follows an exponential distribution $\\mathcal{E}(1)$. Let $Z_v := -\\frac{1}{p_v} \\ln(R_v)$. By construction, $Z_v \\sim \\mathcal{E}(p_v)$, with density $f_{Z_v}(z) = p_v e^{-p_v z}$. We now have: $V^\\star = \\arg\\max_v R_v^{1/p_v} = \\arg\\min_v Z_v$. (12) 
A well known result about exponential laws is that: $Z = \\min_v Z_v \\sim \\mathcal{E}\\big(\\sum_v p_v\\big) = \\mathcal{E}(1)$, (13) $P(V^\\star = v) = \\frac{p_v}{\\sum_j p_j} = p_v$. (14) This shows that for a given secret vector $r$, the watermarking chooses a word which may be unlikely (low probability $p_{V^\\star}$). Yet, on expectation over the secret keys, i.e., over r.v. $R = (R_1, \\ldots, R_V)$, the distribution of the chosen token follows the distribution given by the LLM."},{"citing_arxiv_id":"2605.07481","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Vaporizer: Breaking Watermarking Schemes for Large Language Model Outputs","primary_cat":"cs.CR","submitted_at":"2026-05-08T09:24:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Existing LLM watermarking schemes can be defeated by semantic-preserving attacks including lexical changes, machine translation, and neural paraphrasing.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06987","ref_index":98,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Response Time Enhances Alignment with Heterogeneous Preferences","primary_cat":"cs.LG","submitted_at":"2026-05-07T22:05:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Response times modeled as drift-diffusion processes enable consistent estimation of population-average preferences from heterogeneous anonymous binary choices.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05503","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Chainwash: Multi-Step Rewriting Attacks on Diffusion Language Model Watermarks","primary_cat":"cs.CL","submitted_at":"2026-05-06T23:00:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Chained rewrites by open-weight LLMs 
reduce watermark detection on diffusion LM outputs from 87.9% to 4.86% after five steps across multiple styles and models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.04305","ref_index":51,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SWAN: Semantic Watermarking with Abstract Meaning Representation","primary_cat":"cs.CL","submitted_at":"2026-05-05T21:13:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SWAN uses AMR to embed semantic watermarks that persist through paraphrases, matching SOTA detection on original text and improving AUC by 13.9 points on paraphrased RealNews data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.23338","ref_index":80,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework","primary_cat":"cs.CR","submitted_at":"2026-04-25T14:57:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.","context_count":1,"top_context_role":"background","top_context_polarity":"support","context_text":"and content classifiers [76], [77] catch syntactically anomalous jailbreaks but, by construction, fail against semantically coherent attacks (AutoDAN, PAIR) and against gradient-optimized low-perplexity suffixes (GCG). 
Red-teaming [78], [79] remains the primary empirical method for discovering L1 vulnerabilities before deployment, and watermarking [80] and differentially private training [81] address attribution and extraction respectively rather than jailbreaking. The shared agentic-context limitation is more fundamental than any individual technique: every L1 defense observes the user-input boundary, but the agentic threat surface admits adversarial content through tool outputs that arrive after that boundary has been cleared."},{"citing_arxiv_id":"2604.16058","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning","primary_cat":"cs.SE","submitted_at":"2026-04-17T13:32:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"LLMSniffer improves detection of LLM-generated code on GPTSniffer and Whodunit benchmarks by fine-tuning GraphCodeBERT via two-stage supervised contrastive learning plus preprocessing and MLP classification.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.03014","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model!","primary_cat":"cs.CR","submitted_at":"2025-07-02T12:29:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Standard deviation distributions of attention matrices in LLMs remain distinctive and stable after continued training, enabling fingerprinting to trace model lineage and detect potential plagiarism such as in Pangu Pro 
MoE.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.11336","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability","primary_cat":"cs.CL","submitted_at":"2025-02-17T01:15:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ExaGPT uses span-level similarity retrieval from human and LLM datastores to detect machine-generated text while supplying the matching spans as human-interpretable evidence, achieving up to 37-point accuracy gains over prior interpretable detectors at 1% FPR.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2303.11156","ref_index":64,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Can AI-Generated Text be Reliably Detected?","primary_cat":"cs.CL","submitted_at":"2023-03-17T17:53:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Recursive paraphrasing attacks substantially lower detection rates for multiple AI text detectors with only minor quality loss, while a theoretical analysis ties best-case AUROC to total variation distance between human and AI distributions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}