KBF uses stable numerical recall near the knowledge boundary to fingerprint and audit black-box LLM APIs, successfully detecting all tested substitutions and some real-world inconsistencies across production endpoints.
arXiv preprint arXiv:2509.21057 (2025)
5 Pith papers cite this work. Polarity classification is still indexing.
years
2026: 5
verdicts
UNVERDICTED: 5
representative citing papers
GREW uses a secret-key-driven green-red item partition and three ranking-integrated modules to embed verifiable watermarks in recommender systems that resist extraction attacks without data injection.
DeP mitigates MLLM hallucinations by dynamically perturbing text prompts to identify and reinforce stable visual evidence regions while counteracting language prior biases using attention variance and logit statistics.
RLSpoofer trains a 4B model on 100 watermarked paraphrase pairs to spoof PF watermarks at a 62% success rate, far exceeding baselines trained on up to 10,000 samples.
Introduces a sketch-based watermarking method for masked diffusion language models that provides an order-agnostic detection statistic decoupled from local context.
citing papers explorer
- Decoding by Perturbation: Mitigating MLLM Hallucinations via Dynamic Textual Perturbation
DeP mitigates MLLM hallucinations by dynamically perturbing text prompts to identify and reinforce stable visual evidence regions while counteracting language prior biases using attention variance and logit statistics.