Object Hallucination-Free Reinforcement Unlearning for Vision-Language Models

2026 · cs.CV · arXiv 2605.08031

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Vision-language models (VLMs) raise growing concerns about privacy, copyright, and bias, motivating machine unlearning to remove sensitive knowledge. However, existing methods primarily fine-tune the language decoder, leading to superficial forgetting that fails to erase underlying visual representations and often introduces object hallucination. We propose HFRU, a reinforcement unlearning framework that operates on the vision encoder for deep semantic removal. Our two-stage approach combines alignment disruption with GRPO-based optimization using a composite reward, including an abstraction reward that encourages semantically valid substitutions and mitigates hallucinations. Experiments on object recognition and face identity tasks show that HFRU achieves over 98% forgetting and retention performance, while introducing negligible object hallucination, significantly outperforming prior methods. Our code and implementation details are available at https://github.com/XMUDeepLIT/HFRU.
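
As a rough illustration of the GRPO step mentioned in the abstract, the sketch below combines several reward components into one scalar per sampled response and then standardizes those scalars within the group. It is not taken from the HFRU code. The component names ("forget", "abstraction"), the weights, and the choice of population standard deviation are all assumptions made for this example; the abstract names only the abstraction reward and does not specify how the components are weighted.

```python
from statistics import mean, pstdev


def composite_reward(components, weights):
    """Weighted sum of named reward components (names are hypothetical)."""
    return sum(weights[name] * value for name, value in components.items())


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical weights and per-response component scores for one prompt.
weights = {"forget": 1.0, "abstraction": 0.5}
group = [
    {"forget": 1.0, "abstraction": 1.0},  # forgets and substitutes validly
    {"forget": 1.0, "abstraction": 0.0},  # forgets but substitutes poorly
    {"forget": 0.0, "abstraction": 0.0},  # still names the forgotten concept
]

rewards = [composite_reward(c, weights) for c in group]
advantages = group_relative_advantages(rewards)
```

Under these assumed scores, the response that both forgets and gives a valid substitution receives the largest advantage, and a group whose responses all score the same yields zero advantage for every member.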

representative citing papers

ZeroUnlearn: Few-Shot Knowledge Unlearning in Large Language Models

cs.LG · 2026-05-16 · 2 refs

citing papers explorer

Showing 1 of 1 citing paper.

ZeroUnlearn: Few-Shot Knowledge Unlearning in Large Language Models

cs.LG · 2026-05-16 · unreviewed · ref 9 · 2 links · internal anchor
