Large vision-language models exhibit severe object hallucination that varies with training instructions, and the proposed POPE polling method evaluates it more stably and flexibly than prior approaches.
Title resolution pending
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
A framework trains keypoint detectors on inpainted markerless robot images and uses runtime inpainting plus UKF for robust vision-based control without models or calibration.
A UAS with YOLO-based swimmer detection and DES simulations reduces drowning rescue response time by a factor of five versus standard operations in tested lake areas.
PaliGemma is an open 3B VLM based on SigLIP and Gemma that achieves strong performance on nearly 40 diverse open-world tasks including benchmarks, remote-sensing, and segmentation.
NTGA is the first clean-label generalization attack under black-box settings but is vulnerable to adversarial training and image transformations, with newer attacks outperforming it.
citing papers explorer
-
Evaluating Object Hallucination in Large Vision-Language Models
Large vision-language models exhibit severe object hallucination that varies with training instructions, and the proposed POPE polling method evaluates it more stably and flexibly than prior approaches.
-
Utilizing Inpainting for Keypoint Detection for Vision-Based Control of Robotic Manipulators
A framework trains keypoint detectors on inpainted markerless robot images and uses runtime inpainting plus UKF for robust vision-based control without models or calibration.
-
Autonomous Unmanned Aircraft Systems for Enhanced Search and Rescue of Drowning Swimmers: Image-Based Localization and Mission Simulation
A UAS with YOLO-based swimmer detection and DES simulations reduces drowning rescue response time by a factor of five versus standard operations in tested lake areas.
-
PaliGemma: A versatile 3B VLM for transfer
PaliGemma is an open 3B VLM based on SigLIP and Gemma that achieves strong performance on nearly 40 diverse open-world tasks including benchmarks, remote-sensing, and segmentation.
-
SoK: A Comprehensive Analysis of the Current Status of Neural Tangent Generalization Attacks with Research Directions
NTGA is the first clean-label generalization attack under black-box settings but is vulnerable to adversarial training and image transformations, with newer attacks outperforming it.