Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
2 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Clutter-Robust Vision-Language-Action Models through Object-Centric and Geometry Grounding
OBEYED-VLA improves VLA robustness in cluttered real-world manipulation by disentangling perception into VLM-based object-centric grounding and geometry-aware stages, then fine-tuning the policy only on single-object demonstrations.
- Evaluating the Temporal Detection Capability of Integrated Gradients Applied on Sound Classifier
Integrated gradients on a 10-class domestic sound classifier yields 0.39 mean IoU, 0.52 frame F1 and 82.6% Pointing Game accuracy for temporal event detection, approaching weakly and strongly supervised framewise CNN baselines.