Agent-Centric Observation Adaptation for Robust Visual Control under Dynamic Perturbations

2026 · cs.RO · arXiv 2604.24661

3 Pith papers cite this work. Polarity classification is still indexing.



abstract

Real-world visual systems face time-varying perturbations, including weather, sensor noise, compression artifacts, and background distractions. Existing image restoration methods are typically designed for fixed corruption types and optimized for pixel-level fidelity, leaving open two questions: how restoration behaves under non-stationary corruption switching, and whether pixel-level fidelity preserves the task-relevant information needed by downstream models. To study this setting, we introduce the Visual Degraded Control Suite (VDCS), a benchmark that injects Markov-switching physical degradations into rendered scenes. We further identify a fundamental failure mode of reconstruction-based representations: faithfully reconstructing corrupted observations forces the latent state to encode corruption-specific nuisance information, thereby contaminating downstream models. From an information-bottleneck perspective, anchoring the representation to the clean foreground eliminates this contamination. Motivated by this analysis, we propose Agent-Centric Observations with Mixture-of-Experts (ACO-MoE), a frozen, plug-and-play observation adapter that combines a routed bank of restoration experts with a foreground-mask branch. ACO-MoE is pretrained entirely offline on synthetic rendered data with automatically generated degradation pairs and simulation-derived foreground masks, requiring no manual annotation. At inference time, it takes only corrupted RGB as input without corruption labels, clean reference frames, or foreground masks. Across VDCS, DMC-GB, and RoboSuite, ACO-MoE consistently improves downstream control with both model-free and model-based backbones, recovering 95.3% of clean-input performance under challenging Markov-switching corruptions. It also generalizes zero-shot to unseen visual perturbations excluded from adapter pretraining.
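To make the "Markov-switching" setting in the abstract concrete, the following is a minimal sketch of how a per-step corruption schedule could be sampled from a first-order Markov chain. It is not the VDCS implementation: the corruption labels, the `stay_prob` parameter, the uniform jump rule, and both function names are assumptions made purely for illustration.

```python
import random

# Hypothetical corruption labels, loosely based on the perturbation types
# named in the abstract; the actual VDCS degradation set is not given here.
CORRUPTIONS = ["clean", "noise", "compression", "weather"]


def sample_corruption_schedule(num_steps, stay_prob=0.9, seed=0):
    """Sample one corruption label per time step from a Markov chain.

    Assumed transition rule: with probability `stay_prob` the current
    corruption persists; otherwise the chain jumps uniformly at random
    to one of the other corruption types.
    """
    rng = random.Random(seed)
    state = rng.choice(CORRUPTIONS)
    schedule = []
    for _ in range(num_steps):
        schedule.append(state)
        if rng.random() >= stay_prob:
            state = rng.choice([c for c in CORRUPTIONS if c != state])
    return schedule


def count_switches(schedule):
    """Count the time steps at which the corruption type changes."""
    return sum(a != b for a, b in zip(schedule, schedule[1:]))


schedule = sample_corruption_schedule(num_steps=200)
print(schedule[:10], count_switches(schedule))
```

Under this assumed rule, a larger `stay_prob` yields longer runs of a single corruption, while a smaller value approaches per-frame switching, which is the non-stationary regime the abstract says fixed-corruption restoration methods are not designed for.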

representative citing papers

FRUC: Feedforward Dynamic Scene Reconstruction from Uncalibrated Collaborative Driving Views

cs.CV · 2026-05-28 · unverdicted · novelty 7.0

FRUC enables one-shot calibration-free dynamic scene reconstruction from collaborative driving views via a geometric Transformer, ego-centric occlusion priors, and zero-initialized residual denoising, claiming SOTA quality and speed on V2XReal and UrbanIng-V2X.

Universal Image Restoration via Internalized Chain-of-Thought Reasoning

cs.CV · 2026-06-16 · unverdicted · novelty 6.0

CoTIR fine-tunes a pre-trained image editing model using a differentiable CoT-style objective inspired by Lagrangian optimization to enable single-pass universal image restoration, supported by a new 5.2M-sample benchmark showing improved perceptual quality.

V2XCrafter: Learning to Generate Driving Scene Across Agents

cs.CV · 2026-05-28 · unverdicted · novelty 6.0

V2XCrafter introduces a progressive multi-agent diffusion model with cross-agent attention to generate controllable, consistent collaborative driving scenes for V2X data augmentation.

citing papers explorer


FRUC: Feedforward Dynamic Scene Reconstruction from Uncalibrated Collaborative Driving Views cs.CV · 2026-05-28 · unverdicted · none · ref 8 · internal anchor
Universal Image Restoration via Internalized Chain-of-Thought Reasoning cs.CV · 2026-06-16 · unverdicted · none · ref 4 · internal anchor
V2XCrafter: Learning to Generate Driving Scene Across Agents cs.CV · 2026-05-28 · unverdicted · none · ref 7 · internal anchor
