OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.
Slotformer: Unsupervised visual dynamics simulation with object-centric models
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1
polarities
background 1
representative citing papers
UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.
Object-centric LeJEPA uses SAM object masks to extend LeJEPA's distributional objective to variable object sets and adds an instance-separating loss, outperforming image-level LeJEPA on DAVIS tracking, ImageNet classification, ADE20k segmentation, and NAVI re-identification across 10-100% of COCO da
DSSA decouples per-frame appearance from temporal identity in slot attention mechanisms to reduce slot swapping and improve temporal consistency in video object segmentation.
ChronoSC projects video temporal dynamics into a compact chrono-image via color stacking, transmits it with lightweight DeepJSCC, reconstructs explicitly, and applies a pre-trained BLIP model for VideoQA answers, delivering 192x bandwidth savings on CLEVRER.
WorldDP combines a high-level object-centric world model for subgoal planning with a low-level diffusion policy for execution, claiming better performance than baselines on multi-stage robotic manipulation benchmarks.
citing papers explorer
- Object-centric LeJEPA
Object-centric LeJEPA uses SAM object masks to extend LeJEPA's distributional objective to variable object sets and adds an instance-separating loss, outperforming image-level LeJEPA on DAVIS tracking, ImageNet classification, ADE20k segmentation, and NAVI re-identification across 10-100% of COCO da
- Dual-State Slot Attention: Decoupling Appearance and Identity for Video Object-Centric Learning
DSSA decouples per-frame appearance from temporal identity in slot attention mechanisms to reduce slot swapping and improve temporal consistency in video object segmentation.
- ChronoSC: Task-Oriented Semantic Communication via Temporal-to-Color Encoding
ChronoSC projects video temporal dynamics into a compact chrono-image via color stacking, transmits it with lightweight DeepJSCC, reconstructs explicitly, and applies a pre-trained BLIP model for VideoQA answers, delivering 192x bandwidth savings on CLEVRER.