arXiv preprint arXiv:2108.12617

Yu, H · 2021 · arXiv 2108.12617

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Unleashing Infinite Motion: Scaling Expressive Quadrupedal Motion via Generative Video Priors

cs.RO · 2026-06-26 · conditional · novelty 7.0

Uni-Mo generates 7,488 language-annotated quadruped motions via LLM prompts and video diffusion, lifts them to 3D trajectories, and trains policies achieving 96.7% real-robot success on 392 sampled motions.

Quo Vadis, Visual In-Context Learning? A Unified Benchmark Across Domains and Tasks

cs.CV · 2026-06-09 · unverdicted · novelty 7.0

The paper constructs the VIBE benchmark and evaluates six visual in-context learning models on 14 datasets, 12 tasks, and 106 combinations under a unified one-shot protocol, revealing limitations and failure modes.

GKDT: General Keypoint Detection Transformer

cs.CV · 2026-07-01 · unverdicted · novelty 6.0

The paper creates the MegaKPT dataset and GKDT, a promptable transformer model for general keypoint detection across diverse objects, and reports high accuracy on 22 test sets.

SOCO: Benchmarking Semantic Object Correspondence in Vision Foundation Models

cs.CV · 2026-05-29 · unverdicted · novelty 6.0 · 2 refs

SOCO is a new benchmark for semantic object correspondence that provides taxonomy, annotations, and language labels to evaluate part-level understanding in vision and multimodal foundation models.

AnyAct: Towards Human Reenactment of Character Motion From Video

cs.CV · 2026-05-15 · unverdicted · novelty 6.0 · 2 refs

AnyAct generates editable human reenactments from character videos via conditional motion generation from transferable sparse local 2D articulated cues, with designs for human-only supervision and global-local decoupling.

citing papers explorer

Unleashing Infinite Motion: Scaling Expressive Quadrupedal Motion via Generative Video Priors
cs.RO · 2026-06-26 · conditional · none · ref 25
Quo Vadis, Visual In-Context Learning? A Unified Benchmark Across Domains and Tasks
cs.CV · 2026-06-09 · unverdicted · none · ref 107
GKDT: General Keypoint Detection Transformer
cs.CV · 2026-07-01 · unverdicted · none · ref 70
SOCO: Benchmarking Semantic Object Correspondence in Vision Foundation Models
cs.CV · 2026-05-29 · unverdicted · none · ref 77 · 2 links
AnyAct: Towards Human Reenactment of Character Motion From Video
cs.CV · 2026-05-15 · unverdicted · none · ref 85 · 2 links
