pith. sign in

hub

Learning transferable visual models from natural language supervision

13 Pith papers cite this work. Polarity classification is still indexing.

13 Pith papers citing it

hub tools

citation-role summary

background 2 method 1

citation-polarity summary

years

2026 9 2025 4

verdicts

UNVERDICTED 13

representative citing papers

Representation Fr\'echet Loss for Visual Generation

cs.CV · 2026-04-30 · unverdicted · novelty 8.0

Fréchet Distance optimized as FD-loss in representation space by decoupling population size from batch size improves generator quality, enables one-step generation from multi-step models, and motivates a multi-representation metric FDr^k.

Cambrian-P: Pose-Grounded Video Understanding

cs.CV · 2026-05-21 · unverdicted · novelty 6.0

Cambrian-P adds per-frame camera pose tokens and a regression head to video MLLMs, delivering 4.5-6.5% gains on spatial benchmarks, generalization to other video QA tasks, and SOTA streaming pose estimation on ScanNet.

PersonaVLM: Long-Term Personalized Multimodal LLMs

cs.CL · 2026-03-20 · unverdicted · novelty 6.0

PersonaVLM adds memory extraction, multi-turn retrieval-based reasoning, and personality inference to multimodal LLMs, yielding 22.4% gains on a new long-term personalization benchmark and outperforming GPT-4o.

Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views

cs.CV · 2025-11-17 · unverdicted · novelty 6.0

Uni-Hand forecasts 2D/3D hand waypoints, head motion, and contact states in egocentric views using vision-language fusion and dual-branch diffusion, with new benchmarks for downstream robotics and action tasks.

Cambrian-S: Towards Spatial Supersensing in Video

cs.CV · 2025-11-06 · unverdicted · novelty 6.0

Cambrian-S introduces VSI-SUPER benchmarks for long-horizon spatial recall and counting, shows data scaling yields 30% gains on existing tests, and demonstrates a self-supervised next-latent predictor using surprise outperforms baselines on the new spatial supersensing tasks.

Swift Sampling: Selecting Temporal Surprises via Taylor Series

cs.CV · 2026-05-21 · unverdicted · novelty 5.0

Swift Sampling is a training-free frame selection method that uses Taylor expansions on video latent trajectories to pick temporally surprising frames, outperforming uniform sampling on long-video QA tasks.

citing papers explorer

Showing 13 of 13 citing papers.