SAM 3: Segment Anything with Concepts

Mixed citation behavior. The most common citation role is method (58% of classified citations).

306 Pith papers citing it
abstract

We present Segment Anything Model (SAM) 3, a unified model that detects, segments, and tracks objects in images and videos based on concept prompts, which we define as either short noun phrases (e.g., "yellow school bus"), image exemplars, or a combination of both. Promptable Concept Segmentation (PCS) takes such prompts and returns segmentation masks and unique identities for all matching object instances. To advance PCS, we build a scalable data engine that produces a high-quality dataset with 4M unique concept labels, including hard negatives, across images and videos. Our model consists of an image-level detector and a memory-based video tracker that share a single backbone. Recognition and localization are decoupled with a presence head, which boosts detection accuracy. SAM 3 doubles the accuracy of existing systems in both image and video PCS, and improves previous SAM capabilities on visual segmentation tasks. We open source SAM 3 along with our new Segment Anything with Concepts (SA-Co) benchmark for promptable concept segmentation.

citation-role summary

method 29 · background 17 · baseline 4

representative citing papers

Online Segment 3D Gaussians via Launching Virtual Drones

cs.CV · 2026-07-02 · unverdicted · novelty 7.0

SAGO achieves setup-free interactive 3D Gaussian segmentation by modeling it as an online next-best-view (NBV) planning task in a Markov process, delivering sub-second latency and a more than 50x speedup over prior setup-free methods.

UnfoldArt: Zero-Shot Recovery of Full Articulated 3D Objects from Text or Image

cs.CV · 2026-06-29 · unverdicted · novelty 7.0 · 2 refs

UnfoldArt uses a two-round structured debate between high-level semantic agents and low-level parameter agents, grounded in generated video, to infer articulation and reconstruct full articulated 3D objects including occluded geometry from text or image inputs.

Trustworthy Image Authentication using Forensic Knowledge Graphs

cs.CV · 2026-06-22 · unverdicted · novelty 7.0

Forensic Knowledge Graphs integrate forensic traces, causal dependencies, and scene links via a new authentication network and Iterative Context Refinement to outperform standard detectors and VLMs on detection, localization, and justification.

Thinking in Boxes: 3D Editing in Real Images Made Easy

cs.CV · 2026-06-18 · unverdicted · novelty 7.0

A method that treats 3D box pairs as exact transformation specs, adds a depth-aware floor reference, and trains an image generator on synthetic scenes plus Objectron videos to perform large 3D edits on real photographs.

Intrinsic 4D Gaussian Segmentation from Scene Cues

cs.CV · 2026-06-17 · unverdicted · novelty 7.0

Intrinsic-GS recovers object-level segmentation in 4D Gaussian scenes from intrinsic cues alone via affinity graph and Leiden partitioning, reaching 0.746 mIoU on Neu3D and 0.575 on HyperNeRF without mask supervision.

Recover, Discover, Plan: Learning Skills and Concepts from Robot Failures

cs.RO · 2026-06-16 · unverdicted · novelty 7.0

ReSYNC learns recovery skills via RL, then discovers and refines relational predicates to enable abstract planning that generalizes failure avoidance to unseen long-horizon tasks, outperforming baselines by over 50% in simulation and transferring to real robots.

Human Universal Grasping

cs.RO · 2026-06-15 · unverdicted · novelty 7.0

HUG trains a flow-matching model on a new 1M-frame egocentric human grasp dataset to generate retargetable grasps from single RGB-D images, beating baselines by 23-34% on a new 90-object benchmark.
