pith. sign in

super hub Mixed citations

SAM 3: Segment Anything with Concepts

Mixed citation behavior. Most common role is method (58%).

217 Pith papers citing it
Method 58% of classified citations
abstract

We present Segment Anything Model (SAM) 3, a unified model that detects, segments, and tracks objects in images and videos based on concept prompts, which we define as either short noun phrases (e.g., "yellow school bus"), image exemplars, or a combination of both. Promptable Concept Segmentation (PCS) takes such prompts and returns segmentation masks and unique identities for all matching object instances. To advance PCS, we build a scalable data engine that produces a high-quality dataset with 4M unique concept labels, including hard negatives, across images and videos. Our model consists of an image-level detector and a memory-based video tracker that share a single backbone. Recognition and localization are decoupled with a presence head, which boosts detection accuracy. SAM 3 doubles the accuracy of existing systems in both image and video PCS, and improves previous SAM capabilities on visual segmentation tasks. We open source SAM 3 along with our new Segment Anything with Concepts (SA-Co) benchmark for promptable concept segmentation.

hub tools

citation-role summary

method 29 background 17 baseline 4

citation-polarity summary

claims ledger

  • abstract We present Segment Anything Model (SAM) 3, a unified model that detects, segments, and tracks objects in images and videos based on concept prompts, which we define as either short noun phrases (e.g., "yellow school bus"), image exemplars, or a combination of both. Promptable Concept Segmentation (PCS) takes such prompts and returns segmentation masks and unique identities for all matching object instances. To advance PCS, we build a scalable data engine that produces a high-quality dataset with 4M unique concept labels, including hard negatives, across images and videos. Our model consists of

authors

co-cited works

clear filters

representative citing papers

UnfoldArt: Zero-Shot Recovery of Full Articulated 3D Objects from Text or Image

cs.CV · 2026-06-29 · unverdicted · novelty 7.0 · 2 refs

UnfoldArt uses a two-round structured debate between high-level semantic agents and low-level parameter agents, grounded in generated video, to infer articulation and reconstruct full articulated 3D objects including occluded geometry from text or image inputs.

Mosaic: Compositional Multi-Concept Erasure via Vector Field Blending

cs.CV · 2026-05-25 · unverdicted · novelty 7.0

Mosaic is a framework for compositional multi-concept erasure in flow-based T2I models via spatial vector field blending without extra optimization, evaluated on the new CoME-Bench benchmark covering intra- and cross-category cases.

Towards Camera-Robust 3D Localization: Equation-Anchored Tool-Use for MLLMs

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

Proposes an equation-anchored tool-use method for MLLMs that writes the pinhole back-projection equation in Chain-of-Thought and substitutes retrieved camera intrinsics and depths to achieve robustness in 3D object detection and visual grounding under rescaled intrinsics.

citing papers explorer

Showing 29 of 29 citing papers after filters.