pith. machine review for the scientific record.

arxiv: 2304.02643 · v1 · submitted 2023-04-05 · 💻 cs.CV · cs.AI · cs.LG

Recognition: 3 theorem links

Segment Anything

Alexander C. Berg, Alexander Kirillov, Chloe Rolland, Eric Mintun, Hanzi Mao, Laura Gustafson, Nikhila Ravi, Piotr Dollár, Ross Girshick, Spencer Whitehead, Tete Xiao, Wan-Yen Lo


Pith reviewed 2026-05-11 06:08 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords image segmentation · zero-shot transfer · promptable models · large-scale dataset · foundation models · computer vision · SAM · SA-1B

The pith

A promptable model trained on a billion masks enables zero-shot segmentation that often matches supervised results.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a new task, model, and dataset for image segmentation. Using an efficient version of the model to collect data in a loop, the authors assembled the largest segmentation dataset to date, containing over one billion masks across eleven million images. The resulting model is built to accept prompts such as points or boxes, allowing it to generalize zero-shot to new image distributions and tasks without retraining. Evaluations across many tasks show that this zero-shot performance is often competitive with or better than earlier models trained with full supervision for each specific task. The work releases both the model and dataset to support further research on foundation models for vision.

Core claim

We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date, with over 1 billion masks on 11M images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results.

What carries the argument

The promptable Segment Anything Model (SAM), which takes user-provided prompts such as points, boxes, or coarse masks and outputs object segmentation masks.
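To make the prompt-in, mask-out contract concrete, here is a toy stand-in that accepts the same kinds of prompts (a clicked point and an optional box) and returns a mask. It is classical region growing, not SAM, which instead runs a learned image encoder, prompt encoder, and mask decoder. The function name, the intensity tolerance `tol`, and the (x, y) point and (x0, y0, x1, y1) box conventions are assumptions made for illustration.

```python
from collections import deque
from typing import Optional

Image = list[list[float]]        # grayscale intensities, image[row][col]
Mask = set[tuple[int, int]]      # pixels inside the mask, as (row, col)
Box = tuple[int, int, int, int]  # (x0, y0, x1, y1) corners, inclusive

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def toy_promptable_segment(image: Image, point: tuple[int, int],
                           box: Optional[Box] = None, tol: float = 0.1) -> Mask:
    """Region growing from a clicked point; a hypothetical stand-in, NOT SAM.

    `point` is (x, y), i.e. (col, row). The mask grows over 4-connected pixels
    whose intensity is within `tol` of the clicked pixel and, if a box prompt
    is given, never leaves that box.
    """
    h, w = len(image), len(image[0])
    x, y = point
    seed_value = image[y][x]

    def admissible(r: int, c: int) -> bool:
        if not (0 <= r < h and 0 <= c < w):
            return False
        if box is not None:
            x0, y0, x1, y1 = box
            if not (x0 <= c <= x1 and y0 <= r <= y1):
                return False
        return abs(image[r][c] - seed_value) <= tol

    if not admissible(y, x):
        return set()  # the click falls outside the box prompt
    mask: Mask = {(y, x)}
    frontier = deque([(y, x)])
    while frontier:  # breadth-first flood fill
        r, c = frontier.popleft()
        for dr, dc in NEIGHBOURS:
            nb = (r + dr, c + dc)
            if nb not in mask and admissible(*nb):
                mask.add(nb)
                frontier.append(nb)
    return mask


img = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(len(toy_promptable_segment(img, point=(3, 0))))                    # 4: bright square
print(len(toy_promptable_segment(img, point=(0, 0))))                    # 8: dark L-shape
print(len(toy_promptable_segment(img, point=(0, 0), box=(0, 0, 1, 2))))  # 6: clipped by box
```

The point of the sketch is the interface: the segmenter stays fixed and only the prompt changes, which is what allows a promptable model to be reused across tasks without retraining.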

If this is right

  • The model can be applied directly to new image types and tasks without collecting new labeled data or retraining.
  • Prompt-based interaction becomes a practical way to guide segmentation on arbitrary images.
  • The released dataset supports training or fine-tuning of additional vision models at large scale.
  • Releasing both the model and data lowers the barrier for researchers to experiment with promptable segmentation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The model-in-the-loop data collection process may offer a template for building large datasets in other vision domains where annotation is expensive.
  • Promptable architectures could extend beyond segmentation to tasks like detection or editing with similar zero-shot benefits.
  • Interactive tools built on this model might reduce the need for per-task model training in applied settings such as medical imaging or content creation.
  • Performance on video sequences or 3D data would test whether the promptable property holds across temporal and spatial dimensions.

Load-bearing premise

That the zero-shot results reflect genuine generalization to new distributions rather than overfitting to the self-collected data or the evaluation tasks.

What would settle it

A controlled test on a fresh image domain and segmentation task where the model's zero-shot accuracy falls clearly below a model trained with full supervision on that same task.
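One way to make "falls clearly below" operational: score both models on the same held-out images, take per-image IoU, and bootstrap the paired difference. If the whole 95% interval lies below zero, the gap is unlikely to be resampling noise. A minimal standard-library sketch; the function names are hypothetical, and the scores in the usage lines are synthetic placeholders, not results from the paper.

```python
import random
from statistics import mean

Mask = set[tuple[int, int]]  # pixels inside a mask, as (row, col)


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union; taken as 1.0 when both masks are empty."""
    union = pred | gt
    return 1.0 if not union else len(pred & gt) / len(union)


def paired_bootstrap_ci(a: list[float], b: list[float], n_boot: int = 10_000,
                        alpha: float = 0.05, seed: int = 0) -> tuple[float, float, float]:
    """Mean of per-image differences a - b, with a percentile bootstrap interval.

    Images are resampled with replacement as pairs, so both models are always
    compared on the same images.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    rng = random.Random(seed)
    boot_means = sorted(mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot))
    lo = boot_means[int(alpha / 2 * n_boot)]
    hi = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return mean(diffs), lo, hi


# Synthetic per-image IoUs, for illustration only (not results from the paper).
zero_shot = [0.71, 0.64, 0.80, 0.55, 0.68]
supervised = [0.78, 0.70, 0.83, 0.66, 0.74]
diff, lo, hi = paired_bootstrap_ci(zero_shot, supervised)
print(f"mean IoU difference {diff:+.3f}, 95% CI [{lo:+.3f}, {hi:+.3f}]")
print("clearly below supervised" if hi < 0 else "not clearly below")
```

Resampling whole images rather than pixels matters here: per-image IoUs are the independent units, while pixels within one image are strongly correlated.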

Original abstract

We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at https://segment-anything.com to foster research into foundation models for computer vision.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the Segment Anything (SA) project, comprising a new promptable segmentation task, the Segment Anything Model (SAM), and the SA-1B dataset of over 1 billion masks on 11 million images. The dataset is constructed via a multi-stage data engine that uses an efficient version of the model itself to propose, refine, and automatically generate masks. The central claim is that the resulting promptable model transfers zero-shot to new image distributions and tasks, with performance that is often competitive with or superior to prior fully supervised methods; the model and dataset are released publicly.
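The fully automatic stage of the data engine has to decide which model-generated masks to keep with no human check, which is exactly where the circularity concern below bites. The SAM paper describes keeping masks the model itself scores as confident (a predicted IoU) and that are stable, meaning that thresholding the probability map at 0.5 − δ and 0.5 + δ gives similar masks. A minimal sketch, assuming per-pixel probabilities as nested lists; the function names and threshold values are illustrative assumptions, and the duplicate-removal (non-maximum suppression) step is omitted.

```python
Prob = list[list[float]]  # per-pixel foreground probability, prob[row][col]


def binarize(prob: Prob, t: float) -> set[tuple[int, int]]:
    """Pixels whose probability is strictly above threshold t."""
    return {(r, c) for r, row in enumerate(prob) for c, p in enumerate(row) if p > t}


def stability_score(prob: Prob, delta: float = 0.1) -> float:
    """IoU between the masks obtained at thresholds 0.5 + delta and 0.5 - delta.

    The high-threshold mask is contained in the low-threshold one, so the IoU
    reduces to |high| / |low|.
    """
    high = binarize(prob, 0.5 + delta)
    low = binarize(prob, 0.5 - delta)
    return 1.0 if not low else len(high) / len(low)


def keep_automatic_mask(prob: Prob, predicted_iou: float, iou_thresh: float = 0.88,
                        stability_thresh: float = 0.95, delta: float = 0.1) -> bool:
    """Keep a model-generated mask only if it is both confident and stable."""
    return predicted_iou >= iou_thresh and stability_score(prob, delta) >= stability_thresh


crisp = [[0.95, 0.90, 0.05],
         [0.92, 0.97, 0.02]]
blurry = [[0.95, 0.55, 0.45],
          [0.65, 0.50, 0.30]]
print(stability_score(crisp), stability_score(blurry))  # 1.0 0.4
print(keep_automatic_mask(crisp, predicted_iou=0.90))   # True
print(keep_automatic_mask(blurry, predicted_iou=0.90))  # False: unstable
print(keep_automatic_mask(crisp, predicted_iou=0.50))   # False: not confident
```

Both filters are judgments made by the model about its own output, so the masks that survive are by construction the ones the model is already good at; that is the mechanism behind the referee's request for before/after mask statistics.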

Significance. If the zero-shot generalization results are robust, this work would mark a notable advance toward foundation models in computer vision by demonstrating a single promptable model that can handle diverse segmentation tasks without task-specific training or fine-tuning. The unprecedented scale of SA-1B and the open release of both model and data constitute clear strengths that could enable substantial follow-on research.

major comments (2)
  1. [Data Engine section] The staged collection process (particularly stages 2 and 3) invokes the model to generate and refine masks, so the training distribution is shaped by the model's own inductive biases. This creates a circularity risk that could undermine the zero-shot claim; the manuscript must supply a concrete analysis (e.g., ablation on held-out image sources or comparison of mask statistics before/after the automatic stages) showing that reported gains on external benchmarks are not artifacts of distribution alignment.
  2. [Experiments section] The claim that zero-shot performance is 'often competitive with or even superior to prior fully supervised results' is load-bearing, yet the manuscript provides limited detail on exact metrics, chosen baselines, error bars, and statistical tests across the evaluated tasks. Full per-task tables with variance estimates and explicit comparison protocols are required to substantiate the central performance assertion.
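The first comment's suggested comparison of mask statistics before and after the automatic stages could start as simply as the sketch below: summarize a few shape statistics per stage and compare the distributions. The two statistics (area as a fraction of the image, and a boundary-pixel ratio as a crude complexity proxy), the function names, and the toy rectangular masks are all illustrative assumptions, not the paper's analysis.

```python
from statistics import median

Mask = set[tuple[int, int]]  # pixels inside a mask, as (row, col)
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def area_fraction(mask: Mask, h: int, w: int) -> float:
    """Mask area as a fraction of an h x w image."""
    return len(mask) / (h * w)


def boundary_ratio(mask: Mask) -> float:
    """Share of mask pixels with at least one 4-neighbour outside the mask.

    Thin or ragged masks score near 1.0; large compact blobs score lower.
    """
    if not mask:
        return 0.0
    boundary = sum(
        1 for (r, c) in mask
        if any((r + dr, c + dc) not in mask for dr, dc in NEIGHBOURS)
    )
    return boundary / len(mask)


def summarize(masks: list[Mask], h: int, w: int) -> dict:
    """Median statistics for one stage's masks (assumes a common image size)."""
    return {
        "count": len(masks),
        "median_area_fraction": median(area_fraction(m, h, w) for m in masks),
        "median_boundary_ratio": median(boundary_ratio(m) for m in masks),
    }


def rect(r0: int, c0: int, hh: int, ww: int) -> Mask:
    """Axis-aligned rectangular toy mask."""
    return {(r, c) for r in range(r0, r0 + hh) for c in range(c0, c0 + ww)}


# Toy masks on an 8 x 8 image, standing in for two data-engine stages.
manual_stage = [rect(0, 0, 4, 4), rect(2, 2, 3, 3)]
automatic_stage = [rect(0, 0, 1, 8), rect(5, 5, 2, 2)]
for name, masks in (("manual", manual_stage), ("automatic", automatic_stage)):
    print(name, summarize(masks, 8, 8))
```

A real analysis would compare full distributions (histograms or quantiles) over many images rather than two medians, but a systematic shift in such summaries between stages is the kind of signal the referee is asking the authors to rule out.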
minor comments (2)
  1. [Abstract] The phrase 'numerous tasks' is vague; a short parenthetical list of the primary evaluation benchmarks would improve immediate clarity.
  2. [Model section] Notation: The distinction between the 'efficient' model used in the data engine and the final SAM should be introduced with explicit symbols or subsection headings to avoid reader confusion in later sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the data engine and experimental reporting. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation of our results.

  1. Referee: [Data Engine section] The staged collection process (particularly stages 2 and 3) invokes the model to generate and refine masks, so the training distribution is shaped by the model's own inductive biases. This creates a circularity risk that could undermine the zero-shot claim; the manuscript must supply a concrete analysis (e.g., ablation on held-out image sources or comparison of mask statistics before/after the automatic stages) showing that reported gains on external benchmarks are not artifacts of distribution alignment.

    Authors: We acknowledge the potential circularity concern arising from the model's role in stages 2 and 3 of the data engine. However, the zero-shot evaluations use entirely external benchmarks and image distributions that were never seen during data collection or training. To directly address this, we will add a new analysis subsection in the revised manuscript that includes: (1) mask statistic comparisons (e.g., size, complexity, and diversity metrics) before and after the automatic stages, and (2) performance ablations on held-out image sources excluded from the data engine. These additions will demonstrate that the reported zero-shot gains on external tasks are not artifacts of distribution alignment. revision: yes

  2. Referee: [Experiments section] The claim that zero-shot performance is 'often competitive with or even superior to prior fully supervised results' is load-bearing, yet the manuscript provides limited detail on exact metrics, chosen baselines, error bars, and statistical tests across the evaluated tasks. Full per-task tables with variance estimates and explicit comparison protocols are required to substantiate the central performance assertion.

    Authors: We agree that the central performance claim requires more granular reporting for full substantiation. In the revised manuscript, we will expand the Experiments section with complete per-task tables that include exact metrics for all evaluated tasks, the specific baselines used, error bars or variance estimates (from multiple seeds or cross-validation where feasible), and results of statistical significance tests. We will also add explicit descriptions of the comparison protocols, including how prompts were generated and how zero-shot transfer was measured against fully supervised methods. revision: yes
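The promised significance tests could take many forms. For per-image scores of two models on a shared test set, a paired sign-flip (permutation) test is one simple option that makes no normality assumption. A minimal sketch; the function name and the synthetic scores are hypothetical, and a paired t-test or Wilcoxon signed-rank test would be equally reasonable choices.

```python
import random
from statistics import mean


def paired_sign_flip_test(a: list[float], b: list[float], n_perm: int = 10_000,
                          seed: int = 0) -> float:
    """Two-sided Monte Carlo p-value for H0: mean(a - b) == 0.

    Under H0 each per-image difference is equally likely to carry either sign,
    so signs are flipped at random and we count how often the flipped mean is
    at least as extreme as the observed one.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed - 1e-12:  # tolerance for float ties
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # add-one keeps the estimate above zero


# Synthetic per-image scores for one task, for illustration only.
zero_shot = [0.71, 0.64, 0.80, 0.55, 0.68]
supervised = [0.78, 0.70, 0.83, 0.66, 0.74]
print(round(paired_sign_flip_test(zero_shot, supervised), 3))
```

With only five paired images, even a difference of constant sign cannot reach p < 0.05: the exact two-sided p-value is at best 2/32 = 0.0625, and the Monte Carlo estimate lands near it. That is one concrete reason per-task tables need to report how many images each comparison rests on.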

Circularity Check

0 steps flagged

No circularity: empirical zero-shot evaluations remain independent of data engine


The paper's core claim is empirical: a promptable model trained on SA-1B achieves competitive zero-shot results on external tasks and image distributions. The data engine (staged collection using an efficient model variant) is a practical annotation procedure whose outputs are then used for supervised training; the reported evaluations use separate benchmarks whose ground truth and image sources are not generated by the same loop. No equation, uniqueness theorem, or prediction reduces by construction to a fitted parameter or self-generated input. Self-citations are absent from the load-bearing steps, and the architecture's promptability is justified by design and training rather than by renaming prior results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The abstract provides no explicit free parameters, axioms, or invented entities beyond standard deep-learning training assumptions and the effectiveness of the promptable design.

axioms (1)
  • domain assumption: standard deep-learning assumptions about generalization from large-scale training data
    The zero-shot transfer claim rests on the model learning general segmentation capabilities from the collected data.

pith-pipeline@v0.9.0 · 5466 in / 1071 out tokens · 36264 ms · 2026-05-11T06:08:52.324361+00:00 · methodology


Forward citations

Cited by 58 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting

    cs.CV 2026-05 conditional novelty 8.0

    Current CAC models often count the wrong objects because they misalign text prompts with visual content, as demonstrated by new negative-label and distractor tests on the MUCCA dataset.

  2. MedCore: Boundary-Preserving Medical Core Pruning for MedSAM

    cs.CV 2026-05 unverdicted novelty 7.0

    MedCore achieves 60% parameter and 58.4% FLOP reduction on MedSAM with Dice 0.9549 and preserved boundary metrics via dual-intervention pruning and a new boundary leverage principle.

  3. Qwen3-VL-Seg: Unlocking Open-World Referring Segmentation with Vision-Language Grounding

    cs.CV 2026-05 unverdicted novelty 7.0

    Qwen3-VL-Seg decodes MLLM bounding boxes into pixel-level referring segmentation via a lightweight box-guided mask decoder, new SA1B-ORS training data, and ORS-Bench evaluation, showing strong open-world performance.

  4. OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation

    cs.RO 2026-05 unverdicted novelty 7.0

    OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.

  5. Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures

    cs.CV 2026-05 unverdicted novelty 7.0

    HeadsUp maps multi-view captures to UV-parameterized 3D Gaussians on a template via an encoder-decoder, achieving state-of-the-art quality and generalization after training on more than 10,000 subjects.

  6. Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting

    cs.CV 2026-05 unverdicted novelty 7.0

    Text-guided class-agnostic counting models exhibit significant weaknesses in grounding textual prompts to visual objects, as demonstrated by new negative-label and distractor tests on a multi-category dataset.

  7. IMPACT-CYCLE: A Contract-Based Multi-Agent System for Claim-Level Supervisory Correction of Long-Video Semantic Memory

    cs.CV 2026-04 unverdicted novelty 7.0

    A contract-based multi-agent system maintains a claim-level semantic memory for long videos, enabling targeted corrections that raise VQA accuracy from 0.71 to 0.79 and cut human arbitration cost by 4.8x on VidOR.

  8. A 3D SAM-Based Progressive Prompting Framework for Multi-Task Segmentation of Radiotherapy-induced Normal Tissue Injuries in Limited-Data Settings

    cs.CV 2026-04 unverdicted novelty 7.0

    A progressive prompting framework on 3D SAM with text, dose-box, and click prompts plus small-target loss achieves reliable multi-task segmentation of osteoradionecrosis, cerebral edema, and cerebral radiation necrosi...

  9. Seg2Change: Adapting Open-Vocabulary Semantic Segmentation Model for Remote Sensing Change Detection

    cs.CV 2026-04 conditional novelty 7.0

    Seg2Change adapts open-vocabulary segmentation models to open-vocabulary change detection via a category-agnostic change head and new dataset CA-CDD, delivering +9.52 IoU on WHU-CD and +5.50 mIoU on SECOND.

  10. Off-the-shelf Vision Models Benefit Image Manipulation Localization

    cs.CV 2026-04 unverdicted novelty 7.0

    ReVi adapter enables off-the-shelf vision models to localize image manipulations by separating and enhancing manipulation cues from semantic features without full model retraining.

  11. Training a Student Expert via Semi-Supervised Foundation Model Distillation

    cs.CV 2026-04 conditional novelty 7.0

    A semi-supervised framework distills vision foundation models into compact instance segmentation experts that outperform their teachers by up to 11.9 AP on Cityscapes and 8.6 AP on ADE20K while being 11 times smaller.

  12. SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension

    cs.CL 2023-07 unverdicted novelty 7.0

    SEED-Bench is a new benchmark of 19K multiple-choice questions for evaluating generative comprehension in multimodal LLMs across 12 image and video dimensions.

  13. VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

    cs.RO 2023-07 unverdicted novelty 7.0

    VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.

  14. LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

    cs.CV 2023-03 conditional novelty 7.0

    LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.

  15. ASIP-Planner: Adaptive Planning for UAV Surface Inspection in Partially Known Indoor Environments

    cs.RO 2026-05 unverdicted novelty 6.0

    ASIP-Planner achieves near-complete surface coverage and shorter trajectories in partially known indoor environments by clustering inspection targets globally and adapting viewing angles locally to handle occlusions.

  16. HeteroGenManip: Generalizable Manipulation For Heterogeneous Object Interactions

    cs.RO 2026-05 unverdicted novelty 6.0

    HeteroGenManip decouples grasp localization from interaction planning using task-conditioned foundation models and multi-model diffusion policies, delivering 31% average gains in broad simulation tasks and 36.7% in fo...

  17. HeteroGenManip: Generalizable Manipulation For Heterogeneous Object Interactions

    cs.RO 2026-05 unverdicted novelty 6.0

    A task-conditioned two-stage system decouples grasp localization from interaction trajectory planning using specialized foundation models to improve generalization across heterogeneous object types.

  18. Few-Click-Driven Interactive 3D Segmentation with Semantic Embedding

    cs.CV 2026-05 unverdicted novelty 6.0

    A point-Transformer interactive 3D instance segmentation model handles multiple clicks jointly in one pass and reports over 20% mIoU gains versus baselines plus 8-10% cross-dataset improvement for one-click-per-instan...

  19. Leveraging Image Generators to Address Training Data Scarcity: The Gen4Regen Dataset for Forest Regeneration Mapping

    cs.CV 2026-05 conditional novelty 6.0

    Mixing real UAV imagery with 2101 AI-generated image-mask pairs improves semantic segmentation F1 scores for fine-grained forest species by over 15 percentage points overall and up to 30 points for rare classes.

  20. YOTOnet: Zero-Shot Cross-Domain Fault Diagnosis via Domain-Conditioned Mixture of Experts

    cs.LG 2026-05 unverdicted novelty 6.0

    YOTOnet achieves improved zero-shot cross-domain fault diagnosis on bearing datasets by combining a physics-aware invariant feature distiller with domain-conditioned sparse experts, showing performance scaling as more...

  21. Approaching human parity in the quality of automated organoid image segmentation

    cs.CV 2026-05 conditional novelty 6.0

    A composite SAM-based method segments organoid images with accuracy matching or approaching inter-observer variability among human annotators.

  22. Learning Equivariant Neural-Augmented Object Dynamics From Few Interactions

    cs.RO 2026-05 unverdicted novelty 6.0

    PIEGraph augments a spring-mass particle model with an equivariant GNN and novel action representation to predict accurate object dynamics for robotic manipulation from few interactions.

  23. GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot Learning

    cs.RO 2026-04 unverdicted novelty 6.0

    GS-Playground delivers a high-throughput photorealistic simulator for vision-informed robot learning via parallel physics integrated with batch 3D Gaussian Splatting at 10^4 FPS and an automated Real2Sim workflow for ...

  24. DiffuSAM: Diffusion-Based Prompt-Free SAM2 for Few-Shot and Source-Free Medical Image Segmentation

    cs.CV 2026-04 unverdicted novelty 6.0

    DiffuSAM synthesizes SAM2-compatible mask embeddings via a diffusion prior conditioned on prior slices to enable accurate prompt-free medical image segmentation under SF-UDA and few-shot settings.

  25. AgentLens: Adaptive Visual Modalities for Human-Agent Interaction in Mobile GUI Agents

    cs.HC 2026-04 unverdicted novelty 6.0

    AgentLens adaptively deploys Full UI, Partial UI, and GenUI modalities with virtual display overlays for mobile GUI agents, yielding 85.7% user preference and best-in-study usability in a 21-participant evaluation.

  26. SpaceDex: Generalizable Dexterous Grasping in Tiered Workspaces

    cs.RO 2026-04 unverdicted novelty 6.0

    SpaceDex achieves 63% success grasping unseen objects in tiered workspaces via VLM spatial planning and arm-hand feature separation, beating a 39% tabletop baseline in 100 real trials.

  27. Chain Of Interaction Benchmark (COIN): When Reasoning meets Embodied Interaction

    cs.RO 2026-04 unverdicted novelty 6.0

    COIN provides 50 interactive robotic tasks, a 1000-demonstration dataset collected via AR teleoperation, and metrics showing that CodeAsPolicy, VLA, and H-VLA models fail at causally-dependent interactive reasoning du...

  28. One-Shot Cross-Geometry Skill Transfer through Part Decomposition

    cs.RO 2026-04 unverdicted novelty 6.0

    Part decomposition with generative shape models allows one-shot robot skill transfer across unfamiliar object geometries in simulation and real settings.

  29. From Boundaries to Semantics: Prompt-Guided Multi-Task Learning for Petrographic Thin-section Segmentation

    cs.CV 2026-04 unverdicted novelty 6.0

    Petro-SAM adapts SAM via a Merge Block for polarized views plus multi-scale fusion and color-entropy priors to jointly achieve grain-edge and lithology segmentation in petrographic images.

  30. Granularity-Aware Transfer for Tree Instance Segmentation in Synthetic and Real Forests

    cs.CV 2026-04 unverdicted novelty 6.0

    Granularity-aware distillation improves tree instance segmentation accuracy on real forest images by merging logits and unifying masks from fine-grained synthetic teachers despite coarse real labels.

  31. GTPBD-MM: A Global Terraced Parcel and Boundary Dataset with Multi-Modality

    cs.CV 2026-04 unverdicted novelty 6.0

    GTPBD-MM is the first multimodal benchmark for global terraced parcel extraction, integrating image, text, and DEM data with experiments showing that textual and terrain cues improve delineation accuracy over image-on...

  32. Self-supervised Pretraining of Cell Segmentation Models

    cs.CV 2026-04 unverdicted novelty 6.0

    DINOCell achieves a SEG score of 0.784 on LIVECell by self-supervised domain adaptation of DINOv2, improving 10.42% over SAM-based models and showing strong zero-shot transfer.

  33. Text-Guided 6D Object Pose Rearrangement via Closed-Loop VLM Agents

    cs.CV 2026-04 unverdicted novelty 6.0

    Closed-loop VLM agents using multi-view reasoning, object-centered visualization, and single-axis rotation prediction achieve superior text-guided 6D pose rearrangement for target objects in scenes.

  34. GESS: Multi-cue Guided Local Feature Learning via Geometric and Semantic Synergy

    cs.CV 2026-04 unverdicted novelty 6.0

    GESS introduces joint semantic-normal and depth stability prediction heads, the SDAK keypoint mechanism, and the UTCF descriptor fusion module to leverage multi-cue synergy for improved robustness and discriminability.

  35. Moondream Segmentation: From Words to Masks

    cs.CV 2026-04 unverdicted novelty 6.0

    Moondream Segmentation achieves 80.2% cIoU on RefCOCO by autoregressively decoding paths from referring expressions and using RL to refine masks, plus releases a cleaned RefCOCO-M dataset.

  36. DeepSeek-OCR: Contexts Optical Compression

    cs.CV 2025-10 unverdicted novelty 6.0

    DeepSeek-OCR compresses text contexts up to 20x via 2D optical mapping while achieving 97% OCR accuracy below 10x and 60% at 20x, outperforming prior OCR tools with fewer vision tokens.

  37. InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

    cs.RO 2025-10 unverdicted novelty 6.0

    InternVLA-M1 uses spatially guided pre-training on 2.3M examples followed by action post-training to deliver up to 17% gains on robot manipulation benchmarks and 20.6% on unseen objects.

  38. DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    cs.RO 2024-03 accept novelty 6.0

    DROID is a new 76k-trajectory in-the-wild robot manipulation dataset spanning 564 scenes and 84 tasks that improves policy performance and generalization when used for training.

  39. Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

    cs.CV 2024-01 unverdicted novelty 6.0

    Grounded SAM integrates Grounding DINO and SAM to support text-prompted open-world detection and segmentation, achieving 48.7 mean AP on SegInW zero-shot with the base detector and huge segmenter.

  40. ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

    cs.CV 2023-11 conditional novelty 6.0

    A new 1.2M-caption dataset generated via GPT-4V improves LMMs on MME and MMBench by 222.8/22.0/22.3 and 2.7/1.3/1.5 points respectively when used for supervised fine-tuning.

  41. TD-MPC2: Scalable, Robust World Models for Continuous Control

    cs.LG 2023-10 conditional novelty 6.0

    TD-MPC2 scales an implicit world-model RL method to a 317M-parameter agent that masters 80 tasks across four domains with a single hyperparameter configuration.

  42. CrackMorph-XAI-Net: A Topology-Preserving and Explainable Framework for Automated Crack Morphology

    math.GM 2026-05 unverdicted novelty 5.0

    CrackMorph-XAI-Net extracts crack skeletons with Dice 0.991 and topology preservation in 98.5% of cases, detects junctions with F1 0.887, and computes morphology descriptors with correlations above 0.95 on an extended...

  43. Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures

    cs.CV 2026-05 unverdicted novelty 5.0

    Pith review generated a malformed one-line summary.

  44. FUS3DMaps: Scalable and Accurate Open-Vocabulary Semantic Mapping by 3D Fusion of Voxel- and Instance-Level Layers

    cs.RO 2026-05 unverdicted novelty 5.0

    FUS3DMaps fuses voxel- and instance-level open-vocabulary layers inside a shared 3D voxel map to improve both layers and enable scalable accurate semantic mapping.

  45. CoAX: Cognitive-Oriented Attribution eXplanation User Model of Human Understanding of AI Explanations

    cs.AI 2026-04 unverdicted novelty 5.0

    Cognitive models of user reasoning strategies with XAI methods on tabular data fit human forward-simulation decisions better than ML baselines and support hypothesis testing without new user studies.

  46. Semantic Foam: Unifying Spatial and Semantic Scene Decomposition

    cs.CV 2026-04 unverdicted novelty 5.0

    Semantic Foam unifies spatial Voronoi decomposition with cell-level semantic features to achieve superior object segmentation by enabling direct spatial regularization that avoids occlusion and view-inconsistency artifacts.

  47. DiffuSAM: Diffusion Guided Zero-Shot Object Grounding for Remote Sensing Imagery

    cs.CV 2026-04 unverdicted novelty 5.0

    DiffuSAM fuses diffusion-based localization cues with SAM models to deliver over 14% higher Acc@0.5 in zero-shot object grounding for remote sensing imagery compared to prior methods.

  48. SGP-SAM: Self-Gated Prompting for Transferring 3D Segment Anything Models to Lesion Segmentation

    cs.CV 2026-04 unverdicted novelty 5.0

    SGP-SAM transfers 3D SAM to lesion segmentation using a self-gated module for conditional multi-scale enhancement and a Zoom Loss, achieving 7.3% mDice gain over fine-tuning on MSD Liver Tumor data.

  49. STEP-Parts: Geometric Partitioning of Boundary Representations for Large-Scale CAD Processing

    cs.GR 2026-04 unverdicted novelty 5.0

    STEP-Parts produces tessellation-robust geometric part labels from STEP B-Reps by deterministic merging of same-primitive faces, enabling consistent supervision on 180k+ models.

  50. MyoVision: A Mobile Research Tool and NEATBoost-Attention Ensemble Framework for Real Time Chicken Breast Myopathy Detection

    cs.LG 2026-04 unverdicted novelty 5.0

    Smartphone transillumination imaging paired with a neuroevolution-tuned ensemble model classifies chicken breast myopathies at 82.4% accuracy on 336 fillets, matching costly hyperspectral systems.

  51. Robotic Nanoparticle Synthesis via Solution-based Processes

    cs.RO 2026-04 unverdicted novelty 5.0

    Screw-based motion planning extracted from single demonstrations enables robots to autonomously execute long-horizon nanoparticle synthesis protocols.

  52. FF3R: Feedforward Feature 3D Reconstruction from Unconstrained views

    cs.CV 2026-04 unverdicted novelty 5.0

    FF3R unifies geometric and semantic 3D reconstruction in a single annotation-free feed-forward network trained solely via RGB and feature rendering supervision.

  53. F3G-Avatar : Face Focused Full-body Gaussian Avatar

    cs.CV 2026-04 unverdicted novelty 5.0

    F3G-Avatar improves full-body Gaussian avatars by adding a dedicated face-focused deformation branch to better preserve facial geometry and expressions from multi-view RGB video.

  54. Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey

    cs.CV 2026-05 unverdicted novelty 4.0

    A comprehensive survey of edge deep learning in computer vision and medical diagnostics that presents a novel categorization of hardware platforms by performance and usage scenarios.

  55. Label-Efficient School Detection from Aerial Imagery via Weakly Supervised Pretraining and Fine-Tuning

    cs.CV 2026-05 unverdicted novelty 4.0

    A two-stage weakly supervised pipeline pretrains on auto-generated school labels from sparse points and fine-tunes on only 50 manual examples to achieve strong detection performance in aerial imagery.

  56. AMO-ENE: Attention-based Multi-Omics Fusion Model for Outcome Prediction in Extra Nodal Extension and HPV-associated Oropharyngeal Cancer

    eess.IV 2026-04 unverdicted novelty 4.0

    An attention-based fusion model combining semi-supervised CT segmentation, radiomics, and clinical features predicts metastatic recurrence, overall survival, and disease-free survival in HPV+ oropharyngeal cancer with...

  57. DeepSeek-VL: Towards Real-World Vision-Language Understanding

    cs.AI 2024-03 unverdicted novelty 4.0

    DeepSeek-VL develops open-source 1.3B and 7B vision-language models that achieve competitive or state-of-the-art results on real-world visual-language benchmarks through diverse data curation, a hybrid vision encoder,...

  58. Semantic-Fast-SAM: Efficient Semantic Segmenter

    cs.CV 2026-04 unverdicted novelty 3.0

    Semantic-Fast-SAM matches prior SAM-based semantic segmentation accuracy on Cityscapes and ADE20K while running about 20 times faster by combining FastSAM with SSA labeling and CLIP for open-vocabulary cases.

Reference graph

Works this paper leans on

196 extracted references · 196 canonical work pages · cited by 55 Pith papers · 9 internal anchors

  1. [1]

    On seeing stuff: the perception of materials by humans and machines

    Edward H Adelson. On seeing stuff: the perception of materials by humans and machines. Human vision and electronic imaging VI,

  2. [2]

    What is an object?

    Bogdan Alexe, Thomas Deselaers, and Vittorio Ferrari. What is an object? CVPR, 2010.

  3. [3]

    Contour detection and hierarchical image segmentation

    Pablo Arbeláez, Michael Maire, Charless Fowlkes, and Jitendra Malik. Contour detection and hierarchical image segmentation. TPAMI, 2010.

  4. [4]

    Layer Normalization

    Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv:1607.06450, 2016.

  5. [5]

    BEiT: BERT Pre-Training of Image Transformers

    Hangbo Bao, Li Dong, and Furu Wei. BEiT: BERT pre-training of image transformers. arXiv:2106.08254, 2021. 17

  6. [6]

    ZeroWaste dataset: Towards deformable object segmentation in cluttered scenes

    Dina Bashkirova, Mohamed Abdelfattah, Ziliang Zhu, James Akl, Fadi Alladkani, Ping Hu, Vitaly Ablavsky, Berk Calli, Sarah Adel Bargal, and Kate Saenko. ZeroWaste dataset: Towards deformable object segmentation in cluttered scenes. CVPR, 2022. 9, 20

  7. [7]

    Stuart Berg, Dominik Kutra, Thorben Kroeger, Christoph N. Straehle, Bernhard X. Kausler, Carsten Haubold, Martin Schiegg, Janez Ales, Thorsten Beier, Markus Rudy, Kemal Eren, Jaime I. Cervantes, Buote Xu, Fynn Beuttenmueller, Adrian Wolny, Chong Zhang, Ullrich Koethe, Fred A. Hamprecht, and Anna Kreshuk. ilastik: interactive machine learning for (bio)imag...

  8. [8]

    On the Opportunities and Risks of Foundation Models

    Rishi Bommasani, Drew A Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et al. On the opportunities and risks of foundation models. arXiv:2108.07258, 2021. 1, 12

  9. [9]

    Iterative interaction training for segmentation editing networks

    Gustav Bredell, Christine Tanner, and Ender Konukoglu. Iterative interaction training for segmentation editing networks. MICCAI,

  10. [10]

    Language models are few-shot learners

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott G...

  11. [11]

    Cascade R-CNN: Delving into high quality object detection

    Zhaowei Cai and Nuno Vasconcelos. Cascade R-CNN: Delving into high quality object detection. CVPR, 2018. 10

  12. [12]

    Nucleus segmentation across imaging experiments: the 2018 data science bowl

    Juan C. Caicedo, Allen Goodman, Kyle W. Karhohs, Beth A. Cimini, Jeanelle Ackerman, Marzieh Haghighi, CherKeng Heng, Tim Becker, Minh Doan, Claire McQuin, Mohammad Rohban, Shantanu Singh, and Anne E. Carpenter. Nucleus segmentation across imaging experiments: the 2018 data science bowl. Nature Methods,

  13. [13]

    A computational approach to edge detection

    John Canny. A computational approach to edge detection. TPAMI,

  14. [14]

    End-to-end object detection with Transformers

    Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with Transformers. ECCV, 2020. 5, 16, 17

  15. [15]

    Automatic image colorization via multimodal predictions

    Guillaume Charpiat, Matthias Hofmann, and Bernhard Schölkopf. Automatic image colorization via multimodal predictions. ECCV,

  16. [16]

    Object-proposal evaluation protocol is 'gameable'

    Neelima Chavali, Harsh Agrawal, Aroma Mahendru, and Dhruv Batra. Object-proposal evaluation protocol is 'gameable'. CVPR,

  17. [17]

    3D instance segmentation of MVS buildings

    Jiazhou Chen, Yanghui Xu, Shufang Lu, Ronghua Liang, and Liangliang Nan. 3D instance segmentation of MVS buildings. IEEE Transactions on Geoscience and Remote Sensing, 2022. 9, 19, 20, 23, 24

  18. [18]

    FocalClick: towards practical interactive image segmentation

    Xi Chen, Zhiyan Zhao, Yilei Zhang, Manni Duan, Donglian Qi, and Hengshuang Zhao. FocalClick: towards practical interactive image segmentation. CVPR, 2022. 8, 9, 12, 19

  19. [19]

    Masked-attention mask transformer for universal image segmentation

    Bowen Cheng, Ishan Misra, Alexander G Schwing, Alexander Kirillov, and Rohit Girdhar. Masked-attention mask transformer for universal image segmentation. CVPR, 2022. 4

  20. [20]

    Per-pixel classification is not all you need for semantic segmentation

    Bowen Cheng, Alex Schwing, and Alexander Kirillov. Per-pixel classification is not all you need for semantic segmentation. NeurIPS, 2021. 5, 16, 17

  21. [21]

    PaLM: Scaling Language Modeling with Pathways

    Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. PaLM: Scaling language modeling with pathways. arXiv:2204.02311, 2022. 1

  22. [22]

    Domain adaptation for traffic density estimation

    Luca Ciampi, Carlos Santiago, Joao Costeira, Claudio Gennaro, and Giuseppe Amato. Domain adaptation for traffic density estimation. International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2021. 9, 20

  23. [23]

    Luca Ciampi, Carlos Santiago, Joao Costeira, Claudio Gennaro, and Giuseppe Amato. Night and day instance segmented park (NDISPark) dataset: a collection of images taken by day and by night for vehicle detection, segmentation and counting in parking areas. Zenodo, 2022. 9, 20

  24. [24]

    Semantic segmentation in art paintings

    Nadav Cohen, Yael Newman, and Ariel Shamir. Semantic segmentation in art paintings. Computer Graphics Forum, 2022. 9, 19, 20, 23, 24

  25. [25]

    The Cityscapes dataset for semantic urban scene understanding

    Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes dataset for semantic urban scene understanding. CVPR, 2016. 9, 19, 20

  26. [26]

    Learning parameterized skills

    Bruno da Silva, George Konidaris, and Andrew Barto. Learning parameterized skills. ICML, 2012. 4

  27. [27]

    Rescaling egocentric vision: Collection, pipeline and challenges for EPIC-KITCHENS-100

    Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Jian Ma, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, and Michael Wray. Rescaling egocentric vision: Collection, pipeline and challenges for EPIC-KITCHENS-100. IJCV, 2022. 9, 20, 23, 24

  28. [28]

    EPIC-KITCHENS VISOR benchmark: Video segmentations and object relations

    Ahmad Darkhalil, Dandan Shan, Bin Zhu, Jian Ma, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, and Dima Damen. EPIC-KITCHENS VISOR benchmark: Video segmentations and object relations. NeurIPS, 2022. 9, 19, 20, 23, 24

  29. [29]

    Does object recognition work for everyone?

    Terrance De Vries, Ishan Misra, Changhan Wang, and Laurens Van der Maaten. Does object recognition work for everyone? CVPR workshops, 2019. 18

  30. [30]

    CrowdWorkSheets: Accounting for individual and collective identities underlying crowdsourced dataset annotation

    Mark Díaz, Ian Kivlichan, Rachel Rosen, Dylan Baker, Razvan Amironesei, Vinodkumar Prabhakaran, and Emily Denton. CrowdWorkSheets: Accounting for individual and collective identities underlying crowdsourced dataset annotation. ACM Conference on Fairness, Accountability, and Transparency, 2022. 25

  31. [31]

    PhraseClick: toward achieving flexible interactive segmentation by phrase and click

    Henghui Ding, Scott Cohen, Brian Price, and Xudong Jiang. PhraseClick: toward achieving flexible interactive segmentation by phrase and click. ECCV, 2020. 11

  32. [32]

    Fast edge detection using structured forests

    Piotr Dollár and C Lawrence Zitnick. Fast edge detection using structured forests. TPAMI, 2014. 21

  33. [33]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. ICLR, 2021. 5, 8, 16

  34. [34]

    Alireza Fathi, Xiaofeng Ren, and James M. Rehg. Learning to rec- ognize objects in egocentric activities. CVPR, 2011. 9, 19, 20

  35. [35]

    Efficient graph-based image segmentation

    Pedro F Felzenszwalb and Daniel P Huttenlocher. Efficient graph-based image segmentation. IJCV, 2004. 10

  36. [36]

    The validity and practicality of sun-reactive skin types I through VI

    Thomas B. Fitzpatrick. The validity and practicality of sun-reactive skin types I through VI. Archives of Dermatology, 1988. 8

  37. [37]

    Getting to 99% accuracy in interactive segmentation

    Marco Forte, Brian Price, Scott Cohen, Ning Xu, and François Pitié. Getting to 99% accuracy in interactive segmentation. arXiv:2003.07932, 2020. 5, 17

  38. [38]

    Instance segmentation for autonomous log grasping in forestry operations

    Jean-Michel Fortin, Olivier Gamache, Vincent Grondin, François Pomerleau, and Philippe Giguère. Instance segmentation for autonomous log grasping in forestry operations. IROS, 2022. 9, 20

  39. [39]

    Datasheets for datasets

    Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. Datasheets for datasets. Communications of the ACM,

  40. [40]

    Simple copy-paste is a strong data augmentation method for instance segmentation

    Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D Cubuk, Quoc V Le, and Barret Zoph. Simple copy-paste is a strong data augmentation method for instance segmentation. CVPR,

  41. [41]

    Rich feature hierarchies for accurate object detection and semantic segmentation

    Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR, 2014. 10

  42. [42]

    Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

    Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv:1706.02677, 2017. 17

  43. [43]

    Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent C...

  44. [44]

    LVIS: A dataset for large vocabulary instance segmentation

    Agrim Gupta, Piotr Dollar, and Ross Girshick. LVIS: A dataset for large vocabulary instance segmentation. CVPR, 2019. 2, 6, 7, 9, 10, 11, 19, 20, 21, 24

  45. [45]

    Multiple choice learning: Learning to produce multiple structured outputs

    Abner Guzman-Rivera, Dhruv Batra, and Pushmeet Kohli. Multiple choice learning: Learning to produce multiple structured outputs. NeurIPS, 2012. 5, 17

  46. [46]

    SOCRATES: Introducing depth in visual wildlife monitoring using stereo vision

    Timm Haucke, Hjalmar S. Kühl, and Volker Steinhage. SOCRATES: Introducing depth in visual wildlife monitoring using stereo vision. Sensors, 2022. 9, 20

  47. [47]

    Masked autoencoders are scalable vision learners

    Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. CVPR, 2022. 5, 8, 12, 16, 17

  48. [48]

    Mask R-CNN

    Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. ICCV, 2017. 10

  49. [49]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CVPR, 2016. 16

  50. [50]

    Gaussian Error Linear Units (GELUs)

    Dan Hendrycks and Kevin Gimpel. Gaussian error linear units (GELUs). arXiv:1606.08415, 2016. 16

  51. [51]

    Training Compute-Optimal Large Language Models

    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models. arXiv:2203.15556, 2022. 1

  52. [52]

    TrashCan: A semantically-segmented dataset towards visual detection of marine debris

    Jungseok Hong, Michael Fulton, and Junaed Sattar. TrashCan: A semantically-segmented dataset towards visual detection of marine debris. arXiv:2007.08097, 2020. 9, 19, 20

  53. [53]

    Deep networks with stochastic depth

    Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q Weinberger. Deep networks with stochastic depth. ECCV, 2016. 17

  54. [54]

    OneFormer: One transformer to rule universal image segmentation

    Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, and Humphrey Shi. OneFormer: One transformer to rule universal image segmentation. arXiv:2211.06220, 2022. 4

  55. [55]

    Scaling up visual and vision-language representation learning with noisy text supervision

    Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. Scaling up visual and vision-language representation learning with noisy text supervision. ICML, 2021. 1

  56. [56]

    Scaling Laws for Neural Language Models

    Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv:2001.08361, 2020. 1

  57. [57]

    Snakes: Active contour models

    Michael Kass, Andrew Witkin, and Demetri Terzopoulos. Snakes: Active contour models. IJCV, 1988. 4

  58. [58]

    Learning open-world object proposals without learning to classify

    Dahun Kim, Tsung-Yi Lin, Anelia Angelova, In So Kweon, and Weicheng Kuo. Learning open-world object proposals without learning to classify. IEEE Robotics and Automation Letters, 2022. 21

  59. [59]

    Panoptic segmentation

    Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár. Panoptic segmentation. CVPR, 2019. 4

  60. [60]

    The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale

    Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Alexander Kolesnikov, Tom Duerig, and Vittorio Ferrari. The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. IJCV, 2020. 2, 6, 7, 18, 19

  61. [61]

    Quantifying the carbon emissions of machine learning

    Alexandre Lacoste, Alexandra Luccioni, Victor Schmidt, and Thomas Dandres. Quantifying the carbon emissions of machine learning. arXiv:1910.09700, 2019. 28

  62. [62]

    Exploring plain vision transformer backbones for object detection

    Yanghao Li, Hanzi Mao, Ross Girshick, and Kaiming He. Exploring plain vision transformer backbones for object detection. ECCV,

  63. [63]

    5, 10, 11, 16, 21, 23, 24

  64. [64]

    Yin Li, Zhefan Ye, and James M. Rehg. Delving into egocentric actions. CVPR, 2015. 9, 20

  65. [65]

    Interactive image segmentation with latent diversity

    Zhuwen Li, Qifeng Chen, and Vladlen Koltun. Interactive image segmentation with latent diversity. CVPR, 2018. 5, 17, 19

  66. [66]

    Focal loss for dense object detection

    Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. ICCV, 2017. 5, 17

  67. [67]

    Microsoft COCO: Common objects in context

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. ECCV, 2014. 2, 4, 6, 7, 11, 18, 19, 20

  68. [68]

    SimpleClick: Interactive image segmentation with simple vision transformers

    Qin Liu, Zhenlin Xu, Gedas Bertasius, and Marc Niethammer. SimpleClick: Interactive image segmentation with simple vision transformers. arXiv:2210.11006, 2022. 8, 9, 12, 19

  69. [69]

    Decoupled weight decay regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. ICLR, 2019. 17

  70. [70]

    Gelatinous zooplankton biomass in the global oceans: geographic variation and environmental drivers

    Cathy H Lucas, Daniel OB Jones, Catherine J Hollyhead, Robert H Condon, Carlos M Duarte, William M Graham, Kelly L Robinson, Kylie A Pitt, Mark Schildhauer, and Jim Regetz. Gelatinous zooplankton biomass in the global oceans: geographic variation and environmental drivers. Global Ecology and Biogeography, 2014. 20

  71. [71]

    Iteratively trained interactive segmentation

    Sabarinath Mahadevan, Paul Voigtlaender, and Bastian Leibe. Iteratively trained interactive segmentation. BMVC, 2018. 4, 17

  72. [72]

    Deep extreme cut: From extreme points to object segmentation

    Kevis-Kokitsi Maninis, Sergi Caelles, Jordi Pont-Tuset, and Luc Van Gool. Deep extreme cut: From extreme points to object segmentation. CVPR, 2018. 6

  73. [73]

    A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics

    David Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. ICCV, 2001. 10, 21, 28

  74. [74]

    V-Net: Fully convolutional neural networks for volumetric medical image segmentation

    Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. 3DV, 2016. 5, 17

  75. [75]

    Finely-grained annotated datasets for image-based plant phenotyping

    Massimo Minervini, Andreas Fischbach, Hanno Scharr, and Sotirios A. Tsaftaris. Finely-grained annotated datasets for image-based plant phenotyping. Pattern Recognition Letters, 2016. 9, 20

  76. [76]

    Model cards for model reporting

    Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. Model cards for model reporting. Proceedings of the conference on fairness, accountability, and transparency, 2019. 25, 28

  77. [77]

    Extreme clicking for efficient object annotation

    Dim P Papadopoulos, Jasper RR Uijlings, Frank Keller, and Vittorio Ferrari. Extreme clicking for efficient object annotation. ICCV,

  78. [78]

    Carbon Emissions and Large Neural Network Training

    David Patterson, Joseph Gonzalez, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, and Jeff Dean. Carbon emissions and large neural network training. arXiv:2104.10350, 2021. 28

  79. [79]

    Semi-supervised sequence tagging with bidirectional language models

    Matthew E Peters, Waleed Ammar, Chandra Bhagavatula, and Russell Power. Semi-supervised sequence tagging with bidirectional language models. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017. 18

  80. [80]

    EDTER: Edge detection with transformer

    Mengyang Pu, Yaping Huang, Yuming Liu, Qingji Guan, and Haibin Ling. EDTER: Edge detection with transformer. CVPR,

Showing first 80 references.