CrowdHuman: A Benchmark for Detecting Human in a Crowd
Abstract
Human detection has witnessed impressive progress in recent years. However, the occlusion problem in detecting humans in highly crowded environments is far from solved. To make matters worse, crowd scenarios are still under-represented in current human detection benchmarks. In this paper, we introduce a new dataset, called CrowdHuman, to better evaluate detectors in crowd scenarios. The CrowdHuman dataset is large, richly annotated, and highly diverse. It contains a total of $470K$ human instances across the train and validation subsets, with an average of $\sim 22.6$ persons per image and various kinds of occlusion. Each human instance is annotated with a head bounding box, a visible-region bounding box, and a full-body bounding box. We present baseline performance of state-of-the-art detection frameworks on CrowdHuman. Cross-dataset generalization results show that models trained on CrowdHuman achieve state-of-the-art performance on previous datasets, including Caltech-USA, CityPersons, and Brainwash, without bells and whistles. We hope our dataset will serve as a solid baseline and help promote future research in human detection tasks.
Forward citations
Cited by 5 Pith papers
- DETR-ViP: Detection Transformer with Robust Discriminative Visual Prompts
  DETR-ViP boosts visual-prompted detection performance by learning globally discriminative prompts through integration and distillation on top of image-text contrastive learning, with a selective fusion step for stability.
- Generalized Small Object Detection: A Point-Prompted Paradigm and Benchmark
  The TinySet-9M dataset and the DEAL point-prompted framework deliver a 31.4% relative AP75 gain over supervised baselines for small object detection, requiring only one click at inference and generalizing to unseen categories.
- Robust Grounding with MLLMs Against Occlusion and Small Objects via Language-Guided Semantic Cues
  Language-guided semantic cues from MLLM visual pipelines, steered by text embeddings, refine object semantics and boost grounding accuracy against occlusion and small objects.
- InsHuman: Towards Natural and Identity-Preserving Human Insertion
  InsHuman proposes Human-Background Adaptive Fusion, Face-to-Face ID-Preserving, and Bidirectional Data Pairing to enable natural human insertion into images without altering identity.
- Attention Is not Everything: Efficient Alternatives for Vision
  A survey that taxonomizes non-Transformer vision models and evaluates their practical trade-offs across efficiency, scalability, and robustness.