The PASCAL Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, 88:303–338, June 2010.
14 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: dataset (3)
Citation-polarity summary: use dataset (3)
citing papers explorer
- AutoMedBench: Towards Medical AutoResearch with Agentic AI Models
  AutoMedBench evaluates AI agents on long-horizon medical workflows across five stages and finds that validation and submission are the dominant failure points, based on thousands of runs.
- BOOKMARKS: Efficient Active Storyline Memory for Role-playing
  BOOKMARKS introduces searchable bookmarks as reusable answers to storyline questions, enabling active initialization and passive synchronization for more consistent role-playing agent memory than recurrent summarization.
- Computer Vision for MOBA Analytics: A Dataset and Baseline for Visibility Analysis in Dota 2
  Introduces the Dota2-Vis dataset of 288 videos from 144 TI 2025 matches plus 2,477 annotated minimaps and evaluates YOLO11 variants for player-icon detection to produce visibility curves.
- Benchmarking Open-Source Layout Detection Models for Data Snapshot Extraction from Institutional Documents
  Introduces a benchmark dataset for data snapshot extraction focused on semantically meaningful analytical artifacts in institutional documents and shows that open-source layout models struggle to generalize from academic benchmarks.
- WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for Vision-Language Models and Autonomous Agents
  WildRoadBench is a new dual-track benchmark on professionally annotated wild UAV road-damage images, showing that closed-source VLMs lead but leave over half the AP_50 metric on the table, while agents lag and open-source models collapse on small targets.
- GAZE: Grounded Agentic Zero-shot Evaluation with Viewer-Level Tools and Literature Retrieval on Rare Brain MRI
  The GAZE framework, with viewer tools and literature retrieval, achieves 58.2 mAP@0.3 lesion localization and 34.9% top-1 diagnostic accuracy on 906 rare brain MRI cases in a zero-shot setting, with larger gains on the rarest pathologies.
- Variational Feature Compression for Model-Specific Representations
  A variational latent bottleneck with KL regularization and a dynamic binary mask based on saliency produces model-specific features that keep high accuracy for one classifier but drop others below 2% on CIFAR-100 with over 45x suppression.
- MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models
  MuRF fuses multi-resolution features from frozen vision foundation models at inference time to create stronger representations without any training.
- Cross-Domain Few-Shot Segmentation via Ordinary Differential Equations over Time Intervals
  FSS-TIs models cross-domain few-shot segmentation as an ODE process with Fourier-based spectral perturbations to create domain-agnostic features and enable effective fine-tuning on limited support samples.
- MoEIoU: Rethinking Bounding-Box Regression as a Mixture of Experts
  MoEIoU is a mixture-of-experts IoU loss using log-sum-exp aggregation and curriculum weighting that reports consistent gains over prior IoU losses on PASCAL VOC, HRIPCB, and MS COCO with YOLO models.
- XiYOLO: Energy-Aware Object Detection via Iterative Architecture Search and Scaling
  XiYOLO uses iterative energy-aware neural architecture search and scaling to produce object detectors with stronger accuracy-energy tradeoffs than YOLO baselines on GPUs and NPUs.
- GarmNet: Improving Global with Local Perception for Robotic Laundry Folding
  GarmNet jointly localizes garments and detects grasp landmarks on the CloPeMa dataset, reducing localization error by 24.7% when landmark detection is included.
- Generalization Under Scrutiny: Cross-Domain Detection Progresses, Pitfalls, and Persistent Challenges
  A survey that organizes methods for cross-domain object detection into a taxonomy, analyzes domain shift across detection stages, and outlines persistent challenges.
- PipeMFL-240K: A Large-scale Dataset and Benchmark for Object Detection in Pipeline Magnetic Flux Leakage Imaging