
arxiv: 2410.17725 · v1 · submitted 2024-10-23 · 💻 cs.CV

Recognition: 2 theorem links


YOLOv11: An Overview of the Key Architectural Enhancements


Pith reviewed 2026-05-12 13:22 UTC · model grok-4.3

classification 💻 cs.CV
keywords: YOLOv11, object detection, C3k2, SPPF, C2PSA, computer vision, architectural analysis, feature extraction

The pith

YOLOv11 adds C3k2 blocks, SPPF and C2PSA for better feature extraction and efficiency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper analyzes the architecture of YOLOv11 and identifies the C3k2 block, SPPF, and C2PSA as the main innovations that support stronger feature extraction. It reviews how these changes help the model handle object detection, instance segmentation, pose estimation, and oriented bounding box tasks while keeping a favorable balance between parameter count and accuracy. The overview also covers the availability of model variants from nano to extra-large for use on edge devices up to high-performance systems. A sympathetic reader would care because the claimed gains could affect the practicality of real-time vision on varied hardware.

Core claim

The central claim is that the C3k2 (Cross Stage Partial with kernel size 2) block, SPPF (Spatial Pyramid Pooling - Fast), and C2PSA (Convolutional block with Parallel Spatial Attention) components improve the model's performance, notably through enhanced feature extraction, while delivering better mean Average Precision and computational efficiency than its predecessors.

What carries the argument

The C3k2 block, SPPF, and C2PSA components carry the argument by enabling enhanced feature extraction across detection, segmentation, and pose tasks.
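To make the SPPF component concrete, here is a minimal pure-Python sketch of its pooling stage only. It assumes the commonly described design in which one 5×5, stride-1 max pool is applied three times in sequence and every intermediate map is kept for concatenation. The function names and the toy feature map are illustrative assumptions, not taken from the paper, and the 1×1 convolutions and channel concatenation of a full block are omitted. The sketch checks that the chained pools reproduce parallel 5×5, 9×9, and 13×13 pools, which is where the "Fast" saving would come from.

```python
NEG_INF = float("-inf")


def max_pool2d(grid, k):
    """Stride-1 max pooling; cells outside the grid are ignored (same as -inf padding)."""
    h, w = len(grid), len(grid[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            best = NEG_INF
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        best = max(best, grid[y][x])
            row.append(best)
        out.append(row)
    return out


def sppf_branches(grid, k=5):
    """SPPF-style pooling: chain the same k x k pool three times, keep each stage."""
    p1 = max_pool2d(grid, k)
    p2 = max_pool2d(p1, k)
    p3 = max_pool2d(p2, k)
    return [grid, p1, p2, p3]


def spp_branches(grid, kernels=(5, 9, 13)):
    """Classic SPP-style pooling: independent pools with growing kernels."""
    return [grid] + [max_pool2d(grid, k) for k in kernels]


# Deterministic toy feature map (hypothetical values, single channel).
feature_map = [[(7 * i + 13 * j) % 17 for j in range(12)] for i in range(10)]

same = sppf_branches(feature_map) == spp_branches(feature_map)
print("chained 5x5 pools match parallel 5/9/13 pools:", same)
```

The chained version touches 3 × 25 cells per output position instead of 25 + 81 + 169, so under these assumptions the efficiency argument is about reusing work rather than changing what is computed.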

If this is right

  • YOLOv11 achieves higher mean Average Precision than earlier YOLO versions at comparable or lower computational cost.
  • The model maintains a favorable trade-off between parameter count and accuracy.
  • Performance scales across nano to extra-large variants for use on edge hardware through to high-performance systems.
  • The same backbone supports object detection, instance segmentation, pose estimation, and oriented object detection.
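One way to read "favorable trade-off between parameter count and accuracy" is as a Pareto-dominance question: a variant is worth keeping if no other variant is at least as small and at least as accurate. The sketch below shows that check. The `Variant` type, the variant names, and every number are placeholders invented for illustration; none of them come from the paper or from any benchmark.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    params_m: float  # parameter count in millions (placeholder value)
    map_pct: float   # mean Average Precision in percent (placeholder value)


def dominates(a, b):
    """True if a is no larger and no less accurate than b, and strictly better on one axis."""
    no_worse = a.params_m <= b.params_m and a.map_pct >= b.map_pct
    strictly_better = a.params_m < b.params_m or a.map_pct > b.map_pct
    return no_worse and strictly_better


def pareto_front(variants):
    """Keep only the variants that no other variant dominates."""
    return [v for v in variants if not any(dominates(o, v) for o in variants)]


# Hypothetical numbers, NOT measurements from the paper.
candidates = [
    Variant("old-small", 10.0, 40.0),
    Variant("new-small", 9.0, 42.0),
    Variant("old-large", 50.0, 50.0),
    Variant("new-large", 55.0, 51.0),
]

front = [v.name for v in pareto_front(candidates)]
print(front)
```

With these placeholder values only `old-small` drops out, because `new-small` is both smaller and more accurate. Applying the same check to the reported model table would show which variants actually support the trade-off claim.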

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the new blocks are the true source of gains, similar modules could be tested for transfer to other detection families.
  • Real-time applications on mobile or embedded devices may see larger practical speed-ups once the efficiency claims are measured on target hardware.
  • Independent verification on additional benchmarks beyond those reported would strengthen the case that the architectural additions drive the improvements.

Load-bearing premise

The listed architectural components are the main cause of observed performance gains rather than training procedures or dataset differences.

What would settle it

An ablation study that removes the C3k2, SPPF, and C2PSA modules from YOLOv11 and reports whether mean Average Precision on COCO drops by more than a few points while keeping other factors fixed.
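The decision rule implied here can be sketched as a small comparison over ablation results. The configuration names, the mAP values, and the 2.0-point threshold standing in for "a few points" are all assumptions made for illustration; the paper reports no ablations, so none of these figures are real.

```python
def map_drops(results, baseline_key="full"):
    """mAP points lost by each ablated configuration relative to the full model."""
    base = results[baseline_key]
    return {
        name: base - score
        for name, score in results.items()
        if name != baseline_key
    }


def flag_load_bearing(drops, min_drop_points=2.0):
    """Configurations whose drop exceeds the (assumed) 'few points' threshold."""
    return [name for name, drop in drops.items() if drop > min_drop_points]


# Hypothetical COCO mAP values in percent, NOT results from the paper.
results = {
    "full": 50.0,
    "no_c3k2": 47.5,
    "no_sppf": 49.5,
    "no_c2psa": 48.0,
}

drops = map_drops(results)
flagged = flag_load_bearing(drops)
print(drops)
print(flagged)
```

For the comparison to mean anything, every configuration would need the same training schedule, data, and evaluation protocol, which is the "keeping other factors fixed" condition above.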

Original abstract

This study presents an architectural analysis of YOLOv11, the latest iteration in the YOLO (You Only Look Once) series of object detection models. We examine the models architectural innovations, including the introduction of the C3k2 (Cross Stage Partial with kernel size 2) block, SPPF (Spatial Pyramid Pooling - Fast), and C2PSA (Convolutional block with Parallel Spatial Attention) components, which contribute in improving the models performance in several ways such as enhanced feature extraction. The paper explores YOLOv11's expanded capabilities across various computer vision tasks, including object detection, instance segmentation, pose estimation, and oriented object detection (OBB). We review the model's performance improvements in terms of mean Average Precision (mAP) and computational efficiency compared to its predecessors, with a focus on the trade-off between parameter count and accuracy. Additionally, the study discusses YOLOv11's versatility across different model sizes, from nano to extra-large, catering to diverse application needs from edge devices to high-performance computing environments. Our research provides insights into YOLOv11's position within the broader landscape of object detection and its potential impact on real-time computer vision applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper presents an architectural overview and analysis of YOLOv11, the latest in the YOLO series. It examines innovations including the C3k2 (Cross Stage Partial with kernel size 2) block, SPPF (Spatial Pyramid Pooling - Fast), and C2PSA (Convolutional block with Parallel Spatial Attention) components and their roles in enhanced feature extraction and performance. The manuscript reviews expanded capabilities across object detection, instance segmentation, pose estimation, and oriented object detection (OBB), along with mAP and computational efficiency gains relative to predecessors, the parameter-accuracy trade-off, and scalability across nano to extra-large model sizes for diverse hardware.

Significance. If the architectural descriptions accurately capture the model's design and reported metrics, the overview could serve as a convenient reference consolidating YOLOv11's key changes and application scope for the computer vision community. Its contribution is primarily synthetic rather than generative, with modest field impact absent new experiments, ablations, or independent verification.

minor comments (3)
  1. [Abstract] Possessive errors such as 'the models architectural innovations' (should be 'the model's architectural innovations') and 'the models performance' (should be 'the model's performance') recur and reduce readability.
  2. [Abstract] The phrasing 'contribute in improving the models performance' is grammatically awkward and should be revised to 'contribute to improving the model's performance'.
  3. [Abstract] The claim of reviewing 'performance improvements in terms of mean Average Precision (mAP) and computational efficiency' would benefit from explicit cross-references to any tables or figures containing the actual numerical comparisons, even if drawn from prior work.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review of our manuscript and the recommendation for minor revision. We address the key observation regarding the paper's synthetic nature below.

Point-by-point responses
  1. Referee: Its contribution is primarily synthetic rather than generative, with modest field impact absent new experiments, ablations, or independent verification.

    Authors: We agree that the manuscript is an overview paper whose primary contribution is synthetic: it consolidates and analyzes publicly available information on YOLOv11's architectural changes (C3k2, SPPF, C2PSA) and their effects on feature extraction, mAP, efficiency, and multi-task support. The work does not include new experiments, ablations, or independent verification because its scope is to provide a convenient reference for the community rather than to generate novel empirical results. We believe this type of paper still offers value by clearly documenting the design decisions and performance trade-offs across model scales. We are happy to incorporate any specific minor revisions the referee or editor may suggest for improved clarity, additional citations, or expanded discussion of limitations.

    Revision: no

Circularity Check

0 steps flagged

No significant circularity; purely descriptive overview with no derivation chain

Full rationale

The paper presents an architectural analysis and review of YOLOv11 without any equations, derivations, fitted parameters, predictions, or load-bearing self-citations. Its claims describe the roles of components such as C3k2, SPPF, and C2PSA in feature extraction and performance improvements based on the model's existing design and reported metrics from prior work. No step reduces by construction to the paper's own inputs, and the central content is self-contained descriptive summary rather than a causal or predictive argument requiring internal validation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a descriptive overview and introduces no new parameters, axioms, or entities.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith.Foundation.DimensionForcing dimension_forced unclear

    Relation between the paper passage and the cited Recognition theorem.

    YOLOv11’s versatility across different model sizes, from nano to extra-large, catering to diverse application needs from edge devices to high-performance computing environments.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 34 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. PoseBridge: Bridging the Skeletonization Gap for Zero-Shot Skeleton-Based Action Recognition

    cs.CV 2026-05 unverdicted novelty 7.0

    PoseBridge recovers semantic information lost during skeletonization by extracting pose-anchored cues from human pose estimation and transferring them via skeleton-conditioned bridging and semantic prototype adaptatio...

  2. Fusion in Your Way: Aligning Image Fusion with Heterogeneous Demands via Direct Preference Optimization

    cs.CV 2026-05 unverdicted novelty 7.0

    DPOFusion uses direct preference optimization on property-aligned and preference-controllable latent diffusion models to produce adaptive infrared-visible image fusions aligned with heterogeneous human and machine vis...

  3. FluxShard: Motion-Aware Feature Cache Reuse for Collaborative Video Analytics in Mobile Edge Computing

    cs.NI 2026-05 unverdicted novelty 7.0

    FluxShard uses per-block motion vectors and a Receptive Field Alignment Principle to manage feature cache reuse in edge-cloud video analytics, delivering 32.6-83.8% lower latency and 14.9-64.0% lower energy than basel...

  4. Learning Where to Embed: Noise-Aware Positional Embedding for Query Retrieval in Small-Object Detection

    cs.CV 2026-04 unverdicted novelty 7.0

    HELP uses heatmap-guided positional embeddings and a gradient mask to suppress background noise in queries, enabling efficient small-object detection with fewer decoder layers and parameters.

  5. What and Where to Adapt: Structure-Semantics Co-Tuning for Machine Vision Compression via Synergistic Adapters

    cs.CV 2026-04 unverdicted novelty 7.0

    S2-CoT coordinates a Structural Fidelity Adapter in the encoder-decoder with a Semantic Context Adapter in the entropy model to convert potential performance loss into state-of-the-art gains across base codecs while u...

  6. GeoMMBench and GeoMMAgent: Toward Expert-Level Multimodal Intelligence in Geoscience and Remote Sensing

    cs.CV 2026-04 unverdicted novelty 7.0

    GeoMMBench reveals deficiencies in current multimodal LLMs for geoscience tasks while GeoMMAgent demonstrates that tool-integrated agents achieve significantly higher performance.

  7. SARES-DEIM: Sparse Mixture-of-Experts Meets DETR for Robust SAR Ship Detection

    cs.CV 2026-04 unverdicted novelty 7.0

    SARES-DEIM achieves 76.4% mAP50:95 and 93.8% mAP50 on HRSID by routing SAR features through sparse frequency and wavelet experts plus a high-resolution preservation neck, outperforming prior YOLO and SAR detectors.

  8. UniSpector: Towards Universal Open-set Defect Recognition via Spectral-Contrastive Visual Prompting

    cs.CV 2026-04 unverdicted novelty 7.0

    UniSpector organizes visual prompt space with spatial-spectral and contrastive encoders to support open-set defect localization, beating baselines by at least 19.7% AP50b and 15.8% AP50m on the new Inspect Anything benchmark.

  9. Contour-Native Bridge Defect Detection and Compact Digital Archiving with Frequency-Supervised Fourier Contours

    cs.CV 2026-05 unverdicted novelty 6.0

    FS-FSD regresses frequency-supervised Fourier contours for bridge defects, yielding higher polygon accuracy and better geometric quality than box, mask, or contour baselines on 3,767 UAV images with 42,346 instances.

  10. Look Once, Beam Twice: Camera-Primed Real-Time Double-Directional mmWave Beam Management for Vehicular Connectivity

    cs.NI 2026-05 unverdicted novelty 6.0

    VIBE is a camera-primed hybrid model-based closed-loop learning system for real-time double-directional mmWave beam management in vehicular networks that achieves outage rates as low as 1.1-1.4% and outperforms 5G NR ...

  11. Physical Adversarial Clothing Evades Visible-Thermal Detectors via Non-Overlapping RGB-T Pattern

    cs.CV 2026-05 unverdicted novelty 6.0

    Non-overlapping RGB-T adversarial patterns on clothing, optimized with spatial discrete-continuous optimization, achieve high attack success rates against multiple RGB-T detector fusion architectures in both digital a...

  12. Training-Free Tunnel Defect Inspection and Engineering Interpretation via Visual Recalibration and Entity Reconstruction

    cs.CV 2026-04 unverdicted novelty 6.0

    TunnelMIND recalibrates language-guided defect proposals via dense visual consistency and reconstructs them into structured defect entities with attributes for severity grading and retrieval-grounded engineering repor...

  13. Transferable Physical-World Adversarial Patches Against Object Detection in Autonomous Driving

    cs.CV 2026-04 unverdicted novelty 6.0

    AdvAD produces physical-world adversarial patches with improved transferability to unseen object detectors by multi-model optimization, adaptive balancing, and physical variation robustness.

  14. ZoomSpec: A Physics-Guided Coarse-to-Fine Framework for Wideband Spectrum Sensing

    cs.CV 2026-04 unverdicted novelty 6.0

    ZoomSpec achieves 78.1 mAP@0.5:0.95 on the SpaceNet dataset by combining log-space STFT, a coarse proposal net, adaptive heterodyne filtering, and dual-domain fine recognition to improve narrowband visibility in wideb...

  15. Toward Unified Fine-Grained Vehicle Classification and Automatic License Plate Recognition

    cs.CV 2026-04 accept novelty 6.0

    UFPR-VeSV is a new real-world dataset for fine-grained vehicle classification and automatic license plate recognition collected from Brazilian police cameras, with benchmarks demonstrating its difficulty and the value...

  16. Visual Prototype Conditioned Focal Region Generation for UAV-Based Object Detection

    cs.CV 2026-04 unverdicted novelty 6.0

    UAVGen generates higher-quality synthetic UAV images via visual prototype conditioning and focal region focus in diffusion models, leading to better object detection accuracy than prior methods.

  17. Video models are zero-shot learners and reasoners

    cs.LG 2025-09 unverdicted novelty 6.0

    Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.

  18. ERPPO: Entropy Regularization-based Proximal Policy Optimization

    cs.LG 2026-05 unverdicted novelty 5.0

    ERPPO adds a DSA-based ambiguity estimator to MAPPO and switches between L1 and L2 entropy regularization to improve exploration and stability in non-stationary multi-dimensional observations.

  19. TriBand-BEV: Real-Time LiDAR-Only 3D Pedestrian Detection via Height-Aware BEV and High-Resolution Feature Fusion

    cs.CV 2026-05 unverdicted novelty 5.0

    TriBand-BEV introduces a three-band height-aware BEV encoding of LiDAR data to enable single-pass real-time 3D detection of pedestrians, cars, and cyclists with improved KITTI accuracy.

  20. Exploring Clustering Capability of Inpainting Model Embeddings for Pattern-based Individual Identification

    cs.CV 2026-05 unverdicted novelty 5.0

    Inpainting auxiliary task improves clustering of embeddings for individual zebrafish identification based on skin patterns.

  21. Echo-α: Large Agentic Multimodal Reasoning Model for Ultrasound Interpretation

    cs.CV 2026-04 unverdicted novelty 5.0

    Echo-α integrates organ-specific detectors with global visual context via an invoke-and-reason agentic loop, trained on a nine-task curriculum plus sequential RL, to achieve superior grounding (56.73%/43.78% F1@0.5) a...

  22. Edge-Cloud Collaborative Reconstruction via Structure-Aware Latent Diffusion for Downstream Remote Sensing Perception

    cs.CV 2026-04 unverdicted novelty 5.0

    SALD decouples remote sensing images into compressed payload plus structural prior at the edge and uses structure-gated diffusion on the cloud to improve super-resolution and downstream detection under extreme bandwid...

  23. DocRevive: A Unified Pipeline for Document Text Restoration

    cs.CV 2026-04 unverdicted novelty 5.0

    DocRevive builds a unified pipeline using OCR, image analysis, language models, and diffusion to reconstruct degraded document text, backed by a 30k-image synthetic dataset and the UCSM metric.

  24. A Weak-Signal-Aware Framework for Subsurface Defect Detection: Mechanisms for Enhancing Low-SCR Hyperbolic Signatures

    cs.CV 2026-04 unverdicted novelty 5.0

    WSA-Net uses partial convolutions, heterogeneous grouping attention, geometric reconstruction, and context anchoring to enhance low-SCR hyperbolic signatures in GPR data, reaching 0.6958 mAP@0.5 at 164 FPS with 2.412M...

  25. Human Interaction-Aware 3D Reconstruction from a Single Image

    cs.CV 2026-04 unverdicted novelty 5.0

    HUG3D uses group-instance multi-view diffusion and physics-based optimization to create physically plausible 3D reconstructions of interacting people from a single image.

  26. Gaze to Insight: A Scalable AI Approach for Detecting Gaze Behaviours in Face-to-Face Collaborative Learning

    cs.CV 2026-04 unverdicted novelty 5.0

    A method combining pretrained YOLO11, YOLOE-26, and Gaze-LLE models detects student gaze targets in collaborative learning videos with F1-score 0.829 without requiring labeled training data.

  27. A Marine Debris Detection Framework for Ocean Robots via Self-Attention Enhancement and Feature Interaction Optimization

    cs.CV 2026-05 unverdicted novelty 4.0

    YOLO-MD improves underwater marine debris detection by adding a Dual-Branch Convolutional Enhanced Self-Attention module, a lightweight shift operation, and SFG-Loss for class imbalance, achieving 0.875 precision and ...

  28. Fringe Projection Based Vision Pipeline for Autonomous Hard Drive Disassembly

    cs.CV 2026-04 unverdicted novelty 4.0

    An integrated fringe projection and AI pipeline delivers aligned high-accuracy 3D sensing and instance segmentation for autonomous HDD disassembly at 77.7 FPS.

  29. Beyond Mamba: Enhancing State-space Models with Deformable Dilated Convolutions for Multi-scale Traffic Object Detection

    cs.CV 2026-04 unverdicted novelty 4.0

    MDDCNet combines Mamba blocks with deformable dilated convolutions, enhanced feed-forward networks, and an attention-aggregating feature pyramid to achieve better multi-scale traffic object detection than prior detectors.

  30. StreakMind: AI detection and analysis of satellite streaks in astronomical images with automated database integration

    astro-ph.IM 2026-05 unverdicted novelty 3.0

    StreakMind trains a YOLO OBB model on 2335 images to detect satellite streaks in FITS frames with 94% precision and 97% recall, then applies geometric refinement and orbital database matching.

  31. Real-Time Cellist Postural Evaluation With On-Device Computer Vision

    cs.HC 2026-04 unverdicted novelty 3.0

    Cello Evaluator is a real-time postural feedback system for cellists running on current Android phones via on-device computer vision, validated as user-friendly by experts.

  32. Cosmos World Foundation Model Platform for Physical AI

    cs.CV 2025-01 unverdicted novelty 3.0

    The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.

  33. AI Driven Soccer Analysis Using Computer Vision

    cs.CV 2026-04 unverdicted novelty 2.0

    A system combining object detection, segmentation, keypoint prediction, and homography transforms soccer video into real-world player positions and tactical statistics.

  34. YOLOv11 Demystified: A Practical Guide to High-Performance Object Detection

    cs.CV 2026-04 unverdicted novelty 2.0

    YOLOv11 delivers higher mean average precision on standard benchmarks than prior YOLO versions while keeping real-time inference speed through C3K2, SPPF, and C2PSA modules.
