Understanding Video Content: Efficient Hero Detection and Recognition for the Game "Honor of Kings"
Pith reviewed 2026-05-24 20:10 UTC · model grok-4.3
The pith
A two-stage method detects heroes in game videos by blood-bar template matching and then recognizes their names with CNNs using almost no labeling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that an efficient two-stage algorithm detects all heroes in each video frame via blood bar template-matching, classifies them by camp (self/friend/enemy), and recognizes their names using deep CNNs, all while needing almost no labeling effort for the recognition stage.
What carries the argument
A blood bar template-matching method that locates heroes and assigns camps, followed by deep CNN name recognition.
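As a rough illustration of the detection stage, the sketch below slides one template per camp over a grayscale frame and keeps the positions whose match score clears a threshold. It is a minimal sketch under stated assumptions, not the paper's implementation: the scoring function (zero-mean normalized cross-correlation), the one-template-per-camp layout, the 0.9 threshold, and the names `ncc` and `match_templates` are all hypothetical, and the toy frame separates camps by intensity pattern, whereas real footage would more plausibly separate them by bar color.

```python
from math import sqrt

def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-sized 2D lists."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mt) ** 2 for b in t))
    return num / den if den else 0.0  # flat patches carry no signal

def match_templates(frame, templates, threshold=0.9):
    """Return (row, col, camp, score) for every window scoring >= threshold.

    `templates` maps a camp name to its template (assumed layout).
    """
    hits = []
    for camp, tpl in templates.items():
        th, tw = len(tpl), len(tpl[0])
        for r in range(len(frame) - th + 1):
            for c in range(len(frame[0]) - tw + 1):
                patch = [row[c:c + tw] for row in frame[r:r + th]]
                score = ncc(patch, tpl)
                if score >= threshold:
                    hits.append((r, c, camp, score))
    return hits

# Toy 3x12 frame with one "self" bar and one "enemy" bar (made-up patterns).
frame = [[0] * 12 for _ in range(3)]
frame[0][1:5] = [0, 9, 9, 0]
frame[2][6:10] = [9, 0, 0, 9]
templates = {"self": [[0, 9, 9, 0]], "enemy": [[9, 0, 0, 9]]}
hits = match_templates(frame, templates)
print([(r, c, camp) for r, c, camp, _ in hits])
# prints [(0, 1, 'self'), (2, 6, 'enemy')]
```

In a full pipeline, each hit would fix a crop region near the bar that is then passed to the CNN recognizer. The crop geometry is not specified in the reviewed text, so it is left out here.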
If this is right
- All heroes in a frame are located and assigned to one of three camps before name recognition begins.
- Recognition requires almost no manual labeling of training or test samples.
- The pipeline processes game video frames efficiently enough for practical use.
- Experiments confirm both detection accuracy and recognition accuracy on Honor of Kings footage.
Where Pith is reading between the lines
- The same blood-bar cue could support hero tracking across consecutive frames in longer clips.
- Similar UI elements in other games might allow the detection stage to transfer with only template changes.
- Reducing labeling cost opens the possibility of scaling the method to large archives of game videos.
Load-bearing premise
Blood bar template matching can reliably locate every hero and correctly assign its camp across varied video frames without substantial false positives or missed detections.
What would settle it
A set of video frames in which blood bars are partially obscured, differently styled, or overlapping such that template matching produces more than 5 percent missed detections or incorrect camp assignments.
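The 5 percent criterion can be made concrete with a small check like the one below. This is a sketch under assumptions: it supposes each ground-truth hero is keyed by a hypothetical `(frame_id, hero_id)` pair, that detections have already been matched to those keys, and that either rate exceeding the limit on its own counts as failure. The function name `premise_fails` is illustrative.

```python
def premise_fails(ground_truth, predictions, limit=0.05):
    """Check the settle-it criterion on a labeled frame set.

    ground_truth: {(frame_id, hero_id): camp} for every hero present.
    predictions:  {(frame_id, hero_id): camp} for every hero detected.
    Returns (fails, miss_rate, camp_error_rate).
    """
    total = len(ground_truth)
    if total == 0:
        raise ValueError("ground_truth must not be empty")
    # A hero with no matching detection is a miss.
    missed = sum(1 for key in ground_truth if key not in predictions)
    # A detected hero with the wrong camp is a camp error.
    wrong_camp = sum(
        1 for key, camp in ground_truth.items()
        if key in predictions and predictions[key] != camp
    )
    miss_rate = missed / total
    camp_error_rate = wrong_camp / total
    return (miss_rate > limit or camp_error_rate > limit,
            miss_rate, camp_error_rate)
```

Under this reading, 1 miss among 20 labeled heroes sits exactly at the limit and does not trigger failure, while 2 misses do.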
Original abstract
In order to understand content and automatically extract labels for videos of the game "Honor of Kings", it is necessary to detect and recognize characters (called "hero") together with their camps in the game video. In this paper, we propose an efficient two-stage algorithm to detect and recognize heros in game videos. First, we detect all heros in a video frame based on blood bar template-matching method, and classify them according to their camps (self/ friend/ enemy). Then we recognize the name of each hero using one or more deep convolution neural networks. Our method needs almost no work for labelling training and testing samples in the recognition stage. Experiments show its efficiency and accuracy in the task of hero detection and recognition in game videos.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes an efficient two-stage algorithm for detecting and recognizing heroes (with camps) in Honor of Kings game videos. Stage 1 applies blood-bar template matching to locate heroes and classify them as self/friend/enemy; stage 2 feeds the cropped regions to one or more deep CNNs for name recognition. The method is presented as requiring almost no labeling effort for the recognition stage, with the abstract asserting that experiments demonstrate both efficiency and accuracy.
Significance. If the quantitative claims were substantiated, the pipeline would constitute a practical, low-annotation engineering solution for structured extraction from game footage that exploits domain-specific UI elements (blood bars). The approach could be useful for downstream video-understanding tasks in esports analytics, but the current lack of supporting measurements prevents any assessment of its actual performance or generality.
Major comments (2)
- [Abstract] Abstract: the central claim that 'Experiments show its efficiency and accuracy' is unsupported because the manuscript supplies no quantitative detection or recognition metrics (precision, recall, accuracy, F1, runtime, dataset cardinality, or baseline comparisons). This directly undermines the paper's primary assertion.
- [Method] Method section (blood-bar template matching): the entire pipeline depends on the untested premise that template matching reliably detects every hero and correctly assigns camps under real-game conditions (motion blur, lighting variation, partial occlusion, similar blood-bar appearances). No detection performance numbers, failure-case analysis, or robustness experiments are reported to validate this load-bearing step.
Minor comments (2)
- [Abstract] Abstract and throughout: repeated spelling error 'heros' should be 'heroes'.
- [Abstract] Abstract: 'deep convolution neural networks' should read 'deep convolutional neural networks'.
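The detection metrics requested in the first major comment could be computed along the following lines. This is a hedged sketch, not a procedure taken from the manuscript: the box format `(x1, y1, x2, y2)`, the greedy one-to-one matching, the 0.5 IoU cutoff, and the names `iou` and `detection_scores` are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def detection_scores(truth, detected, min_iou=0.5):
    """Greedily match detections to ground-truth boxes; return (P, R, F1)."""
    unmatched = list(truth)
    tp = 0
    for box in detected:
        # Pick the still-unmatched ground-truth box that overlaps most.
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= min_iou:
            unmatched.remove(best)
            tp += 1
    fp = len(detected) - tp   # detections with no ground-truth partner
    fn = len(unmatched)       # ground-truth boxes never detected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, with two ground-truth boxes and two detections of which only one overlaps a true box, all three scores come out at 0.5.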
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that quantitative metrics are required to support the claims made in the abstract and will revise the manuscript to include them. We respond point-by-point to the major comments below.
- Referee: [Abstract] Abstract: the central claim that 'Experiments show its efficiency and accuracy' is unsupported because the manuscript supplies no quantitative detection or recognition metrics (precision, recall, accuracy, F1, runtime, dataset cardinality, or baseline comparisons). This directly undermines the paper's primary assertion.
  Authors: We accept the criticism. The current manuscript does not provide quantitative metrics to back the abstract claim. In the revision we will add a full Experiments section with precision, recall, accuracy, F1, runtime, dataset cardinality, and baseline comparisons for both stages, thereby substantiating the efficiency and accuracy statements.
  Revision: yes
- Referee: [Method] Method section (blood-bar template matching): the entire pipeline depends on the untested premise that template matching reliably detects every hero and correctly assigns camps under real-game conditions (motion blur, lighting variation, partial occlusion, similar blood-bar appearances). No detection performance numbers, failure-case analysis, or robustness experiments are reported to validate this load-bearing step.
  Authors: The referee is correct that no performance numbers or robustness analysis are supplied for the template-matching stage. We will add quantitative evaluation of blood-bar detection and camp classification (including accuracy under motion blur, occlusion, and lighting changes), plus failure-case analysis, in the revised Method and Experiments sections.
  Revision: yes
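The per-condition evaluation promised in the second response could be tabulated roughly as follows. This is a minimal sketch assuming each evaluated hero instance carries a hypothetical condition tag (for example `"motion_blur"`) and a boolean marking whether detection and camp assignment were both correct. Neither the tag names nor the helper `accuracy_by_condition` comes from the manuscript.

```python
from collections import defaultdict

def accuracy_by_condition(samples):
    """Group (condition, correct) pairs and return {condition: accuracy}."""
    totals = defaultdict(lambda: [0, 0])  # condition -> [correct, seen]
    for condition, correct in samples:
        totals[condition][0] += int(correct)
        totals[condition][1] += 1
    return {cond: hit / seen for cond, (hit, seen) in totals.items()}

# Made-up outcomes, for illustration only.
samples = [
    ("motion_blur", True), ("motion_blur", False),
    ("occlusion", True), ("occlusion", True),
    ("lighting", False),
]
print(accuracy_by_condition(samples))
# prints {'motion_blur': 0.5, 'occlusion': 1.0, 'lighting': 0.0}
```

Reporting the breakdown next to the sample count per condition would show whether a low score reflects a real weakness or just a handful of frames.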
Circularity Check
No circularity: direct engineering pipeline with no derivations or self-referential steps
The paper describes a two-stage detection/recognition pipeline using blood-bar template matching followed by CNN classification. No equations, fitted parameters presented as predictions, uniqueness theorems, or self-citations appear in the provided text. The method is presented as an empirical engineering approach whose performance is asserted via experiments rather than derived from prior results by the same authors. No load-bearing step reduces to its own inputs by construction.
Axiom & Free-Parameter Ledger
Free parameters (1)
- blood bar templates
Axioms (1)
- Domain assumption: blood bars remain visible and sufficiently distinctive for template matching under typical game video conditions.