pith. machine review for the scientific record.

arxiv: 2605.07831 · v1 · submitted 2026-05-08 · 💻 cs.CV


Explainable Part-Based Vehicle Classifier with Spatial Awareness

Andreas Caduff (1), Klaus Zahn (1), Jonas Hofstetter (1), Martin Rechsteiner (1), Patrick Flaig (2) ((1) Competence Center for Intelligent Sensors and Networks, Lucerne University of Applied Sciences and Arts; (2) SICK AG)


Pith reviewed 2026-05-11 02:33 UTC · model grok-4.3

classification 💻 cs.CV
keywords vehicle classification · explainable AI · part-based models · spatial probability maps · intelligent transportation systems · softmax regression · CNN decomposition

The pith

Spatial probability maps condition part detections in a vehicle classifier to reduce false positives while matching CNN accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends an earlier part-based vehicle classification system by replacing binary present-or-absent decisions with full spatial probability maps that condition each detected part according to its expected location for a given vehicle category. These maps feed into a softmax regression stage that computes overall vehicle probabilities. The authors report that the change increases resilience to erroneous part detections, a common problem in real scenes. Comparative tests against a state-of-the-art end-to-end CNN show that classification accuracy remains comparable, indicating that the added spatial structure does not cost performance. The decomposition also preserves the ability to add new vehicle categories without retraining the underlying part detector.

Core claim

The central claim is that constructing spatial probability maps to condition the presence of semantically strong vehicle parts, followed by softmax regression for category probabilities, yields considerably improved robustness against false detections while delivering accuracy on par with end-to-end CNNs.

What carries the argument

Spatial probability maps that condition individual part detections relative to a vehicle category before softmax regression classification.
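A minimal sketch of the two-stage idea as this review reads it: each part detection is weighted by the probability of its location under each category's spatial map, and the conditioned evidence feeds a softmax regression. All names, array shapes, and the exact combination rule here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def conditioned_scores(det_confs, det_locs, spatial_maps):
    """Weight each part detection by the probability of its location
    under every candidate vehicle category.

    det_confs:    (P,) detector confidences for P part types
    det_locs:     (P, 2) integer (row, col) grid cells of the detections
    spatial_maps: (C, P, H, W) per-category spatial probability maps
    Returns a (C, P) array of conditioned part evidence per category.
    """
    C, P, H, W = spatial_maps.shape
    out = np.zeros((C, P))
    for c in range(C):
        for p in range(P):
            r, col = det_locs[p]
            out[c, p] = det_confs[p] * spatial_maps[c, p, r, col]
    return out

def classify(det_confs, det_locs, spatial_maps, weights, bias):
    """Softmax regression over the flattened conditioned evidence."""
    feats = conditioned_scores(det_confs, det_locs, spatial_maps).ravel()
    logits = weights @ feats + bias
    z = np.exp(logits - logits.max())  # stabilized softmax
    return z / z.sum()
```

A spurious detection landing far from its expected location is multiplied by a near-zero map value, which is the robustness mechanism the review describes.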

If this is right

  • New vehicle categories can be added by updating only the spatial maps and retraining the final softmax stage, without touching the part detector.
  • The model exposes which parts were detected, their spatial relations, and the resulting category scores, supporting human inspection of decisions.
  • Reduced sensitivity to spurious part detections makes the system more deployable in practical intelligent transportation settings.
  • The results question the assumption that high classification accuracy in this domain requires an opaque end-to-end network.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same spatial-conditioning idea could be tested on other fine-grained recognition tasks where part layout is diagnostic, such as aircraft or animal species identification.
  • Because the maps produce continuous probabilities rather than hard decisions, the pipeline might naturally support uncertainty estimates that downstream modules could use for safety-critical filtering.
  • The explicit separation of detection, spatial reasoning, and classification opens the possibility of inserting symbolic constraints or rules on top of the detected parts.
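On the second point above, a speculative illustration of how the continuous category distribution could gate downstream use; the entropy threshold and helper names are our own, not anything proposed in the paper.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy (in nats) of a category distribution."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def accept(probs, max_entropy=0.5):
    """Pass a classification downstream only when the distribution is
    peaked enough; otherwise defer for safety-critical filtering."""
    return predictive_entropy(probs) <= max_entropy
```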

Load-bearing premise

That constructing spatial probability maps to condition part presence will deliver considerably improved robustness against false detections in real-world conditions without sacrificing overall classification performance.

What would settle it

A head-to-head evaluation on a large, varied real-world video dataset. The claim would fail if the spatial-map version proved no more robust to spurious part detections than the binary version, or fell measurably below the end-to-end CNN in overall accuracy.

read the original abstract

In the area of Intelligent Transportation Systems (ITS), fine-grained vehicle classification systems play an essential role. Recently, the authors have presented a novel vision-based classification approach in which standard end-to-end Convolutional Neural Networks (CNNs) have been decomposed into 1) a CNN-based detector for semantically strong vehicle parts, followed by 2) feature construction and 3) final classification by a decision tree. In contrast to conventional CNNs, this allows both easy extensibility to new vehicle categories - without the need to fully retrain the part detector - and an important step towards the interpretability of the model, removing partially the black-box nature inherent to CNNs. Here we present an important extension of this approach that now incorporates spatial awareness of the vehicle parts: while the feature construction 2) of the previous approach used a binary decision for each feature (present vs. absent), now a full spatial probability map is constructed to condition the presence of each individual part with respect to a given vehicle category. The classification is performed using a softmax regression approach for the overall vehicle probabilities. This method shows a considerably improved robustness against false (part-)detections, a point that is crucial for practical application. Comparative analyses with a state-of-the-art end-to-end CNN indicate that our part-based methods achieve comparable accuracy, effectively challenging the presumed trade-off between accuracy and explainability. This research represents a significant advance in vehicle classification for ITS and forms the basis for systems that combine high accuracy with intuitive interpretability.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, a circularity check, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript extends a prior part-based vehicle classification pipeline for Intelligent Transportation Systems by replacing binary present/absent part decisions with spatial probability maps that condition part presence relative to vehicle category. The final stage is changed from a decision tree to softmax regression. The central claims are that the spatial maps yield considerably improved robustness to false part detections while preserving classification accuracy comparable to state-of-the-art end-to-end CNNs, thereby challenging the presumed accuracy-explainability trade-off and enabling extensible, interpretable models without full retraining.

Significance. If the empirical results and ablations support the claims, the work would be significant for explainable computer vision in ITS. It offers a concrete route to part-based models that remain extensible to new categories and potentially more robust in deployment, while matching CNN accuracy. This directly addresses a practical tension in safety-critical vision systems where interpretability is valued alongside performance.

major comments (2)
  1. [Abstract] The assertions of 'considerably improved robustness against false (part-)detections' and 'comparable accuracy' are presented without quantitative results, dataset descriptions, error bars, or ablation studies. This leaves the central claim that spatial probability maps resolve the accuracy-explainability trade-off unsupported in the provided summary.
  2. [Method description] The pipeline simultaneously replaces binary feature construction and decision-tree classification with spatial maps and softmax regression. No ablation is described that holds the classifier fixed and varies only the spatial conditioning (spatial maps vs. binary decisions). Because the robustness and accuracy gains could arise from the classifier change alone, this omission is load-bearing for the claim that spatial awareness is the key improvement.
minor comments (1)
  1. The construction of the spatial probability maps and their exact integration into the input vector for softmax regression should be formalized with equations or pseudocode for reproducibility.
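One shape the requested formalization could take, reconstructed from the abstract alone; the symbols below are ours, not the paper's.

```latex
% s_p     : detector confidence of part p
% x_p     : detected location of part p in the image plane
% M_{c,p} : spatial probability map of part p under category c
f_{c,p} = s_p \, M_{c,p}(x_p),
\qquad
P(c \mid I) =
  \frac{\exp\!\big(\mathbf{w}_c^{\top}\mathbf{f} + b_c\big)}
       {\sum_{c'} \exp\!\big(\mathbf{w}_{c'}^{\top}\mathbf{f} + b_{c'}\big)},
```

where the vector f stacks all conditioned features f_{c,p} and (w_c, b_c) are the softmax regression parameters for category c.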

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating planned revisions where appropriate.

read point-by-point responses
  1. Referee: [Abstract] The assertions of 'considerably improved robustness against false (part-)detections' and 'comparable accuracy' are presented without quantitative results, dataset descriptions, error bars, or ablation studies. This leaves the central claim that spatial probability maps resolve the accuracy-explainability trade-off unsupported in the provided summary.

    Authors: We acknowledge that the abstract, being a concise summary, presents the key claims without embedding specific quantitative results, dataset names, error bars, or ablation details. These elements are fully reported in the manuscript body (experimental section), including accuracy comparisons to end-to-end CNNs on vehicle classification datasets, robustness evaluations under false part detections, and supporting ablations. The central claim is therefore substantiated in the paper. To better align the abstract with the referee's expectation, we will revise it to incorporate brief quantitative highlights drawn from the results while preserving length constraints. revision: yes

  2. Referee: [Method description] The pipeline simultaneously replaces binary feature construction and decision-tree classification with spatial maps and softmax regression. No ablation is described that holds the classifier fixed and varies only the spatial conditioning (spatial maps vs. binary decisions). Because the robustness and accuracy gains could arise from the classifier change alone, this omission is load-bearing for the claim that spatial awareness is the key improvement.

    Authors: We agree that an explicit ablation isolating spatial probability maps from the classifier change (while holding the other fixed) would strengthen the attribution of gains to spatial awareness. The manuscript compares the complete new pipeline against the prior binary-plus-decision-tree version and against CNN baselines, with the switch to softmax motivated by its suitability for continuous probability inputs. Nevertheless, the referee's point is valid, and we will add the requested ablation in the revision—for instance, by thresholding spatial maps to binary decisions for use with the original decision tree or by feeding binary features into softmax regression—to demonstrate the independent contribution of the spatial conditioning. revision: yes
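The thresholding variant the rebuttal proposes could be as small as the following sketch; the function name and the 0.5 threshold are illustrative, not from the paper.

```python
import numpy as np

def binarize_features(cond_feats, threshold=0.5):
    """Ablation variant: collapse continuous conditioned spatial
    evidence back to the prior approach's present/absent decisions.

    cond_feats: (C, P) continuous conditioned part evidence.
    Returns a (C, P) array of 0.0/1.0 values usable with the original
    classifier, isolating the contribution of continuous conditioning.
    """
    return (np.asarray(cond_feats) >= threshold).astype(float)
```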

Circularity Check

0 steps flagged

No derivation reduces to self-inputs; performance claims rest on external comparisons.

full rationale

The paper extends a prior part-based pipeline (self-cited for background) by adding spatial probability maps and switching to softmax regression. All load-bearing claims about robustness and accuracy are tied to comparative experiments against an independent end-to-end CNN, not to any equation or parameter that is defined in terms of the target result. No self-definitional loops, fitted-input predictions, or uniqueness theorems imported from the authors' own prior work appear in the provided text. The absence of an ablation isolating the spatial maps is a methodological gap but does not constitute circularity in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the approach relies on standard CNN detection and softmax regression without additional postulated constructs.

pith-pipeline@v0.9.0 · 5614 in / 987 out tokens · 62332 ms · 2026-05-11T02:33:57.773592+00:00 · methodology


Reference graph

Works this paper leans on

76 extracted references · 76 canonical work pages · 2 internal anchors

  1. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, vol. 25, 2012.
  2. Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, "A ConvNet for the 2020s," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11976–11986, 2022.
  3. M. Won, "Intelligent traffic monitoring systems for vehicle classification: A survey," IEEE Access, vol. 8, pp. 73340–73358, 2020.
  4. A. Caduff, K. Zahn, J. Hofstetter, M. Rechsteiner, and P. Bucher, "Exploring the limits of vanilla CNN architectures for fine-grained vision-based vehicle classification," in International Conference on Engineering Applications of Neural Networks, pp. 202–212, Springer, 2021.
  5. BASt, "Technische Lieferbedingungen für Streckenstationen." https://www.bast.de/BASt2017/DE/Publikationen/Regelwerke/Verkehrstechnik/Unterseiten/V5-tls-2012.html, 2012. Accessed: 2023-12-15.
  6. E. Belouadah, A. Popescu, and I. Kanellos, "A comprehensive study of class incremental learning algorithms for visual tasks," Neural Networks, vol. 135, pp. 38–54, 2021.
  7. Y. Zhang, P. Tiňo, A. Leonardis, and K. Tang, "A survey on neural network interpretability," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 5, no. 5, pp. 726–742, 2021.
  8. Q.-s. Zhang and S.-C. Zhu, "Visual interpretability for deep learning: a survey," Frontiers of Information Technology & Electronic Engineering, vol. 19, no. 1, pp. 27–39, 2018.
  9. A. Caduff, K. Zahn, J. Hofstetter, M. Rechsteiner, and P. Flaig, "Disentangling convolutional neural network towards an explainable vehicle classifier," in 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 1–8, IEEE, 2022.
  10. B. Zhao, J. Feng, X. Wu, and S. Yan, "A survey on deep learning-based fine-grained object classification and semantic segmentation," International Journal of Automation and Computing, vol. 14, no. 2, pp. 119–135, 2017.
  11. P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part-based models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1627–1645, 2009.
  12. N. Zhang, J. Donahue, R. Girshick, and T. Darrell, "Part-based R-CNNs for fine-grained category detection," in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part I, pp. 834–849, Springer, 2014.
  13. H. Huttunen, F. S. Yancheshmeh, and K. Chen, "Car type recognition with deep neural networks," in 2016 IEEE Intelligent Vehicles Symposium (IV), pp. 1115–1120, IEEE, 2016.
  14. Y. O. Adu-Gyamfi, S. K. Asare, A. Sharma, and T. Titus, "Automated vehicle recognition with deep convolutional neural networks," Transportation Research Record, vol. 2645, no. 1, pp. 113–122, 2017.
  15. L. Zhuo, L. Jiang, Z. Zhu, J. Li, J. Zhang, and H. Long, "Vehicle classification for large-scale traffic surveillance videos using convolutional neural networks," Machine Vision and Applications, vol. 28, pp. 793–802, 2017.
  16. B. Hu, J.-H. Lai, and C.-C. Guo, "Location-aware fine-grained vehicle type recognition using multi-task deep networks," Neurocomputing, vol. 243, pp. 60–68, 2017.
  17. M. A. Butt, A. M. Khattak, S. Shafique, B. Hayat, S. Abid, K.-I. Kim, M. W. Ayub, A. Sajid, and A. Adnan, "Convolutional neural network based vehicle classification in adverse illuminous conditions for intelligent transportation systems," Complexity, vol. 2021, pp. 1–11, 2021.
  18. H. Gholamalinejad and H. Khosravi, "Vehicle classification using a real-time convolutional structure based on DWT pooling layer and SE blocks," Expert Systems with Applications, vol. 183, p. 115420, 2021.
  19. A. T. Sasongko and M. I. Fanany, "Indonesia toll road vehicle classification using transfer learning with pre-trained ResNet models," in 2019 International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), pp. 373–378, IEEE, 2019.
  20. E. U. Armin, A. Bejo, and R. Hidayat, "Vehicle type classification in surveillance image based on deep learning method," in 2020 3rd International Conference on Information and Communications Technology (ICOIACT), pp. 400–404, IEEE, 2020.
  21. H. Jung, M.-K. Choi, J. Jung, J.-H. Lee, S. Kwon, and W. Young Jung, "ResNet-based vehicle classification and localization in traffic surveillance systems," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 61–67, 2017.
  22. W. Liu, M. Zhang, Z. Luo, and Y. Cai, "An ensemble deep learning method for vehicle type classification on visual traffic surveillance sensors," IEEE Access, vol. 5, pp. 24417–24425, 2017.
  23. P.-K. Kim and K.-T. Lim, "Vehicle type classification using bagging and convolutional neural network on multi view surveillance image," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 41–46, 2017.
  24. P. Jagannathan, S. Rajkumar, J. Frnda, P. B. Divakarachari, and P. Subramani, "Moving vehicle detection and classification using gaussian mixture model and ensemble deep learning technique," Wireless Communications and Mobile Computing, vol. 2021, pp. 1–15, 2021.
  25. J. Taek Lee and Y. Chung, "Deep learning-based vehicle classification using an ensemble of local expert and global networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 47–52, 2017.
  26. M. A. Hedeya, A. H. Eid, and R. F. Abdel-Kader, "A super-learner ensemble of deep networks for vehicle-type classification," IEEE Access, vol. 8, pp. 98266–98280, 2020.
  27. R. Theagarajan, F. Pala, and B. Bhanu, "Eden: Ensemble of deep networks for vehicle classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 33–40, 2017.
  28. R. F. Rachmadi, K. Uchimura, G. Koutaki, and K. Ogata, "Single image vehicle classification using pseudo long short-term memory classifier," Journal of Visual Communication and Image Representation, vol. 56, pp. 265–274, 2018.
  29. Q. Hu, H. Wang, T. Li, and C. Shen, "Deep CNNs with spatially weighted pooling for fine-grained car recognition," IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 11, pp. 3147–3156, 2017.
  30. Y. Chen, D. Zhao, L. Lv, and C. Li, "A visual attention based convolutional neural network for image classification," in 2016 12th World Congress on Intelligent Control and Automation (WCICA), pp. 764–769, IEEE, 2016.
  31. Y. Yu, L. Xu, W. Jia, W. Zhu, Y. Fu, and Q. Lu, "CAM: A fine-grained vehicle model recognition method based on visual attention model," Image and Vision Computing, vol. 104, p. 104027, 2020.
  32. A. Boukerche and X. Ma, "A novel smart lightweight visual attention model for fine-grained vehicle recognition," IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 8, pp. 13846–13862, 2021.
  33. H. He, Z. Shao, and J. Tan, "Recognition of car makes and models from a single traffic-camera image," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 6, pp. 3182–3192, 2015.
  34. H. Wang, J. Peng, Y. Zhao, and X. Fu, "Multi-path deep CNNs for fine-grained car recognition," IEEE Transactions on Vehicular Technology, vol. 69, no. 10, pp. 10484–10493, 2020.
  35. K. Huang and B. Zhang, "Fine-grained vehicle recognition by deep convolutional neural network," in 2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), pp. 465–470, IEEE, 2016.
  36. B. Liao, H. He, Y. Du, and S. Guan, "Multi-component vehicle type recognition using adapted CNN by optimal transport," Signal, Image and Video Processing, pp. 1–8, 2022.
  37. J. Fang, Y. Zhou, Y. Yu, and S. Du, "Fine-grained vehicle model recognition using a coarse-to-fine convolutional neural network architecture," IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 7, pp. 1782–1792, 2016.
  38. J. Krause, T. Gebru, J. Deng, L.-J. Li, and L. Fei-Fei, "Learning features and parts for fine-grained recognition," in 2014 22nd International Conference on Pattern Recognition, pp. 26–33, IEEE, 2014.
  39. Y. Tian, W. Zhang, Q. Zhang, G. Lu, and X. Wu, "Selective multi-convolutional region feature extraction based iterative discrimination CNN for fine-grained vehicle model recognition," in 2018 24th International Conference on Pattern Recognition (ICPR), pp. 3279–3284, IEEE, 2018.
  40. L. Lu, P. Wang, and Y. Cao, "A novel part-level feature extraction method for fine-grained vehicle recognition," Pattern Recognition, vol. 131, p. 108869, 2022.
  41. M. Biglari, A. Soleimani, and H. Hassanpour, "A cascaded part-based system for fine-grained vehicle classification," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 1, pp. 273–283, 2017.
  42. L. Liao, R. Hu, J. Xiao, Q. Wang, J. Xiao, and J. Chen, "Exploiting effects of parts in fine-grained categorization of vehicles," in 2015 IEEE International Conference on Image Processing (ICIP), pp. 745–749, IEEE, 2015.
  43. C. Hu, X. Bai, L. Qi, X. Wang, G. Xue, and L. Mei, "Learning discriminative pattern for real-time car brand recognition," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 6, pp. 3170–3181, 2015.
  44. Y. Xiang, Y. Fu, and H. Huang, "Global topology constraint network for fine-grained vehicle recognition," IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 7, pp. 2918–2929, 2019.
  45. Y. Xiang, Y. Fu, and H. Huang, "Global relative position space based pooling for fine-grained vehicle recognition," Neurocomputing, vol. 367, pp. 287–298, 2019.
  46. J. Krause, M. Stark, J. Deng, and L. Fei-Fei, "3D object representations for fine-grained categorization," in Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 554–561, 2013.
  47. Y.-L. Lin, V. I. Morariu, W. Hsu, and L. S. Davis, "Jointly optimizing 3D model fitting and fine-grained classification," in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part IV, pp. 466–480, Springer, 2014.
  48. J. Sochor, J. Špaňhel, and A. Herout, "BoxCars: Improving fine-grained recognition of vehicles using 3-D bounding boxes in traffic surveillance," IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 1, pp. 97–108, 2018.
  49. Z. Rui, G. Zongyuan, D. Simon, S. Sridha, and F. Clinton, "Geometry-constrained car recognition using a 3D perspective network," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 1161–1168, 2020.
  50. C. Chen, O. Li, D. Tao, A. Barnett, C. Rudin, and J. K. Su, "This looks like that: deep learning for interpretable image recognition," Advances in Neural Information Processing Systems, vol. 32, 2019.
  51. D. Rymarczyk, Ł. Struski, M. Górszczak, K. Lewandowska, J. Tabor, and B. Zieliński, "Interpretable image classification with differentiable prototypes assignment," in European Conference on Computer Vision, pp. 351–368, Springer, 2022.
  52. M. Nauta, R. Van Bree, and C. Seifert, "Neural prototype trees for interpretable fine-grained image recognition," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14933–14943, 2021.
  53. J. Donnelly, A. J. Barnett, and C. Chen, "Deformable ProtoPNet: An interpretable image classifier using deformable prototypes," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10265–10275, 2022.
  54. S. Yan, J. Xie, and X. He, "DER: Dynamically expandable representation for class incremental learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3014–3023, 2021.
  55. T. L. Hayes and C. Kanan, "Lifelong machine learning with deep streaming linear discriminant analysis," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 220–221, 2020.
  56. T. L. Hayes, K. Kafle, R. Shrestha, M. Acharya, and C. Kanan, "REMIND your neural network to prevent catastrophic forgetting," in European Conference on Computer Vision, pp. 466–483, Springer, 2020.
  57. C. Zhang, N. Song, G. Lin, Y. Zheng, P. Pan, and Y. Xu, "Few-shot incremental learning with continually evolved classifiers," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12455–12464, 2021.
  58. E. Belouadah and A. Popescu, "DeeSIL: Deep-shallow incremental learning," in Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018.
  59. G. Petit, A. Popescu, H. Schindler, D. Picard, and B. Delezoide, "FeTrIL: Feature translation for exemplar-free class-incremental learning," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3911–3920, 2023.
  60. D.-W. Li and H. Huang, "Few-shot class-incremental learning via compact and separable features for fine-grained vehicle recognition," IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 11, pp. 21418–21429, 2022.
  61. P. He, A. Wu, X. Huang, J. Scott, A. Rangarajan, and S. Ranka, "Deep learning based geometric features for effective truck selection and classification from highway videos," in 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pp. 824–830, IEEE, 2019.
  62. S. Ma and J. J. Yang, "Image-based vehicle classification by synergizing features from supervised and self-supervised learning paradigms," Eng, vol. 4, no. 1, pp. 444–456, 2023.
  63. C. Wang, Y. Fang, H. Zhao, C. Guo, S. Mita, and H. Zha, "Probabilistic inference for occluded and multiview on-road vehicle detection," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 1, pp. 215–229, 2015.
  64. D. L. Li, M. Prasad, C.-L. Liu, and C.-T. Lin, "Multi-view vehicle detection based on fusion part model with active learning," IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 5, pp. 3146–3157, 2020.
  65. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788, 2016.
  66. Y. Wang, Q. Yao, J. T. Kwok, and L. M. Ni, "Generalizing from a few examples: A survey on few-shot learning," ACM Computing Surveys (CSUR), vol. 53, no. 3, pp. 1–34, 2020.
  67. X. Li, T. Wei, Y. P. Chen, Y.-W. Tai, and C.-K. Tang, "FSS-1000: A 1000-class dataset for few-shot segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2869–2878, 2020.
  68. K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
  69. D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
  70. S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in International Conference on Machine Learning, pp. 448–456, PMLR, 2015.
  71. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
  72. Z. Zou, K. Chen, Z. Shi, Y. Guo, and J. Ye, "Object detection in 20 years: A survey," Proceedings of the IEEE, 2023.
  73. C. Wang, A. Bochkovskiy, and H. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," arXiv preprint arXiv:2207.02696, 2022.
  74. G. Jocher, "YOLOv5 by Ultralytics." https://github.com/ultralytics/yolov5/releases/tag/v6.1, 2022. Accessed: 2023-12-15.
  75. B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Object detectors emerge in deep scene CNNs," arXiv preprint arXiv:1412.6856, 2014.
  76. D. Minh, H. X. Wang, Y. F. Li, and T. N. Nguyen, "Explainable artificial intelligence: a comprehensive review," Artificial Intelligence Review, pp. 1–66, 2022.