DeepForestVisionV2: Ecology-Driven Taxonomy Expansion for Camera-Trap Monitoring in African Tropical Forests
Pith reviewed 2026-06-26 17:54 UTC · model grok-4.3
The pith
DeepForestVisionV2 expands a camera-trap classifier from 35 to 64 classes to cover more species across varied African forest habitats while keeping accuracy stable.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DeepForestVisionV2 is the ecology-driven expansion of the prior model to a 64-class space that includes additional animal classes for arboreal primates, birds, semi-aquatic taxa, and human-associated confounders such as livestock. It is trained on 1,535,010 photographs and 243,354 videos from multi-country projects and evaluated on a cross-country validation set plus three held-out Uganda video benchmarks. The model reaches 0.86 accuracy on the validation set. On the deployment benchmarks it preserves or improves baseline accuracy while increasing identified taxa from 22 to 29 in forest interiors and from 4 to 9 at riverbanks; at park edges it raises accuracy from 0.62 to 0.86 and reduces false alarms from 11 to 0.
What carries the argument
The 64-class prediction taxonomy that adds classes addressing vertical stratification, scene openness, and anthropogenic interfaces.
If this is right
- The same model works across closed-canopy, riverbank, and park-edge sites.
- More species are automatically identified in video deployments.
- False alarms drop at human-interface sites (from 11 to 0 in the park-edge benchmark).
- Offline processing of both photos and videos remains supported.
- Performance holds across different countries and camera settings.
Where Pith is reading between the lines
- Researchers could standardize on one classifier instead of building habitat-specific versions.
- Similar expansions might improve monitoring in other tropical regions with comparable gradients.
- Longer-term data collection could track changes in species presence at edges and open areas more reliably.
- Combining the classifier with location data might reveal patterns in animal movement across gradients.
Load-bearing premise
The new classes and the multi-country training data match the actual species and scene conditions in the held-out Uganda sites closely enough that added classes do not create new errors that wipe out the gains.
What would settle it
Run the original 35-class model and the new 64-class model on the same Uganda deployment videos. If the new model identifies fewer correct taxa or scores lower accuracy than the original, the claim of improved utility is falsified.
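The head-to-head test described above can be sketched in a few lines. This is a minimal illustration, not the paper's evaluation code: the function names, the toy labels, and the rule that a taxon counts as "identified" when the model labels it correctly at least once are all assumptions made here for the example.

```python
from typing import Dict, FrozenSet, List, Set

# Assumption: these non-animal labels are excluded when counting taxa.
NON_TAXA: FrozenSet[str] = frozenset({"blank", "human", "vehicle"})


def accuracy(truth: List[str], preds: List[str]) -> float:
    """Fraction of videos whose predicted label equals the reference label."""
    if len(truth) != len(preds):
        raise ValueError("truth and preds must have the same length")
    if not truth:
        raise ValueError("benchmark is empty")
    return sum(t == p for t, p in zip(truth, preds)) / len(truth)


def identified_taxa(truth: List[str], preds: List[str]) -> Set[str]:
    """Taxa labelled correctly at least once (hypothetical definition)."""
    return {t for t, p in zip(truth, preds) if t == p and t not in NON_TAXA}


def compare(truth: List[str], preds_old: List[str], preds_new: List[str]) -> Dict[str, float]:
    """Score both models on the same videos so the numbers are comparable."""
    return {
        "accuracy_old": accuracy(truth, preds_old),
        "accuracy_new": accuracy(truth, preds_new),
        "taxa_old": len(identified_taxa(truth, preds_old)),
        "taxa_new": len(identified_taxa(truth, preds_new)),
    }


# Toy labels, invented for illustration only.
truth = ["chimpanzee", "colobus", "duiker", "cattle", "blank", "hornbill"]
preds_old = ["chimpanzee", "monkey", "duiker", "buffalo", "blank", "bird"]
preds_new = ["chimpanzee", "colobus", "duiker", "cattle", "blank", "duiker"]

result = compare(truth, preds_old, preds_new)
print(result)
```

One caveat on the design: exact label matching penalizes a coarser model for answering at a coarser level (here, "monkey" for a colobus). A fair comparison would need an explicit mapping between the 35-class and 64-class label spaces, which is not given in the text above.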
Original abstract
Camera-trap monitoring in African tropical forests increasingly extends beyond closed-canopy interiors to riverbanks, clearings, and park edges. Among available open tools for African forest camera-trap classification, DeepForestVision is the only one providing a matched offline workflow for both photographs and videos, and previous work showed that it outperformed other available baselines on a comparable benchmark. However, it was designed for closed-canopy, ground-level forest interiors and uses a 35-class prediction space that becomes too coarse when deployments encounter arboreal primates, birds, semi-aquatic taxa, or human-associated confounders such as livestock. We present DeepForestVisionV2, an ecology-driven expansion from 35 to 64 prediction classes (61 animal classes plus human, vehicle, and blank) designed to address three recurrent deployment gradients: vertical stratification, scene openness, and anthropogenic interfaces. DeepForestVisionV2 retains the same offline workflow and is trained on 1,535,010 photographs and 243,354 videos from multi-country African tropical-forest projects. Evaluation combines a cross-country cropped-photo validation set, used to assess robustness across sites and camera-trap settings, with three held-out Uganda video benchmarks spanning the targeted gradients. On the validation set, DeepForestVisionV2 reaches 0.86 accuracy, 0.82 macro-F1, and 0.81 balanced accuracy. On the deployment benchmarks, it preserves or improves baseline accuracy despite its harder classification task, while increasing the number of identified taxa from 22 to 29 in forest-interior videos and from 4 to 9 at riverbanks. In the park-edge use case, it raises accuracy from 0.62 to 0.86 and reduces false alarms from 11 to 0. These results show that DeepForestVisionV2 materially improves field utility while preserving robustness across sites, habitats, and camera-trap settings.
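The three validation metrics quoted in the abstract answer different questions, which a small worked example may make clearer. The sketch below uses common conventions (macro-F1 as the unweighted mean of per-class F1 over every class seen in the labels or predictions, balanced accuracy as the unweighted mean of per-class recall over classes present in the reference labels). Whether the paper uses exactly these conventions is an assumption, and the toy labels are invented.

```python
from collections import Counter
from typing import Dict, List


def classification_metrics(truth: List[str], preds: List[str]) -> Dict[str, float]:
    """Accuracy, macro-F1 and balanced accuracy from two label lists."""
    if len(truth) != len(preds) or not truth:
        raise ValueError("truth and preds must be non-empty and equally long")

    tp: Counter = Counter()  # correct predictions per class
    fp: Counter = Counter()  # times a class was predicted wrongly
    fn: Counter = Counter()  # times a class was missed
    for t, p in zip(truth, preds):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1
            fp[p] += 1

    f1_scores = []
    recalls = []
    for c in sorted(set(truth) | set(preds)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1_scores.append(2 * tp[c] / denom if denom else 0.0)
        support = tp[c] + fn[c]
        if support:  # recall is only defined for classes present in truth
            recalls.append(tp[c] / support)

    return {
        "accuracy": sum(tp.values()) / len(truth),
        "macro_f1": sum(f1_scores) / len(f1_scores),
        "balanced_accuracy": sum(recalls) / len(recalls),
    }


# Toy imbalanced example: a common class and a rare one.
truth = ["duiker"] * 8 + ["colobus"] * 2
preds = ["duiker"] * 8 + ["duiker", "colobus"]

metrics = classification_metrics(truth, preds)
print(metrics)
```

In this toy case accuracy is 0.90 while balanced accuracy is 0.75, because the single missed colobus halves the recall of the rare class. This is why macro-F1 and balanced accuracy are informative alongside plain accuracy when class frequencies are uneven.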
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces DeepForestVisionV2, an ecology-driven expansion of a prior camera-trap classifier from 35 to 64 classes (61 animal classes plus human, vehicle, blank) to address vertical stratification, scene openness, and anthropogenic interfaces in African tropical forests. It is trained on 1,535,010 photographs and 243,354 videos from multi-country projects and evaluated on a cross-country cropped-photo validation set plus three held-out Uganda video benchmarks spanning forest interiors, riverbanks, and park edges. Reported results include 0.86 accuracy / 0.82 macro-F1 / 0.81 balanced accuracy on validation; on deployment benchmarks the model increases identified taxa (22→29 interiors, 4→9 riverbanks) while preserving or improving accuracy (0.62→0.86 at edges) and reducing false alarms despite the harder task.
Significance. If the reported gains survive scrutiny of label provenance and distribution matching, the work would supply a practically deployable offline tool that materially extends camera-trap utility beyond closed-canopy interiors while retaining cross-site robustness. Strengths include the explicit use of held-out cross-country validation and deployment-specific Uganda benchmarks (avoiding circularity) together with concrete numeric reporting of accuracy, macro-F1, and taxon counts. The ecology-driven taxonomy expansion is a clear motivation, but the absence of class-balance statistics, label-verification protocols, and per-benchmark sample sizes limits assessment of whether the 29 added classes align with Uganda deployment statistics without introducing noise or shift.
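To illustrate why per-benchmark sample sizes matter for this assessment, the sketch below computes a Wilson score interval for an observed accuracy of 0.86 at two benchmark sizes. The sizes (50 and 5,000 videos) are hypothetical, since the abstract does not report them, and the choice of the Wilson interval is ours rather than the paper's.

```python
import math
from typing import Tuple


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must lie in [0, 1]")
    z2 = z * z
    denom = 1.0 + z2 / n
    centre = (p_hat + z2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z2 / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical benchmark sizes; the real ones are not given in the abstract.
small = wilson_interval(0.86, 50)
large = wilson_interval(0.86, 5000)
print("n=50:   %.3f to %.3f" % small)
print("n=5000: %.3f to %.3f" % large)
```

With 50 videos the interval spans roughly 0.74 to 0.93, wide enough that a jump from 0.62 to 0.86 would still be visible but smaller differences would not. With 5,000 videos the interval narrows to about two percentage points.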
major comments (2)
- [Abstract] The headline claim that accuracy is preserved or improved on the three held-out Uganda video benchmarks (22→29 taxa, 4→9 taxa, 0.62→0.86 accuracy) is load-bearing for the central assertion of improved field utility. Yet the abstract supplies neither per-benchmark sample sizes, confusion matrices, nor any description of label verification or exclusion rules for the 29 added classes, leaving open whether post-hoc choices affect the metrics.
- [Abstract] The training distribution (1.5 M+ multi-country images/videos) and the 29 added classes are treated as free parameters whose alignment with the Uganda deployment gradients is assumed rather than demonstrated. Without explicit discussion of how arboreal/semi-aquatic/anthropogenic classes were sourced and verified against the target scene statistics, the robustness claim across habitats risks being undermined by unmeasured distribution shift or label noise.
minor comments (1)
- [Abstract] The exact breakdown of the 64-class space (61 animal + human/vehicle/blank) is stated but could be repeated in a methods section for clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive comments emphasizing transparency in the abstract. We address each point below and will revise the manuscript accordingly to incorporate additional details on benchmarks and class alignment.
- Referee: [Abstract] The headline claim that accuracy is preserved or improved on the three held-out Uganda video benchmarks (22→29 taxa, 4→9 taxa, 0.62→0.86 accuracy) is load-bearing for the central assertion of improved field utility. Yet the abstract supplies neither per-benchmark sample sizes, confusion matrices, nor any description of label verification or exclusion rules for the 29 added classes, leaving open whether post-hoc choices affect the metrics.
  Authors: We agree that the abstract would benefit from greater specificity on these points. The full manuscript reports per-benchmark sample sizes and describes label verification by domain experts in the Methods section, with confusion matrices provided in the supplementary material. We will revise the abstract to include the sample sizes for each of the three Uganda benchmarks and a concise statement on verification protocols and exclusion rules. This revision will be made to strengthen the presentation of the headline results.
  Revision: yes
- Referee: [Abstract] The training distribution (1.5 M+ multi-country images/videos) and the 29 added classes are treated as free parameters whose alignment with the Uganda deployment gradients is assumed rather than demonstrated. Without explicit discussion of how arboreal/semi-aquatic/anthropogenic classes were sourced and verified against the target scene statistics, the robustness claim across habitats risks being undermined by unmeasured distribution shift or label noise.
  Authors: The manuscript motivates the 29 added classes through an ecology-driven analysis of deployment gradients observed across the multi-country training projects, which include comparable vertical stratification, scene openness, and anthropogenic interfaces. To directly address the concern, we will add a dedicated paragraph in the Methods section detailing the sourcing of arboreal, semi-aquatic, and anthropogenic classes from the training data and their verification against scene statistics from the held-out Uganda benchmarks. This will explicitly demonstrate alignment and mitigate risks of unmeasured shift.
  Revision: yes
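One simple way to make the alignment question concrete is to compare class frequencies between the training labels and a deployment benchmark. The sketch below is an illustration under stated assumptions, not the authors' verification procedure: the helper names, the toy labels, and the use of total variation distance as the shift measure are all choices made here.

```python
from collections import Counter
from typing import Dict, List


def class_frequencies(labels: List[str]) -> Dict[str, float]:
    """Relative frequency of each class in a list of labels."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance: 0 means identical class mixes, 1 means disjoint."""
    classes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in classes)


def unseen_classes(train_labels: List[str], deploy_labels: List[str]) -> List[str]:
    """Classes that occur at deployment but never in training."""
    return sorted(set(deploy_labels) - set(train_labels))


# Toy labels, invented for illustration only.
train = ["duiker"] * 6 + ["colobus"] * 3 + ["cattle"] * 1
deploy = ["duiker"] * 2 + ["colobus"] * 2 + ["cattle"] * 4 + ["otter"] * 2

tv = total_variation(class_frequencies(train), class_frequencies(deploy))
missing = unseen_classes(train, deploy)
print("total variation distance:", tv)
print("classes unseen in training:", missing)
```

This only measures shift in the label mix. It says nothing about shift in image conditions such as lighting, camera height, or scene openness, which would need a separate check on the imagery itself.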
Circularity Check
No circularity: empirical metrics computed on held-out cross-country and Uganda benchmarks
The paper's central claims rest on accuracy, F1, and taxon-count improvements measured on an explicitly held-out cross-country validation set and three Uganda deployment video benchmarks that were not used for training or class expansion. No equations, fitted parameters, or self-citations are invoked to derive these metrics; the 35-to-64 class expansion and the training corpus of roughly 1.5M photographs and 243k videos are presented as design choices whose effects are then measured independently. The prior DeepForestVision reference is mentioned only for context and is not load-bearing for the V2 results. This is a standard empirical ML evaluation with no reduction of outputs to inputs by construction.
Axiom & Free-Parameter Ledger
free parameters (2)
- expanded class count = 64
- training data selection
axioms (1)
- domain assumption: The added classes accurately label the taxa that appear in the targeted habitats without excessive annotation error