
arxiv: 2605.17123 · v1 · pith:MBXMO7EKnew · submitted 2026-05-16 · 💻 cs.HC · cs.RO

ATRACT: A Trustworthy Robotic Autonomous system to support Casualty Triage

Pith reviewed 2026-05-20 14:32 UTC · model grok-4.3

classification 💻 cs.HC cs.RO
keywords casualty triage · drone video · wearable sensors · multi-modal learning · battlefield triage · data augmentation · action classification · remote assessment

The pith

A multi-modal system fusing drone video with wearable sensors reaches 85.7 percent accuracy classifying casualty actions for remote triage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ATRACT as a human-in-the-loop system that combines drone-captured video of behavioral cues with wearable physiological signals to assess casualties when direct access is restricted or dangerous. A conditional variational autoencoder generates synthetic data to address the shortage of real injured-action examples, allowing the model to learn from limited battlefield-like footage. On a custom drone dataset the pipeline achieves 85.7 percent accuracy for action classification, and its lightweight CNN visual encoder performs competitively with heavier pre-trained video models. This combination supplies medics with evidence for early casualty prioritization without requiring them to approach the scene immediately. A sympathetic reader sees the work as a concrete engineering step toward safer triage support in contested environments.

Core claim

ATRACT integrates drone video for fine-grained pose and posture cues with complementary body-worn sensor data on heart rate, breathing rate and movement through multi-modal learning; a conditional variational autoencoder augments training data for injured actions, yielding an overall action-classification accuracy of 85.7 percent on the collected drone dataset while the lightweight visual encoder stays competitive with stronger pre-trained backbones.

What carries the argument

Multi-modal fusion of drone video and wearable physiological signals, augmented by conditional variational autoencoder synthetic data generation, to classify casualty actions and supply evidence for medic judgment.
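As an illustration of the fusion idea only, the sketch below concatenates a video embedding with a vector of wearable readings and passes the result through a linear softmax classifier. The page does not state how ATRACT fuses its modalities, so concatenation, the feature sizes, the three classes, and all names (`fuse_and_classify`, `weights`, `bias`) and numbers are assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def fuse_and_classify(video_feat, sensor_feat, weights, bias):
    """Concatenate both modalities (an assumed fusion rule), then classify."""
    fused = list(video_feat) + list(sensor_feat)
    return softmax(linear(weights, bias, fused))


# Hypothetical inputs: a 2-dim video embedding and three normalised
# wearable readings (heart rate, breathing rate, movement).
video_feat = [0.2, 0.9]
sensor_feat = [0.7, 0.4, 0.1]

# Hypothetical weights for three illustrative classes.
weights = [
    [1.0, 0.0, 0.5, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
bias = [0.0, 0.0, 0.0]

probs = fuse_and_classify(video_feat, sensor_feat, weights, bias)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

In a human-in-the-loop setting the full probability vector `probs`, rather than only `predicted`, is what would be shown to the medic as evidence.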

If this is right

  • Medics could receive early, evidence-based assessments of casualty states from a safe standoff distance.
  • The system reduces immediate exposure of frontline personnel by supporting triage prioritization before physical evacuation.
  • A lightweight visual encoder allows the pipeline to run on small drone platforms with limited onboard compute.
  • Human oversight keeps the output as supportive evidence rather than an autonomous final decision.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same sensing combination could be adapted for civilian disaster response where rubble or hazards similarly block direct access.
  • Performance under low-light or adverse weather conditions not represented in the current dataset would be a natural next measurement.
  • Pairing the classifier with autonomous drone routing could enable persistent monitoring without continuous pilot attention.

Load-bearing premise

Synthetic data produced by the conditional variational autoencoder is realistic enough that the resulting model will give reliable evidence for casualty-state assessment under actual battlefield conditions.
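For readers unfamiliar with the augmentation step, a minimal sketch of a conditional VAE objective follows: a reconstruction term plus a KL penalty, with the class label appended to the latent sample at generation time. It assumes a diagonal-Gaussian encoder, a standard-normal prior (a common simplification), squared-error reconstruction and one-hot action labels. The page does not report the paper's loss terms or latent size, and every function name here is hypothetical.

```python
import math
import random


def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def squared_error(x, x_hat):
    """Reconstruction term (assumed; other likelihoods are possible)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def cvae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Per-sample loss: reconstruction plus beta-weighted KL penalty."""
    return squared_error(x, x_hat) + beta * kl_diag_gaussian_to_standard_normal(mu, log_var)


def decoder_input(label_index, num_classes, latent_dim, rng):
    """Draw z ~ N(0, I) and append a one-hot label to condition the decoder."""
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    one_hot = [1.0 if i == label_index else 0.0 for i in range(num_classes)]
    return z + one_hot


# Hypothetical usage: request a synthetic sample for class 2 of 3.
sample_input = decoder_input(label_index=2, num_classes=3, latent_dim=8,
                             rng=random.Random(0))
```

The premise above concerns what a trained decoder produces from `sample_input`; the loss only encourages realism with respect to the training data, which is why independent fidelity checks matter.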

What would settle it

Record new drone video of live actors performing genuine injury movements in an outdoor contested-environment simulation and check whether action-classification accuracy falls substantially below 85.7 percent or whether the outputs cease to help medics form triage decisions.

Figures

Figures reproduced from arXiv: 2605.17123 by Ardhendu Behera, Arindam Sikdar, Khizer Saeed, Mindula Illeperuma, Peter Lee, Rafael Pina, Sandip Pradhan, Tasweer Ahmad, Varuna De Silva.

Figure 1. Overview of the proposed ATRACT framework for human-in-the-loop battlefield triage. The system uses …
Figure 2. A step-by-step pipeline of our human-in-the-loop ATRACT system. The architecture includes …
Figure 3. Data collection setup: (A) DJI Mini 2 drone, (B) participants wearing BioModules, (C) base station, (D) data logging using Omnisense [34]. Sensors used: 1) Zephyr BioModule, 2) Echo Gateway Transceiver, 3) GPS (Starz 818XT), and 4) BioModule on chest strap.
Figure 4. Representative frames of six different choreographed actions, performed by cadets during data acquisition by drone.
Figure 5. CVVitAE architecture for vital-sign augmentation (arm injury, head injury, collapse), and validation through proximity mapping.
Figure 6. Graphical Interface of ATRACT system, which integrates …
Figure 7. Stable Drone flight while carrying the communication module.
Figure 8. Confusion matrix (%) for our CNN and R(2+1)D backbone.
Figure 9. Grad-CAM Visualisations for Explainability, showing that …
Original abstract

At a time when drones are increasingly associated with hostile operations, we re-purpose them for humanitarian and life-saving applications. However, adapting search and rescue drones for battlefield triage remains extremely challenging; the technology must perform reliably to support frontline medics who are forced to operate under extreme uncertainty, restricted access, and significant personal risk. Due to growing vulnerabilities of casualty evacuation in conflict zones, this paper presents ATRACT (A Trustworthy Robotic Autonomous system to support Casualty Triage), a novel human-in-the-loop decision support system to enable early battlefield triage during the critical post-trauma period. ATRACT integrates drone-captured video with wearable sensor input for multi-modal learning to support casualty-state assessment, thereby addressing the limitations of existing systems. Drone video captures fine-grained behavioural cues, such as pose and posture, while body-worn sensors provide complementary physiological signals, including heart rate, breathing rate, and movement. By combining the two modalities, ATRACT provides evidence to support the early judgement of medics when direct access to the casualty is delayed, risky, or restricted. To mitigate the data realism gap pertaining to injured actions, a conditional variational autoencoder is devised for data augmentation. Experimental results on our drone-captured dataset show that the proposed pipeline achieves 85.7% accuracy for action classification, while our lightweight CNN visual encoder remains competitive with stronger pre-trained video backbones. Overall, the results support ATRACT as a practically meaningful step towards remote triage in contested environments, where multi-modal sensing, human oversight and trustworthy decision support can improve casualty prioritisation and lessen the exposure of frontline medics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes ATRACT, a human-in-the-loop decision support system for battlefield casualty triage that fuses drone-captured video (for pose and posture cues) with wearable sensor inputs (heart rate, breathing rate, movement) via multi-modal learning. A conditional variational autoencoder is introduced to augment the dataset for injured actions and thereby close the data realism gap. On a custom drone-captured dataset the pipeline reports 85.7% accuracy for action classification, while a lightweight CNN visual encoder remains competitive with stronger pre-trained video backbones. The work positions the system as a practical step toward remote triage that reduces medic exposure in contested environments.

Significance. If the performance figures and the utility of the CVAE augmentation can be rigorously validated, the manuscript would constitute a relevant contribution to human-computer interaction and robotics for high-stakes humanitarian applications. The multi-modal fusion of behavioral and physiological signals, combined with explicit human oversight, addresses a genuine operational need in restricted-access triage scenarios. The emphasis on trustworthy decision support and the use of generative augmentation for scarce injured-action data are timely themes, though their practical impact cannot yet be assessed from the reported evidence.

major comments (2)
  1. [Abstract] Abstract: The central empirical claim of 85.7% accuracy for action classification is presented without any information on dataset size, number of action classes, train-test split ratios, baseline comparisons, error bars, or validation procedures (e.g., cross-validation or statistical testing). This omission renders the headline performance result impossible to evaluate rigorously and directly undermines the assertion that the pipeline constitutes a practically meaningful step for remote triage.
  2. [Abstract] Abstract: The conditional variational autoencoder used to mitigate the data realism gap for injured actions is described without specification of conditioning variables, latent dimensionality, loss terms, or any quantitative fidelity assessment (reconstruction error, distribution distances, or expert ratings). No ablation comparing accuracy with versus without the synthetic data is supplied, leaving open whether the augmentation improves performance or introduces exploitable artifacts.
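To make the request for error bars concrete, the sketch below computes a Wilson score interval for a reported accuracy. The test-set sizes are hypothetical (60 of 70 and 600 of 700 both round to 85.7%); the page does not give the real sample counts, and `wilson_interval` is an illustrative helper, not something from the paper.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    z2 = z * z
    denom = 1.0 + z2 / total
    centre = (p + z2 / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z2 / (4.0 * total * total)) / denom
    return centre - half, centre + half


# Hypothetical test sets that both yield 85.7% accuracy.
small_low, small_high = wilson_interval(60, 70)
large_low, large_high = wilson_interval(600, 700)

print(f"n=70:  [{small_low:.3f}, {small_high:.3f}]")
print(f"n=700: [{large_low:.3f}, {large_high:.3f}]")
```

The same headline figure carries a much wider interval on the smaller set, which is why the referee's point about missing dataset size bears directly on how the 85.7% should be read.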

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the clarity and evaluability of our empirical claims. We address each point below and have revised the abstract and experiments section accordingly.

point-by-point responses
  1. Referee: [Abstract] Abstract: The central empirical claim of 85.7% accuracy for action classification is presented without any information on dataset size, number of action classes, train-test split ratios, baseline comparisons, error bars, or validation procedures (e.g., cross-validation or statistical testing). This omission renders the headline performance result impossible to evaluate rigorously and directly undermines the assertion that the pipeline constitutes a practically meaningful step for remote triage.

    Authors: We agree that the abstract would benefit from additional context to support rigorous evaluation of the 85.7% figure. The full manuscript already details the custom drone-captured dataset (including size, action classes, splits, baselines, and cross-validation) in the Experiments section. We have revised the abstract to concisely summarize these elements, added reference to error bars and statistical validation, and clarified the practical relevance for remote triage.
    Revision: yes

  2. Referee: [Abstract] Abstract: The conditional variational autoencoder used to mitigate the data realism gap for injured actions is described without specification of conditioning variables, latent dimensionality, loss terms, or any quantitative fidelity assessment (reconstruction error, distribution distances, or expert ratings). No ablation comparing accuracy with versus without the synthetic data is supplied, leaving open whether the augmentation improves performance or introduces exploitable artifacts.

    Authors: We acknowledge that the abstract provides only a high-level description of the CVAE. The manuscript specifies conditioning on action labels, latent dimensionality, and loss terms in the Methods section, along with fidelity metrics. We have updated the abstract to include these details and added an explicit ablation study (with vs. without augmentation) plus quantitative assessments such as reconstruction error and distribution distances in the revised Experiments section to demonstrate performance gains without artifacts.
    Revision: yes
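One of the distribution distances mentioned above can be sketched in a few lines. The example computes the 1-D Wasserstein (earth mover's) distance between equal-sized samples of a single vital sign, real versus synthetic. The heart-rate values and the function name are invented for illustration; the paper's actual fidelity metrics are not given on this page.

```python
def wasserstein_1d(real, synthetic):
    """1-D Wasserstein-1 distance between two equal-sized empirical samples.

    For equal sample sizes this reduces to the mean absolute difference
    between the sorted values.
    """
    if not real or len(real) != len(synthetic):
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(real), sorted(synthetic))
    return sum(abs(a - b) for a, b in pairs) / len(real)


# Hypothetical heart-rate samples (beats per minute) for one action class.
real_hr = [110.0, 120.0, 130.0]
synthetic_hr = [133.0, 112.0, 118.0]

distance = wasserstein_1d(real_hr, synthetic_hr)
```

A small distance on one signal is necessary rather than sufficient evidence of realism; an ablation with and without the synthetic samples would still be needed to show a benefit for classification.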

Circularity Check

0 steps flagged

No circularity: results are direct empirical evaluation on collected data

full rationale

The paper presents an empirical system for multi-modal casualty assessment using drone video and wearable sensors, augmented by a conditional VAE for injured-action data. All reported performance (85.7% action-classification accuracy, competitiveness of the lightweight CNN) is obtained from direct evaluation on the authors' drone-captured dataset. No mathematical derivations, first-principles predictions, or fitted parameters are invoked whose outputs reduce by construction to the inputs; the central claims rest on experimental measurement rather than any self-referential chain. The CVAE is used for augmentation but its outputs are not renamed as independent predictions, and no load-bearing uniqueness theorems resting on self-citation appear in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on domain assumptions about the informativeness of video pose cues and sensor vitals for casualty state, plus the effectiveness of CVAE augmentation for limited injury data; no free parameters or invented entities are explicitly introduced beyond standard ML components.

axioms (2)
  • domain assumption: Drone video captures fine-grained behavioural cues, such as pose and posture, that are indicative of casualty state.
    Invoked when describing integration of the video modality for assessment when direct access is delayed.
  • domain assumption: Wearable sensors provide complementary physiological signals, including heart rate, breathing rate, and movement.
    Stated as input for multi-modal learning to support medic judgment.



Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 1 internal anchor

  1. [1]

    Telemedicine in humanitarian aid: evaluation of potentials and challenges and an implementation trial in Ukraine,

    F. Habers, A. Müller, J. Kunczik, R. Rossaint, M. Czaplik, and A. Follmann, “Telemedicine in humanitarian aid: evaluation of potentials and challenges and an implementation trial in Ukraine,” Frontiers in Disaster and Emergency Medicine, vol. 3, p. 1718877, 2025

  2. [2]

    Victim detection and localization in emergencies,

    C. S. Álvarez-Merino, E. J. Khatib, H. Q. Luo-Chen, and R. Barco, “Victim detection and localization in emergencies,” Sensors, vol. 22, no. 21, p. 8433, 2022

  3. [3]

    Automated unmanned aerial system for camera-based semi-automatic triage categorization in mass casualty incidents,

    L. Mösch, D. Q. Pokee, I. Barz, A. Müller, A. Follmann, D. Moormann, M. Czaplik, and C. B. Pereira, “Automated unmanned aerial system for camera-based semi-automatic triage categorization in mass casualty incidents,” Drones, vol. 8, no. 10, p. 589, 2024

  4. [4]

    Vision based victim detection from unmanned aerial vehicles,

    M. Andriluka, P. Schnitzspan, J. Meyer, S. Kohlbrecher, K. Petersen, O. von Stryk, S. Roth, and B. Schiele, “Vision based victim detection from unmanned aerial vehicles,” in 2010 IEEE/RSJ international conference on intelligent robots and systems. IEEE, 2010, pp. 1740–1747

  5. [5]

    Life signs detector using a drone in disaster zones,

    A. Al-Naji, A. G. Perera, S. L. Mohammed, and J. Chahl, “Life signs detector using a drone in disaster zones,” Remote Sensing, vol. 11, no. 20, p. 2441, 2019

  6. [6]

    Consciousness detection on injured simulated patients using manual and automatic classification via visible and infrared imaging,

    D. Queirós Pokee, C. Barbosa Pereira, L. Mösch, A. Follmann, and M. Czaplik, “Consciousness detection on injured simulated patients using manual and automatic classification via visible and infrared imaging,” Sensors, vol. 21, no. 24, p. 8455, 2021

  7. [7]

    A revision of the trauma score,

    H. R. Champion, W. J. Sacco, W. S. Copes, D. S. Gann, T. A. Gennarelli, and M. E. Flanagan, “A revision of the trauma score,” Journal of Trauma, vol. 29, no. 5, pp. 623–629, 1989

  8. [8]

    The injury severity score: A method for describing patients with multiple injuries and evaluating emergency care,

    S. P. Baker, B. O’Neill, W. Haddon, and W. B. Long, “The injury severity score: A method for describing patients with multiple injuries and evaluating emergency care,” Journal of Trauma, vol. 14, no. 3, pp. 187–196, 1974

  9. [9]

    Use of SALT triage in a simulated mass-casualty incident,

    E. B. Lerner, R. B. Schwartz, P. L. Coule, and R. G. Pirrallo, “Use of SALT triage in a simulated mass-casualty incident,” Prehospital Emergency Care, vol. 14, no. 1, pp. 21–25, 2011

  10. [10]

    Machine learning without borders? an adaptable tool to optimize mortality prediction in diverse clinical settings,

    S. A. Christie, A. E. Hubbard, R. A. Callcut, M. Hameed, F. N. Dissak-Delon, D. Mekolo, A. Saidou, A. C. Mefire, P. Nsongoo, R. A. Dicker et al., “Machine learning without borders? an adaptable tool to optimize mortality prediction in diverse clinical settings,” Journal of Trauma and Acute Care Surgery, vol. 85, no. 5, pp. 921–927, 2018

  11. [11]

    Tactical combat casualty care in special operations,

    F. K. Butler, J. Hagmann, and E. G. Butler, “Tactical combat casualty care in special operations,” Military Medicine, vol. 165, no. suppl 1, pp. 1–16, 2000

  12. [12]

    Wearable sensors incorporating compensatory reserve measurement for advancing physiological monitoring in critically injured trauma patients,

    V. A. Convertino, S. G. Schauer, E. K. Weitzel, S. Cardin, M. E. Stackle, M. J. Talley, M. N. Sawka, and O. T. Inan, “Wearable sensors incorporating compensatory reserve measurement for advancing physiological monitoring in critically injured trauma patients,” Sensors, vol. 20, no. 22, p. 6413, 2020

  13. [13]

    A survey on wearable sensor-based systems for health monitoring and prognosis,

    A. Pantelopoulos and N. G. Bourbakis, “A survey on wearable sensor-based systems for health monitoring and prognosis,” IEEE Transactions on Systems, Man, and Cybernetics, Part C, vol. 40, no. 1, pp. 1–12, 2010

  14. [14]

    UAV-assisted disaster management: Applications and open issues,

    M. Erdelj, O. Král, and E. Natalizio, “UAV-assisted disaster management: Applications and open issues,” pp. 1–5, 2017

  15. [15]

    The economic and operational value of using drones to transport vaccines,

    L. A. Haidari, S. T. Brown, M. Ferguson, E. Bancroft, M. Spiker, A. Wilcox, R. Ambikapathi, V. Sampath, D. L. Connor, and B. Y. Lee, “The economic and operational value of using drones to transport vaccines,” Vaccine, vol. 34, no. 34, pp. 4062–4067, 2016

  16. [16]

    Development of the aerial remote triage system using drones in mass casualty scenarios: a survey of international experts,

    C. Álvarez-García, S. Cámara-Anguita, J. M. López-Hens, N. Granero-Moya, M. D. López-Franco, I. María-Comino-Sanz, S. Sanz-Martos, and P. L. Pancorbo-Hidalgo, “Development of the aerial remote triage system using drones in mass casualty scenarios: a survey of international experts,” PLoS one, vol. 16, no. 5, p. e0242947, 2021

  17. [17]

    Okutama-action: An aerial view video dataset for concurrent human action detection,

    M. Barekatain, M. Martí, H.-F. Shih, S. Murray, K. Nakayama, Y. Matsuo, and H. Prendinger, “Okutama-action: An aerial view video dataset for concurrent human action detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 28–35

  18. [18]

    Drone-action: An outdoor recorded drone video dataset for action recognition,

    A. G. Perera, Y. W. Law, and J. Chahl, “Drone-action: An outdoor recorded drone video dataset for action recognition,” Drones, vol. 3, no. 4, p. 82, 2019

  19. [19]

    Dronecaps: recognition of human actions in drone videos using capsule networks with binary volume comparisons,

    A. M. Algamdi, V. Sanchez, and C.-T. Li, “Dronecaps: recognition of human actions in drone videos using capsule networks with binary volume comparisons,” in 2020 IEEE International Conference on Image Processing (ICIP). IEEE, 2020, pp. 3174–3178

  20. [20]

    Uav-human: A large benchmark for human behavior understanding with unmanned aerial vehicles,

    T. Li, J. Liu, W. Zhang, Y. Ni, W. Wang, and Z. Li, “Uav-human: A large benchmark for human behavior understanding with unmanned aerial vehicles,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 16266–16275

  21. [21]

    Pmi sampler: Patch similarity guided frame selection for aerial action recognition,

    R. Xian, X. Wang, D. Kothandaraman, and D. Manocha, “Pmi sampler: Patch similarity guided frame selection for aerial action recognition,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 6982–6991

  22. [22]

    Drone-hat: Hybrid attention transformer for complex action recognition in drone surveillance videos,

    M. Khan, J. Ahmad, A. El Saddik, W. Gueaieb, G. De Masi, and F. Karray, “Drone-hat: Hybrid attention transformer for complex action recognition in drone surveillance videos,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 4713–4722

  23. [23]

    Airpose: Multi-view fusion network for aerial 3d human pose and shape estimation,

    N. Saini, E. Bonetto, E. Price, A. Ahmad, and M. J. Black, “Airpose: Multi-view fusion network for aerial 3d human pose and shape estimation,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 4805–4812, 2022

  24. [24]

    Active human pose estimation via an autonomous uav agent,

    J. Chen, B. He, C. D. Singh, C. Fermüller, and Y. Aloimonos, “Active human pose estimation via an autonomous uav agent,” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 7801–7808

  25. [25]

    Flypose: Towards robust human pose estimation from aerial views,

    H. Farooq, M. Brenner, and P. Stütz, “Flypose: Towards robust human pose estimation from aerial views,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026, pp. 8617–8627

  26. [26]

    Learning spatiotemporal features with 3d convolutional networks,

    D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3d convolutional networks,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 4489–4497

  27. [27]

    Quo vadis, action recognition? a new model and the kinetics dataset,

    J. Carreira and A. Zisserman, “Quo vadis, action recognition? a new model and the kinetics dataset,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6299–6308

  28. [28]

    A closer look at spatiotemporal convolutions for action recognition,

    D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun, and M. Paluri, “A closer look at spatiotemporal convolutions for action recognition,” in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2018, pp. 6450–6459

  29. [29]

    A deep learning-based radar and camera sensor fusion architecture for object detection,

    F. Nobis, M. Geisslinger, M. Weber, J. Betz, and M. Lienkamp, “A deep learning-based radar and camera sensor fusion architecture for object detection,” in 2019 Sensor Data Fusion: Trends, Solutions, Applications (SDF). IEEE, 2019, pp. 1–7

  30. [30]

    A multimodal late fusion framework for physiological sensor and audio-signal-based stress detection: An experimental study and public dataset,

    V.-R. Xefteris, M. Dominguez, J. Grivolla, A. Tsanousa, F. Zaffanela, M. Monego, S. Symeonidis, S. Diplaris, L. Wanner, S. Vrochidis et al., “A multimodal late fusion framework for physiological sensor and audio-signal-based stress detection: An experimental study and public dataset,” Electronics, vol. 12, no. 23, p. 4871, 2023

  31. [31]

    Edge-based multimodal sensor data fusion with vision language models (vlms) for real-time autonomous vehicle accident avoidance,

    F. Yang, B. Yu, Y. Zhou, X. Luo, Z. Tu, and C. Liu, “Edge-based multimodal sensor data fusion with vision language models (vlms) for real-time autonomous vehicle accident avoidance,” arXiv preprint arXiv:2508.01057, 2025

  32. [32]

    Fusion-gcn: Multimodal action recognition using graph convolutional networks,

    M. Duhme, R. Memmesheimer, and D. Paulus, “Fusion-gcn: Multimodal action recognition using graph convolutional networks,” in DAGM German conference on pattern recognition. Springer, 2021, pp. 265–281

  33. [33]

    Semantics-aware adaptive knowledge distillation for sensor-to-vision action recognition,

    Y. Liu, K. Wang, G. Li, and L. Lin, “Semantics-aware adaptive knowledge distillation for sensor-to-vision action recognition,” IEEE Transactions on Image Processing, vol. 30, pp. 5573–5588, 2021

  34. [34]

    Omnisense user manual,

    Z. Technology, “Omnisense user manual,” 2019. [Online]. Available: https://www.medtronic.com/content/dam/covidien/library/us/en/product/health-informatics-and-monitoring/zephyr-omnisense-5-1-user-manual-en-PT00109656A00.pdf

  35. [35]

    Bioharness 3.0 user manual,

    ——, “Bioharness 3.0 user manual,” 2012. [Online]. Available: https://www.zephyranywhere.com/media/download/bioharness3-user-manual.pdf

  36. [36]

    A database to support development and evaluation of intelligent intensive care monitoring,

    G. Moody and R. Mark, “A database to support development and evaluation of intelligent intensive care monitoring,” in Computers in Cardiology 1996, 1996, pp. 657–660

  37. [37]

    Learning structured output representation using deep conditional generative models,

    K. Sohn, H. Lee, and X. Yan, “Learning structured output representation using deep conditional generative models,” in Advances in Neural Information Processing Systems, vol. 28, 2015

  38. [38]

    YOLOv12: Attention-Centric Real-Time Object Detectors

    Y. Tian, Q. Ye, and D. Doermann, “YOLOv12: Attention-centric real-time object detectors,” arXiv preprint arXiv:2502.12524, 2025

  39. [39]

    Ethical principles for artificial intelligence in national defence,

    M. Taddeo, D. McNeish, A. Blanchard, and E. Edgar, “Ethical principles for artificial intelligence in national defence,” in The 2021 Yearbook of the Digital Ethics Lab. Springer, 2022, pp. 261–283

  40. [40]

    An ai ethics framework for a trustworthy autonomous drone system to support battlefield casualty triage,

    P. Lee, T. Ahmad, S. M. Waheed, and A. Kenning, “An ai ethics framework for a trustworthy autonomous drone system to support battlefield casualty triage,” AI and Ethics, vol. 6, no. 1, p. 139, 2026