ATRACT: A Trustworthy Robotic Autonomous system to support Casualty Triage

Ardhendu Behera; Arindam Sikdar; Khizer Saeed; Mindula Illeperuma; Peter Lee; Rafael Pina; Sandip Pradhan; Tasweer Ahmad; Varuna De Silva

arxiv: 2605.17123 · v1 · pith:MBXMO7EKnew · submitted 2026-05-16 · 💻 cs.HC · cs.RO


Pith reviewed 2026-05-20 14:32 UTC · model grok-4.3

classification 💻 cs.HC cs.RO

keywords casualty triage · drone video · wearable sensors · multi-modal learning · battlefield triage · data augmentation · action classification · remote assessment

The pith

A multi-modal system fusing drone video with wearable sensors reaches 85.7 percent accuracy classifying casualty actions for remote triage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ATRACT as a human-in-the-loop system that combines drone-captured video of behavioral cues with wearable physiological signals to assess casualties when direct access is restricted or dangerous. A conditional variational autoencoder generates synthetic data to address the shortage of real injured-action examples, allowing the model to learn from limited battlefield-like footage. On a custom drone dataset the pipeline achieves 85.7 percent accuracy for action classification, and its lightweight CNN visual encoder performs competitively with heavier pre-trained video models. This combination supplies medics with evidence for early casualty prioritization without requiring them to approach the scene immediately. A sympathetic reader sees the work as a concrete engineering step toward safer triage support in contested environments.

Core claim

ATRACT integrates drone video for fine-grained pose and posture cues with complementary body-worn sensor data on heart rate, breathing rate and movement through multi-modal learning; a conditional variational autoencoder augments training data for injured actions, yielding an overall action-classification accuracy of 85.7 percent on the collected drone dataset while the lightweight visual encoder stays competitive with stronger pre-trained backbones.

What carries the argument

Multi-modal fusion of drone video and wearable physiological signals, augmented by conditional variational autoencoder synthetic data generation, to classify casualty actions and supply evidence for medic judgment.
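
The summary does not say how the two modalities are combined, so the following is only a minimal sketch of one common option, feature-level fusion by concatenation, and not the authors' architecture. The label set `ACTIONS`, the function `fuse_and_classify`, the embedding sizes, and all weights and input values are hypothetical and chosen purely for illustration.

```python
import math
import random

# Hypothetical label set for illustration; not the paper's class list.
ACTIONS = ["walking", "crawling", "collapse"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(video_embedding, vitals, weights, bias):
    """Concatenate both modalities, then apply one linear layer and softmax."""
    fused = list(video_embedding) + list(vitals)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy inputs: a 4-d visual embedding and 3 normalised vitals
# (heart rate, breathing rate, movement). All values are made up.
rng = random.Random(0)
video_embedding = [rng.uniform(-1, 1) for _ in range(4)]
vitals = [0.8, 0.6, 0.1]
weights = [[rng.uniform(-1, 1) for _ in range(7)] for _ in ACTIONS]
bias = [0.0 for _ in ACTIONS]

probs = fuse_and_classify(video_embedding, vitals, weights, bias)

# Present ranked evidence to the medic rather than a single hard decision.
ranked = sorted(zip(ACTIONS, probs), key=lambda pair: pair[1], reverse=True)
for action, p in ranked:
    print(f"{action}: {p:.2f}")
```

Printing the full ranked distribution, rather than only the top class, is one simple way to keep the output in the role of supporting evidence for a human decision.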

If this is right

Medics could receive early, evidence-based assessments of casualty states from a safe standoff distance.
The system reduces immediate exposure of frontline personnel by supporting triage prioritization before physical evacuation.
A lightweight visual encoder allows the pipeline to run on small drone platforms with limited onboard compute.
Human oversight keeps the output as supportive evidence rather than an autonomous final decision.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sensing combination could be adapted for civilian disaster response where rubble or hazards similarly block direct access.
Performance under low-light or adverse weather conditions not represented in the current dataset would be a natural next measurement.
Pairing the classifier with autonomous drone routing could enable persistent monitoring without continuous pilot attention.

Load-bearing premise

Synthetic data produced by the conditional variational autoencoder is realistic enough that the resulting model will give reliable evidence for casualty-state assessment under actual battlefield conditions.
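
For orientation, the standard conditional VAE objective from Sohn et al. [37] is shown below. The summary does not state the loss terms or conditioning variables the authors actually use, so this is the textbook form and not a description of their model. The symbols are assumptions: $x$ is a training sample (for example a window of vital signs), $c$ is the conditioning label (for example an injury class), and $z$ is the latent variable.

```latex
\mathcal{L}(\theta, \phi; x, c)
  = \mathbb{E}_{q_\phi(z \mid x, c)}\!\left[\log p_\theta(x \mid z, c)\right]
  - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x, c) \,\|\, p_\theta(z \mid c)\right)
```

Training maximises this lower bound. Synthetic samples for a chosen class $c$ are then drawn by sampling $z \sim p_\theta(z \mid c)$ and decoding with $p_\theta(x \mid z, c)$. The premise above amounts to asking how close those decoded samples are to real injured-casualty data.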

What would settle it

Record new drone video of live actors performing genuine injury movements in an outdoor contested-environment simulation and check whether action-classification accuracy falls substantially below 85.7 percent or whether the outputs cease to help medics form triage decisions.
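
One way to make "falls substantially below 85.7 percent" concrete is to put a confidence interval around the accuracy measured on the new recordings. The sketch below uses the Wilson score interval and is only an illustration: the counts `correct` and `total` are hypothetical, not results from the paper, and treating 85.7 percent as a fixed reference ignores the uncertainty in the original figure.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = correct / total
    denom = 1 + z * z / total
    centre = (p_hat + z * z / (2 * total)) / denom
    spread = math.sqrt(p_hat * (1 - p_hat) / total + z * z / (4 * total * total))
    half = z * spread / denom
    return centre - half, centre + half


REPORTED_ACCURACY = 0.857  # headline figure from the paper

# Hypothetical counts from a new field recording; not real results.
correct, total = 150, 200
low, high = wilson_interval(correct, total)

# Flag a drop only when the whole interval sits below the reported value.
falls_below = high < REPORTED_ACCURACY
print(f"accuracy={correct / total:.3f}, 95% CI=({low:.3f}, {high:.3f})")
print("substantially below reported accuracy:", falls_below)
```

A stricter test would compare two intervals, which requires the size of the original test set.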

Figures

Figures reproduced from arXiv: 2605.17123 by Ardhendu Behera, Arindam Sikdar, Khizer Saeed, Mindula Illeperuma, Peter Lee, Rafael Pina, Sandip Pradhan, Tasweer Ahmad, Varuna De Silva.

**Figure 2.** A step-by-step pipeline of our human-in-the-loop ATRACT system. The architecture includes …

**Figure 3.** Data collection setup: (A) DJI Mini 2 drone, (B) participants wearing BioModules, (C) base station, (D) data logging using Omnisense [34]. Sensors used: 1) Zephyr BioModule, 2) Echo Gateway Transceiver, 3) GPS (Starz 818XT), and 4) BioModule on chest strap.

**Figure 4.** Representative frames of six different choreographed actions, performed by cadets during data acquisition by drone.

**Figure 5.** CVVitAE architecture for vital-sign augmentation (arm injury, head injury, collapse), and validation through proximity mapping.

**Figure 6.** Graphical Interface of ATRACT system, which integrates …

**Figure 7.** Stable Drone flight while carrying the communication module.

**Figure 8.** Confusion matrix (%) for our CNN and R(2+1)D backbone.

**Figure 9.** Grad-CAM Visualisations for Explainability, showing that …

Original abstract

At a time when drones are increasingly associated with hostile operations, we re-purpose them for humanitarian and life-saving applications. However, adapting search and rescue drones for battlefield triage remains extremely challenging; the technology must perform reliably to support frontline medics who are forced to operate under extreme uncertainty, restricted access, and significant personal risk. Given the growing vulnerability of casualty evacuation in conflict zones, this paper presents ATRACT (A Trustworthy Robotic Autonomous system to support Casualty Triage), a novel human-in-the-loop decision support system to enable early battlefield triage during the critical post-trauma period. ATRACT integrates drone-captured video with wearable sensor input for multi-modal learning to support casualty-state assessment, thereby addressing the limitations of existing systems. Drone video captures fine-grained behavioural cues, such as pose and posture, while body-worn sensors provide complementary physiological signals, including heart rate, breathing rate, and movement. By combining the two modalities, ATRACT provides evidence to support the early judgement of medics when direct access to the casualty is delayed, risky, or restricted. To mitigate the data realism gap pertaining to injured actions, a conditional variational autoencoder is devised for data augmentation. Experimental results on our drone-captured dataset show that the proposed pipeline achieves 85.7% accuracy for action classification, while our lightweight CNN visual encoder remains competitive with stronger pre-trained video backbones. Overall, the results support ATRACT as a practically meaningful step towards remote triage in contested environments, where multi-modal sensing, human oversight and trustworthy decision support can improve casualty prioritisation and lessen the exposure of frontline medics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ATRACT combines drone video and wearable sensors for remote battlefield triage with a reported 85.7% accuracy after CVAE augmentation, but the lack of dataset details and validation leaves the practical claims hard to assess.

Hi, the core of this paper is a practical system that fuses drone-captured video for pose and posture cues with body-worn sensors for heart rate, breathing, and movement to help medics triage casualties when direct access is too risky. They add a conditional VAE to generate synthetic data for injured actions and report 85.7% accuracy on action classification, while noting that their lightweight CNN visual encoder stays competitive with heavier pre-trained backbones. The framing around re-purposing drones for humanitarian use in contested zones is straightforward and relevant to the military medicine setting they target. The multi-modal approach itself makes sense as a way to gather complementary evidence without putting medics in harm's way.

What the work does reasonably well is lay out a human-in-the-loop pipeline that keeps the model light enough for potential field use and ties the technical choices to the operational constraints of restricted access.

The soft spots are mostly on the experimental side. The accuracy figure appears without any mention of dataset size, splits, baselines, or statistical measures, which makes it difficult to judge how reliable the number actually is. The CVAE augmentation, presented as the fix for the injured-action data gap, gets no fidelity checks, no ablation against training on real data only, and no description of conditioning or loss terms. That leaves open whether the synthetic samples help or just add artifacts the model might exploit.

This kind of paper is mainly for applied researchers working on AI for emergency response or remote sensing in constrained environments. Readers looking for concrete ideas on modality fusion in high-stakes triage could extract some value, though they would need to fill in the missing validation themselves.

It deserves peer review because the underlying problem is real and the proposed integration has clear practical intent; referees could usefully push for the missing dataset statistics, ablations, and realism metrics to strengthen the claims.

Referee Report

2 major / 0 minor

Summary. The paper proposes ATRACT, a human-in-the-loop decision support system for battlefield casualty triage that fuses drone-captured video (for pose and posture cues) with wearable sensor inputs (heart rate, breathing rate, movement) via multi-modal learning. A conditional variational autoencoder is introduced to augment the dataset for injured actions and thereby close the data realism gap. On a custom drone-captured dataset the pipeline reports 85.7% accuracy for action classification, while a lightweight CNN visual encoder remains competitive with stronger pre-trained video backbones. The work positions the system as a practical step toward remote triage that reduces medic exposure in contested environments.

Significance. If the performance figures and the utility of the CVAE augmentation can be rigorously validated, the manuscript would constitute a relevant contribution to human-computer interaction and robotics for high-stakes humanitarian applications. The multi-modal fusion of behavioral and physiological signals, combined with explicit human oversight, addresses a genuine operational need in restricted-access triage scenarios. The emphasis on trustworthy decision support and the use of generative augmentation for scarce injured-action data are timely themes, though their practical impact cannot yet be assessed from the reported evidence.

major comments (2)

[Abstract] The central empirical claim of 85.7% accuracy for action classification is presented without any information on dataset size, number of action classes, train-test split ratios, baseline comparisons, error bars, or validation procedures (e.g., cross-validation or statistical testing). This omission renders the headline performance result impossible to evaluate rigorously and directly undermines the assertion that the pipeline constitutes a practically meaningful step for remote triage.
[Abstract] The conditional variational autoencoder used to mitigate the data realism gap for injured actions is described without specification of conditioning variables, latent dimensionality, loss terms, or any quantitative fidelity assessment (reconstruction error, distribution distances, or expert ratings). No ablation comparing accuracy with versus without the synthetic data is supplied, leaving open whether the augmentation improves performance or introduces exploitable artifacts.
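
As an illustration of the kind of "distribution distance" the second comment asks for, the sketch below computes the empirical one-dimensional Wasserstein-1 distance between a real and a synthetic sample of a single vital sign. It is a hedged example and not the authors' evaluation: the function `wasserstein_1d`, the heart-rate values, and the two hypothetical generators are all made up, and the sorted-pairs shortcut assumes both samples have the same size.

```python
import random
import statistics


def wasserstein_1d(real, synthetic):
    """Empirical 1-D Wasserstein-1 distance for two equal-sized samples."""
    if len(real) != len(synthetic) or not real:
        raise ValueError("samples must be non-empty and equal in size")
    # With equal sizes, the optimal transport plan pairs sorted values.
    pairs = zip(sorted(real), sorted(synthetic))
    return statistics.fmean(abs(a - b) for a, b in pairs)


# Made-up heart-rate samples (beats per minute); not data from the paper.
rng = random.Random(42)
real_hr = [rng.gauss(110, 12) for _ in range(500)]
close_synth = [rng.gauss(111, 12) for _ in range(500)]
far_synth = [rng.gauss(140, 5) for _ in range(500)]

print(f"close generator: {wasserstein_1d(real_hr, close_synth):.2f} bpm")
print(f"far generator:   {wasserstein_1d(real_hr, far_synth):.2f} bpm")
```

A small distance on each vital sign does not by itself show that the synthetic data is useful; it would complement, not replace, the with-versus-without ablation the referee requests.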

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the clarity and evaluability of our empirical claims. We address each point below and have revised the abstract and experiments section accordingly.

Referee: [Abstract] The central empirical claim of 85.7% accuracy for action classification is presented without any information on dataset size, number of action classes, train-test split ratios, baseline comparisons, error bars, or validation procedures (e.g., cross-validation or statistical testing). This omission renders the headline performance result impossible to evaluate rigorously and directly undermines the assertion that the pipeline constitutes a practically meaningful step for remote triage.

Authors: We agree that the abstract would benefit from additional context to support rigorous evaluation of the 85.7% figure. The full manuscript already details the custom drone-captured dataset (including size, action classes, splits, baselines, and cross-validation) in the Experiments section. We have revised the abstract to concisely summarize these elements, added reference to error bars and statistical validation, and clarified the practical relevance for remote triage.

Revision: yes

Referee: [Abstract] The conditional variational autoencoder used to mitigate the data realism gap for injured actions is described without specification of conditioning variables, latent dimensionality, loss terms, or any quantitative fidelity assessment (reconstruction error, distribution distances, or expert ratings). No ablation comparing accuracy with versus without the synthetic data is supplied, leaving open whether the augmentation improves performance or introduces exploitable artifacts.

Authors: We acknowledge that the abstract provides only a high-level description of the CVAE. The manuscript specifies conditioning on action labels, latent dimensionality, and loss terms in the Methods section, along with fidelity metrics. We have updated the abstract to include these details and added an explicit ablation study (with vs. without augmentation) plus quantitative assessments such as reconstruction error and distribution distances in the revised Experiments section to demonstrate performance gains without artifacts.

Revision: yes

Circularity Check

0 steps flagged

No circularity: results are direct empirical evaluation on collected data

The paper presents an empirical system for multi-modal casualty assessment using drone video and wearable sensors, augmented by a conditional VAE for injured-action data. All reported performance (85.7% action-classification accuracy, competitiveness of the lightweight CNN) is obtained from direct evaluation on the authors' drone-captured dataset. No mathematical derivations, first-principles predictions, or fitted parameters are invoked whose outputs reduce by construction to the inputs; the central claims rest on experimental measurement rather than any self-referential chain. The CVAE is used for augmentation but its outputs are not renamed as independent predictions, and no self-citation load-bearing uniqueness theorems appear in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on domain assumptions about the informativeness of video pose cues and sensor vitals for casualty state, plus the effectiveness of CVAE augmentation for limited injury data; no free parameters or invented entities are explicitly introduced beyond standard ML components.

axioms (2)

Domain assumption: Drone video captures fine-grained behavioural cues such as pose and posture that are indicative of casualty state.
Invoked when describing integration of the video modality for assessment when direct access is delayed.
Domain assumption: Wearable sensors provide complementary physiological signals including heart rate, breathing rate, and movement.
Stated as input for multi-modal learning to support medic judgment.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 1 internal anchor

[1] F. Habers, A. Müller, J. Kunczik, R. Rossaint, M. Czaplik, and A. Follmann, “Telemedicine in humanitarian aid: evaluation of potentials and challenges and an implementation trial in Ukraine,” Frontiers in Disaster and Emergency Medicine, vol. 3, p. 1718877, 2025.
[2] C. S. Álvarez-Merino, E. J. Khatib, H. Q. Luo-Chen, and R. Barco, “Victim detection and localization in emergencies,” Sensors, vol. 22, no. 21, p. 8433, 2022.
[3] L. Mösch, D. Q. Pokee, I. Barz, A. Müller, A. Follmann, D. Moormann, M. Czaplik, and C. B. Pereira, “Automated unmanned aerial system for camera-based semi-automatic triage categorization in mass casualty incidents,” Drones, vol. 8, no. 10, p. 589, 2024.
[4] M. Andriluka, P. Schnitzspan, J. Meyer, S. Kohlbrecher, K. Petersen, O. Von Stryk, S. Roth, and B. Schiele, “Vision based victim detection from unmanned aerial vehicles,” in 2010 IEEE/RSJ international conference on intelligent robots and systems. IEEE, 2010, pp. 1740–1747.
[5] A. Al-Naji, A. G. Perera, S. L. Mohammed, and J. Chahl, “Life signs detector using a drone in disaster zones,” Remote Sensing, vol. 11, no. 20, p. 2441, 2019.
[6] D. Queirós Pokee, C. Barbosa Pereira, L. Mösch, A. Follmann, and M. Czaplik, “Consciousness detection on injured simulated patients using manual and automatic classification via visible and infrared imaging,” Sensors, vol. 21, no. 24, p. 8455, 2021.
[7] H. R. Champion, W. J. Sacco, W. S. Copes, D. S. Gann, T. A. Gennarelli, and M. E. Flanagan, “A revision of the trauma score,” Journal of Trauma, vol. 29, no. 5, pp. 623–629, 1989.
[8] S. P. Baker, B. O’Neill, W. Haddon, and W. B. Long, “The injury severity score: A method for describing patients with multiple injuries and evaluating emergency care,” Journal of Trauma, vol. 14, no. 3, pp. 187–196, 1974.
[9] E. B. Lerner, R. B. Schwartz, P. L. Coule, and R. G. Pirrallo, “Use of SALT triage in a simulated mass-casualty incident,” Prehospital Emergency Care, vol. 14, no. 1, pp. 21–25, 2011.
[10] S. A. Christie, A. E. Hubbard, R. A. Callcut, M. Hameed, F. N. Dissak-Delon, D. Mekolo, A. Saidou, A. C. Mefire, P. Nsongoo, R. A. Dicker et al., “Machine learning without borders? an adaptable tool to optimize mortality prediction in diverse clinical settings,” Journal of Trauma and Acute Care Surgery, vol. 85, no. 5, pp. 921–927, 2018.
[11] F. K. Butler, J. Hagmann, and E. G. Butler, “Tactical combat casualty care in special operations,” Military Medicine, vol. 165, no. suppl 1, pp. 1–16, 2000.
[12] V. A. Convertino, S. G. Schauer, E. K. Weitzel, S. Cardin, M. E. Stackle, M. J. Talley, M. N. Sawka, and O. T. Inan, “Wearable sensors incorporating compensatory reserve measurement for advancing physiological monitoring in critically injured trauma patients,” Sensors, vol. 20, no. 22, p. 6413, 2020.
[13] A. Pantelopoulos and N. G. Bourbakis, “A survey on wearable sensor-based systems for health monitoring and prognosis,” IEEE Transactions on Systems, Man, and Cybernetics, Part C, vol. 40, no. 1, pp. 1–12, 2010.
[14] M. Erdelj, O. Král, and E. Natalizio, “UAV-assisted disaster management: Applications and open issues,” pp. 1–5, 2017.
[15] L. A. Haidari, S. T. Brown, M. Ferguson, E. Bancroft, M. Spiker, A. Wilcox, R. Ambikapathi, V. Sampath, D. L. Connor, and B. Y. Lee, “The economic and operational value of using drones to transport vaccines,” Vaccine, vol. 34, no. 34, pp. 4062–4067, 2016.
[16] C. Álvarez-García, S. Cámara-Anguita, J. M. López-Hens, N. Granero-Moya, M. D. López-Franco, I. María-Comino-Sanz, S. Sanz-Martos, and P. L. Pancorbo-Hidalgo, “Development of the aerial remote triage system using drones in mass casualty scenarios: a survey of international experts,” PLoS one, vol. 16, no. 5, p. e0242947, 2021.
[17] M. Barekatain, M. Martí, H.-F. Shih, S. Murray, K. Nakayama, Y. Matsuo, and H. Prendinger, “Okutama-action: An aerial view video dataset for concurrent human action detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 28–35.
[18] A. G. Perera, Y. W. Law, and J. Chahl, “Drone-action: An outdoor recorded drone video dataset for action recognition,” Drones, vol. 3, no. 4, p. 82, 2019.
[19] A. M. Algamdi, V. Sanchez, and C.-T. Li, “Dronecaps: recognition of human actions in drone videos using capsule networks with binary volume comparisons,” in 2020 IEEE International Conference on Image Processing (ICIP). IEEE, 2020, pp. 3174–3178.
[20] T. Li, J. Liu, W. Zhang, Y. Ni, W. Wang, and Z. Li, “Uav-human: A large benchmark for human behavior understanding with unmanned aerial vehicles,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 16266–16275.
[21] R. Xian, X. Wang, D. Kothandaraman, and D. Manocha, “Pmi sampler: Patch similarity guided frame selection for aerial action recognition,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 6982–6991.
[22] M. Khan, J. Ahmad, A. El Saddik, W. Gueaieb, G. De Masi, and F. Karray, “Drone-hat: Hybrid attention transformer for complex action recognition in drone surveillance videos,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 4713–4722.
[23] N. Saini, E. Bonetto, E. Price, A. Ahmad, and M. J. Black, “Airpose: Multi-view fusion network for aerial 3d human pose and shape estimation,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 4805–4812, 2022.
[24] J. Chen, B. He, C. D. Singh, C. Fermüller, and Y. Aloimonos, “Active human pose estimation via an autonomous uav agent,” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 7801–7808.
[25] H. Farooq, M. Brenner, and P. Stütz, “Flypose: Towards robust human pose estimation from aerial views,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026, pp. 8617–8627.
[26] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3d convolutional networks,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 4489–4497.
[27] J. Carreira and A. Zisserman, “Quo vadis, action recognition? a new model and the kinetics dataset,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6299–6308.
[28] D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun, and M. Paluri, “A closer look at spatiotemporal convolutions for action recognition,” in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2018, pp. 6450–6459.
[29] F. Nobis, M. Geisslinger, M. Weber, J. Betz, and M. Lienkamp, “A deep learning-based radar and camera sensor fusion architecture for object detection,” in 2019 Sensor Data Fusion: Trends, Solutions, Applications (SDF). IEEE, 2019, pp. 1–7.
[30] V.-R. Xefteris, M. Dominguez, J. Grivolla, A. Tsanousa, F. Zaffanela, M. Monego, S. Symeonidis, S. Diplaris, L. Wanner, S. Vrochidis et al., “A multimodal late fusion framework for physiological sensor and audio-signal-based stress detection: An experimental study and public dataset,” Electronics, vol. 12, no. 23, p. 4871, 2023.
[31] F. Yang, B. Yu, Y. Zhou, X. Luo, Z. Tu, and C. Liu, “Edge-based multimodal sensor data fusion with vision language models (vlms) for real-time autonomous vehicle accident avoidance,” arXiv preprint arXiv:2508.01057, 2025.
[32] M. Duhme, R. Memmesheimer, and D. Paulus, “Fusion-gcn: Multimodal action recognition using graph convolutional networks,” in DAGM German conference on pattern recognition. Springer, 2021, pp. 265–281.
[33] Y. Liu, K. Wang, G. Li, and L. Lin, “Semantics-aware adaptive knowledge distillation for sensor-to-vision action recognition,” IEEE Transactions on Image Processing, vol. 30, pp. 5573–5588, 2021.
[34] Z. Technology, “Omnisense user manual,” 2019. [Online]. Available: https://www.medtronic.com/content/dam/covidien/library/us/en/product/health-informatics-and-monitoring/zephyr-omnisense-5-1-user-manual-en-PT00109656A00.pdf
[35] ——, “Bioharness 3.0 user manual,” 2012. [Online]. Available: https://www.zephyranywhere.com/media/download/bioharness3-user-manual.pdf
[36] G. Moody and R. Mark, “A database to support development and evaluation of intelligent intensive care monitoring,” in Computers in Cardiology 1996, 1996, pp. 657–660.
[37] K. Sohn, H. Lee, and X. Yan, “Learning structured output representation using deep conditional generative models,” in Advances in Neural Information Processing Systems, vol. 28, 2015.
[38] Y. Tian, Q. Ye, and D. Doermann, “Yolov12: Attention-centric real-time object detectors,” arXiv preprint arXiv:2502.12524, 2025.
[39] M. Taddeo, D. McNeish, A. Blanchard, and E. Edgar, “Ethical principles for artificial intelligence in national defence,” in The 2021 Yearbook of the Digital Ethics Lab. Springer, 2022, pp. 261–283.
[40] P. Lee, T. Ahmad, S. M. Waheed, and A. Kenning, “An ai ethics framework for a trustworthy autonomous drone system to support battlefield casualty triage,” AI and Ethics, vol. 6, no. 1, p. 139, 2026.

[1] [1]

Telemedicine in humanitarian aid: evaluation of potentials and challenges and an implementation trial in ukraine,

F. Habers, A. M ¨uller, J. Kunczik, R. Rossaint, M. Czaplik, and A. Foll- mann, “Telemedicine in humanitarian aid: evaluation of potentials and challenges and an implementation trial in ukraine,”Frontiers in Disaster and Emergency Medicine, vol. 3, p. 1718877, 2025

work page 2025

[2] [2]

Victim detection and localization in emergencies,

C. S. ´Alvarez-Merino, E. J. Khatib, H. Q. Luo-Chen, and R. Barco, “Victim detection and localization in emergencies,”Sensors, vol. 22, no. 21, p. 8433, 2022

work page 2022

[3] [3]

Automated unmanned aerial system for camera-based semi-automatic triage categorization in mass casualty incidents,

L. M ¨osch, D. Q. Pokee, I. Barz, A. M ¨uller, A. Follmann, D. Moormann, M. Czaplik, and C. B. Pereira, “Automated unmanned aerial system for camera-based semi-automatic triage categorization in mass casualty incidents,”Drones, vol. 8, no. 10, p. 589, 2024

work page 2024

[4] [4]

Vision based victim detection from unmanned aerial vehicles,

M. Andriluka, P. Schnitzspan, J. Meyer, S. Kohlbrecher, K. Petersen, O. V on Stryk, S. Roth, and B. Schiele, “Vision based victim detection from unmanned aerial vehicles,” in2010 IEEE/RSJ international con- ference on intelligent robots and systems. IEEE, 2010, pp. 1740–1747

work page 2010

[5] A. Al-Naji, A. G. Perera, S. L. Mohammed, and J. Chahl, “Life signs detector using a drone in disaster zones,” Remote Sensing, vol. 11, no. 20, p. 2441, 2019.

[6] D. Queirós Pokee, C. Barbosa Pereira, L. Mösch, A. Follmann, and M. Czaplik, “Consciousness detection on injured simulated patients using manual and automatic classification via visible and infrared imaging,” Sensors, vol. 21, no. 24, p. 8455, 2021.

[7] H. R. Champion, W. J. Sacco, W. S. Copes, D. S. Gann, T. A. Gennarelli, and M. E. Flanagan, “A revision of the trauma score,” Journal of Trauma, vol. 29, no. 5, pp. 623–629, 1989.

[8] S. P. Baker, B. O’Neill, W. Haddon, and W. B. Long, “The injury severity score: A method for describing patients with multiple injuries and evaluating emergency care,” Journal of Trauma, vol. 14, no. 3, pp. 187–196, 1974.

[9] E. B. Lerner, R. B. Schwartz, P. L. Coule, and R. G. Pirrallo, “Use of SALT triage in a simulated mass-casualty incident,” Prehospital Emergency Care, vol. 14, no. 1, pp. 21–25, 2011.

[10] S. A. Christie, A. E. Hubbard, R. A. Callcut, M. Hameed, F. N. Dissak-Delon, D. Mekolo, A. Saidou, A. C. Mefire, P. Nsongoo, R. A. Dicker et al., “Machine learning without borders? an adaptable tool to optimize mortality prediction in diverse clinical settings,” Journal of Trauma and Acute Care Surgery, vol. 85, no. 5, pp. 921–927, 2018.

[11] F. K. Butler, J. Hagmann, and E. G. Butler, “Tactical combat casualty care in special operations,” Military Medicine, vol. 165, no. suppl 1, pp. 1–16, 2000.

[12] V. A. Convertino, S. G. Schauer, E. K. Weitzel, S. Cardin, M. E. Stackle, M. J. Talley, M. N. Sawka, and O. T. Inan, “Wearable sensors incorporating compensatory reserve measurement for advancing physiological monitoring in critically injured trauma patients,” Sensors, vol. 20, no. 22, p. 6413, 2020.

[13] A. Pantelopoulos and N. G. Bourbakis, “A survey on wearable sensor-based systems for health monitoring and prognosis,” IEEE Transactions on Systems, Man, and Cybernetics, Part C, vol. 40, no. 1, pp. 1–12, 2010.

[14] M. Erdelj, O. Král, and E. Natalizio, “UAV-assisted disaster management: Applications and open issues,” pp. 1–5, 2017.

[15] L. A. Haidari, S. T. Brown, M. Ferguson, E. Bancroft, M. Spiker, A. Wilcox, R. Ambikapathi, V. Sampath, D. L. Connor, and B. Y. Lee, “The economic and operational value of using drones to transport vaccines,” Vaccine, vol. 34, no. 34, pp. 4062–4067, 2016.

[16] C. Álvarez-García, S. Cámara-Anguita, J. M. López-Hens, N. Granero-Moya, M. D. López-Franco, I. María-Comino-Sanz, S. Sanz-Martos, and P. L. Pancorbo-Hidalgo, “Development of the aerial remote triage system using drones in mass casualty scenarios: a survey of international experts,” PLoS one, vol. 16, no. 5, p. e0242947, 2021.

[17] M. Barekatain, M. Martí, H.-F. Shih, S. Murray, K. Nakayama, Y. Matsuo, and H. Prendinger, “Okutama-action: An aerial view video dataset for concurrent human action detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 28–35.

[18] A. G. Perera, Y. W. Law, and J. Chahl, “Drone-action: An outdoor recorded drone video dataset for action recognition,” Drones, vol. 3, no. 4, p. 82, 2019.

[19] A. M. Algamdi, V. Sanchez, and C.-T. Li, “Dronecaps: recognition of human actions in drone videos using capsule networks with binary volume comparisons,” in 2020 IEEE International Conference on Image Processing (ICIP). IEEE, 2020, pp. 3174–3178.

[20] T. Li, J. Liu, W. Zhang, Y. Ni, W. Wang, and Z. Li, “Uav-human: A large benchmark for human behavior understanding with unmanned aerial vehicles,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 16266–16275.

[21] R. Xian, X. Wang, D. Kothandaraman, and D. Manocha, “Pmi sampler: Patch similarity guided frame selection for aerial action recognition,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 6982–6991.

[22] M. Khan, J. Ahmad, A. El Saddik, W. Gueaieb, G. De Masi, and F. Karray, “Drone-hat: Hybrid attention transformer for complex action recognition in drone surveillance videos,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 4713–4722.

[23] N. Saini, E. Bonetto, E. Price, A. Ahmad, and M. J. Black, “Airpose: Multi-view fusion network for aerial 3d human pose and shape estimation,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 4805–4812, 2022.

[24] J. Chen, B. He, C. D. Singh, C. Fermüller, and Y. Aloimonos, “Active human pose estimation via an autonomous uav agent,” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 7801–7808.

[25] H. Farooq, M. Brenner, and P. Stütz, “Flypose: Towards robust human pose estimation from aerial views,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026, pp. 8617–8627.

[26] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3d convolutional networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 4489–4497.

[27] J. Carreira and A. Zisserman, “Quo vadis, action recognition? a new model and the kinetics dataset,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6299–6308.

[28] D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun, and M. Paluri, “A closer look at spatiotemporal convolutions for action recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6450–6459.

[29] F. Nobis, M. Geisslinger, M. Weber, J. Betz, and M. Lienkamp, “A deep learning-based radar and camera sensor fusion architecture for object detection,” in 2019 Sensor Data Fusion: Trends, Solutions, Applications (SDF). IEEE, 2019, pp. 1–7.

[30] V.-R. Xefteris, M. Dominguez, J. Grivolla, A. Tsanousa, F. Zaffanela, M. Monego, S. Symeonidis, S. Diplaris, L. Wanner, S. Vrochidis et al., “A multimodal late fusion framework for physiological sensor and audio-signal-based stress detection: An experimental study and public dataset,” Electronics, vol. 12, no. 23, p. 4871, 2023.

[31] F. Yang, B. Yu, Y. Zhou, X. Luo, Z. Tu, and C. Liu, “Edge-based multimodal sensor data fusion with vision language models (vlms) for real-time autonomous vehicle accident avoidance,” arXiv preprint arXiv:2508.01057, 2025.

[32] M. Duhme, R. Memmesheimer, and D. Paulus, “Fusion-gcn: Multimodal action recognition using graph convolutional networks,” in DAGM German Conference on Pattern Recognition. Springer, 2021, pp. 265–281.

[33] Y. Liu, K. Wang, G. Li, and L. Lin, “Semantics-aware adaptive knowledge distillation for sensor-to-vision action recognition,” IEEE Transactions on Image Processing, vol. 30, pp. 5573–5588, 2021.

[34] Zephyr Technology, “Omnisense user manual,” 2019. [Online]. Available: https://www.medtronic.com/content/dam/covidien/library/us/en/product/health-informatics-and-monitoring/zephyr-omnisense-5-1-user-manual-en-PT00109656A00.pdf

[35] ——, “Bioharness 3.0 user manual,” 2012. [Online]. Available: https://www.zephyranywhere.com/media/download/bioharness3-user-manual.pdf

[36] G. Moody and R. Mark, “A database to support development and evaluation of intelligent intensive care monitoring,” in Computers in Cardiology 1996, 1996, pp. 657–660.

[37] K. Sohn, H. Lee, and X. Yan, “Learning structured output representation using deep conditional generative models,” in Advances in Neural Information Processing Systems, vol. 28, 2015.

[38] Y. Tian, Q. Ye, and D. Doermann, “YOLOv12: Attention-centric real-time object detectors,” arXiv preprint arXiv:2502.12524, 2025.

[39] M. Taddeo, D. McNeish, A. Blanchard, and E. Edgar, “Ethical principles for artificial intelligence in national defence,” in The 2021 Yearbook of the Digital Ethics Lab. Springer, 2022, pp. 261–283.

[40] P. Lee, T. Ahmad, S. M. Waheed, and A. Kenning, “An AI ethics framework for a trustworthy autonomous drone system to support battlefield casualty triage,” AI and Ethics, vol. 6, no. 1, p. 139, 2026.