pith. machine review for the scientific record.

arxiv: 2604.10404 · v1 · submitted 2026-04-12 · 💻 cs.ET · cs.LG

Sense Less, Infer More: Agentic Multimodal Transformers for Edge Medical Intelligence

Brandon Lee, Chengwei Zhou, Christopher Pulliam, Gourav Datta, Haotian Yu, Massoud Pedram, Steve Majerus, Xuming Chen, Zhaoyan Jia

Pith reviewed 2026-05-10 16:40 UTC · model grok-4.3

classification 💻 cs.ET cs.LG
keywords adaptive multimodal sensing · agentic transformers · edge medical intelligence · sensor gating · sigma-delta sampling · wearable health monitoring · energy-efficient inference

The pith

An end-to-end agentic framework learns to gate sensors and skip redundant samples, cutting sensor usage by 48.8 percent while raising accuracy by 1.9 percent on average across three physiological monitoring datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Adaptive Multimodal Intelligence as a system that jointly trains a controller to choose active sensors and a sensing module to ignore temporally redundant patches. This setup, built around a cross-modal transformer that fuses partial inputs, is evaluated on three standard wearable datasets and reports both lower sensor activity and higher classification performance than prior methods. The motivation is that continuous streams from ECG, PPG, EMG, and IMU devices exhaust batteries in hours, so selective sensing directly extends usable lifetime without sacrificing diagnostic reliability. Joint optimization of accuracy, sparsity, alignment, and predictive coding lets the model adapt to missing modalities at inference time.
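
As a rough arithmetic check on the battery argument, the sketch below converts per-sensor power into ideal runtime for a 300 mWh battery, using the datasheet ranges quoted in the Figure 1 caption. It counts sensor power only; compute, radio, and conversion losses are ignored, which is an assumption of this sketch rather than the paper's accounting, and the function and variable names are illustrative.

```python
# Sensor power ranges in mW, as quoted in the Figure 1 caption.
SENSOR_POWER_MW = {
    "IMU": (0.3, 1.0),
    "ECG": (1.0, 5.0),
    "EMG": (6.0, 15.0),
    "PPG": (4.0, 10.0),
}
BATTERY_MWH = 300.0  # battery capacity from the same caption


def runtime_hours(power_mw: float, battery_mwh: float = BATTERY_MWH) -> float:
    """Ideal runtime in hours: capacity divided by a constant power draw."""
    return battery_mwh / power_mw


# Total draw with every sensor on, at the low and high end of each range.
all_on_low = sum(low for low, _ in SENSOR_POWER_MW.values())
all_on_high = sum(high for _, high in SENSOR_POWER_MW.values())

if __name__ == "__main__":
    for name, (low, high) in SENSOR_POWER_MW.items():
        print(f"{name}: {runtime_hours(high):.0f} to {runtime_hours(low):.0f} h alone")
    print(f"all on: {runtime_hours(all_on_high):.1f} to {runtime_hours(all_on_low):.1f} h")
```

With every sensor at the top of its quoted range the combined draw is 31 mW, which gives roughly 9.7 hours on 300 mWh under these idealized assumptions.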

Core claim

By training a differentiable Gumbel-Sigmoid modality controller together with learnable-threshold Sigma-Delta patch sampling inside a foundation-encoder-plus-cross-modal-transformer backbone, the resulting model achieves robust fusion from sparser inputs and thereby delivers both energy reduction and accuracy gains on edge medical tasks.
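
To make the gating idea concrete, here is a minimal forward-pass sketch of Gumbel-Sigmoid sampling in plain Python. It is not the paper's controller: the per-sensor logits, the temperature `tau`, and the function names are hypothetical, and the sketch only shows how a relaxed sample is drawn and thresholded, not how gradients are propagated during training.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gumbel_sigmoid(logit: float, tau: float, rng: random.Random) -> float:
    """Relaxed Bernoulli sample between 0 and 1.

    The difference of two Gumbel(0, 1) draws is logistic noise, which can be
    sampled directly as log(u) - log(1 - u) with u ~ Uniform(0, 1).
    """
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep the logs finite
    noise = math.log(u) - math.log(1.0 - u)
    return sigmoid((logit + noise) / tau)


def gate_sensors(logits: dict, tau: float = 0.5, seed: int = 0) -> dict:
    """Return a hard on/off decision per sensor from relaxed samples."""
    rng = random.Random(seed)
    soft = {name: gumbel_sigmoid(logit, tau, rng) for name, logit in logits.items()}
    # In training, a straight-through estimator would typically pass gradients
    # through the soft value; only the forward decision is shown here.
    return {name: value > 0.5 for name, value in soft.items()}


if __name__ == "__main__":
    # Hypothetical controller outputs; larger logits favour keeping a sensor on.
    print(gate_sensors({"ECG": 4.0, "PPG": 0.0, "EMG": -4.0, "IMU": 1.0}))
```

Lowering `tau` pushes the relaxed samples toward 0 or 1, while the probability that a gate ends up on stays at the sigmoid of its logit.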

What carries the argument

The Agentic Modality Controller (Gumbel-Sigmoid gating on model confidence and task relevance) paired with the Learned Sigma-Delta Sensing module (patch-wise operations with trainable thresholds) inside the jointly optimized multimodal prediction network.
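
The patch-skipping half of this mechanism can be illustrated with a simplified send-on-delta rule. The sketch below is an assumption-laden stand-in, not the paper's module: the threshold is a fixed number rather than a learned parameter, the change measure (mean absolute difference) is chosen for illustration, and it runs offline on an already-recorded signal, so it does not model how a device would decide to skip before acquiring samples.

```python
from typing import List, Tuple


def split_patches(signal: List[float], patch_len: int) -> List[List[float]]:
    """Cut a 1-D signal into non-overlapping patches, dropping any remainder."""
    n = len(signal) // patch_len
    return [signal[i * patch_len:(i + 1) * patch_len] for i in range(n)]


def sigma_delta_skip(
    patches: List[List[float]], threshold: float
) -> Tuple[List[bool], List[List[float]]]:
    """Decide per patch whether to sense it or reuse the last sensed patch.

    Returns (sensed_mask, reconstructed_patches).
    """
    sensed: List[bool] = []
    recon: List[List[float]] = []
    reference: List[float] = []
    for patch in patches:
        if not reference:
            change = float("inf")  # always sense the first patch
        else:
            # Mean absolute change relative to the last *sensed* patch, so
            # slow drift accumulates until it crosses the threshold.
            change = sum(abs(a - b) for a, b in zip(patch, reference)) / len(patch)
        if change > threshold:
            reference = list(patch)
            sensed.append(True)
        else:
            sensed.append(False)
        recon.append(list(reference))
    return sensed, recon


def sensing_rate(sensed: List[bool]) -> float:
    """Fraction of patches that were actually sensed."""
    return sum(sensed) / len(sensed)
```

For a signal that holds at 0 for eight samples and then at 1 for eight samples, with `patch_len=4` and `threshold=0.5`, only the first and third patches are sensed, giving a sensing rate of 0.5.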

If this is right

  • Wearable devices can operate for longer periods on the same battery by activating fewer sensors.
  • The architecture supports dynamic computation graphs and masked operations that translate directly into hardware latency and power savings.
  • Performance remains stable even when some modalities are gated off, removing the need for always-on acquisition.
  • The same joint-training recipe applies across ECG, PPG, EMG, and IMU streams without separate per-modality pipelines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same gating-plus-sparse-sampling pattern could be applied to non-medical edge sensing tasks such as industrial vibration monitoring or environmental IoT nodes.
  • Extending the learned thresholds to adapt online during deployment would further reduce the need for periodic retraining.
  • Because the transformer backbone already handles missing inputs, the framework may tolerate intermittent wireless dropouts in addition to deliberate sensor gating.

Load-bearing premise

That the learned gating thresholds and cross-modal predictions will continue to preserve accuracy and energy savings when the model encounters real hardware noise, new patients, or longer recording durations absent from the three evaluation datasets.

What would settle it

Running the trained model on physical wearable hardware with a fresh patient cohort over multi-day recordings and checking whether measured battery lifetime and diagnostic accuracy match the reported 48.8 percent sensor reduction and 1.9 percent accuracy lift.

Figures

Figures reproduced from arXiv: 2604.10404 by Brandon Lee, Chengwei Zhou, Christopher Pulliam, Gourav Datta, Haotian Yu, Massoud Pedram, Steve Majerus, Xuming Chen, Zhaoyan Jia.

Figure 1. Figure 1 illustrates this issue: using power values reported in sensor datasheets (e.g., 0.3–1 mW IMU, 1–5 mW ECG, 6–15 mW EMG, 4–10 mW PPG), a wearable with a 300 mWh battery can support each sensor alone for hundreds of hours, yet combining them reduces runtime to under 10 hours. This mismatch between multimodal sensing and limited battery capacity severely restricts long-term, continuous monitoring scenarios suc…
Figure 2. Our Unified Agentic Multimodal Sensing and Inference Framework for Efficient, High-Accuracy Biomedical AI. However, these models rely on dense, continuous sampling from all sensors and incur heavy computational and energy costs. In parallel, sensor selection has been explored through sparsity methods [51, 54], RL-based policies [48], contextual bandits [14], differentiable gating [28], information-theore…
Figure 4. Thresholding and skipping in Sigma–Delta Sensing.
Figure 5. Training pipeline with unrolled timesteps optimized via BPTT. Each fused state S_t produces a prediction loss, gating loss, predictive coding loss, and a contrastive alignment loss computed against a memory bank. The controller's gating actions A_t influence future observations, and all losses jointly update the model through temporal backpropagation. Adaptive Thresholding and Skip Policy: As shown in…
Figure 6. Sensing rate heatmap over patches obtained from the proposed method on (Left) MHEALTH and (Right) HMC.
Figure 7. Per-iteration latency (left) and energy (right) as the sensing rate varies on MHEALTH. Measurements are across ARM CPU, Jetson (TensorRT) and A6000 (PyTorch and TensorRT). Latency is decomposed into AMC and FMPM and values are in √ of ms. deployments on Jetson and A6000 apply standard optimizations such as layer fusion, kernel autotuning, and FP16 execution. Latency & Energy Savings: To evaluate the runti…
Original abstract

Edge-based multimodal medical monitoring requires models that balance diagnostic accuracy with severe energy constraints. Continuous acquisition of ECG, PPG, EMG, and IMU streams rapidly drains wearable batteries, often limiting operation to under 10 hours, while existing systems overlook the high temporal redundancy present in physiological signals. We introduce Adaptive Multimodal Intelligence (AMI), an end-to-end framework that jointly learns when to sense and how to infer. AMI integrates three components: (1) a lightweight Agentic Modality Controller that uses differentiable Gumbel-Sigmoid gating to dynamically select active sensors based on model confidence and task relevance; (2) a Learned Sigma-Delta Sensing module that applies patch-wise Delta-Sigma operations with learnable thresholds to skip temporally redundant samples; and (3) a Foundation-backed Multimodal Prediction Model built on unimodal foundation encoders and a cross-modal transformer with temporal context, enabling robust fusion even under gated or missing inputs. These components are trained jointly via a multi-objective loss combining classification accuracy, sparsity regularization, cross-modal alignment, and predictive coding. AMI is hardware-aware, supporting dynamic computation graphs and masked operations, leading to real energy and latency savings. Across MHEALTH, HMC Sleep, and WESAD datasets, it reduces sensor usage by 48.8% while improving state-of-the-art accuracy by 1.9% on average.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper presents Adaptive Multimodal Intelligence (AMI), an end-to-end framework for edge medical intelligence that combines an Agentic Modality Controller using differentiable Gumbel-Sigmoid gating for dynamic sensor selection, a Learned Sigma-Delta Sensing module with learnable thresholds for skipping redundant samples, and a foundation-backed multimodal transformer for prediction under partial inputs. Jointly trained with a multi-objective loss, it claims to achieve 48.8% reduction in sensor usage and 1.9% average accuracy improvement over state-of-the-art on the MHEALTH, HMC Sleep, and WESAD datasets.

Significance. Should the empirical results prove robust, this contribution would be significant for the field of edge AI in healthcare, as it addresses the critical trade-off between continuous monitoring accuracy and battery life in wearables. The differentiable, hardware-aware design for adaptive sensing represents a technical advance over static or heuristic approaches, with potential for broader application in resource-constrained multimodal sensing scenarios. The joint optimization of gating, sampling, and inference is a strength.

major comments (3)
  1. [Abstract] The headline performance claims (48.8% sensor reduction and +1.9% accuracy) are given as averages without per-dataset breakdowns, baseline specifications, error bars, or statistical tests, which are necessary to substantiate the improvements over prior work.
  2. [Abstract] The evaluation lacks any mention of patient-independent splits, leave-one-subject-out cross-validation, or tests for robustness to noise and distribution shifts, undermining confidence in the generalization of the learned Gumbel-Sigmoid controller and Sigma-Delta thresholds to real-world unseen patients and hardware conditions.
  3. [Abstract] Assertions of 'real energy and latency savings' and 'hardware-aware' benefits are supported only by proxy counts of sensor usage rather than direct measurements of power consumption or latency on target edge devices.
minor comments (1)
  1. [Abstract] The description of the multi-objective loss would benefit from explicit equations or weighting details to allow reproducibility.
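
As an illustration of the level of detail being requested, a weighted-sum form consistent with the four terms named in the abstract might look like the following. The weights and subscript names are placeholders introduced here; the paper's actual formulation and weighting are not given in the material reviewed.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{cls}}
  + \lambda_{\text{sp}}\,\mathcal{L}_{\text{sparsity}}
  + \lambda_{\text{al}}\,\mathcal{L}_{\text{align}}
  + \lambda_{\text{pc}}\,\mathcal{L}_{\text{pc}}
```

Reporting the three weights, together with how they were selected, would be enough for a reader to reproduce the accuracy-versus-sparsity trade-off.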

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which identify key areas to improve the transparency and robustness of our evaluation. We address each point below and commit to revisions that strengthen the manuscript without altering its core contributions.

  1. Referee: [Abstract] The headline performance claims (48.8% sensor reduction and +1.9% accuracy) are given as averages without per-dataset breakdowns, baseline specifications, error bars, or statistical tests, which are necessary to substantiate the improvements over prior work.

    Authors: We agree that additional detail is warranted. In the revised manuscript we will add a compact per-dataset breakdown (including means, standard deviations across folds, and the exact baselines) either as a footnote to the abstract or as a new table referenced from the abstract. We will also report paired statistical tests supporting the accuracy gains. revision: yes

  2. Referee: [Abstract] The evaluation lacks any mention of patient-independent splits, leave-one-subject-out cross-validation, or tests for robustness to noise and distribution shifts, undermining confidence in the generalization of the learned Gumbel-Sigmoid controller and Sigma-Delta thresholds to real-world unseen patients and hardware conditions.

    Authors: We acknowledge the importance of subject-independent evaluation. Our experiments already follow a leave-one-subject-out protocol on all three datasets; we will explicitly document this protocol, together with the resulting per-subject variance, in the methods and abstract. We will further add controlled robustness experiments (additive sensor noise and simulated covariate shift) to the revised results section. revision: yes

  3. Referee: [Abstract] Assertions of 'real energy and latency savings' and 'hardware-aware' benefits are supported only by proxy counts of sensor usage rather than direct measurements of power consumption or latency on target edge devices.

    Authors: We agree that direct on-device measurements constitute stronger evidence. Sensor activation is the dominant power consumer in the target wearables, so the reported usage reduction is a first-order proxy; we will augment the revision with calibrated energy estimates drawn from published sensor power profiles and will quantify the latency benefit arising from the dynamic computation graph. Full end-to-end power traces on a specific microcontroller are beyond the present scope and will be noted as future work. revision: partial
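
The leave-one-subject-out protocol referred to in the second response can be sketched in a few lines. This is a generic illustration in plain Python, not the authors' evaluation code; the subject labels and helper names are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_subject_out(
    subject_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def mean_and_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across folds."""
    mean = sum(values) / len(values)
    if len(values) < 2:
        return mean, 0.0
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, var ** 0.5


if __name__ == "__main__":
    # One subject label per recording window; labels are made up for the demo.
    ids = ["s1", "s1", "s2", "s3", "s2"]
    for subject, train_idx, test_idx in leave_one_subject_out(ids):
        print(subject, train_idx, test_idx)
```

Because every window from the held-out subject lands in the test fold, no subject contributes to both training and evaluation, and `mean_and_std` over the per-fold scores gives the per-subject variance the response promises to report.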

Circularity Check

0 steps flagged

No significant circularity; empirical results on external datasets

The paper's headline claims (48.8% sensor reduction, +1.9% average accuracy) are direct empirical measurements obtained by running the jointly trained AMI framework on the three external datasets MHEALTH, HMC Sleep, and WESAD. These quantities are not derived from any internal model equation, fitted parameter renamed as a prediction, or self-referential definition; they are observed outcomes of the training and evaluation procedure. The architectural components (Gumbel-Sigmoid controller, learnable Sigma-Delta thresholds, cross-modal transformer) are described as trained end-to-end via a composite loss, but the reported performance numbers remain independent experimental results rather than quantities forced by construction from the inputs or prior self-citations. No load-bearing uniqueness theorem, ansatz smuggling, or renaming of known results appears in the derivation chain.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework rests on standard assumptions about differentiability of Gumbel-Sigmoid and the existence of temporal redundancy in the chosen signals; no new physical entities are postulated.

free parameters (2)
  • multi-objective loss weights
    The loss combines classification accuracy, sparsity regularization, cross-modal alignment, and predictive coding; relative weighting of these terms must be chosen or tuned.
  • learnable sigma-delta thresholds
    Patch-wise thresholds are learned and directly control which samples are skipped.
axioms (2)
  • domain assumption Physiological signals contain sufficient temporal redundancy that skipping unchanged patches preserves diagnostic information
    Invoked to justify the Learned Sigma-Delta Sensing module.
  • standard math Gumbel-Sigmoid relaxation provides low-variance but biased gradient estimates for discrete sensor selection
    Used to make the modality controller differentiable.

pith-pipeline@v0.9.0 · 5568 in / 1487 out tokens · 43570 ms · 2026-05-10T16:40:39.196372+00:00 · methodology

Reference graph

Works this paper leans on

55 extracted references · 17 canonical work pages · 2 internal anchors

  [1] 2020. Shimmer3 Wearable Sensor Specifications. https://www.shimmersensing.com
  [2] Salar Abbaspourazad et al. 2024. Large-scale Training of Foundation Models for Wearable Biosignals. In The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=pC3WJHf51j
  [3] Abd Al Aleem et al. 2024. A Deep Learning Approach Using WESAD Data for Multi-Class Classification with Wearable Sensors. In 2024 6th Novel Intelligent and Leading Emerging Sciences Conference (NILES). IEEE, 194–197.
  [4] N. Almujally et al. 2025. Wearable sensors-based assistive technologies for patient health monitoring. Frontiers in Bioengineering and Biotechnology 13 (2025), 1437877.
  [5] Diego Alvarez-Estevez and Ronald Rijsman. 2022. Haaglanden Medisch Centrum sleep staging database (version 1.1). https://doi.org/10.13026/t79q-fr32 RRID:SCR_007345
  [6] Diego Alvarez-Estevez and Roselyne M Rijsman. 2021. Inter-database validation of a deep learning approach for automatic sleep scoring. PloS one 16, 8 (2021), e0256111.
  [7] Samaneh Aminikhanghahi and Diane J Cook. 2017. A survey of methods for time series change point detection. Knowledge and Information Systems 51, 2 (2017), 339–367.
  [8] O. Banos et al. 2014. MHEALTH. UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5TW22
  [9] Yoshua Bengio et al. 2013. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432 (2013).
  [10] Víctor Campos et al. 2017. Skip RNN: Learning to skip state updates in recurrent neural networks. In International Conference on Learning Representations.
  [11] Emmanuel J Candès et al. 2006. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory 52, 2 (2006), 489–509.
  [12] Feng Chen and TODO. 2010. Compressed sensing for wireless ECG bio-sensor networks. IEEE Transactions on Biomedical Engineering 57, 2 (2010), 139–148.
  [13] Isaac Debache et al. 2020. A lean and performant hierarchical model for human activity recognition using body-mounted sensors. Sensors 20, 11 (2020), 3090.
  [14] B. Demirel et al. 2022. Neural Contextual Bandits Based Dynamic Sensor Selection for Low-Power Body-Area Networks. In Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED). Boston, MA, USA, 1–6. https://doi.org/10.1145/3531437.3539713
  [15] Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. 2019. Neural architecture search: A survey. Journal of Machine Learning Research 20, 1 (2019), 1997–2017.
  [16] Ching Fang et al. 2024. Promoting cross-modal representations to improve multimodal foundation models for physiological signals. In Advancements In Medical Foundation Models: Explainability, Robustness, Security, and Beyond. https://openreview.net/forum?id=HNQxrUOvX4
  [17] Oliver Faust et al. 2018. Deep Learning for Healthcare Applications Based on Physiological Signals: A Review. Computer Methods and Programs in Biomedicine 161 (2018), 1–13.
  [18] Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning. 1126–1135.
  [19] Sebastian Frey, Marco Guermandi, Simone Benatti, Victor Kartsch, Andrea Cossettini, and Luca Benini. 2023. BioGAP: a 10-Core FP-capable Ultra-Low Power IoT Processor, with Medical-Grade AFE and BLE Connectivity for Wearable Biosignal Processing. In 2023 IEEE International Conference on Omni-layer Intelligent Systems (COINS), Vol. 1. 1–7. https://doi.or...
  [20] R. Garnett et al. 2010. Bayesian optimization for sensor set selection. In Proceedings of the 9th ACM/IEEE International Conference on Information Processing in Sensor Networks. 209–219. https://doi.org/10.1145/1791212.1791238
  [21] Yi Gu et al. 2026. Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2025. 280–290.
  [22] Song Han, Huizi Mao, and William J Dally. 2016. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. International Conference on Learning Representations (2016).
  [23] Yi. Han et al. 2021. Dynamic neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021).
  [24] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
  [25] Texas Instruments. 2019. ADS1292R Low-Power Analog Front-End for ECG and Bioelectrical Measurements. Datasheet.
  [26] Maxim Integrated. 2018. MAX30101 Optical Pulse Oximeter and Heart-Rate Sensor. Datasheet.
  [27] B. Jacob et al. 2018. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2704–2713.
  [28] Eric Jang, Shixiang Gu, and Ben Poole. 2017. Categorical reparameterization with Gumbel-Softmax. In International Conference on Learning Representations.
  [29] W. Jiang et al. 2025. Towards Robust Multimodal Physiological Foundation Models: Handling Arbitrary Missing Modalities. arXiv preprint arXiv:2504.19596 (2025).
  [30] M. Kolba and L. Collins. 2006. Information-theoretic Sensor Management for Multimodal Sensing. In 2006 IEEE International Symposium on Geoscience and Remote Sensing, Vol. 1. 3935–3938. https://doi.org/10.1109/IGARSS.2006.1009
  [31] Christoph F. Kurz et al. 2025. Benchmarking vision–language models for diagnostics in emergency and critical care settings. npj Digital Medicine 8 (2025), 423. https://doi.org/10.1038/s41746-025-01837-2
  [32] Shih-Chii Liu and Tobi Delbruck. 2010. Neuromorphic sensory systems. Current Opinion in Neurobiology 20, 3 (2010), 288–295.
  [33] William Lotter, Gabriel Kreiman, and David Cox. 2017. Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning. In International Conference on Learning Representations.
  [34] Shuo Ma et al. 2024. SleepMG: Multimodal generalizable sleep staging with inter-modal balance of classification and domain discrimination. In Proceedings of the 32nd ACM International Conference on Multimedia. 4004–4013.
  [35] Kaden McKeen, Sameer Masood, Augustin Toma, Barry Rubin, and Bo Wang. 2025. Ecg-fm: An open electrocardiogram foundation model. JAMIA open 8, 5 (2025), ooaf122.
  [36] Sparsh Mittal. 2016. A survey of techniques for approximate computing. Comput. Surveys 48, 4 (2016), 1–33.
  [37] NVIDIA Corporation. 2023. TensorRT Developer Guide. https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/
  [38] Peter O'Connor and Max Welling. 2016. Sigma delta quantized networks. arXiv preprint arXiv:1611.02024 (2016).
  [39] Yanghua Peng et al. 2021. DL2: A Deep Learning-Driven Scheduler for Deep Learning Clusters. 1947–1960. https://doi.org/10.1109/TPDS.2021.3052895
  [40] C. Pereira et al. 2024. Machine Learning Applied to Edge Computing and Wearable Devices for Healthcare: Systematic Mapping of the Literature. Sensors 24, 19 (2024). https://doi.org/10.3390/s24196322
  [41] A. Pillai et al. 2025. PaPaGei: Open Foundation Models for Optical Physiological Signals. In The Thirteenth International Conference on Learning Representations. https://openreview.net/forum?id=kYwTmlq6Vn
  [42] Rajesh PN Rao and Dana H Ballard. 1999. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience 2, 1 (1999), 79–87.
  [43] Philip Schmidt et al. 2018. Introducing wesad, a multimodal dataset for wearable stress and affect detection. In Proceedings of the 20th ACM international conference on multimodal interaction. 400–408.
  [44] Richard Schreier and Gabor C. Temes. 2005. Understanding Delta–Sigma Data Converters. IEEE Press.
  [45] Abu Sebastian, Manuel Le Gallo, Riduan Khaddam-Aljameh, and Evangelos Eleftheriou. 2020. Memory devices and applications for in-memory computing. Nature Nanotechnology 15, 7 (2020), 529–544.
  [46] Deepak Sharma, Arup Roy, Sankar Prasad Bag, Pawan Kumar Singh, and Youakim Badr. 2023. A hybrid deep learning-based approach for human activity recognition using wearable sensors. In Innovations in Machine and Deep Learning: Case Studies and Applications. Springer, 231–259.
  [47] Basit Riaz Sheikh and Rajit Manohar. 2011. Energy-Efficient Pipeline Templates for High-Performance Asynchronous Circuits. J. Emerg. Technol. Comput. Syst. 7, 4, Article 19 (Dec. 2011), 26 pages. https://doi.org/10.1145/2043643.2043649
  [48] Ali Tazarv et al. 2023. Active Reinforcement Learning for Personalized Stress Monitoring in Everyday Settings. In 2023 IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), Vol. 1. 44–55. https://doi.org/10.1145/3580252.3586979
  [49] Surat Teerapittayanon, Bradley McDanel, and HT Kung. 2016. BranchyNet: Fast inference via early exiting from deep neural networks. In 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, 2464–2469.
  [50] R. Thapa et al. 2025. A Multimodal Sleep Foundation Model Developed with 500K Hours of Sleep Recordings for Disease Predictions. medRxiv (2025). https://doi.org/10.1101/2025.02.04.25321675
  [51] Robert Tibshirani. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B 58, 1 (1996), 267–288.
  [52] Bichen Wu et al. 2019. FBNet: Hardware-aware efficient convnet design via differentiable neural architecture search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 10734–10742.
  [53] Jinghua Xu and Michael Staniek. 2025. Multimodal Transformers for Clinical Time Series Forecasting and Early Sepsis Prediction. In Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health). Association for Computational Linguistics. https://doi.org/10.18653/v1/2025.cl4health-1.8
  [54] Hui Zou and Trevor Hastie. 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B 67, 2 (2005), 301–320.