pith. machine review for the scientific record.

arxiv: 2605.08685 · v1 · submitted 2026-05-09 · 💻 cs.LG · cs.AI

Recognition: 2 Lean theorem links

Event Fields: Learning Latent Event Structure for Waveform Foundation Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:12 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords latent event processes · physiological waveforms · event-centric representations · self-supervised consistency · waveform foundation models · arrhythmia classification · hemodynamic prediction · segmentation-aware encoder

The pith

Physiological waveforms are better modeled as latent processes of interacting events than as sequences of signal tokens.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that treating physiological time series such as ECGs as outputs of hidden events with temporal extent and interactions gives a more natural structure than breaking them into fixed local patches or tokens. It introduces a self-supervised method that trains representations to stay consistent when the same waveform is segmented in random ways or viewed through time-frequency projections. The resulting models use a segmentation-aware encoder and an operator for event dependencies, and they show gains on arrhythmia classification, blood-flow prediction, and waveform search while needing fewer labels. A sympathetic reader would care because this suggests foundation models for health signals can rely on an inductive bias that matches how the body actually produces the data rather than on ever-larger sequence capacity alone.
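The consistency idea in that paragraph can be made concrete with a toy sketch: the same waveform is cut at random boundaries twice, each segment is pooled into a simple embedding, and the two views are compared pointwise. Everything below is an illustrative assumption, not the paper's implementation; the function names, mean-pooling choice, and MSE objective are stand-ins for whatever encoder and loss the authors actually use.

```python
import numpy as np

def random_segmentation(length, n_segments, rng):
    """Sample sorted random segment boundaries over [0, length)."""
    cuts = np.sort(rng.choice(np.arange(1, length), size=n_segments - 1, replace=False))
    return np.concatenate([[0], cuts, [length]])

def segment_pool(signal, boundaries):
    """Pool each segment to its mean, broadcast back to per-sample form."""
    pooled = np.empty_like(signal)
    for start, stop in zip(boundaries[:-1], boundaries[1:]):
        pooled[start:stop] = signal[start:stop].mean()
    return pooled

def consistency_loss(signal, rng, n_segments=5):
    """MSE between two stochastic segmentations' pooled views of one waveform."""
    view_a = segment_pool(signal, random_segmentation(len(signal), n_segments, rng))
    view_b = segment_pool(signal, random_segmentation(len(signal), n_segments, rng))
    return float(np.mean((view_a - view_b) ** 2))

rng = np.random.default_rng(0)
ecg_like = np.sin(np.linspace(0, 8 * np.pi, 256))  # toy stand-in for an ECG trace
loss = consistency_loss(ecg_like, rng)
```

Minimizing such a loss over an encoder (rather than over raw means) would push representations to agree regardless of where the boundaries fall, which is the invariance the paper describes.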

Core claim

Physiological time series are realizations of latent event processes whose boundaries and dynamics are unobserved. A self-supervised framework enforces consistency across stochastic segmentations and time-frequency projections of the same waveform, producing representations that are invariant to signal-level changes yet preserve event-level organization. The architecture combines a segmentation-aware encoder with a latent interaction operator that models dependencies among the inferred events and extends directly to multimodal data by aligning modalities through shared event representations. On arrhythmia classification, hemodynamic prediction, and waveform retrieval benchmarks, the approach improves performance, robustness, and label efficiency relative to strong sequence-based baselines.

What carries the argument

The latent event process, recovered by a segmentation-aware encoder paired with a latent interaction operator that captures dependencies among inferred events.
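One plausible reading of the latent interaction operator, from the abstract's description alone, is attention among per-event embeddings: each inferred event is updated as a dependency-weighted mixture of the others. This is a hedged guess at the mechanism, not the paper's architecture; all names and the single-head form are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def event_interaction(events, w_q, w_k, w_v):
    """Single-head attention among inferred event embeddings.

    events: (n_events, d) matrix, one row per latent event.
    Returns updated embeddings of the same shape, where each event
    is rewritten as an attention-weighted mixture of all events.
    """
    q, k, v = events @ w_q, events @ w_k, events @ w_v
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return scores @ v

rng = np.random.default_rng(1)
d = 8
events = rng.normal(size=(5, d))  # five inferred events, hypothetical embeddings
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
updated = event_interaction(events, w_q, w_k, w_v)
```

The point of the sketch is only that such an operator acts on a variable-length set of events, not on a fixed token grid, which is what distinguishes it from patch-based sequence models.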

If this is right

  • Event-centric models achieve higher accuracy than sequence baselines on arrhythmia classification tasks.
  • Hemodynamic prediction performance improves because event interactions capture longer-range physiological dependencies.
  • Label efficiency increases since the self-supervised consistency objective learns useful structure without task labels.
  • Multimodal physiological data can be aligned by matching representations of the same underlying events.
  • Robustness to signal perturbations rises because the model focuses on event organization rather than raw waveform details.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method may generalize to noisier real-world clinical recordings where event boundaries are even harder to see by eye.
  • Explicit incorporation of known medical event primitives, such as QRS complexes, could further constrain the latent space.
  • The same consistency-across-views idea might apply to other continuous sensor streams outside medicine, such as industrial vibration or environmental monitoring.
  • Scaling curves for event-centric models could require less raw data than token-based models to reach comparable performance.

Load-bearing premise

Clinically meaningful structure in physiological waveforms arises from temporally extended interacting events whose boundaries are not directly observed, and consistency across different segmentations and projections is sufficient to recover that structure.

What would settle it

Training the event-based model and a strong sequence-based baseline on identical data and observing no statistically significant gain in accuracy or robustness on the arrhythmia classification or hemodynamic prediction benchmarks would falsify the claimed advantage of the event-centric inductive bias.
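The falsification test described above reduces to a paired comparison of the two models on identical evaluation splits. A minimal sketch, using a sign-flip permutation test on per-fold accuracies; the fold scores here are illustrative placeholders, not results from the paper.

```python
import numpy as np

def paired_permutation_pvalue(scores_a, scores_b, n_perm=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired per-fold scores."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    observed = abs(diffs.mean())
    # Under the null, each paired difference is symmetric around zero.
    flips = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))
    null = np.abs((flips * diffs).mean(axis=1))
    return float((null >= observed).mean())

# Illustrative per-fold accuracies (placeholders, not the paper's numbers).
event_model = [0.91, 0.89, 0.93, 0.90, 0.92]
seq_baseline = [0.88, 0.87, 0.90, 0.89, 0.88]
p = paired_permutation_pvalue(event_model, seq_baseline)
```

A p-value near 1 under matched training data and compute would be the "no significant gain" outcome that falsifies the claimed advantage; a small p-value would leave the event-centric bias standing.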

Figures

Figures reproduced from arXiv: 2605.08685 by Li Na, Shi Li, Yuanyun Zhang.

Figure 1. Overview of the proposed event-field waveform foundation model. (1) Latent event field generative view: physiological waveforms are modeled as superpositions of latent events with variable durations and emissions. (2) Stochastic segmentations: multiple plausible decompositions of the same signal are sampled, reflecting ambiguity in event boundaries. (3) Time–frequency projections: randomized operators gen… view at source ↗
original abstract

We propose a new class of waveform foundation models that departs from conventional sequence-based representations by modeling physiological time series as realizations of latent event processes. Rather than treating signals as collections of local tokens or patches, our approach assumes that clinically meaningful structure arises from temporally extended, interacting events whose boundaries and dynamics are not directly observed. To capture this structure, we introduce a self-supervised learning framework that enforces consistency across stochastic segmentations and time-frequency projections of the same waveform, encouraging representations that are invariant to signal-level perturbations while preserving event-level organization. The resulting model combines a segmentation-aware encoder with a latent interaction operator that captures dependencies among inferred events, and naturally extends to multimodal settings by aligning modalities through shared event representations. Across a range of physiological benchmarks, including arrhythmia classification, hemodynamic prediction, and waveform retrieval, the proposed method improves performance, robustness, and label efficiency relative to strong sequence-based baselines. These results suggest that shifting from signal-centric to event-centric representations provides a more appropriate inductive bias for modeling physiological dynamics and offers a complementary path to scaling foundation models in healthcare.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Event Fields, a new class of waveform foundation models that treat physiological time series as realizations of latent event processes rather than conventional sequence or patch-based representations. It introduces a self-supervised consistency framework that enforces invariance across stochastic segmentations and time-frequency projections of the same waveform, using a segmentation-aware encoder and a latent interaction operator to model event dependencies; the approach extends naturally to multimodal alignment via shared event representations. The paper claims that this event-centric inductive bias yields improved performance, robustness, and label efficiency on arrhythmia classification, hemodynamic prediction, and waveform retrieval benchmarks relative to strong sequence-based baselines.

Significance. If the empirical results and the interpretation of recovered events as clinically meaningful structures hold, the work would supply a complementary scaling direction for healthcare foundation models by embedding an inductive bias aligned with the temporally extended, interacting nature of physiological events (e.g., cardiac cycles or pressure phases). It could improve generalization in low-label regimes where sequence models struggle with long-range dynamics.

major comments (2)
  1. [Experiments and Method sections] The central claim that consistency losses across stochastic segmentations recover clinically meaningful latent events lacks direct validation. No ground-truth event annotations, synthetic waveforms with known boundaries, or qualitative inspection demonstrating that inferred events align with recognizable clinical units (rather than generic statistical invariances) is supplied; downstream gains could therefore be attributable to the segmentation-aware encoder or multimodal alignment alone.
  2. [Abstract and Experiments] Quantitative support for the performance claims is absent from the abstract and not detailed with specific metrics, baselines, statistical tests, or ablations in the provided text. Without these, the assertion of improvements in accuracy, robustness, and label efficiency cannot be evaluated, and it remains unclear whether the latent interaction operator contributes beyond architectural changes.

minor comments (2)
  1. [Method] The description of the latent interaction operator would benefit from an explicit equation or pseudocode showing how event dependencies are parameterized and optimized.
  2. [Method] Notation for stochastic segmentations and time-frequency projections should be introduced with a diagram or formal definition to improve clarity for readers unfamiliar with the consistency objective.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and describe the revisions we will make to strengthen the presentation and validation of our claims.

point-by-point responses
  1. Referee: [Experiments and Method sections] The central claim that consistency losses across stochastic segmentations recover clinically meaningful latent events lacks direct validation. No ground-truth event annotations, synthetic waveforms with known boundaries, or qualitative inspection demonstrating that inferred events align with recognizable clinical units (rather than generic statistical invariances) is supplied; downstream gains could therefore be attributable to the segmentation-aware encoder or multimodal alignment alone.

    Authors: We acknowledge that the current manuscript does not provide ground-truth event annotations or synthetic waveforms with known boundaries, as these are not available for the real physiological datasets used in our benchmarks. To directly address the concern that inferred events may reflect generic statistical invariances rather than clinically meaningful structures, we will add qualitative visualizations in the revised Experiments section. These will show example waveforms with overlaid inferred event boundaries and interactions, compared against standard clinical landmarks (e.g., QRS complexes in ECG or pressure phase transitions). We will also include a dedicated ablation study that isolates the contribution of the consistency loss and latent interaction operator from the segmentation-aware encoder alone. This will help demonstrate that the event-centric components drive the observed gains. revision: yes

  2. Referee: [Abstract and Experiments] Quantitative support for the performance claims is absent from the abstract and not detailed with specific metrics, baselines, statistical tests, or ablations in the provided text. Without these, the assertion of improvements in accuracy, robustness, and label efficiency cannot be evaluated, and it remains unclear whether the latent interaction operator contributes beyond architectural changes.

    Authors: We agree that the abstract would be strengthened by including concrete quantitative results. Although the full Experiments section already contains detailed comparisons against sequence-based baselines, with metrics for arrhythmia classification, hemodynamic prediction, and waveform retrieval, plus robustness and label-efficiency evaluations, we will revise the abstract to highlight representative numerical improvements (e.g., accuracy gains and efficiency metrics) and reference the supporting tables. We will also expand the Experiments section to explicitly include statistical significance tests (e.g., paired t-tests or Wilcoxon tests with p-values) and a focused ablation isolating the latent interaction operator. These updates will make the performance claims and component contributions directly evaluable. revision: yes

Circularity Check

0 steps flagged

No detectable circularity; derivation chain not reducible to inputs by construction.

full rationale

The abstract and visible description introduce a self-supervised consistency objective across stochastic segmentations and time-frequency projections, combined with a segmentation-aware encoder and latent interaction operator. No equations, loss formulations, fitted parameters, or self-citations are provided that would allow any load-bearing step to be exhibited as equivalent to its own inputs by definition. The central premise—that consistency recovers unobserved event structure—is presented as an inductive bias and modeling assumption rather than a derived result forced by prior self-citation chains or renaming of known patterns. Without mathematical detail, no specific reduction (e.g., prediction equaling a fitted quantity) can be quoted or demonstrated. The approach is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 3 invented entities

The central claim rests on the domain assumption that physiological signals possess unobserved event structure; the paper introduces several new modeling components without independent evidence or derivations visible in the abstract.

free parameters (1)
  • model hyperparameters and loss weights
    Standard deep learning components whose specific values are not reported in the abstract.
axioms (1)
  • domain assumption: clinically meaningful structure arises from temporally extended, interacting events whose boundaries and dynamics are not directly observed
    Explicitly stated as the modeling premise in the abstract.
invented entities (3)
  • latent event processes (no independent evidence)
    purpose: Core representation for physiological time series
    Introduced as the fundamental modeling unit replacing sequence tokens.
  • segmentation-aware encoder (no independent evidence)
    purpose: Encoder component that respects inferred event boundaries
    New architectural element described in the framework.
  • latent interaction operator (no independent evidence)
    purpose: Captures dependencies among inferred events
    New operator introduced to model event-level interactions.

pith-pipeline@v0.9.0 · 5483 in / 1576 out tokens · 74924 ms · 2026-05-12T01:12:26.133569+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 3 internal anchors
