arxiv: 2604.18058 · v2 · submitted 2026-04-20 · 💻 cs.LG


Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity

Blaise Delaney, Salil Patel, Yuji Xing, Dominic Dootson, Karin Sevegnani, Chrystalina Antoniades


Pith reviewed 2026-05-10 04:33 UTC · model grok-4.3

classification 💻 cs.LG

keywords inertial measurement unit · world model · clinical data scarcity · representation learning · fall risk prediction · trunk kinematics · latent model · wearable inference


The pith

A compact hybrid world model pre-trained on public IMU data outperforms a matched autoregressive baseline on clinical discrimination, fall-risk prediction, and cross-cohort transfer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Sonata as a small latent world model for six-axis trunk IMU signals that addresses the mismatch between typical large-scale reconstruction objectives and the small size of clinical cohorts. It pre-trains on a combined set of nine public datasets by learning to predict future states in latent space rather than reconstructing raw traces. When the resulting representations are frozen and tested on clinical tasks, the model shows stronger discrimination between groups, better prospective fall-risk prediction, and improved transfer across cohorts compared to a matched forecasting baseline. The 3.77-million-parameter size also makes it suitable for direct use on wearable devices. This approach aims to create more usable kinematic representations when new labeled clinical data remains limited.
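As a rough sanity check on the on-device claim, the weight storage implied by 3.77 M parameters can be computed directly. The sketch below assumes three common numeric precisions (fp32, fp16, int8); the summary does not say which precision the authors deploy, and activations and recurrent state are not counted.

```python
# Back-of-envelope weight footprint for a 3.77 M-parameter model.
# The precisions listed are assumptions for illustration, not taken from the paper.
N_PARAMS = 3_770_000

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_footprint_mb(n_params: int, bytes_per_param: int) -> float:
    """Return weight storage in megabytes (1 MB = 10**6 bytes)."""
    return n_params * bytes_per_param / 1e6


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_footprint_mb(N_PARAMS, nbytes):.2f} MB")
```

Under these assumptions the weights occupy roughly 15 MB at fp32 and under 4 MB at int8. Whether that fits a given wearable depends on its memory budget, which the summary does not specify.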

Core claim

Sonata is a 3.77 M-parameter hybrid model, pre-trained on a harmonised corpus of nine public datasets (739 subjects, 190k windows) with a latent world-model objective that predicts future state rather than reconstructing raw sensor traces. In a controlled comparison against a matched autoregressive forecasting baseline on the same backbone, Sonata yields consistently stronger frozen-probe clinical discrimination, prospective fall-risk prediction, and cross-cohort transfer across a 14-arm evaluation suite, while producing higher-rank, more structured latent representations.

What carries the argument

Sonata, the compact hybrid latent world model whose training objective forecasts future kinematic states from trunk IMU inputs instead of reconstructing the raw signals.
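The difference between predicting in latent space and reconstructing raw traces can be illustrated with a toy sketch. Everything below is hypothetical: the linear encoder, predictor, and decoder, the window size, and the latent width are stand-ins chosen for brevity and are not Sonata's architecture or loss. Practical latent-prediction methods also need some mechanism to prevent representation collapse, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, all hypothetical: 6 IMU channels, 50 samples per window, 16-d latent.
C, T, D = 6, 50, 16
W_enc = rng.normal(scale=0.1, size=(C * T, D))   # stand-in encoder
W_pred = rng.normal(scale=0.1, size=(D, D))      # stand-in latent predictor
W_dec = rng.normal(scale=0.1, size=(D, C * T))   # stand-in decoder (reconstruction only)


def encode(x: np.ndarray) -> np.ndarray:
    """Map a batch of windows (B, C, T) to latents (B, D)."""
    return x.reshape(x.shape[0], -1) @ W_enc


def latent_prediction_loss(x_now: np.ndarray, x_next: np.ndarray) -> float:
    """Error measured in latent space: predict the next window's latent."""
    z_pred = encode(x_now) @ W_pred
    z_target = encode(x_next)  # usually treated as a fixed target during training
    return float(np.mean((z_pred - z_target) ** 2))


def reconstruction_loss(x_now: np.ndarray) -> float:
    """Error measured in signal space: rebuild the raw trace sample by sample."""
    x_flat = x_now.reshape(x_now.shape[0], -1)
    x_hat = encode(x_now) @ W_dec
    return float(np.mean((x_hat - x_flat) ** 2))


x_now = rng.normal(size=(8, C, T))    # random stand-in for current windows
x_next = rng.normal(size=(8, C, T))   # random stand-in for the following windows
print("latent prediction loss:", latent_prediction_loss(x_now, x_next))
print("reconstruction loss:   ", reconstruction_loss(x_now))
```

The point of the contrast is where the error is computed: the latent objective never has to account for every raw sample, so sensor noise that the encoder discards carries no penalty.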

If this is right

Frozen representations from Sonata enable stronger clinical group discrimination than the matched autoregressive baseline.
The same representations improve prospective prediction of falls in patient cohorts.
Cross-cohort transfer performance rises across multiple evaluation arms.
Latent representations become higher-rank and more structured than those from pure forecasting.
The 3.77 M parameter size is compatible with on-device wearable inference.
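A frozen-probe evaluation, as referenced in the points above, keeps the pre-trained encoder fixed and fits only a small classifier on its outputs. The sketch below assumes a closed-form ridge probe and uses synthetic Gaussian clusters in place of real latents; the function names, the regulariser, and the data are illustrative and do not describe the paper's probe.

```python
import numpy as np

rng = np.random.default_rng(0)


def fit_linear_probe(z, y, l2=1e-2):
    """Closed-form ridge probe on frozen features z (N, D); labels y in {0, 1}."""
    zb = np.hstack([z, np.ones((len(z), 1))])   # append a bias column
    targets = 2.0 * y - 1.0                      # map {0, 1} to {-1, +1}
    gram = zb.T @ zb + l2 * np.eye(zb.shape[1])
    return np.linalg.solve(gram, zb.T @ targets)


def probe_predict(w, z):
    """Classify by the sign of the linear score."""
    zb = np.hstack([z, np.ones((len(z), 1))])
    return (zb @ w > 0).astype(int)


def fake_latents(n_per_class, dim=8):
    """Synthetic stand-in for frozen latents: two Gaussian clusters."""
    z0 = rng.normal(loc=-1.0, size=(n_per_class, dim))
    z1 = rng.normal(loc=+1.0, size=(n_per_class, dim))
    z = np.vstack([z0, z1])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return z, y


z_train, y_train = fake_latents(40)    # deliberately small training set
z_test, y_test = fake_latents(100)
w = fit_linear_probe(z_train, y_train)
accuracy = float(np.mean(probe_predict(w, z_test) == y_test))
print(f"probe accuracy on synthetic latents: {accuracy:.2f}")
```

Because only the probe weights are fitted, the number of trainable parameters stays small relative to a cohort of tens to hundreds of patients, which is the regime the paper targets.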

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the transfer benefit holds, similar world-model pre-training could be applied to other body-worn sensors or movement disorders without new large-scale clinical collection.
The structured latents may support downstream tasks such as anomaly detection or progression tracking that were not directly tested.
On-device deployment opens the possibility of continuous kinematic monitoring outside laboratory settings.

Load-bearing premise

That harmonizing nine public datasets produces a pre-training distribution representative enough to transfer usefully to real clinical cohorts that contain only tens to hundreds of patients.

What would settle it

A head-to-head test of Sonata against the autoregressive baseline on a completely held-out clinical cohort collected under different protocols, where Sonata shows no advantage in discrimination accuracy or fall-risk prediction.

Figures

Figures reproduced from arXiv: 2604.18058 by Blaise Delaney, Chrystalina Antoniades, Dominic Dootson, Karin Sevegnani, Salil Patel, Yuji Xing.

**Figure 2.** Sonata architecture, pretraining objective, and downstream probe. Top: The input sequence is embedded into H(0) and processed by a hybrid backbone interleaving LongConvBlocks (C) and GatedDeltaNetBlocks (G). Intermediate recurrent states weave across units via state injection (eq. (3)). During self-supervised pretraining, the terminal latent state St is routed through an expand–compress projector πϕ and li…

**Figure 3.** Latent geometry on SisFall activities and fall directions, shown as two-dimensional t-SNE projections of the …

Original abstract

We introduce Sonata, a compact latent world model for six-axis trunk IMU representation learning under clinical data scarcity. Clinical cohorts typically comprise tens to hundreds of patients, making web-scale masked-reconstruction objectives poorly matched to the problem. Sonata is a 3.77 M-parameter hybrid model, pre-trained on a harmonised corpus of nine public datasets (739 subjects, 190k windows) with a latent world-model objective that predicts future state rather than reconstructing raw sensor traces. In a controlled comparison against a matched autoregressive forecasting baseline (MAE) on the same backbone, Sonata yields consistently stronger frozen-probe clinical discrimination, prospective fall-risk prediction, and cross-cohort transfer across a 14-arm evaluation suite, while producing higher-rank, more structured latent representations. At 3.77 M parameters the model is compatible with on-device wearable inference, offering a step toward general kinematic world models for neurological assessment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Sonata is a compact hybrid IMU world model whose future-state prediction pre-training objective beats a matched autoregressive baseline on clinical transfer tasks, but the abstract supplies no numbers, so the size of the gains is impossible to judge.


The main takeaway is that Sonata is a 3.77 M parameter hybrid model pre-trained on 739 subjects from nine public IMU datasets using a latent future-state prediction loss instead of masked reconstruction, then evaluated on frozen-probe clinical tasks where it reportedly beats a same-backbone autoregressive forecasting baseline across 14 arms while producing more structured latents and better cross-cohort transfer. The model is small enough for on-device use, which directly targets the clinical scarcity problem of tens-to-hundreds of patients rather than web-scale data regimes.

The matched baseline comparison is a clear strength because it keeps the architecture fixed and isolates the effect of the world-model objective. The framing around kinematic world models for neurological assessment is also practical and avoids overclaiming generality. The paper does a solid job identifying why standard reconstruction objectives are mismatched to small clinical IMU cohorts and proposing a concrete alternative that stays lightweight.

The stress-test concern about domain shift in the harmonized pre-training corpus is worth watching, but the abstract does not yet give the numbers or ablations needed to evaluate it. No effect sizes, confidence intervals, or per-arm breakdowns appear in the summary, which leaves open whether the reported outperformance is large enough to matter or whether post-hoc choices in the 14-arm suite inflate the picture. Without those details it is hard to know if the gains come from the objective, from incidental alignment with the evaluation cohorts, or from something else.

This work is aimed at people building wearable IMU tools for fall-risk or gait assessment under tight data constraints. A reader already working on small-cohort transfer learning or on-device kinematic models would find the objective shift and size constraint useful to think about, even if they end up adapting the details.

It deserves peer review because the problem is real, the baseline control is honest, and the model scale is appropriate for the setting; the current version simply needs the quantitative results and ablations filled in before it can be assessed properly.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Sonata, a compact 3.77 M-parameter hybrid latent world model for six-axis trunk IMU representation learning. It is pre-trained on a harmonized corpus of nine public datasets (739 subjects, 190k windows) using a latent world-model objective that predicts future state rather than reconstructing raw traces. In a controlled comparison to a matched autoregressive forecasting baseline (MAE) on the identical backbone, Sonata shows stronger frozen-probe clinical discrimination, prospective fall-risk prediction, and cross-cohort transfer across a 14-arm evaluation suite, while producing higher-rank and more structured latent representations. The model size is noted as compatible with on-device wearable inference.

Significance. If the results hold, the work provides a practical route to general kinematic world models for neurological assessment under data scarcity, by leveraging public IMU corpora for pre-training that transfers to small clinical cohorts. The matched autoregressive baseline on the same architecture supplies an independent empirical anchor, and the emphasis on a compact parameter count for on-device use is a concrete strength.

major comments (2)

[Results] The central transfer claim requires that pre-training on the harmonized public corpus yields latents that generalize to real clinical cohorts of tens to hundreds of patients. However, the results section provides no quantitative domain-shift metrics (e.g., MMD between pre-training and target latents) or ablation studies that remove individual source datasets. Without these, it remains unclear whether observed gains on frozen-probe and cross-cohort tasks stem from the world-model objective or from incidental alignment with the evaluation cohorts.
[Abstract and Results] The abstract and results claim 'consistent outperformance' across the 14-arm suite but report no numerical effect sizes, confidence intervals, or details on arm selection and baseline matching. This omission makes it impossible to assess whether post-hoc selection or baseline mismatch influences the reported advantages in clinical discrimination and fall-risk prediction.
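The domain-shift metric named in the first major comment, maximum mean discrepancy (MMD), can be estimated from two sets of latent vectors. The sketch below is one common form, the biased estimator with a Gaussian kernel; the bandwidth, the sample sizes, and the synthetic "latents" are assumptions for illustration, and nothing here reflects measurements from the paper.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between rows of a (N, D) and b (M, D)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))


def mmd2_biased(x, y, bandwidth=2.0):
    """Biased (V-statistic) estimate of squared MMD with an RBF kernel."""
    return float(
        rbf_kernel(x, x, bandwidth).mean()
        + rbf_kernel(y, y, bandwidth).mean()
        - 2.0 * rbf_kernel(x, y, bandwidth).mean()
    )


rng = np.random.default_rng(0)
source = rng.normal(size=(200, 4))                # stand-in for pre-training latents
target_near = rng.normal(size=(200, 4))           # drawn from the same distribution
target_far = rng.normal(loc=3.0, size=(200, 4))   # drawn from a shifted distribution

print("same distribution:", mmd2_biased(source, target_near))
print("shifted:          ", mmd2_biased(source, target_far))
```

A small value between pre-training and clinical latents would suggest the two populations occupy similar regions of latent space. The estimate is sensitive to the kernel bandwidth, so any reported figure should state how it was chosen.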

minor comments (2)

[Abstract] The abstract refers to 'higher-rank, more structured latent representations' without specifying the quantitative metrics (e.g., effective rank, participation ratio, or mutual information) or visualization methods used to support this claim.
[Methods] Clarify the precise formulation of the latent world-model loss (future-state prediction) versus standard reconstruction objectives, including any weighting or auxiliary terms, in the methods section.
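Two of the rank metrics suggested in the first minor comment have short closed forms on the eigenvalues of the latent covariance. The sketch below uses one common variant of each (entropy-based effective rank and participation ratio); whether the paper uses these definitions is not stated in the summary, and the test data are synthetic.

```python
import numpy as np


def covariance_spectrum(z):
    """Eigenvalues of the sample covariance of latents z (N, D), clipped at zero."""
    zc = z - z.mean(axis=0, keepdims=True)
    cov = zc.T @ zc / (len(z) - 1)
    return np.clip(np.linalg.eigvalsh(cov), 0.0, None)


def effective_rank(z):
    """exp of the Shannon entropy of the normalised eigenvalue distribution."""
    lam = covariance_spectrum(z)
    p = lam / lam.sum()
    p = p[p > 0]                      # treat 0 * log(0) as 0
    return float(np.exp(-np.sum(p * np.log(p))))


def participation_ratio(z):
    """(sum of eigenvalues)^2 divided by the sum of squared eigenvalues."""
    lam = covariance_spectrum(z)
    return float(lam.sum() ** 2 / np.sum(lam**2))


rng = np.random.default_rng(0)
isotropic = rng.normal(size=(2000, 8))                          # variance spread over 8 axes
collapsed = rng.normal(size=(500, 1)) @ rng.normal(size=(1, 8))  # variance on a single axis

print("isotropic:", effective_rank(isotropic), participation_ratio(isotropic))
print("collapsed:", effective_rank(collapsed), participation_ratio(collapsed))
```

Both metrics approach the latent dimension when variance is spread evenly and approach 1 when the representation collapses onto a single direction, which is the sense in which "higher-rank" latents are usually quantified.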

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on Sonata. We address each major comment below, clarifying the design choices and indicating revisions to improve transparency and rigor.


Referee: [Results] The central transfer claim requires that pre-training on the harmonized public corpus yields latents that generalize to real clinical cohorts of tens to hundreds of patients. However, the results section provides no quantitative domain-shift metrics (e.g., MMD between pre-training and target latents) or ablation studies that remove individual source datasets. Without these, it remains unclear whether observed gains on frozen-probe and cross-cohort tasks stem from the world-model objective or from incidental alignment with the evaluation cohorts.

Authors: The matched autoregressive MAE baseline on the identical backbone and training corpus is intended to isolate the contribution of the latent world-model objective from data or architecture effects. The 14-arm suite includes multiple cross-cohort transfer arms that evaluate on clinical cohorts explicitly held out from pre-training. We did not report MMD or exhaustive leave-one-dataset-out ablations in the original submission. In revision we will add MMD distances between pre-training and target latent distributions. For ablations, the existing cross-cohort arms already probe generalization across different source-subset combinations; we will add a supplementary leave-one-dataset-out analysis for the two largest source corpora if space and compute allow.

revision: partial

Referee: [Abstract and Results] The abstract and results claim 'consistent outperformance' across the 14-arm suite but report no numerical effect sizes, confidence intervals, or details on arm selection and baseline matching. This omission makes it impossible to assess whether post-hoc selection or baseline mismatch influences the reported advantages in clinical discrimination and fall-risk prediction.

Authors: We agree that explicit quantitative support and evaluation details are needed. The 14 arms comprise frozen-probe clinical discrimination tasks, prospective fall-risk prediction, and cross-cohort transfers, with the MAE baseline matched exactly in architecture, parameter count, and pre-training data. In the revised manuscript we will report effect sizes and 95% confidence intervals for primary metrics, and include a table that enumerates each arm with cohort sizes, task definitions, and baseline-matching criteria. This will allow readers to evaluate consistency and rule out post-hoc selection concerns.

revision: yes
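The 95% confidence intervals promised in this response could be obtained in several ways; one option that suits small cohorts is a percentile bootstrap over paired score differences. The sketch below assumes that approach. The function name, the resample count, and above all the numbers in `diffs` are placeholders invented for illustration and are not results from the paper.

```python
import random
import statistics


def bootstrap_ci(values, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical paired differences (Sonata minus baseline) in some probe metric,
# one per evaluation fold. Placeholder values only, not paper results.
diffs = [0.05, 0.02, 0.08, -0.01, 0.04, 0.06, 0.03, 0.07, 0.00, 0.05]
lo, hi = bootstrap_ci(diffs)
print(f"mean diff = {statistics.fmean(diffs):.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

Reporting an interval of this kind per arm, alongside the point estimate, would let readers judge whether an advantage is distinguishable from zero for each of the 14 arms.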

Circularity Check

0 steps flagged

No significant circularity; empirical comparison provides independent anchor


The paper's core claims rest on a controlled empirical comparison of the latent world-model objective against a matched autoregressive forecasting baseline (MAE) using the identical backbone architecture. This setup yields measurable differences in frozen-probe clinical discrimination, fall-risk prediction, and cross-cohort transfer without reducing any result to a fitted parameter or self-definition by construction. Pre-training on the harmonized corpus (739 subjects) is presented as an input distribution choice whose downstream effects are externally validated on separate clinical cohorts rather than tautologically implied. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling appear in the derivation chain; the 14-arm evaluation suite serves as an independent falsification mechanism.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The approach rests on standard supervised learning assumptions plus the domain assumption that public IMU corpora can be harmonized without introducing distribution shift that invalidates clinical transfer.



Reference graph

Works this paper leans on

66 extracted references · 47 canonical work pages · 6 internal anchors

[1]

A decade of progress in wearable sensors for fall de- tection (2015–2024): A network-based visualization review.Sensors, 25(7):2205, 2025

Yifei Li, Pei Liu, Yan Fang, Xiangyuan Wu, Yewei Xie, Zhongzhi Xu, Hao Ren, and Fengshi Jing. A decade of progress in wearable sensors for fall de- tection (2015–2024): A network-based visualization review.Sensors, 25(7):2205, 2025. doi: 10.3390/ s25072205

2015
[2]

Luca Palmerini, Luca Reggi, Tecla Bonci, Silvia Del Din, M. Encarna Micó-Amigo, Francesca Salis, Stefano Bertuletti, Marco Caruso, Andrea Cereatti, Eran Gazit, Anisoara Paraschiv-Ionescu, Abolfazl Soltani, Felix Kluge, Arne Küderle, Martin Ull- rich, Cameron Kirk, Hugo Hiden, Ilaria D’Ascanio, Clint Hansen, Lynn Rochester, Claudia Mazzà, and Lorenzo Chiar...

work page doi:10.1038/s41597-023-01930-9 2023
[3]

A decoder-only foundation model for time-series forecasting

Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting. InProceedings of the 41st In- ternational Conference on Machine Learning (ICML),
[4]

Chronos: Learning the Language of Time Series

Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, et al. Chronos: Learning the language of time series. Transactions on Machine Learning Research, 2024. arXiv:2403.07815

work page internal anchor Pith review arXiv 2024
[5]

Timer-XL: Long- context transformers for unified time series forecast- ing

Yong Liu, Guo Qin, Xiangdong Huang, Jianmin Wang, and Mingsheng Long. Timer-XL: Long- context transformers for unified time series forecast- ing. InProceedings of the 13th International Con- ference on Learning Representations (ICLR), 2025. arXiv:2410.04803

work page arXiv 2025
[6]

Moirai-MoE: Empowering time series foundation models with sparse mixture of experts, 2024

Xu Liu, Juncheng Liu, Gerald Woo, Taha Aksu, Yux- uan Liang, Roger Zimmermann, Chenghao Liu, Silvio Savarese, Caiming Xiong, and Doyen Sahoo. Moirai- MoE: Empowering time series foundation models with sparse mixture of experts.arXiv preprint, 2024. arXiv:2410.10469

work page arXiv 2024
[7]

This time is different: An observabil- ity perspective on time series foundation models.arXiv preprint arXiv:2505.14766,

Ben Cohen, Emaad Khwaja, Youssef Doubli, Stephan Xie, Ameet Talwalkar, et al. This time is different: Anobservabilityperspectiveontimeseriesfoundation models.arXiv preprint, 2025. arXiv:2505.14766

work page arXiv 2025
[8]

A review of gait analysis using gyroscopes and inertial measurement units.Sensors, 25(11):3481, 2025

Sheng Lin, Kerrie Evans, Dean Hartley, Scott Morri- son, Stuart McDonald, Martin Veidt, and Gui Wang. A review of gait analysis using gyroscopes and inertial measurement units.Sensors, 25(11):3481, 2025. doi: 10.3390/s25113481

work page doi:10.3390/s25113481 2025
[9]

Moore, Hamish G

Steven T. Moore, Hamish G. MacDougall, and William G. Ondo. Ambulatory monitoring of freez- ing of gait in Parkinson’s disease.Journal of Neuroscience Methods, 167(2):340–348, 2008. doi: 10.1016/j.jneumeth.2007.08.023

work page doi:10.1016/j.jneumeth.2007.08.023 2008
[10]

Bächlin et al

Marc Bächlin, Meir Plotnik, Daniel Roggen, In- bal Maidan, Jeffrey M. Hausdorff, Nir Giladi, and Sonata — Technical Report 14 Gerhard Tröster. Wearable assistant for Parkin- son’s disease patients with the freezing of gait symp- tom.IEEE Transactions on Information Tech- nology in Biomedicine, 14(2):436–442, 2010. doi: 10.1109/TITB.2009.2036165

work page doi:10.1109/titb.2009.2036165 2010
[11]

Brzezicki, Salil Patel, Niall Conway, James J

Charalampos Sotirakis, Maksymilian A. Brzezicki, Salil Patel, Niall Conway, James J. FitzGerald, and Chrystalina A. Antoniades. Predicting future fallers in Parkinson’s disease using kinematic data over a period of 5 years.npj Digital Medicine, 7:345, 2024. doi: 10.1038/s41746-024-01311-5

work page doi:10.1038/s41746-024-01311-5 2024
[12]

Hausdorff

Jeffrey M. Hausdorff. Gait dynamics, fractals and falls: finding meaning in the stride-to-stride fluctua- tions of human walking.Human Movement Science, 26(4):555–589, 2007. doi: 10.1016/j.humov.2007.05. 003

work page doi:10.1016/j.humov.2007.05 2007
[13]

Wiebren Zijlstra and At L. Hof. Assessment of spatio- temporal gait parameters from trunk accelerations during human walking.Gait & Posture, 18(2):1–10,
[14]

doi: 10.1016/S0966-6362(02)00190-X

work page doi:10.1016/s0966-6362(02)00190-x
[15]

Estimation of gait cycle characteristics by trunk ac- celerometry.Journal of Biomechanics, 37(1):121–126,

Rolf Moe-Nilssen and Jorunn Lægdheim Helbostad. Estimation of gait cycle characteristics by trunk ac- celerometry.Journal of Biomechanics, 37(1):121–126,
[16]

doi: 10.1016/S0021-9290(03)00233-1

work page doi:10.1016/s0021-9290(03)00233-1
[17]

Anewmethodforevaluatingmotor control in gait under real-life environmental condi- tions

RolfMoe-Nilssen. Anewmethodforevaluatingmotor control in gait under real-life environmental condi- tions. Part 1: The instrument.Clinical Biomechanics, 13(4):320–327, 1998. doi: 10.1016/S0268-0033(98) 00089-8

work page doi:10.1016/s0268-0033(98 1998
[18]

Mubarak Patel, Aleksandar Pavic, and Victoria A. Goodwin. Wearable inertial sensors to measure gait and posture characteristic differences in older adult fallers and non-fallers: a scoping review.Gait & Posture, 76:110–121, 2020. doi: 10.1016/j.gaitpost. 2019.10.039

work page doi:10.1016/j.gaitpost 2020
[19]

Anderson, David Eguren, Michael A

Anthony J. Anderson, David Eguren, Michael A. Gon- zalez, Michael Caiola, Naima Khan, Sophia Watkin- son, Isabella Zuccaroli, Siegfried S. Hirczy, Cyrus P. Zabetian, Kelly Mills, Emile Moukheiber, Laure- ano Moro-Velazquez, Najim Dehak, Chelsie Mot- ley, Brittney C. Muir, Ankur Butala, and Kim- berly Kontson. WearGait-PD: An open-access wear- ables datase...

work page doi:10.1038/s41597-026-06806-2 2026
[20]

LIMU-BERT: Unleashing the Potential of Unlabeled Data for IMU Sensing Applications

Huatao Xu, Pengfei Zhou, Rui Tan, Mo Li, and Guobin Shen. LIMU-BERT: Unleashing the poten- tial of unlabeled data for IMU sensing applications. InProceedings of the 19th ACM Conference on Em- bedded Networked Sensor Systems (SenSys), pages 208–221, 2021. doi: 10.1145/3485730.3485937

work page doi:10.1145/3485730.3485937 2021
[21]

Contrastive predictive coding for human activity recognition.Proceedings of the ACM on Interac- tive, Mobile, Wearable and Ubiquitous Technologies, 5(2):1–26, 2021

Harish Haresamudram, Irfan Essa, and Thomas Plötz. Contrastive predictive coding for human activity recognition.Proceedings of the ACM on Interac- tive, Mobile, Wearable and Ubiquitous Technologies, 5(2):1–26, 2021. doi: 10.1145/3463506

work page doi:10.1145/3463506 2021
[22]

Prac- tically adopting human activity recognition

Huatao Xu, Pengfei Zhou, Rui Tan, and Mo Li. Prac- tically adopting human activity recognition. InPro- ceedings of the 29th Annual International Conference on Mobile Computing and Networking (MobiCom),
[23]

doi: 10.1145/3570361.3613299

work page doi:10.1145/3570361.3613299
[24]

LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

Lucas Maes, Quentin Le Lidec, Damien Scieur, Yann LeCun, and Randall Balestriero. LeWorldModel: Sta- ble end-to-end joint-embedding predictive architec- ture from pixels.arXiv preprint arXiv:2603.19312, 2026

work page internal anchor Pith review arXiv 2026
[25]

Laya: A LeJEPA Approach to EEG via Latent Prediction over Reconstruction

Saarang Panchavati, Uddhav Panchavati, Corey Arnold, and William Speier. Laya: A LeJEPA ap- proach to EEG via latent prediction over reconstruc- tion.arXiv preprint arXiv:2603.16281, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[26]

Xu, Jaya Narain, Gregory Darnell, Har- aldur Hallgrimsson, Hyewon Jeong, Darren Forde, Richard Fineman, Karthik J

Maxwell A. Xu, Jaya Narain, Gregory Darnell, Har- aldur Hallgrimsson, Hyewon Jeong, Darren Forde, Richard Fineman, Karthik J. Raghuram, James M. Rehg, and Shirley Ren. RelCon: Relative contrastive learning for a motion foundation model for wearable data. InInternational Conference on Learning Rep- resentations (ICLR), 2025

2025
[27]

Scaling wearable foundation models

Girish Narayanswamy, Xin Liu, Kumar Ayush, Yuzhe Yang, Xuhai Xu, Shun Liao, Jake Garrison, Shyam Tailor, Jake Sunshine, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Samy Abdel-Ghaffar, and Daniel McDuff. Scaling wearable foundation models. InInternational Conference on Learning Representations (ICLR), 2025

2025
[28]

& Rahman, T

Yunfei Luo, Yuliang Chen, Asif Salekin, and Tauhidur Rahman. Toward foundation model for multivari- ate wearable sensing of physiological signals.arXiv preprint arXiv:2412.09758, 2025

work page arXiv 2025
[29]

Ali Heydari, Girish Narayanswamy, Maxwell A

Yuwei Zhang, Kumar Ayush, Siyuan Qiao, A. Ali Heydari, Girish Narayanswamy, Maxwell A. Xu, Ahmed A. Metwally, Shawn Xu, Jake Garrison, Tim Althoff, Xuhai Xu, Yun Liu, Pushmeet Kohli, Jien- ing Zhan, Mark Malhotra, Shwetak Patel, Cecilia Mascolo, Xin Liu, Daniel McDuff, and Yuzhe Yang. SensorLM: Learning the language of wearable sensors. arXiv preprint arX...

work page arXiv 2025
[30]

Wearable accelerom- eter foundation models for health via knowledge distillation.arXiv preprint arXiv:2412.11276, 2024

Salar Abbaspourazad, Anshuman Mishra, Joseph Fu- toma, Andrew C Miller, and Ian Shapiro. Wearable accelerometer foundation models for health via knowl- edge distillation.arXiv preprint arXiv:2412.11276, 2025

work page arXiv 2025
[31]

Das, Chi Ian Tang, Fahim Kawsar, and Mohammad Malekzadeh

Arnav M. Das, Chi Ian Tang, Fahim Kawsar, and Mohammad Malekzadeh. PRimuS: Pretraining IMU encoders with multimodal self-supervision. InPro- ceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),

2025
[32]

Scalalog: Scalable log-based failure diagnosis using llm

doi: 10.1109/ICASSP49660.2025.10888874. arXiv:2411.15127

work page doi:10.1109/icassp49660.2025.10888874 2025
[33]

Gupta, Sonata — Technical Report 15 and Jingbo Shang

Xiyuan Zhang, Diyan Teng, Ranak Roy Chowd- hury, Shuheng Li, Dezhi Hong, Rajesh K. Gupta, Sonata — Technical Report 15 and Jingbo Shang. UniMTS: Unified pre-training for motion time series. InAdvances in Neural Information Processing Systems (NeurIPS), 2024. arXiv:2410.19818

work page arXiv 2024
[34]

Simon A. Lee, Cyrus Tanade, Hao Zhou, Juhyeon Lee, Megha Thukral, Minji Han, Rachel Choi, Md Saz- zad Hissain Khan, Baiying Lu, Migyeong Gwak, Mehrab Bin Morshed, Viswam Nathan, Md Mah- bubur Rahman, Li Zhu, Subramaniam Venkatra- man, and Sharanya Arcot Desai. HiMAE: Hierarchi- cal masked autoencoders discover resolution-specific structure in wearable tim...

work page arXiv 2025
[35]

Beyond generative AI: World models for clinical prediction, counterfactuals, and planning

Mohammad Areeb Qazi, Maryam Nadeem, and Mo- hammad Yaqub. Beyond generative AI: World models for clinical prediction, counterfactuals, and planning. InNeurIPS 2025 Workshop on Bridging Language, Agent, and World Models for Reasoning and Planning,

2025
[36]

EchoJEPA: A latent predictive foundation model for echocardiogra- phy.arXiv preprint arXiv:2602.02603, 2026

Alif Munim, Adibvafa Fallahpour, River Jiang, Barry Rubin, Jeremy Slivnick, Heather Whitney, Teodora Szasz, Wendy Tsang, and Bo Wang. EchoJEPA: A latent predictive foundation model for echocardiogra- phy.arXiv preprint arXiv:2602.02603, 2026

work page arXiv 2026
[37]

medDreamer: Model-based reinforcement learning with latent imagination on complex EHRs for clinical decision support.arXiv preprint arXiv:2505.19785, 2025

Qianyi Xu, Gousia Habib, Feng Wu, Dilruk Per- era, and Mengling Feng. medDreamer: Model-based reinforcement learning with latent imagination on complex EHRs for clinical decision support.arXiv preprint arXiv:2505.19785, 2025

work page arXiv 2025
[38]

Clarity: Medical world model for guiding treatment decisions by modeling context-aware disease trajectories in latent space.arXiv preprint arXiv:2512.08029, 2025

Tianxingjian Ding, Yuanhao Zou, Chen Chen, Mubarak Shah, and Yu Tian. CLARITY: Medi- cal world model for guiding treatment decisions by modeling context-aware disease trajectories in latent space.arXiv preprint arXiv:2512.08029, 2025

work page arXiv 2025
[39]

JETS: A self-supervised joint em- bedding time series foundation model for behavioral data in healthcare

Erik Xie, Wyatt Chang, Raquel Martinez, and Bran- don Ballinger. JETS: A self-supervised joint em- bedding time series foundation model for behavioral data in healthcare. InNeurIPS 2025 Workshop on Learning from Time-Series for Health (TS4H), 2025

2025
[40]

Xu, James M

Yenho Chen, Maxwell A. Xu, James M. Rehg, and Christopher J. Rozell. Self-supervised dynamical system representations for physiological time-series. arXiv preprint arXiv:2512.00239, 2025. Under review at ICLR 2026

work page arXiv 2025
[41]

Linear transformers are secretly fast weight program- mers

Imanol Schlag, Kazuki Irie, and Jürgen Schmidhuber. Linear transformers are secretly fast weight program- mers. InProceedings of the 38th International Confer- ence on Machine Learning (ICML), pages 9355–9366. PMLR, 2021

2021
[42]

Parallelizing linear transformers with the delta rule over sequence length

Songlin Yang, Bailin Wang, Yu Zhang, Yikang Shen, and Yoon Kim. Parallelizing linear transformers with the delta rule over sequence length. InAdvances in Neural Information Processing Systems, volume 37, 2024

2024
[43]

Gated delta networks: Improving Mamba2 with delta rule

Songlin Yang, Jan Kautz, and Ali Hatamizadeh. Gated delta networks: Improving Mamba2 with delta rule. InInternational Conference on Learning Repre- sentations, 2025

2025
[44]

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Tri Dao and Albert Gu. Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. InInternational Conference on Machine Learning (ICML), 2024. arXiv:2405.21060

work page internal anchor Pith review arXiv 2024
[45]

A systematic analysis of hybrid linear attention.arXiv preprint arXiv:2507.06457, 2025

Dustin Wang, Rui-Jie Zhu, Steven Abreu, Yong Shan, Taylor Kergan, Yuqi Pan, Yuhong Chou, Zheng Li, Ge Zhang, Wenhao Huang, and Jason Eshraghian. A systematic analysis of hybrid linear attention.arXiv preprint arXiv:2507.06457, 2025

work page arXiv 2025
[46]

Kimi linear: An expressive, efficient attention architecture

Kimi Team. Kimi linear: An expressive, efficient attention architecture. Technical report, MoonshotAI, 2025

2025
[47]

Olmo Hybrid: From Theory to Practice and Back

William Merrill, Yanhong Li, Tyler Romero, Anej Svete, Caia Costello, Pradeep Dasigi, Dirk Groen- eveld, David Heineman, Bailey Kuehl, Nathan Lam- bert, Chuan Li, Kyle Lo, Saumya Malik, DJ Matusz, Benjamin Minixhofer, Jacob Morrison, Luca Soldaini, Finbarr Timbers, Pete Walsh, Noah A. Smith, Han- naneh Hajishirzi, and Ashish Sabharwal. OLMo hybrid: From t...

work page internal anchor Pith review Pith/arXiv arXiv 2026
[48]

arXiv preprint arXiv:2601.22156 , year=

Yingfa Chen, Zhen Leng Thai, Zihan Zhou, Zhu Zhang, Xingyu Shen, Shuo Wang, Chaojun Xiao, Xu Han, and Zhiyuan Liu. Hybrid linear attention done right: Efficient distillation and effective archi- tectures for extremely long contexts.arXiv preprint arXiv:2601.22156, 2026

work page arXiv 2026
[49]

DeltaProduct: Improving state-tracking in linear RNNs via householder products

Julien Siems, Timur Carstensen, Arber Zela, Frank Hutter, Massimiliano Pontil, and Riccardo Grazzi. DeltaProduct: Improving state-tracking in linear RNNs via householder products. InAdvances in Neural Information Processing Systems, volume 38, 2025

2025
[50]

Exact Flow Linear Attention: Exact Solution from Continuous-Time Dynamics

Jingdi Lei, Di Zhang, and Soujanya Poria. Error- free linear attention is a free lunch: Exact solu- tion from continuous-time dynamics.arXiv preprint arXiv:2512.12602, 2025

work page internal anchor Pith review arXiv 2025
[51]

Reverso: Efficient time series founda- tion models for zero-shot forecasting.arXiv preprint arXiv:2602.17634, 2026

Xinghong Fu, Yanhong Li, Georgios Papaioannou, and Yoon Kim. Reverso: Efficient time series founda- tion models for zero-shot forecasting.arXiv preprint arXiv:2602.17634, 2026

work page arXiv 2026
[52]

Cyril Voisard, Rémi Barrois, Nicolas de l’Escalopier, Nicolas Vayatis, Pierre-Paul Vidal, Alain Yelnik, Damien Ricard, and Laurent Oudre. A dataset of clinical gait signals with wearable sensors from healthy, neurological, and orthopedic cohorts. Scientific Data, 12:1674, 2025. doi: 10.1038/s41597-025-05959-w
[53]

Aner Weiss, Marina Brozgol, Moran Dorfman, Talia Herman, Shirley Shema, Nir Giladi, and Jeffrey M. Hausdorff. Does the evaluation of gait quality during daily life provide insight into fall risk? A novel approach using 3-day accelerometer recordings. Neurorehabilitation and Neural Repair, 27(8):742–752, 2013. doi: 10.1177/1545968313491004
[54]

Angela Sucerquia, José David López, and Jesús Francisco Vargas-Bonilla. SisFall: A fall and movement dataset. Sensors, 17(1):198, 2017. doi: 10.3390/s17010198
[55]

Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, and Jorge Luis Reyes-Ortiz. A public domain dataset for human activity recognition using smartphones. In 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), pages 437–442, Bruges, Belgium, 2013
[56]

Eduardo Casilari, José Antonio Santoyo-Ramón, and José Manuel Cano-García. UMAFall: A multisensor dataset for the research on automatic fall detection. Procedia Computer Science, 110:32–39, 2017. doi: 10.1016/j.procs.2017.06.110
[57]

Majd Saleh, Manuel Abbas, and Régine Le Bouquin Jeannès. FallAllD: An open dataset of human falls and activities of daily living for classical and deep learning applications. IEEE Sensors Journal, 21(2):1849–1858, 2021. doi: 10.1109/JSEN.2020.3018335
[58]

Mi Zhang and Alexander A. Sawchuk. USC-HAD: A daily activity dataset for ubiquitous activity recognition using wearable sensors. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing (UbiComp), Workshop on Situation, Activity and Goal Awareness (SAGAware), pages 1036–1043, Pittsburgh, PA, USA, 2012. ACM. doi: 10.1145/2370216.2370438
[59]

Ge Wu and Peter R. Cavanagh. ISB recommendations for standardization in the reporting of kinematic data. Journal of Biomechanics, 28(10):1257–1261, 1995. doi: 10.1016/0021-9290(95)00017-C
[60]

Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods, 17(3):261–272, 2020. doi: 10.1038/s41592-019-0686-2
[61]

Terry T. Um, Franz M. J. Pfister, Daniel Pichler, Satoshi Endo, Muriel Lang, Sandra Hirche, Urban Fietzek, and Dana Kulić. Data augmentation of wearable sensor data for Parkinson’s disease monitoring using convolutional neural networks. In Proceedings of the 19th ACM International Conference on Multimodal Interaction (ICMI), pages 216–220, 2017. doi: 10.1145/3136755.3136817
[62]

Brian Kenji Iwana and Seiichi Uchida. An empirical survey of data augmentation for time series classification with neural networks. PLOS ONE, 16(7):e0254841, 2021. doi: 10.1371/journal.pone.0254841
[63]

Riccardo Grazzi, Julien Siems, Arber Zela, Jörg K. H. Franke, Frank Hutter, and Massimiliano Pontil. Unlocking state-tracking in linear RNNs through negative eigenvalues. In International Conference on Learning Representations, 2025
[64]

Randall Balestriero and Yann LeCun. LeJEPA: Provable and scalable self-supervised learning without the heuristics. arXiv preprint arXiv:2511.08544, 2025
[65]

Olivier Roy and Martin Vetterli. The effective rank: A measure of effective dimensionality. In Proceedings of the 15th European Signal Processing Conference (EUSIPCO), pages 606–610, 2007
[66]

Attila Reiss and Didier Stricker. Introducing a new benchmarked dataset for activity monitoring. In 2012 16th International Symposium on Wearable Computers, pages 108–109, Newcastle, UK, 2012. IEEE. doi: 10.1109/ISWC.2012.13

A Evaluation Protocol

All evaluations operate on a frozen encoder unless explicitly noted. During pretraining, the objective is ...