GEESE: Genotype-aware End-to-End Spatio-temporal Embedding for Behavioral Phenotyping
Pith reviewed 2026-06-30 14:55 UTC · model grok-4.3
The pith
End-to-end deep learning from 3D pose dynamics predicts genotypes in autism models better than hand-crafted features.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GEESE encodes 3D movement sequences into a behavioral manifold via a pretrained time series foundation model, producing representations that surpass hand-crafted feature baselines in classifying behaviors and predicting genotypes for CNTNAP2, CHD8, and FMR1 models while generalizing across genetic backgrounds.
What carries the argument
The GEESE end-to-end framework that applies a pretrained time series foundation model to 3D pose dynamics to generate a behavioral manifold for genotype-aware embedding.
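As a rough illustration of the pipeline shape described above, the sketch below splits a 3D pose sequence into per-coordinate channels, passes each channel through a frozen encoder, concatenates the outputs into one embedding, and classifies by nearest centroid. Everything here is an assumption made for illustration: `toy_frozen_encoder` is a hypothetical stand-in built from simple summary statistics (not the pretrained foundation model), the nearest-centroid classifier is not necessarily what the paper uses, and the "WT"/"KO" labels and synthetic data are placeholders.

```python
import math
from statistics import mean, pstdev

def toy_frozen_encoder(series):
    """Hypothetical stand-in for a frozen time series encoder.
    Maps one univariate channel to a fixed-length vector."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    return [mean(series), pstdev(series), mean(abs(d) for d in diffs)]

def embed_pose_sequence(pose_seq):
    """pose_seq: list of frames; each frame is a list of (x, y, z) joints.
    Every joint coordinate is encoded as an independent channel."""
    n_joints = len(pose_seq[0])
    embedding = []
    for j in range(n_joints):
        for axis in range(3):
            channel = [frame[j][axis] for frame in pose_seq]
            embedding.extend(toy_frozen_encoder(channel))
    return embedding

def fit_centroids(embeddings, labels):
    """Average the embeddings belonging to each class (e.g. genotype)."""
    groups = {}
    for e, y in zip(embeddings, labels):
        groups.setdefault(y, []).append(e)
    return {y: [mean(col) for col in zip(*es)] for y, es in groups.items()}

def predict(centroids, embedding):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], embedding))

def make_sequence(amplitude, n_frames=50, n_joints=2):
    """Synthetic oscillating poses; amplitude is a placeholder genotype effect."""
    return [
        [(amplitude * math.sin(0.3 * t + j),
          amplitude * math.cos(0.3 * t + j),
          0.1 * j)
         for j in range(n_joints)]
        for t in range(n_frames)
    ]

# Placeholder training data with hypothetical genotype labels.
train = [make_sequence(a) for a in (1.0, 1.1, 2.0, 2.1)]
train_labels = ["WT", "WT", "KO", "KO"]
centroids = fit_centroids([embed_pose_sequence(s) for s in train], train_labels)
prediction = predict(centroids, embed_pose_sequence(make_sequence(1.9)))
```

Because each coordinate is encoded on its own in this sketch, any relationship between joints can only enter through the downstream classifier, which is the kind of design question the referee report below raises.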
If this is right
- Learned representations capture genotype-specific behavioral signatures.
- The framework generalizes across genetic backgrounds.
- An all-cohort model identifies both genetic background and genotype from movement patterns alone.
- The method supports behavior classification and genotype prediction without hand-crafted features.
Where Pith is reading between the lines
- This could allow larger-scale genetic studies by automating phenotyping from pose data.
- Embeddings may reveal behavioral patterns not easily captured by traditional features.
- The approach might extend to other species or behavioral assays if the foundation model generalizes.
Load-bearing premise
A pretrained time series foundation model applied directly to 3D pose dynamics produces a behavioral manifold whose embeddings are informative for genotype prediction without domain-specific adaptation or hand-crafted features.
What would settle it
An experiment showing that GEESE embeddings do not outperform hand-crafted feature baselines in genotype prediction accuracy on the autism-associated genetic models.
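One way such a comparison could be made concrete is a paired permutation test on per-fold accuracies from the two feature sets. The sketch below is a generic sign-flip test, not the statistical procedure used in the paper; the function name and the accuracy values are placeholders invented for illustration.

```python
import random
from statistics import mean

def paired_sign_flip_test(scores_a, scores_b, n_permutations=2000, seed=0):
    """One-sided paired permutation test of mean(scores_a - scores_b) > 0.
    Under the null hypothesis each paired difference is equally likely
    to carry either sign, so signs are flipped at random."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = mean(diffs)
    hits = 0
    for _ in range(n_permutations):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if mean(flipped) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_permutations + 1)

# Placeholder per-fold accuracies, NOT results from the paper.
learned_acc = [0.81, 0.79, 0.84, 0.80, 0.83]
handcrafted_acc = [0.74, 0.75, 0.78, 0.73, 0.77]
observed_gain, p_value = paired_sign_flip_test(learned_acc, handcrafted_acc)
```

A large p-value, or a non-positive `observed_gain`, on the real cohorts would count against the core claim. With only a handful of folds the smallest attainable p-value is coarse, so splitting by animal and using more folds would matter in practice.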
Original abstract
Behavioral phenotyping of genetic animal models currently requires labor-intensive manual feature engineering that limits reproducibility and scalability. We present GEESE, an end-to-end deep learning framework that learns behavioral representations directly from 3D pose dynamics without hand-crafted features. Using a pretrained time series foundation model, we encode movement sequences into a behavioral manifold that supports both behavior classification and genotype prediction. Evaluated across three autism-associated genetic models (CNTNAP2, CHD8, FMR1), our deep learning approach surpasses hand-crafted feature baselines in both tasks, revealing that learned representations capture genotype-specific behavioral signatures. The framework generalizes across genetic backgrounds, and an all-cohort model identifies both genetic background and genotype from movement patterns alone. We further provide HONK, an interactive intelligent tool enabling researchers without programming expertise to perform behavioral phenotyping from pose data through natural language interaction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces GEESE, an end-to-end deep learning framework that applies a pretrained time series foundation model directly to 3D pose dynamics to produce a behavioral manifold for genotype-aware phenotyping. Evaluated on three autism-associated genetic mouse models (CNTNAP2, CHD8, FMR1), the approach is claimed to surpass hand-crafted feature baselines for both behavior classification and genotype prediction tasks, to generalize across genetic backgrounds, and to enable an all-cohort model that identifies both background and genotype from movement patterns alone. The paper also presents the HONK interactive tool for natural-language phenotyping from pose data.
Significance. If the empirical claims hold with rigorous validation, the work could meaningfully reduce dependence on labor-intensive manual feature engineering in behavioral neuroscience, improving reproducibility and enabling scalable genotype-phenotype mapping. The use of frozen foundation-model embeddings for kinematic time series is a high-risk, high-reward direction; successful transfer without domain adaptation would constitute a notable result. The interactive HONK tool addresses a practical barrier for non-computational researchers. However, the load-bearing assumption—that pretrained univariate time-series models transfer to spatially constrained, multi-joint 3D pose trajectories without adaptation—requires explicit empirical support before the superiority and generalization claims can be accepted.
major comments (2)
- [Abstract] The central claim that the deep learning approach 'surpasses hand-crafted feature baselines in both tasks' is stated without any quantitative results, error bars, statistical tests, dataset sizes, or exclusion criteria. As a result, the superiority, generalization, and all-cohort claims cannot be evaluated from the abstract alone, and the entire empirical contribution rests on the methods and results sections.
- [Methods, foundation-model application] The decision to apply the pretrained time series foundation model frozen, with no domain-specific adaptation or fine-tuning, to 3D joint trajectories is load-bearing for all reported gains. Time-series foundation models are typically trained on univariate or low-dimensional series from unrelated domains; the manuscript must demonstrate (via ablation or embedding analysis) that the resulting manifold preserves the spatial kinematic constraints and inter-joint correlations necessary for genotype discrimination, rather than merely capturing generic temporal statistics.
minor comments (2)
- [Abstract / Results] The abstract mentions three genetic models but does not specify the number of animals, trials, or total pose sequences per cohort; these details should appear in the first results paragraph or a dedicated data table.
- [Methods] Notation for the behavioral manifold and embedding dimensionality is introduced without an explicit equation or diagram; a small schematic would clarify how the foundation-model output is used for the two downstream tasks.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment point by point below, indicating revisions where the manuscript will be updated to strengthen the presentation of results and validation of the approach.
Point-by-point responses
- Referee: [Abstract] The central claim that the deep learning approach 'surpasses hand-crafted feature baselines in both tasks' is stated without any quantitative results, error bars, statistical tests, dataset sizes, or exclusion criteria. This renders the superiority, generalization, and all-cohort claims impossible to evaluate from the provided information and places the entire empirical contribution on the methods and results sections.
  Authors: We agree that the abstract would be more informative with key quantitative details. In the revised manuscript we will expand the abstract to report specific performance metrics (including accuracy or AUC improvements with standard errors), dataset sizes (number of animals and trials per cohort), and the statistical tests used for the main comparisons. This will allow readers to evaluate the claims directly from the abstract while preserving its brevity.
  Revision: yes
- Referee: [Methods, foundation-model application] The decision to apply the pretrained time series foundation model frozen, with no domain-specific adaptation or fine-tuning, to 3D joint trajectories is load-bearing for all reported gains. Time-series foundation models are typically trained on univariate or low-dimensional series from unrelated domains; the manuscript must demonstrate (via ablation or embedding analysis) that the resulting manifold preserves the spatial kinematic constraints and inter-joint correlations necessary for genotype discrimination, rather than merely capturing generic temporal statistics.
  Authors: We acknowledge that explicit validation of the frozen transfer is important given the domain shift. The current manuscript demonstrates superiority over hand-crafted kinematic features on genotype prediction and behavior classification tasks, which provides indirect evidence that the embeddings capture task-relevant structure beyond generic temporal statistics. To directly address the concern, we will add (i) an ablation comparing frozen versus fine-tuned embeddings and (ii) an analysis of the embedding space (e.g., correlation of embedding distances with inter-joint kinematic features) in the revised Methods and Results sections.
  Revision: yes
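The embedding-space analysis proposed in item (ii) could take roughly the following form: compute pairwise distances between sequences in embedding space and in a space of inter-joint kinematic features, then correlate the two sets of distances. This is a minimal sketch under assumptions, not the authors' analysis; the function names and the toy vectors are hypothetical.

```python
import math
from itertools import combinations

def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair, in a fixed order."""
    return [math.dist(u, v) for u, v in combinations(vectors, 2)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def distance_agreement(embeddings, kinematic_features):
    """Correlate embedding-space distances with kinematic-feature distances.
    Row i of each input must describe the same pose sequence."""
    return pearson(pairwise_distances(embeddings),
                   pairwise_distances(kinematic_features))

# Toy placeholder inputs: four sequences, 2-D embeddings, and one
# hypothetical inter-joint feature per sequence.
embeddings = [[0.0, 0.1], [0.2, 0.1], [1.0, 0.9], [1.1, 1.2]]
inter_joint_features = [[0.5], [0.6], [1.4], [1.5]]
r = distance_agreement(embeddings, inter_joint_features)
```

Pairwise distances are not independent observations, so a significance level for `r` would need a permutation scheme over sequences (as in a Mantel test) rather than a standard correlation p-value.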
Circularity Check
No significant circularity detected
The abstract and available description contain no equations, parameter-fitting steps, self-citations, or ansatzes that reduce any claimed prediction or result to its own inputs by construction. The central claims rest on empirical comparisons of a pretrained time-series model applied to pose data versus hand-crafted baselines, with no mathematical derivation chain presented that could be self-referential. This is the normal case of a self-contained empirical ML paper whose validity is externally falsifiable via replication on the reported datasets.