
arxiv: 2605.24370 · v2 · pith:B2VAGPMTnew · submitted 2026-05-23 · 💻 cs.LG · q-bio.QM

GEESE: Genotype-aware End-to-End Spatio-temporal Embedding for Behavioral Phenotyping

Pith reviewed 2026-06-30 14:55 UTC · model grok-4.3

classification 💻 cs.LG q-bio.QM
keywords behavioral phenotyping · genotype prediction · 3D pose dynamics · deep learning · autism models · time series embedding · end-to-end learning · spatio-temporal embedding

The pith

End-to-end deep learning from 3D pose dynamics predicts genotypes in autism models better than hand-crafted features.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GEESE as a framework that learns behavioral representations directly from 3D pose dynamics using a pretrained time series foundation model, bypassing manual feature engineering. This is tested on three genetic mouse models linked to autism, where it outperforms baseline methods in both behavior classification and genotype prediction. The learned embeddings are shown to capture genotype-specific behavioral signatures that generalize across different genetic backgrounds. An all-cohort model further demonstrates that movement patterns alone can reveal both the genetic background and specific genotype. This suggests a scalable way to perform behavioral phenotyping without labor-intensive manual work.

Core claim

GEESE encodes 3D movement sequences into a behavioral manifold via a pretrained time series foundation model, producing representations that surpass hand-crafted feature baselines in classifying behaviors and predicting genotypes for CNTNAP2, CHD8, and FMR1 models while generalizing across genetic backgrounds.

What carries the argument

The GEESE end-to-end framework that applies a pretrained time series foundation model to 3D pose dynamics to generate a behavioral manifold for genotype-aware embedding.
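
As a rough illustration of the input side of such a pipeline, the sketch below flattens a 3D pose sequence (frames × joints × xyz) into one univariate series per coordinate channel, a form that a univariate time series model could consume. The names `pose_to_channels` and `toy_encoder` are hypothetical, and `toy_encoder` is only a stand-in that computes per-channel mean and standard deviation; it is not the pretrained foundation model and not the authors' code.

```python
from statistics import fmean, pstdev


def pose_to_channels(pose_seq):
    """Turn a [T][J][3] pose sequence into J*3 univariate series of length T."""
    n_joints = len(pose_seq[0])
    channels = []
    for j in range(n_joints):
        for axis in range(3):
            channels.append([frame[j][axis] for frame in pose_seq])
    return channels


def toy_encoder(channels):
    """Placeholder embedding: (mean, std) per channel. NOT a foundation model."""
    embedding = []
    for series in channels:
        embedding.extend([fmean(series), pstdev(series)])
    return embedding


# Hypothetical toy input: 4 frames, 2 joints, xyz coordinates.
pose_seq = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0]],
    [[0.0, 2.0, 0.0], [1.0, 2.0, 0.0]],
    [[0.0, 3.0, 0.0], [1.0, 3.0, 0.0]],
]
channels = pose_to_channels(pose_seq)  # 6 series, one per joint coordinate
embedding = toy_encoder(channels)      # 12 numbers: mean and std per channel
```

In the framework described here, the placeholder would be replaced by the pretrained encoder, and a classification head would map the resulting embedding to behavior or genotype labels.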

If this is right

  • Learned representations capture genotype-specific behavioral signatures.
  • The framework generalizes across genetic backgrounds.
  • An all-cohort model identifies both genetic background and genotype from movement patterns alone.
  • The method supports behavior classification and genotype prediction without hand-crafted features.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This could allow larger-scale genetic studies by automating phenotyping from pose data.
  • Embeddings may reveal behavioral patterns not easily captured by traditional features.
  • The approach might extend to other species or behavioral assays if the foundation model generalizes.

Load-bearing premise

A pretrained time series foundation model applied directly to 3D pose dynamics produces a behavioral manifold whose embeddings are informative for genotype prediction without domain-specific adaptation or hand-crafted features.

What would settle it

An experiment showing that GEESE embeddings do not outperform hand-crafted feature baselines in genotype prediction accuracy on the autism-associated genetic models.
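
One way such a comparison could be scored is a paired test on per-fold accuracies, where both methods are evaluated on the same splits. The sketch below is an exact sign-flip permutation test; the function name and the accuracy values are made up for illustration and are not results from the paper.

```python
from itertools import product


def paired_signflip_pvalue(scores_a, scores_b):
    """Exact one-sided paired permutation test that mean(a - b) > 0."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = sum(diffs)
    count = 0
    total = 0
    # Under the null, each paired difference is equally likely to have either sign.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if sum(s * d for s, d in zip(signs, diffs)) >= observed - 1e-12:
            count += 1
    return count / total


# Made-up per-fold accuracies, for illustration only (not results from the paper).
learned = [0.80, 0.82, 0.79, 0.85, 0.81]
handcrafted = [0.75, 0.78, 0.77, 0.80, 0.76]
p_value = paired_signflip_pvalue(learned, handcrafted)
```

With five folds the smallest attainable p-value is 1/32, so a claim of superiority at conventional thresholds would need more folds or repeated splits. A large p-value in the learned-versus-hand-crafted direction would be the kind of outcome that undercuts the central claim.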

Figures

Figures reproduced from arXiv: 2605.24370 by Chunqi Qian, Yiran Ding, Yuen Gao, Zijun Cui.

Figure 1: System architecture. Pose sequences are processed by a pretrained time series model, producing behavioral representations. Training on behavioral labels organizes these representations by behavior type, enabling behavior classification. The same representations support genotype prediction after brief fine-tuning on genotype labels.
Figure 2: Learned behavioral representations. (a) Behavioral clusters via K-Means (k=9), visualized with UMAP. (b–f) Representative skeleton sequences. (g) Genotype enrichment per cluster, showing differential distribution of WT, HET, and HOM animals.
Figure 3: Genotype-behavior association learning. (a) Before and (b) after genotype fine-tuning. Top row: ground truth genotype distributions; bottom row: predicted distributions. Columns show WT (blue), HET (red), and HOM (green). After fine-tuning, predicted distributions converge toward ground truth, with HOM occupying a distinct region.
Figure 4: Concept diagram of HONK, the interactive analysis agent built on the GEESE pipeline. Users pose natural language queries that are routed to the pipeline, which encodes 3D pose data into a behavioral manifold for behavior classification, genotype prediction, and interactive visualization. The system provides real-time analysis without requiring programming expertise or local computational resources.
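
The per-cluster genotype view in Figure 2(g) can be approximated by tabulating, for each cluster, the fraction of sequences from each genotype. The sketch below assumes cluster assignments (for example from K-Means) are already available; the function name and the toy labels are hypothetical, and the paper's exact enrichment statistic is not stated on this page.

```python
from collections import Counter, defaultdict

GENOTYPES = ("WT", "HET", "HOM")


def genotype_fractions_per_cluster(cluster_ids, genotypes):
    """Return {cluster: {genotype: fraction of that cluster's sequences}}."""
    counts = defaultdict(Counter)
    for cluster, genotype in zip(cluster_ids, genotypes):
        counts[cluster][genotype] += 1
    fractions = {}
    for cluster, counter in counts.items():
        n = sum(counter.values())
        # Counter returns 0 for genotypes absent from a cluster.
        fractions[cluster] = {g: counter[g] / n for g in GENOTYPES}
    return fractions


# Hypothetical toy assignments: 8 sequences in 2 clusters.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
genotypes = ["WT", "WT", "WT", "HET", "HOM", "HOM", "HET", "HOM"]
fractions = genotype_fractions_per_cluster(cluster_ids, genotypes)
```

Comparing these fractions with the overall genotype proportions in the cohort would indicate which clusters are over- or under-represented for a given genotype.
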
Original abstract

Behavioral phenotyping of genetic animal models currently requires labor-intensive manual feature engineering that limits reproducibility and scalability. We present GEESE, an end-to-end deep learning framework that learns behavioral representations directly from 3D pose dynamics without hand-crafted features. Using a pretrained time series foundation model, we encode movement sequences into a behavioral manifold that supports both behavior classification and genotype prediction. Evaluated across three autism-associated genetic models (CNTNAP2, CHD8, FMR1), our deep learning approach surpasses hand-crafted feature baselines in both tasks, revealing that learned representations capture genotype-specific behavioral signatures. The framework generalizes across genetic backgrounds, and an all-cohort model identifies both genetic background and genotype from movement patterns alone. We further provide HONK, an interactive intelligent tool enabling researchers without programming expertise to perform behavioral phenotyping from pose data through natural language interaction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces GEESE, an end-to-end deep learning framework that applies a pretrained time series foundation model directly to 3D pose dynamics to produce a behavioral manifold for genotype-aware phenotyping. Evaluated on three autism-associated genetic mouse models (CNTNAP2, CHD8, FMR1), the approach is claimed to surpass hand-crafted feature baselines for both behavior classification and genotype prediction tasks, to generalize across genetic backgrounds, and to enable an all-cohort model that identifies both background and genotype from movement patterns alone. The paper also presents the HONK interactive tool for natural-language phenotyping from pose data.

Significance. If the empirical claims hold with rigorous validation, the work could meaningfully reduce dependence on labor-intensive manual feature engineering in behavioral neuroscience, improving reproducibility and enabling scalable genotype-phenotype mapping. The use of frozen foundation-model embeddings for kinematic time series is a high-risk, high-reward direction; successful transfer without domain adaptation would constitute a notable result. The interactive HONK tool addresses a practical barrier for non-computational researchers. However, the load-bearing assumption—that pretrained univariate time-series models transfer to spatially constrained, multi-joint 3D pose trajectories without adaptation—requires explicit empirical support before the superiority and generalization claims can be accepted.

major comments (2)
  1. [Abstract] Abstract: The central claim that the deep learning approach 'surpasses hand-crafted feature baselines in both tasks' is stated without any quantitative results, error bars, statistical tests, dataset sizes, or exclusion criteria. This renders the superiority, generalization, and all-cohort claims impossible to evaluate from the provided information and places the entire empirical contribution on the methods and results sections.
  2. [Methods] Methods (foundation-model application): The decision to apply the pretrained time series foundation model frozen, with no domain-specific adaptation or fine-tuning, to 3D joint trajectories is load-bearing for all reported gains. Time-series foundation models are typically trained on univariate or low-dimensional series from unrelated domains; the manuscript must demonstrate (via ablation or embedding analysis) that the resulting manifold preserves the spatial kinematic constraints and inter-joint correlations necessary for genotype discrimination, rather than merely capturing generic temporal statistics.
minor comments (2)
  1. [Abstract / Results] The abstract mentions three genetic models but does not specify the number of animals, trials, or total pose sequences per cohort; these details should appear in the first results paragraph or a dedicated data table.
  2. [Methods] Notation for the behavioral manifold and embedding dimensionality is introduced without an explicit equation or diagram; a small schematic would clarify how the foundation-model output is used for the two downstream tasks.
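
To make the request for error bars in major comment 1 concrete, the sketch below computes a percentile bootstrap confidence interval for classification accuracy. The function name, the toy labels, and the choice of bootstrap are assumptions for illustration; it resamples individual predictions as if they were independent, which may not hold when many sequences come from the same animal.

```python
import random


def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Return (accuracy, lower, upper) using a percentile bootstrap."""
    rng = random.Random(seed)
    n = len(y_true)
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    point = sum(correct) / n
    stats = []
    for _ in range(n_boot):
        # Resample predictions with replacement and recompute accuracy.
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(sample) / n)
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper


# Hypothetical toy labels: 30 sequences, 6 deliberate errors.
y_true = ["WT", "HET", "HOM"] * 10
y_pred = list(y_true)
for i in range(0, 30, 5):
    y_pred[i] = "HET" if y_true[i] != "HET" else "WT"

accuracy, ci_low, ci_high = bootstrap_accuracy_ci(y_true, y_pred)
```

Reporting the interval alongside the point estimate, together with the number of animals and sequences per cohort, would let readers judge whether the gap to the baseline is larger than the sampling noise.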

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point by point below, indicating revisions where the manuscript will be updated to strengthen the presentation of results and validation of the approach.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that the deep learning approach 'surpasses hand-crafted feature baselines in both tasks' is stated without any quantitative results, error bars, statistical tests, dataset sizes, or exclusion criteria. This renders the superiority, generalization, and all-cohort claims impossible to evaluate from the provided information and places the entire empirical contribution on the methods and results sections.

    Authors: We agree that the abstract would be more informative with key quantitative details. In the revised manuscript we will expand the abstract to report specific performance metrics (including accuracy or AUC improvements with standard errors), dataset sizes (number of animals and trials per cohort), and mention of the statistical tests used for the main comparisons. This will allow readers to evaluate the claims directly from the abstract while preserving its brevity. revision: yes

  2. Referee: [Methods] Methods (foundation-model application): The decision to apply the pretrained time series foundation model frozen, with no domain-specific adaptation or fine-tuning, to 3D joint trajectories is load-bearing for all reported gains. Time-series foundation models are typically trained on univariate or low-dimensional series from unrelated domains; the manuscript must demonstrate (via ablation or embedding analysis) that the resulting manifold preserves the spatial kinematic constraints and inter-joint correlations necessary for genotype discrimination, rather than merely capturing generic temporal statistics.

    Authors: We acknowledge that explicit validation of the frozen transfer is important given the domain shift. The current manuscript demonstrates superiority over hand-crafted kinematic features on genotype prediction and behavior classification tasks, which provides indirect evidence that the embeddings capture task-relevant structure beyond generic temporal statistics. To directly address the concern, we will add (i) an ablation comparing frozen versus fine-tuned embeddings and (ii) an analysis of the embedding space (e.g., correlation of embedding distances with inter-joint kinematic features) in the revised Methods and Results sections. revision: yes
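
The embedding-space analysis proposed in response 2(ii) could take the following form: compute pairwise distances between sequences in embedding space and in a kinematic feature space, then correlate the two sets of distances. This is a minimal sketch under stated assumptions; the function names, the Euclidean metric, and the Pearson coefficient are choices made here, not details from the rebuttal.

```python
from itertools import combinations
from math import dist, sqrt


def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair of vectors."""
    return [dist(a, b) for a, b in combinations(vectors, 2)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def distance_matrix_correlation(embeddings, kinematic_features):
    """Correlate pairwise distances in the two spaces (same sequence order)."""
    return pearson(
        pairwise_distances(embeddings),
        pairwise_distances(kinematic_features),
    )


# Hypothetical toy data: kinematic features are a scaled copy of the embeddings,
# so the two distance structures agree perfectly.
embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
kinematics = [[2.0 * x, 2.0 * y] for x, y in embeddings]
r = distance_matrix_correlation(embeddings, kinematics)
```

Because pairwise distances are not independent observations, significance for such a correlation is usually assessed with a permutation scheme (for example a Mantel test) rather than the standard Pearson p-value.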

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The abstract and available description contain no equations, parameter-fitting steps, self-citations, or ansatzes that reduce any claimed prediction or result to its own inputs by construction. The central claims rest on empirical comparisons of a pretrained time-series model applied to pose data versus hand-crafted baselines, with no mathematical derivation chain presented that could be self-referential. This is the normal case of a self-contained empirical ML paper whose validity is externally falsifiable via replication on the reported datasets.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No technical details available from abstract; ledger cannot be populated.



Reference graph

Works this paper leans on

27 extracted references

  1. [1]

    Parkinson’s disease: clinical features and diagnosis

    Jankovic J. Parkinson’s disease: clinical features and diagnosis. Journal of neurology, neurosurgery & psychiatry. 2008;79(4):368-76

  2. [2]

    Huntington disease

    Bates GP, Dorsey R, Gusella JF, et al. Huntington disease. Nature reviews Disease primers. 2015;1(1):1-21

  3. [3]

    Autism spectrum disorder: a review

    Hirota T, King BH. Autism spectrum disorder: a review. JAMA. 2023;329(2):157-68

  4. [4]

    Overview of mouse models of autism spectrum disorders

    Bey AL, Jiang Yh. Overview of mouse models of autism spectrum disorders. Current protocols in pharmacology. 2014;66(1):5-66

  5. [5]

    Assessing behavioural and cognitive domains of autism spectrum disorders in rodents: current status and future perspectives

    Kas MJ, Glennon JC, Buitelaar J, et al. Assessing behavioural and cognitive domains of autism spectrum disorders in rodents: current status and future perspectives. Psychopharmacology. 2014;231(6):1125-46

  6. [6]

    DeepLabCut: markerless pose estimation of user-defined body parts with deep learning

    Mathis A, Mamidanna P, Cury KM, et al. DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nature neuroscience. 2018;21(9):1281-9

  7. [7]

    Fast animal pose estimation using deep neural networks

    Pereira TD, Aldarondo DE, Willmore L, et al. Fast animal pose estimation using deep neural networks. Nature methods. 2019;16(1):117-25

  8. [8]

    SLEAP: A deep learning system for multi-animal pose tracking

    Pereira TD, Tabris N, Matsliah A, et al. SLEAP: A deep learning system for multi-animal pose tracking. Nature methods. 2022;19(4):486-95

  9. [9]

    Geometric deep learning enables 3D kinematic profiling across species and environments

    Dunn TW, Marshall JD, Severson KS, et al. Geometric deep learning enables 3D kinematic profiling across species and environments. Nature methods. 2021;18(5):564-73

  10. [10]

    Foundation models in bioinformatics

    Guo F, Guan R, Li Y, et al. Foundation models in bioinformatics. National science review. 2025;12(4):nwaf028

  11. [11]

    Principal component analysis

    Wold S, Esbensen K, Geladi P. Principal component analysis. Chemometrics and intelligent laboratory systems. 1987;2(1-3):37-52

  12. [12]

    Wavelet transform

    Zhang D. Wavelet transform. In: Fundamentals of image data mining: Analysis, Features, Classification and Retrieval. Springer; 2019. p. 35-44

  13. [13]

    Feature dimensionality reduction: a review

    Jia W, Sun M, Lian J, Hou S. Feature dimensionality reduction: a review. Complex & Intelligent Systems. 2022;8(3):2663-93

  14. [14]

    Mapping the landscape of social behavior

    Klibaite U, Li T, Aldarondo D, Akoad JF, Ölveczky BP, Dunn TW. Mapping the landscape of social behavior. Cell. 2025;188(8):2249-66

  15. [15]

    Multi-animal 3D social pose estimation, identification and behaviour embedding with a few-shot learning framework

    Han Y, Chen K, Wang Y, et al. Multi-animal 3D social pose estimation, identification and behaviour embedding with a few-shot learning framework. Nature machine intelligence. 2024;6(1):48-61

  16. [16]

    Keypoint-MoSeq: parsing behavior by linking point tracking to pose dynamics

    Weinreb C, Pearl JE, Lin S, et al. Keypoint-MoSeq: parsing behavior by linking point tracking to pose dynamics. Nature Methods. 2024;21(7):1329-39

  17. [17]

    Learnable latent embeddings for joint behavioural and neural analysis

    Schneider S, Lee JH, Mathis MW. Learnable latent embeddings for joint behavioural and neural analysis. Nature. 2023;617(7960):360-8

  18. [18]

    Moment: A family of open time-series foundation models

    Goswami M, Szafer K, Choudhry A, Cai Y, Li S, Dubrawski A. Moment: A family of open time-series foundation models. arXiv preprint arXiv:2402.03885. 2024

  19. [19]

    Silhouettes: a graphical aid to the interpretation and validation of cluster analysis

    Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of computational and applied mathematics. 1987;20:53-65

  20. [20]

    Cluster ensembles—a knowledge reuse framework for combining multiple partitions

    Strehl A, Ghosh J. Cluster ensembles—a knowledge reuse framework for combining multiple partitions. Journal of machine learning research. 2002;3(Dec):583-617

  21. [21]

    Cntnap2 loss drives striatal neuron hyperexcitability and behavioral inflexibility

    Cording KR, Tu EM, Wang H, Agopyan-Miu AH, Bateup HS. Cntnap2 loss drives striatal neuron hyperexcitability and behavioral inflexibility. bioRxiv. 2025:2024-05

  22. [22]

    Early environmental enrichment for autism spectrum disorder Fmr1 mice models has positive behavioral and molecular effects

    Chen Ys, Zhang Sm, Yue Cx, Xiang P, Li Jq, Wei Z, et al. Early environmental enrichment for autism spectrum disorder Fmr1 mice models has positive behavioral and molecular effects. Experimental Neurology. 2022;352:114033

  23. [23]

    Etsformer: Exponential smoothing transformers for time-series forecasting

    Woo G, Liu C, Sahoo D, Kumar A, Hoi S. Etsformer: Exponential smoothing transformers for time-series forecasting. arXiv preprint arXiv:2202.01381. 2022

  24. [24]

    Are transformers effective for time series forecasting?

    Zeng A, Chen M, Zhang L, Xu Q. Are transformers effective for time series forecasting? In: Proceedings of the AAAI conference on artificial intelligence. vol. 37; 2023. p. 11121-8

  25. [25]

    Algorithm AS 136: A k-means clustering algorithm

    Hartigan JA, Wong MA. Algorithm AS 136: A k-means clustering algorithm. Journal of the royal statistical society series c (applied statistics). 1979;28(1):100-8

  26. [26]

    Umap: Uniform manifold approximation and projection for dimension reduction

    McInnes L, Healy J, Melville J. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426. 2018

  27. [27]

    The role of the CNTNAP2 gene in the development of autism spectrum disorder

    Valeeva EV, Sabirov IS, Safiullina LR, et al. The role of the CNTNAP2 gene in the development of autism spectrum disorder. Research in Autism Spectrum Disorders. 2024;114:102409