Mantis: Lightweight Foundation Model for Time Series Classification
While foundation models have revolutionized various domains, their application to time series classification remains rather under-explored, with existing literature predominantly focused on forecasting. To bridge this gap, we introduce Mantis, a transformer-based foundation model pre-trained exclusively on synthetic data via self-supervised contrastive learning. We demonstrate that effective tokenization is critical to unlocking the full potential of transformers, proposing a novel token generator unit. Furthermore, we introduce an enhanced test-time methodology that bridges the performance gap between Mantis and strong specialized approaches by leveraging intermediate-layer representations, self-ensembling, and cross-model embedding fusion. Extensive experiments demonstrate that Mantis establishes a new state-of-the-art, outperforming existing foundation models across four diverse dataset collections covering various application domains.
Forward citations
Cited by 4 Pith papers
- Beyond IID: How General Are Tabular Foundation Models, Really?
Tabular foundation models excel on tiny- to medium-sized IID data but are outperformed by traditional tree-based and deep learning models on non-IID, large, and high-dimensional datasets, based on evaluations across 1...
- TelecomTS: A Multi-Modal Observability Dataset for Time Series and Language Analysis
TelecomTS is a new observability dataset from 5G networks that preserves absolute scale and supports multi-modal tasks, showing that current time series and language models struggle with abrupt noisy dynamics.
- Modular Retrieval-Augmented Generalization for Human Action Recognition
MoRA is a new retrieval-augmented module for IMU-based human activity recognition that uses uncertainty-adaptive fusion of retrieved motion patterns to improve model performance.
- COMODO: Cross-Modal Video-to-IMU Distillation for Efficient Egocentric Human Activity Recognition
COMODO is a cross-modal self-supervised distillation framework that uses a frozen video encoder and dynamic instance queue to align video and IMU embeddings, improving IMU-based egocentric HAR to match supervised performance.