Byte Pair Encoding for Efficient Time Series Forecasting
read the original abstract
Existing time series tokenization methods predominantly encode a constant number of samples into individual tokens. This inflexible approach can generate excessive tokens for even simple patterns like extended constant values, resulting in substantial computational overhead. Inspired by the success of byte pair encoding, we propose the first pattern-centric tokenization scheme for time series analysis. Based on a discrete vocabulary of frequent motifs, our method merges samples with underlying patterns into tokens, compressing time series adaptively. Exploiting our finite set of motifs and the continuous properties of time series, we further introduce conditional decoding as a lightweight yet powerful post-hoc optimization method, which requires no gradient computation and adds no computational overhead. On recent time series foundation models, our motif-based tokenization improves forecasting performance by 40% and boosts efficiency by 2314% on average. Conditional decoding further reduces MSE by up to 48%. In an extensive analysis, we demonstrate the adaptiveness of our tokenization to diverse temporal patterns, its generalization to unseen data, and its meaningful token representations capturing distinct time series properties, including statistical moments and trends.
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
-
Dywave: Event-Aligned Dynamic Tokenization for Heterogeneous IoT Sensing Signals
Dywave applies wavelet-based hierarchical decomposition to build dynamic, event-aligned tokens for heterogeneous IoT signals, cutting token length by up to 75% while raising accuracy up to 12% on sequence models.
-
Dywave: Event-Aligned Dynamic Tokenization for Heterogeneous IoT Sensing Signals
Dywave uses wavelet hierarchical decomposition to create event-aligned compact token sequences for heterogeneous IoT signals, yielding up to 12% accuracy gains and 75% shorter inputs on mainstream sequence models acro...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.