An Objective Performance Evaluation of the LSTM Networks in Time Series Classification

Balakumar Balasingam; Sooraj Sunil

arxiv: 2605.19311 · v1 · pith:VIEPRV2Nnew · submitted 2026-05-19 · 💻 cs.LG · eess.SP

Pith reviewed 2026-05-20 06:51 UTC · model grok-4.3

keywords: LSTM, time series classification, performance evaluation, expectation maximization, Kalman filter, noise statistics, binary classification, state space models

The pith

LSTM classifiers require larger noise statistic separations than model-based EM to achieve reliable time series classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates LSTM networks against an expectation maximization classifier for binary classification of time series generated from two scalar linear Gaussian state space models that differ only in noise statistics. It benchmarks both against the optimal Kalman filter likelihood ratio test through Monte Carlo simulations varying the noise separation, sequence length, and training set size. The results indicate that the structure-exploiting EM classifier performs close to the reference, while LSTM needs greater noise separation for good performance and falls short of the reference when only measurement noise differs, irrespective of sequence length or training data volume.

Core claim

Through Monte Carlo simulations on data from scalar linear Gaussian state space models differing only in noise statistics, the LSTM classifier is shown to require a larger separation in noise statistics to achieve reliable classification compared to the EM classifier, with its performance saturating below the Kalman filter reference when the models differ only in measurement noise, regardless of sequence length or training dataset size.

What carries the argument

The evaluation framework comparing LSTM, EM, and Kalman likelihood ratio test classifiers on synthetic scalar linear Gaussian state space model data with controlled noise differences.
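As a rough illustration of the reference classifier in this framework, the sketch below simulates the two scalar linear Gaussian state space models and applies a Kalman filter likelihood ratio test with the true parameters. The parameter values (a = 0.9, q = 1, r of 1 versus 4), the zero initial state, the unit observation gain, equal class priors, and all function names are assumptions chosen for illustration; the source does not state them.

```python
import math
import random

def simulate(a, q, r, T, rng):
    """Draw y_1..y_T from x_t = a*x_{t-1} + w_t, y_t = x_t + v_t."""
    x = 0.0  # assumed known initial state (illustrative choice)
    ys = []
    for _ in range(T):
        x = a * x + rng.gauss(0.0, math.sqrt(q))      # process noise, variance q
        ys.append(x + rng.gauss(0.0, math.sqrt(r)))   # measurement noise, variance r
    return ys

def kalman_loglik(ys, a, q, r, m0=0.0, p0=0.0):
    """Log-likelihood of ys under (a, q, r), accumulated from Kalman innovations."""
    m, p, ll = m0, p0, 0.0
    for y in ys:
        m_pred = a * m
        p_pred = a * a * p + q
        s = p_pred + r            # innovation variance
        e = y - m_pred            # innovation
        ll += -0.5 * (math.log(2.0 * math.pi * s) + e * e / s)
        k = p_pred / s            # Kalman gain
        m = m_pred + k * e
        p = (1.0 - k) * p_pred
    return ll

def lrt_classify(ys, model0, model1):
    """Return 1 if model1 is more likely than model0, else 0 (equal priors assumed)."""
    return int(kalman_loglik(ys, *model1) > kalman_loglik(ys, *model0))

# Small Monte Carlo run: the two models differ only in measurement noise.
rng = random.Random(0)
model0 = (0.9, 1.0, 1.0)   # hypothetical (a, q, r)
model1 = (0.9, 1.0, 4.0)
trials, correct = 200, 0
for i in range(trials):
    label = i % 2
    ys = simulate(*(model1 if label else model0), T=100, rng=rng)
    correct += lrt_classify(ys, model0, model1) == label
accuracy = correct / trials
print(f"Kalman LRT accuracy: {accuracy:.3f}")
```

Sweeping the ratio between the two noise variances, the sequence length `T`, or the number of training sequences would mirror the three evaluation axes described on this page; only the first two affect this reference classifier, since it uses the true parameters and needs no training.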

If this is right

The EM classifier, leveraging known model structure, achieves performance near the optimal reference with smaller noise separations.
LSTM performance does not reach the reference level in cases where models differ only in measurement noise even with increased sequence lengths or larger training sets.
These results underscore the benefit of using model-based approaches when the data conforms to known physical models in time series classification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

In real applications with approximate models, the performance gap between LSTM and model-based methods may narrow if the assumed structure is not exact.
Hybrid methods that incorporate partial model knowledge into neural networks could potentially reduce the required noise separation for LSTMs.
Testing on multivariate or nonlinear time series could reveal whether the LSTM's limitations are specific to this scalar linear Gaussian setup.

Load-bearing premise

The generated time series data exactly follows the scalar linear Gaussian state space model assumptions used by the EM and Kalman methods.

What would settle it

If the LSTM classifier achieves performance comparable to the EM classifier with small noise separations in simulations where the data deviates from the linear Gaussian model, the observed performance difference would be called into question.

Figures

Figures reproduced from arXiv: 2605.19311 by Balakumar Balasingam, Sooraj Sunil.

**Figure 2.** Data flow through an LSTM layer. … tangent state activation function; and $\odot$ denotes the elementwise (Hadamard) product. The final hidden state $h_T \in \mathbb{R}^{n_h}$, where $n_h$ denotes the number of hidden units, is passed to a fully connected layer which applies an affine transformation $y = W h_T + b$ (Eq. 24), where $W \in \mathbb{R}^{2 \times n_h}$ and $b \in \mathbb{R}^{2}$ are the weight matrix and bias vector of the fully connected layer, respectively, and …

**Figure 3.** Sample observation sequences for varying process …

**Figure 4.** Sample observation sequences for varying measureme…

**Figure 5.** Classification accuracy as a function of noise ratio.

**Figure 6.** Classification accuracy as a function of sequence len…

**Figure 7.** Classification accuracy as a function of the number of …
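The Figure 2 caption describes an LSTM layer whose final hidden state is passed through an affine layer, y = W h_T + b, with W of size 2 × n_h. A minimal forward-pass sketch of that data flow follows, using the standard LSTM gate equations with a scalar input per time step. The weights are random placeholders rather than trained values, and the hidden size, initialisation range, and function names are assumptions for illustration only.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_params(n_h, rng):
    """Random placeholder weights; a trained network would learn these."""
    def vec():
        return [rng.uniform(-0.1, 0.1) for _ in range(n_h)]
    def mat():
        return [vec() for _ in range(n_h)]
    # One parameter set per gate: input (i), forget (f), candidate (g), output (o).
    gates = {name: {"w": vec(), "u": mat(), "b": vec()} for name in "ifgo"}
    W = [vec() for _ in range(2)]   # fully connected layer, shape 2 x n_h
    b = [0.0, 0.0]
    return gates, W, b

def gate_preact(p, x_t, h):
    """Compute w * x_t + U h + b for one gate, with scalar input x_t."""
    n_h = len(h)
    return [p["w"][j] * x_t + sum(p["u"][j][k] * h[k] for k in range(n_h)) + p["b"][j]
            for j in range(n_h)]

def lstm_logits(xs, gates, W, b):
    """Run the sequence through one LSTM layer, then apply y = W h_T + b."""
    n_h = len(W[0])
    h = [0.0] * n_h   # hidden state
    c = [0.0] * n_h   # cell state
    for x_t in xs:
        i = [sigmoid(z) for z in gate_preact(gates["i"], x_t, h)]
        f = [sigmoid(z) for z in gate_preact(gates["f"], x_t, h)]
        g = [math.tanh(z) for z in gate_preact(gates["g"], x_t, h)]
        o = [sigmoid(z) for z in gate_preact(gates["o"], x_t, h)]
        c = [f[j] * c[j] + i[j] * g[j] for j in range(n_h)]   # elementwise products
        h = [o[j] * math.tanh(c[j]) for j in range(n_h)]
    return [sum(W[m][j] * h[j] for j in range(n_h)) + b[m] for m in range(2)]

rng = random.Random(0)
gates, W, b = init_params(n_h=4, rng=rng)
logits = lstm_logits([0.5, -1.0, 2.0], gates, W, b)
predicted_class = logits.index(max(logits))   # index of the larger of the two outputs
```

Only the final hidden state feeds the classifier here, which matches the caption; taking the class with the larger output is an assumed decision rule.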

Original abstract

The rapid adoption of deep learning has increasingly led to data-driven models replacing classical model-based algorithms, even in domains governed by well-understood physical laws. While data-driven models, such as long short-term memory (LSTM) networks, have become a popular choice for time-series analysis, their performance relative to model-based approaches in structured environments is rarely evaluated objectively. This paper presents a performance evaluation framework comparing an LSTM classifier against a model-based expectation maximization (EM) classifier for binary time-series classification. The evaluation is conducted on two scalar linear Gaussian state space models differing only in their noise statistics, where the Kalman filter likelihood ratio test with true parameters serves as a reference for the best achievable classification performance. Through Monte Carlo simulations, the classifiers are evaluated across three axes: task difficulty, controlled by the separation in process or measurement noise between the two models; sequence length; and training dataset size. The results show that the EM classifier, which exploits the known model structure, performs strongly when the data conform to the assumed model class. The LSTM classifier requires a larger separation in noise statistics to achieve reliable classification, and its performance saturates below the reference classifier when the models differ only in measurement noise, regardless of sequence length or training dataset size.
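To make the phrase "exploits the known model structure" concrete, the sketch below runs EM on a single sequence to estimate the process and measurement noise variances of a scalar linear Gaussian model. It uses the standard Kalman filter and Rauch-Tung-Striebel smoother for the E-step and closed-form variance updates for the M-step. This page does not describe how the paper's EM classifier turns such estimates into a class decision, so that step is left out. The known transition coefficient `a`, unit observation gain, initial-state prior, iteration count, and all names are assumptions.

```python
import math
import random

def em_noise_variances(ys, a, q=1.0, r=1.0, iters=50, m0=0.0, p0=1.0):
    """Estimate (q, r) for x_t = a*x_{t-1} + w_t, y_t = x_t + v_t by EM."""
    T = len(ys)
    for _ in range(iters):
        # E-step, forward pass: Kalman filter (index 0 holds the prior on x_0).
        mf, pf = [m0], [p0]
        mp, pp = [None], [None]
        for y in ys:
            m_pred = a * mf[-1]
            p_pred = a * a * pf[-1] + q
            k = p_pred / (p_pred + r)
            mp.append(m_pred)
            pp.append(p_pred)
            mf.append(m_pred + k * (y - m_pred))
            pf.append((1.0 - k) * p_pred)
        # E-step, backward pass: RTS smoother.
        ms, ps = mf[:], pf[:]
        J = [0.0] * T
        for t in range(T - 1, -1, -1):
            J[t] = pf[t] * a / pp[t + 1]
            ms[t] = mf[t] + J[t] * (ms[t + 1] - mp[t + 1])
            ps[t] = pf[t] + J[t] ** 2 * (ps[t + 1] - pp[t + 1])
        # M-step: closed-form updates from smoothed second moments.
        q_sum = r_sum = 0.0
        for t in range(1, T + 1):
            e_tt = ps[t] + ms[t] ** 2                      # E[x_t^2]
            e_pp = ps[t - 1] + ms[t - 1] ** 2              # E[x_{t-1}^2]
            e_tp = J[t - 1] * ps[t] + ms[t] * ms[t - 1]    # E[x_t x_{t-1}]
            q_sum += e_tt - 2.0 * a * e_tp + a * a * e_pp
            r_sum += (ys[t - 1] - ms[t]) ** 2 + ps[t]
        q, r = q_sum / T, r_sum / T
    return q, r

# Illustrative data: hypothetical true values q = 1, r = 4.
rng = random.Random(1)
a, q_true, r_true = 0.9, 1.0, 4.0
x, ys = 0.0, []
for _ in range(1000):
    x = a * x + rng.gauss(0.0, math.sqrt(q_true))
    ys.append(x + rng.gauss(0.0, math.sqrt(r_true)))
q_hat, r_hat = em_noise_variances(ys, a, iters=200)
print(f"q_hat={q_hat:.2f}, r_hat={r_hat:.2f}")
```

Because only two variances are estimated within a fixed model class, the estimator needs far fewer free parameters than a recurrent network, which is one plausible reading of why a structure-exploiting method does well when the data conform to that class.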

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents an empirical framework for objectively evaluating LSTM networks in binary time-series classification against a model-based expectation-maximization (EM) classifier, using synthetic data generated from two scalar linear Gaussian state-space models that differ only in noise statistics. A Kalman filter likelihood-ratio test with known true parameters serves as the optimal reference. Monte Carlo simulations assess performance across axes of task difficulty (separation in process or measurement noise), sequence length, and training dataset size. Key results indicate that the EM classifier exploits the known structure effectively, while the LSTM requires larger noise separation for reliable classification and its performance saturates below the reference when models differ only in measurement noise, independent of sequence length or training set size.

Significance. If the central empirical findings hold, the work supplies a controlled, reproducible benchmark demonstrating that data-driven LSTMs can underperform model-based methods that exploit known generative structure, even as sequence length and data volume increase. The use of Monte Carlo simulations with a clear optimal reference classifier (Kalman LRT) and synthetic data drawn exactly from the assumed class strengthens the objectivity of the comparison and provides falsifiable, quantitative evidence on the limits of purely data-driven approaches in structured time-series domains.

major comments (2)

[Abstract and Results] Abstract and Results section: The claim that LSTM performance 'saturates below the reference classifier ... regardless of sequence length or training dataset size' is supported only by Monte Carlo trials on a finite grid of sequence lengths and training-set sizes. No experiments are reported for substantially larger regimes, and no theoretical argument is given showing that an LSTM cannot approximate the likelihood-ratio test or the relevant sufficient statistics (e.g., distinguishing measurement-noise variances) in the large-data, long-sequence limit. This leaves the independence assertion as an extrapolation rather than a demonstrated property.
[Section 3] Section 3 (LSTM implementation): The manuscript provides insufficient detail on the LSTM architecture (layers, hidden units, cell state), hyperparameter selection procedure, training protocol (optimizer, learning-rate schedule, regularization, early stopping), and statistical significance testing of the reported performance gaps. These omissions make it difficult to determine whether the observed saturation is intrinsic to the LSTM class or sensitive to implementation choices.

minor comments (1)

[Figures] Figure captions and axis labels in the performance plots could more explicitly state the exact ranges of sequence length T and training-set size N used in each panel to aid quick interpretation.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments. We have revised the manuscript to qualify our empirical claims more precisely and to provide the requested implementation details. Point-by-point responses follow.

Referee: [Abstract and Results] Abstract and Results section: The claim that LSTM performance 'saturates below the reference classifier ... regardless of sequence length or training dataset size' is supported only by Monte Carlo trials on a finite grid of sequence lengths and training-set sizes. No experiments are reported for substantially larger regimes, and no theoretical argument is given showing that an LSTM cannot approximate the likelihood-ratio test or the relevant sufficient statistics (e.g., distinguishing measurement-noise variances) in the large-data, long-sequence limit. This leaves the independence assertion as an extrapolation rather than a demonstrated property.

Authors: We agree that the reported Monte Carlo trials cover only a finite grid of sequence lengths and training-set sizes, with no accompanying theoretical analysis of the large-data or long-sequence limit. We have therefore revised the abstract and Results section to state that saturation below the reference is observed within the tested regimes, removing the absolute phrasing 'regardless of sequence length or training dataset size'. A formal proof that LSTMs cannot approximate the Kalman LRT or the relevant sufficient statistics in the infinite limit lies outside the empirical scope of this study. revision: partial
Referee: [Section 3] Section 3 (LSTM implementation): The manuscript provides insufficient detail on the LSTM architecture (layers, hidden units, cell state), hyperparameter selection procedure, training protocol (optimizer, learning-rate schedule, regularization, early stopping), and statistical significance testing of the reported performance gaps. These omissions make it difficult to determine whether the observed saturation is intrinsic to the LSTM class or sensitive to implementation choices.

Authors: We have expanded Section 3 to include the requested details. The LSTM consists of two stacked layers with 64 hidden units each and standard cell-state implementation. Hyperparameters were selected by grid search on a validation set; training employed the Adam optimizer with initial learning rate 0.001 and exponential decay, L2 regularization, and early stopping after five epochs without validation improvement. Performance differences were evaluated for statistical significance via paired t-tests over the Monte Carlo repetitions. revision: yes
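The training protocol stated in this response (initial learning rate 0.001 with exponential decay, early stopping after five epochs without validation improvement) can be sketched independently of any deep learning library. The decay factor `gamma` is not given in the response and is a hypothetical value here, as are the function names and the example loss sequence.

```python
def exp_decay_lr(epoch, lr0=1e-3, gamma=0.95):
    """Learning rate after `epoch` epochs; gamma is a hypothetical decay factor."""
    return lr0 * gamma ** epoch

def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation loss seen so far.
    """
    best, best_epoch, bad = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, bad = loss, epoch, 0
        else:
            bad += 1
            if bad >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Made-up validation losses: the best value occurs at epoch 2.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.6]
stop_epoch, best_epoch = early_stopping_epoch(losses)
```

In this example the later improvement at the last epoch is never reached, which illustrates how the patience setting can influence where a reported accuracy curve levels off.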

standing simulated objections not resolved

A theoretical argument establishing that LSTMs cannot approximate the likelihood-ratio test or relevant sufficient statistics in the large-data, long-sequence limit.

Circularity Check

0 steps flagged

Purely empirical comparison with no derivation chain or self-referential reductions

full rationale

The paper performs Monte Carlo simulations of LSTM, EM, and Kalman LRT classifiers on synthetic scalar linear Gaussian SSM data differing only in noise statistics. Performance is evaluated directly against the known-optimal Kalman reference using true parameters. No equations, predictions, or first-principles results are claimed; all statements follow from finite-sample empirical trials across task difficulty, sequence length, and dataset size. No self-citations, fitted inputs renamed as predictions, or ansatzes appear in the load-bearing claims. The study is self-contained against external benchmarks (synthetic data generation and optimal reference), satisfying the criteria for score 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The evaluation rests on standard assumptions of linear Gaussian state space models and correct model specification for the EM and Kalman components; no new free parameters, axioms, or invented entities are introduced beyond the controlled simulation design.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "The LSTM classifier requires a larger separation in noise statistics to achieve reliable classification, and its performance saturates below the reference classifier when the models differ only in measurement noise, regardless of sequence length or training dataset size."

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Through Monte Carlo simulations, the classifiers are evaluated across three axes: task difficulty, controlled by the separation in process or measurement noise between the two models; sequence length; and training dataset size."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

[1] A. Gupta, H. P. Gupta, B. Biswas, and T. Dutta, “An early classification approach for multivariate time series of on-vehicle sensors in transportation,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 12, pp. 5316–5327, 2020.

[2] C. He, X. Huo, Y. Jiang, and C. Zhu, “Multichannel-based multiview shallow fusion for time series classification and its application in fault diagnosis,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2025.

[3] H. Ismail Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P.-A. Muller, “Deep learning for time series classification: a review,” Data Mining and Knowledge Discovery, vol. 33, no. 4, pp. 917–963, 2019.

[4] M. A. Farahani, M. McCormick, R. Harik, and T. Wuest, “Time-series classification in smart manufacturing systems: An experimental evaluation of state-of-the-art machine learning algorithms,” Robotics and Computer-Integrated Manufacturing, vol. 91, p. 102839, 2025.

[5] X. Wu, M. Yan, H. Tang, D. Wu, and L. Xie, “MSCGN: Multiscale complementary gating network for time series classification,” Biomedical Signal Processing and Control, vol. 112, p. 108563, 2026.

[6] R. E. Kalman, “A new approach to linear filtering and prediction problems,” 1960.

[7] Y. Bar-Shalom, X. R. Li, and T. Kirubarajan, Estimation with applications to tracking and navigation: theory algorithms and software. John Wiley & Sons, 2004.

[8] R. Mehra, “Approaches to adaptive filtering,” IEEE Transactions on Automatic Control, vol. 17, no. 5, pp. 693–698, 2003.

[9] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.

[10] R. H. Shumway and D. S. Stoffer, “An approach to time series smoothing and forecasting using the EM algorithm,” Journal of Time Series Analysis, vol. 3, no. 4, pp. 253–264, 1982.

[11] S. Siami-Namini, N. Tavakoli, and A. S. Namin, “A comparison of ARIMA and LSTM in forecasting time series,” in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 1394–1401, IEEE, 2018.

[12] S. Siami-Namini, N. Tavakoli, and A. S. Namin, “The performance of LSTM and BiLSTM in forecasting time series,” in 2019 IEEE International Conference on Big Data (Big Data), pp. 3285–3292, IEEE, 2019.

[13] A. Essien and C. Giannetti, “A deep learning model for smart manufacturing using convolutional LSTM neural network autoencoders,” IEEE Transactions on Industrial Informatics, vol. 16, no. 9, pp. 6069–6078, 2020.

[14] F. Karim, S. Majumdar, H. Darabi, and S. Chen, “LSTM fully convolutional networks for time series classification,” IEEE Access, vol. 6, pp. 1662–1669, 2017.

[15] F. Karim, S. Majumdar, H. Darabi, and S. Harford, “Multivariate LSTM-FCNs for time series classification,” Neural Networks, vol. 116, pp. 237–245, 2019.

[16] M. Kabkab, E. Hand, and R. Chellappa, “On the size of convolutional neural networks and generalization performance,” in 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 3572–3577, IEEE, 2016.

[17] N. Forti, L. M. Millefiori, P. Braca, and P. Willett, “Model-based deep learning for maneuvering target tracking,” in 2023 26th International Conference on Information Fusion (FUSION), pp. 1–6, IEEE, 2023.

[18] R. H. Shumway and D. S. Stoffer, “Time series regression and exploratory data analysis,” in Time Series Analysis and its Applications, pp. 47–82, Springer, 2011.

[19] S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, et al., “Gradient flow in recurrent nets: the difficulty of learning long-term dependencies,” 2001.

[20] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, “LSTM: A search space odyssey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222–2232, 2016.

[21] H. Liu, Z. Wei, H. Zhang, B. Li, and C. Zhao, “Tiny machine learning (tiny-ml) for efficient channel estimation and signal detection,” IEEE Transactions on Vehicular Technology, vol. 71, no. 6, pp. 6795–6800, 2022.
