arxiv: 2602.15089 · v2 · submitted 2026-02-16 · 💻 cs.LG · stat.ML

Recognition: 1 theorem link

· Lean Theorem

Triplet Feature Fusion for Equipment Anomaly Prediction: An Open-Source Methodology Using Small Foundation Models

Takato Yasuno


Pith reviewed 2026-05-15 21:49 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords: equipment anomaly prediction · feature fusion · small foundation models · time series embeddings · multilingual embeddings · HVAC · predictive maintenance · edge deployment


The pith

Triplet fusion of statistical features, time-series embeddings, and text embeddings predicts equipment anomalies with a 0.1% false positive rate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes fusing three feature types into a single vector for anomaly prediction in industrial equipment. Statistical summaries of 90-day sensor histories, embeddings from a small time-series model, and multilingual text embeddings of equipment records are concatenated and classified by LightGBM. This yields high precision and ROC-AUC on HVAC data while running fast on CPU. The text component conditions the model on equipment type, sharply cutting false positives without explicit categorical encoding.
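The paper's exact 28 statistical features are not listed on this page. As a hedged illustration only, the sketch below computes a few plausible summary statistics (mean, standard deviation, min, max, last value, and least-squares slope) over a trailing 90-day window of daily readings. The feature names, the daily sampling, and the helper `window_stats` are assumptions, not the author's feature set.

```python
from statistics import fmean, pstdev

def window_stats(readings: list[float], window: int = 90) -> dict[str, float]:
    """Summary statistics over the trailing `window` readings (hypothetical feature set)."""
    w = readings[-window:]
    if len(w) < 2:
        raise ValueError("need at least two readings in the window")
    n = len(w)
    t_mean = (n - 1) / 2  # mean of the time indices 0..n-1
    y_mean = fmean(w)
    # Ordinary least-squares slope of reading versus time index.
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(w))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return {
        "mean": y_mean,
        "std": pstdev(w),
        "min": min(w),
        "max": max(w),
        "last": w[-1],
        "slope": num / den,
    }

# Example: a slowly rising daily temperature series of 120 days.
series = [20.0 + 0.1 * day for day in range(120)]
feats = window_stats(series)
```

A real 28-feature vector would extend this dictionary with whatever aggregates the author chose; only the window length (90 days) comes from the paper.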

Core claim

The triplet feature fusion pipeline integrates statistical features (R^28), time-series embeddings (R^64 from a LoRA-adapted TTM), and multilingual text embeddings (R^1024) into a concatenated vector processed by LightGBM to predict anomalies 30 to 90 days ahead. On a dataset of 64 HVAC units with 67,045 samples, it attains 0.992 precision, 0.958 F1-score, and 0.998 ROC-AUC at the 30-day horizon, and reduces the false positive rate from 0.6% to 0.1%.
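For reference, the headline metrics follow from confusion-matrix counts in the standard way. The sketch below uses made-up counts purely to show the arithmetic; they are not the paper's confusion matrix.

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    tp: int  # anomalies correctly flagged
    fp: int  # normal samples wrongly flagged
    tn: int  # normal samples correctly passed
    fn: int  # anomalies missed

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)

    def fpr(self) -> float:
        # False positive rate: share of true negatives that were flagged.
        return self.fp / (self.fp + self.tn)

# Illustrative counts only (not from the paper).
cm = Confusion(tp=90, fp=10, tn=890, fn=10)
```

Because FPR divides by the number of negatives, it can stay small even when false positives are numerous in absolute terms, which is why it is worth reporting alongside precision.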

What carries the argument

The 1,116-dimensional triplet vector formed by concatenating sensor statistics, time-series model outputs, and multilingual text embeddings.
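A minimal sketch of the concatenation h = [x; y; z], assuming each component arrives as a plain list of floats. The function name `fuse_triplet` and the slice constants are illustrative; only the dimensions (28 + 64 + 1024 = 1,116) come from the paper.

```python
X_DIM, Y_DIM, Z_DIM = 28, 64, 1024  # statistical, time-series, text dimensions

# Column ranges of each block inside h (handy for later ablations).
X_SLICE = slice(0, X_DIM)
Y_SLICE = slice(X_DIM, X_DIM + Y_DIM)
Z_SLICE = slice(X_DIM + Y_DIM, X_DIM + Y_DIM + Z_DIM)

def fuse_triplet(x: list[float], y: list[float], z: list[float]) -> list[float]:
    """Concatenate the three feature blocks into one 1,116-dim vector."""
    for name, vec, dim in (("x", x, X_DIM), ("y", y, Y_DIM), ("z", z, Z_DIM)):
        if len(vec) != dim:
            raise ValueError(f"{name} has length {len(vec)}, expected {dim}")
    return [*x, *y, *z]

h = fuse_triplet([0.0] * X_DIM, [1.0] * Y_DIM, [2.0] * Z_DIM)
```

Because a gradient-boosted tree model splits on each column independently, no cross-block rescaling is needed before concatenation; the slices simply record which columns belong to which modality.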

If this is right

Supports fully local inference in under 2 milliseconds on CPU for edge deployment.
Text embeddings align time-series patterns with fault archetypes without explicit categorical encoding.
Enables multi-horizon forecasting at 30, 60, and 90 days with the same triplet feature pipeline.
Relies entirely on open-source components with permissive licenses for easy adoption.
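The under-2 ms figure is a per-sample CPU latency claim. A hedged way to check such a claim on one's own hardware is to time repeated single-sample calls and report the median. The `predict_one` function below is a hypothetical stand-in for the real model call, not the paper's classifier.

```python
import time
from statistics import median
from typing import Callable, Sequence

def median_latency_ms(predict_one: Callable[[Sequence[float]], float],
                      sample: Sequence[float],
                      repeats: int = 200,
                      warmup: int = 20) -> float:
    """Median wall-clock latency of a single prediction, in milliseconds."""
    for _ in range(warmup):  # let caches and lazy initialization settle
        predict_one(sample)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict_one(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return median(timings)

# Hypothetical stand-in model: a linear score over the 1,116-dim vector.
weights = [0.001] * 1116

def predict_one(h: Sequence[float]) -> float:
    return sum(w * v for w, v in zip(weights, h))

latency = median_latency_ms(predict_one, [1.0] * 1116)
```

The median is less sensitive than the mean to occasional scheduler stalls, which matters when the quantity being checked is only a few milliseconds.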

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The multilingual embeddings could allow the method to transfer across languages and regions with similar equipment documentation.
Similar fusion strategies might improve anomaly detection in other sensor-rich domains like manufacturing or transportation.
Further gains could come from adapting the time-series model to specific industrial fault signatures rather than general time series.

Load-bearing premise

That adding the multilingual text embeddings from equipment master records supplies useful conditioning information that accounts for the large drop in false positive rate.
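For readers unfamiliar with how a text record becomes z: the usage example published with the multilingual-e5 models mean-pools the token vectors under the attention mask and then L2-normalizes the result. The toy sketch below reproduces only that pooling step on hand-made token vectors. It does not load the model, and whether the paper post-processes z exactly this way is an assumption.

```python
import math

def masked_mean_pool(token_vecs: list[list[float]], mask: list[int]) -> list[float]:
    """Average the token vectors whose attention-mask entry is 1."""
    kept = [v for v, m in zip(token_vecs, mask) if m == 1]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(v[i] for v in kept) / len(kept) for i in range(dim)]

def l2_normalize(vec: list[float]) -> list[float]:
    norm = math.sqrt(sum(c * c for c in vec))
    return [c / norm for c in vec]

# Toy example: three real tokens plus one padding token (mask 0).
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [9.0, 9.0]]
z_toy = l2_normalize(masked_mean_pool(tokens, [1, 1, 1, 0]))
```

Because the resulting vectors are unit-length, cosine similarity between two equipment records reduces to a dot product, which is convenient for the kind of cluster analysis the paper reports.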

What would settle it

An experiment that removes the text embedding component from the triplet and retrains the classifier on the identical HVAC dataset to check if the false positive rate increases back toward the 0.6 percent baseline.
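One way to set up this ablation is to train the same classifier on column subsets of h. The sketch assumes the block layout x = columns 0–27, y = 28–91, z = 92–1115, and leaves training and scoring to a caller-supplied `train_and_score` function, a hypothetical hook where a LightGBM fit and a false-positive-rate evaluation would go.

```python
from typing import Callable

# Assumed column layout of h = [x; y; z].
BLOCKS = {"x": range(0, 28), "y": range(28, 92), "z": range(92, 1116)}

VARIANTS = {
    "x": ["x"],                # statistical-features baseline
    "x+y": ["x", "y"],         # triplet without the text block
    "x+y+z": ["x", "y", "z"],  # full triplet
}

def select_columns(rows: list[list[float]], blocks: list[str]) -> list[list[float]]:
    cols = [c for b in blocks for c in BLOCKS[b]]
    return [[row[c] for c in cols] for row in rows]

def run_ablation(rows: list[list[float]], labels: list[int],
                 train_and_score: Callable[[list[list[float]], list[int]], float]
                 ) -> dict[str, float]:
    """Score every variant with the same (hypothetical) training routine."""
    return {name: train_and_score(select_columns(rows, blocks), labels)
            for name, blocks in VARIANTS.items()}

# Dummy scorer that just reports the feature count, to show the plumbing.
rows = [[0.0] * 1116 for _ in range(4)]
result = run_ablation(rows, [0, 1, 0, 1], lambda X, y: float(len(X[0])))
```

For a fair comparison, every variant should see the same splits and hyperparameters, so that the feature set is the only thing that changes.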

Figures

Figures reproduced from arXiv: 2602.15089 by Takato Yasuno.

**Figure 2.** Confusion matrices (top) and ROC curves (bottom) at the 30-, 60-, and 90-day horizons on the test …

**Figure 3.** Multi-modal fusion feature importance, top-50 by LightGBM gain (30d / 60d / 90d). Orange: …

**Figure 4.** UMAP (left) and t-SNE (right) projections of 278 unique …

**Figure 6.** Cluster-representative series (Clusters 5–9).

**Figure 7.** Cluster-representative series (Clusters 10–…).

Original abstract

Predicting equipment anomalies before they escalate into failures is a critical challenge in industrial facility management. Existing approaches rely either on hand-crafted threshold rules, which lack generalizability, or on large neural models that are impractical for on-site, air-gapped deployments. We present an industrial methodology that resolves this tension by combining open-source small foundation models into a unified 1,116-dimensional Triplet Feature Fusion pipeline. This pipeline integrates: (1) statistical features ($x \in \mathbb{R}^{28}$) derived from 90-day sensor histories, (2) time-series embeddings ($y \in \mathbb{R}^{64}$) from a LoRA-adapted IBM Granite TinyTimeMixer (TTM, 133K parameters), and (3) multilingual text embeddings ($z \in \mathbb{R}^{1024}$) extracted from Japanese equipment master records via multilingual-e5-large. The concatenated triplet $h = [x; y; z]$ is processed by a LightGBM classifier (< 3 MB) trained to predict anomalies at 30-, 60-, and 90-day horizons. All components use permissive open-source licenses (Apache 2.0 / MIT). The inference-time pipeline runs entirely on CPU in under 2 ms, enabling edge deployment on co-located hardware without cloud dependency. On a dataset of 64 HVAC units comprising 67,045 samples, the triplet model achieves Precision = 0.992, F1 = 0.958, and ROC-AUC = 0.998 at the 30-day horizon. Crucially, it reduces the False Positive Rate from 0.6 percent (baseline) to 0.1 percent, an 83 percent reduction attributable to equipment-type conditioning via text embedding $z$. Cluster analysis reveals that the embeddings align time-series signatures with distinct fault archetypes, explaining how compact multilingual representations improve discrimination without explicit categorical encoding.
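As a quick check of the arithmetic, the 83 percent figure is the relative reduction in false positive rate:

```latex
\[
\frac{\mathrm{FPR}_{\text{baseline}} - \mathrm{FPR}_{\text{triplet}}}{\mathrm{FPR}_{\text{baseline}}}
= \frac{0.6\% - 0.1\%}{0.6\%} = \frac{5}{6} \approx 0.833
\]
```

In absolute terms this is a drop of 0.5 percentage points, so its practical impact depends on how many normal samples are screened.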

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a practical open-source triplet fusion for HVAC anomaly prediction with strong reported numbers, but the text embeddings' role in the FPR drop lacks an isolating ablation.


This paper combines 28 statistical features from sensor histories, 64-dim embeddings from a LoRA-tuned TinyTimeMixer, and 1024-dim multilingual-e5 text vectors from Japanese equipment records, then feeds the 1,116-dim concatenation into LightGBM for 30/60/90-day anomaly forecasts. On 67k samples from 64 HVAC units it reports precision 0.992, F1 0.958, and ROC-AUC 0.998 at the shortest horizon, with the false-positive rate falling from 0.6% to 0.1%. The LightGBM classifier is under 3 MB and the inference-time pipeline runs in under 2 ms on CPU, which is the part that actually matters for air-gapped sites. The open-source choice of components and the explicit edge-deployment target are the clearest strengths; anyone who has tried to ship large models to factory hardware will see the appeal right away. The cluster plots that link embeddings to fault types are a reasonable way to motivate the text component. The main gap is the missing ablation that would show whether removing the text vector, or swapping it for one-hot equipment labels, erases the claimed FPR gain. With only 64 units, any unit-level leakage in the splits or imbalance in the positive class could produce the same numerical improvement without the embeddings supplying new signal. The abstract also gives no cross-validation scheme or error bars, so the metrics are hard to trust at face value until the full methods section is checked. The work is aimed at engineers who maintain building systems and need something they can run locally on modest hardware. Readers already working with mixed sensor-plus-text data will find the pipeline concrete enough to test on their own logs. It is worth sending to peer review so that referees who know industrial datasets can settle the data-handling details and the feature-ablation question.
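The one-hot control mentioned in the letter can be built directly from an equipment-type field. The sketch below is a hypothetical helper, `one_hot_types`, that maps each record's type string to an indicator vector that would stand in for z in the [x; y; z] layout. The type names in the example are invented.

```python
def one_hot_types(types: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted category list and one indicator row per record."""
    categories = sorted(set(types))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for t in types:
        row = [0.0] * len(categories)
        row[index[t]] = 1.0
        rows.append(row)
    return categories, rows

# Invented equipment types, for illustration only.
cats, onehot = one_hot_types(["chiller", "AHU", "chiller", "pump"])
```

If this one-hot variant matched the triplet's false positive rate, the gain would be explained by equipment identity alone rather than by anything specific to the multilingual embedding.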

Referee Report

2 major / 2 minor

Summary. The paper proposes a Triplet Feature Fusion pipeline for industrial equipment anomaly prediction that concatenates statistical features (x in R^28) from 90-day sensor histories, time-series embeddings (y in R^64) from a LoRA-adapted IBM Granite TinyTimeMixer, and multilingual text embeddings (z in R^1024) from Japanese equipment records via multilingual-e5-large. The 1,116-dimensional vector h = [x; y; z] is classified by a lightweight LightGBM model to predict anomalies at 30/60/90-day horizons. On 67,045 samples from 64 HVAC units the triplet model reports Precision=0.992, F1=0.958, ROC-AUC=0.998 at the 30-day horizon and reduces false-positive rate from 0.6% (baseline) to 0.1%, an 83% reduction attributed to the text embeddings.

Significance. If the reported metrics are reproducible and the attribution to z is confirmed by ablation, the work would be significant for edge-deployable industrial monitoring: it demonstrates that small open-source foundation models plus multilingual metadata can deliver high-precision anomaly prediction on modest hardware without cloud access. The explicit open-source licensing and sub-2 ms CPU inference are practical strengths.

major comments (2)

[Abstract and §3] Pipeline description: the central claim that the FPR drop from 0.6% to 0.1% is 'attributable to equipment-type conditioning via text embedding z' is unsupported. No ablation that removes z, replaces z with one-hot equipment-type features, or compares against [x; y] alone is reported, nor is any statistical test of the performance delta. With only 64 units, unit-specific leakage or class imbalance could produce the same numerical improvement without the claimed mechanism.
[§4] Experimental setup: the cross-validation strategy, baseline construction details, train/test split criteria, and any safeguards against temporal or unit-level leakage are not described. The abstract reports point estimates without error bars or confidence intervals, making it impossible to assess whether the reported ROC-AUC of 0.998 is statistically distinguishable from the baseline.
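As an illustration of the error bars requested in these comments, a percentile bootstrap over test samples gives an interval for ROC-AUC. This sketch computes AUC as the Mann–Whitney probability that a random positive outscores a random negative (ties count half) and resamples with a fixed seed. It resamples individual samples, which ignores the unit-level dependence a real analysis on 64 units would need to respect, for example by resampling whole units instead.

```python
import random

def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC as P(score of a positive > score of a negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(scores: list[float], labels: list[int], n_boot: int = 500,
                     alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap interval for ROC-AUC, resampling individual samples."""
    roc_auc(scores, labels)  # fail fast if one class is missing
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples containing both classes
            stats.append(roc_auc([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Toy scores and labels, for illustration only.
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.05]
labels = [0, 0, 1, 1, 1, 0, 1, 0]
point = roc_auc(scores, labels)
low, high = bootstrap_auc_ci(scores, labels)
```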

minor comments (2)

[Figures 4, 6, 7] Cluster analysis: the captions should explicitly state the clustering algorithm, distance metric, and number of clusters so readers can reproduce the alignment between embeddings and fault archetypes.
[§2] Notation: the dimensions of x, y, and z are given in the abstract, but the exact feature names and preprocessing steps for the 28 statistical features should be listed in a table for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We agree that the current manuscript lacks explicit ablations and experimental details, which weakens the central claims. We will revise the paper to address both major comments by adding the requested analyses and descriptions.

Point-by-point responses

Referee: [Abstract and §3] The attribution of the FPR drop to text embedding z is unsupported; no ablation or statistical test isolates it, and unit-level leakage or class imbalance could explain the gain.

Authors: We acknowledge that the manuscript does not contain explicit ablation studies isolating the contribution of z. The attribution to text embeddings is currently supported only by the overall performance gain and the cluster analysis mentioned in the abstract. To address this, the revised manuscript will include: (i) performance metrics for the [x; y] model alone, (ii) direct comparison against [x; y; z], (iii) a variant replacing z with one-hot equipment-type encoding, and (iv) statistical significance tests (e.g., paired bootstrap or McNemar’s test) on the FPR and AUC differences. These additions will either substantiate or qualify the claimed mechanism. revision: yes
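McNemar's test, one of the options named in this response, compares two classifiers on the same test samples using only the discordant pairs: b samples that model A classifies correctly and model B does not, and c the reverse. A stdlib sketch with the continuity-corrected chi-square statistic (1 degree of freedom) follows; the counts in the example are invented.

```python
import math

def mcnemar(b: int, c: int) -> tuple[float, float]:
    """Continuity-corrected McNemar chi-square statistic and two-sided p-value.

    b: samples classified correctly by model A only.
    c: samples classified correctly by model B only.
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs, no evidence of a difference
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 dof: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p = mcnemar(b=10, c=2)
```

With few discordant pairs (a common rule of thumb is b + c below about 25), the exact binomial form of the test is usually preferred over this chi-square approximation.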
Referee: [§4] Cross-validation, baseline construction, split criteria, and leakage safeguards are not described, and no error bars or confidence intervals are reported.

Authors: We agree that §4 omits critical methodological details. In the revision we will expand this section to specify: (a) the cross-validation procedure (time-series purged k-fold with unit-level grouping to prevent leakage across the 64 HVAC units), (b) chronological train/test splits ensuring no future data leakage, (c) exact baseline construction (statistical-features-only LightGBM), and (d) all metrics reported with standard deviations and 95% bootstrap confidence intervals from repeated runs. This will allow readers to evaluate statistical distinguishability from the baseline. revision: yes
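A minimal sketch of the unit-level grouping promised in (a): whole HVAC units are assigned to folds so that no unit contributes samples to both training and test data. It omits the chronological purging step, and `unit_ids` is a hypothetical per-sample list of unit identifiers. scikit-learn's `GroupKFold` offers the same no-shared-group guarantee.

```python
from collections import defaultdict

def grouped_kfold(unit_ids: list[str], k: int) -> list[tuple[list[int], list[int]]]:
    """Split sample indices into k folds with each unit confined to one test fold."""
    by_unit: dict[str, list[int]] = defaultdict(list)
    for i, u in enumerate(unit_ids):
        by_unit[u].append(i)
    units = sorted(by_unit)
    if k > len(units):
        raise ValueError("more folds than units")
    folds = []
    for f in range(k):
        test_units = set(units[f::k])  # round-robin assignment of units to folds
        test = [i for u in test_units for i in by_unit[u]]
        train = [i for u in units if u not in test_units for i in by_unit[u]]
        folds.append((sorted(train), sorted(test)))
    return folds

ids = ["u1", "u1", "u2", "u3", "u3", "u4"]
splits = grouped_kfold(ids, k=2)
```

With only 64 units, grouped scores are usually lower than those from a random sample-level split; a large gap between the two is itself a sign of unit-level leakage.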

Circularity Check

0 steps flagged

No circularity in the presented methodology or claims

Full rationale

The paper describes a standard empirical pipeline: extract statistical features x, time-series embeddings y from a LoRA-adapted TinyTimeMixer (TTM), and text embeddings z from multilingual-e5-large; concatenate them into h = [x; y; z]; and train a LightGBM classifier on labeled data to report precision, F1, ROC-AUC, and FPR. No equations define any quantity in terms of itself, no fitted parameters are renamed as predictions, and no self-citations supply load-bearing uniqueness theorems or ansatzes. The attribution of the FPR reduction to z is an interpretive claim about mechanism (supported or not by ablation), but it does not reduce any derivation to the paper's own inputs by construction. The work is self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that the chosen embeddings capture anomaly-relevant signals and that text embeddings add orthogonal value beyond statistical and time-series features; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption: embeddings from the small TTM and multilingual-e5 models capture predictive signals for equipment anomalies.
Invoked when claiming that concatenation of x, y, z improves discrimination over baseline.

pith-pipeline@v0.9.0 · 5639 in / 1281 out tokens · 38970 ms · 2026-05-15T21:49:53.909505+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Passage: "The concatenated triplet h = [x; y; z] is processed by a LightGBM classifier"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Heterogeneous Variational Inference for Markov Degradation Hazard Models: Discretized Mixture with Interpretable Clusters
cs.LG · 2026-04 · unverdicted · novelty 5.0

A discretized finite mixture model with ADVI identifies interpretable low- and high-risk clusters in Markov degradation hazard models for 280 industrial pumps, achieving 84x speedup over NUTS while enforcing stability...

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · cited by 1 Pith paper · 4 internal anchors

[1] Douglas C. Montgomery. Introduction to Statistical Quality Control. John Wiley & Sons, 6th edition, 2009.
[2] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM), pages 413–422, 2008.
[3] Kyle Hundman, Valentino Constantinou, Carter Laporte, et al. Detecting spacecraft anomalies using LSTMs and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 387–395, 2018.
[4] Dongmin Park, Yuichi Hoshi, and Charles C. Kemp. A multimodal anomaly detector for robot-assisted feeding using an LSTM-based variational autoencoder. IEEE Robotics and Automation Letters, 3:1544–1551, 2018.
[5] Abhimanyu Das, Weihao Kong, Andrew Leach, et al. A decoder-only foundation model for time-series forecasting. International Conference on Machine Learning (ICML), 2024.
[6] Gerald Woo, Chenghao Liu, Akshat Kumar, et al. Unified training of universal time series forecasting transformers. International Conference on Machine Learning (ICML), 2024.
[7] Vijay Ekambaram, Arindam Jati, Nam H. Nguyen, et al. TTMs: Fast multi-level tiny time mixers for improved zero-shot and few-shot forecasting of multivariate time series. arXiv preprint arXiv:2401.03955, 2024.
[8] Rob Thomas, Paul Zikopoulos, and Kate Soule. AI Value Creators: Beyond the Generative AI User Mindset. O’Reilly Media, 2025.
[9] Edward J. Hu, Yelong Shen, Phillip Wallis, et al. LoRA: Low-rank adaptation of large language models. International Conference on Learning Representations (ICLR), 2022.
[10] Zhongwei Wan, Xin Wang, Che Liu, et al. Efficient large language models: A survey. arXiv preprint arXiv:2312.03863, 2023.
[11] Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, et al. Phi-3 technical report: A highly capable language model locally on your phone. arXiv preprint arXiv:2404.14219, 2024.
[12] Sercan O. Arik and Tomas Pfister. TabNet: Attentive interpretable tabular learning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(8):6679–6687, 2021.
[13] Vadim Borisov, Tobias Leemann, Kathrin Seßler, et al. Deep neural networks and tabular data: A survey. IEEE Transactions on Neural Networks and Learning Systems, 2022.
[14] Alec Radford, Jong Wook Kim, Chris Hallacy, et al. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), pages 8748–8763, 2021.
[15] Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In Proceedings of the 39th International Conference on Machine Learning (ICML), pages 12888–12900, 2022.
[16] Liang Wang, Nan Yang, Xiaolong Huang, et al. Multilingual E5 text embeddings: A technical report. arXiv preprint arXiv:2402.05672, 2024.
[17] Guolin Ke, Qi Meng, Thomas Finley, et al. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems (NeurIPS), 30, 2017.
[18] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794, 2016.
[19] Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, et al. CatBoost: Unbiased boosting with categorical features. In Advances in Neural Information Processing Systems (NeurIPS), volume 31, 2018.
[20] Takato Yasuno. Hybrid feature learning with time series embeddings for equipment anomaly prediction. In Proceedings of the 40th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2026), Gunma, Japan, June
[21] Triplet Feature Fusion for Equipment Anomaly Prediction: An Open-Source Methodology Using Small Foundation Models. Forthcoming (June 8–12, 2026). Preprint: arXiv:2602.15089 [cs.LG], https://arxiv.org/abs/2602.15089
[22] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.
[23] Leland McInnes, John Healy, and James Melville. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.
[24] Martin Ester, Hans-Peter Kriegel, Jörg Sander, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD), pages 226–231, 1996.
[25] Stuart P. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.
[26] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, New York, NY, 2006.
[27] Takato Yasuno. Dam inflow time series regression models minimising loss of hydropower opportunities. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 357–367. Springer, Cham, 2018.
[28] Takato Yasuno et al. Flood inflow forecast using L2-norm ensemble weighting sea surface feature. arXiv preprint arXiv:2112.03108, 2022.