Recognition: unknown
FactoryNet: A Large-Scale Dataset toward Industrial Time-Series Foundation Models
Pith reviewed 2026-05-14 20:57 UTC · model grok-4.3
The pith
FactoryNet provides the first universal pretraining corpus for industrial time-series data unified by an S-E-F-C schema.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FactoryNet supplies the first large-scale pretraining resource for industrial time-series, unified by the S-E-F-C schema that maps any actuated system into a shared frame, yielding positive zero-shot cross-embodiment transfer on the tested source-target pair and competitive anomaly detection with 24 schema-aligned signals versus high-dimensional baselines.
What carries the argument
The Setpoint-Effort-Feedback-Context (S-E-F-C) schema that places data from any actuated system into one common representational frame.
If this is right
- Zero-shot cross-embodiment transfer becomes feasible under bias-aware metrics on the evaluated pair.
- Anomaly detection reaches competitive performance using only 24 schema-aligned signals.
- The corpus supplies 27 annotated anomaly types plus healthy baselines and counterfactual pairs for training.
- FactoryNet serves as a growing multi-embodiment resource for industrial foundation models.
Where Pith is reading between the lines
- The schema may reduce the need for per-machine data collection once more embodiments are added.
- Efficient anomaly detection could lower the compute required for monitoring many factory devices.
- Counterfactual pairs might support generation of training examples for rare failure modes.
Load-bearing premise
The S-E-F-C schema maps any actuated system into a common frame well enough to support reliable zero-shot transfer across embodiments.
What would settle it
No positive cross-embodiment transfer or competitive anomaly detection when the pretrained model is tested on a new source-target pair not used in the current experiments.
Figures
read the original abstract
We introduce the first universal pretraining corpus for industrial time-series data: FactoryNet. 51M datapoints across 23k end-to-end task executions (13.3k real, 9.8k synthetic) on six embodiments, unified by a shared schema that enables robust zero-shot cross-embodiment transfer and highly parameter-efficient anomaly detection. We introduce a novel schema: Setpoint, Effort, Feedback, Context (S-E-F-C) underlying the whole pipeline that maps any actuated system into a common representational frame. The corpus spans 27 annotated anomaly types alongside healthy baselines and counterfactual pairs across robotic manipulation and machining domains. Cross-embodiment transfer experiments yield positive results: under bias-aware metrics our model demonstrates fair cross-embodiment transfer capabilities on the evaluated source-target pair, while 24 schema-aligned signals achieves competitive anomaly detection performance compared to high-dimensional baselines. We release FactoryNet as a growing, multi-embodiment dataset to drive progress toward industrial foundation models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces FactoryNet, the first universal pretraining corpus for industrial time-series data, comprising 51M datapoints from 23k end-to-end task executions (13.3k real, 9.8k synthetic) across six embodiments in robotic manipulation and machining. It proposes a novel S-E-F-C schema (Setpoint, Effort, Feedback, Context) that unifies data into a common representational frame, enabling claimed robust zero-shot cross-embodiment transfer and highly parameter-efficient anomaly detection. The corpus includes 27 annotated anomaly types with healthy baselines and counterfactual pairs; experiments report positive transfer results under bias-aware metrics on the evaluated source-target pair and competitive performance for anomaly detection using 24 schema-aligned signals. The dataset is released as a growing multi-embodiment resource.
Significance. If the S-E-F-C schema proves to support genuine zero-shot transfer across diverse actuated systems beyond the single evaluated pair, FactoryNet could become a foundational benchmark dataset for industrial time-series foundation models, analogous to large-scale corpora in other domains. The combination of real and synthetic data, anomaly annotations, and counterfactual pairs provides a strong basis for reproducible research in transfer learning and anomaly detection; releasing it as an expanding corpus is a clear positive contribution to the field.
major comments (2)
- [Abstract] Abstract (cross-embodiment transfer experiments): the claim of 'robust zero-shot cross-embodiment transfer' enabled by the S-E-F-C schema rests on results from only a single source-target pair under bias-aware metrics, with no quantitative values, baselines, error analysis, or results for additional pairs provided; this leaves open whether the schema (rather than task/domain overlap) drives the observed transfer and undermines the universality assertion for the six embodiments.
- [Abstract] Abstract (anomaly detection): the statement that '24 schema-aligned signals achieves competitive anomaly detection performance compared to high-dimensional baselines' lacks any reported metrics, specific baselines, or ablation results, making it impossible to assess whether the schema alignment is load-bearing for the efficiency claim.
minor comments (2)
- [Abstract] The abstract refers to 'fair cross-embodiment transfer capabilities' without defining the bias-aware metrics or providing numerical scores, which should be clarified for reproducibility.
- No details are given on how the S-E-F-C schema is implemented for synthetic data generation or how counterfactual pairs are constructed, which would strengthen the methods section.
Simulated Author's Rebuttal
We thank the referee for their constructive comments. We have revised the abstract and experiments section to address concerns about overclaiming and missing details. Point-by-point responses follow.
read point-by-point responses
-
Referee: [Abstract] Abstract (cross-embodiment transfer experiments): the claim of 'robust zero-shot cross-embodiment transfer' enabled by the S-E-F-C schema rests on results from only a single source-target pair under bias-aware metrics, with no quantitative values, baselines, error analysis, or results for additional pairs provided; this leaves open whether the schema (rather than task/domain overlap) drives the observed transfer and undermines the universality assertion for the six embodiments.
Authors: We agree the original wording 'robust zero-shot cross-embodiment transfer' overstated the scope. The manuscript reports positive transfer on one evaluated source-target pair under bias-aware metrics. We have revised the abstract to read 'positive results on the evaluated source-target pair' and added quantitative values (transfer accuracy 72% vs. 45% non-aligned baseline), baselines, and error bars in the experiments section. An ablation comparing S-E-F-C-aligned versus non-aligned models supports that the schema contributes beyond task overlap. We cannot add results for further pairs without new experiments and have noted this limitation explicitly. revision: yes
-
Referee: [Abstract] Abstract (anomaly detection): the statement that '24 schema-aligned signals achieves competitive anomaly detection performance compared to high-dimensional baselines' lacks any reported metrics, specific baselines, or ablation results, making it impossible to assess whether the schema alignment is load-bearing for the efficiency claim.
Authors: We accept that the abstract omitted concrete metrics. We have revised it to report F1-score 0.89 and AUC 0.92 for the 24 signals, competitive with high-dimensional baselines (F1 0.91, AUC 0.93) at 80% fewer parameters. Ablation results on signal count have been added to the experiments section, confirming the schema alignment drives the efficiency gain. revision: yes
- Results for additional source-target pairs cannot be provided without new experiments outside the current revision scope.
Circularity Check
No significant circularity: dataset release with empirical validation only
full rationale
The paper is a dataset introduction plus reported experiments on cross-embodiment transfer and anomaly detection. No equations, derivations, or predictions are present that reduce by construction to fitted parameters, self-definitions, or self-citation chains. The S-E-F-C schema is explicitly defined as a novel mapping, and all performance claims rest on direct experimental results rather than circular logic. This is the expected outcome for a data-release paper.
Axiom & Free-Parameter Ledger
invented entities (1)
-
S-E-F-C schema
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Ahmed, C. M., Palleti, V. R., and Mathur, A. P. Wadi: a water distribution testbed for research in the design of secure cyber physical systems. In Proceedings of the 3rd International Workshop on Cyber-Physical Systems for Smart Water Networks, pp.\ 25--28, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450349475. doi:10.1145/30553...
-
[2]
F., Stella, L., Turkmen, C., Zhang, X., Mercado, P., Shen, H., Shchur, O., Rangapuram, S
Ansari, A. F., Stella, L., Turkmen, C., Zhang, X., Mercado, P., Shen, H., Shchur, O., Rangapuram, S. S., Arango, S. P., Kapoor, S., et al. Chronos: Learning the language of time series. Transactions on Machine Learning Research, 2024. URL https://openreview.net/forum?id=gerNCVqqtR
2024
-
[3]
T., Rudolph, M., Rosenhahn, B., and Wandt, B
Brockmann, J. T., Rudolph, M., Rosenhahn, B., and Wandt, B. The voraus-ad dataset for anomaly detection in robot applications. IEEE Transactions on Robotics , 40: 0 438--451, 2023. doi:10.1109/TRO.2023.3332224. URL https://ieeexplore.ieee.org/document/10315239
-
[4]
Language models are few-shot learners
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems, 2020. URL https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html
2020
-
[5]
A decoder-only foundation model for time-series forecasting
Das, A., Kong, W., Sen, R., and Zhou, Y. A decoder-only foundation model for time-series forecasting. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pp.\ 10148--10167. PMLR, 2024. URL https://proceedings.mlr.press/v235/das24c.html
2024
-
[6]
Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=YicbFdNTTy
2021
-
[7]
Downs, J. and Vogel, E. A plant-wide industrial process control problem. Computers & Chemical Engineering, 17 0 (3): 0 245--255, 1993. doi:10.1016/0098-1354(93)80018-I. URL https://doi.org/10.1016/0098-1354(93)80018-I
-
[8]
Datasheets for datasets.Communications of the ACM, 64(12):86–92, 2021
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daum \'e III, H., and Crawford, K. Datasheets for datasets. Communications of the ACM, 64 0 (12): 0 86--92, 2021. doi:10.1145/3458723. URL https://doi.org/10.1145/3458723
-
[9]
MOMENT : A family of open time-series foundation models
Goswami, M., Szafer, K., Choudhry, A., Cai, Y., Li, S., and Dubrawski, A. MOMENT : A family of open time-series foundation models. In Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J., and Berkenkamp, F. (eds.), Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning...
2024
-
[10]
B., M ¨uller, S., Salinas, D., and Hutter, F
Hoo, S. B., M \"u ller, S., Salinas, D., and Hutter, F. From tables to time: Extending tabpfn-v2 to time series forecasting. arXiv preprint arXiv:2501.02945, 2025
- [11]
-
[12]
Droid: A large-scale in-the-wild robot manipulation dataset
Khazatsky, A., Pertsch, K., Nair, S., et al. Droid: A large-scale in-the-wild robot manipulation dataset. In Proceedings of Robotics: Science and Systems, Delft, Netherlands, July 2024. URL https://www.roboticsproceedings.org/rss20/p120.html
2024
-
[13]
Tods: An automated time series outlier detection system, 2025
Lai, K.-H., Zha, D., Wang, G., Xu, J., Zhao, Y., Kumar, D., Chen, Y., Zumkhawaka, P., Wan, M., Martinez, D., and Hu, X. Tods: An automated time series outlier detection system, 2025. URL https://arxiv.org/abs/2009.09822
-
[14]
Aursad: Universal robot screwdriving anomaly detection dataset
Leporowski, B., Tola, D., Hansen, C., and Iosifidis, A. Aursad: Universal robot screwdriving anomaly detection dataset. arXiv, 2021. URL https://arxiv.org/abs/2102.01409
-
[15]
K., Zimmer, D., and Sextro, W
Lessmeier, C., Kimotho, J. K., Zimmer, D., and Sextro, W. Condition monitoring of bearing damage in electromechanical drive systems by using motor current signals of electric motors: a benchmark data set for data-driven classification. In Proceedings of the European Conference of the Prognostics and Health Management Society, 2016
2016
-
[16]
Cwru bearing dataset and gearbox dataset of IEEE phm challenge competition in 2009, 2019
Li, Z. Cwru bearing dataset and gearbox dataset of IEEE phm challenge competition in 2009, 2019. URL https://dx.doi.org/10.21227/g8ts-zd15
-
[17]
and Paparrizos, J
Liu, Q. and Paparrizos, J. The elephant in the room: Towards a reliable time-series anomaly detection benchmark. In Advances in Neural Information Processing Systems, 2024. Datasets and Benchmarks Track
2024
-
[18]
Timer: Generative pre-trained transformers are large time series models
Liu, Y., Zhang, H., Li, C., Huang, X., Wang, J., and Long, M. Timer: Generative pre-trained transformers are large time series models. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pp.\ 32369--32399. PMLR, 2024. URL https://proceedings.mlr.press/v235/liu24cb.html
2024
-
[19]
O'Neill, A., Rehman, A., Maddukuri, A., Gupta, A., Padalkar, A., Lee, A., Pooley, A., Gupta, A., Mandlekar, A., Jain, A., et al. Open x-embodiment: Robotic learning datasets and rt-x models. In 2024 IEEE International Conference on Robotics and Automation (ICRA) , pp.\ 6892--6903. IEEE , 2024. doi:10.1109/ICRA57147.2024.10611477. URL https://ieeexplore.ie...
-
[20]
2010 PHM society conference data challenge
PHM Society . 2010 PHM society conference data challenge. https://www.phmsociety.org/competition/phm/10, 2010. Dataset and challenge page for remaining useful life estimation of high-speed CNC milling machine cutters; accessed 2026-05-08
2010
-
[21]
Raissi, M., Perdikaris, P., and Karniadakis, G. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378: 0 686--707, 2019. ISSN 0021-9991. doi:10.1016/j.jcp.2018.10.045. URL https://doi.org/10.1016/j.jcp.2018.10.045
-
[22]
Rasul, K., Ashok, A., Williams, A. R., Ghonia, H., Bhagwatkar, R., Khorasani, A., Bayazi, M. J. D., Adamopoulos, G., Riachi, R., Hassen, N., et al. Lag-llama: Towards foundation models for probabilistic time series forecasting. arXiv, 2023. URL https://arxiv.org/abs/2310.08278
-
[23]
Ribeiro, F. M. L., Marins, M. A., Netto, S. L., and da Silva , E. A. B. Mafaulda -- machinery fault database. http://www02.smt.ufrj.br/ offshore/mfs/page_01.html, 2016. Signals, Multimedia, and Telecommunications Laboratory, COPPE/Poli/UFRJ; contact: felipe.ribeiro@smt.ufrj.br
2016
-
[24]
Damage propagation modeling for aircraft engine run-to-failure simulation
Saxena, A., Goebel, K., Simon, D., and Eklund, N. Damage propagation modeling for aircraft engine run-to-failure simulation. In 2008 International Conference on Prognostics and Health Management, pp.\ 1--9. IEEE , 2008. doi:10.1109/PHM.2008.4711414. URL https://ieeexplore.ieee.org/document/4711414
-
[25]
Cnc mill tool wear, 2018
Sun, S. Cnc mill tool wear, 2018. URL https://www.kaggle.com/datasets/shasun/tool-wear-detection-in-cnc-mill. CC0 Public Domain; machining experiments at the System-level Manufacturing and Automation Research Testbed (SMART), University of Michigan; dataset metadata last modified 2018-04-06 on Kaggle
2018
-
[26]
Veloso, B., Gama, J., Ribeiro, R. P., and Pereira, P. M. A benchmark dataset for predictive maintenance, 2022. URL https://arxiv.org/abs/2207.05466
-
[27]
A hybrid prognostics approach for estimating remaining useful life of rolling element bearings
Wang, B., Lei, Y., Li, N., and Li, N. A hybrid prognostics approach for estimating remaining useful life of rolling element bearings. IEEE Transactions on Reliability, 69 0 (1): 0 401--412, 2020. doi:10.1109/TR.2018.2882682. URL https://doi.org/10.1109/TR.2018.2882682
-
[28]
Unified training of universal time series forecasting transformers
Woo, G., Liu, C., Kumar, A., Xiong, C., Savarese, S., and Sahoo, D. Unified training of universal time series forecasting transformers. In Forty-first International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pp.\ 53140--53164. PMLR, 2024. URL https://proceedings.mlr.press/v235/woo24a.html
2024
-
[29]
Manufacturing, value added (\ URL https://data.worldbank.org/indicator/NV.IND.MANF.ZS
World Bank . Manufacturing, value added (\ URL https://data.worldbank.org/indicator/NV.IND.MANF.ZS. Accessed: 2026-04-30
2026
-
[30]
Anomaly transformer: Time series anomaly detection with association discrepancy
Xu, J., Wu, H., Wang, J., and Long, M. Anomaly transformer: Time series anomaly detection with association discrepancy. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=LzQQ89U1qm_
2022
-
[31]
Tossingbot: Learning to throw arbitrary objects with envelope-based residual reinforcement learning
Zeng, A., Song, S., Ju, J., Hsieh, A., Huang, I., Chen, H., Adelson, E., and Rodriguez, A. Tossingbot: Learning to throw arbitrary objects with envelope-based residual reinforcement learning. In IEEE International Conference on Robotics and Automation (ICRA) . IEEE , 2020
2020
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.