Implementing CPSLint: A Data Validation and Sanitisation Tool for Industrial Cyber-Physical Systems
Pith reviewed 2026-05-10 03:30 UTC · model grok-4.3
The pith
CPSLint is a domain-specific language that expresses industrial CPS data sanitization and validation in just a few lines of code.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CPSLint is a DSL designed to support the data preparation process for industrial CPS, where one can express the sanitization of large time-series data collections in just a few lines of code; it leverages the fact that many raw datasets in this domain require similar actions to become suitable for analysis, and it is presented as a publicly available tool applicable to any such case.
What carries the argument
CPSLint, a domain-specific language whose constructs abstract common sanitization and validation operations on CPS time-series data.
Load-bearing premise
Many raw industrial CPS time-series collections share enough similar preparation needs that they can be handled by a single concise DSL rather than case-by-case scripts.
What would settle it
A representative CPS dataset whose required sanitization steps cannot be expressed in a few lines of CPSLint, or that demands substantial custom code outside the DSL.
Original abstract
Raw datasets are often too large and unstructured to work with directly, and require a data preparation phase. The domain of industrial Cyber-Physical Systems (CPSs) is no exception, as raw data typically consists of large time-series data collections that log the system's status at regular time intervals. The processing of such raw data is often carried out using ad hoc, case-specific, one-off Python scripts, often neglecting aspects of readability, reusability, and maintainability. In practice, this can cause professionals such as data scientists to write similar data preparation scripts for each case, requiring them to do much repetitive work. We introduce CPSLint, a Domain-Specific Language (DSL) designed to support the data preparation process for industrial CPS. CPSLint raises the level of abstraction to the point where both data scientists and domain experts can perform the data preparation task. We leverage the fact that many raw data collections in the industrial CPS domain require similar actions to render them suitable for data-centric workflows. In our DSL one can express the data preparation process in just a few lines of code. CPSLint is a publicly available tool applicable for any case involving time-series data collections in need of sanitisation.
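The abstract's central idea is that recurring preparation checks can be written once and then declared per dataset in a few lines. A minimal Python sketch makes that idea concrete. It is not CPSLint syntax or its implementation. The `Rule` type, the `regular_interval` and `in_range` rules, and the `validate` engine are hypothetical names chosen for illustration. Samples are assumed to be `(timestamp, {signal: value})` pairs logged at a fixed period.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed data shape: (timestamp in seconds, {signal name: value}).
Sample = tuple[float, dict[str, float]]


@dataclass
class Rule:
    name: str
    check: Callable[[list[Sample]], list[str]]  # returns violation messages


def regular_interval(period: float, tol: float = 1e-9) -> Rule:
    """Hypothetical rule: consecutive timestamps must be `period` seconds apart."""
    def check(rows: list[Sample]) -> list[str]:
        return [f"gap of {t1 - t0:g}s at t={t1:g}"
                for (t0, _), (t1, _) in zip(rows, rows[1:])
                if abs((t1 - t0) - period) > tol]
    return Rule(f"regular_interval({period})", check)


def in_range(signal: str, lo: float, hi: float) -> Rule:
    """Hypothetical rule: every recorded value of `signal` lies in [lo, hi]."""
    def check(rows: list[Sample]) -> list[str]:
        return [f"{signal}={v[signal]:g} out of range at t={t:g}"
                for t, v in rows
                if signal in v and not lo <= v[signal] <= hi]
    return Rule(f"in_range({signal})", check)


def validate(rows: list[Sample], spec: list[Rule]) -> dict[str, list[str]]:
    """Generic, reusable engine: run every rule, keep only rules that report violations."""
    return {rule.name: errs for rule in spec if (errs := rule.check(rows))}


# Only the data and the short specification below are dataset-specific.
rows = [(0.0, {"temp": 21.5}), (1.0, {"temp": 22.0}), (3.0, {"temp": 95.0})]
spec = [regular_interval(1.0), in_range("temp", 0.0, 80.0)]
report = validate(rows, spec)
print(report)
# → {'regular_interval(1.0)': ['gap of 2s at t=3'], 'in_range(temp)': ['temp=95 out of range at t=3']}
```

In this sketch, supporting a new dataset means editing the `spec` list. The rule definitions and the engine are reused unchanged. That separation is the reusability the abstract contrasts with one-off scripts.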
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces CPSLint, a domain-specific language (DSL) for data validation and sanitization of raw time-series datasets in industrial cyber-physical systems (CPS). It motivates the work by noting that ad-hoc Python scripts for data preparation lead to repetitive work and neglect readability, reusability, and maintainability. CPSLint is claimed to raise the abstraction level so that both data scientists and domain experts can express the preparation process in just a few lines of code, leveraging common sanitization actions across CPS datasets. The tool is presented as publicly available and generally applicable to any time-series data sanitization needs.
Significance. If the DSL successfully captures common CPS data-preparation patterns and delivers the promised conciseness and accessibility, the work could meaningfully reduce repetitive scripting effort, improve collaboration between domain experts and data scientists, and enhance maintainability of industrial data pipelines. Such a contribution would be relevant to the intersection of programming languages and applied CPS engineering.
Major comments (2)
- The central claim that data preparation 'can be expressed in just a few lines of code' and that the DSL 'raises the level of abstraction' is load-bearing for the usability argument, yet the abstract (and by extension the manuscript summary) provides no syntax examples, supported operations, or concrete code fragments to illustrate this.
- The generality assertion that CPSLint is 'applicable for any case involving time-series data collections in need of sanitisation' rests on the premise that 'many raw data collections in the industrial CPS domain require similar actions.' No coverage analysis, set of supported versus unsupported operations, or multiple independent case studies is referenced to substantiate or bound this claim.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating where revisions will be made to strengthen the presentation.
Point-by-point responses
- Referee: The central claim that data preparation 'can be expressed in just a few lines of code' and that the DSL 'raises the level of abstraction' is load-bearing for the usability argument, yet the abstract (and by extension the manuscript summary) provides no syntax examples, supported operations, or concrete code fragments to illustrate this.
  Authors: We agree that the abstract would benefit from a concrete illustration to support the usability claims. While the full manuscript provides the DSL syntax definition, grammar, and multiple usage examples in Sections 3 (Language Design) and 4 (Implementation and Usage), we will revise the abstract to include a short, self-contained code fragment demonstrating a typical sanitization workflow expressed in a few lines.
  Revision: yes.
- Referee: The generality assertion that CPSLint is 'applicable for any case involving time-series data collections in need of sanitisation' rests on the premise that 'many raw data collections in the industrial CPS domain require similar actions.' No coverage analysis, set of supported versus unsupported operations, or multiple independent case studies is referenced to substantiate or bound this claim.
  Authors: The DSL was designed around recurrent sanitization patterns (e.g., outlier detection, missing-value handling, timestamp alignment, and normalization) that we observed across multiple industrial CPS datasets during development. To make this explicit, we will add a new subsection (or table) that lists all supported operations, their parameters, and the domain rationale for inclusion, thereby providing the requested coverage analysis. The current evaluation uses one representative industrial case study; we do not have additional independent case studies available for inclusion.
  Revision: partial.
- Not addressed in the revision: providing additional independent case studies to further substantiate generality, as this would require access to further proprietary industrial datasets beyond those currently available.
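The authors' second response names four recurring sanitization patterns: outlier detection, missing-value handling, timestamp alignment, and normalization. A minimal Python sketch can illustrate what each one does to a logged time series. It is not CPSLint's implementation or syntax. The function names and all strategy choices are illustrative assumptions:

- tolerance-based snapping to a regular grid,
- linear interpolation for gaps,
- a median-absolute-deviation (modified z-score) outlier rule with threshold 3.5,
- min-max scaling.

```python
import statistics


def align(samples, start, period, count, tol):
    """Timestamp alignment (assumed strategy): for each grid point
    start + k*period, take the first sample whose timestamp lies within `tol`
    of it; grid points with no such sample become None (a missing value)."""
    ordered = sorted(samples)
    aligned = []
    for k in range(count):
        t = start + k * period
        near = [v for ts, v in ordered if abs(ts - t) <= tol]
        aligned.append(near[0] if near else None)
    return aligned


def fill_missing(values):
    """Missing-value handling (assumed strategy): linear interpolation between
    the nearest observed neighbours; leading/trailing gaps copy the nearest
    observed value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed values to fill from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled


def outliers(values, threshold=3.5):
    """Outlier detection (assumed strategy): indices whose modified z-score
    0.6745 * (x - median) / MAD exceeds `threshold` in absolute value."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to score against
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


def normalize(values):
    """Normalization (assumed strategy): min-max scaling to [0, 1];
    a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def sanitize(samples, start, period, count, tol):
    """Chain the four steps; detected outliers are treated as missing and re-filled."""
    values = fill_missing(align(samples, start, period, count, tol))
    for i in outliers(values):
        values[i] = None
    return normalize(fill_missing(values))


# Slightly jittered timestamps, a dropped sample at t=2, and a spike at t=4.
samples = [(0.0, 10.0), (1.02, 11.0), (3.0, 13.0), (4.0, 500.0), (5.0, 15.0)]
print(sanitize(samples, start=0.0, period=1.0, count=6, tol=0.1))
# → [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```

Treating detected outliers as missing values lets a single interpolation routine serve two of the four steps. Each strategy above is one of several reasonable choices. This is why a coverage table listing supported operations and their parameters, as the authors propose, would help bound the generality claim.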
Circularity Check
No circularity: tool description paper with no derivations or self-referential claims
The paper is a descriptive implementation report on CPSLint, a DSL for time-series data sanitization in industrial CPS. It states that many datasets require similar actions and that the DSL allows expression in few lines of code, but provides no equations, predictions, fitted parameters, uniqueness theorems, or derivation chains. The generality claim rests on domain observation rather than any reduction to inputs by construction. No self-citations or ansatzes are invoked in a load-bearing way. This is a standard non-circular tool paper.
Reference graph
Works this paper leans on
[1] M. Mernik, J. Heering, A. M. Sloane, When and How to Develop Domain-Specific Languages, ACM Computing Surveys (2005). doi:10.1145/1118890.1118892
[2] U. Odyurt, O. Sayilir, M. Stoelinga, V. Zaytsev, CPSLint,
[3] doi:10.5281/zenodo.17406795
[4] U. Odyurt, Ö. Sayilir, M. Stoelinga, V. Zaytsev, CPSLint: A Domain-Specific Language Providing Data Validation and Sanitisation for Industrial Cyber-Physical Systems, 2025. doi:10.48550/arXiv.2510.18651
[5] U. Odyurt, J. Roeder, A. D. Pimentel, I. G. Alonso, C. de Laat, Power Passports for Fault Tolerance: Anomaly Detection in Industrial CPS Using Electrical EFB, in: 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems (ICPS), 2021. doi:10.1109/ICPS49255.2021.9468262
[6] P. Klint, T. van der Storm, J. Vinju, RASCAL: A Domain Specific Language for Source Code Analysis and Manipulation, in: 2009 Ninth IEEE International Working Conference on Source Code Analysis and Manipulation, 2009. doi:10.1109/SCAM.2009.28
[7] P. Klint, T. van der Storm, J. Vinju, EASY Meta-programming with Rascal, in: Generative and Transformational Techniques in Software Engineering III: International Summer School, GTTSE 2009, Braga, Portugal, July 6-11, 2009. Revised Papers, Springer, 2011. doi:10.1007/978-3-642-18023-1_6
[8] U. Odyurt, D. Sapra, A. D. Pimentel, The Choice of AI Matters: Alternative Machine Learning Approaches for CPS Anomalies, in: Advances and Trends in Artificial Intelligence. From Theory to Practice, 2021. doi:10.1007/978-3-030-79463-7_40
[9] S. Erdweg, T. van der Storm, M. Völter, L. Tratt, R. Bosman, W. R. Cook, A. Gerritsen, A. Hulshout, S. Kelly, A. Loh, G. D. P. Konat, P. J. Molina, M. Palatnik, R. Pohjonen, E. Schindler, K. Schindler, R. Solmi, V. A. Vergu, E. Visser, K. van der Vlist, G. Wachsmuth, J. van der Woning, Evaluating and comparing language workbenches: Existing results an...
[10] GNU Project, GNU datamash, 2025. URL: https://www.gnu.org/software/datamash/
[11] A. Hoff, Lisp Query Notation — A DSL for Data Processing, 2024. doi:10.5281/zenodo.11001584
[12] J. Giner-Miguelez, A. Gómez, J. Cabot, A Domain-Specific Language for Describing Machine Learning Datasets, Journal of Computer Languages (2023). doi:10.1016/j.cola.2023.101209
[13] F. Heine, C. Kleiner, T. Oelsner, A DSL for Automated Data Quality Monitoring, in: Database and Expert Systems Applications, 2020. doi:10.1007/978-3-030-59003-1_6
[14] A. de la Vega, D. García-Saiz, M. Zorrilla, P. Sánchez, Lavoisier: A DSL for Increasing the Level of Abstraction of Data Selection and Formatting in Data Mining, Journal of Computer Languages (2020). doi:10.1016/j.cola.2020.100987
[15] B. Sal, D. García-Saiz, A. de la Vega, P. Sánchez, Domain-Specific Languages for the Automated Generation of Datasets for Industry 4.0 Applications, Journal of Industrial Information Integration (2024). doi:10.1016/j.jii.2024.100657
[16] S. Ackermann, V. Jovanovic, T. Rompf, M. Odersky, Jet: An Embedded DSL for High Performance Big Data Processing, 2012. URL: https://infoscience.epfl.ch/handle/20.500.14299/85985
[17] B. Vogel-Heuser, M. Zhang, M. Krüger, A. Vicaria, M. Gardill, Y. Jiang, A. Trächtler, H. Peters, M. Liewald, A. Schenek, P. Heinzelmann, M. Weyrich, DSL4DPiFS — A Graphical Notation to Model Data Pipeline Deployment in Forming Systems, at - Automatisierungstechnik (2025). doi:10.1515/auto-2024-0114
[18] M. R. Berthold, N. Cebron, F. Dill, T. R. Gabriel, T. Kötter, T. Meinl, P. Ohl, K. Thiel, B. Wiswedel, KNIME — the Konstanz information miner: version 2.0 and beyond, SIGKDD Explorations Newsletter (2009). doi:10.1145/1656274.1656280
[19] U. Odyurt, R. Loendersloot, T. Tinga, Demonstrators for Industrial Cyber-Physical System Research: A Requirements Hierarchy Driven by Software-Intensive Design, 2026. doi:10.48550/arXiv.2510.18534