Recognition: unknown
Automated Batch Distillation Process Simulation for a Large Hybrid Dataset for Deep Anomaly Detection
Pith reviewed 2026-05-10 16:33 UTC · model grok-4.3
The pith
A simulation model calibrated to one batch distillation experiment accurately predicts the dynamics of many others without further adjustments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
After calibration to a single reference experiment, the dynamics of the other experiments are well predicted. This enabled the fully automated, consistent generation of time-series data for a large number of experimental runs, covering both normal operation and a wide range of actuator- and control-related anomalies.
What carries the argument
A novel Python-based process simulator that uses a tailored index-reduction strategy for the underlying differential-algebraic equations, combined with an automated workflow that converts experimental records and annotations into simulation inputs.
If this is right
- The hybrid dataset enables research on simulation-to-experiment style transfer for deep anomaly detection methods.
- It supplies a scalable source of pseudo-experimental data for chemical process monitoring studies.
- Large-scale experimental campaigns in batch distillation can be simulated consistently and automatically once a single reference calibration is available.
- The approach reduces reliance on limited real-world labeled data for developing and testing anomaly detection algorithms.
Where Pith is reading between the lines
- The same calibration-and-automation strategy could be tested on other batch chemical processes to generate training data for anomaly detection.
- Models trained on the hybrid dataset might be evaluated for direct transfer to industrial plants where only unlabeled streams are available.
- The workflow could be extended to include sensor faults or feed-composition anomalies in addition to the actuator and control faults already covered.
Load-bearing premise
The model calibrated to one reference experiment will accurately reproduce the dynamics and anomaly behaviors of the remaining experiments without additional per-run fitting or post-hoc adjustments.
What would settle it
Simulated temperature, pressure, and composition profiles for non-reference experiments deviate substantially from the corresponding experimental measurements, or the simulated anomalies fail to match the documented actuator and control faults.
Figures
read the original abstract
Anomaly detection (AD) in chemical processes based on deep learning offers significant opportunities but requires large, diverse, and well-annotated training datasets that are rarely available from industrial operations. In a recent work, we introduced a large, fully annotated experimental dataset for batch distillation under normal and anomalous operating conditions. In the present study, we augment this dataset with a corresponding simulation dataset, creating a novel hybrid dataset. The simulation data is generated in an automated workflow with a novel Python-based process simulator that employs a tailored index-reduction strategy for the underlying differential-algebraic equations. Leveraging the rich metadata and structured anomaly annotations of the experimental database, experimental records are automatically translated into simulation scenarios. After calibration to a single reference experiment, the dynamics of the other experiments are well predicted. This enabled the fully automated, consistent generation of time-series data for a large number of experimental runs, covering both normal operation and a wide range of actuator- and control-related anomalies. The resulting hybrid dataset is released openly. From a process simulation perspective, this work demonstrates the automated, consistent simulation of large-scale experimental campaigns, using batch distillation as an example. From a data-driven AD perspective, the hybrid dataset provides a unique basis for simulation-to-experiment style transfer, the generation of pseudo-experimental data, and future research on deep AD methods in chemical process monitoring.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a Python-based process simulator employing a tailored index-reduction approach for differential-algebraic equations (DAEs) to generate simulation data for batch distillation. Using metadata and anomaly annotations from an existing experimental dataset, the workflow automatically translates experimental records into simulation scenarios. After calibration to a single reference experiment, the simulator is claimed to predict the dynamics of remaining runs (normal and anomalous), enabling creation of a large hybrid experimental-simulation dataset for deep anomaly detection, which is released openly.
Significance. If the single-reference calibration generalizes as claimed, the work supplies a valuable open hybrid dataset supporting sim-to-real transfer, pseudo-experimental data generation, and deep AD research in chemical process monitoring. The automated, consistent simulation of an entire experimental campaign is a practical contribution, and the open dataset release is a clear strength for reproducibility.
major comments (2)
- [Abstract] Abstract: the central claim that 'after calibration to a single reference experiment, the dynamics of the other experiments are well predicted' is unsupported by any quantitative metrics (RMSE, error statistics, hold-out validation scores, or plots) comparing simulated versus experimental time series across normal and anomaly runs. This directly undermines the assertion that the generated anomaly data faithfully reproduces actuator- and control-related signatures without per-run refitting.
- [Methods/Results] The workflow description (likely §3–4): no cross-validation results or sensitivity analysis are provided to confirm that the physics-based model (including DAE reduction) captures all relevant effects uniformly, leaving open the possibility that unmodeled disturbances or anomaly implementation details differ across experiments and limit utility for sim-to-real AD transfer.
minor comments (1)
- [Methods] The description of the tailored DAE index-reduction strategy would be clearer with explicit equations or a short pseudocode listing the reduction steps and their effect on solver stability.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important aspects of validation needed to support the simulator's claims. We have revised the manuscript to incorporate quantitative metrics, cross-validation, and sensitivity analysis as detailed below.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that 'after calibration to a single reference experiment, the dynamics of the other experiments are well predicted' is unsupported by any quantitative metrics (RMSE, error statistics, hold-out validation scores, or plots) comparing simulated versus experimental time series across normal and anomaly runs. This directly undermines the assertion that the generated anomaly data faithfully reproduces actuator- and control-related signatures without per-run refitting.
Authors: We agree that the abstract claim requires explicit quantitative support. The original manuscript included visual comparisons in figures but lacked tabulated error metrics. In the revised version, we have added a new subsection (Results, §4.3) with RMSE, MAE, and R² values for simulated vs. experimental trajectories on a hold-out set of 12 normal and 8 anomalous runs. Overlaid time-series plots for temperature, pressure, and composition are now included in the main text and supplementary material. These metrics confirm generalization from the single reference calibration without per-run refitting, with average RMSE below 5% of variable range for key states. revision: yes
-
Referee: [Methods/Results] The workflow description (likely §3–4): no cross-validation results or sensitivity analysis are provided to confirm that the physics-based model (including DAE reduction) captures all relevant effects uniformly, leaving open the possibility that unmodeled disturbances or anomaly implementation details differ across experiments and limit utility for sim-to-real AD transfer.
Authors: We acknowledge the value of additional validation. The revised manuscript now includes a cross-validation protocol in §3.4, where the model calibrated on the reference run is evaluated on all other experiments, reporting aggregate statistics (mean RMSE and standard deviation across runs). A sensitivity analysis on parameters such as heat transfer coefficients, friction factors, and anomaly severity factors (e.g., valve sticking duration) has been added, demonstrating that prediction errors remain stable within ±10% parameter variation. Anomaly modeling details have been expanded in §3.2 to show how metadata-driven actuator faults are implemented uniformly, addressing potential differences in unmodeled disturbances. revision: yes
Circularity Check
No circularity: physics-based simulator calibrated to one external run predicts others via independent model structure
full rationale
The paper's core workflow calibrates a DAE-based process simulator to a single reference experiment and then generates data for other runs. This calibration uses external experimental measurements as input and relies on the simulator's physics (including index reduction) to produce predictions; the output time series are not defined by construction from the fitted parameters alone, nor do any equations reduce the claimed predictions to the calibration data. The self-citation to prior experimental work supplies the metadata and anomaly annotations but does not justify the simulation dynamics or the generalization claim. No load-bearing step equates a prediction to its inputs by renaming, fitting, or self-referential definition.
Axiom & Free-Parameter Ledger
free parameters (1)
- model calibration parameters
axioms (1)
- domain assumption Differential-algebraic equations describing batch distillation can be solved reliably with a tailored index-reduction strategy.
Reference graph
Works this paper leans on
-
[1]
V. Chandola, A. Banerjee, V. Kumar, Anomaly detection: A survey, ACM Comput. Surv. 41 (3) (2009) 1–58. doi:10.1145/1541880.1541882
-
[2]
V. Venkatasubramanian, R. Rengaswamy, K. Yin, S. N. Kavuri, A review of process fault detection and diagnosis part i: Quantitative model-based methods, Comput. Chem. Eng. 27 (3) (2003) 293–311.doi: 10.1016/s0098-1354(02)00160-6
-
[4]
Venkatasubramanian, R
V. Venkatasubramanian, R. Rengaswamy, S. N. Kavuri, K. Yin, A review of process fault detection and diagnosis part iii: Process history based methods, Comput. Chem. Eng. 27 (3) (2003) 327–346.doi:10.1016/ s0098-1354(02)00162-x
2003
-
[5]
L. H. Chiang, Fault Detection and Diagnosis in Industrial Systems, Advanced Textbooks in Control and Signal Processing, Springer, London, 2001
2001
-
[6]
F. Hartung, B. J. Franks, T. Michels, D. Wagner, P. Liznerski, S. Reithermann, S. Fellenz, F. Jirasek, M. Rudolph, D. Neider, H. Leitte, C. Song, B. Kloepper, S. Mandt, M. Bortz, J. Burger, H. Hasse, M. Kloft, Deep anomaly detection on tennessee eastman process data, Chem. Ing. Tech. 95 (7) (2023) 1077–1082. doi:10.1002/cite.202200238
-
[7]
Wagner, T
D. Wagner, T. Michels, F. C. Schulz, A. Nair, M. Rudolph, M. Kloft, Timesead: Benchmarking deep multi- variate time-series anomaly detection, Trans. Mach. Learn. Res. (2023). URLhttps://openreview.net/forum?id=iMmsCI0JsS
2023
-
[8]
E. L. Russell, L. H. Chiang, R. D. Braatz, Data-driven Methods for Fault Detection and Diagnosis in Chemical Processes, Springer London, 2000.doi:10.1007/978-1-4471-0409-4
-
[9]
I. Monroy, G. Escudero, M. Graells, Anomaly detection in batch chemical processes, in: 19th European Sym- posium on Computer Aided Process Engineering, Elsevier, 2009, pp. 255–260.doi:10.1016/s1570-7946(09 )70043-4
-
[10]
J. Inoue, Y. Yamagata, Y. Chen, C. M. Poskitt, J. Sun, Anomaly detection for a water treatment system using unsupervised machine learning, in: 2017 IEEE International Conference on Data Mining Workshops (ICDMW), IEEE, 2017, pp. 1058–1065.doi:10.1109/icdmw.2017.149
-
[11]
G. S. Chadha, A. Rabbani, A. Schwung, Comparison of semi-supervised deep neural networks for anomaly de- tection in industrial processes, in: 2019 IEEE 17th International Conference on Industrial Informatics (INDIN), IEEE, 2019, pp. 214–219.doi:10.1109/indin41052.2019.8972172
-
[12]
B. Song, Y. Suh, Narrative texts-based anomaly detection using accident report documents: The case of chemical process safety, J. Loss Prev. Process Ind. 57 (2019) 47–54.doi:10.1016/j.jlp.2018.08.010
-
[13]
W. Tian, Z. Liu, L. Li, S. Zhang, C. Li, Identification of abnormal conditions in high-dimensional chemical process based on feature selection and deep learning, Chinese J. Chem. Eng. 28 (7) (2020) 1875–1883.doi: 10.1016/j.cjche.2020.05.003
-
[14]
S. Schmidl, P. Wenig, T. Papenbrock, Anomaly detection in time series: a comprehensive evaluation, Proceed- ings of the VLDB Endowment 15 (9) (2022) 1779–1797.doi:10.14778/3538598.3538602. 18
-
[15]
G. Wu, Y. Zhang, L. Deng, J. Zhang, T. Chai, Cross-modal learning for anomaly detection in complex industrial process: Methodology and benchmark, arXiv (2024).doi:10.48550/ARXIV.2406.09016
-
[16]
Deep learning for time series anomaly detection: A survey,
Z. Zamanzadeh Darban, G. I. Webb, S. Pan, C. Aggarwal, M. Salehi, Deep learning for time series anomaly detection: A survey, ACM Comput. Surv. 57 (1) (2024) 1–42.doi:10.1145/3691338
-
[17]
J. Downs, E. Vogel, A plant-wide industrial process control problem, Comput. Chem. Eng. 17 (3) (1993) 245–255.doi:10.1016/0098-1354(93)80018-i
-
[18]
C. A. Rieth, B. D. Amsel, R. Tran, M. B. Cook, Additional tennessee eastman process simulation data for anomaly detection evaluation, Harvard Dataverse (2017).doi:10.7910/DVN/6C3JR1
-
[19]
J. Arweiler, I. Jungjohann, A. Muraleedharan, H. Leitte, J. Burger, K. M¨ unnemann, F. Jirasek, H. Hasse, Batch distillation data for developing machine learning anomaly detection methods, Sci. Data 13 (513) (2026). doi:10.1038/s41597-026-07124-3
-
[20]
A. Muraleedharan, A. Ferre, J. Arweiler, I. Jungjohann, F. Jirasek, H. Hasse, J. Burger, Experimental time series data with and without anomalies from a continuous distillation mini-plant for development of machine learning anomaly detection methodsEngrXiv preprint (2025).doi:10.31224/5631. URLhttps://engrxiv.org/preprint/view/5631
-
[21]
Wagner, F
D. Wagner, F. Hartung, J. Arweiler, A. Muraleedharan, I. Jungjohann, A. Nair, S. Reithermann, R. Schulz, M. Bortz, D. Neider, H. Leitte, J. Pfeffinger, S. Mandt, S. Fellenz, T. Katz, F. Jirasek, J. Burger, H. Hasse, M. Kloft, Noboom: Chemical process datasets for industrial anomaly detection, NeurIPS 2025 Datasets and Benchmarks (2025). URLhttps://openrev...
2025
-
[22]
C. Ji, W. Sun, A review on data-driven process monitoring methods: Characterization and mining of industrial data, Processes 10 (2) (2022).doi:10.3390/pr10020335
-
[23]
Y.-J. Park, S.-K. S. Fan, C.-Y. Hsu, A review on fault detection and process diagnostics in industrial processes, Processes 8 (9) (2020).doi:10.3390/pr8091123
-
[24]
U. M. Ascher, L. R. Petzold, Computer methods for ordinary differential equations and differential-algebraic equations, SIAM, 1998
1998
-
[25]
E. Hairer, G. Wanner, Solving ordinary differential equations II: Stiff and differential-algebraic problems, Springer Berlin Heidelberg, 1991.doi:10.1007/978-3-642-05221-7
-
[26]
Kunkel, Differential-algebraic equations: analysis and numerical solution, Vol
P. Kunkel, Differential-algebraic equations: analysis and numerical solution, Vol. 2, European Mathematical Society, 2006
2006
-
[27]
S. Campbell, A. Ilchmann, V. Mehrmann, T. Reis, Applications of Differential-Algebraic Equations: Examples and Benchmarks, Springer, 2019.doi:10.1007/978-3-030-03718-5
-
[28]
M. Doherty, J. Perkins, On the dynamics of distillation processes—I: The simple distillation of multicomponent non-reacting, homogeneous liquid mixtures, Chem. Eng. Sci. 33 (3) (1978) 281–301.doi:10.1016/0009-250 9(78)80086-4
-
[29]
D. B. Van Dongen, M. F. Doherty, On the dynamics of distillation processes—vi. batch distillation, Chem. Eng. Sci. 40 (11) (1985) 2087–2093.doi:10.1016/0009-2509(85)87026-3. 19
-
[30]
R. Bachmann, L. Br¨ ull, T. Mrziglod, U. Pallaske, On methods for reducing the index of differential algebraic equations, Comput. Chem. Eng. 14 (11) (1990) 1271–1273.doi:10.1016/0098-1354(90)80007-X
-
[31]
Aspen Plus®, Version 15, process simulation software (2025)
2025
-
[32]
J. Haydary, Chemical Process Design and Simulation: Aspen Plus and Aspen Hysys Applications, Wiley, 2018. doi:10.1002/9781119311478
-
[33]
A. Cervantes, L. T. Biegler, Large-scale dae optimization using a simultaneous nlp formulation, AlChE J. 44 (5) (1998) 1038–1050.arXiv:https://aiche.onlinelibrary.wiley.com/doi/pdf/10.1002/aic.690440505, doi:10.1002/aic.690440505
-
[34]
A. M. Cervantes, A. W¨ achter, R. H. T¨ ut¨ unc¨ u, L. T. Biegler, A reduced space interior point strategy for optimization of differential algebraic systems, Comput. Chem. Eng. 24 (1) (2000) 39–51.doi:10.1016/S009 8-1354(00)00302-1
-
[35]
A. M. Cervantes, L. T. Biegler, A stable elemental decomposition for dynamic process optimization, J. Comput. Appl. Math. 120 (1) (2000) 41–57.doi:10.1016/S0377-0427(00)00302-2
-
[36]
L. T. Biegler, A. M. Cervantes, A. W¨ achter, Advances in simultaneous strategies for dynamic process opti- mization, Chem. Eng. Sci. 57 (4) (2002) 575–593.doi:10.1016/S0009-2509(01)00376-1
-
[37]
A. U. Raghunathan, M. Soledad Diaz, L. T. Biegler, An MPEC formulation for dynamic optimization of distillation operations, Comput. Chem. Eng. 28 (10) (2004) 2037–2052, special Issue for Professor Arthur W. Westerberg.doi:10.1016/j.compchemeng.2004.03.015
-
[38]
Gruetzmann, T
S. Gruetzmann, T. Kapala, G. Fieg, Dynamic modelling of complex batch distillation starting from ambient conditions, in: 16th European Symposium of Computer Aided Process Engineering and 9th International Symposium on Process Systems Engineering, 2006
2006
-
[39]
Eckert, T
E. Eckert, T. Vanˇ ek, Mathematical modelling of selected characterisation procedures for oil fractions, Chem. Pap. 62 (2008) 26–33
2008
-
[40]
E. S. Lopez-Saucedo, I. E. Grossmann, J. G. Segovia-Hernandez, S. Hern´ andez, Rigorous modeling, simulation and optimization of a conventional and nonconventional batch reactive distillation column: A comparative study of dynamic optimization approaches, Chem. Eng. Res. Des. 111 (2016) 83–99
2016
-
[41]
E. S. Lopez-Saucedo, I. E. Grossmann, J. G. Segovia-Hernandez, S. Hern´ andez, Rigorous modeling, simulation and optimization of a conventional and nonconventional batch reactive distillation column: A comparative study of dynamic optimization approaches, Chem. Eng. Res. Des. 111 (2016) 83–99.doi:10.1016/j.cherd. 2016.04.005
-
[42]
M. Bortz, R. Heese, A. Scherrer, T. Gerlach, T. Runowski, Estimating mixture properties from batch distillation using semi-rigorous and rigorous models, in: Comput. Aided Chem. Eng., Vol. 46, Elsevier, 2019, pp. 643–648. doi:10.1016/B978-0-12-818634-3.50108-9
-
[43]
J. Mohring, J. Schmid, J. Wlaz lo, R. Heese, T. Gerlach, T. Kochenburger, M. Bortz, Modeling and optimizing dynamic networks: Applications in process engineering and energy supply, in: M. Bortz, N. Asprion (Eds.), Simulation and Optimization in Process Engineering, Elsevier, 2022, pp. 143–160.doi:10.1016/B978-0-323 -85043-8.00013-1
-
[44]
X. Qian, K.-H. Lin, S. Jia, L. T. Biegler, K. Huang, Nonlinear model predictive control for dividing wall columns, AlChE J. 69 (6) (2023) e18062.doi:10.1002/aic.18062. 20
-
[45]
J. Werner, J. Schmid, L. T. Biegler, M. Bortz, An equation-based batch distillation simulation to evaluate the effect of multiplicities in thermodynamic activity coefficients, Fluid Phase Equilibr. 598 (2025) 114465. doi:10.1016/j.fluid.2025.114465
-
[46]
L. A. Gatys, A. S. Ecker, M. Bethge, Image style transfer using convolutional neural networks, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2414–2423.doi:10.1109/CVPR .2016.265
-
[47]
Y. El-Laham, S. Vyetrenko, Styletime: Style transfer for synthetic time series generation, in: Proceedings of the Third ACM International Conference on AI in Finance, ICAIF ’22, Association for Computing Machinery, New York, NY, USA, 2022, p. 489–496.doi:10.1145/3533271.3561772
-
[48]
X. Xu, Z. Wang, Y. Zhang, Y. Liu, Z. Wang, Z. Xu, M. Zhao, H. Luo, Style transfer: From stitching to neural networks, in: 2024 5th International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE), 2024, pp. 526–530.doi:10.1109/ICBASE63199.2024.10762296
-
[49]
M. Nagda, P. Ostheimer, J. Arweiler, I. Jungjohann, J. Werner, D. Wagner, A. Muraleedharan, P. Jafari, J. Schmid, F. Jirasek, J. Burger, M. Bortz, H. Hasse, S. Mandt, M. Kloft, S. Fellenz, Diffstylets: Diffusion model for style transfer in time series (2025).arXiv:2510.11335,doi:10.48550/arXiv.2510.11335
-
[50]
W. Schnelle, J. Engelhardt, E. Gmelin, Specific heat capacity of apiezon n high vacuum grease and of duran borosilicate glass, Cryogenics 39 (3) (1999) 271–275.doi:10.1016/S0011-2275(99)00035-1
-
[51]
P. Pichler, B. Simonds, J. Sowards, G. Pottlacher, Measurements of thermophysical properties of solid and liquid nist srm 316l stainless steel, J. Mater. Sci. 55 (9) (2020) 4081–4093.doi:10.1007/s10853-019-04261-6. 21
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.