Data Profiling for Change Rules

Fei Chiang; Nishttha Sharma

arxiv: 2606.07860 · v1 · pith:KLDKREXPnew · submitted 2026-06-05 · 💻 cs.DB


Pith reviewed 2026-06-27 19:54 UTC · model grok-4.3

classification 💻 cs.DB

keywords: change rules, data profiling, declarative dependencies, sequential changes, trend analysis, causal relationships, CR-Miner, data quality rules


The pith

Change Rules quantify sequential changes among ordered tuples in both conditions and outcomes to model trends and causal relationships.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Change Rules that measure sequential changes across ordered records in both the triggering attributes and the resulting attributes. This goes beyond existing database constraints that work on unordered data or narrow sets of attributes and cannot easily capture the context of changes. CR-Miner discovers the rules by building candidate change intervals level by level. If the approach holds, database systems gain direct support for analyzing how and why data evolves over time. Experiments indicate the miner runs 40-50 percent faster on average than prior methods.

Core claim

We introduce Change Rules (CRs) that quantify the sequential changes among ordered tuples in both the antecedent and consequent attributes. CRs aim to address the limitations of existing declarative dependencies to support trend analysis and causal relationships that trigger change among attributes. We propose CR-Miner, an automated algorithm for CR discovery that generates candidate change intervals in a level-wise manner. Experimental results show that CR-Miner achieves an average runtime improvement of 40-50% over existing baselines.
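To make the core claim concrete, here is a minimal sketch of what "quantifying sequential changes among ordered tuples" could look like. It computes changes between consecutive tuples and measures how often an antecedent change interval co-occurs with a consequent change interval. The closed-interval rule representation, the function names, the support and confidence measures, and the toy attributes `temp` and `load` are all illustrative assumptions, not the paper's definitions.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # closed interval [low, high] on a change value


def sequential_changes(rows: List[Dict[str, float]], attr: str) -> List[float]:
    """Differences between consecutive tuples; rows are assumed already ordered."""
    return [b[attr] - a[attr] for a, b in zip(rows, rows[1:])]


def in_interval(x: float, iv: Interval) -> bool:
    return iv[0] <= x <= iv[1]


def rule_stats(rows: List[Dict[str, float]],
               antecedent: Dict[str, Interval],
               consequent: Dict[str, Interval]) -> Tuple[float, float]:
    """Return (support, confidence) of a hypothetical rule antecedent -> consequent."""
    attrs = list(antecedent) + list(consequent)
    deltas = {a: sequential_changes(rows, a) for a in attrs}
    n = len(rows) - 1  # number of consecutive tuple pairs
    lhs = [all(in_interval(deltas[a][i], iv) for a, iv in antecedent.items())
           for i in range(n)]
    both = [lhs[i] and all(in_interval(deltas[a][i], iv) for a, iv in consequent.items())
            for i in range(n)]
    support = sum(both) / n if n else 0.0
    confidence = sum(both) / sum(lhs) if sum(lhs) else 0.0
    return support, confidence


# Toy ordered relation (hypothetical values, not from the paper's datasets).
rows = [
    {"temp": 10.0, "load": 100.0},
    {"temp": 12.0, "load": 110.0},
    {"temp": 15.0, "load": 125.0},
    {"temp": 15.5, "load": 124.0},
    {"temp": 18.0, "load": 140.0},
]
support, confidence = rule_stats(rows, {"temp": (2.0, 3.0)}, {"load": (10.0, 16.0)})
```

Note that both measures count co-occurrence only; nothing in this sketch establishes that the antecedent change causes the consequent change.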

What carries the argument

Change Rules (CRs) that quantify sequential changes among ordered tuples in both antecedent and consequent attributes, discovered by the CR-Miner algorithm through level-wise generation of candidate change intervals.
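The summary states only that candidates are generated level by level. As a rough illustration of that idea, the sketch below uses an Apriori-style search: it starts from single interval predicates, joins sets that differ in one predicate, and prunes by a minimum support count. The join condition, the support-based pruning, and every name here are assumptions and may differ from how CR-Miner actually works.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Predicate = Tuple[str, float, float]  # (attribute, low, high) on the change value


def matches(deltas: Dict[str, List[float]], preds: FrozenSet[Predicate]) -> Set[int]:
    """Positions whose changes satisfy every predicate in the set."""
    n = len(next(iter(deltas.values())))
    return {i for i in range(n)
            if all(lo <= deltas[a][i] <= hi for a, lo, hi in preds)}


def level_wise(deltas: Dict[str, List[float]],
               base: List[Predicate],
               min_support: int) -> List[FrozenSet[Predicate]]:
    """Grow predicate sets one level at a time, keeping only those with enough support."""
    level = [frozenset([p]) for p in base
             if len(matches(deltas, frozenset([p]))) >= min_support]
    frequent = list(level)
    while level:
        next_level = set()
        for x, y in combinations(level, 2):
            cand = x | y
            # Join sets that differ in one predicate; allow one interval per attribute.
            one_larger = len(cand) == len(x) + 1
            distinct_attrs = len({a for a, _, _ in cand}) == len(cand)
            if one_larger and distinct_attrs and len(matches(deltas, cand)) >= min_support:
                next_level.add(cand)
        level = list(next_level)
        frequent.extend(level)
    return frequent


# Toy change sequences (hypothetical values, not from the paper's datasets).
deltas = {"temp": [2.0, 3.0, 0.5, 2.5], "load": [10.0, 15.0, -1.0, 16.0]}
base = [("temp", 2.0, 3.0), ("load", 10.0, 16.0), ("load", -5.0, 0.0)]
result = level_wise(deltas, base, min_support=2)
```

Because support can only shrink as predicates are added, a set that fails the threshold at one level never needs to be extended at the next, which is what keeps this kind of search tractable.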

If this is right

CRs model the context under which attribute changes occur.
CRs support trend analysis and causal relationships that trigger change among attributes.
CR-Miner discovers such rules with an average 40-50% runtime improvement over baselines.
Database systems obtain improved change management through rules that handle ordered sequential changes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

CRs could be applied to time-stamped logs in monitoring systems to flag deviations from expected change sequences.
Combining CRs with statistical trend models might allow prediction of future attribute shifts once the rules are learned.
The level-wise mining approach may extend naturally to multi-attribute change patterns if candidate generation is adapted for joint intervals.

Load-bearing premise

The data consists of ordered tuples for which sequential change intervals can be meaningfully defined and that level-wise candidate generation will remain efficient without excessive pruning or false positives.

What would settle it

Running CR-Miner on an ordered dataset with independently verified ground-truth change intervals and measuring whether the output rules match the known sequential patterns without high false-positive rates.
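A check of this kind could be scored roughly as follows. This is a minimal sketch that assumes rules can be represented as hashable values and compared by exact match; a real evaluation would probably need a tolerance on interval endpoints. The function name and the rule labels are hypothetical.

```python
from typing import Hashable, Set, Tuple


def score(discovered: Set[Hashable], ground_truth: Set[Hashable]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 of discovered rules against known rules (exact match)."""
    true_positives = len(discovered & ground_truth)
    precision = true_positives / len(discovered) if discovered else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical rule identifiers standing in for full rule definitions.
truth = {"r1", "r2", "r3", "r4"}
found = {"r1", "r2", "r5"}
precision, recall, f1 = score(found, truth)
false_discovery = 1.0 - precision  # share of reported rules that are not in the ground truth
```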

Figures

Figures reproduced from arXiv: 2606.07860 by Fei Chiang, Nishttha Sharma.

**Figure 1.** Framework overview of CR-Miner. For each attribute A_j ∈ R, we define the set of sorted changes as S_{A_j}. Candidate intervals are generated from S_{A_j} by computing a bitset for each interval, where a bit is set to 1 if the corresponding change lies in the given interval and 0 otherwise. If the longest consecutive sequence of 1's for an interval gap g satisfies the minimum segment coverage θ_c, g is retained as …
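The bitset test described in the Figure 1 caption can be sketched as below. The caption is truncated, so two points are assumptions: bits are indexed by position in the tuple order, and coverage is measured as the longest run of 1's divided by the total number of changes. The function names and the toy values are hypothetical.

```python
from typing import List


def interval_bitset(changes: List[float], low: float, high: float) -> List[int]:
    """One bit per change, in tuple order: 1 if the change lies in [low, high]."""
    return [1 if low <= c <= high else 0 for c in changes]


def longest_run(bits: List[int]) -> int:
    """Length of the longest consecutive sequence of 1's."""
    best = cur = 0
    for b in bits:
        cur = cur + 1 if b else 0
        best = max(best, cur)
    return best


def retained(changes: List[float], low: float, high: float, theta_c: float) -> bool:
    """Assumed test: the longest run must cover at least theta_c of all changes."""
    if not changes:
        return False
    bits = interval_bitset(changes, low, high)
    return longest_run(bits) / len(changes) >= theta_c


changes = [2.0, 3.0, 0.5, 2.5, 2.2, 2.8]  # toy change values, not from the paper
bits = interval_bitset(changes, 2.0, 3.0)
```

With these toy values the interval [2.0, 3.0] covers five of six changes, but the longest unbroken run is three, so the interval passes at θ_c = 0.5 and fails at θ_c = 0.6.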

**Figure 2.** Computing context-aware changes, where η is a fixed threshold and ν controls sensitivity to contextual deviation. The model learns a mapping f : z⃗_k → p_k, where p_k represents the predicted contextual behaviour of the target change. From the trained model, we extract attribute importance weights {w_1, w_2, …, w_N}, where w_j reflects the contribution of attribute A_j to explain the changes in A_t. We extra…

**Figure 3.** Generating candidate intervals and their bitsets.

**Figure 4.** Comparative runtime evaluation. CR-Miner extends the FastDD framework to ordered sequential data, whereas DDs are defined over all tuple pairs in the relation. Datasets. We use three real-world datasets covering healthcare, employment, and weather domains. Our data and source code are publicly available [17]. (1) MIMIC-III [8]: describes the healthcare information of patients admitted to the emergency depar…

**Figure 5.** Comparative number of discovered rules. … outperforms FastDD but exhibits a sharper increase at larger sizes, eventually exceeding FastDD at larger data sizes. This suggests that runtime is also influenced by attribute characteristics and interval generation complexity. Exp-2: Runtime vs. #attributes.

**Figure 6.** Varying support threshold. Panel titles: (a) F1 Score for changing W; (b) Beta Runtime for changing W; (c) F1 Score for changing η and ν.

**Figure 7.** CR-Miner Performance. … thresholds enable more aggressive pruning of the search space, as fewer CR candidates satisfy the stricter criteria. Consequently, Figure 6b shows the number of discovered CRs for varying support. As the support threshold increases, the number of discovered CRs decreases, as expected. Exp-6: Varying window size W. Figures 7 (a)-(b) show the impact of varying the window size across dif…

**Figure 8.** Runtime for each module of the framework.

Original abstract

Understanding data change is critical towards understanding trends, normal vs. abnormal behaviours, recognizing patterns, and the causes of change. Existing database systems have limited support for change management, relying on statistics, triggers, and constraints. Data quality rules model sequential changes along a restricted set of attributes, quantify change among unordered tuples, and have limited ability to model the context under which attribute changes occur. In this paper, we introduce Change Rules (CRs) that quantify the sequential changes among ordered tuples in both the antecedent and consequent attributes. CRs aim to address the limitations of existing declarative dependencies to support trend analysis and causal relationships that trigger change among attributes. We propose CR-Miner, an automated algorithm for CR discovery that generates candidate change intervals in a level-wise manner. Experimental results show that CR-Miner achieves an average runtime improvement of 40-50% over existing baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper defines Change Rules for ordered tuples with changes on both sides plus a level-wise miner, but the causal claim has no supporting mechanism.


The punchline is that this paper defines a new kind of data quality rule called Change Rules for ordered tuples with changes on both sides, and supplies a level-wise miner for them, but the causal relationship angle is not supported by the presented method.

CRs are new in requiring order and modeling antecedent and consequent changes together. The CR-Miner algorithm uses level-wise generation of change intervals, which is a concrete algorithmic step beyond prior work on dependencies.

The paper does a reasonable job laying out the motivation from limitations of existing rules and giving an automated discovery procedure that claims better runtime.

The soft spots are around the causal claim. The abstract says CRs support causal relationships that trigger change, but the mining is co-occurrence based without any causal semantics or identification strategy. That part reads as aspirational rather than delivered. Experiments are cited for runtime but without dataset or baseline details in the abstract, so the gains are hard to evaluate from what's here. The core assumption that data has natural ordering for interval changes could be a practical limit.

This is for database researchers focused on data quality and profiling. Someone looking to extend rule types might get use from the definitions and algorithm. It deserves a serious referee because the new rule type is clearly defined and the algorithm is specified.

Recommendation: Send to peer review, with attention to clarifying what the rules actually provide versus the broader claims.

Referee Report

2 major / 1 minor

Summary. The paper introduces Change Rules (CRs) as declarative dependencies that quantify sequential changes among ordered tuples in both antecedent and consequent attributes, proposes the CR-Miner algorithm that performs level-wise generation of change intervals, and reports that CR-Miner achieves 40-50% average runtime improvement over baselines in experiments. The central motivation is that existing dependencies have limited support for trend analysis and causal relationships triggering change.

Significance. A sound formalization of sequential change rules with an efficient miner could extend data-profiling techniques for ordered data. The reported runtime gains, if substantiated with full experimental details, would be a practical contribution; however, the absence of any causal-identification machinery means the 'causal relationships' framing does not add distinguishing value beyond standard co-occurrence mining.

major comments (2)

[Abstract] Abstract: the claim that CRs 'support ... causal relationships that trigger change among attributes' is not supported by the described semantics. The approach is characterized as level-wise candidate generation on change intervals with co-occurrence quantification, which is standard association-rule style discovery and supplies no mechanism (interventions, temporal precedence with exclusion of alternatives, or confounders) that would justify a causal interpretation.
The weakest assumption listed in the reader note (ordered tuples admit meaningful sequential change intervals and level-wise generation remains efficient) is load-bearing for the algorithm's claimed practicality, yet no analysis of pruning effectiveness, false-positive rates, or scalability on non-ideal data is referenced.

minor comments (1)

[Abstract] The abstract supplies no information on datasets, baselines, statistical significance, or exclusion criteria for the reported 40-50% runtime improvement; these details are required to evaluate the experimental claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our paper. We address the major comments point by point below.


Referee: [Abstract] Abstract: the claim that CRs 'support ... causal relationships that trigger change among attributes' is not supported by the described semantics. The approach is characterized as level-wise candidate generation on change intervals with co-occurrence quantification, which is standard association-rule style discovery and supplies no mechanism (interventions, temporal precedence with exclusion of alternatives, or confounders) that would justify a causal interpretation.

Authors: We agree that the CR semantics provide co-occurrence based discovery without causal mechanisms like interventions or confounder adjustment. The abstract's reference to causal relationships was an overstatement intended to convey the utility for identifying change triggers in trends. We will revise the abstract to remove this claim and focus solely on trend analysis and sequential change quantification. revision: yes

Referee: [—] The weakest assumption listed in the reader note (ordered tuples admit meaningful sequential change intervals and level-wise generation remains efficient) is load-bearing for the algorithm's claimed practicality, yet no analysis of pruning effectiveness, false-positive rates, or scalability on non-ideal data is referenced.

Authors: The assumption regarding ordered tuples and efficient level-wise generation is indeed central to CR-Miner. Our experiments demonstrate runtime gains on the evaluated datasets, supporting practicality under the stated conditions. We did not provide separate analysis of pruning or false positives on non-ideal data. We will include additional discussion on these assumptions and their validity in the revised version. revision: partial

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The paper introduces Change Rules as a new declarative dependency and presents CR-Miner as a level-wise candidate generation algorithm for mining them from ordered tuples. No equations, fitted parameters, predictions, or self-citations are described that would reduce any claimed result (such as runtime improvements) to a quantity defined by the method itself. Experimental runtime claims are benchmarked against external baselines rather than internal fits. The work is self-contained as an algorithmic proposal without load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper introduces Change Rules as a new declarative construct and relies on the assumption that input data is ordered; no external benchmarks or machine-checked proofs are mentioned in the abstract.

axioms (1)

Domain assumption: Input relations consist of ordered tuples on which sequential change intervals can be defined.
Stated implicitly in the motivation for CRs and the design of CR-Miner.

invented entities (1)

Change Rule (CR): no independent evidence
Purpose: Declarative specification of sequential changes linking antecedent and consequent attributes on ordered tuples.
Newly defined construct whose validity is evaluated by the discovery algorithm.



Reference graph

Works this paper leans on

20 extracted references · 1 canonical work pages

[1] Bleifuß, T., Bornemann, L., Johnson, T., Kalashnikov, D.V., Naumann, F., Srivastava, D.: Exploring change: A new dimension of data analytics. Proceedings of the VLDB Endowment 12(2), 85–98 (2018)

[2] Bornemann, L., Bleifuß, T., Kalashnikov, D., Naumann, F., Srivastava, D.: Data change exploration using time series clustering. Datenbank-Spektrum 18, 79–87 (2018)

[3] Celkan, T.T.: What does a hemogram say to us? Turkish Archives of Pediatrics/Türk Pediatri Arşivi 55(2), 103 (2020)

[4] Chawathe, S.S., Garcia-Molina, H.: Meaningful change detection in structured data. ACM SIGMOD Record 26(2), 26–37 (1997)

[5] Environment and Climate Change Canada: Daily climate observations (csv) dataset. Government of Canada, Meteorological Service of Canada Open Data (n.d.), https://dd.weather.gc.ca/today/climate/observations/daily/csv/, accessed: 2026-04-10

[6] Golab, L., Karloff, H., Korn, F., Saha, A., Srivastava, D.: Sequential dependencies. Proceedings of the VLDB Endowment 2(1), 574–585 (2009)

[7] Heckert, N.A., Filliben, J.J., Croarkin, C.M., Hembree, B., Guthrie, W.F., Tobias, P., Prinz, J.: Handbook 151: NIST/SEMATECH e-Handbook of Statistical Methods (2002)

[8] Johnson, A.E., Pollard, T.J., Shen, L., Lehman, L.w.H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Anthony Celi, L., Mark, R.G.: MIMIC-III, a freely accessible critical care database. Scientific Data 3(1), 1–9 (2016)

[9] Kanza, Y., Malik, R., Srivastava, D., Stone, C., Woodhull, G.: Data quality in data streams by modular change point detection. In: VLDB Workshops (2023)

[10] Kuang, S., Yang, H., Tan, Z., Ma, S.: Efficient differential dependency discovery. Proceedings of the VLDB Endowment 17(7), 1552–1564 (2024)

[11] Kwashie, S., Liu, J., Li, J., Ye, F.: Conditional differential dependencies (CDDs). In: East European Conference on Advances in Databases and Information Systems. pp. 3–17. Springer (2015)

[12] Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation forest. In: 2008 Eighth IEEE International Conference on Data Mining. pp. 413–422. IEEE (2008)

[13] Malhotra, P., Vig, L., Shroff, G., Agarwal, P., et al.: Long short term memory networks for anomaly detection in time series. In: Proceedings. vol. 89, p. 94 (2015)

[14] Qiu, P.: Statistical process control charts as a tool for analyzing big data. Big and Complex Data Analysis: Methodologies and Applications, pp. 123–138 (2017)

[15] Sakurada, M., Yairi, T.: Anomaly detection using autoencoders with nonlinear dimensionality reduction. In: Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis. pp. 4–11 (2014)

[16] Salam, A., El Hibaoui, A.: Power Consumption of Tetouan City. UCI Machine Learning Repository (2018), DOI: https://doi.org/10.24432/C5B034

[17] Sharma, N.: CR-Miner: Change rules discovery repository (2026), https://github.com/nishtthasharma/Change_Rules

[18] Song, S., Chen, L.: Differential dependencies: Reasoning and discovery. ACM Transactions on Database Systems (TODS) 36(3), 1–41 (2011)

[19] Szlichta, J., Godfrey, P., Golab, L., Kargar, M., Srivastava, D.: Effective and complete discovery of order dependencies via set-based axiomatization. arXiv preprint arXiv:1608.06169 (2016)

[20] United States Census Bureau: State NAICS detailed employment sizes dataset. US Census Bureau Data (2024), https://www.census.gov/, accessed: 2026-04-10
