Evaluating Heuristics for Iterative Impact Analysis

Maksym Petrenko; Václav Rajlich; Yibin Wang

arxiv: 1907.08730 · v1 · pith:IHTNACQWnew · submitted 2019-07-20 · 💻 cs.SE


Pith reviewed 2026-05-24 19:14 UTC · model grok-4.3

classification 💻 cs.SE

keywords iterative impact analysis · propagation heuristics · termination heuristics · software change impact · precision and recall · reenactment simulation · open source repositories · impact analysis techniques


The pith

Propagation heuristics do not improve iterative impact analysis beyond random inspection of dependencies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Iterative impact analysis starts from one changed unit and follows program dependencies to locate other impacted units, continuing until a termination rule signals completion. The paper tests several propagation heuristics drawn from earlier work, paired with a practical termination heuristic, by simulating developer actions on real changes mined from open source repositories. The reenactment shows that the overall IIA process achieves higher recall than other impact analysis methods. However, the tested heuristics produce no gain in precision or recall compared with simply selecting units at random for inspection.
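The loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `iterative_impact_analysis`, `rank`, and `patience` are hypothetical, the toy graph is invented, and the termination rule (stop after a fixed number of consecutive irrelevant inspections) is an assumed stand-in for the paper's termination heuristic, which this summary does not specify.

```python
import random
from typing import Callable, Dict, List, Set

Graph = Dict[str, Set[str]]  # unit -> units it interacts with


def iterative_impact_analysis(
    graph: Graph,
    start: str,
    is_impacted: Callable[[str], bool],
    rank: Callable[[List[str]], List[str]],
    patience: int,
) -> Dict[str, Set[str]]:
    """Inspect neighbors of known impacted units until the termination rule fires."""
    impacted = {start}
    inspected = {start}
    misses = 0  # consecutive inspections that found no impacted unit
    while misses < patience:  # assumed termination rule, see lead-in
        frontier = sorted(
            n for u in impacted for n in graph.get(u, set()) if n not in inspected
        )
        if not frontier:
            break  # nothing left to inspect
        candidate = rank(frontier)[0]  # propagation heuristic picks the next unit
        inspected.add(candidate)
        if is_impacted(candidate):
            impacted.add(candidate)
            misses = 0
        else:
            misses += 1
    return {"impacted": impacted, "inspected": inspected}


# Toy dependency graph and ground truth, invented for illustration only.
graph = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"}}
truth = {"A", "B", "D"}
rng = random.Random(0)


def random_rank(units: List[str]) -> List[str]:
    """Random-inspection baseline: shuffle the candidates."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return shuffled


result = iterative_impact_analysis(
    graph, "A", lambda u: u in truth, random_rank, patience=2
)
```

Passing a different `rank` function swaps the random baseline for a scoring heuristic without touching the loop, which is the comparison the paper's reenactment is built around.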

Core claim

Iterative impact analysis provides better recall than the other known impact analysis techniques. However the IIA with the propagation heuristics that we investigated does not supersede IIA combined with a random inspection, and hence these heuristics do not help the IIA.

What carries the argument

Reenactment process that simulates developers applying propagation heuristics and a termination rule while traversing dependencies on historical software changes.

If this is right

IIA yields higher recall than alternative impact analysis techniques.
The investigated propagation heuristics produce no improvement in precision or recall over random unit selection.
Termination heuristics influence when the analysis stops and therefore affect estimates of completeness.
Heuristic-guided IIA does not reduce missed impacted units or unnecessary inspections relative to random inspection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

New propagation rules would need to outperform random selection to be worth implementing in developer tools.
The reenactment method itself could serve as a low-cost way to screen future heuristics before real-user trials.
Results raise the question of whether dependency traversal in impact analysis benefits more from different kinds of guidance than from the rules tested here.

Load-bearing premise

The reenactment process accurately models how developers would apply the heuristics in real development workflows when deciding which units to inspect.

What would settle it

A field study in which developers perform live IIA on actual changes, comparing the number of missed impacted units and irrelevant inspections when using the heuristics versus random selection.

Original abstract

Iterative impact analysis (IIA) is a process that allows developers to estimate the impacted units of a software change. Starting from a single impacted unit, the developers inspect its interacting units via program dependencies to identify the ones that are also impacted, and this process continues iteratively. Experience has shown that developers often miss impacted units and inspect many irrelevant units. In this work, we study propagation heuristics that guide developers to find the actual impacted units and termination heuristics that help to decide whether the estimated impact is complete. The roles of these two kinds of heuristics are complementary and affect both the precision and recall when used during IIA. We investigated several propagation heuristics adapted from previously published papers and combined them with a practical termination heuristic. We developed a reenactment process that simulates the actions of developers who use those heuristics during IIA, and we assessed their performance. The software changes for our reenactment were mined from the repositories of open source projects. We found that IIA provides better recall than the other known impact analysis techniques. However the IIA with the propagation heuristics that we investigated does not supersede IIA combined with a random inspection, and hence these heuristics do not help the IIA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's reenactment on mined changes finds that the tested propagation heuristics add nothing over random inspection for iterative impact analysis.

The main result is that IIA with those adapted propagation heuristics does not beat plain random selection on the precision and recall metrics in their simulation, even though IIA itself shows better recall than other impact analysis approaches. They mine real changes from open source repositories, adapt heuristics from prior work, combine them with a termination rule, and run a reenactment that steps through the iterative inspection process. The direct comparison to random is useful because it gives a clear baseline instead of just reporting absolute numbers. The data source and the negative outcome against random are the parts that feel grounded.

The reenactment method itself is the soft spot. It imposes a fixed ordering and selection rule that may not match how developers actually mix heuristics with their own knowledge or decide when to stop. If that modeling choice drives the result, then the claim that the heuristics do not help could be narrower than it first appears. The abstract leaves the exact random selection procedure and any statistical tests unspecified, so those details will determine how much the finding travels.

This is the sort of incremental empirical check that belongs in a software engineering maintenance venue. Readers working on impact analysis or change impact tools would find the negative result worth seeing, even if only to think about better ways to simulate or test heuristics. I would send it to peer review because the setup is concrete enough for referees to evaluate the reenactment logic and the random baseline directly.

Referee Report

3 major / 2 minor

Summary. The paper claims that iterative impact analysis (IIA) yields higher recall than other impact analysis techniques when evaluated via reenactment on changes mined from open-source repositories. However, the propagation heuristics studied (adapted from prior work and paired with a termination heuristic) do not improve precision or recall over a random-inspection baseline, leading to the conclusion that these heuristics do not help IIA.

Significance. If the reenactment faithfully models developer decision-making, the result would indicate that commonly proposed propagation heuristics add little value beyond random selection during IIA, shifting research emphasis toward termination heuristics or hybrid approaches that incorporate domain knowledge. The use of real mined changes from repositories provides a concrete empirical basis rather than synthetic examples.

major comments (3)

[Section 3] Reenactment process (Section 3): the simulation imposes a fixed ordering and selection rule based solely on the heuristic score or random choice, but provides no mechanism for developers to interleave domain knowledge or terminate early; this directly undermines the central claim that the heuristics 'do not supersede' random inspection, as the outcome may be an artifact of the rigid model rather than a property of the heuristics.
[Section 3.2] Random baseline definition (Section 3.2 and results in Section 4): the procedure for selecting units under the random condition is not fully specified (e.g., uniform over all units, over neighbors, or stratified), nor is the exact number of trials or variance reported; without this, the comparison that heuristics add no value cannot be verified and is load-bearing for the main conclusion.
[Section 4] Statistical evaluation (Section 4): no hypothesis tests, confidence intervals, or effect-size measures are reported for the precision/recall differences between heuristics and random; the claim of 'does not supersede' therefore rests on point estimates whose reliability is unknown.
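To make the second comment concrete, the sketch below shows one way a random baseline could be pinned down: uniform selection over the uninspected neighbors of units already known to be impacted, repeated over a stated number of seeded trials, with mean and standard deviation reported. All names, the inspection `budget`, and the toy data are assumptions for illustration; the paper's actual procedure is exactly what the referee says is unspecified.

```python
import random
import statistics
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def random_trial_recall(
    graph: Graph, start: str, truth: Set[str], budget: int, rng: random.Random
) -> float:
    """One random-inspection run; returns recall against the true impact set."""
    impacted, inspected = {start}, {start}
    for _ in range(budget):
        frontier = sorted(
            n for u in impacted for n in graph.get(u, set()) if n not in inspected
        )
        if not frontier:
            break
        pick = rng.choice(frontier)  # uniform over uninspected neighbors (assumed)
        inspected.add(pick)
        if pick in truth:
            impacted.add(pick)
    return len(impacted & truth) / len(truth)


def random_baseline(
    graph: Graph, start: str, truth: Set[str], budget: int, trials: int, seed: int
) -> Dict[str, float]:
    """Repeat the random run and report spread; needs trials >= 2 for stdev."""
    rng = random.Random(seed)
    recalls = [
        random_trial_recall(graph, start, truth, budget, rng) for _ in range(trials)
    ]
    return {
        "trials": trials,
        "mean": statistics.mean(recalls),
        "stdev": statistics.stdev(recalls),
    }


# Toy data, invented for illustration only.
graph = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"}}
stats = random_baseline(graph, "A", {"A", "B", "D"}, budget=1, trials=200, seed=7)
```

Reporting the seed, the trial count, and the spread alongside the mean is what would let a reader check whether a heuristic's score falls inside or outside the random baseline's range.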

minor comments (2)

[Abstract] The abstract states that 'IIA provides better recall than the other known impact analysis techniques' but the manuscript does not cite or tabulate the specific prior techniques and their reported recall values used for this comparison.
[Section 3] Notation for precision and recall is introduced without an explicit equation or pseudocode showing how true-positive impacted units are identified from the mined change data.
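For reference, the standard set-based definitions that the last minor comment asks for can be written as a short function. This is a sketch under two labeled assumptions: the actual impact set is taken to be the units modified in the mined change, and the estimated set is whatever the analysis reports or inspects. The paper may define either set differently, and the example sets are invented.

```python
from typing import Set, Tuple


def precision_recall(estimated: Set[str], actual: Set[str]) -> Tuple[float, float]:
    """Precision = |E ∩ A| / |E|, recall = |E ∩ A| / |A|; 0.0 for an empty set."""
    true_positives = len(estimated & actual)
    precision = true_positives / len(estimated) if estimated else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Invented example: three units examined, three units actually changed.
precision, recall = precision_recall({"A", "B", "C"}, {"A", "B", "D"})
```

Here two of the three examined units are truly impacted and two of the three truly impacted units are found, so both values come out to 2/3.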

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We respond to each major comment below, indicating where revisions will be made to address the concerns while defending the core evaluation approach.


Referee: [Section 3] Reenactment process (Section 3): the simulation imposes a fixed ordering and selection rule based solely on the heuristic score or random choice, but provides no mechanism for developers to interleave domain knowledge or terminate early; this directly undermines the central claim that the heuristics 'do not supersede' random inspection, as the outcome may be an artifact of the rigid model rather than a property of the heuristics.

Authors: The reenactment deliberately models a developer who follows only the heuristic (or random choice) for ordering inspections, with termination governed by the separate termination heuristic described in the paper. This isolates whether the propagation heuristics themselves provide value beyond random selection. Real developers may incorporate domain knowledge, but that is outside the scope of testing the heuristics in isolation. We will revise Section 3 to explicitly state this modeling rationale and its limitations.
revision: partial

Referee: [Section 3.2] Random baseline definition (Section 3.2 and results in Section 4): the procedure for selecting units under the random condition is not fully specified (e.g., uniform over all units, over neighbors, or stratified), nor is the exact number of trials or variance reported; without this, the comparison that heuristics add no value cannot be verified and is load-bearing for the main conclusion.

Authors: We will revise Section 3.2 to fully specify the random selection procedure (including the population from which units are drawn and whether it is uniform), report the exact number of trials performed per change, and include variance information for the random baseline results.
revision: yes

Referee: [Section 4] Statistical evaluation (Section 4): no hypothesis tests, confidence intervals, or effect-size measures are reported for the precision/recall differences between heuristics and random; the claim of 'does not supersede' therefore rests on point estimates whose reliability is unknown.

Authors: We agree that the lack of statistical analysis weakens the presentation of results. In the revision we will add hypothesis tests (such as non-parametric tests suitable for the data), confidence intervals, and effect-size measures for the precision and recall comparisons in Section 4.
revision: yes
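As one example of a non-parametric test that would fit this setting, the sketch below runs a paired sign-flip permutation test on per-change recall values for a heuristic and for random inspection. It is only an illustration of the kind of analysis promised: the rebuttal does not name a specific test, the function name and parameters are hypothetical, and the recall values are invented.

```python
import random
from typing import Sequence


def paired_permutation_test(
    a: Sequence[float], b: Sequence[float], rounds: int = 10000, seed: int = 0
) -> float:
    """Two-sided Monte Carlo p-value for the mean paired difference a - b."""
    if len(a) != len(b):
        raise ValueError("a and b must be paired, one value per change")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(rounds):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (rounds + 1)


# Invented per-change recall values, for illustration only.
heuristic_recall = [0.60, 0.75, 0.50, 0.80, 0.66, 0.70]
random_recall = [0.62, 0.70, 0.55, 0.78, 0.66, 0.71]
p_value = paired_permutation_test(heuristic_recall, random_recall)
```

A large p-value here would be consistent with the paper's claim that the heuristic does not supersede random inspection, though a confidence interval on the difference would still be needed to argue that any real effect is small.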

Circularity Check

0 steps flagged

No circularity: empirical comparison to external random baseline


The paper reports an empirical reenactment study that mines changes from open-source repositories and simulates developer inspection under propagation heuristics versus random ordering, measuring precision and recall. The central claim (heuristics do not supersede random inspection) is a direct observational comparison to an external baseline rather than any derivation, fitted parameter, or self-referential definition. No equations, ansatzes, or uniqueness theorems are invoked that reduce to the paper's own inputs or prior self-citations in a load-bearing way. The evaluation is therefore self-contained against the mined data and random control.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The evaluation rests on standard assumptions in empirical software engineering about the validity of reenactment as a proxy for developer behavior and the representativeness of open source projects; no free parameters, invented entities, or non-standard axioms are described in the abstract.

