Toward Semantically-Seeded, Graph-Propagated Impact Analysis Across Software Artifacts: A Vision

Momil Seedat

arxiv: 2606.18855 · v1 · pith:LRRK54PHnew · submitted 2026-06-17 · 💻 cs.SE

Pith reviewed 2026-06-26 20:17 UTC · model grok-4.3

classification 💻 cs.SE

keywords change impact analysis · semantic similarity · graph propagation · software traceability · heterogeneous graph · impact analysis · software artifacts

The pith

Fusing semantic similarity with graph propagation recovers software change impacts missed by either method alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Change impact analysis tools typically use either text similarity or structural dependencies in isolation, each leaving distinct blind spots. The paper proposes a training-free analyzer that constructs a heterogeneous graph of artifacts connected by typed edges from static analysis, computes a semantic prior via cosine similarity to the changed artifact, propagates impact scores multi-hop with decay, and blends the signals using a single weight lambda. A proof-of-concept on five labelled scenarios in a payment subsystem demonstrates recovery of zero-textual-overlap artifacts via propagation and non-adjacent helper functions via the semantic layer. Lambda is shown to act as an explicit precision-recall control. The same blended formulation is presented as extending to operational artifacts such as images and metrics.

Core claim

The only configuration that covers both the vocabulary-blind and the edge-blind cases is the fusion of a semantic prior and multi-hop graph propagation blended by a single weight lambda on a heterogeneous artifact graph.

What carries the argument

A heterogeneous artifact graph with typed edges, a semantic prior from cosine similarity on embeddings, multi-hop propagation with decay over a row-normalized matrix, and a tunable blend weight lambda.
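
The referee report notes that the manuscript gives no explicit equations, so the following is only one plausible way the four ingredients could fit together, not the author's stated formulation. The symbols are assumptions introduced here: $e_i$ is the embedding of artifact $i$, $c$ is the changed artifact, $A$ is a non-negative adjacency matrix whose entries encode typed edges, $\alpha \in (0,1)$ is the per-hop decay, $H$ is the hop limit, and $\lambda \in [0,1]$ is the blend weight. Seeding the propagation with the semantic prior $s$ follows the paper's title; whether $\lambda$ weights the semantic or the propagated term is a convention chosen here.

```latex
\begin{align}
s_i &= \cos(e_i, e_c) = \frac{e_i^{\top} e_c}{\lVert e_i \rVert \, \lVert e_c \rVert}
  && \text{(semantic prior)} \\
P_{ij} &= \frac{A_{ij}}{\sum_{m} A_{im}}
  && \text{(row-normalized propagation matrix)} \\
p &= \sum_{h=1}^{H} \alpha^{h} \, (P^{\top})^{h} \, s
  && \text{(multi-hop propagation with decay)} \\
f_i &= \lambda \, s_i + (1 - \lambda) \, p_i
  && \text{(blended impact score)}
\end{align}
```

Under this reading, $\lambda = 1$ reduces to a purely semantic ranking and $\lambda = 0$ to a purely propagated one. Replacing the seed $s$ with the indicator vector of $c$ would give a variant that propagates from the changed artifact alone.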

If this is right

Artifacts with no shared vocabulary are recovered through propagation paths.
Artifacts related in meaning but without a connecting edge are recovered through the semantic prior.
Analysis extends across requirements, configurations, services and tests rather than code alone.
Lambda supplies an explicit and interpretable knob for the precision-recall trade-off.
The same structure applies to non-code operational artifacts such as images, metrics and dashboards.
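
The first two consequences can be illustrated on a four-artifact toy graph. This is a minimal sketch under stated assumptions, not the paper's proof-of-concept: the embeddings, the single edge, the decay of 0.5, the threshold of 0.1, and the convention that `lam = 1` means "semantic only" are all invented for illustration.

```python
import numpy as np

def cosine_prior(emb: np.ndarray, changed: int) -> np.ndarray:
    """Cosine similarity of every artifact embedding to the changed artifact."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit[changed]

def propagate(adj: np.ndarray, seed: np.ndarray, decay: float = 0.5, hops: int = 3) -> np.ndarray:
    """Spread `seed` along outgoing edges of a row-normalized matrix, with decay per hop."""
    row_sums = adj.sum(axis=1, keepdims=True)
    P = np.divide(adj, row_sums, out=np.zeros_like(adj), where=row_sums > 0)
    total = np.zeros_like(seed)
    frontier = seed.copy()
    for hop in range(1, hops + 1):
        frontier = P.T @ frontier            # move scores one hop downstream
        total += decay ** hop * frontier     # later hops contribute less
    return total

def flagged(scores: np.ndarray, changed: int, threshold: float = 0.1) -> set:
    """Indices (other than the changed artifact) whose score reaches the threshold."""
    return {i for i, s in enumerate(scores) if i != changed and s >= threshold}

# Hypothetical artifacts: 0 = changed config value, 1 = service that reads it
# (edge, no shared vocabulary), 2 = helper with similar wording (no edge), 3 = unrelated.
emb = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.8, 0.0, 0.6],
                [0.0, 0.0, 1.0]])
adj = np.zeros((4, 4))
adj[0, 1] = 1.0                              # only edge: artifact 0 -> artifact 1
changed = 0

sem = cosine_prior(emb, changed)
prop = propagate(adj, sem)
lam = 0.5
fused = lam * sem + (1 - lam) * prop

print("semantic only:   ", flagged(sem, changed))
print("propagation only:", flagged(prop, changed))
print("fused:           ", flagged(fused, changed))
```

In this toy setup the semantic layer alone flags only the helper, propagation alone flags only the service, and the blend flags both, which mirrors the two blind spots the list describes.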

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be embedded in development environments to surface impact warnings during edits.
Historical change data from public repositories could be used to test whether lambda values generalize across projects.
Adding runtime metrics to the graph might directly link code changes to observed operational shifts.

Load-bearing premise

A complete heterogeneous graph with typed edges can be constructed across the full requirement-config-service-test chain and extended to operational artifacts using static analysis.

What would settle it

A comparison on additional systems with labelled change scenarios that measures whether the fused scores outperform pure semantic and pure propagation baselines specifically on zero-overlap and zero-edge impacts.

Original abstract

When a single software artifact changes - a requirement, a configuration value, or a function - engineers must determine what else is impacted. Existing change-impact-analysis (CIA) tooling tends to rely on one of two signals in isolation: semantic similarity recovered from text (information-retrieval traceability, code search, embeddings), or structural dependency following (call graphs, IDE "find usages", test-impact selection). Each has a characteristic blind spot. A semantically driven tool misses an impacted artifact whose text shares no vocabulary with the change; a structurally driven tool misses artifacts related in meaning but not joined by an edge, and most operate only over code rather than the Requirement-Config-Service-Test chain. We argue for a training-free and interpretable analyzer that fuses both signals over the same embeddings. We model the system as a heterogeneous artifact graph with typed edges recovered by static analysis, compute a semantic prior by cosine similarity to the changed artifact, propagate impact multi-hop with decay over a row-normalized propagation matrix, and blend the two with a single tunable weight lambda. A small but complete proof-of-concept on a payment subsystem (5 labelled change scenarios) shows the mechanism we care about: artifacts with zero textual overlap with the change are still recovered through propagation, and helper functions that propagation alone cannot reach are recovered through the semantic layer. The fusion is the only configuration that covers both blind spots, and lambda acts as an explicit precision/recall control. Drawing on four publicly documented production failures, we argue that the same formulation extends to operational artifacts (images, metrics, dashboards, data schemas) that code-only analysis cannot reach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This vision paper sketches a lambda-blended fusion of embeddings and graph propagation for change impact analysis across artifacts, with a clean POC on five scenarios, but the evidence is only illustrative and the extension past code artifacts lacks a construction method.

This paper proposes fusing semantic embeddings and graph propagation for change impact analysis across software artifacts beyond just code. The small POC on a payment subsystem with five labelled scenarios illustrates how the blend recovers both zero-text-overlap artifacts via propagation and non-connected ones via semantics.

The new part is the specific multi-hop decay propagation blended with lambda on a heterogeneous graph that includes requirements and configs. The POC does a good job showing the mechanism the authors care about, and lambda as a precision-recall knob is a straightforward idea that could be useful in practice.

It engages honestly with the limitations of single-signal approaches and draws on real production failures for motivation. The architecture is training-free and interpretable, which is a plus for adoption.

The main weakness is that the evidence stays qualitative. No metrics, baselines, or statistical tests are provided, so the assertion that fusion is the only way to cover both blind spots rests on the examples alone. The extension to images, metrics, and other operational artifacts is asserted based on the failures, but the graph construction is only described using static analysis for code. No procedure is given for building typed edges on non-code artifacts, which undercuts the heterogeneous graph claim for the full scope. This is a significant gap for the vision as presented.

The math is simple row-normalized propagation and cosine similarity, nothing that requires new derivation. The citations look appropriate and cover the relevant literature on information retrieval traceability and dependency-based CIA.

This is for colleagues working on software maintenance and traceability. A reader looking for ideas on combining signals could extract the lambda blending and the POC structure for their own experiments.

I would not push for peer review in a standard empirical track because the soundness is limited without numbers or systematic evaluation. It could fit a vision or ideas track where the architectural proposal is the main point.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a vision for a change-impact analysis (CIA) tool that fuses semantic similarity from text embeddings with structural dependency propagation on a heterogeneous artifact graph. The approach computes a semantic prior via cosine similarity, propagates impact with multi-hop decay on a row-normalized matrix, and blends them using a tunable parameter lambda. A proof-of-concept on a payment subsystem with 5 change scenarios demonstrates recovery of artifacts with no textual overlap via propagation and unreachable ones via semantics. The authors argue that this fusion addresses blind spots of pure semantic or structural methods and extends to the full Requirement-Config-Service-Test chain plus operational artifacts.

Significance. If the graph-construction premise can be realized consistently, the method would supply a training-free, interpretable CIA approach that explicitly combines complementary signals, with lambda serving as a direct precision/recall knob. The POC concretely illustrates recovery of zero-overlap and unreachable artifacts, a useful demonstration for a vision paper. The training-free property and explicit control parameter are additional strengths.

major comments (2)

[Abstract] The claim that 'the same formulation extends to operational artifacts (images, metrics, dashboards, data schemas)' is load-bearing for the stated scope, yet the manuscript supplies no procedure for recovering typed edges on these non-code artifacts while depending on static analysis, which is code-centric; this leaves the heterogeneous-graph premise unsupported beyond code.
[POC description] (payment subsystem, 5 labelled scenarios) The demonstration that 'the fusion is the only configuration that covers both blind spots' rests on qualitative illustration alone; no quantitative metrics, baselines, statistical tests, or error analysis are reported, weakening the cross-configuration claim.

minor comments (1)

The propagation matrix and lambda-blending formula would benefit from explicit equations or pseudocode to support reproducibility of the described mechanism.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on this vision paper. We address each major point below, with clarifications on scope and the illustrative nature of the POC.

Referee: [Abstract] The claim that 'the same formulation extends to operational artifacts (images, metrics, dashboards, data schemas)' is load-bearing for the stated scope, yet the manuscript supplies no procedure for recovering typed edges on these non-code artifacts while depending on static analysis, which is code-centric; this leaves the heterogeneous-graph premise unsupported beyond code.

Authors: We agree the extension claim requires care. The manuscript is a vision paper whose core technical contribution is the semantic-prior-plus-propagation fusion on a heterogeneous artifact graph constructed via static analysis for code and requirements. The operational-artifact extension is presented as an argument drawn from four documented production failures rather than a completed procedure; we do not claim a general method for typed-edge recovery on images or dashboards. In revision we will tighten the abstract and introduction to separate the realized fusion mechanism from the prospective extension, making explicit that non-code edge recovery remains future work. revision: partial

Referee: [POC description] (payment subsystem, 5 labelled scenarios) The demonstration that 'the fusion is the only configuration that covers both blind spots' rests on qualitative illustration alone; no quantitative metrics, baselines, statistical tests, or error analysis are reported, weakening the cross-configuration claim.

Authors: The POC is deliberately small and qualitative to exhibit the two complementary failure modes the fusion is designed to address (zero-overlap artifacts recovered only by propagation; unreachable helpers recovered only by the semantic prior). Because the paper is a vision piece, a full benchmark suite with statistical tests lies outside its scope. We will nevertheless add a compact summary table in the revised manuscript that tabulates, for each of the five scenarios, which artifacts are recovered under pure semantics, pure propagation, and the blended formulation, thereby making the coverage claim more explicit without overstating the evaluation. revision: yes
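
The summary table promised in this response could be produced mechanically once per-configuration scores exist. The sketch below is an editorial illustration, not the authors' tooling: the scenario name, artifact identifiers, scores, and the 0.2 threshold are placeholders invented here and are not results from the paper.

```python
from typing import Dict, List, Set

Scores = Dict[str, float]  # artifact id -> impact score

CONFIGS = ["semantic", "propagation", "fused"]

def recovered(scores: Scores, truth: Set[str], threshold: float) -> Set[str]:
    """Labelled-impacted artifacts whose score reaches the threshold."""
    return {a for a in truth if scores.get(a, 0.0) >= threshold}

def coverage_table(scenarios: List[dict], threshold: float = 0.2) -> str:
    """One row per scenario: recovered/labelled counts under each configuration."""
    lines = ["scenario | " + " | ".join(CONFIGS)]
    for sc in scenarios:
        cells = []
        for cfg in CONFIGS:
            hit = recovered(sc["scores"][cfg], sc["truth"], threshold)
            cells.append(f"{len(hit)}/{len(sc['truth'])}")
        lines.append(f"{sc['name']} | " + " | ".join(cells))
    return "\n".join(lines)

# Placeholder scenario with made-up identifiers and scores.
scenarios = [{
    "name": "S1",
    "truth": {"svc.charge", "util.backoff"},
    "scores": {
        "semantic":    {"svc.charge": 0.0,  "util.backoff": 0.8},
        "propagation": {"svc.charge": 0.5,  "util.backoff": 0.0},
        "fused":       {"svc.charge": 0.25, "util.backoff": 0.4},
    },
}]

table = coverage_table(scenarios)
print(table)
```

Reporting counts against the labelled set, rather than a bare yes or no, would also expose partial coverage within a scenario.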

Circularity Check

0 steps flagged

No circularity: architectural vision with illustrative POC, no derivations or fitted models

full rationale

The paper presents a vision for fusing semantic and structural signals in change-impact analysis via a heterogeneous graph, cosine prior, propagation, and lambda blend. No equations, parameters, or predictions are derived; the POC is explicitly described as illustrative of mechanism on code artifacts rather than a fitted result. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing. The central claim reduces to an architectural proposal whose extension to operational artifacts is stated as an argument from examples, not a reduction to prior inputs. As a forward-looking design sketch, the argument is self-contained and does not depend on results fitted to external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 3 axioms · 0 invented entities

The proposal rests on standard domain assumptions from information retrieval and graph analysis plus one explicit tunable parameter; no new entities are postulated.

free parameters (1)

lambda
Single tunable weight that blends the semantic cosine-similarity prior with the propagated scores and is described as controlling precision versus recall.
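
To see how such a knob would be examined in practice, one can sweep lambda at a fixed threshold and record precision and recall against labelled impacts. This is a minimal sketch with invented inputs: the two score vectors, the label set, the threshold, and the convention that `lam = 1` is semantic-only are assumptions made here, and the numbers say nothing about the paper's payment subsystem.

```python
import numpy as np

def sweep_lambda(sem, prop, truth, lambdas, threshold=0.2):
    """Return (lambda, precision, recall) for each blend weight."""
    rows = []
    for lam in lambdas:
        fused = lam * sem + (1 - lam) * prop
        hits = {i for i, s in enumerate(fused) if s >= threshold}
        tp = len(hits & truth)
        precision = tp / len(hits) if hits else 0.0
        recall = tp / len(truth) if truth else 0.0
        rows.append((lam, precision, recall))
    return rows

# Hypothetical scores for four artifacts; artifacts 0 and 1 are labelled impacted.
sem = np.array([0.8, 0.0, 0.35, 0.0])    # semantic prior
prop = np.array([0.0, 0.5, 0.0, 0.3])    # propagated score
truth = {0, 1}

rows = sweep_lambda(sem, prop, truth, [0.0, 0.5, 1.0])
for lam, precision, recall in rows:
    print(f"lambda={lam:.1f} precision={precision:.2f} recall={recall:.2f}")
```

With these made-up inputs each pure configuration finds one true impact plus one false positive, while the midpoint finds both true impacts and nothing else. On real data the curve need not peak in the middle, which is why the weight has to be examined per project.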

axioms (3)

domain assumption: Cosine similarity on embeddings supplies a meaningful semantic prior for impact relevance.
Invoked when computing the initial semantic scores before propagation.
domain assumption: A row-normalized propagation matrix with decay models multi-hop impact propagation.
Used to spread impact across the heterogeneous graph.
domain assumption: Typed edges recovered by static analysis accurately represent dependencies across requirement, config, service, and test artifacts.
Required to build the graph that enables propagation beyond code.
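
The second and third assumptions meet at the point where typed edges become a propagation matrix. The sketch below shows one way to do that; the artifact names, the edge types, the per-type weights, and the choice that impact flows from an edge's source to its destination are all hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical per-type weights; the source does not specify any.
EDGE_WEIGHTS = {"satisfies": 1.0, "reads_config": 1.0, "covers": 0.5}

def build_propagation_matrix(nodes, edges, weights=EDGE_WEIGHTS):
    """Turn (source, destination, type) triples into a row-normalized matrix."""
    index = {name: i for i, name in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for src, dst, etype in edges:
        A[index[src], index[dst]] += weights[etype]
    row_sums = A.sum(axis=1, keepdims=True)
    # Rows without outgoing edges stay all-zero instead of dividing by zero.
    P = np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)
    return P, index

nodes = ["REQ-12", "config.retry_limit", "svc.charge", "test_charge"]
edges = [
    ("config.retry_limit", "svc.charge", "reads_config"),
    ("svc.charge", "REQ-12", "satisfies"),
    ("svc.charge", "test_charge", "covers"),
]

P, index = build_propagation_matrix(nodes, edges)
print(P[index["svc.charge"]])
```

Row normalization makes each artifact distribute a fixed total of impact across its outgoing edges, so a heavily connected node does not dominate merely through its degree; the per-type weights decide how that total is split.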

pith-pipeline@v0.9.1-grok · 5821 in / 1598 out tokens · 30677 ms · 2026-06-26T20:17:03.882417+00:00 · methodology

Reference graph

Works this paper leans on

18 extracted references · 4 canonical work pages · 2 internal anchors

[1] S. Lehnert. A taxonomy for software change impact analysis. In Proc. IWPSE-EVOL ’11, pp. 41–50. ACM, 2011. doi:10.1145/2024445.2024454
[2] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo. Recovering traceability links between code and documentation. IEEE Trans. Software Eng., 28(10):970–983, 2002
[3] A. Marcus and J. I. Maletic. Recovering documentation-to-source-code traceability links using latent semantic indexing. In Proc. ICSE ’03, pp. 125–135. IEEE, 2003
[4] Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang, and M. Zhou. CodeBERT: A pre-trained model for programming and natural languages. In Findings of EMNLP 2020, pp. 1536–1547. ACL, 2020
[5] N. Reimers and I. Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proc. EMNLP-IJCNLP 2019, pp. 3982–3992. ACL, 2019
[6] M. Weiser. Program slicing. In Proc. ICSE ’81, pp. 439–449. IEEE, 1981
[7] T. Reps, S. Horwitz, and M. Sagiv. Precise interprocedural dataflow analysis via graph reachability. In Proc. POPL ’95, pp. 49–61. ACM, 1995. doi:10.1145/199448.199462
[8] G. Rothermel and M. J. Harrold. Analyzing regression test selection techniques. IEEE Trans. Software Eng., 22(8):529–551, 1996
[9] A. MacCormack, J. Rusnak, and C. Y. Baldwin. Exploring the structure of complex software designs: An empirical study of open source and proprietary code. Management Science, 52(7):1015–1030, 2006
[10] T. Zimmermann, P. Weißgerber, S. Diehl, and A. Zeller. Mining version histories to guide software changes. In Proc. ICSE ’04, pp. 563–572. IEEE, 2004
[11] M. Allamanis, M. Brockschmidt, and M. Khademi. Learning to represent programs with graphs. In Proc. ICLR 2018. arXiv:1711.00740
[12] D. Edge, H. Trinh, N. Cheng, J. Bradley, A. Chao, A. Mody, S. Truitt, D. Metropolitansky, R. O. Ness, and J. Larson. From local to global: A Graph RAG approach to query-focused summarization. arXiv:2404.16130, 2024
[13] L. Katz. A new status index derived from sociometric analysis. Psychometrika, 18(1):39–43, 1953
[14] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab, 1999
[15] T. H. Haveliwala. Topic-sensitive PageRank. In Proc. WWW 2002, pp. 517–526. ACM, 2002
[16] Datadog Engineering. 2023-03-08 incident: Infrastructure connectivity issue affecting multiple regions. Datadog Blog, 2023. https://www.datadoghq.com/blog/2023-03-08-multiregion-infrastructure-connectivity-issue/
[17] RevenueCat Engineering. Postmortem for Aurora Postgres migration, November 23, 2022. RevenueCat Blog, 2022. https://www.revenuecat.com/blog/engineering/postmortem-aurora-postgres-migration/
[18] L. Mierzwa. Monitoring our monitoring: how we validate our Prometheus alert rules. Cloudflare Blog, 2022. https://blog.cloudflare.com/monitoring-our-monitoring/
