TagDebt: A Bot to Support Technical Debt Management

Daniel Feitosa; Elisa Yumi Nakagawa; João Paulo Biazotto; Paris Avgeriou

arXiv: 2605.29869 · v1 · pith:I5VOIFEY · submitted 2026-05-28 · cs.SE

Pith reviewed 2026-06-29 06:28 UTC · model grok-4.3

classification: cs.SE

keywords: technical debt · self-admitted technical debt · GitHub bot · issue labeling · technical debt management · software maintenance · bot integration

The pith

A GitHub bot can automatically label issues for self-admitted technical debt to aid management.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TagDebt, a bot that integrates into GitHub repositories and labels issues as containing self-admitted technical debt (SATD) or not. The labeling is intended to surface technical debt early so teams can address it before it drives up maintenance costs. The authors evaluated the bot through interviews with sixteen practitioners using the Technology Acceptance Model, finding that it helps organize issues and cuts down on manual tracking work. The study also notes that team size and codebase size shape whether teams choose to adopt the bot. The work positions TagDebt as an example of specialized tools that fit into existing development processes without requiring major changes.

Core claim

TagDebt is a proof-of-concept bot that integrates with GitHub to automatically label issues as SATD or non-SATD, making technical debt visible in standard issue trackers and supporting more efficient management without disrupting current workflows.

What carries the argument

The TagDebt bot, which automatically assigns SATD or non-SATD labels to GitHub issues.
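To make the mechanism concrete, here is a minimal Python sketch of the labeling step. It assumes a naive keyword matcher as a stand-in classifier; the paper's actual detection technique is not described in this review, so the marker list and the names `classify_issue` and `build_label_request` are hypothetical. The request targets GitHub's REST endpoint for adding labels to an issue; the sketch only builds the request and leaves out authentication and sending.

```python
import json

# Hypothetical SATD markers: a stand-in for TagDebt's undisclosed detection technique.
SATD_MARKERS = (
    "todo",
    "fixme",
    "hack",
    "workaround",
    "technical debt",
    "temporary fix",
    "needs refactoring",
)


def classify_issue(title: str, body: str) -> str:
    """Return 'SATD' if the issue text contains any marker, else 'non-SATD'."""
    text = f"{title} {body}".lower()
    return "SATD" if any(marker in text for marker in SATD_MARKERS) else "non-SATD"


def build_label_request(owner: str, repo: str, issue_number: int, label: str) -> tuple[str, str, str]:
    """Build (method, path, JSON body) for GitHub's 'add labels to an issue' REST call.

    Authentication and the actual HTTP call are deliberately omitted.
    """
    path = f"/repos/{owner}/{repo}/issues/{issue_number}/labels"
    return "POST", path, json.dumps({"labels": [label]})


if __name__ == "__main__":
    label = classify_issue("Temporary workaround for flaky login test", "Revisit after the auth rewrite.")
    print(label)  # SATD ("workaround" matches)
    print(build_label_request("example-org", "example-repo", 42, label))
```

Keyword matching is cheap and transparent but misses debt described without marker words, which is one reason label accuracy matters for the paper's efficiency claim.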

If this is right

Technical debt items become visible directly in GitHub issue lists.
Teams spend less time manually scanning and tagging debt-related issues.
Whether a team adopts the bot depends on contextual factors such as team size and codebase size.
The bot can serve as a starting point for adding code-level checks later.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Labeled issues could feed into automated alerts that prompt specific refactoring tasks.
Wider use might encourage teams to treat debt labels as a standard part of issue triage.
The approach could extend to other platforms if the labeling logic is made portable.

Load-bearing premise

Automatically labeling issues for self-admitted technical debt will cause practitioners to manage that debt more effectively than before.

What would settle it

A study that tracks whether teams using the labeled issues actually address or reduce technical debt items at a higher rate than teams without the labels.
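One way to make that comparison concrete is a two-proportion z-test on the share of debt items resolved in each group. The sketch below is illustrative only: the function name and counts are hypothetical, and the test treats items as independent, whereas a real study would need to account for items being clustered within teams.

```python
import math


def two_proportion_z_test(resolved_a: int, total_a: int, resolved_b: int, total_b: int) -> tuple[float, float]:
    """Compare resolution rates of group A (labeled) vs group B (unlabeled).

    Returns (z, one_sided_p) for H1: rate_A > rate_B, using the pooled-variance z statistic.
    """
    p_a = resolved_a / total_a
    p_b = resolved_b / total_b
    pooled = (resolved_a + resolved_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the z statistic is undefined")
    z = (p_a - p_b) / se
    # Upper-tail probability of the standard normal distribution.
    one_sided_p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, one_sided_p


if __name__ == "__main__":
    # Hypothetical counts, not data from the paper.
    z, p = two_proportion_z_test(30, 50, 20, 50)
    print(f"z = {z:.2f}, one-sided p = {p:.4f}")  # prints: z = 2.00, one-sided p = 0.0228
```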

Original abstract

Context: Technical debt (TD) is a widely studied metaphor that helps to explain how sub-optimal decisions can harm software maintainability over time. Although incurring TD is not intrinsically bad, tracking and managing TD are crucial to avoid its negative effects. Hence, researchers and practitioners have proposed and developed diverse approaches and tools for managing TD. However, we still lack specialized tools for technical debt management (TDM), specifically ones that can be easily integrated into existing development workflows. Objective: We present and evaluate TagDebt, a bot that can be integrated within GitHub repositories and automatically assign labels to issues (i.e., SATD or non-SATD). TagDebt helps identify TD by looking for self-admitted technical debt (SATD), leading to more efficient TDM. Methods: We carried out a Design Science Research study to design and implement TagDebt. For its evaluation, we executed a Technology Acceptance Model (TAM) study through interviews with 16 practitioners to check the bot's usefulness, ease of use, and contextual factors that might impact the bot's usage (such as team size and practitioners' roles). Results: Overall, practitioners found TagDebt useful, especially for organizing issues and reducing manual work. Furthermore, they pointed out that the bot is easy to use and its documentation is clear. The analysis also revealed that contextual factors, such as team and codebase size, impact the decision to adopt TagDebt. Finally, several improvements were suggested, such as features to check and update the source code. Conclusion: TagDebt is a proof-of-concept for the development and usage of more specialized tools for TDM. It helps to make TD visible without disrupting existing workflows and helps practitioners avoid the risks of unmanaged TD.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

TagDebt ships a working GitHub bot for SATD labeling plus a small TAM interview study, but offers no accuracy numbers or outcome measures, so the efficiency claim stays untested.

The main thing here is a new GitHub bot that auto-labels issues as SATD or non-SATD, built through a design-science process and checked with 16 practitioner interviews on usefulness and ease of use. Practitioners said it helps organize issues and cuts manual work, and they liked the documentation.

What the paper does cleanly is deliver an integrated artifact that sits inside existing GitHub flows without extra steps. The interview results also surface practical points about team size and codebase size affecting adoption, which is the kind of detail that matters for tool papers.

The soft spot is the missing link between the bot and actual technical debt management gains. Neither the abstract nor the stress-test note reports precision, recall, or any ground-truth check on the labels, and there is no before-after data on triage time or remediation rates. The TAM interviews capture perceived usefulness after a demo, but that does not test whether the labels are accurate enough or whether people act on them. Without those pieces, the central claim that TagDebt leads to more efficient TDM rests on unverified assumptions.

This is a standard tool-building paper aimed at software engineering researchers who work on maintenance tools or SATD detection. Someone looking for implementation patterns or practitioner feedback on bot integration could get value from the details. It is coherent on its own terms and ships a reproducible artifact, so it deserves a serious referee even though the evaluation is narrow. I would send it for review with a request for classifier metrics and perhaps a small usage pilot.

Referee Report

3 major / 1 minor

Summary. The paper presents TagDebt, a GitHub-integrated bot that automatically labels issues as self-admitted technical debt (SATD) or non-SATD. Developed via Design Science Research, it is evaluated through a TAM interview study with 16 practitioners assessing usefulness, ease of use, and contextual factors (e.g., team size). Results indicate practitioners view it as useful for organizing issues and reducing manual effort, with suggestions for enhancements like source-code checks; the conclusion positions it as a proof-of-concept for specialized TDM tools.

Significance. A working GitHub bot for SATD labeling could reduce workflow disruption in TD management if the labeling is reliable. The TAM study supplies practitioner perspectives on adoption barriers, which is a positive step for applied SE research. However, the lack of any reported classifier performance data or objective TDM outcome measures substantially weakens the central claim that the bot produces more efficient technical debt management.

major comments (3)

[Evaluation/Results] The TAM study measures only perceived usefulness and ease of use; no precision, recall, F1, or other accuracy metrics are reported for the SATD labeling component on any test set or ground-truth data. This directly undermines the claim that automatic labeling leads to more efficient TDM.
[Methods/Implementation] No description is given of the detection algorithm, model, or rules used to generate SATD/non-SATD labels, nor any validation that the generated labels match human judgment. Without this, the usefulness findings rest on an untested premise about label quality.
[Results] Practitioner statements about reduced manual work are not accompanied by any controlled or before/after metrics on triage time, remediation rates, or maintainability outcomes, leaving the efficiency claim unsupported.
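The accuracy figures requested in the first two comments can be computed from a sample of issues that humans have labeled by hand. A minimal sketch, assuming SATD is the positive class and that the bot's labels and the human labels are aligned issue by issue; the function name and the toy labels are hypothetical.

```python
from collections.abc import Sequence


def satd_label_metrics(predicted: Sequence[str], gold: Sequence[str], positive: str = "SATD") -> dict[str, float]:
    """Precision, recall, and F1 of the bot's labels against human (gold) labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must label the same issues")
    pairs = list(zip(predicted, gold))
    tp = sum(p == positive and g == positive for p, g in pairs)  # debt correctly flagged
    fp = sum(p == positive and g != positive for p, g in pairs)  # false alarms
    fn = sum(p != positive and g == positive for p, g in pairs)  # debt missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for five issues (hypothetical, not results from the paper).
    bot = ["SATD", "SATD", "non-SATD", "SATD", "non-SATD"]
    human = ["SATD", "non-SATD", "SATD", "SATD", "non-SATD"]
    print(satd_label_metrics(bot, human))
```

Reporting the confusion counts alongside these scores would also show whether errors lean toward missed debt (low recall) or noisy labels (low precision).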

minor comments (1)

[Abstract] The abstract states that 'several improvements were suggested' but names only one (features to check and update the source code); listing the top suggestions with participant quotes would improve clarity.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting areas where the evaluation and claims can be strengthened. We address each major comment below and will revise the manuscript accordingly to better align the claims with the study scope and evidence provided.

Referee: [Evaluation/Results] Evaluation/Results sections: The TAM study measures only perceived usefulness and ease of use; no precision, recall, F1, or other accuracy metrics are reported for the SATD labeling component on any test set or ground-truth data. This directly undermines the claim that automatic labeling leads to more efficient technical debt management.

Authors: We agree that the evaluation is limited to perceived usefulness and ease of use via the TAM study with 16 practitioners, without reporting quantitative classifier metrics such as precision or recall. The manuscript frames TagDebt as a proof-of-concept whose primary contribution is the GitHub-integrated artifact and practitioner acceptance data, with efficiency benefits described based on self-reported reductions in manual effort. We will revise the abstract, introduction, and conclusion to explicitly qualify all efficiency claims as perceived rather than objectively measured, and add a limitations section noting the absence of labeling accuracy evaluation as future work. Revision: yes.
Referee: [Methods/Implementation] Methods/Implementation: No description is given of the detection algorithm, model, or rules used to generate SATD/non-SATD labels, nor any validation that the generated labels match human judgment. Without this, the usefulness findings rest on an untested premise about label quality.

Authors: The referee correctly notes the absence of a detailed description of the SATD detection approach. While the contribution centers on the bot's integration and TAM evaluation rather than novel detection techniques, we acknowledge that readers need to understand the labeling mechanism to assess the tool. In the revised manuscript we will add a dedicated subsection under Methods describing the current implementation (including any rules or model employed) and any internal validation performed during development. Revision: yes.
Referee: [Results] Results: Practitioner statements about reduced manual work are not accompanied by any controlled or before/after metrics on triage time, remediation rates, or maintainability outcomes, leaving the efficiency claim unsupported.

Authors: The study employed a qualitative TAM interview protocol focused on perceptions; no controlled experiments or quantitative outcome metrics (e.g., triage time) were collected. We will revise the Results and Discussion sections to present the practitioner statements strictly as perceptions, remove or qualify any implication of measured efficiency gains, and explicitly list the lack of objective TDM outcome measures as a limitation of the current evaluation design. Revision: yes.

Circularity Check

0 steps flagged

No circularity: engineering artifact evaluated via external practitioner feedback

The paper describes the design and implementation of TagDebt (a GitHub bot for SATD labeling) and evaluates it solely through a TAM interview study (N=16). No equations, fitted parameters, predictions, or derivation chains exist. Claims about usefulness rest on direct interview data rather than on any self-referential construction, load-bearing self-citation, or renaming of known results. The evaluation draws on an external source of evidence (practitioner responses), so no output reduces to its inputs by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied software engineering paper that presents a tool and reports interview results; it introduces no free parameters, mathematical axioms, or new postulated entities.

pith-pipeline@v0.9.1-grok · 5870 in / 1163 out tokens · 23116 ms · 2026-06-29T06:28:51.754654+00:00
