Research Artifacts in Secondary Studies: A Systematic Mapping in Software Engineering

Aleksi Huotala; Miikka Kuutila; Mika Mäntylä

arxiv: 2504.12646 · v3 · submitted 2025-04-17 · 💻 cs.SE

Pith reviewed 2026-05-22 20:12 UTC · model grok-4.3

classification 💻 cs.SE

keywords secondary studies · research artifacts · systematic reviews · software engineering · reproducibility · open science · systematic mapping study

The pith

Only 31.5% of secondary studies in software engineering report research artifacts, though rates are rising over time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper conducts a systematic mapping of 537 secondary studies in software engineering published from 2013 to 2023 to assess the reporting of research artifacts. It reveals that just 31.5 percent of these studies make artifacts available. While regression analysis shows a significant increase in availability across the years, the 2023 figures stand at 62 percent providing any artifact and only 30.4 percent using a permanent repository with a DOI. The authors argue that requiring artifact publication would strengthen transparency and reproducibility in the discipline.

Core claim

Through examination of 537 secondary studies, the authors establish that research artifacts appear in only 31.5% of cases overall. Availability has improved steadily, yet in the most recent year examined only 62.0% of studies offer artifacts and 30.4% store them in permanent repositories identified by DOIs. This pattern supports the conclusion that mandatory artifact publication in secondary studies would advance openness in software engineering research.
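
The summary gives the share but not the underlying count. As a quick arithmetic check, the sketch below lists every integer count out of 537 that would round to 31.5% at one decimal place. The function name and the one-decimal rounding are assumptions made for illustration; neither comes from the paper.

```python
def implied_counts(total: int, pct: float, decimals: int = 1) -> list[int]:
    """Return every count k for which 100*k/total rounds to pct.

    Assumes the reported percentage was rounded to `decimals` places.
    """
    return [
        k for k in range(total + 1)
        if round(100 * k / total, decimals) == pct
    ]


if __name__ == "__main__":
    # 537 studies and 31.5% are the figures quoted in the summary above.
    print(implied_counts(537, 31.5))
```

Under that rounding assumption, only one count (169 studies) is consistent with the reported figure, which would leave 368 studies without a reported artifact.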

What carries the argument

A systematic mapping study that identifies, screens, and analyzes secondary studies for the presence and storage methods of research artifacts, incorporating temporal trend analysis via regression.

If this is right

Requiring artifacts would enable independent verification of secondary study findings.
Use of DOI-based repositories would reduce the risk of link rot and lost data.
Standardized artifact sharing could raise overall research quality in secondary studies.
Adoption might lead journals to update submission guidelines for systematic reviews.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Low artifact rates could similarly affect primary studies, warranting broader checks across software engineering publications.
Automated tools might scan for artifact mentions to flag non-compliant papers during review.
Comparison with other fields like medicine could reveal whether software engineering lags in open practices.
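
The automated-scan idea above could start from something as simple as the following sketch. It is a rough heuristic, not a tool described in the paper: the DOI pattern, the host list, and the function name `scan_for_artifacts` are all assumptions chosen for illustration. The DOI in the usage line is a placeholder.

```python
import re

# Loose DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')

# Hypothetical list of hosts where artifacts are commonly deposited.
REPO_HOSTS = ("zenodo.org", "figshare.com", "osf.io", "github.com", "gitlab.com")


def scan_for_artifacts(text: str) -> dict:
    """Collect DOI-like strings and known repository hosts found in text."""
    # Strip sentence punctuation that the loose pattern may have swallowed.
    dois = [d.rstrip(".,;)") for d in DOI_PATTERN.findall(text)]
    lowered = text.lower()
    hosts = sorted(h for h in REPO_HOSTS if h in lowered)
    return {
        "dois": dois,
        "hosts": hosts,
        # Flag papers with no detectable artifact pointer for manual review.
        "flag_for_review": not dois and not hosts,
    }


if __name__ == "__main__":
    sample = "Replication package: https://doi.org/10.5281/zenodo.0000000."
    print(scan_for_artifacts(sample))
```

A scan like this cannot tell an artifact DOI from the DOI of a cited paper, so it would only be useful as a first-pass filter before a human check.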

Load-bearing premise

The sample of 537 secondary studies identified is representative of the entire population of such studies in software engineering from 2013 to 2023, without major biases in search or selection.

What would settle it

Repeating the literature search and screening on an independent database or for a different time period and obtaining artifact availability rates that differ substantially from 31.5 percent overall or from the observed yearly trend.
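
One way to make "differ substantially" concrete is a two-proportion z-test between the original sample and a replication. The sketch below is a minimal version under stated assumptions. The original count of 169 out of 537 is inferred from the reported 31.5% rather than quoted from the paper, and the replication figures are purely hypothetical.

```python
import math


def two_proportion_z(k1: int, n1: int, k2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # 169/537 is inferred from 31.5%; 120/300 is a made-up replication result.
    z, p = two_proportion_z(169, 537, 120, 300)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

The test assumes the two samples are independent, which would not hold if the replication search retrieved many of the same studies.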

Original abstract

Context: Systematic reviews (SRs) summarize state-of-the-art evidence in science, including software engineering (SE). Objective: Our objective is to evaluate how SRs report research artifacts and to provide a comprehensive list of these artifacts. Method: We examined 537 secondary studies published between 2013 and 2023 to analyze the availability and reporting of research artifacts. Results: Our findings indicate that only 31.5% of the reviewed studies include research artifacts. Encouragingly, the situation is gradually improving, as our regression analysis shows a significant increase in the availability of research artifacts over time. However, in 2023, just 62.0% of secondary studies provide a research artifact, while an even lower percentage, 30.4%, use a permanent repository with a digital object identifier (DOI) for storage. Conclusion: To enhance transparency and reproducibility in SE research, we advocate for the mandatory publication of research artifacts in secondary studies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper reports a systematic mapping of 537 secondary studies in software engineering published 2013–2023. It finds that only 31.5% of these studies include research artifacts, with the proportion rising over time (regression trend) yet reaching only 62.0% in 2023 and 30.4% using permanent DOI repositories. The authors conclude that mandatory artifact publication is needed to improve transparency and reproducibility, and they supply a list of observed artifacts.

Significance. If the sample proves representative and the classification reliable, the work supplies a large-scale empirical baseline on artifact reporting practices in SE secondary studies together with an enumerated catalog of artifact types. These data could directly inform journal policies, review checklists, and community guidelines aimed at raising reproducibility standards.

major comments (3)

[Methods] Methods section: the search strategy, exact query strings, list of databases, and complete inclusion/exclusion criteria are not reported, so the claim that the 537 studies are representative of all secondary studies 2013–2023 cannot be evaluated and the headline percentages (31.5 %, 62.0 %, 30.4 %) rest on an unverified sampling frame.
[Methods] Methods / Results sections: inter-rater reliability statistics (e.g., Cohen’s κ or percentage agreement) for the screening and artifact-classification steps are absent, leaving the reliability of the central counts and the 31.5 % figure unquantified.
[Results] Results section: the regression analysis demonstrating a “significant increase” over time provides no model specification, coefficient values, p-values, R², or checks for assumptions, preventing assessment of the trend claim that underpins the “gradually improving” narrative.
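
For reference, the Cohen's κ requested in the second major comment can be computed from two raters' labels as in the sketch below. The label vectors are toy data invented for illustration, not the authors' screening decisions.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy labels: 1 = "artifact present", 0 = "no artifact" (hypothetical).
    a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
    b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
    print(round(cohens_kappa(a, b), 3))
```

In this toy case the raters agree on 8 of 10 items, chance agreement is 0.52, and κ comes out at about 0.58.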

minor comments (2)

[Abstract] The abstract states the sample size and percentages but omits any methodological summary; adding one sentence on search scope and classification procedure would improve standalone readability.
[Results] Table or figure presenting the yearly artifact percentages should include the number of studies per year so readers can judge the stability of the 2023 figure (62.0 %).
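
To illustrate why the per-year count matters, the sketch below computes a Wilson score interval for a 62% share at two sample sizes. Both sample sizes are hypothetical, since the number of 2023 studies is not given in the material summarized here.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (default about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Same 62% share at two made-up sample sizes.
    for successes, n in [(31, 50), (124, 200)]:
        low, high = wilson_interval(successes, n)
        print(f"n={n}: {low:.3f} to {high:.3f}")
```

The interval narrows as the sample grows, so the same 62.0% would be considerably less certain if 2023 contributed only a few dozen studies.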

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important areas for improving methodological transparency. We address each major comment below and will revise the manuscript to incorporate the requested details.

Point-by-point responses

Referee: [Methods] Methods section: the search strategy, exact query strings, list of databases, and complete inclusion/exclusion criteria are not reported, so the claim that the 537 studies are representative of all secondary studies 2013–2023 cannot be evaluated and the headline percentages (31.5 %, 62.0 %, 30.4 %) rest on an unverified sampling frame.

Authors: We agree that the Methods section requires explicit reporting of the search strategy, query strings, databases, and inclusion/exclusion criteria to allow evaluation of the sampling frame. In the revised manuscript we will add a dedicated subsection with these details, including the precise search strings employed across the databases and the full criteria applied to arrive at the 537 studies.
Revision: yes

Referee: [Methods] Methods / Results sections: inter-rater reliability statistics (e.g., Cohen’s κ or percentage agreement) for the screening and artifact-classification steps are absent, leaving the reliability of the central counts and the 31.5 % figure unquantified.

Authors: We acknowledge that inter-rater reliability statistics were not reported. The revised manuscript will include these metrics (Cohen’s κ and/or percentage agreement) for both the screening and artifact-classification phases, calculated from the independent reviews performed by the authors.
Revision: yes

Referee: [Results] Results section: the regression analysis demonstrating a “significant increase” over time provides no model specification, coefficient values, p-values, R², or checks for assumptions, preventing assessment of the trend claim that underpins the “gradually improving” narrative.

Authors: We agree that the regression results need fuller specification. The revised Results section will report the exact model (including whether logistic or linear regression was used), coefficient values, p-values, R², and any assumption checks performed, thereby substantiating the reported trend.
Revision: yes
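
As an illustration of the kind of specification being asked for, the sketch below fits a logistic trend of artifact availability on publication year and reports the slope, its standard error, and a Wald p-value. It is not the authors' model, which the material above leaves unspecified. The yearly counts are invented toy data, and the function name is hypothetical.

```python
import math


def fit_logistic_trend(rows, iterations: int = 25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson on grouped data.

    rows: iterable of (x, n, k), where x is years since a baseline,
    n the number of studies that year, k those with an artifact.
    Returns (b0, b1, se_b1, two_sided_p_for_b1).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, n, k in rows:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = n * p * (1.0 - p)      # binomial variance weight
            g0 += k - n * p            # score for the intercept
            g1 += x * (k - n * p)      # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information matrix) @ score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Slope variance from the information matrix of the last iteration,
    # which is effectively the converged one after 25 steps.
    se_b1 = math.sqrt(h00 / det)
    z = b1 / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2))  # Wald test, two-sided
    return b0, b1, se_b1, p_value


if __name__ == "__main__":
    # Hypothetical counts: 40 studies per year, x = year - 2013.
    ks = [6, 7, 9, 10, 12, 14, 15, 18, 20, 22, 25]
    rows = [(x, 40, k) for x, k in enumerate(ks)]
    b0, b1, se, p = fit_logistic_trend(rows)
    print(f"slope={b1:.3f} (SE {se:.3f}), p={p:.2g}")
```

A logistic model is used here because the outcome is binary per study. A linear fit on yearly percentages would be a different specification, which is why the report asks the authors to state which one they used.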

Circularity Check

0 steps flagged

No circularity: direct empirical counts and regression on collected sample

Full rationale

The paper performs a systematic mapping study that identifies 537 secondary studies via search and screening, then directly tabulates the proportion containing research artifacts (31.5 % overall, 62.0 % in 2023) and fits a standard regression to the observed yearly data to report a temporal trend. No equation, parameter, or claim reduces by construction to a fitted value defined from the same outputs; the reported statistics are simple aggregates and a regression trend fit on the screened corpus. The derivation chain consists of literature retrieval, inclusion decisions, and descriptive statistics, all of which are externally verifiable against the listed papers and therefore self-contained rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central percentages rest on the assumption that the search strategy captured a representative sample and that 'research artifact' was classified consistently; no free parameters or new entities are introduced.

axioms (1)

Domain assumption: The database search and screening process yields an unbiased sample of secondary studies published 2013-2023.
This premise underpins the claim that the observed 31.5% rate reflects actual practice rather than search artifacts.
