Research Artifacts in Secondary Studies: A Systematic Mapping in Software Engineering
Pith reviewed 2026-05-22 20:12 UTC · model grok-4.3
The pith
Only 31.5% of secondary studies in software engineering report research artifacts, though rates are rising over time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Through examination of 537 secondary studies, the authors establish that research artifacts appear in only 31.5% of cases overall. Availability has improved steadily, yet in the most recent year examined only 62.0% of studies offer artifacts and 30.4% store them in permanent repositories identified by DOIs. This pattern supports the conclusion that mandatory artifact publication in secondary studies would advance openness in software engineering research.
What carries the argument
A systematic mapping study that identifies, screens, and analyzes secondary studies for the presence and storage methods of research artifacts, incorporating temporal trend analysis via regression.
If this is right
- Requiring artifacts would enable independent verification of secondary study findings.
- Use of DOI-based repositories would reduce the risk of link rot and lost data.
- Standardized artifact sharing could raise overall research quality in secondary studies.
- Adoption might lead journals to update submission guidelines for systematic reviews.
Where Pith is reading between the lines
- Low artifact rates could similarly affect primary studies, warranting broader checks across software engineering publications.
- Automated tools might scan for artifact mentions to flag non-compliant papers during review.
- Comparison with other fields like medicine could reveal whether software engineering lags in open practices.
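The automated-scan idea above can be sketched in a few lines of Python. This is a minimal illustration, not a tool described in the paper: the DOI pattern, the two host lists, and the function name `scan_artifact_mentions` are all assumptions chosen for the example, and the DOI in the usage text is a placeholder.

```python
import re
from typing import Dict, List

# Assumed pattern for DOI-like strings ("10.<registrant>/<suffix>").
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")

# Hypothetical host lists; a real checker would need a curated, reviewed list.
ARCHIVE_HOSTS = ("zenodo.org", "figshare.com", "osf.io")
OTHER_HOSTS = ("github.com", "drive.google.com", "dropbox.com")


def scan_artifact_mentions(text: str) -> Dict[str, List[str]]:
    """Collect DOI-like strings and known hosting domains mentioned in text."""
    lowered = text.lower()
    # Trailing punctuation often sticks to a DOI at the end of a sentence.
    dois = [match.rstrip(".,;)") for match in DOI_RE.findall(text)]
    return {
        "dois": dois,
        "archive_hosts": [h for h in ARCHIVE_HOSTS if h in lowered],
        "other_hosts": [h for h in OTHER_HOSTS if h in lowered],
    }


if __name__ == "__main__":
    # Placeholder text and DOI, for illustration only.
    sample = (
        "Replication package: https://doi.org/10.5281/zenodo.0000000. "
        "Scripts are also mirrored at github.com/example/repo."
    )
    print(scan_artifact_mentions(sample))
```

A hit only shows that a DOI or host name appears somewhere in the text. It may be the paper's own DOI or a cited work, so a flagged paper would still need a human check.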
Load-bearing premise
The 537 secondary studies identified are representative of the entire population of such studies in software engineering from 2013 to 2023, with no major bias introduced by the search or selection process.
What would settle it
Repeating the literature search and screening on an independent database, or for a different time period, and obtaining artifact availability rates that differ substantially from the 31.5% overall figure or from the observed yearly trend.
Original abstract
Context: Systematic reviews (SRs) summarize state-of-the-art evidence in science, including software engineering (SE). Objective: Our objective is to evaluate how SRs report research artifacts and to provide a comprehensive list of these artifacts. Method: We examined 537 secondary studies published between 2013 and 2023 to analyze the availability and reporting of research artifacts. Results: Our findings indicate that only 31.5% of the reviewed studies include research artifacts. Encouragingly, the situation is gradually improving, as our regression analysis shows a significant increase in the availability of research artifacts over time. However, in 2023, just 62.0% of secondary studies provide a research artifact while an even lower percentage, 30.4% use a permanent repository with a digital object identifier (DOI) for storage. Conclusion: To enhance transparency and reproducibility in SE research, we advocate for the mandatory publication of research artifacts in secondary studies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports a systematic mapping of 537 secondary studies in software engineering published 2013–2023. It finds that only 31.5% of these studies include research artifacts, with the proportion rising over time (regression trend) yet reaching only 62.0% in 2023 and 30.4% using permanent DOI repositories. The authors conclude that mandatory artifact publication is needed to improve transparency and reproducibility, and they supply a list of observed artifacts.
Significance. If the sample proves representative and the classification reliable, the work supplies a large-scale empirical baseline on artifact reporting practices in SE secondary studies together with an enumerated catalog of artifact types. These data could directly inform journal policies, review checklists, and community guidelines aimed at raising reproducibility standards.
major comments (3)
- [Methods] Methods section: the search strategy, exact query strings, list of databases, and complete inclusion/exclusion criteria are not reported, so the claim that the 537 studies are representative of all secondary studies 2013–2023 cannot be evaluated and the headline percentages (31.5%, 62.0%, 30.4%) rest on an unverified sampling frame.
- [Methods] Methods / Results sections: inter-rater reliability statistics (e.g., Cohen’s κ or percentage agreement) for the screening and artifact-classification steps are absent, leaving the reliability of the central counts and the 31.5% figure unquantified.
- [Results] Results section: the regression analysis demonstrating a “significant increase” over time provides no model specification, coefficient values, p-values, R², or checks for assumptions, preventing assessment of the trend claim that underpins the “gradually improving” narrative.
minor comments (2)
- [Abstract] The abstract states the sample size and percentages but omits any methodological summary; adding one sentence on search scope and classification procedure would improve standalone readability.
- [Results] The table or figure presenting the yearly artifact percentages should include the number of studies per year so readers can judge the stability of the 2023 figure (62.0%).
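To see why the per-year study counts matter for the last comment, a confidence interval for a proportion can be computed directly. The sketch below uses the Wilson score interval, which is our choice for illustration and not a method attributed to the paper. The overall count of 169 is inferred (it is the only integer that rounds to 31.5% of 537) and is not quoted from the source; the 2023 sample sizes of 50 and 200 are purely hypothetical.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Overall figure: 169 of 537 (count inferred from the reported 31.5%).
    print("overall:", wilson_interval(169, 537))
    # 2023 figure of 62.0% under two HYPOTHETICAL yearly sample sizes.
    print("2023, n=50: ", wilson_interval(31, 50))
    print("2023, n=200:", wilson_interval(124, 200))
```

With the inferred overall count the interval runs from roughly 0.28 to 0.36. The same 62.0% share gives a visibly wider interval at n=50 than at n=200, which is why the referee asks for the yearly denominators.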
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important areas for improving methodological transparency. We address each major comment below and will revise the manuscript to incorporate the requested details.
- Referee: [Methods] Methods section: the search strategy, exact query strings, list of databases, and complete inclusion/exclusion criteria are not reported, so the claim that the 537 studies are representative of all secondary studies 2013–2023 cannot be evaluated and the headline percentages (31.5%, 62.0%, 30.4%) rest on an unverified sampling frame.
  Authors: We agree that the Methods section requires explicit reporting of the search strategy, query strings, databases, and inclusion/exclusion criteria to allow evaluation of the sampling frame. In the revised manuscript we will add a dedicated subsection with these details, including the precise search strings employed across the databases and the full criteria applied to arrive at the 537 studies.
  Revision: yes
- Referee: [Methods] Methods / Results sections: inter-rater reliability statistics (e.g., Cohen’s κ or percentage agreement) for the screening and artifact-classification steps are absent, leaving the reliability of the central counts and the 31.5% figure unquantified.
  Authors: We acknowledge that inter-rater reliability statistics were not reported. The revised manuscript will include these metrics (Cohen’s κ and/or percentage agreement) for both the screening and artifact-classification phases, calculated from the independent reviews performed by the authors.
  Revision: yes
- Referee: [Results] Results section: the regression analysis demonstrating a “significant increase” over time provides no model specification, coefficient values, p-values, R², or checks for assumptions, preventing assessment of the trend claim that underpins the “gradually improving” narrative.
  Authors: We agree that the regression results need fuller specification. The revised Results section will report the exact model (including whether logistic or linear regression was used), coefficient values, p-values, R², and any assumption checks performed, thereby substantiating the reported trend.
  Revision: yes
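For readers unfamiliar with the inter-rater statistic requested in the second point, Cohen’s κ can be computed as follows. This is a generic sketch of the standard two-rater formula, κ = (p_o − p_e) / (1 − p_e); the function name and the example labels are hypothetical and do not come from the paper's data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # HYPOTHETICAL labels: does each of ten studies provide an artifact?
    a = ["yes", "yes", "no", "no", "yes", "no", "no", "no", "yes", "no"]
    b = ["yes", "no", "no", "no", "yes", "no", "yes", "no", "yes", "no"]
    print(round(cohens_kappa(a, b), 3))
```

In the hypothetical example the raters agree on 8 of 10 items (p_o = 0.80) and chance agreement is 0.52, so κ is about 0.58, noticeably lower than the raw 80% agreement suggests.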
Circularity Check
No circularity: direct empirical counts and regression on collected sample
The paper performs a systematic mapping study that identifies 537 secondary studies via search and screening, then directly tabulates the proportion containing research artifacts (31.5% overall, 62.0% in 2023) and fits a regression to the observed yearly data to report a temporal trend. No equation, parameter, or claim reduces by construction to a fitted value defined from the same outputs; the reported statistics are simple aggregates and a trend fit on the screened corpus, although the abstract does not say which regression model was used. The derivation chain consists of literature retrieval, inclusion decisions, and descriptive statistics, all of which can be checked externally against the listed papers, so the analysis is self-contained rather than tautological.
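As a rough illustration of the kind of trend fit discussed above, the sketch below fits an ordinary least-squares line to yearly artifact shares and reports the slope, intercept, and R². It is an assumption-laden example: the source does not state the model used (a logistic regression on per-study outcomes would be a common alternative), and the yearly shares in the usage block are placeholders, not the paper's data.

```python
from typing import Sequence, Tuple


def fit_linear_trend(
    years: Sequence[float], shares: Sequence[float]
) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of share ~ year; returns (slope, intercept, r2)."""
    if len(years) != len(shares) or len(years) < 2:
        raise ValueError("need at least two (year, share) pairs of equal length")
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(shares) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("all years are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in shares)
    if ss_tot == 0:
        raise ValueError("all shares are identical; R^2 is undefined")
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, shares))
    return slope, intercept, 1 - ss_res / ss_tot


if __name__ == "__main__":
    # PLACEHOLDER yearly shares for 2013-2023; NOT taken from the paper.
    years = list(range(2013, 2024))
    shares = [0.10, 0.12, 0.18, 0.17, 0.25, 0.30, 0.33, 0.41, 0.45, 0.55, 0.62]
    slope, intercept, r2 = fit_linear_trend(years, shares)
    print(f"slope per year: {slope:.4f}, R^2: {r2:.3f}")
```

A full report of the kind the referee requests would add standard errors, p-values, and assumption checks, which this sketch deliberately leaves out.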
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The database search and screening process yields an unbiased sample of secondary studies published 2013–2023.