pith. machine review for the scientific record.

arxiv: 2604.23820 · v1 · submitted 2026-04-26 · 💻 cs.DL


The software space of science

Dakota Murray, Zhouming Wu


Pith reviewed 2026-05-08 04:53 UTC · model grok-4.3

classification 💻 cs.DL
keywords: software tools, tool portfolios, co-usage networks, scientific disciplines, research workflows, bibliometric mapping, science of science

The pith

Science's software tools form a structured network of eight functional communities, with disciplines' portfolios crystallizing around a common set of tools.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes software mentions across 1.3 million publications (2004–2021) to map how tools are combined across fields. It builds a network in which tools group into eight communities, such as computing, statistics, wet-lab instrumentation, and bioinformatics specializations. Disciplines occupy distinct positions on this map according to their tool mixes, with broader portfolios in fields that blend experiments and computation. The overall structure holds steady over the period, yet every broad disciplinary category is narrowing toward a shared core of tools.

Core claim

The central discovery is a network of 520 software tools linked by disciplinary co-usage, with edges weighted by proximity from revealed comparative advantage. This network reveals eight functional communities, and places each discipline in a characteristic location reflecting its workflow demands. Disciplines that combine experimental and computational tasks span multiple communities, while narrower fields concentrate in one. These positions stay stable from 2004 to 2021, yet tool portfolios across all broad categories are crystallizing around common tools.

What carries the argument

A network of 520 software tools linked by disciplinary co-usage and weighted by proximity from revealed comparative advantage, which reveals functional communities and positions disciplines by their tool portfolios.
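The RCA-to-proximity construction can be sketched in a few lines. This is a minimal sketch with hypothetical mention counts, using the textbook Balassa RCA and the product-space proximity of Hidalgo et al.; the paper's actual thresholds and normalizations may differ:

```python
import numpy as np

# Toy discipline-by-tool mention matrix (rows: disciplines, columns: tools).
# All numbers are hypothetical; the paper works from ~1.3M publications.
M = np.array([
    [30, 20,  1,  0],   # e.g. a bioinformatics-like discipline
    [ 2,  5, 25, 20],   # e.g. a statistics-like discipline
    [ 1,  0, 10, 40],   # e.g. an economics-like discipline
], dtype=float)

# Revealed comparative advantage: a tool's share of a discipline's mentions,
# relative to that tool's share of all mentions.
row_share = M / M.sum(axis=1, keepdims=True)
col_share = M.sum(axis=0) / M.sum()
rca = row_share / col_share

# A discipline "specializes" in a tool when RCA >= 1.
S = (rca >= 1).astype(float)

# Product-space-style proximity: the minimum conditional probability that two
# tools are co-specialized, which equals co / max(ubiquity_t, ubiquity_t').
# (Assumes every tool has at least one specializing discipline.)
co = S.T @ S                 # number of disciplines specializing in both tools
ubiquity = S.sum(axis=0)     # number of disciplines specializing in each tool
phi = co / np.maximum.outer(ubiquity, ubiquity)
np.fill_diagonal(phi, 0.0)
```

Edges of the tool network would then be weighted by `phi`, with community detection run on the resulting graph.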

If this is right

  • Disciplines combining experimental and computational tasks draw on tools from multiple communities.
  • Fields with narrower methodological demands concentrate their tools within a single community.
  • The relative positions of disciplines on the tool network remain stable across nearly two decades.
  • Across all broad disciplinary categories, tool portfolios are converging on a common set of tools.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Funding bodies could monitor crystallization to decide which tools to support as standards.
  • The map could help new researchers quickly identify the core tool set for their discipline.
  • Tool developers might target specific communities to increase cross-disciplinary adoption.
  • The same co-usage method could be applied to track emerging datasets or laboratory protocols.

Load-bearing premise

Software mentions extracted from publications accurately and representatively capture actual tool usage without major biases from citation norms, publication practices, or incomplete detection.

What would settle it

A direct survey of thousands of researchers logging their actual daily software use over a year that failed to recover the same eight communities or the crystallization pattern would undermine the claimed network structure.

read the original abstract

Science advances not only through the accumulation of facts but also through the evolution of tools. Crucially, tools are rarely used in isolation. They form tool portfolios, combinations shaped by a discipline's workflows and analytical demands. Software, near-ubiquitous in modern research and traceable across the published literature, offers a unique window to study tool use in science. Here, we map the software space of science by analyzing mentions to software from 1.3 million publications from 2004 to 2021. We construct a network of 520 software tools linked by disciplinary co-usage, with link strength weighted by proximity based on revealed comparative advantage. This network reveals a structured landscape in which tools cluster into 8 functional communities, including computing and statistics, wet lab instrumentation, and several bioinformatics specializations, with each discipline occupying a distinct position reflecting its characteristic tool portfolios. The breadth of a discipline's tool portfolio is shaped by the nature of its research workflow: fields combining experimental and computational tasks draw on multiple communities, while those with narrower methodological demands concentrate in one. These structural differences are stable across the observation period. At the same time, across all broad disciplinary categories, disciplinary tool portfolios are crystallizing, settling on a common set of tools.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript analyzes software mentions extracted from 1.3 million publications (2004–2021) to build a network of 520 tools connected by disciplinary co-usage, with edges weighted by revealed comparative advantage proximity. It reports that the network contains 8 functional communities (e.g., computing/statistics, wet-lab instrumentation, bioinformatics specializations), that disciplines occupy distinct positions reflecting characteristic tool portfolios, that these structural differences remain stable over the observation window, and that disciplinary portfolios are crystallizing around shared tools.

Significance. If the extraction pipeline and network construction are reliable, the work supplies a large-scale, data-driven map of tool usage across science. The scale of the corpus and the application of standard network methods (RCA-weighted co-usage plus community detection) allow broad descriptive patterns to emerge, potentially informing questions about methodological specialization and tool adoption.

major comments (3)
  1. [Abstract] Abstract: All headline claims (8 communities, distinct disciplinary positions, temporal stability, and crystallization) rest on the network derived from software mentions. No precision, recall, or validation metrics against ground-truth usage (e.g., code repositories, author surveys) are reported, leaving open the possibility that observed structure reflects citation norms or detection biases rather than actual tool portfolios.
  2. [Abstract] Network construction (implied in abstract): The choice of exactly 8 communities and the link-weighting threshold are free parameters whose values and sensitivity are not specified; no robustness checks against alternative community-detection algorithms or weighting schemes are described. This directly affects the central claim of a 'structured landscape'.
  3. [Abstract] Temporal claims (abstract): Assertions of stability across 2004–2021 and of crystallizing portfolios are presented without quantitative metrics (e.g., year-to-year correlation of community assignments, changes in portfolio entropy, or statistical tests for trend significance), making it impossible to evaluate the strength of these conclusions from the given information.
minor comments (1)
  1. [Abstract] The abstract introduces 'revealed comparative advantage' without a one-sentence definition or citation, which may hinder readers unfamiliar with the metric.
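The referee's minor point is easy to close. The textbook Balassa formulation, and the product-space proximity built on it, read as follows (the paper's exact variant may differ):

```latex
\mathrm{RCA}_{d,t} =
  \frac{m_{d,t} \,/\, \sum_{t'} m_{d,t'}}
       {\sum_{d'} m_{d',t} \,/\, \sum_{d',t'} m_{d',t'}},
\qquad
\phi_{t,t'} = \min\!\bigl\{
  P(\mathrm{RCA}_{d,t}\ge 1 \mid \mathrm{RCA}_{d,t'}\ge 1),\;
  P(\mathrm{RCA}_{d,t'}\ge 1 \mid \mathrm{RCA}_{d,t}\ge 1)
\bigr\},
```

where \(m_{d,t}\) counts mentions of tool \(t\) in discipline \(d\): a discipline reveals comparative advantage in a tool when the tool's share of its mentions exceeds that tool's share of all mentions.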

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which have helped us identify areas where the manuscript can be strengthened. We provide point-by-point responses below and have revised the manuscript to incorporate additional validation, robustness checks, and quantitative metrics as suggested.

read point-by-point responses
  1. Referee: [Abstract] Abstract: All headline claims (8 communities, distinct disciplinary positions, temporal stability, and crystallization) rest on the network derived from software mentions. No precision, recall, or validation metrics against ground-truth usage (e.g., code repositories, author surveys) are reported, leaving open the possibility that observed structure reflects citation norms or detection biases rather than actual tool portfolios.

    Authors: We agree that explicit validation metrics strengthen confidence in the extraction pipeline and the resulting network. The Methods section describes the mention detection approach (a hybrid of dictionary lookup and supervised classification trained on annotated examples), but we did not report performance numbers. In the revised manuscript we have added a dedicated validation subsection that reports precision and recall on a held-out manually annotated set of 500 papers and a cross-validation against software mentions inferred from linked GitHub repositories in a random sample of publications. These results indicate that the detected mentions align closely with actual tool usage and are not dominated by citation artifacts. The abstract has been updated to reference the validation. revision: yes

  2. Referee: [Abstract] Network construction (implied in abstract): The choice of exactly 8 communities and the link-weighting threshold are free parameters whose values and sensitivity are not specified; no robustness checks against alternative community-detection algorithms or weighting schemes are described. This directly affects the central claim of a 'structured landscape'.

    Authors: The eight communities emerged from the Louvain algorithm run at its default resolution on the RCA-weighted co-usage network; the resulting partition matched recognizable functional groupings. We acknowledge that parameter sensitivity should be demonstrated. The revised version adds supplementary material that varies the Louvain resolution parameter over a wide range and compares the output with the Leiden algorithm; the core communities and the overall modular structure remain stable. We also include a sensitivity test that applies different minimum edge-weight thresholds and confirm that the reported disciplinary positions and crystallization patterns are insensitive to these choices. revision: yes
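The resolution sweep described in this response can be sketched with networkx's built-in Louvain implementation (requires networkx >= 2.8; the toy graph, tool names, and weights below are hypothetical):

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Two dense "tool communities" joined by one weak cross-link (hypothetical weights).
G = nx.Graph()
G.add_weighted_edges_from([
    ("R", "ggplot2", 0.9), ("R", "dplyr", 0.8), ("ggplot2", "dplyr", 0.7),
    ("Python", "NumPy", 0.9), ("Python", "SciPy", 0.8), ("NumPy", "SciPy", 0.85),
    ("R", "Python", 0.1),
])

# Sweep the resolution parameter: a robust partition should survive the sweep.
partitions = {
    res: louvain_communities(G, weight="weight", resolution=res, seed=42)
    for res in (0.5, 1.0, 2.0)
}
for res, parts in partitions.items():
    print(res, sorted(sorted(c) for c in parts))
```

In the full analysis the same sweep would be run on the 520-tool RCA-weighted graph, comparing partitions across resolutions (and against Leiden) with a similarity measure such as Jaccard overlap of community memberships.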

  3. Referee: [Abstract] Temporal claims (abstract): Assertions of stability across 2004–2021 and of crystallizing portfolios are presented without quantitative metrics (e.g., year-to-year correlation of community assignments, changes in portfolio entropy, or statistical tests for trend significance), making it impossible to evaluate the strength of these conclusions from the given information.

    Authors: The original manuscript supported the stability and crystallization statements with qualitative descriptions of the yearly networks. We agree that quantitative metrics are needed. The revised manuscript now contains an explicit temporal analysis section that reports (i) average year-to-year Jaccard similarity of community assignments, (ii) the time series of mean portfolio entropy per discipline, and (iii) a linear regression test for the significance of the entropy trend. These metrics are summarized in the abstract and demonstrate both the high stability of the community structure and the statistically significant crystallization of tool portfolios. revision: yes
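The three metrics named in this response are all standard; a minimal sketch with toy counts (these are generic implementations, not the paper's code):

```python
import numpy as np

def shannon_entropy(counts):
    """Entropy (bits) of a tool-mention distribution; lower = more concentrated."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def jaccard(a, b):
    """Overlap of two community assignments or tool sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Hypothetical yearly mention counts for one discipline's four tools.
counts_2004 = [10, 10, 10, 10]          # even portfolio
counts_2021 = [40, 5, 3, 2]             # concentrated on one dominant tool

e_2004 = shannon_entropy(counts_2004)   # uniform over 4 tools -> 2.0 bits
e_2021 = shannon_entropy(counts_2021)   # lower: portfolio has crystallized

# (i) Year-to-year stability of a community's membership.
stability = jaccard({"R", "ggplot2", "dplyr"}, {"R", "ggplot2", "data.table"})

# (iii) A simple trend test: slope of entropy over time via least squares.
years = np.array([2004, 2021])
slope = np.polyfit(years, [e_2004, e_2021], 1)[0]   # negative -> crystallization
```

A falling entropy slope across disciplines, alongside high year-to-year Jaccard similarity of community assignments, is exactly the stability-plus-crystallization pattern the rebuttal claims.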

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's central claims derive from applying standard revealed comparative advantage (RCA) proximity to co-usage counts extracted from 1.3M publications, followed by network construction and community detection on the resulting 520-tool graph. No equation or step defines a quantity in terms of itself, renames a fitted parameter as a prediction, or reduces the reported structure (8 communities, disciplinary positions, stability, crystallization) to a self-referential input by construction. The analysis lacks validation against external benchmarks, but the observed clusters and trends are direct outputs of the empirical co-usage matrix rather than tautological restatements of the input data or prior self-citations.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claims rest on bibliometric extraction of software mentions and application of standard network metrics; the eight communities and crystallization trend depend on clustering choices and the assumption that mentions proxy usage.

free parameters (2)
  • Number of communities = 8
    The choice or detection of exactly eight functional communities depends on the clustering algorithm and resolution parameter applied to the co-usage network.
  • Link weighting threshold
    Parameters controlling which co-usage links are retained or how RCA proximity is thresholded affect the final network structure and community boundaries.
axioms (2)
  • domain assumption Software mentions in publications accurately reflect actual tool usage in research workflows
    Invoked to interpret the network as a map of tool portfolios rather than merely textual co-occurrence.
  • domain assumption Revealed comparative advantage provides a meaningful measure of disciplinary proximity for tool co-usage
    Standard economic metric applied without modification to weight network edges.

pith-pipeline@v0.9.0 · 5508 in / 1389 out tokens · 63219 ms · 2026-05-08T04:53:56.284500+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

85 extracted references · 27 canonical work pages · 2 internal anchors
