pith. machine review for the scientific record.

arxiv: 2604.08888 · v1 · submitted 2026-04-10 · 💻 cs.SE · cs.HC

Recognition: no theorem link

From OSS to Open Source AI: an Exploratory Study of Collaborative Development Paradigm Divergence


Pith reviewed 2026-05-10 17:53 UTC · model grok-4.3

classification 💻 cs.SE cs.HC
keywords: open source software · open source AI · collaboration intensity · user innovation · GitHub · Hugging Face · development paradigms · repository analysis

The pith

Open source AI models show lower collaboration intensity and a shift to adaptive user-innovation compared with traditional open source software.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper measures how the collaborative practices around open source AI models differ from those around conventional open source code projects. It runs statistical, network, and content analyses on roughly 1.4 million GitHub repositories and 1.4 million Hugging Face repositories, then supplements the numbers with semi-structured interviews. The central finding is that AI-model development displays markedly lower collaboration intensity, reduced openness to direct code-level contributions while knowledge exchange stays relatively open, and a turn toward individual adaptation of existing models rather than joint improvement. The interviews attribute these patterns to AI artifacts differing in kind from ordinary software. The results point to concrete ways that collaboration tools and norms may need to change for AI work.

Core claim

Compared with the traditional open source software development paradigm, the open source AI model development paradigm exhibits significantly lower collaboration intensity; lower openness to direct contributions, while knowledge exchange remains relatively open; and a divergence toward adaptive-utilization user innovation rather than collaborative improvement.

What carries the argument

Large-scale comparative analysis of collaboration intensity, openness, and user-innovation metrics drawn from repository data on GitHub and Hugging Face Hub, augmented by social-network and content analyses plus semi-structured interviews.
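The intensity and openness measures described above reduce, at their simplest, to per-repository aggregates over contributor and commit records. The sketch below is illustrative only, not the paper's pipeline; the `Repo` fields (`contributors`, `commits`, `external_commits`) are hypothetical stand-ins for whatever the authors' extraction actually records.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    # Hypothetical minimal record for one repository.
    contributors: set        # unique committer identities
    commits: int             # total commits (or revisions on HF Hub)
    external_commits: int    # commits from non-owner accounts


def collaboration_intensity(repo: Repo) -> float:
    """Contributors per 100 commits: a crude density of joint work."""
    return 100 * len(repo.contributors) / max(repo.commits, 1)


def contribution_openness(repo: Repo) -> float:
    """Share of commits coming from outside the owning account or org."""
    return repo.external_commits / max(repo.commits, 1)


repo = Repo(contributors={"a", "b", "c"}, commits=50, external_commits=10)
print(collaboration_intensity(repo))  # 6.0
print(contribution_openness(repo))    # 0.2
```

Any real comparison would need these aggregates computed identically on both platforms, which is exactly the comparability concern the referee raises below.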

Load-bearing premise

The chosen metrics from repository data and the interview sample accurately capture and explain the underlying collaborative paradigms without significant selection bias or unmeasured confounders.

What would settle it

Re-running the same intensity and openness calculations on a fresh, randomly sampled set of repositories, or on contributor survey responses, and finding no statistically significant difference between the two paradigms.
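Such a replication would hinge on a nonparametric two-sample test, since per-repo activity counts are heavily skewed. A minimal sketch of a two-sided Mann-Whitney U test (normal approximation, no tie-variance correction) — one plausible choice, not necessarily the paper's exact procedure:

```python
import math


def mann_whitney_u(xs, ys):
    """Two-sided Mann-Whitney U via the normal approximation.

    Returns (U statistic for xs, approximate p-value). Ties receive
    average ranks; tie correction to the variance is omitted.
    """
    n1, n2 = len(xs), len(ys)
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg = (i + 1 + j) / 2  # average of ranks i+1 .. j for tied values
        for k in range(i, j):
            ranks[k] = avg
        i = j
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd if sd else 0.0
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return u1, p


u, p = mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(u, round(p, 4))  # U = 0.0: complete separation of the samples
```

A failed replication under this test (p above the chosen threshold on fresh samples) would undercut the "significantly lower" claims.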

Figures

Figures reproduced from arXiv: 2604.08888 by Hengzhi Ye, Minghui Zhou.

Figure 1. OSS development process pipeline.
Figure 2. OSM development process pipeline.
Figure 3. Overview of methodology.
Figure 4. Correlations of selected features in OSS.
Figure 6. Distribution of activities of OSS and OSM repositories.
Figure 7. Part of the OSM user commit network (only a small portion is shown; the complete network is too dense for visual interpretation).
Figure 8. Distribution of communication themes of OSS and OSM repositories.
Original abstract

AI development is embracing open-source paradigm, but the fundamental distinction between AI models and traditional software artifacts may lead to a divergent open-source development paradigm with different collaborative practices, which remains unexplored. We therefore bridge the knowledge gap by quantifying and characterizing the differences in the collaborative development paradigms of traditional open source software (OSS) and open source AI models (OSM), and investigating the underlying factors that may drive these distinctions. We collect 1,428,792 OSS repositories from GitHub and 1,440,527 OSM repositories from HF Hub, and conduct comprehensive statistical, social network and content analyses to measure and understand the differences in collaboration intensity, collaboration openness, and user innovation across the two development paradigms, complementing these quantitative results with semi-structured interviews. In consequence, we find that compared to OSS development paradigm, the OSM development paradigm exhibits significantly lower collaboration intensity; lower collaboration openness regarding direct contribution while persisting relatively open knowledge exchange; and a divergence toward adaptive utilization user-innovation rather than collaborative improvement. Through semi-structured interviews, we further elucidate the socio-technical factors underlying these differences. These findings reveal the paradigmatic divergence in open source development between traditional OSS and OSM across three critical dimensions of open source collaboration and potential underlying factors, shedding light on how to improve collaborative work techniques and practices within the context of AI development.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 3 minor

Summary. The paper claims that open-source AI model (OSM) development on Hugging Face Hub diverges from traditional OSS on GitHub: using data from 1.4M+ repositories each plus SNA, stats, content analysis, and interviews, it finds OSM shows significantly lower collaboration intensity, lower openness on direct contributions (while retaining open knowledge exchange), and a shift toward adaptive utilization/user-innovation rather than collaborative improvement. Socio-technical factors are explored via interviews.

Significance. If the platform-comparability issues are resolved, the work offers a large-scale descriptive baseline on how artifact type (models vs. code) shapes open collaboration, with practical implications for AI dev practices. Strengths include the repository scale supporting statistical comparisons, mixed-methods design, and grounding in public platform data.

major comments (1)
  1. [Data Collection and Quantitative Analysis] The central claims of significantly lower collaboration intensity and lower direct-contribution openness in OSM (abstract; results sections) rest on treating GitHub and HF Hub repositories as equivalent units for counting contributors, interaction edges, and collaboration signals. GitHub captures commits/PRs/issues as primary signals, while HF Hub hosts serialized weights/configs/model cards with most adaptation occurring externally; without explicit normalization for these affordances in the extraction pipeline, the quantitative divergences risk being partly artifactual. This is load-bearing for the 'significantly lower' and 'divergence toward adaptive utilization' statements.
minor comments (3)
  1. [Methodology] Clarify the exact operationalization of 'collaboration intensity' and 'openness' metrics (e.g., how unique contributors and interaction edges are defined and normalized across platforms) to allow replication.
  2. [Interviews] Provide more detail on interview sampling strategy, participant demographics, and how themes were derived to assess selection bias in the qualitative component.
  3. [Results] Report effect sizes or confidence intervals alongside p-values in statistical comparisons to strengthen the 'significantly lower' claims.
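The third minor comment could be met with a nonparametric effect size alongside each p-value. One standard choice is Cliff's delta, sketched here as an illustration (not the paper's own code):

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-pairs.

    Ranges from -1 to 1; by convention |d| >= 0.474 is 'large'.
    O(n*m) pairwise form, fine for samples; use a rank-based
    formulation for millions of repositories.
    """
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))


# An OSM-like sample lying uniformly below an OSS-like sample.
print(cliffs_delta([1, 2, 3], [4, 5, 6]))  # -1.0
```

At the paper's sample sizes (1.4M+ repositories per platform), even trivial differences reach significance, which is why an effect size is the more informative number here.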

Simulated Author's Rebuttal

1 responses · 0 unresolved

Thank you for the constructive feedback. We address the major comment on data collection and quantitative analysis below, agreeing that platform differences require careful consideration. We will revise the manuscript to strengthen the discussion on comparability.

Point-by-point responses
  1. Referee: The central claims of significantly lower collaboration intensity and lower direct-contribution openness in OSM (abstract; results sections) rest on treating GitHub and HF Hub repositories as equivalent units for counting contributors, interaction edges, and collaboration signals. GitHub captures commits/PRs/issues as primary signals, while HF Hub hosts serialized weights/configs/model cards with most adaptation occurring externally; without explicit normalization for these affordances in the extraction pipeline, the quantitative divergences risk being partly artifactual. This is load-bearing for the 'significantly lower' and 'divergence toward adaptive utilization' statements.

    Authors: We thank the referee for this insightful observation. While GitHub and HF Hub surface different primary signals, our quantitative analysis focuses on comparable metrics of collaboration intensity (e.g., average contributors per repo, network density) and openness (e.g., proportion of external vs. internal contributions where measurable). We did apply some normalization by repository size and activity in our statistical comparisons. Nevertheless, we acknowledge that full normalization for external adaptations is not feasible with available data, as HF Hub does not track downstream uses comprehensively. We will add a new paragraph to the Discussion section elaborating on these socio-technical differences and their impact on measurement, and include it as a limitation. Additionally, the mixed-methods approach with interviews helps triangulate the findings beyond pure quantitative counts. We believe this addresses the concern without invalidating the core claims, which are supported by multiple lines of evidence.

    Revision: partial

Circularity Check

0 steps flagged

No circularity: empirical measurements from external public repositories

Full rationale

The paper performs an exploratory empirical study by collecting 1.4M+ OSS repos from GitHub and 1.4M+ OSM repos from HF Hub, then applies statistical, social-network, and content analyses plus new semi-structured interviews. No equations, fitted parameters, or derivations are present. Central claims (lower collaboration intensity, lower direct-contribution openness, shift to adaptive user-innovation) are direct outputs of these external data measurements rather than reductions to self-citations, ansatzes, or prior results by the same authors. The analysis is therefore self-contained against external benchmarks with no load-bearing step that collapses to its own inputs by construction.
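The social-network side of the analysis reduces, at its simplest, to density-style statistics over a user interaction graph. A minimal sketch, with an assumed edge-list representation (the paper's actual graph construction may differ):

```python
def network_density(edges):
    """Share of possible ties realized in a simple undirected graph.

    `edges` is an iterable of (user, user) pairs, e.g. co-commit ties;
    duplicates and self-loops are ignored.
    """
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    nodes = {u for e in unique for u in e}
    n = len(nodes)
    return 0.0 if n < 2 else len(unique) / (n * (n - 1) / 2)


# Three users, two of the three possible co-commit ties present.
print(network_density([("alice", "bob"), ("bob", "carol")]))
```

Lower density on HF Hub relative to GitHub would be one concrete operationalization of "lower collaboration intensity" at the network level.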

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that GitHub and HF Hub repositories plus the interview set faithfully represent the two paradigms and that the chosen quantitative measures validly proxy collaboration intensity, openness, and innovation type.

axioms (2)
  • domain assumption Public repositories on GitHub and HF Hub accurately represent the collaborative development practices of OSS and OSM paradigms
    Core sampling assumption invoked to justify the 1.4M+ repo datasets.
  • domain assumption Statistical, social network, and content analyses can reliably measure collaboration intensity, openness, and user-innovation type
    Invoked to interpret the quantitative results as evidence of paradigmatic differences.

pith-pipeline@v0.9.0 · 5534 in / 1273 out tokens · 31463 ms · 2026-05-10T17:53:17.311882+00:00 · methodology


Reference graph

Works this paper leans on

140 extracted references · 7 canonical work pages · 2 internal anchors
