Gender Differences in Research Topic and Method Convergence among Collaborating Scholars in Library and Information Science
Pith reviewed 2026-06-26 11:12 UTC · model grok-4.3
The pith
Female scholars in Library and Information Science show lower convergence in research topics and methods within collaborating groups than male scholars.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using Top2Vec to identify topics and the CogFT model to classify methods across 25,204 papers, the study finds that female scholars show lower convergence in research methods and topic choices than male scholars when working in collaborating groups.
What carries the argument
Convergence measured as similarity in Top2Vec topic vectors and CogFT method classifications inside gender-composed collaboration groups extracted from paper metadata.
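The paper's exact operationalization of convergence is not given here, so the following is only a minimal sketch under stated assumptions: topic convergence is taken as the mean pairwise cosine similarity of co-authors' topic vectors, and method convergence as the share of co-author pairs with the same method label. The function names, the toy vectors, and the method labels are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_convergence(vectors):
    """Assumed definition: mean pairwise cosine similarity within one group."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return None  # a single scholar has no one to converge with
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def method_convergence(labels):
    """Assumed definition: fraction of member pairs sharing a method label."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return None
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical three-person group: two members align, one diverges.
group_vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
group_methods = ["survey", "survey", "bibliometrics"]

print(topic_convergence(group_vectors))
print(method_convergence(group_methods))
```

Other aggregation rules (for example, distance to the group centroid, or a similarity threshold) would be equally defensible, and the choice could change the size of any gender gap.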
If this is right
- Collaborating groups with female scholars explore a wider range of topics than all-male groups.
- Methodological choices show greater variety when female scholars participate in teams.
- Gender composition of research teams is associated with the spread of research approaches in LIS.
- The observed pattern supplies a baseline for tracking how collaboration diversity changes as more women enter the field.
Where Pith is reading between the lines
- The same convergence difference could appear in other disciplines if the method is repeated on their publication records.
- Policies that increase mixed-gender teams might raise overall topic and method diversity without additional interventions.
- Longitudinal checks on newer papers could test whether the gap narrows as gender balance improves.
Load-bearing premise
Author gender can be accurately inferred from names or metadata, and the topic and method models capture true similarity in the LIS papers without systematic bias.
What would settle it
A re-run of the analysis on the same papers, with genders manually confirmed for a large sample and with independent topic and method classifiers, that finds no difference in convergence rates.
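One way to judge whether a re-run "finds no difference" is a permutation test on per-group convergence scores. The sketch below is an assumption, not the paper's procedure: the function name, the number of permutations, and the toy scores are hypothetical, and the test statistic is simply the absolute difference in group means.

```python
import random


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in mean scores.

    a, b: lists of convergence scores for two sets of groups.
    Returns the share of label shuffles whose mean gap is at least
    as large as the observed gap (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        gap = abs(sum(left) / len(left) - sum(right) / len(right))
        if gap >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical scores: clearly separated groups vs. identical groups.
separated = permutation_p_value([0.9] * 8, [0.1] * 8)
identical = permutation_p_value([0.5] * 5, [0.5] * 5)
print(separated, identical)
```

A large p-value on the re-run would be consistent with no gap, though it would not by itself prove the absence of one; sample size and effect size would still need reporting.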
Original abstract
This study explores gender differences in research topic choice and methodology among collaborating scholars. Previous studies have often focused on gender differences in research topics or methods at the individual level of scholars, without considering collaborating groups, lacking depth and practical guidance. This study takes Library and Information Science (LIS) as an example, employing the Top2Vec method for topic identification and the CogFT model for research method classification. It systematically analyzes 25,204 papers published between 1990 and 2022 to investigate gender differences in the convergence of research topics and method choices among collaborating scholars in this field. The results of the study found that female scholars showed lower convergence in their research methods and topic choices compared to male scholars. This study uses a relatively systematic methodology to address the difficulty of studying gender differences in academic publishing, and is expected to serve as a reference for other disciplines and research questions. This study also emphasizes the manifestation of gender differences in collaborative research and provides insights into the convergence and diversity of research topics and methods chosen by scholars.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines gender differences in research topic and method convergence within collaborating groups of scholars in Library and Information Science. It applies Top2Vec for topic identification and the CogFT model for method classification to 25,204 LIS papers (1990–2022), reporting that female scholars exhibit lower convergence in both topics and methods than male scholars in collaborative settings.
Significance. If the central observational claim holds after addressing labeling and model-validation issues, the work would usefully extend prior individual-level gender studies to the collaborative-group level and supply a replicable pipeline for other fields.
major comments (2)
- [Data and methods] The headline comparison of convergence by gender rests entirely on inferred author genders, yet the manuscript reports neither an accuracy metric, a manually validated subsample, nor any handling of ambiguous or non-Western names. Differential misclassification rates by culture, subfield, or career stage would directly confound the reported gender gap.
- [Methods] No training details, validation metrics, or robustness checks are supplied for either Top2Vec (topic model) or CogFT (method classifier). Without these, it is impossible to assess whether the reported convergence differences are artifacts of model choices or corpus-specific biases.
minor comments (2)
- [Abstract] The abstract states the sample size and time window but does not define the precise operationalization of 'convergence' (e.g., pairwise similarity thresholds or aggregation rules across co-authors).
- Table or figure captions should explicitly state the number of papers retained after gender inference and any exclusion criteria.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important gaps in reporting. We address each major point below and will revise the manuscript to incorporate the requested details and validations.
- Referee: [Data and methods] The headline comparison of convergence by gender rests entirely on inferred author genders, yet the manuscript reports neither an accuracy metric, a manually validated subsample, nor any handling of ambiguous or non-Western names. Differential misclassification rates by culture, subfield, or career stage would directly confound the reported gender gap.
Authors: We agree that the absence of validation metrics for gender inference is a limitation that could affect interpretation of the results. In the revised manuscript we will specify the exact inference method employed, report accuracy on a manually validated random subsample of at least 500 authors (stratified by subfield and publication year), and explicitly discuss handling of ambiguous or non-Western names together with any observed differential error rates. These additions will allow readers to assess potential confounding.
Revision: yes
- Referee: [Methods] No training details, validation metrics, or robustness checks are supplied for either Top2Vec (topic model) or CogFT (method classifier). Without these, it is impossible to assess whether the reported convergence differences are artifacts of model choices or corpus-specific biases.
Authors: We acknowledge that the current text omits these implementation details. The revised version will include full training procedures and hyperparameter settings for both models, quantitative validation metrics (topic coherence and diversity for Top2Vec; precision, recall, and F1 on a held-out test set for CogFT), and robustness checks such as sensitivity to embedding dimensionality and alternative classifiers. These additions will demonstrate that the reported gender differences are not driven by model-specific artifacts.
Revision: yes
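The validation steps promised in both responses can be sketched as follows. This is an illustration under assumptions, not the authors' code: `accuracy_by_stratum` stands in for the stratified check on gender inference, `precision_recall_f1` for the held-out evaluation of the method classifier, and every name, stratum, and label in the toy data is hypothetical.

```python
from collections import defaultdict


def accuracy_by_stratum(records):
    """records: iterable of (stratum, predicted, actual) triples.

    Returns per-stratum accuracy, so differential error rates
    (e.g. by decade or subfield) are visible rather than averaged away.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for stratum, predicted, actual in records:
        totals[stratum] += 1
        correct[stratum] += int(predicted == actual)
    return {s: correct[s] / totals[s] for s in totals}


def precision_recall_f1(predicted, actual, positive):
    """Precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical manually checked gender labels, stratified by decade.
checked = [("1990s", "F", "F"), ("1990s", "M", "F"),
           ("2010s", "F", "F"), ("2010s", "M", "M")]
print(accuracy_by_stratum(checked))

# Hypothetical held-out method labels for the classifier.
pred = ["survey", "survey", "experiment", "survey"]
gold = ["survey", "experiment", "experiment", "survey"]
print(precision_recall_f1(pred, gold, positive="survey"))
```

Reporting accuracy per stratum rather than one pooled figure is the design choice that matters here, since the referee's concern is differential misclassification rather than the overall error rate.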
Circularity Check
No circularity: the empirical application of external models to a corpus yields group comparisons without self-referential reduction.
The paper applies Top2Vec for topic modeling and CogFT for method classification to a fixed corpus of 25,204 LIS papers, then computes convergence statistics separately for male- and female-collaboration subsets. No equations, fitted parameters, or predictions are presented; the reported gender differences are direct outputs of these steps rather than quantities defined in terms of themselves. Gender labeling from names/metadata is an upstream input assumption (with acknowledged validation gaps), not a self-citation chain or ansatz that forces the result. The derivation chain is therefore self-contained against external benchmarks and does not reduce to its own inputs by construction.