Gender Differences in Research Topic and Method Convergence among Collaborating Scholars in Library and Information Science

Chengzhi Zhang; Linlei Xie; Siqi Wei

arXiv: 2606.21908 · v1 · pith:NWRE7KITnew · submitted 2026-06-20 · cs.DL · cs.CL · cs.HC · cs.IR


Pith reviewed 2026-06-26 11:12 UTC · model grok-4.3

Classification: cs.DL · cs.CL · cs.HC · cs.IR

Keywords: gender differences · research collaboration · topic convergence · method convergence · Library and Information Science · scholarly publishing · research diversity

The pith

Female scholars in Library and Information Science show lower convergence in research topics and methods within collaborating groups than male scholars.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes gender patterns in how collaborating scholars select research topics and methods. It processes 25,204 LIS papers from 1990 to 2022 with topic modeling and method classification tools to measure similarity within collaboration groups. The central finding is that female scholars in collaborating groups show less overlap in both topics and methods than male scholars. A reader would care because the result points to measurable differences in how gender shapes collaborative research choices and, in turn, the spread of ideas in a field.

Core claim

Using Top2Vec to identify topics and the CogFT model to classify methods across 25,204 papers, the study determines that female scholars showed lower convergence in their research methods and topic choices compared to male scholars when working in collaborating groups.

What carries the argument

Convergence measured as similarity in Top2Vec topic vectors and CogFT method classifications inside gender-composed collaboration groups extracted from paper metadata.
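The pith does not spell out how convergence is aggregated within a group (the referee raises the same gap), so the following is only a minimal sketch under stated assumptions: each author is represented by one topic vector (for example, the mean of their Top2Vec document vectors), topic convergence is the mean pairwise cosine similarity between co-authors, and method convergence is the mean pairwise Jaccard similarity of the method labels each author has used. All function names and toy inputs are hypothetical, not the paper's code.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard similarity between two sets of method labels."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def topic_convergence(author_vectors):
    """Mean pairwise cosine similarity of co-authors' topic vectors (assumed aggregation)."""
    pairs = list(combinations(author_vectors, 2))
    if not pairs:
        raise ValueError("a group needs at least two authors")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def method_convergence(author_methods):
    """Mean pairwise Jaccard similarity of co-authors' method-label sets (assumed aggregation)."""
    pairs = list(combinations(author_methods, 2))
    if not pairs:
        raise ValueError("a group needs at least two authors")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical three-author group: toy topic vectors and method labels.
group_vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
group_methods = [{"survey"}, {"survey", "interview"}, {"bibliometrics"}]
print(round(topic_convergence(group_vectors), 3))   # 0.333
print(round(method_convergence(group_methods), 3))  # 0.167
```

Higher values mean co-authors overlap more; the paper's actual similarity measure, thresholds, and aggregation rules may differ from these choices.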

If this is right

Collaborating groups with female scholars explore a wider range of topics than all-male groups.
Methodological choices show greater variety when female scholars participate in teams.
Gender composition of research teams is associated with the spread of research approaches in LIS.
The observed pattern supplies a baseline for tracking how collaboration diversity changes as more women enter the field.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same convergence difference could appear in other disciplines if the method is repeated on their publication records.
Policies that increase mixed-gender teams might raise overall topic and method diversity without additional interventions.
Longitudinal checks on newer papers could test whether the gap narrows as gender balance improves.

Load-bearing premise

Author gender can be accurately inferred from names or metadata and the topic and method models capture true similarity without systematic bias in the LIS papers.
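The reference list includes Genderize.io, which suggests a name-based lookup, but the pith does not say how labels were assigned. The sketch below assumes a genderize-style record (predicted gender plus probability) per first name, held in a hypothetical local table, and leaves a name unlabeled when the probability falls below an assumed 0.8 threshold. It then assigns a collaboration group to a composition category. The point is to show where ambiguous names drop out, which is the step this premise depends on.

```python
# Hypothetical lookup table in the shape of a genderize-style response:
# first name -> (predicted gender, probability). Not real API output.
NAME_TABLE = {
    "maria": ("female", 0.98),
    "john": ("male", 0.99),
    "alex": ("male", 0.62),  # ambiguous name, low confidence
}


def infer_gender(full_name, threshold=0.8):
    """Return 'female', 'male', or 'unknown' for an author name (assumed rule)."""
    parts = full_name.split()
    if not parts:
        return "unknown"
    # Assumes given-name-first order; family-name-first names break this.
    gender, prob = NAME_TABLE.get(parts[0].lower(), (None, 0.0))
    return gender if gender and prob >= threshold else "unknown"


def group_composition(author_names, threshold=0.8):
    """Classify a group as all-female, all-male, mixed, or unresolved."""
    genders = {infer_gender(name, threshold) for name in author_names}
    if "unknown" in genders:
        return "unresolved"  # assumed policy: set aside groups with any unlabeled author
    if genders == {"female"}:
        return "all-female"
    if genders == {"male"}:
        return "all-male"
    return "mixed"


print(group_composition(["Maria Lopez", "John Smith"]))  # mixed
print(group_composition(["Alex Chen", "John Smith"]))    # unresolved
```

Name order conventions and non-Western names are exactly where a rule like this can fail silently, so the share of "unresolved" groups by subfield and period is worth reporting alongside any result.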

What would settle it

A re-run of the analysis on the same papers, with genders manually confirmed for a large sample and with independent topic and method classifiers, that finds no difference in convergence rates.
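One way to make "no difference in convergence rates" testable is a two-sided permutation test on group-level convergence scores. This is a suggested check, not a procedure the paper reports, and the scores in the example are made up for illustration.

```python
import random


def permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean convergence.

    Returns the observed difference (mean_a - mean_b) and an approximate p-value.
    """
    rng = random.Random(seed)
    observed = sum(scores_a) / len(scores_a) - sum(scores_b) / len(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        a, b = pooled[:n_a], pooled[n_a:]
        diff = sum(a) / len(a) - sum(b) / len(b)
        # Small tolerance so reshuffled copies of the observed split count as extreme.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Illustrative (made-up) convergence scores for two kinds of groups.
female_groups = [0.41, 0.38, 0.45, 0.36, 0.40]
male_groups = [0.52, 0.49, 0.55, 0.47, 0.51]
diff, p = permutation_test(female_groups, male_groups)
print(f"difference = {diff:.3f}, p = {p:.4f}")
```

A permutation test avoids distributional assumptions about convergence scores, which are bounded and may be skewed; with real data, group size and period would likely need to be controlled as well.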

Original abstract

This study explores gender differences in research topic choice and methodology among collaborating scholars. Previous studies have often focused on gender differences in research topics or methods at the individual level of scholars, without considering collaborating groups, lacking depth and practical guidance. This study takes Library and Information Science (LIS) as an example, employing the Top2Vec method for topic identification and the CogFT model for research method classification. It systematically analyzes 25,204 papers published between 1990 and 2022 to investigate gender differences in the convergence of research topics and method choices among collaborating scholars in this field. The results of the study found that female scholars showed lower convergence in their research methods and topic choices compared to male scholars. This study uses a relatively systematic methodology to address the difficulty of studying gender differences in academic publishing, and is expected to serve as a reference for other disciplines and research questions. This study also emphasizes the manifestation of gender differences in collaborative research and provides insights into the convergence and diversity of research topics and methods chosen by scholars.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper reports lower topic and method convergence in female LIS collaborations but rests on unvalidated name-based gender labels.

The one thing to know: based on 25k LIS papers from 1990-2022, this paper finds that female scholars in collaborating groups show lower convergence in research topics and methods than male scholars. The group-level angle is the main step beyond prior individual-level studies.

They apply Top2Vec for topics and CogFT for methods to a decent-sized corpus, which is a straightforward extension using established tools. The observational framing and the claim that this addresses a gap in collaborative analysis are reasonable.

The soft spot is the gender inference. The abstract shows no validation sample, accuracy metric, or handling for ambiguous or non-Western names, so the stress-test concern about differential error rates by culture or subfield stands. That directly affects whether the convergence gap can be read as a real gender difference. Without those checks the central comparison is hard to trust. The full text might add robustness tests, but nothing visible yet addresses this.

This is incremental work aimed at scientometrics and gender-in-science researchers who want data points on collaboration patterns in a single field. A reader focused on solid evidence would wait for the labeling details.

It deserves peer review because the question is clear and the data volume is there, even if revisions on gender assignment and sensitivity checks would be needed.

Referee Report

2 major / 2 minor

Summary. The paper examines gender differences in research topic and method convergence within collaborating groups of scholars in Library and Information Science. It applies Top2Vec for topic identification and the CogFT model for method classification to 25,204 LIS papers (1990–2022), reporting that female scholars exhibit lower convergence in both topics and methods than male scholars in collaborative settings.

Significance. If the central observational claim holds after addressing labeling and model-validation issues, the work would usefully extend prior individual-level gender studies to the collaborative-group level and supply a replicable pipeline for other fields.

major comments (2)

[Data and methods] The headline comparison of convergence by gender rests entirely on inferred author genders, yet the manuscript reports neither an accuracy metric, a manually validated subsample, nor any handling of ambiguous or non-Western names. Differential misclassification rates by culture, subfield, or career stage would directly confound the reported gender gap.
[Methods] No training details, validation metrics, or robustness checks are supplied for either Top2Vec (topic model) or CogFT (method classifier). Without these, it is impossible to assess whether the reported convergence differences are artifacts of model choices or corpus-specific biases.
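For the Top2Vec side, one commonly used check is topic diversity: the share of unique words among the top-k words of all topics, where values near 1 mean topics are distinct and values near 0 mean they reuse the same vocabulary. The sketch assumes topics are available as ranked word lists; it does not call Top2Vec's own API, and the topics are invented.

```python
def topic_diversity(topics, k=10):
    """Fraction of unique words among the top-k words of every topic."""
    top_words = [word for topic in topics for word in topic[:k]]
    return len(set(top_words)) / len(top_words) if top_words else 0.0


# Invented topics as ranked word lists.
topics = [
    ["citation", "impact", "journal"],
    ["library", "user", "service"],
    ["citation", "network", "journal"],
]
print(round(topic_diversity(topics, k=3), 3))  # 0.778
```

Diversity alone says nothing about interpretability, so it would normally be reported next to a coherence score and compared across settings such as the number of topics.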

minor comments (2)

[Abstract] The abstract states the sample size and time window but does not define the precise operationalization of 'convergence' (e.g., pairwise similarity thresholds or aggregation rules across co-authors).
[Tables and figures] Captions should explicitly state the number of papers retained after gender inference and any exclusion criteria.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important gaps in reporting. We address each major point below and will revise the manuscript to incorporate the requested details and validations.

Referee: [Data and methods] The headline comparison of convergence by gender rests entirely on inferred author genders, yet the manuscript reports neither an accuracy metric, a manually validated subsample, nor any handling of ambiguous or non-Western names. Differential misclassification rates by culture, subfield, or career stage would directly confound the reported gender gap.

Authors: We agree that the absence of validation metrics for gender inference is a limitation that could affect interpretation of the results. In the revised manuscript we will specify the exact inference method employed, report accuracy on a manually validated random subsample of at least 500 authors (stratified by subfield and publication year), and explicitly discuss handling of ambiguous or non-Western names together with any observed differential error rates. These additions will allow readers to assess potential confounding. revision: yes
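A stratified validation of the kind promised here amounts to comparing inferred labels with manually confirmed ones and reporting accuracy per stratum, so that differential error rates become visible. A minimal sketch, assuming each validated record is a (stratum, inferred, manual) tuple; the strata and records are hypothetical.

```python
from collections import defaultdict


def accuracy_by_stratum(records):
    """Accuracy of inferred vs. manually confirmed gender, per stratum.

    records: iterable of (stratum, inferred_label, manual_label) tuples.
    An inferred 'unknown' never matches a manual label, so it counts as an error.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for stratum, inferred, manual in records:
        total[stratum] += 1
        correct[stratum] += int(inferred == manual)
    return {s: correct[s] / total[s] for s in total}


sample = [
    ("bibliometrics", "female", "female"),
    ("bibliometrics", "male", "male"),
    ("information retrieval", "unknown", "female"),
    ("information retrieval", "male", "female"),
    ("information retrieval", "male", "male"),
]
print(accuracy_by_stratum(sample))
```

The same tally could be keyed on (subfield, publication period) to match the stratification described in the response.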
Referee: [Methods] No training details, validation metrics, or robustness checks are supplied for either Top2Vec (topic model) or CogFT (method classifier). Without these, it is impossible to assess whether the reported convergence differences are artifacts of model choices or corpus-specific biases.

Authors: We acknowledge that the current text omits these implementation details. The revised version will include full training procedures and hyperparameter settings for both models, quantitative validation metrics (topic coherence and diversity for Top2Vec; precision, recall, and F1 on a held-out test set for CogFT), and robustness checks such as sensitivity to embedding dimensionality and alternative classifiers. These additions will let readers judge whether the reported gender differences are driven by model-specific artifacts. revision: yes
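For the CogFT side, the held-out metrics named here are standard per-class precision, recall, and F1. The sketch below computes them from plain label lists with no external libraries; the method labels are hypothetical, and this is not CogFT's own evaluation code.

```python
def per_class_prf(y_true, y_pred):
    """Precision, recall, and F1 for each class, plus the macro-averaged F1."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    if not scores:
        return {}, 0.0
    macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)
    return scores, macro_f1


# Hypothetical gold and predicted method labels for four papers.
gold = ["survey", "survey", "experiment", "interview"]
pred = ["survey", "experiment", "experiment", "interview"]
scores, macro = per_class_prf(gold, pred)
print(round(macro, 3))  # 0.778
```

Macro-averaging weights rare method classes equally with common ones, which matters if some LIS methods are sparse in the test set.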

Circularity Check

0 steps flagged

No circularity: empirical application of external models to corpus yields group comparisons without self-referential reduction

The paper applies Top2Vec for topic modeling and CogFT for method classification to a fixed corpus of 25,204 LIS papers, then computes convergence statistics separately for male- and female-collaboration subsets. No equations, fitted parameters, or predictions are presented; the reported gender differences are direct outputs of these steps rather than quantities defined in terms of themselves. Gender labeling from names/metadata is an upstream input assumption (with acknowledged validation gaps), not a self-citation chain or ansatz that forces the result. The derivation chain is therefore self-contained against external benchmarks and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; all modeling assumptions are implicit in the named algorithms.

pith-pipeline@v0.9.1-grok · 5719 in / 973 out tokens · 17352 ms · 2026-06-26T11:12:23.971370+00:00 · methodology

Reference graph

Works this paper leans on

5 extracted references · 4 canonical work pages

[1]

Akyil, F. T., Saygili, E., & Akyil, M. (2020). The perennial issue of gender discrepancy in publications on chest diseases. European Respiratory Journal, 56(6). Alers, M., van Leerdam, L., Dielissen, P., & Lagro-Janssen, A. (2014). Gendered specialities during medical education: a literature review. Perspectives on Medical Education, 3, 163-178. Angelov, ...

arXiv 2020
[2]

Chari, A., & Goldsmith-Pinkham, P. (2017). Gender representation in economics across topics and time: Evidence from the NBER summer institute (No. w23953). National Bureau of Economic Research. https://doi.org/10.3386/w23953 Chu, H. (2015). Research methods in library and information science: A content analysis. Library & Information Science Research, 37(...

work page doi:10.3386/w23953 2017
[3]

https://doi.org/10.1177/009430610503400408 Genderize.io | Determine the gender of a name. (n.d.). Retrieved November 18, 2022, from https://genderize.io/ Grootendorst, M. (2023). BERTopic: Leveraging BERT and c-TF-IDF to create easily interpretable topics. https://github.com/MaartenGr/BERTopic (Original work published

work page doi:10.1177/009430610503400408 2022
[4]

Hofmann, T. (2001). Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42, 177-196. Hoppe, T. A., Litovitz, A., Willis, K. A., Meseroll, R. A., Perkins, M. J., Hutchins, B. I., ... & Santangelo, G. M. (2019). Topic choice contributes to the lower rate of NIH awards to African-American/black scientists. Science Advances, 5(10...

work page doi:10.1073/pnas.0901265106 2001
[5]

Su, R., Rounds, J., & Armstrong, P. I. (2009). Men and things, women and people: a meta-analysis of sex differences in interests. Psychological Bulletin, 135(6), 859. https://doi.org/10.1037/a0017364 Santos, J. M., Horta, H., & Feng, S. (2024). Homophily and its effects on collaborations and repeated collaborations: a study across scientific fields. Scie...

work page doi:10.1037/a0017364 2009