pith. machine review for the scientific record.

arxiv: 2604.06321 · v1 · submitted 2026-04-07 · 💻 cs.DL

Recognition: no theorem link

Matching Researchers to Funding Calls: A Reproducible Institution-Level Framework

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:07 UTC · model grok-4.3

classification 💻 cs.DL
keywords funding calls · researcher matching · bibliometric profiles · word embeddings · cosine similarity · Horizon Europe · reproducible framework · grant recommendations

The pith

An institution-level framework matches researchers to funding calls by creating multiple publication profiles per researcher and semantically matching them to calls via word embeddings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a reproducible framework for matching researchers to funding opportunities at the institutional level. Rather than relying on a single aggregated profile, it builds several publication sets for each researcher according to bibliometric criteria, including authorship position and time window. Each set is compared against funding call texts using word embeddings, yielding cosine similarity scores, which then undergo within-researcher normalisation and percentile ranking to produce ranked recommendations. A case study with 3,013 researchers and 291 Horizon Europe topics shows that the four indicators provide complementary information about suitability.
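
To make the pipeline concrete, here is a minimal sketch in Python. The embedding model ("all-MiniLM-L6-v2" via sentence-transformers), the publication-record fields, and the four set definitions are illustrative assumptions, not the paper's published configuration.

    # Hypothetical sketch: multi-profile construction and semantic matching.
    # Model choice and set definitions are assumptions for illustration.
    from sentence_transformers import SentenceTransformer
    import numpy as np

    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed off-the-shelf model

    def publication_sets(pubs):
        # `pubs`: list of dicts with 'title', 'year', 'author_position', 'n_authors'.
        # These four criteria stand in for the paper's four indicators.
        return {
            "all": pubs,
            "first_author": [p for p in pubs if p["author_position"] == 1],
            "last_author": [p for p in pubs if p["author_position"] == p["n_authors"]],
            "recent": [p for p in pubs if p["year"] >= 2021],
        }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def match_scores(pubs, call_text):
        # One raw similarity score per non-empty publication set against one call.
        call_vec = model.encode(call_text)
        scores = {}
        for name, pub_set in publication_sets(pubs).items():
            if pub_set:
                vecs = model.encode([p["title"] for p in pub_set])
                scores[name] = cosine(vecs.mean(axis=0), call_vec)
        return scores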

Core claim

The central contribution is a framework that constructs, for each researcher, multiple publication sets defined by bibliometric criteria such as authorship position and time window, matches each set independently to funding calls using word embeddings, and applies within-researcher normalisation and percentile-based ranking of cosine similarity scores to produce recommendations. The claim is supported by an application to 3,013 researchers from the University of Granada and 291 Horizon Europe topics, where the four indicators were found to capture complementary signals.

What carries the argument

The construction of multiple bibliometric publication sets per researcher combined with semantic matching via word embeddings and cosine similarity, transformed into recommendations by normalisation and percentile ranking.
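
A minimal sketch of that final transformation, assuming a z-score form for the within-researcher normalisation (the summary does not specify the exact form) followed by standard percentile ranking:

    import numpy as np
    from scipy.stats import rankdata

    def normalise_within_researcher(raw_scores):
        # Z-score one researcher's cosine similarities across all calls;
        # the z-score form is an assumption, not the paper's stated formula.
        s = np.asarray(raw_scores, dtype=float)
        std = s.std()
        return (s - s.mean()) / std if std > 0 else s - s.mean()

    def percentile_ranks(scores):
        # Map scores to percentiles in [0, 100] (assumes at least two scores),
        # making match quality comparable across researchers and calls.
        s = np.asarray(scores, dtype=float)
        return 100.0 * (rankdata(s) - 1) / (len(s) - 1)

Once every score is a percentile, a 90th-percentile match means the same thing for any researcher or call, which is what makes the ranked recommendations comparable.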

If this is right

  • The framework can be reproduced at other institutions using their publication data and standard embedding models.
  • Each researcher receives multiple scores reflecting different facets of their publication record.
  • The method works across disciplines without being tied to particular funding agencies.
  • Percentile ranking allows direct comparison of match quality across different researchers and calls.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If adopted, institutions could streamline the process of identifying suitable funding for their staff.
  • The approach might be tested by tracking whether recommended researchers actually apply for and receive the suggested grants.
  • Similar techniques could extend to matching researchers with potential collaborators or conferences.
  • Integrating actual award data would provide a way to refine the matching accuracy over time.

Load-bearing premise

That the cosine similarity between word-embedding vectors of a researcher's publication sets and a funding call's text serves as a reliable indicator of suitability without additional fine-tuning or human review.

What would settle it

Observing whether researchers who receive high match scores from the framework actually apply for and win funding from those calls at higher rates than low-scoring ones, or comparing against expert judgments on match quality.
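
A minimal sketch of that outcome-based check, assuming hypothetical score and outcome arrays; no such ground-truth data is reported in the paper:

    import numpy as np

    def outcome_rates(match_scores, outcomes, top_frac=0.10):
        # `outcomes` marks which researcher-call pairs led to an actual
        # application (or award); both arrays are hypothetical stand-ins.
        scores = np.asarray(match_scores, dtype=float)
        hits = np.asarray(outcomes, dtype=bool)
        order = np.argsort(scores)[::-1]
        k = max(1, int(top_frac * len(scores)))
        high = hits[order[:k]].mean()   # rate among top-scoring matches
        low = hits[order[k:]].mean()    # rate among the rest
        return high, low  # the framework earns its keep if high >> low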

original abstract

Grant recommendation systems remain one of the least explored areas within academic recommender systems, and existing proposals are typically tied to specific funding agencies or disciplinary domains. This paper presents an institution-level reproducible framework for matching researchers to funding opportunities by combining bibliometric profiling with semantic matching. Rather than representing each researcher through a single aggregated profile, the framework constructs multiple publication sets defined by bibliometric criteria such as authorship position and time window, each independently compared against funding calls using word embeddings. Within-researcher normalisation and percentile-based ranking transform cosine similarity scores into actionable recommendations. A case study applied to 3,013 researchers from the University of Granada and 291 Horizon Europe topics verify it and shows that the four indicators capture complementary signals.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a reproducible institution-level framework for matching researchers to funding calls. It constructs multiple publication sets for each researcher based on bibliometric criteria (e.g., authorship position and time windows), computes semantic similarity to funding call texts using word embeddings and cosine similarity, and applies within-researcher normalization with percentile ranking to produce recommendations. The framework is illustrated and claimed to be verified through a case study involving 3,013 researchers at the University of Granada and 291 Horizon Europe topics, demonstrating that four indicators capture complementary signals.

Significance. If the matching approach can be shown to produce useful recommendations, the framework would offer a practical, scalable, and reproducible tool for institutions to assist researchers in identifying suitable funding opportunities. The emphasis on multiple profiles and complementarity is a strength, but the absence of validation against real outcomes currently limits the significance of the contribution.

major comments (2)
  1. [Case Study] Case Study section: The assertion that the case study verifies the framework and shows complementary signals is unsupported by evidence. No quantitative metrics, baseline comparisons, statistical tests, or details on how complementarity was measured (e.g., correlation coefficients or overlap statistics) are supplied; the claim reduces to internal diversity of top-k lists without external grounding.
  2. [Framework and Abstract] Framework and Abstract: The core matching step relies on the unvalidated assumption that cosine similarity between word-embedding vectors of publication sets and funding-call text reliably indicates suitability. No domain-specific fine-tuning, human-rated match validation, or precision@K against known outcomes is reported, rendering the verification circular (indicator diversity does not establish predictive utility).
minor comments (2)
  1. [Abstract] The abstract and introduction could explicitly name the four indicators and describe the exact bibliometric criteria used to define the publication sets.
  2. [Framework] Reproducibility would be strengthened by including pseudocode or explicit equations for the normalization and percentile-ranking steps.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the thoughtful and constructive review. We agree that the case study section would be strengthened by adding quantitative metrics for complementarity and that the abstract and framework descriptions should clarify the illustrative scope of the case study rather than implying full verification of predictive utility. We will revise the manuscript accordingly to address these concerns while preserving the core contribution of the reproducible multi-profile framework. Our point-by-point responses follow.

point-by-point responses
  1. Referee: [Case Study] Case Study section: The assertion that the case study verifies the framework and shows complementary signals is unsupported by evidence. No quantitative metrics, baseline comparisons, statistical tests, or details on how complementarity was measured (e.g., correlation coefficients or overlap statistics) are supplied; the claim reduces to internal diversity of top-k lists without external grounding.

    Authors: We acknowledge that the current presentation in the case study relies primarily on descriptive examples of differing top-k lists across the four indicators without formal quantitative support. We will revise the section to include explicit metrics such as average Jaccard overlap between top-20 recommendation sets from each indicator pair, Kendall tau rank correlations between the underlying similarity scores, and a comparison against a single aggregated researcher profile as a baseline. These additions will provide measurable evidence of complementarity (see the sketch after these responses). We note, however, that the case study is designed to demonstrate the framework's application and output diversity on real institutional data rather than to establish external predictive validity, which would require ground-truth outcomes. revision: yes

  2. Referee: [Framework and Abstract] Framework and Abstract: The core matching step relies on the unvalidated assumption that cosine similarity between word-embedding vectors of publication sets and funding-call text reliably indicates suitability. No domain-specific fine-tuning, human-rated match validation, or precision@K against known outcomes is reported, rendering the verification circular (indicator diversity does not establish predictive utility).

    Authors: We agree that the framework builds on the established use of embedding-based cosine similarity for semantic matching without providing domain-specific validation in this work. We will revise the abstract to change 'verify it' to 'demonstrate the application of' and add explicit discussion in the framework and limitations sections clarifying that indicator complementarity is shown internally while overall suitability remains an assumption supported by prior NLP literature. We will also note the absence of fine-tuning (to maintain reproducibility with off-the-shelf models) and the need for future human or outcome-based evaluation. The claim of complementarity is limited to diversity among the four bibliometric profiles and does not assert predictive utility. revision: partial
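
The overlap and rank-correlation metrics proposed in response 1 could be computed as in the following sketch; the top-20 cutoff and four-indicator setup follow the rebuttal, while the data structures are assumed:

    from itertools import combinations
    import numpy as np
    from scipy.stats import kendalltau

    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b)

    def complementarity(top_lists, score_vectors):
        # `top_lists`: indicator name -> top-20 recommended call IDs.
        # `score_vectors`: indicator name -> similarity scores over all calls.
        pairs = list(combinations(top_lists, 2))
        mean_overlap = np.mean([jaccard(top_lists[i], top_lists[j])
                                for i, j in pairs])
        mean_tau = np.mean([kendalltau(score_vectors[i], score_vectors[j])[0]
                            for i, j in pairs])
        # Low overlap with moderate tau would indicate complementary,
        # non-redundant indicators.
        return float(mean_overlap), float(mean_tau)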

standing simulated objections (unresolved)
  • The manuscript does not have access to real-world outcome data (such as which researchers successfully applied for or received funding from the 291 Horizon Europe topics), preventing the addition of precision@K or similar external validation metrics against known matches.

Circularity Check

0 steps flagged

No significant circularity; framework uses standard techniques with independent inputs

full rationale

The paper constructs four distinct researcher profiles from separate bibliometric criteria (authorship position and time windows), embeds both profiles and funding-call text, computes cosine similarities, applies within-researcher normalization, and produces percentile rankings. The case study simply applies these steps to 3,013 researchers and 291 calls, then observes that the resulting recommendation lists differ across indicators. No equation reduces an output to a fitted parameter or to the input by construction, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled in. The claim that the indicators are complementary follows directly from using non-identical input sets rather than any definitional loop. Absence of external ground-truth validation is a limitation of empirical strength, not a circularity in the derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work relies on established techniques without introducing new axioms, free parameters, or invented entities; all components are drawn from prior bibliometric and NLP methods.

pith-pipeline@v0.9.0 · 5442 in / 1131 out tokens · 45902 ms · 2026-05-10T18:07:15.182858+00:00 · methodology

discussion (0)

