pith. machine review for the scientific record.

arxiv: 2604.06321 · v1 · submitted 2026-04-07 · 💻 cs.DL

Recognition: no theorem link

Matching Researchers to Funding Calls: A Reproducible Institution-Level Framework

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:07 UTC · model grok-4.3

classification 💻 cs.DL
keywords funding calls · researcher matching · bibliometric profiles · word embeddings · cosine similarity · Horizon Europe · reproducible framework · grant recommendations

The pith

An institution-level framework matches researchers to funding calls by creating multiple publication profiles per researcher and semantically matching them to calls via word embeddings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a reproducible framework for matching researchers to funding opportunities at the institutional level. Rather than relying on a single aggregated profile, it builds several publication sets for each researcher according to bibliometric criteria, including authorship position and time window. Each set is compared against funding call texts using word embeddings, yielding cosine similarity scores, which then undergo within-researcher normalisation and percentile ranking to produce ranked recommendations. A case study with 3,013 researchers and 291 Horizon Europe topics shows that the four indicators provide complementary information about suitability.
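
To make the pipeline concrete, here is a minimal sketch in Python. The embedding model ("all-MiniLM-L6-v2" via sentence-transformers), the publication-record fields, and the four set definitions are illustrative assumptions, not the paper's published configuration.

    # Hypothetical sketch: multi-profile construction and semantic matching.
    # Model choice and set definitions are assumptions for illustration.
    from sentence_transformers import SentenceTransformer
    import numpy as np

    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed off-the-shelf model

    def publication_sets(pubs):
        # `pubs`: list of dicts with 'title', 'year', 'author_position', 'n_authors'.
        # These four criteria stand in for the paper's four indicators.
        return {
            "all": pubs,
            "first_author": [p for p in pubs if p["author_position"] == 1],
            "last_author": [p for p in pubs if p["author_position"] == p["n_authors"]],
            "recent": [p for p in pubs if p["year"] >= 2021],
        }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def match_scores(pubs, call_text):
        # One raw similarity score per non-empty publication set against one call.
        call_vec = model.encode(call_text)
        scores = {}
        for name, pub_set in publication_sets(pubs).items():
            if pub_set:
                vecs = model.encode([p["title"] for p in pub_set])
                scores[name] = cosine(vecs.mean(axis=0), call_vec)
        return scores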

Core claim

The central contribution is a framework that constructs, for each researcher, multiple publication sets defined by bibliometric criteria such as authorship position and time window, matches each set independently to funding calls using word embeddings, and applies within-researcher normalisation and percentile-based ranking of cosine similarity scores to produce recommendations. The claim is supported by an application to 3,013 researchers from the University of Granada and 291 Horizon Europe topics, where the four indicators were found to capture complementary signals.

What carries the argument

The construction of multiple bibliometric publication sets per researcher combined with semantic matching via word embeddings and cosine similarity, transformed into recommendations by normalisation and percentile ranking.
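
A minimal sketch of that final transformation, assuming a z-score form for the within-researcher normalisation (the summary does not specify the exact form) followed by standard percentile ranking:

    import numpy as np
    from scipy.stats import rankdata

    def normalise_within_researcher(raw_scores):
        # Z-score one researcher's cosine similarities across all calls;
        # the z-score form is an assumption, not the paper's stated formula.
        s = np.asarray(raw_scores, dtype=float)
        std = s.std()
        return (s - s.mean()) / std if std > 0 else s - s.mean()

    def percentile_ranks(scores):
        # Map scores to percentiles in [0, 100] (assumes at least two scores),
        # making match quality comparable across researchers and calls.
        s = np.asarray(scores, dtype=float)
        return 100.0 * (rankdata(s) - 1) / (len(s) - 1)

Once every score is a percentile, a 90th-percentile match means the same thing for any researcher or call, which is what makes the ranked recommendations comparable.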

If this is right

  • The framework can be reproduced at other institutions using their publication data and standard embedding models.
  • Each researcher receives multiple scores reflecting different facets of their publication record.
  • The method works across disciplines without being tied to particular funding agencies.
  • Percentile ranking allows direct comparison of match quality across different researchers and calls.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If adopted, institutions could streamline the process of identifying suitable funding for their staff.
  • The approach might be tested by tracking whether recommended researchers actually apply for and receive the suggested grants.
  • Similar techniques could extend to matching researchers with potential collaborators or conferences.
  • Integrating actual award data would provide a way to refine the matching accuracy over time.

Load-bearing premise

That the cosine similarity between word-embedding vectors of a researcher's publication sets and a funding call's text serves as a reliable indicator of suitability without additional fine-tuning or human review.

What would settle it

Observing whether researchers who receive high match scores from the framework actually apply for and win funding from those calls at higher rates than low-scoring ones, or comparing against expert judgments on match quality.
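
A minimal sketch of that outcome-based check, assuming hypothetical score and outcome arrays; no such ground-truth data is reported in the paper:

    import numpy as np

    def outcome_rates(match_scores, outcomes, top_frac=0.10):
        # `outcomes` marks which researcher-call pairs led to an actual
        # application (or award); both arrays are hypothetical stand-ins.
        scores = np.asarray(match_scores, dtype=float)
        hits = np.asarray(outcomes, dtype=bool)
        order = np.argsort(scores)[::-1]
        k = max(1, int(top_frac * len(scores)))
        high = hits[order[:k]].mean()   # rate among top-scoring matches
        low = hits[order[k:]].mean()    # rate among the rest
        return high, low  # the framework earns its keep if high >> low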

original abstract

Grant recommendation systems remain one of the least explored areas within academic recommender systems, and existing proposals are typically tied to specific funding agencies or disciplinary domains. This paper presents an institution-level reproducible framework for matching researchers to funding opportunities by combining bibliometric profiling with semantic matching. Rather than representing each researcher through a single aggregated profile, the framework constructs multiple publication sets defined by bibliometric criteria such as authorship position and time window, each independently compared against funding calls using word embeddings. Within-researcher normalisation and percentile-based ranking transform cosine similarity scores into actionable recommendations. A case study applied to 3,013 researchers from the University of Granada and 291 Horizon Europe topics verify it and shows that the four indicators capture complementary signals.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a reproducible institution-level framework for matching researchers to funding calls. It constructs multiple publication sets for each researcher based on bibliometric criteria (e.g., authorship position and time windows), computes semantic similarity to funding call texts using word embeddings and cosine similarity, and applies within-researcher normalization with percentile ranking to produce recommendations. The framework is illustrated and claimed to be verified through a case study involving 3,013 researchers at the University of Granada and 291 Horizon Europe topics, demonstrating that four indicators capture complementary signals.

Significance. If the matching approach can be shown to produce useful recommendations, the framework would offer a practical, scalable, and reproducible tool for institutions to assist researchers in identifying suitable funding opportunities. The emphasis on multiple profiles and complementarity is a strength, but the absence of validation against real outcomes currently limits the significance of the contribution.

major comments (2)
  1. [Case Study] Case Study section: The assertion that the case study verifies the framework and shows complementary signals is unsupported by evidence. No quantitative metrics, baseline comparisons, statistical tests, or details on how complementarity was measured (e.g., correlation coefficients or overlap statistics) are supplied; the claim reduces to internal diversity of top-k lists without external grounding.
  2. [Framework and Abstract] Framework and Abstract: The core matching step relies on the unvalidated assumption that cosine similarity between word-embedding vectors of publication sets and funding-call text reliably indicates suitability. No domain-specific fine-tuning, human-rated match validation, or precision@K against known outcomes is reported, rendering the verification circular (indicator diversity does not establish predictive utility).
minor comments (2)
  1. [Abstract] The abstract and introduction could explicitly name the four indicators and describe the exact bibliometric criteria used to define the publication sets.
  2. [Framework] Reproducibility would be strengthened by including pseudocode or explicit equations for the normalization and percentile-ranking steps.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the thoughtful and constructive review. We agree that the case study section would be strengthened by adding quantitative metrics for complementarity and that the abstract and framework descriptions should clarify the illustrative scope of the case study rather than implying full verification of predictive utility. We will revise the manuscript accordingly to address these concerns while preserving the core contribution of the reproducible multi-profile framework. Our point-by-point responses follow.

point-by-point responses
  1. Referee: [Case Study] Case Study section: The assertion that the case study verifies the framework and shows complementary signals is unsupported by evidence. No quantitative metrics, baseline comparisons, statistical tests, or details on how complementarity was measured (e.g., correlation coefficients or overlap statistics) are supplied; the claim reduces to internal diversity of top-k lists without external grounding.

    Authors: We acknowledge that the current presentation in the case study relies primarily on descriptive examples of differing top-k lists across the four indicators without formal quantitative support. We will revise the section to include explicit metrics such as average Jaccard overlap between top-20 recommendation sets from each indicator pair, Kendall tau rank correlations between the underlying similarity scores, and a comparison against a single aggregated researcher profile as a baseline. These additions will provide measurable evidence of complementarity (see the sketch after these responses). We note, however, that the case study is designed to demonstrate the framework's application and output diversity on real institutional data rather than to establish external predictive validity, which would require ground-truth outcomes. revision: yes

  2. Referee: [Framework and Abstract] Framework and Abstract: The core matching step relies on the unvalidated assumption that cosine similarity between word-embedding vectors of publication sets and funding-call text reliably indicates suitability. No domain-specific fine-tuning, human-rated match validation, or precision@K against known outcomes is reported, rendering the verification circular (indicator diversity does not establish predictive utility).

    Authors: We agree that the framework builds on the established use of embedding-based cosine similarity for semantic matching without providing domain-specific validation in this work. We will revise the abstract to change 'verify it' to 'demonstrate the application of' and add explicit discussion in the framework and limitations sections clarifying that indicator complementarity is shown internally while overall suitability remains an assumption supported by prior NLP literature. We will also note the absence of fine-tuning (to maintain reproducibility with off-the-shelf models) and the need for future human or outcome-based evaluation. The claim of complementarity is limited to diversity among the four bibliometric profiles and does not assert predictive utility. revision: partial
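
The overlap and rank-correlation metrics proposed in response 1 could be computed as in the following sketch; the top-20 cutoff and four-indicator setup follow the rebuttal, while the data structures are assumed:

    from itertools import combinations
    import numpy as np
    from scipy.stats import kendalltau

    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b)

    def complementarity(top_lists, score_vectors):
        # `top_lists`: indicator name -> top-20 recommended call IDs.
        # `score_vectors`: indicator name -> similarity scores over all calls.
        pairs = list(combinations(top_lists, 2))
        mean_overlap = np.mean([jaccard(top_lists[i], top_lists[j])
                                for i, j in pairs])
        mean_tau = np.mean([kendalltau(score_vectors[i], score_vectors[j])[0]
                            for i, j in pairs])
        # Low overlap with moderate tau would indicate complementary,
        # non-redundant indicators.
        return float(mean_overlap), float(mean_tau)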

standing simulated objections (unresolved)
  • The manuscript does not have access to real-world outcome data (such as which researchers successfully applied for or received funding from the 291 Horizon Europe topics), preventing the addition of precision@K or similar external validation metrics against known matches.

Circularity Check

0 steps flagged

No significant circularity; framework uses standard techniques with independent inputs

full rationale

The paper constructs four distinct researcher profiles from separate bibliometric criteria (authorship position and time windows), embeds both profiles and funding-call text, computes cosine similarities, applies within-researcher normalization, and produces percentile rankings. The case study simply applies these steps to 3,013 researchers and 291 calls, then observes that the resulting recommendation lists differ across indicators. No equation reduces an output to a fitted parameter or to the input by construction, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled in. The claim that the indicators are complementary follows directly from using non-identical input sets rather than any definitional loop. Absence of external ground-truth validation is a limitation of empirical strength, not a circularity in the derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work relies on established techniques without introducing new axioms, free parameters, or invented entities; all components are drawn from prior bibliometric and NLP methods.

pith-pipeline@v0.9.0 · 5442 in / 1131 out tokens · 45902 ms · 2026-05-10T18:07:15.182858+00:00 · methodology

discussion (0)

