pith.machine: review for the scientific record
cs.DL
Digital Libraries
Covers all aspects of the digital library design and document and text creation. Note that there will be some overlap with Information Retrieval (which is a separate subject area). Roughly includes material in ACM Subject Classes H.3.5, H.3.6, H.3.7, I.7.
Augmentation with text similarity cuts disconnection in math and operations research networks while keeping disciplinary clusters intact.
Citation graphs are fundamental tools for modeling scientific structure, but are often fragmented due to missing citations of scientifically connected articles. To address this issue, we propose a computationally efficient hybrid framework integrating citation topology with large language model (LLM)-based text similarity. Using 662,369 Web of Science publications in Mathematics and Operations Research & Management Science, we augment the original graph by adding semantic edges from small, disconnected components and weighting existing citations according to textual similarity. Semantic augmentation substantially reduces fragmentation while preserving disciplinary homogeneity. Compared to embedding-only clustering, cluster detection on augmented graphs using the Leiden algorithm retains structural interpretability while offering multi-scale organization. The method scales efficiently to large datasets and offers a practical strategy for strengthening citation-based indicators without collapsing disciplinary boundaries.
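A minimal sketch of the augmentation step, assuming a sentence-transformers embedding model and the python-igraph/leidenalg stack; the model name, similarity threshold, and reattachment rule for small components are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch: weight citation edges by text similarity, reattach small components
# via semantic edges, then cluster with Leiden. All parameters are assumptions.
import igraph as ig
import leidenalg as la
import numpy as np
from sentence_transformers import SentenceTransformer

def augment_and_cluster(abstracts, citation_edges, sim_threshold=0.85):
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    emb = model.encode(abstracts, normalize_embeddings=True)

    g = ig.Graph(n=len(abstracts), edges=citation_edges, directed=False)
    # Weight existing citation edges by the similarity of their endpoints.
    g.es["weight"] = [float(emb[e.source] @ emb[e.target]) for e in g.es]

    # Add semantic edges from nodes in small, disconnected components to
    # their most similar node outside the component.
    comps = g.connected_components()
    main = max(range(len(comps)), key=lambda i: len(comps[i]))
    for ci in range(len(comps)):
        if ci == main:
            continue
        for v in comps[ci]:
            sims = emb @ emb[v]
            sims[comps[ci]] = -1.0  # exclude the node's own component
            best = int(np.argmax(sims))
            if sims[best] >= sim_threshold:
                g.add_edge(v, best, weight=float(sims[best]))

    return la.find_partition(g, la.RBConfigurationVertexPartition,
                             weights="weight", resolution_parameter=1.0)
```

The resolution parameter provides the multi-scale organization mentioned above: sweeping it yields coarser or finer clusterings of the same augmented graph.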
Regulators should require raw participant data before accepting the published results as verified.
Paper mills produce fraudulent research manuscripts built on recycled tables and figures, or on entirely fabricated data. A more recent pattern has emerged: apparently genuine trials with real patients, but with manipulated statistical analyses engineered to support regulatory approval while remaining plausible to peer reviewers. This analysis applies the INSPECT-SR trustworthiness framework to 23 randomised controlled trials and post-marketing studies linked to CinnaGen Co., Iran's largest biosimilar manufacturer, and its clinical operations subsidiary Orchid Pharmed. Papers were retrieved from PubMed and assessed against the original study records. A total of 180 problems were identified across nine categories. The five most frequent issues were reporting failures (n=37), arithmetic violations (n=28), design flaws (n=26), registration irregularities (n=25), and statistical errors (n=25). Analysis of the co-authorship network shows that trial design, data management, and manuscript preparation were concentrated within the sponsoring organisation. The underlying structural drivers appear to be a convergence of domestic publication incentives, commercial pressure from international sanctions that created demand for domestically produced drugs, and regulatory pathways that require this body of trial evidence. Because this pattern differs fundamentally from classical paper mills, we propose the term clinical trial engineering to describe it. Regulatory bodies, including the European Medicines Agency (EMA), should treat published clinical evidence from this cluster as unverified until independent access to individual participant data is granted.
This exploratory study examines how low-impact journals, defined through subject-normalized Eigenfactor percentiles, are associated with denser and more reciprocating patterns of author-to-author citations. Using Crossref records, we assign journals to broad subject areas, compute subject-specific Eigenfactor scores, propagate venue quality to works and authors, match authors in low- (Case) versus high-influence (Control) venues by subject and h5, and analyze citation edges for cohesion and anomalies. Across a 10% sample of 9,431 matched pairs, authors in low-impact venues exhibit significantly higher cohesion: 6.7x higher co-author citation rates and 4.7x higher reciprocity in the aggregate Case-Control comparison. A subject-aware hybrid detection pipeline flags 277 outliers with 93.5% Case purity; these outliers display an 11x clique-strength lift relative to non-outliers, revealing a stark "Two Worlds" segregation (r = 0.71) where low-impact venues operate as closed citation economies. The largest detected component (n = 23) displays a hub-and-spoke topology in which peripheral "Sycophants" funnel citations to central "Beneficiaries" through coordinated bursts, confirming a directed flow imbalance rather than reciprocal exchange among equals. Overall, cohesion, rather than broad asymmetry, accounts for the main Case-Control differences, suggesting that low-impact venues foster segregated, inward-looking citation economies that distort bibliometric indicators.
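The two headline cohesion quantities, co-author citation rate and reciprocity, can be sketched with networkx; the study's subject matching and hybrid outlier detection are far more involved, so this is a toy illustration of the metrics only:

```python
# Toy cohesion metrics on a directed author-to-author citation graph.
import networkx as nx

def cohesion_stats(citation_edges, coauthor_pairs):
    """citation_edges: (citing, cited) tuples; coauthor_pairs: set of frozensets."""
    G = nx.DiGraph()
    G.add_edges_from(citation_edges)
    # Share of citation edges pointing to a current or former co-author.
    coauthor_rate = sum(frozenset((u, v)) in coauthor_pairs
                        for u, v in G.edges) / G.number_of_edges()
    # Reciprocity: fraction of directed edges whose reverse also exists.
    return coauthor_rate, nx.reciprocity(G)

edges = [("a", "b"), ("b", "a"), ("a", "c")]          # hypothetical demo data
print(cohesion_stats(edges, {frozenset(("a", "b"))}))  # (0.667, 0.667)
```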
Scholarly blogs have become an important venue for scholarly communication, yet they remain insufficiently integrated into digital research and information infrastructures, which places their long-term preservation and citability at risk. This study investigates what challenges German scholarly bloggers perceive concerning blog preservation and what requirements they articulate for a sustainable information infrastructure. Drawing on Star and Ruhleder's (1996) dimensions of information infrastructure as a theoretical lens, we conducted and qualitatively analyzed 13 semi-structured interviews with scholarly bloggers. The analysis reveals three connected themes. First, bloggers perceive a structural deficit in institutional responsibility and support: the long-term preservation of blogs is not systematically assumed by libraries, universities, or platforms, while bloggers are not sufficiently supported by their affiliated institutions. Second, bloggers articulate heterogeneous requirements like persistent identifiers, structured metadata, technical interoperability, and organizational sustainability. Third, governance preferences are characterized by distrust toward commercial and public infrastructures, compounded by concerns about geopolitical dependencies on non-European platforms. These findings demonstrate that no single centralized infrastructure can adequately address the diverse and context-dependent needs of bloggers. We argue for a decentralized information infrastructure for scholarly blogs and offer concrete recommendations for information infrastructure facilities, platform providers, bloggers and research performing organizations.
Study of 106k papers from 2000-2024 finds CV leads academic impact while Web&IR stays industry-driven
Artificial intelligence has developed rapidly in recent years, with significant interdisciplinary expansion, yet existing studies often treat it as a whole, lacking systematic long-term subfield comparisons and structural analyses, thereby limiting understanding of internal differences and evolutionary mechanisms. To address this gap, we employ bibliometric methods, using expert interviews and indicator screening to construct an analytical framework. Twelve bibliometric indicators are selected across three dimensions: Impact and Dissemination, Collaboration Characteristics, and Author Characteristics. We conduct horizontal and longitudinal analyses of five subfields (AI, CV, ML, NLP, Web&IR) from 2000 to 2024. Using the CSRankings classification and a dataset of 106,622 papers, we apply violin plots, chord diagrams, and Sankey diagrams to characterize structural features and evolutionary paths. Results show that these subfields have entered high-intensity knowledge diffusion: academic impact has increased, knowledge dissemination has accelerated, external disciplinary reliance has grown, and knowledge production has shifted from closed accumulation to open, interdisciplinary, multi-actor networks. On this basis, subfields exhibit significant structural differentiation: CV leads in academic impact with a task-oriented trajectory; ML shows shrinking industry collaboration but concentrated international collaboration with a relatively dispersed structure; Web&IR is strongly industry-driven with a stable collaboration network; AI shows continuous growth; NLP remains relatively stable. Overall, this study reveals artificial intelligence evolving from unified diffusion to structural differentiation, constructs an extensible multidimensional framework, and provides a quantitative approach for understanding the evolution of complex technological fields.
Large language models (LLMs) are known to generate plausible but false information across a wide range of contexts, yet the real-world magnitude and consequences of this hallucination problem remain poorly understood. Here we leverage a uniquely verifiable object - scientific citations - to audit 111 million references across 2.5 million papers in arXiv, bioRxiv, SSRN, and PubMed Central. We find a sharp rise in non-existent references following widespread LLM adoption, with a conservative estimate of 146,932 hallucinated citations in 2025 alone. These errors are diffusely embedded across many papers but especially pronounced in fields with rapid AI uptake, in manuscripts with linguistic signatures of AI-assisted writing, and among small and early-career author teams. At the same time, hallucinated references disproportionately assign credit to already prominent and male scholars, suggesting that LLM-generated errors may reinforce existing inequities in scientific recognition. Preprint moderation and journal publication processes capture only a fraction of these errors, suggesting that the spread of hallucinated content has outpaced existing safeguards. Together, these findings demonstrate that LLM hallucinations are infiltrating knowledge production at scale, threatening both the reliability and equity of future scientific discovery as human and AI systems draw on the existing literature.
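A simplistic proxy for such an existence check is to resolve each DOI-bearing reference against the public Crossref API; the audit above matches references against full bibliographic databases, so treat this only as an illustrative first pass:

```python
# Illustrative DOI existence check against the Crossref REST API.
import requests

def doi_exists(doi: str) -> bool:
    r = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    return r.status_code == 200  # 404 suggests the DOI is not registered

# Expected True for a DOI registered in Crossref, False for a fabricated one.
print(doi_exists("10.1103/PhysRev.47.777"))
```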
Faculty mobility is often understood as a mechanism through which universities redistribute scientific talent and potentially improve research performance. Yet the system-level structure of mobility and its association with individual research trajectories have rarely been examined together. We link longitudinal faculty rosters from U.S. research universities to OpenAlex publication records and study 11,535 tenure-system faculty members who changed institutions between 2011 and 2020, with a comparison group of more than 200,000 non-moving faculty members. A directed network of faculty moves reveals a strongly hierarchical market: high-prestige institutions are net importers, lower-prestige institutions are net exporters, and the mobility hierarchy closely parallels the hierarchy observed in faculty hiring. However, event-study models that account for pre-move trajectories show little evidence of sustained post-move gains in publication volume, citation impact, or top-cited publication rates, including among upward moves to more prestigious institutions. The most consistent post-move change is collaborative: movers form new coauthor ties. We also observe modest increases in the share of papers with positive CD-index values. These patterns suggest that faculty mobility primarily reallocates existing research capacity within a persistent institutional hierarchy rather than systematically altering individual research trajectories.
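An event-study specification consistent with the described design might take the standard two-way fixed-effects form (notation assumed, not taken from the paper):

```latex
y_{it} = \alpha_i + \lambda_t
       + \sum_{\substack{k=-K \\ k \neq -1}}^{K} \beta_k \, \mathbf{1}[\,t - m_i = k\,]
       + \varepsilon_{it}
```

where y_{it} is a yearly outcome (publication volume, citation impact, or top-cited rate) for faculty member i, alpha_i and lambda_t are person and year fixed effects, and m_i is the move year; the reported absence of sustained post-move gains corresponds to post-move coefficients beta_k near zero once pre-move trends are accounted for.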
The extent to which Artificial Intelligence (AI) technologies can trigger generalized paradigm shifts in science is unclear. Although these technologies have revolutionized data collection and analysis in specific fields, their overall impact depends on the scope and ways of adoption. We analyze over 227 million scholarly works from the OpenAlex collection (1960-2024) spanning four scientific domains and 46 fields. To distinguish the use of AI as research method (AI adoption) from mentioning AI-related terms (AI engagement), we developed a two-step AI-assisted semantic classification pipeline, validated through human coding of 911 abstracts and a robustness check on 348,000 full-text articles (PLOS One). We document differences in the timing and extent of AI adoption across domains, with generalized exponential growth after 2015. The transformative nature of this growth, however, is less apparent. AI-supported research is confined to a few topics with strong ties to Computer Science and conventional statistical frameworks, suggesting limited epistemological transformation. It is also associated with an unwarranted citation premium and substantially higher retraction rates than non-AI-supported research. Geographically, while wealthy countries lead in AI publications per capita, global South countries in a belt from Indonesia to Algeria lead in AI adoption relative to their national output, signaling a distinctive resource concentration pattern. The transformative capacity of AI in science thus remains untapped, and its rapid adoption underlines challenges in research openness, transparency, reproducibility, and ethics. We discuss how best research practices could boost the benefits of AI adoption and highlight areas that warrant closer scrutiny.
Post-publication peer review (PPPR) has emerged as an important supplement to traditional peer review, with social media playing a growing role in publicising potential problems in published research. However, it remains unclear whether social media discussions of retracted articles primarily reflect good practices, such as exposing flaws and acknowledging retraction status, or bad practices, such as overlooking retractions and continuing to disseminate scientific misinformation. In this study, we collected Bluesky posts referencing scholarly articles from Altmetric and retrieved metadata for the referenced articles using OpenAlex. The final dataset included 284 retracted articles with 79 pre-retraction posts and 857 post-retraction posts, 59 retraction notices with 186 posts, and 609,461 non-retracted articles with 1,344,756 posts. We manually coded Bluesky posts discussing retracted articles to identify instances of good and bad practice. The results show that posts demonstrating good practice (89.9%) substantially outnumbered those demonstrating bad practice (10.1%). Posts reflecting good practice also had more user engagement. In the pre-retraction phase, good practice posts constituted a slight minority (43.0%), whereas in the post-retraction phase they were dominant (94.2%). Most negative posts in the pre-retraction phase (90.0%) reflected good practice, while only 17.3% of positive posts in the post-retraction phase showed bad practice. Thus, sentiment analysis can be helpful to filter posts that could flag potential flaws before retraction, but it may struggle to accurately identify the spread of misinformation after retraction. More broadly, this study highlights the potential of Bluesky to support responsible scientific communication, public scrutiny, and research integrity.
This paper presents a modular AI agentic skill pipeline for automating subject indexing with Library of Congress Subject Headings (LCSH). Subject indexing - the process of analyzing a work's aboutness, selecting controlled vocabulary terms, and encoding them as MARC21 subject access fields - is one of the most time-consuming components of library cataloging. The system decomposes this process into four discrete, sequentially executed agent skills: conceptual analysis, quantitative filtering, authority validation, and MARC field synthesis. Each skill encodes domain knowledge drawn directly from the Library of Congress Subject Headings Manual (SHM) instruction sheets and subject analysis theory. The pipeline was evaluated against a corpus of ten titles whose existing subject headings were captured from the Harvard Library bibliographic dataset (a snapshot of their Alma ILS). Results demonstrate strong conceptual alignment with professional subject indexing practice, with notable differences in specificity, subdivision practice, and the agent's adherence to the 2026 LC policy discontinuing form subdivisions in favor of LCGFT 655 fields.
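A schematic of the four-skill decomposition, with hypothetical function bodies and data shapes; this is not the system's actual code, only an illustration of how sequential agent skills can share one state object:

```python
# Schematic four-skill pipeline; contents of each skill are placeholders.
from dataclasses import dataclass, field

@dataclass
class IndexingState:
    text: str
    concepts: list = field(default_factory=list)
    candidates: list = field(default_factory=list)
    validated: list = field(default_factory=list)
    marc_fields: list = field(default_factory=list)

def conceptual_analysis(s):     # skill 1: determine the work's aboutness
    s.concepts = ["Hypothetical heading"]; return s

def quantitative_filtering(s):  # skill 2: rank and prune candidate headings
    s.candidates = s.concepts; return s

def authority_validation(s):    # skill 3: check against LCSH authority records
    s.validated = s.candidates; return s

def marc_synthesis(s):          # skill 4: encode as MARC21 650 fields (2nd indicator 0 = LCSH)
    s.marc_fields = [f"650 _0 $a {h}" for h in s.validated]; return s

state = IndexingState(text="...")
for skill in (conceptual_analysis, quantitative_filtering,
              authority_validation, marc_synthesis):
    state = skill(state)
```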
Scientific peer review increasingly struggles to assess reproducibility at the scale and complexity of modern research output. Evaluating reproducibility requires reconstructing experimental dependencies, methodological choices, data flows, and result-generating procedures, which often exceeds what human reviewers can provide. Agentic Reproducibility Assessment (ARA) formalizes reproducibility assessment as a structured reasoning task over scientific documents. Given a paper, ARA extracts a directed workflow graph linking sources, methods, experiments, and outputs, then evaluates its reconstructability using structural and content-based scores for reproducibility assessments. Experiments on 213 ReScience C articles - the largest cross-domain benchmark of human-validated computational reproducibility studies considered to date - demonstrate ARA's generalizability and consistent workflow reconstruction and assessment across LLMs, model temperatures, and scientific domains. ARA achieves ~61% accuracy on three benchmarks, and the highest accuracy reported on ReproBench (60.71% vs. 36.84%) and GoldStandardDB (61.68% vs. 43.56%), highlighting its potential to complement human review at scale and enabling next-generation peer review. Code and Data available: https://github.com/AndresLaverdeMarin/agentic_reproducibility_assessment.
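The structural side of the assessment can be pictured as reachability over a typed workflow graph; the node names, types, and score below are assumptions for illustration, with ARA's actual scoring documented in the linked repository:

```python
# Toy workflow graph: what fraction of outputs is reachable from sources?
import networkx as nx

wf = nx.DiGraph()
wf.add_nodes_from([("dataset", {"kind": "source"}),
                   ("preprocess", {"kind": "method"}),
                   ("train_model", {"kind": "experiment"}),
                   ("table_1", {"kind": "output"})])
wf.add_edges_from([("dataset", "preprocess"),
                   ("preprocess", "train_model"),
                   ("train_model", "table_1")])

sources = [n for n, d in wf.nodes(data=True) if d["kind"] == "source"]
outputs = [n for n, d in wf.nodes(data=True) if d["kind"] == "output"]
reachable = set().union(*(nx.descendants(wf, s) for s in sources))
print(sum(o in reachable for o in outputs) / len(outputs))  # 1.0 here
```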
Contemporary scientometric indicators remain anchored in paradigms and axioms from when academic research was conducted in small scholarly communities. With the global proliferation of scientific research, academia is now organized in large communities with high rates of information incompleteness regarding work impact and individual contributions. This has significant implications for how research output is measured and quality controlled, especially as the rate of academic publishing continues to rise. Exploits of complex systems are typically found at discrete transition points where rules turn on or off, and academia is not immune to this pattern. Exploitative career boosting strategies are a growing problem, largely enabled by misaligned incentives and traditional metrics that force discretization of credit to authors and prior works despite their fundamentally continuous nature.
This article introduces Liberata's scientometrics, a share-based framework for academic publishing and quality control. In this system, authorship positions are replaced with contribution shares that sum to unity and encode both ordinality and relative contribution distances. These shares can be traded on Liberata's academic marketplaces for quality control services such as peer review and replication, rewarding contributors based on the long-term success of the work. Citations are weighted to guard against frivolous referencing and credit inflation, and modular correction factors allow multiple measures of impact. Liberata's metrics are formalized through two fundamental graphs, Shares and References, from which the system constructs academic capital and derives scientometrics capturing impact, risk, collaboration, collusion, value of quality control, and diversification. These metrics represent academic contributions and extend naturally to institutions, regions, time periods, and research fields.
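A toy illustration of the share constraint and share-weighted credit flow; the field names and the down-weighting factor are hypothetical, not Liberata's schema:

```python
# Contribution shares sum to unity; citation credit flows in proportion.
paper_shares = {"alice": 0.5, "bob": 0.3, "carol": 0.2}
assert abs(sum(paper_shares.values()) - 1.0) < 1e-9

def weighted_credit(shares, citation_weight):
    """Distribute one (possibly down-weighted) citation's credit by share."""
    return {author: s * citation_weight for author, s in shares.items()}

# A citation judged partly frivolous might carry weight 0.8 instead of 1.0.
print(weighted_credit(paper_shares, citation_weight=0.8))
```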
HERITRACE is an open-source web application that enables users without Semantic Web expertise to curate RDF data through form-based interfaces with automatic provenance documentation and change tracking in RDF. It uses SHACL for data model definition and form generation, connects to existing SPARQL-accessible stores without data migration, and records every modification as a provenance snapshot that can be browsed and restored. HERITRACE is domain-agnostic: adapting it to a new collection requires only SHACL shapes and YAML display rules, without code changes. This paper describes the software architecture and provides the first empirical evaluation. HERITRACE is deployed in production for the ParaText project, where classical philologists curate bibliographic data about ancient Greek exegetical traditions, and is planned as the editing interface for OpenCitations and as the curation layer for the Social Sciences and Humanities Citation Index within the GRAPHIA Horizon Europe project. Since it operates on any SPARQL-accessible store without data migration, its adoption potential extends to any domain maintaining RDF data. HERITRACE is publicly available on GitHub under the ISC license, archived on Zenodo and Software Heritage Archive, and documented for deployment with a pre-built Docker image.
OpenAlex has recently emerged as a leading alternative to proprietary bibliometric sources. However, concerns remain regarding the quality of its metadata, especially the institutional profiles which are crucial for evaluating organizations. This study assesses the quality of affiliation data in OpenAlex using German research institutions. Publications from top-tier journals were analyzed and institutional publication counts in OpenAlex were systematically compared with counts in Scopus. The results show that OpenAlex generally contains more publications at the journal level, reflecting its broader coverage. However, institutional publication counts in OpenAlex are consistently lower, indicating missing or incorrectly assigned affiliations. Nevertheless, the correlations between institutional outputs in both databases are very high, suggesting that relative institutional rankings remain stable. These findings suggest that OpenAlex is suitable for comparative institutional analyses in academic research but requires further improvement in affiliation metadata before it can be used for evaluation contexts that rely on absolute publication counts.
AI classifier finds reuse higher than citation counts, implying open data benefits are underestimated.
Numerous metascience studies and other initiatives have begun to monitor the prevalence of open science practices, even though it is more important to understand the 'downstream' effects or impacts of open science. PLOS and DataSeer have developed a new LLM-based indicator to measure an important effect of open science: the reuse of research data. Our results show a data reuse rate of 43%, which is higher than the rates detected by established bibliometric techniques. We show that data reuse can be measured at scale using LLMs and generative artificial intelligence. The positive effects of research data sharing and reuse may currently be underestimated.
Four types of relationship reconfigurations with customers, collaborators, and competitors support a new fee structure and alter views on value and community.
Sustaining open data infrastructures over time is a complex puzzle, involving dynamic funding models and relationships with customers, collaborators, and competitors. Despite their importance, these mechanisms are often hidden from view, limiting their applicability to other infrastructures. In this article, we examine how Dryad, a well-known open data infrastructure, has worked toward financial sustainability by reconfiguring relationships with other actors and by strategically implementing a new business model and process of assetization. We identify four types of relationship reconfigurations with customers, collaborators, and competitors critical to Dryad's financial evolution: reinforcing, forging, positioning, and excluding. We then analyze how Dryad's strategic efforts to develop a new fee structure have changed its interpretations of value(s), community, and governance, factors important in an infrastructure's longevity. We conclude by highlighting emerging tensions that provide insight for other open infrastructures working to become financially sustainable. As a whole, our analysis focuses not just on financial mechanisms for funding open data infrastructures (although those emerge) but on the relationships which enable them.
Cross-national comparison of research funding projects is increasingly important for science policy and strategic planning, but language differences remain a major obstacle. In particular, KAKENHI project descriptions are written primarily in Japanese, whereas projects from major overseas funding agencies, such as NSF, NIH, and UKRI, are documented in English.
This study investigates whether multilingual sentence embeddings can support meaningful cross-lingual comparison of research funding projects, with particular attention to the semantic effects of translating Japanese texts into English. For each KAKENHI project, we construct two representations: the original Japanese text and its machine-translated English version, both embedded in a shared semantic space using a multilingual Sentence-BERT model. We then compare their distances and nearest-neighbor relationships with respect to projects from English-language funding agencies.
The results show that the Japanese and translated English representations of the same KAKENHI project are, on average, located closer to one another than to native English projects, indicating substantial cross-lingual alignment. However, the overlap of nearest neighbors between the two representations is limited, averaging 2.9 out of 10. This suggests that multilingual embeddings capture semantic similarity across languages to a meaningful extent, while language differences and translation still affect the local structure of the embedding space.
These findings suggest that multilingual embeddings provide a useful basis for large-scale exploratory comparison of funding projects across countries and agencies. At the same time, they offer an empirical reference for assessing semantic drift when Japanese research project data are translated into English for international analysis.
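The neighbor-overlap measurement can be sketched as follows; the multilingual model name is an assumption in the Sentence-BERT family, and the study's corpus construction is more elaborate:

```python
# Overlap of top-10 English-project neighbors between a Japanese project
# description and its machine translation, in a shared embedding space.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed

def overlap_at_10(ja_text, en_translation, english_projects):
    corpus = model.encode(english_projects, normalize_embeddings=True)
    ja, en = model.encode([ja_text, en_translation], normalize_embeddings=True)
    top_ja = set(np.argsort(corpus @ ja)[-10:])
    top_en = set(np.argsort(corpus @ en)[-10:])
    return len(top_ja & top_en)  # the study reports an average of 2.9 / 10
```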
Do e-scooter speed governance policies yield behavioral safety gains beyond the mechanical cap they impose? A firmware ceiling mechanically prevents speeding, but whether the same riders also generate fewer harsh accelerations and harsh decelerations when the ungoverned mode is withdrawn remains open. We analyze 19.5 million GPS-instrumented trips from 52 South Korean cities (February to November 2023). Our two-stage predict-then-validate design targets two trip-level binary outcomes, any harsh-acceleration event and any harsh-deceleration event. In Phase I, we predict each outcome's within-user reduction under an ungoverned-to-governed substitution, using a rider-heterogeneous random-parameters binary logit on the pre-ban period. In Phase II, we validate these predictions using a difference-in-differences specification that exploits the operator's system-wide December 2023 removal of the ungoverned mode. The causal estimates confirm the Phase I predictions in sign and order of magnitude on both outcomes, are Bonferroni-significant, and satisfy a 3-month pre-ban parallel-trends test. A within-user composition check finds no behavioral offsetting, indicating that firmware removal of an ungoverned mode lowers both harsh-event margins through a purely mechanical channel. These results imply that speed governance policies can deliver measurable safety gains on unconstrained behavioral margins.
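The Phase II validation plausibly follows the canonical two-way fixed-effects difference-in-differences form (notation assumed, not taken from the paper):

```latex
Y_{it} = \beta \,(\mathrm{Treated}_i \times \mathrm{Post}_t)
       + \alpha_i + \lambda_t + \varepsilon_{it}
```

with Y_{it} the binary harsh-event indicator for rider i's trip in period t, Treated_i marking riders exposed to the ungoverned mode before its December 2023 removal, Post_t the post-removal indicator, and beta the effect checked against the Phase I predictions.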
ACM and IEEE are the two premier associations in computing and electrical/electronics engineering, publishing and organizing the great majority of the periodicals and conferences, respectively, that serve these disciplines. Science is a constantly evolving process, and these publication fora are expected to follow its trends. In this article, we focus on the periodicals published by the two associations and seek to detect and/or confirm contemporary science trends as they are reflected in recently established periodical titles. Our study is qualitative rather than quantitative, aiming to reveal patterns immediately comprehensible and validatable by the reader. Among the most notable patterns, we see a growing preference of both associations for the open access mode of publication; we also observe ACM's orientation toward AI-focused periodicals and, most importantly, a significant theme overlap among periodicals of the same association, valid for both ACM and IEEE.
Analysis finds most digital humanities articles omit reusable method details and nearly all venues use blind peer review.
Open Science has become a central framework for promoting transparency, accessibility, and inclusiveness in scholarly research. While the Digital Humanities (DH) community has long embraced openness in terms of research outputs, less attention seems to have been paid to the openness of the methodological and evaluative processes underlying knowledge production. This paper presents an exploratory study that investigates the current state of openness in DH research practices, focusing specifically on research data management documentation and peer review processes. In particular, this study addresses two research questions: (1) to what extent DH publications that describe data explicitly reference external documentation detailing data creation and management processes; and (2) how widely open peer review practices are adopted across DH conferences and journals. The results revealed a limited adoption of open methodological practices. Only a small fraction of the analysed articles provided explicit, reusable documentation of data creation workflows, and no references to data management plans or formal research data management documentation were found. An even more critical picture emerges from the analysis of peer review practices: the vast majority of DH venues continue to rely on traditional single- or double-blind review models, with open peer review adopted in only a few isolated cases.
Network built from 1.3 million papers shows disciplines keeping distinct portfolios while settling on shared tools.
Science advances not only through the accumulation of facts but also through the evolution of tools. Crucially, tools are rarely used in isolation. They form tool portfolios, combinations shaped by a discipline's workflows and analytical demands. Software, near-ubiquitous in modern research and traceable across the published literature, offers a unique window to study tool use in science. Here, we map the software space of science by analyzing mentions of software in 1.3 million publications from 2004 to 2021. We construct a network of 520 software tools linked by disciplinary co-usage, with link strength weighted by proximity based on revealed comparative advantage. This network reveals a structured landscape in which tools cluster into 8 functional communities, including computing and statistics, wet lab instrumentation, and several bioinformatics specializations, with each discipline occupying a distinct position reflecting its characteristic tool portfolios. The breadth of a discipline's tool portfolio is shaped by the nature of its research workflow: fields combining experimental and computational tasks draw on multiple communities, while those with narrower methodological demands concentrate in one. These structural differences are stable across the observation period. At the same time, across all broad disciplinary categories, disciplinary tool portfolios are crystallizing, settling on a common set of tools.
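The proximity weighting described above plausibly builds on the standard Balassa revealed comparative advantage; writing m_{df} for the number of mentions of tool f in discipline d, the standard form is:

```latex
\mathrm{RCA}_{df} =
  \frac{m_{df} \,/\, \sum_{f'} m_{df'}}
       {\sum_{d'} m_{d'f} \,/\, \sum_{d'}\sum_{f'} m_{d'f'}}
```

RCA_{df} > 1 means discipline d mentions tool f more than expected from aggregate usage; the paper's exact proximity construction may differ from this textbook form.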
We present a semantic-structural atlas of transportation research built from 120,323 papers across 34 peer-reviewed journals published between 1967 and 2025, roughly an order of magnitude larger than and a decade beyond Sun and Rahwan's (2017) coauthorship study. We use OpenAlex and Crossref as open, CC0-licensed data sources, resolve author identity through OpenAlex author IDs, ORCID records, and manual alias resolution, and embed every paper with SPECTER2 with Arora-style whitening concatenated with concept TF-IDF and venue linear-discriminant projections. On this substrate we report three findings. First, Leiden on the author-level semantic k-nearest-neighbor graph yields 23 topic communities that agree only weakly with the 172 coauthor communities (normalized mutual information 0.23), opening room for a predictive layer that neither source encodes alone. Second, a multiplex Leiden partition combining both edge types recovers 181 communities and localizes where collaboration and topic structure decouple. Third - the paper's core methodological contribution - we define phantom collaborators, pairs of authors who are top-K semantic neighbors yet three or more hops apart in the coauthor graph, and show via a temporal hold-out (training cutoff 2019) that phantom pairs become real coauthors in 2020-2025 at a rate 16 to 33 times above random, popularity-weighted, and same-venue baselines, with a 68-fold monotone gradient between the highest- and lowest-similarity buckets. All artifacts are released as a live, reproducible web atlas at https://choi-seongjin.github.io/transport-atlas/.
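The phantom-collaborator definition translates almost directly into code; the embeddings, K, and hop threshold below are stand-ins for the paper's SPECTER2-based setup:

```python
# Phantom collaborators: top-K semantic neighbors that are >= 3 hops apart
# (or disconnected) in the coauthor graph.
import networkx as nx
import numpy as np

def phantom_pairs(author_emb, coauthor_graph, K=10, min_hops=3):
    names = list(author_emb)
    X = np.array([author_emb[n] for n in names])
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X.T
    pairs = set()
    for i, a in enumerate(names):
        for j in np.argsort(sims[i])[::-1][1:K + 1]:  # top-K, self excluded
            b = names[j]
            try:
                hops = nx.shortest_path_length(coauthor_graph, a, b)
            except (nx.NetworkXNoPath, nx.NodeNotFound):
                hops = float("inf")  # disconnected pairs also qualify
            if hops >= min_hops:
                pairs.add(frozenset((a, b)))
    return pairs
```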
Matching social-media sales offers to published titles shows up to 23.5% of some proceedings came from organized fraud.
Paper mills are a growing threat to the integrity of science, yet their penetration in conference proceedings remains underexplored despite conferences being more important than journals in some scientific subfields. This study aims to identify papers in conference proceedings whose titles have been offered for sale on social media platforms. We collected more than 4,000 unique publication offers from more than 200 social media channels and used semi-automated methods along with human assessment to match offers with papers published in IEEE conference proceedings. We identified 1,720 papers in 286 IEEE conference proceedings, accounting for up to 23.51% of an individual conference. These problematic papers are co-authored by more than 6,500 researchers from over 3,500 affiliations in 55 countries. The identified papers demonstrate collaboration anomalies, high diversity of affiliations per paper, citation manipulation, a predominance of six-author papers, and content-based irregularities. Our findings show that paper mills are a large, organized, and often public market that commercializes scientific misconduct, not limited to papers, but infiltrating multiple parts of the research ecosystem.
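The semi-automated matching step can be approximated with standard-library fuzzy string comparison; the study adds human assessment on top of steps like this, and the threshold here is an assumption:

```python
# Fuzzy matching of a sale offer's title against published titles.
from difflib import SequenceMatcher

def best_match(offer_title, published_titles, threshold=0.9):
    norm = lambda t: " ".join(t.lower().split())
    score, title = max((SequenceMatcher(None, norm(offer_title), norm(t)).ratio(), t)
                       for t in published_titles)
    return (title, score) if score >= threshold else (None, score)
```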
Survey finds missing metadata and rare citations, urging guidelines for better discovery and reuse
Scientific posters are one of the most common forms of scholarly communication and contain early-stage insights with potential to accelerate scientific discovery. We investigated where posters are shared, to what extent their sharing aligns with the FAIR principles, and how commonly they are reused. We identified 86 platforms hosting posters, with many not assigning persistent identifiers. A total of 150k posters are shared as of 2024 on the 43 platforms where we were able to count, which is relatively low. Looking in more detail at posters shared on Zenodo and Figshare, we found that repositories are not always supporting structured metadata critical for poster discovery, like conference information, and that researchers are not providing such metadata even if they are supported. We also observed that while there is some engagement with posters in terms of views and downloads, citing posters is not yet a common practice. Our recommendations are for the scientific community to encourage poster sharing and reuse and establish clear guidelines to make posters FAIR.
Research methods constitute an indispensable tool for scholars engaged in scientific inquiry. Investigating how scholars use research methods throughout their careers can reveal distinct patterns in method adoption, providing valuable insights for novice researchers in selecting appropriate methods. This study employs a comprehensive dataset comprising full-text journal articles and bibliographic records from the Library and Information Science (LIS) domain. Utilizing an automated classification model based on full-text cognitive analysis, the research methods employed by LIS scholars are systematically identified. Topic modeling is then conducted using Top2Vec. Subsequently, author name disambiguation is performed, and academic age is calculated for each scholar. This study focuses on 435 senior scholars with an academic age of more than 14 years and a consistent publication record at five-year intervals, covering a total of 6,116 articles. The corpus covers 16 research method categories and 20 research topics. The findings indicate that bibliometric methods are the most frequently used across career stages, accounting for 19.61% among early-career scholars and 31.81% among senior scholars. Over the course of a scholarly career, the diversity of research methods initially increases and then declines. Furthermore, scholars exhibit a propensity for combining multiple research methods, including both conventional and unconventional pairings. Notably, the research methods most commonly used by researchers change with age and seniority.
The debate about scholarly knowledge infrastructure has long been framed as a contest between openness and commercial enclosure. This framing distorts both policy and practice. The real tension lies between the persistent cost of producing and refining structured metadata under deep technological friction, and the differentiated demands distinct communities place on data quality, focus and granularity. We introduce the innovation annulus: the zone between freely available structured data and the advancing frontier of commercially refined knowledge products. This zone is a permanent, functional feature of the ecosystem -- not a pathology to eliminate. By analogy with the efficient market hypothesis, its width measures production inefficiency, set by the interplay of friction and demand. Artificial intelligence reshapes the annulus, lowering barriers to basic structuring, raising the threshold at which refinement adds value, and introducing systemic risks through unprovenanced AI-derived metadata. CRediT contributions, funding acknowledgements and AI disclosure statements illustrate the annulus lifecycle. Governance should calibrate the annulus, not abolish it: thin enough to serve research efficiently, wide enough to sustain innovation. A formal welfare framework, analogous to the Nordhaus optimal patent life, characterises the trade-offs and yields testable predictions. The Barcelona Declaration offers a promising forum for boundary governance.
Scientific tools dictate the boundaries of human knowledge, serving as the foundation for perception and exploration. In the era of Big Science, science is increasingly dependent on advanced analytical technologies and experimental platforms. Over the past decades, national and supranational entities have invested massive financial resources, collaborative networks, and collective intelligence to construct Big Science Facilities (BSFs) aimed at generating cutting-edge knowledge. However, empirical evaluations of these machines' actual performance in driving scientific innovation remain scarce. To address this gap, we collected 310,086 publications from 88 global BSFs and constructed a matched control dataset of approximately 3 million publications sharing the same last authors. Our analysis reveals that the utilization of BSFs has expanded significantly since the 1950s. Crucially, publications supported by these facilities exhibit higher recombinant novelty and interdisciplinary integration. Furthermore, this improvement is most pronounced in non-physical-science domains traditionally peripheral to BSFs' core focus, indicating the emergence of a powerful intra-facility knowledge spillover effect. By enriching the Facilitymetrics framework, our findings provide empirical evidence that BSFs act as vital engines for scientific discovery, offering policymakers essential metrics to justify infrastructural investments, while prompting the science of science community to reassess the profound impact of scientific tools on knowledge production.
Researchers who first link two prior collaborators produce work more likely to land in top journals, with stronger effects in larger groups.
In modern scientific collaboration networks, certain researchers play a pivotal role in bridging scholars who have never worked together - a phenomenon we term academic "match-makers." Despite their potential importance, the prevalence, characteristics, benefits, and long-term trajectory of these individuals remain underexplored. Using the Microsoft Academic Graph (MAG), we operationalized a match-maker as an author who, in a given publication, introduced a first-time collaboration between two co-authors, each of whom had previously collaborated with the match-maker but not with each other. We employed a configuration null model to distinguish observed patterns from random chance. Our findings reveal that the match-maker phenomenon is deliberate, prevalent, and consequential. Among authors with over 20 publications, nearly 30% have served as a match-maker, and the probability of acting as one increased eightfold from 1980 to 2019. Publications involving a match-maker are more likely to appear in high-impact journals and exhibit higher disruptiveness - particularly in larger teams - suggesting that match-makers help facilitate what we term integrative disruption. Match-makers tend to emerge early in their careers, peaking around the 20th publication and at an academic age of roughly ten years. While nearly all match-makers eventually experience "abandonment" in the sense that the connected researchers later collaborate without them, their continued involvement remains substantial and is driven by research needs rather than structural factors. This reframes abandonment not as exclusion but as a natural evolution within project-based collaborations. The academic match-maker phenomenon is a strategic feature of collaboration networks characterized by early-career emergence, context-dependent persistence, and tangible contributions to high-impact, disruptive research.
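The operationalization reads directly as an algorithm over chronologically ordered author lists; this sketch assumes disambiguated author ids and papers sorted by date:

```python
# Match-maker events: on each paper, m links first-time coauthors a and b,
# each of whom previously co-published with m but never with each other.
from itertools import combinations

def find_match_makers(papers):
    """papers: lists of author ids, in chronological order."""
    seen = set()      # unordered author pairs that have already co-published
    events = []
    for authors in papers:
        pairs = {frozenset(p) for p in combinations(authors, 2)}
        for p in pairs - seen:            # first-time collaborations only
            a, b = tuple(p)
            for m in authors:
                if m not in p and frozenset((m, a)) in seen \
                               and frozenset((m, b)) in seen:
                    events.append((m, a, b))
        seen |= pairs
    return events
```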
Over 80 percent of high-profile retractions go undetected by offline models, while valid papers are rarely mislabeled.
Large Language Models (LLMs) can be helpful for literature search and summarisation, but retracted articles can confuse them. This article asks three open-weights (offline) LLMs whether 161 high-profile retracted articles had been retracted, performing a similar check for a benchmark multidisciplinary set of 34,070 non-retracted articles. Based on titles and abstracts, in over 80% of cases the LLMs claimed that a retracted article had not been retracted (GPT OSS 120B: 82%; Gemma 3 27B: 84%; DeepSeek R1 72B: 88%). The reasons given for a correct retraction declaration were often wrong, even if detailed. This confirms that LLMs have little ability to distinguish between valid and retracted studies, unless they are allowed to, and do, check online. For the benchmark test, there were only 55 false retraction claims from 34,070 non-retracted full text articles, and 28 false claims when only the title and abstract were entered, suggesting that there is only a small chance that LLMs discount valid studies. When retractions are erroneously claimed, this does not seem to be due to mistakes in the article. Overall, the results give new reasons to be cautious about LLM claims about academic findings.
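A replication-style sketch of the offline check, using the ollama Python client to query a locally served model; the model tag and prompt wording are assumptions, not the article's exact protocol:

```python
# Ask a local open-weights model whether an article is retracted,
# from title and abstract alone (no online checking).
import ollama

PROMPT = ("Based only on the title and abstract below, has this article "
          "been retracted? Answer RETRACTED or NOT RETRACTED, then explain.\n\n"
          "Title: {title}\n\nAbstract: {abstract}")

def check_retraction(title, abstract, model="gemma3:27b"):  # assumed tag
    resp = ollama.chat(model=model, messages=[
        {"role": "user",
         "content": PROMPT.format(title=title, abstract=abstract)}])
    return resp["message"]["content"]
```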
Across scholarly communities, manuscripts face similar evaluative rituals: editors invite experts to privately assess submissions through formal peer reviews. This closed, loosely structured, and publisher-mediated process is now being supplemented by critiques on open, distributed platforms. We call this practice, a blend of three open peer review variants, informal peer review as it is accessible to outsiders, unmediated by publishers, and conducted across public platforms. Informal peer reviewers range from occasional error detectors to experienced sleuths who identify plagiarism, fraud, errors, conflicts of interest, and conceptual flaws. They may interpret methods, clarify jargon, assess value, and connect to related work.
Here, we asked four questions: (1) Who are informal peer reviewers? (2) Where do they work? (3) How do they evaluate research? and (4) What are their impacts? To answer these questions, we conducted a cross-platform digital ethnography with participant observation. We traced discourse across communities over four months and revisited cases after nine and twelve months. From 15 communities, we selected 12 case mentions (10 unique cases) and 8 meta-commentaries from 26 reviewers. Using open and axial coding, we generated 1,080 codes and four themes: reviewers are a motley crew, they self-organize across subpar digital spaces, use deep, uncommon strategies, and they face resistance from authors, publishers, and editors.
Informal peer review, we concluded, is a fragile, minimally governed patchwork of people, platforms, and practices, as well as an emerging evidence infrastructure that can be scaled up. We advise advocates and tool-builders to evolve informal review tools, communities, training, and governance by connecting to scholars' values, reducing participation friction, and rewarding attempts to extend the scholarly dialogue.
Exploratory work from small teams diffuses slowly due to conversion costs while consolidating research gains quick recognition.
Science advances not only by accumulating discovered patterns but by changing how new problems and solutions are expressed. While structural indicators track scholarly attention, they offer only an indirect proxy for the reorganization of meaning. We propose a semantic geometry based on the R-P-C (references, focal publication, and citing publications) framework to quantify how a publication positions itself relative to its knowledge base and diffusion. This geometry identifies three publication types: consolidating, exploratory and balanced. Our results show that the semantic similarity and distance between a publication's knowledge base and diffusion serve as a mechanistic explanation for disruption, with novelty (atypical reference combinations) acting as an antecedent disturbance that triggers a semantic rupture. This is related to team size, where small teams preserve a higher potential for exploratory departures while large collaborations systematically align with paradigmatic consolidation. Crucially, this geometry explains why citation trajectories differ; consolidating research earns rapid recognition by lowering comprehension costs, while exploratory work faces high paradigm conversion costs that result in slower, more selective diffusion. Collectively, this R-P-C framework provides a robust instrument for monitoring the dynamics of scientific paradigms.
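The geometry reduces to comparing a paper's knowledge base and its diffusion in embedding space; the aggregation and thresholds below are illustrative assumptions, not the paper's calibrated values:

```python
# R-P-C sketch: similarity between the centroid of a paper's references (R)
# and the centroid of its citing publications (C).
import numpy as np

def rpc_type(ref_embs, citing_embs, hi=0.8, lo=0.2):
    r = np.mean(ref_embs, axis=0)
    c = np.mean(citing_embs, axis=0)
    sim = float(r @ c / (np.linalg.norm(r) * np.linalg.norm(c)))
    if sim >= hi:
        return "consolidating", sim  # diffusion stays near the knowledge base
    if sim <= lo:
        return "exploratory", sim    # diffusion departs from the knowledge base
    return "balanced", sim
```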
Generative AI systems such as ChatGPT are increasingly used in scientific writing, yet their broader implications for the organization of scientific knowledge remain unclear. We examine whether AI-assisted writing intensity, measured as the share of text in a paper that is predicted to exhibit features consistent with LLM-generated text, is associated with scientific disruption and knowledge recombination. Using approximately two million full-text research articles published between 2021 and 2024 and linked to citation networks, we document a sharp temporal pattern beginning in 2023. Before 2023, higher AI-assisted writing intensity is weakly or negatively associated with disruption; after 2023, the association becomes positive in within-author, within-field analyses. Over the same period, the positive association between AI-assisted writing intensity and cross-field citation breadth weakens substantially, and the negative association with citation concentration attenuates. Thus, the post-2023 increase in disruption is not accompanied by broader knowledge sourcing. These patterns suggest that generative AI is associated with more disruptive citation structures without a corresponding expansion in cross-field recombination. Rather than simply broadening the search space of science, AI-assisted writing may be associated with new forms of recombination built from relatively narrower knowledge inputs.
Peer review shapes which scientific claims enter the published record, but its internal dynamics are hard to measure at scale because reviewer criticism and author revision are usually embedded in long, unstructured correspondence. Here we use a fixed-prompt large language model pipeline to convert the review correspondence of \textit{Nature Communications} papers published from 2017 to 2024 into structured reviewer--author interactions. We find that review pressure is concentrated in the first round and focused disproportionately on core claims rather than peripheral presentation. Higher average opinion strength is also associated with more reviewer disagreement, while review patterns vary little with broad team attributes, consistent with relatively impartial evaluation. Contrary to the intuition that stronger papers should pass review more smoothly, with greater reviewer--author agreement and less extensive revision, we find that stronger criticism, higher-quality comments, and greater revision burden are associated with higher later citation impact within accepted papers. We finally show that fields differ more in review style than in review length, pointing to disciplinary variation in how criticism is negotiated and resolved. These findings position open peer review not just as a gatekeeping mechanism but as a measurable record of how influential scientific claims are challenged, defended, and revised before entering the published record.
Adding citations while drafting in LaTeX often requires leaving the editor, searching for a paper in mind, copying its BibTeX entry into the project bibliography, renaming the cite key, and then returning to the sentence. OverCite is an open-source, lightweight tool that lets authors find, select, and insert citations without leaving the writing environment. In Overleaf, OverCite uses rough citation placeholders (e.g., \citep{Perlmutter1999}) and local sentence context to query ADS/SciX-indexed literature, rank likely matches, and insert the selected reference, without leaving the editor. A companion VS Code extension provides the same functionality for local LaTeX projects. The ADS/SciX database includes astronomy, physics, computer science, mathematics, biology, and all indexed arXiv e-prints, making OverCite useful across a broad range of scientific disciplines.
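The placeholder-parsing step is straightforward to sketch: pull rough cite keys plus nearby sentence context out of the LaTeX source, leaving the ADS/SciX query and ranking stages aside:

```python
# Extract rough \cite/\citep placeholders and local context from LaTeX.
import re

TEX = r"...as shown by \citep{Perlmutter1999}, the expansion is accelerating."

def extract_placeholders(tex, window=80):
    hits = []
    for m in re.finditer(r"\\citep?\{([^}]+)\}", tex):
        start = max(0, m.start() - window)
        hits.append({"key": m.group(1),
                     "context": tex[start:m.end() + window]})
    return hits

print(extract_placeholders(TEX))  # [{'key': 'Perlmutter1999', 'context': ...}]
```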
In the academic landscape, scientific research has been primarily conducted through research institutions, which requires a massive influx of funds from various sources. Presently, these funding bodies have been moving from trust-based funding to performance-based evaluation systems for granting funds to research bodies. This has led to the rise in popularity of various indices or statistics that measure institutional research strength or expertise. Institutional research expertise usually focuses on publication volume and its impact, measured using the widely used h- and g-indices. However, these indices fail to capture the thematic expertise of research institutions. To address this gap, two new expertise indicators, the x-index and the x_d-index, together with bias-adjusted variants, the field-normalised x_d-index and the fractional x_d-index, were introduced recently. Additionally, we propose two new variants, the category-adjusted x-index and the inverse variance weighted x_d-index, which further account for resolvable bias, and a novel statistic, the x_o-index, which acts as a measure of overall research expertise. While several packages that calculate the traditional h- and g-indices exist, these novel expertise indices are yet to be included in such packages. The 'xxdi' R package provides simple functions that implement these expertise indices and their variants, enabling their utilisation by the wider research community. A stable version of the package is available on CRAN (https://doi.org/10.32614/CRAN.package.xxdi) and an in-development version on GitHub (https://github.com/nilabhrardas/xxdi).
Study of 15,000 Nature Communications articles finds papers with new results alone rank higher in citations and top-percentile impact than those combining all three novelty types.
Scientific novelty drives advances at the research frontier, yet it is also associated with heightened uncertainty and potential resistance from incumbent paradigms, leading to complex patterns of scientific impact. Prior studies have primarily examined the relationship between a single dimension of novelty - such as theoretical, methodological, or results-based novelty - and scientific impact. However, because scientific novelty is inherently multidimensional, focusing on isolated dimensions may obscure how different types of novelty jointly shape impact. Consequently, we know little about how combinations of novelty types influence scientific impact. To this end, we draw on a dataset of 15,322 articles published in Nature Communications. Using the DeepSeek-V3 model, we classify articles into three novelty dimensions based on the content of their Introduction sections: theoretical novelty, methodological novelty, and results-based novelty. These dimensions may coexist within the same article, forming distinct novelty configurations. Scientific impact is measured using five-year citation counts and indicators of whether an article belongs to the top 1% or top 10% highly cited papers. Descriptive results indicate that results-based novelty alone and the simultaneous presence of all three novelty types are the dominant configurations in the sample. Regression results further show that articles with results-based novelty only receive significantly more citations and are more likely to rank among the top 1% and top 10% highly cited papers than articles exhibiting all three novelty types. These findings advance our understanding of how multidimensional novelty configurations shape knowledge diffusion.
This study presents a large-scale network dataset, NIH-MPINet, curated from NIH RePORTER and PubMed, characterizing collaboration among multiple Principal Investigators (multi-PIs) on NIH R01-equivalent grants from 2006 to 2023. The network comprises 30,127 PIs as nodes and their collaborations on 86,743 NIH R01-equivalent grants as edges, spanning 888 recipient organizations and supported by 40 NIH Institutes and Centers. We also curated comprehensive metadata, including node-level features such as PI affiliation, alongside edge-level features comprising grant years, titles, and abstracts. Using these data, we constructed a PI collaboration network and identified 19 communities as well as 20 major research topics. Several collaboration communities showed distinct thematic profiles, such as cardiovascular health, cancer immunotherapy, neuroscience, and microbiome research, while genetics and genomics were broadly represented across communities. By incorporating temporal analysis, we observed shifts in research topics and collaboration patterns over time. Topics like healthcare and outcomes research, cognitive health, and Alzheimer's disease have become more prominent in recent years, whereas molecular and cellular biology has seen a relative decline. Overall, this work provides a high-fidelity, feature-rich resource for advancing statistical learning methods and network analysis-based discoveries in the study of long-term biomedical collaboration.
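Building such a network from grant records is straightforward; the sketch below shows one way to do it with networkx. The record fields are hypothetical and the real NIH-MPINet schema is richer.

```python
from itertools import combinations
import networkx as nx

# Hypothetical grant records; the real dataset carries titles, abstracts,
# affiliations, and NIH Institute/Center metadata as well.
grants = [
    {"grant_id": "R01-XXXX", "year": 2019, "pis": ["pi_a", "pi_b", "pi_c"]},
    {"grant_id": "R01-YYYY", "year": 2021, "pis": ["pi_b", "pi_c"]},
]

G = nx.Graph()
for g in grants:
    for u, v in combinations(sorted(g["pis"]), 2):
        if G.has_edge(u, v):
            G[u][v]["grants"].append(g["grant_id"])  # repeated collaboration
        else:
            G.add_edge(u, v, grants=[g["grant_id"]], first_year=g["year"])

# One possible community detection step (the dataset paper's method may differ).
communities = nx.community.louvain_communities(G, seed=0)
print(G.number_of_nodes(), "PIs,", G.number_of_edges(), "edges,",
      len(communities), "communities")
```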
To better align theories of paradigm-shifting discoveries with the empirics identifying them, we propose a novel measure that integrates a discovery's impact, novelty, and tendency to break with the past into a single, coherent index. Calibration using the National Inventor Hall of Fame data reveals that impact, novelty, and disruptiveness are strict complements, meaning, for example, that greater impact cannot substitute for moderate novelty. We illustrate the workings of the measure using data on USPTO patents from 1982 to 2015.
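The abstract does not give the functional form, but strict complementarity can be illustrated (this is an assumption for exposition, not necessarily the authors' measure) with a Leontief-style aggregate over normalized impact $i$, novelty $n$, and disruptiveness $d$:
\[
P = \min\{i,\; n,\; d\}, \qquad i, n, d \in [0, 1].
\]
Under such a form the composite $P$ is capped by its weakest component, so, as the calibration suggests, greater impact cannot raise $P$ past a moderate novelty score.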
Trackable, forkable workflows expose the full reasoning path instead of only polished papers, enabling faster fixes and wider reuse.
The way science is currently practiced shows conclusions but hides how they were reached. Researchers work privately, polish their results, publish a finished paper, and defend it. Errors are punished by retraction rather than corrected by amendment. Alternative directions are pursued through competing papers with no shared history. The reasoning, the dead ends, the trade-offs, the corrections: everything that would let others understand how a conclusion was reached is invisible. Two decades of open science reform have addressed this by opening specific artifacts: papers, data, code, notebooks, protocols. Each is valuable, but the unit remains a finished product. None opens the thinking process itself: the evolving sequence of questions, interpretations, dead ends, and direction changes that constitutes the actual scientific contribution.
This paper argues that opening the process of science (not just its outputs) would produce a step change in the speed of scientific progress, the accessibility of scientific reasoning, the trustworthiness of scientific claims, and the scalability of scientific quality assurance. We identify three properties the workflow needs: visible (the process is open, not just the product), trackable (every change is recorded and attributable), and forkable (anyone can branch from any point with shared history preserved). A visible, trackable flow is inherently verifiable: by humans, by automated tools, by AI agents. Software development adopted this flow decades ago, and the results (faster correction, broader contribution, maintained quality at scale) demonstrate the opportunity for science.
Four states from legacy seeds to published artifacts enable durable migration of mixed human-AI work using local scripts.
We propose \emph{ClawXiv}, a workflow and archive architecture for mixed human--AI research. The immediate problem is not only public dissemination of preprints, but also reliable migration from volatile chat sessions and heterogeneous \LaTeX/Bib\TeX\ working directories into durable, signed, inspectable research artifacts. ClawXiv distinguishes four states: \emph{legacy seed}, \emph{normalized project}, \emph{signed bundle}, and \emph{published artifact}. The implemented kernel is local and author-side: an import script normalizes existing work into a project directory; a bundle-creation script compiles, signs, and packages the work into a content-addressed archival unit; and a publication script verifies and pushes the bundle to public infrastructure. Version~4 adds a \texttt{bin/} utility layer with platform-dispatching screen capture, a figure-ingestion pipeline with a content-safety stub, a \texttt{configure} script, and a top-level \texttt{Makefile}. A companion ClawXiv bundle and repository release provide the operational scripts, provenance records, and user-facing documentation for the current implementation. Code is available at \texttt{github.com/kornai/clawxiv}.
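The content-addressing step generalizes beyond this system. A minimal sketch follows; the file layout and naming scheme are assumptions for illustration, not the released scripts.

```python
import hashlib
import tarfile
from pathlib import Path

def make_bundle(project_dir: str, out_dir: str = ".") -> Path:
    """Package a normalized project and name the archive by its own
    SHA-256 digest, so any later mutation of the bundle is detectable."""
    tmp = Path(out_dir) / "bundle.tar.gz"
    with tarfile.open(tmp, "w:gz") as tar:
        tar.add(project_dir, arcname=Path(project_dir).name)
    digest = hashlib.sha256(tmp.read_bytes()).hexdigest()
    final = Path(out_dir) / f"{digest[:16]}.tar.gz"  # the content address
    tmp.rename(final)
    return final

print(make_bundle("my_project"))
```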
This paper empirically examines the practical validity of the official evaluation criteria underpinning the Research Productivity (PQ) Grant framework, as governed by the Brazilian National Council for Scientific and Technological Development (CNPq). By operationalizing regulatory dimensions (including bibliographic output, human resource training, and scientific recognition) as measurable variables extracted from CVs and OpenAlex bibliometric data, we treat policy-defined indicators as testable hypotheses rather than a priori assumptions. Using a block-based adaptation of the Boruta feature selection algorithm across several machine learning classifiers, we evaluate the statistical contribution of each dimension in distinguishing grant levels, with a focus on identifying top-tier (Level 1A) researchers. Our models achieve high predictive performance, with mean AUC scores reaching 0.96, indicating that PQ levels carry a robust and structured statistical signal. However, explanatory power is heavily concentrated within a limited subset of features, specifically bibliographic production, graduate-level supervision and institutional management roles. Conversely, several criteria explicitly emphasized in the regulations demonstrated no detectable statistical contribution to classification outcomes. These findings reveal a potential misalignment between the formal regulatory framework and the effective signals driving evaluation outcomes, suggesting that the practical evaluative signal is substantially more compact than officially stated and providing evidence-based insights for the refinement and transparency of research assessment policies.
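For readers unfamiliar with Boruta, the sketch below shows the vanilla algorithm on researcher-level indicators via the BorutaPy package; the paper's block-based adaptation is not reproduced, and the input files are hypothetical.

```python
import numpy as np
from boruta import BorutaPy
from sklearn.ensemble import RandomForestClassifier

# Hypothetical inputs: rows are researchers, columns are indicators
# extracted from CVs and OpenAlex records.
X = np.load("researcher_features.npy")
y = np.load("is_level_1a.npy")  # 1 = Level 1A, 0 = otherwise

rf = RandomForestClassifier(n_jobs=-1, class_weight="balanced", max_depth=5)
selector = BorutaPy(rf, n_estimators="auto", random_state=42)
selector.fit(X, y)  # compares real features against shuffled "shadow" copies

print("confirmed features:", np.flatnonzero(selector.support_))
print("tentative features:", np.flatnonzero(selector.support_weak_))
```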
Science currently offers two options for quality assurance, both inadequate. Journal gatekeeping claims to verify both integrity and contribution, but actually measures prestige: peer review is slow, biased, and misses fabricated citations even at top venues. Open science provides no quality assurance at all: the only filter between AI-generated text and the public record is the author's integrity. AI-assisted writing makes both worse by producing more papers faster than either system can absorb.
We propose a third option: measure the paper itself. sciwrite-lint (pip install sciwrite-lint) is an open-source linter for scientific manuscripts that runs entirely on the researcher's machine (free public databases, a single consumer GPU, and open-weights models) with no manuscripts sent to external services. The pipeline verifies that references exist, checks retraction status, compares metadata against canonical records, downloads and parses cited papers, verifies that they support the claims made about them, and follows one level further to check cited papers' own bibliographies. Each reference receives a per-reference reliability score aggregating all verification signals. We evaluate the pipeline on 30 unseen papers from arXiv and bioRxiv with error injection and LLM-adjudicated false positive analysis.
As an experimental extension, we propose SciLint Score, combining integrity verification with a contribution component that operationalizes five frameworks from philosophy of science (Popper, Lakatos, Kitcher, Laudan, Mayo) into computable structural properties of scientific arguments. The integrity component is the core of the tool and is evaluated in this paper; the contribution component is released as experimental code for community development.
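The first verification step described above, checking that a cited record exists, can be illustrated against the public Crossref API. This sketch is not the sciwrite-lint implementation, which also consults retraction databases and parses the cited papers themselves.

```python
import requests

def crossref_record(doi: str) -> dict | None:
    """Return canonical Crossref metadata for a DOI, or None if the
    cited record does not exist in Crossref."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    if resp.status_code == 404:
        return None
    resp.raise_for_status()
    return resp.json()["message"]

rec = crossref_record("10.1000/example-doi")  # placeholder DOI
print("reference exists" if rec else "reference not found in Crossref")
```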
This paper presents a comprehensive dataset of doctoral theses defended in France between 1985 and 2025, constructed from multiple national academic metadata sources. The dataset is primarily based on data from the French national thesis platform and is enriched using additional authority and bibliographic databases to improve data quality, completeness, and interoperability. The data production pipeline includes the aggregation of heterogeneous sources, the correction of inconsistent identifiers, the enrichment of person and institution records, and the construction of derived variables describing academic careers, jury participation, institutional affiliations, and thesis characteristics. Additional identifiers from major academic repositories and library catalogues are integrated to facilitate linkage with external data sources and future dataset extensions. The resulting dataset provides structured information at the thesis, individual, and institutional levels, enabling both descriptive and relational analyses. This resource is particularly suited for research on doctoral education, academic networks, supervision practices, jury composition, institutional collaboration, and the evolution of research communities over time. The paper documents the data sources, processing pipeline, feature construction, data quality issues, and limitations, with the objective of facilitating reuse of the dataset by other researchers and supporting future extensions and longitudinal analyses of the academic system.
This paper presents Lishu, a deployable web artifact for searching, monitoring, and interpreting literature from elite business and management journals. The system integrates the UTD-24 and Financial Times 50 (FT50) journal pools and combines Crossref, OpenAlex, Unpaywall, and optional CORE enrichment to support a broader research workflow than article retrieval alone. In the current implementation, users can search across curated journal pools, apply multi-journal filters, preview open full-text excerpts when available, generate citations and exports, inspect topic and affiliation structure, produce review drafts, simulate virtual peer review, and assemble grant-oriented research narratives. Unlike static journal directories or general-purpose academic search engines, the artifact is explicitly scoped to high-status management outlets and is designed to support sensemaking tasks that matter to researchers, doctoral students, and lab managers: identifying recent work, surfacing topical concentration, comparing themes, and converting search output into actionable research material. Architecturally, the system emphasizes source transparency, modularity, and low-cost public deployability through a lightweight Node.js service layer, a multi-page client interface, optional large-language-model enhancement for interpretation and writing support, and a free-tier persistence path through Supabase. The paper contributes both a functioning design artifact and an extensible architectural pattern for journal-pool-specific scholarly discovery and writing support, with implications for digital research infrastructure in information systems and business scholarship.
The use of Large Language Models (LLMs) like ChatGPT and DeepSeek for translation and language polishing is a welcome development, reducing the longstanding publishing barrier for non-English speakers. Assessing the uptake of this facility gives useful insights into the changing nature of scientific writing. Although the prevalence of LLM-associated terms has been tracked across science in abstracts, and in full texts for biomedical research, their science-wide prevalence in full texts is unknown. In response, this article investigates an expanded set of 80 potentially LLM-associated terms during 2021-2025 in a science-wide full-text collection from the publisher MDPI (1.25 million articles), partly focusing on the 73 journals that published at least 500 articles in 2021. The results demonstrate the increasing prevalence of LLM-associated terms science-wide in full texts to 2024, with some terms declining from 2024 to 2025 and others continuing to increase. LLMs seem to avoid some terms (e.g., thus, moreover), and a few terms have stronger associations with abstracts than full texts (e.g., enhanced) or the opposite (e.g., leveraged). The term family "underscore" had the biggest increase: up to 29-fold. There are substantial differences between journals in the apparent use of LLMs for writing, from lower uptake in the life sciences to higher uptake in the social sciences, electronic engineering, and environmental science. Fields in which there is currently low uptake may need improved or specialist support, such as for reliably translating complex formulae, before the full benefits of automatic translation can be realised.
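Prevalence measurement of this kind reduces to counting, per year, the share of articles whose full text matches a term family at least once. A schematic version, with an illustrative term list and corpus layout rather than the study's own:

```python
import re
from collections import defaultdict

TERMS = ["underscore", "delve", "intricate", "leveraged"]  # illustrative subset

def prevalence(articles):
    """articles: iterable of (year, full_text) pairs. Returns, per year,
    the share of articles containing each term family at least once."""
    hits = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for year, text in articles:
        totals[year] += 1
        for term in TERMS:
            # Word-family match: 'underscore' also counts 'underscores',
            # 'underscoring', etc.
            if re.search(rf"\b{term}\w*", text, flags=re.IGNORECASE):
                hits[year][term] += 1
    return {y: {t: hits[y][t] / totals[y] for t in TERMS} for y in totals}
```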
Scaling laws describe how language model capabilities grow with compute and data, but say nothing about how long a model matters once released. We provide the first large-scale empirical account of how scientists adopt and abandon language models over time. We track 62 LLMs across over 108k citing papers (2018-2025), each with at least three years of post-release data, and classify every citation as active adoption or background reference to construct per-model adoption trajectories that raw citation counts cannot resolve. We find three regularities. First, scientific adoption follows an inverted-U trajectory: usage rises after release, peaks, and declines as newer models appear, a pattern we term the \textit{scientific adoption curve}. Second, this curve is compressing: each additional release year is associated with a 27\% reduction in time-to-peak adoption ($p < 0.001$), robust to minimum-age thresholds and controls for model size. Third, release timing dominates model-level attributes as a predictor of lifecycle dynamics. Release year explains both time-to-peak and scientific lifespan more strongly than architecture, openness, or scale, though model size and access modality retain modest predictive power for total adoption volume. Together, these findings complement scaling laws with adoption-side regularities and suggest that the forces driving rapid capability progress may be the same forces compressing scientific relevance.
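One way to read the 27% figure (the exact specification is the paper's) is as a log-linear model of time-to-peak $T_m$ for model $m$ on its release year:
\[
\log T_m = \beta_0 + \beta_1\,\mathrm{year}_m + \varepsilon_m, \qquad e^{\hat\beta_1} \approx 0.73,
\]
so each additional release year multiplies the expected time-to-peak adoption by roughly 0.73, a 27% compression, with the stated robustness to minimum-age thresholds and model-size controls.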
No server or code required as Google Sheets and local API keys power large language model and machine learning assistance for evidence work.
Background: Server-based screening tools impose subscription costs, while open-source alternatives require coding skills. Objectives: We developed a browser extension that provides no-code, serverless artificial intelligence (AI)-assisted title and abstract screening and examined its functionality. Methods: TiAb Review Plugin is an open-source Chrome browser extension (available at https://chromewebstore.google.com/detail/tiab-review-plugin/alejlnlfflogpnabpbplmnojgoeeabij). It uses Google Sheets as a shared database, requiring no dedicated server and enabling multi-reviewer collaboration. Users supply their own Gemini API key, stored locally and encrypted. The tool offers three screening modes: manual review, large language model (LLM) batch screening, and machine learning (ML) active learning. For ML evaluation, we re-implemented the default ASReview active learning algorithm (TF-IDF with Naive Bayes) in TypeScript to enable in-browser execution, and verified equivalence against the original Python implementation using 10-fold cross-validation on six datasets. For LLM evaluation, we compared 16 parameter configurations across two model families on a benchmark dataset, then validated the optimal configuration (Gemini 3.0 Flash, low thinking budget, TopP=0.95) with a sensitivity-oriented prompt on five public datasets (1,038 to 5,628 records, 0.5 to 2.0 percent prevalence). Results: The TypeScript classifier produced top-100 rankings 100 percent identical to the original ASReview across all six datasets. For LLM screening, recall was 94 to 100 percent with precision of 2 to 15 percent, and Work Saved over Sampling at 95 percent recall (WSS@95) ranged from 48.7 to 87.3 percent. Conclusions: We developed a functional browser extension that integrates LLM screening and ML active learning into a no-code, serverless environment, ready for practical use in systematic review screening.
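The active-learning loop at issue is compact enough to sketch. The version below mirrors the ASReview default (TF-IDF features, Naive Bayes classifier, certainty-based querying) in outline only, in Python rather than the extension's TypeScript; it assumes the seed set contains at least one relevant and one irrelevant record.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def screen(texts, labels, seed_idx, budget=100):
    """labels: reviewer decisions (1 relevant / 0 irrelevant), filled in
    as records are shown; seed_idx must cover both classes."""
    X = TfidfVectorizer().fit_transform(texts)
    reviewed = list(seed_idx)
    for _ in range(budget):
        clf = MultinomialNB().fit(X[reviewed], [labels[i] for i in reviewed])
        scores = clf.predict_proba(X)[:, 1]      # P(relevant) for every record
        scores[reviewed] = -1.0                  # never re-show reviewed records
        reviewed.append(int(np.argmax(scores)))  # show most-likely-relevant next
    return reviewed
```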
Publication sets based on authorship position and time windows are matched to calls with word embeddings and ranked by percentile for each researcher.
Grant recommendation systems remain one of the least explored areas within academic recommender systems, and existing proposals are typically tied to specific funding agencies or disciplinary domains. This paper presents an institution-level reproducible framework for matching researchers to funding opportunities by combining bibliometric profiling with semantic matching. Rather than representing each researcher through a single aggregated profile, the framework constructs multiple publication sets defined by bibliometric criteria such as authorship position and time window, each independently compared against funding calls using word embeddings. Within-researcher normalisation and percentile-based ranking transform cosine similarity scores into actionable recommendations. A case study of 3,013 researchers from the University of Granada and 291 Horizon Europe topics verifies the framework and shows that the four indicators capture complementary signals.
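The matching core, embedding similarity converted to within-researcher percentiles, can be sketched as follows. The embedding model is an assumption, not necessarily the one used in the paper.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

def call_percentiles(pub_set_text: str, call_texts: list[str]) -> np.ndarray:
    """Cosine similarity of one publication set against every call,
    converted to percentile ranks within that researcher's profile."""
    vecs = model.encode([pub_set_text] + call_texts, normalize_embeddings=True)
    sims = vecs[1:] @ vecs[0]          # cosine, since vectors are unit-norm
    ranks = sims.argsort().argsort()   # 0 = least similar call
    return 100.0 * ranks / max(len(call_texts) - 1, 1)
```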
The accelerating pace of scientific publishing makes it increasingly difficult for researchers to stay current. We present Paper Espresso, an open-source platform that automatically discovers, summarizes, and analyzes trending arXiv papers. The system uses large language models (LLMs) to generate structured summaries with topical labels and keywords, and provides multi-granularity trend analysis at daily, weekly, and monthly scales through LLM-driven topic consolidation. Over 35 months of continuous deployment, Paper Espresso has processed over 13,300 papers and publicly released all structured metadata, revealing rich dynamics in the AI research landscape: a mid-2025 surge in reinforcement learning for LLM reasoning, non-saturating topic emergence (6,673 unique topics), and a positive correlation between topic novelty and community engagement (2.0x median upvotes for the most novel papers). A live demo is available at https://huggingface.co/spaces/Elfsong/Paper_Espresso.
Disambiguating scholars with identical names is essential for accurate authorship assignment and robust large-scale scientometric research. Existing methods are often designed for Latin-script metadata and perform poorly on Chinese names. In international publications, Chinese names typically appear as Romanized Pinyin, which is highly ambiguous as it can map to multiple distinct characters. Chinese characters, in contrast, reduce but do not eliminate this ambiguity, and are rarely available in international records. To address both challenges, we propose a rule-based disambiguation framework that integrates co-authorship networks, citation networks, author affiliations, and content similarity. We apply this framework to 65,241 physics papers from the China National Knowledge Infrastructure (CNKI), spanning over 70 years of data. On a human-annotated sample of 80 name pairs, our method achieves F1-scores of 0.88 for Pinyin names and 0.89 for character-based names, outperforming two baseline approaches, with improvements driven primarily by higher recall. The comparable performance across both writing systems shows that our approach is script-agnostic, enabling reliable large-scale scientometric analyses.
Positive bias in the training literature now limits LLMs as research tools, training-data consumers, and peer reviewers.
Scientific publishing systematically filters out negative results. We argue that this long-standing asymmetry has become an urgent problem in the era of large language models, which inherit the positive bias of the literature they are trained on, face an impending shortage of high-quality training data, and are increasingly deployed as both research tools and peer reviewers. We analyze three ways in which LLMs have changed the value of failure data and show that the systematic absence of such data degrades their utility as research tools, training data consumers, and peer reviewers alike. We outline experimental protocols to validate these claims and discuss the structural conditions under which a failure-inclusive publishing culture could emerge.
Large language models with web search are increasingly used in scientific publishing agents, yet they still produce BibTeX entries with pervasive field-level errors. Prior evaluations tested base models without search, which does not reflect current practice. We construct a benchmark of 931 papers across four scientific domains and three citation tiers -- popular, low-citation, and recent post-cutoff -- designed to disentangle parametric memory from search dependence, with version-aware ground truth accounting for multiple citable versions of the same paper. Three search-enabled frontier models (GPT-5, Claude Sonnet-4.6, Gemini-3 Flash) generate BibTeX entries scored on nine fields and a six-way error taxonomy, producing ~23,000 field-level observations. Overall accuracy is 83.6%, but only 50.9% of entries are fully correct; accuracy drops 27.7pp from popular to recent papers, revealing heavy reliance on parametric memory even when search is available. Field-error co-occurrence analysis identifies two failure modes: wholesale entry substitution (identity fields fail together) and isolated field error. We evaluate clibib, an open-source tool for deterministic BibTeX retrieval from the Zotero Translation Server with CrossRef fallback, as a mitigation mechanism. In a two-stage integration where baseline entries are revised against authoritative records, accuracy rises +8.0pp to 91.5%, fully correct entries rise from 50.9% to 78.3%, and regression rate is only 0.8%. An ablation comparing single-stage and two-stage integration shows that separating search from revision yields larger gains and lower regression (0.8% vs. 4.8%), demonstrating that integration architecture matters independently of model capability. We release the benchmark, error taxonomy, and clibib tool to support evaluation and mitigation of citation hallucinations in LLM-based scientific writing.
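The Crossref fallback path can be illustrated directly (this is not the clibib implementation): resolve a title to canonical metadata through Crossref's bibliographic search, then build the revised BibTeX entry from the authoritative record rather than from the model.

```python
import requests

def crossref_lookup(title: str) -> dict | None:
    """Deterministic metadata retrieval: best bibliographic match for a
    title from the public Crossref API."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 1},
        timeout=30,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    return items[0] if items else None

hit = crossref_lookup("Attention is all you need")
if hit:
    print(hit["DOI"], hit["title"][0])  # fields for a revised BibTeX entry
```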
Interviews show humanities researchers change preferences with tasks, demand provenance, seek challenges, and work in long threads rather than discrete sessions.
User models for recommender systems (RecSys) typically assume stable preferences, similarity-based relevance, and session-bounded interactions -- assumptions derived from high-volume consumer contexts. This paper investigates these assumptions for humanities scholars working with digital archives. Following a human-centered design approach, we conducted focus groups and analyzed interview data from 18 researchers. Our analysis identifies four dimensions where scholarly information-seeking diverges from common RecSys user modeling: (1) context volatility -- preferences shift with research tasks and domain expertise; (2) epistemic trust -- relevance depends on verifiable provenance; (3) contrastive seeking -- researchers seek items that challenge their current direction; and (4) strand continuity -- research spans long-term threads rather than discrete sessions. We discuss implications for user modeling and outline how these dimensions relate to collaborative filtering, content-based, and session-based recommendation. We propose these dimensions as a diagnostic framework applicable beyond archives to similar application domains where typical user modeling assumptions may not hold.
Non-existent papers are appearing in hundreds of documents as AI tools systematically pair real authors with fake titles.
This paper investigates how generative AI produces and propagates hallucinated academic references, focusing on the recurring non-existent citation 'Education Governance and Datafication' attributed to Ben Williamson and Nelli Piattoeva. Drawing on 137 accessible source papers identified through Google Scholar and Google searches, the study analyses the structure, recurrence, and onward citation of this phantom reference. It shows that hallucinated citations are not random inventions but patterned recombinations of real authors, journals, dates, and keywords, with duplication occurring in nearly 30% of cases. The paper also reports a structured interrogation of ChatGPT 5-mini about how it generates citations and finds that, absent verification, the model reconstructs plausible references from learned patterns rather than factual recall. Finally, ten AI-generated essays on datafication and school governance were examined: while most references were genuine or partly accurate, 9.2% remained hallucinated, including an exact match to the most common phantom citation. The findings highlight ongoing risks to academic integrity and show that web-enabled AI still does not fully eliminate fabricated references.
Patents cite hybrid models more often, but gold and diamond OA show stronger semantic links, especially inside patent bodies.
Scientific research is a key input into technological innovation, yet not all scientific knowledge is equally mobilized in patents. This paper examines how different scientific publishing models shape both the selection of scientific publications cited in patents and their cognitive alignment with patented technologies. Using large-scale data on non-patent references linking patents to scientific publications, combined with metadata from OpenAlex, we compare the Open Access (OA) structure of patent-cited science to that of the scientific literature. We then assess cognitive alignment using semantic similarity between patent abstracts and the abstracts of cited publications, distinguishing between citations appearing in the front section of patents and those embedded in the body of patent texts. We find that patent citations disproportionately draw on publications disseminated through highly visible and institutionally established publishing channels, particularly hybrid and bronze OA models, indicating strong selection effects. However, this dominance in citation counts does not translate into stronger cognitive alignment with patented technologies. On the contrary, publications in fully OA journals (gold and diamond OA) exhibit equal or higher semantic proximity, especially when cited in the body of patents. These results suggest that the contribution of OA to innovation depends less on access alone than on how different publishing models are embedded in information infrastructures that shape the visibility, discoverability, and use of scientific knowledge.
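The alignment measure reduces to pairwise semantic similarity between abstracts. A minimal sketch with an assumed embedding model (the paper's own choice is not specified in the abstract):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model

def alignment(patent_abstract: str, cited_abstracts: list[str]):
    """One cosine similarity per cited publication; higher values mean
    closer cognitive alignment between patent and cited science."""
    p = model.encode(patent_abstract, convert_to_tensor=True)
    c = model.encode(cited_abstracts, convert_to_tensor=True)
    return util.cos_sim(p, c)[0]
```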