pith. machine review for the scientific record.

arxiv: 2604.23430 · v1 · submitted 2026-04-25 · 💻 cs.IR · cs.AI · cs.CL · cs.DL · cs.SE

Recognition: unknown

Automating Categorization of Scientific Texts with In-Context Learning and Prompt-Chaining in Large Language Models

Authors on Pith · no claims yet

Pith reviewed 2026-05-08 07:13 UTC · model grok-4.3

classification 💻 cs.IR · cs.AI · cs.CL · cs.DL · cs.SE
keywords large language models · prompt chaining · in-context learning · scientific text categorization · hierarchical classification · prompt engineering · taxonomy-based classification

The pith

Prompt chaining in large language models outperforms in-context learning for hierarchical scientific text classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests off-the-shelf large language models on the task of assigning scientific texts to categories within a given hierarchical scheme. It compares simple in-context learning, where example cases sit inside the prompt, against prompt chaining, a step-by-step sequence of prompts that uses earlier answers to guide later decisions. Experiments show chaining produces higher accuracy, especially on the broader domain and subject layers, exceeding prior models at those levels while topic-level results stay near 50 percent. If the pattern holds, the approach offers a practical way to organize growing scientific collections without training new models from scratch for each taxonomy.
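To make the contrast concrete, here is a minimal sketch of the ICL side, assuming a generic chat-completion callable. The prompt wording, example texts, and `call_llm` function are illustrative placeholders, not the paper's actual templates.

```python
# Minimal sketch of the ICL baseline: labeled examples sit inside a single
# prompt. `call_llm` is a stand-in for any chat-completion API; the example
# texts and domain names are illustrative, not the paper's templates.

EXAMPLES = [
    ("Transformer models for neural machine translation.", "Computer Science"),
    ("CRISPR screening of tumor suppressor genes.", "Life Sciences"),
]

def build_icl_prompt(text, domains):
    shots = "\n".join(f"Text: {t}\nDomain: {d}" for t, d in EXAMPLES)
    return (
        "Assign the text to exactly one domain from: " + ", ".join(domains)
        + "\n\n" + shots + f"\n\nText: {text}\nDomain:"
    )

def classify_icl(text, domains, call_llm, temperature=0.0):
    # one prompt, one answer: the in-prompt examples do all the guiding
    return call_llm(build_icl_prompt(text, domains), temperature=temperature).strip()
```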

Core claim

The central finding is that prompt chaining yields superior classification accuracy compared to pure in-context learning, particularly when applied to the nested structure of the taxonomy. Large language models using this approach outperform state-of-the-art models for first-level domain prediction and perform better than an older BERT model for second-level subject prediction, though accuracy at the third-level topic remains around 50 percent even with chaining.

What carries the argument

Prompt chaining, a sequence of linked prompts that breaks the hierarchical decision into ordered steps and feeds prior outputs forward to refine each subsequent choice.
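As a rough illustration of this step-by-step pattern, the sketch below walks a three-level taxonomy (domain, then subject, then topic), feeding each answer into the next prompt. The `taxonomy` dictionary and `call_llm` callable are assumptions, not the paper's artifacts.

```python
# A rough illustration of prompt chaining over a three-level taxonomy
# (domain -> subject -> topic). Each step feeds the previous answer forward
# to narrow the candidate set. `call_llm` and the `taxonomy` dict
# ({domain: {subject: [topics]}}) are illustrative assumptions.

def ask(call_llm, level, text, candidates, context=""):
    prompt = (
        f"{context}Choose exactly one {level} from: {', '.join(candidates)}\n"
        f"Text: {text}\n{level.capitalize()}:"
    )
    return call_llm(prompt).strip()

def classify_chained(text, taxonomy, call_llm):
    # a real pipeline would validate each answer against the candidate set
    # and retry on mismatch; omitted here for brevity
    domain = ask(call_llm, "domain", text, list(taxonomy))
    subject = ask(call_llm, "subject", text, list(taxonomy[domain]),
                  context=f"The text belongs to the domain '{domain}'. ")
    topic = ask(call_llm, "topic", text, taxonomy[domain][subject],
                context=f"The text belongs to '{domain}' > '{subject}'. ")
    return domain, subject, topic
```

The design choice is that each later prompt only sees candidates consistent with the earlier answer, which is what narrows the decision at the deeper, harder levels.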

If this is right

  • Large language models equipped with prompt chaining reach higher accuracy than prior state-of-the-art systems for domain-level assignment of scientific texts.
  • The same chaining method produces stronger results than BERT models on subject-level classification tasks.
  • Topic-level classification within the hierarchy stays limited to roughly 50 percent accuracy with current prompting techniques.
  • Temperature settings influence the stability and accuracy of the classification outputs under both prompting strategies (a sweep sketch follows this list).
  • Chaining proves especially useful when the classification scheme itself contains multiple nested layers.
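
A hedged sketch of the temperature sweep implied above: run the chained classifier at several temperature values and score accuracy at each taxonomy level. The temperature grid and dataset fields are assumptions; the abstract does not specify the exact protocol.

```python
# Hedged sketch of a temperature sweep: score each taxonomy level at several
# temperature values. `classify_chained` is the chaining sketch shown earlier;
# the temperature grid and dataset fields are assumptions, not the paper's setup.

def accuracy(pairs):
    """Fraction of (predicted, gold) pairs that match exactly."""
    return sum(p == g for p, g in pairs) / len(pairs)

def sweep_temperatures(dataset, taxonomy, call_llm, temperatures=(0.0, 0.5, 1.0)):
    results = {}
    for temp in temperatures:
        # bind the current temperature into the LLM callable
        llm = lambda prompt, _t=temp: call_llm(prompt, temperature=_t)
        pairs = {"domain": [], "subject": [], "topic": []}
        for rec in dataset:  # rec: {"text", "domain", "subject", "topic"}
            d, s, t = classify_chained(rec["text"], taxonomy, llm)
            pairs["domain"].append((d, rec["domain"]))
            pairs["subject"].append((s, rec["subject"]))
            pairs["topic"].append((t, rec["topic"]))
        results[temp] = {level: accuracy(p) for level, p in pairs.items()}
    return results  # e.g. results[0.0]["topic"] -> topic-level accuracy at T=0
```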

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The sequential prompting pattern could transfer to other hierarchical labeling tasks that are not limited to scientific literature.
  • If chaining reduces reliance on task-specific fine-tuning, organizations could adapt the method to new taxonomies with minimal additional data.
  • Combining chaining with retrieval of similar past examples might raise topic-level accuracy without changing the base model (see the retrieval sketch after this list).
  • Testing the same workflow on texts from a single narrow field versus mixed domains would reveal how sensitive the gains are to content variety.
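
A speculative sketch of that retrieval idea: choose in-context examples by embedding similarity (for instance with a Sentence-BERT encoder) before chaining. The `embed` function and labeled `pool` are hypothetical; the paper does not report this experiment.

```python
# Speculative sketch: choose in-context examples by embedding similarity
# before chaining. `embed` (e.g. a Sentence-BERT encoder) and the labeled
# `pool` of records are hypothetical; the paper does not test this.

import numpy as np

def retrieve_examples(text, pool, embed, k=3):
    """Return the k labeled records whose texts are most similar to `text`."""
    q = embed(text)
    q = q / (np.linalg.norm(q) + 1e-9)  # normalize once for cosine similarity

    def score(rec):
        v = embed(rec["text"])
        return float(q @ (v / (np.linalg.norm(v) + 1e-9)))

    return sorted(pool, key=score, reverse=True)[:k]
```

The retrieved records would then serve as the shots in the ICL prompt, or be prepended to each step of the chain.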

Load-bearing premise

The labels used as ground truth accurately match the intended hierarchical scheme, and the tested models and strategies will perform similarly on other collections of scientific texts.

What would settle it

Apply the identical chaining procedure and models to an independently labeled collection of scientific texts drawn from a different source. If accuracy falls back to or below the in-context learning baseline, the reported gains do not hold.

Figures

Figures reproduced from arXiv: 2604.23430 by Gautam Kishore Shahi, Oliver Hummel.

Figure 1
Figure 1. Hierarchical structure of domains, subjects, and topics within the ORKG taxonomy. Names are multivalent (for example, Thermal Engineering/Process Engineering), which could be mapped as equivalent classes in an ontology but remain challenging for LLMs to predict accurately, even with different evaluation techniques; many subjects are also multi-label, such as Urbanism, Spatial Planning, Transportation an… view at source ↗
Figure 2
Figure 2. Methodology used in the identification of research areas; the upper part shows hierarchical identification of research areas using in-context learning and prompt chaining. view at source ↗
read the original abstract

The relentless expansion of scientific literature presents significant challenges for navigation and knowledge discovery. Within Research Information Retrieval, established tasks such as text summarization and classification remain crucial for enabling researchers and practitioners to effectively navigate this vast landscape, so that efforts have increasingly been focused on developing advanced research information systems. These systems aim not only to provide standard keyword-based search functionalities but also to incorporate capabilities for automatic content categorization within knowledge-intensive organizations across academia and industry. This study systematically evaluates the performance of off-the-shelf Large Language Models (LLMs) in analyzing scientific texts according to a given classification scheme. We utilized the hierarchical ORKG taxonomy as a classification framework, employing the FORC dataset as ground truth. We investigated the effectiveness of advanced prompt engineering strategies, namely In-Context Learning (ICL) and Prompt Chaining, and experimentally explored the influence of the LLMs' temperature hyperparameter on classification accuracy. Our experiments demonstrate that Prompt Chaining yields superior classification accuracy compared to pure ICL, particularly when applied to the nested structure of the ORKG taxonomy. LLMs with prompt chaining outperform the state-of-the-art models for domain (1st level) prediction and show even better performance for subject (2nd level) prediction compared to the older BERT model. However, LLMs are not yet able to perform well in classifying the topic (3rd level) of research areas based on this specific hierarchical taxonomy, as they only reach about 50% accuracy even with prompt chaining.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper evaluates off-the-shelf LLMs for hierarchical classification of scientific texts into the ORKG taxonomy (domain/subject/topic levels), using the FORC dataset as ground truth. It compares in-context learning (ICL) against prompt chaining, reports that chaining yields higher accuracy (outperforming SOTA models at level 1 and BERT at level 2), examines the temperature hyperparameter, and notes that level-3 accuracy remains around 50%.

Significance. If the empirical results prove robust and reproducible, the work indicates that prompt chaining can improve LLM-based hierarchical categorization over standard ICL, offering a practical alternative to fine-tuned models like BERT for organizing scientific literature in research information systems.

major comments (2)
  1. [Abstract] The central claims (prompt chaining > ICL; outperformance vs. SOTA/BERT at levels 1-2) rest on the assumption that FORC labels are accurate, complete, and correctly aligned with the nested ORKG taxonomy. The abstract states FORC is used as ground truth but provides no mapping procedure, quality checks, or validation of label fidelity; any mismatches would make the reported accuracies (especially the level-3 drop) uninterpretable.
  2. [Experimental protocol] Performance numbers and comparisons are stated without details on exact prompt templates, number of in-context examples, specific temperature values tested, statistical significance, error bars, or full experimental protocol. This prevents verification of whether gains are robust or sensitive to post-hoc choices.
minor comments (1)
  1. [Abstract] The abstract mentions exploring the temperature hyperparameter but does not summarize its observed influence; ensure the main text reports these results clearly with any associated figures or tables.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments, which have helped us identify areas for improvement in our manuscript. Below, we provide point-by-point responses to the major comments and indicate how we plan to revise the paper accordingly.

read point-by-point responses
  1. Referee: [Abstract] The central claims (prompt chaining > ICL; outperformance vs. SOTA/BERT at levels 1-2) rest on the assumption that FORC labels are accurate, complete, and correctly aligned with the nested ORKG taxonomy. The abstract states FORC is used as ground truth but provides no mapping procedure, quality checks, or validation of label fidelity; any mismatches would make the reported accuracies (especially the level-3 drop) uninterpretable.

    Authors: We agree with the referee that the abstract lacks sufficient detail on the FORC dataset's alignment with the ORKG taxonomy, which is crucial for interpreting the results. The manuscript does describe the use of FORC as ground truth, but the mapping procedure and validation are not explicitly outlined. To rectify this, we will revise the abstract to include a brief mention of the alignment and add a detailed description of the mapping process, including any quality checks, in the Methods section of the revised manuscript. This will enhance the interpretability of our accuracy figures, particularly at level 3. revision: yes

  2. Referee: [Experimental protocol] Performance numbers and comparisons are stated without details on exact prompt templates, number of in-context examples, specific temperature values tested, statistical significance, error bars, or full experimental protocol. This prevents verification of whether gains are robust or sensitive to post-hoc choices.

    Authors: We concur that a more complete experimental protocol is essential for reproducibility. While the manuscript discusses the temperature hyperparameter and compares ICL with prompt chaining, it does not provide the exact prompt templates or other specifics mentioned. In the revised version, we will include the full experimental details, such as the prompt templates in an appendix, the number of in-context examples used, the specific temperature values tested, and statistical measures including significance tests and error bars. These additions will allow readers to verify the robustness of the reported performance gains. revision: yes
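
One way the promised significance testing could look, as a minimal sketch: a paired bootstrap over per-document correctness indicators for the two strategies. The 0/1 input lists are hypothetical; the abstract reports no such test.

```python
# Illustrative sketch of the promised robustness check: a paired bootstrap
# over per-document correctness for the two strategies. The 0/1 input lists
# are hypothetical; the abstract reports no such test.

import random

def paired_bootstrap(correct_a, correct_b, n_resamples=10_000, seed=0):
    """Fraction of resamples in which strategy A scores strictly above B."""
    rng = random.Random(seed)
    n = len(correct_a)
    wins = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample documents with replacement
        if sum(correct_a[i] for i in idx) > sum(correct_b[i] for i in idx):
            wins += 1
    return wins / n_resamples  # values near 1.0 suggest a robust gain
```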

Circularity Check

0 steps flagged

No circularity: purely empirical comparisons without derivations or self-referential reductions

full rationale

The paper conducts an experimental evaluation of LLM prompting techniques (In-Context Learning and Prompt Chaining) for hierarchical classification on the FORC dataset using the ORKG taxonomy. No equations, parameter fittings, or derivation chains exist; reported results are direct accuracy metrics compared to baselines such as BERT. The central claims rest on empirical performance differences rather than any reduction to inputs by construction. Assumptions about FORC label quality and ORKG alignment constitute data-validity concerns, not circularity. No self-citations are load-bearing for the methodology or results.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The work rests on the assumption that the FORC dataset is a faithful ground truth for the ORKG taxonomy and that LLM responses to prompts reflect genuine classification capability rather than memorization or prompt artifacts. No free parameters beyond the temperature hyperparameter are explicitly fitted in the abstract, and no new entities are invented.

free parameters (1)
  • temperature hyperparameter
    The paper experimentally varies this value to study its influence on classification accuracy, implying it is treated as a tunable factor.
axioms (2)
  • domain assumption The ORKG taxonomy provides a valid hierarchical classification scheme for scientific texts.
    Invoked when using the taxonomy levels as the target for prediction.
  • domain assumption The FORC dataset labels are accurate and aligned with ORKG categories.
    Used as ground truth for all accuracy measurements.

pith-pipeline@v0.9.0 · 5579 in / 1437 out tokens · 40882 ms · 2026-05-08T07:13:22.004502+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

48 extracted references · 12 canonical work pages · 5 internal anchors

  1. [1] Abburi, H., Suesserman, M., Pudota, N., Veeramani, B., Bowen, E., Bhattacharya, S.: Generative AI text classification using ensemble LLM approaches. arXiv preprint arXiv:2309.07755 (2023)
  2. [2] Abu Ahmad, R., Borisova, E., Rehm, G.: FoRC@NSLP2024: Overview and insights from the field of research classification shared task. In: International Workshop on Natural Scientific Language Processing and Research Knowledge Graphs. pp. 189–204. Springer (2024)
  3. [3] Al Nazi, Z., Hossain, M.R., Al Mamun, F.: Evaluation of open and closed-source LLMs for low-resource language with zero-shot, few-shot, and chain-of-thought prompting. Natural Language Processing Journal p. 100124 (2025)
  4. [4] Auer, S., Mann, S.: Towards an open research knowledge graph. The Serials Librarian 76(1-4), 35–41 (2019)
  5. [5] Bird, S., Dale, R., Dorr, B.J., Gibson, B.R., Joseph, M.T., Kan, M.Y., Lee, D., Powley, B., Radev, D.R., Tan, Y.F., et al.: The ACL Anthology Reference Corpus: A reference dataset for bibliographic research in computational linguistics. In: LREC (2008)
  6. [6] Bornmann, L., Haunschild, R., Mutz, R.: Growth rates of modern science: a latent piecewise growth curve approach to model publication numbers from established and new literature databases. Humanities and Social Sciences Communications 8(1), 1–15 (2021)
  7. [7] Chang, Y., Wang, X., Wang, J., Wu, Y., Yang, L., Zhu, K., Chen, H., Yi, X., Wang, C., Wang, Y., et al.: A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology 15(3), 1–45 (2024)
  8. [8] Desale, S.K., Kumbhar, R.M.: Research on automatic classification of documents in library environment: a literature review. KO Knowledge Organization 40(5), 295–304 (2014)
  9. [9] Devatine, N., Abraham, L.: Assessing human editing effort on LLM-generated texts via compression-based edit distance. arXiv preprint arXiv:2412.17321 (2024)
  10. [10] Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
  11. [11] Enamoto, L., Santos, A.R., Maia, R., Weigang, L., Filho, G.P.R.: Multi-label legal text classification with BiLSTM and attention. International Journal of Computer Applications in Technology 68(4), 369–378 (2022)
  12. [12] Feuerriegel, S., Hartmann, J., Janiesch, C., Zschech, P.: Generative AI. Business & Information Systems Engineering 66(1), 111–126 (2024)
  13. [13] Gao, A.: Prompt engineering for large language models. Available at SSRN 4504303 (2023)
  14. [14] Giglou, H.B., D'Souza, J., Auer, S.: LLMs4Synthesis: Leveraging large language models for scientific synthesis. arXiv preprint arXiv:2409.18812 (2024)
  15. [15] Golub, K., Hagelbäck, J., Ardö, A.: Automatic classification of Swedish metadata using Dewey Decimal Classification: a comparison of approaches. Journal of Data and Information Science 5(1), 18–38 (2020)
  16. [16] Hacker, P., Engel, A., Mauer, M.: Regulating ChatGPT and other large generative AI models. In: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency. pp. 1112–1123 (2023)
  17. [17] Herrmannova, D., Knoth, P.: An analysis of the Microsoft Academic Graph. D-Lib Magazine 22(9/10), 37 (2016)
  18. [18] Hong, Z., Ward, L., Chard, K., Blaiszik, B., Foster, I.: Challenges and advances in information extraction from scientific literature: a review. JOM 73(11), 3383–3400 (2021)
  19. [19] Huang, Z., Xu, W., Yu, K.: Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991 (2015)
  20. [20] Jaradeh, M.Y., Oelen, A., Farfar, K.E., Prinz, M., D'Souza, J., Kismihók, G., Stocker, M., Auer, S.: Open Research Knowledge Graph: Next generation infrastructure for semantic scholarly knowledge. In: Proceedings of the 10th International Conference on Knowledge Capture. pp. 243–246 (2019)
  21. [21] Jiang, M., D'Souza, J., Auer, S., Downie, J.S.: Improving scholarly knowledge representation: Evaluating BERT-based models for scientific relation classification. In: Digital Libraries at Times of Massive Societal Transition: 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, Kyoto, Japan, November 30–December 1, 2020, Proceedi...
  22. [22] Kalyan, K.S.: A survey of GPT-3 family large language models including ChatGPT and GPT-4. Natural Language Processing Journal p. 100048 (2023)
  23. [23] Kinney, R., Anastasiades, C., Authur, R., Beltagy, I., Bragg, J., Buraczynski, A., Cachola, I., Candra, S., Chandrasekhar, Y., Cohan, A., et al.: The Semantic Scholar open data platform. arXiv preprint arXiv:2301.10140 (2023)
  24. [24] Liu, S., Yu, S., Lin, Z., Pathak, D., Ramanan, D.: Language models as black-box optimizers for vision-language models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12687–12697 (2024)
  25. [25] Mahapatra, R., Gayan, M., Jamatia, B., et al.: Artificial intelligence tools to enhance scholarly communication: An exploration based on a systematic review (2024)
  26. [26] Morgan, J., Chiang, M.: Ollama. https://ollama.com (2024), online; accessed 6 August 2024
  27. [27] Mosca, E., Abdalla, M.H.I., Basso, P., Musumeci, M., Groh, G.: Distinguishing fact from fiction: A benchmark dataset for identifying machine-generated scientific papers in the LLM era. In: Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023). pp. 190–207 (2023)
  28. [28] Murphy, K.P.: Probabilistic machine learning: an introduction. MIT Press (2022)
  29. [29] Nah, F., Cai, J., Zheng, R., Pang, N.: An activity system-based perspective of generative AI: Challenges and research directions. AIS Transactions on Human-Computer Interaction 15(3), 247–267 (2023)
  30. [30] National Science Foundation: Publication output by region, country, or economy, and by scientific field. https://ncses.nsf.gov/pubs/nsb202333/publication-output-by-region-country-or-economy-and-by-scientific-field (2023), accessed: 2025-09-23
  31. [31] Pal, S., Bhattacharya, M., Islam, M.A., Chakraborty, C.: AI-enabled ChatGPT or LLM: a new algorithm is required for plagiarism-free scientific writing. International Journal of Surgery 110(2), 1329–1330 (2024)
  32. [32] Perełkiewicz, M., Poświata, R.: A review of the challenges with massive web-mined corpora used in large language models pre-training. arXiv preprint arXiv:2407.07630 (2024)
  33. [33] Pertsas, V., Kasapaki, M., Constantopoulos, P.: An annotated dataset for transformer-based scholarly information extraction and linguistic linked data generation. In: Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024. pp. 84–93 (2024)
  34. [34] Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (11 2019), https://arxiv.org/abs/1908.10084
  35. [35] Rous, B.: Major update to ACM's Computing Classification System. Communications of the ACM 55(11), 12–12 (2012)
  36. [36] Scott, M.L.: Dewey Decimal Classification. Libraries Unlimited (1998)
  37. [37] Shahi, G., Hummel, O.: On the effectiveness of large language models in automating categorization of scientific texts. In: Proceedings of the 27th International Conference on Enterprise Information Systems - Volume 1: ICEIS. pp. 544–554. INSTICC, SciTePress (2025). https://doi.org/10.5220/0013299100003929
  38. [38] Shahi, G.K., Hummel, O.: Enhancing research information systems with identification of domain experts. In: Proceedings of the Bibliometric-enhanced Information Retrieval Workshop (BIR) at the European Conference on Information Retrieval (ECIR 2024). CEUR Workshop Proceedings, CEUR-WS.org (March 2024)
  39. [39] Shahi, G.K., Nandini, D.: FakeCovid – a multilingual cross-domain fact check news dataset for COVID-19. In: Proceedings of the 14th International AAAI Conference on Web and Social Media (2020)
  40. [40] Team, G., Mesnard, T., Hardin, C., Dadashi, R., Bhupatiraju, S., Pathak, S., Sifre, L., Rivière, M., Kale, M.S., Love, J., et al.: Gemma: Open models based on Gemini research and technology. arXiv preprint arXiv:2403.08295 (2024)
  41. [41] Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., et al.: LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023)
  42. [42] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in Neural Information Processing Systems 30 (2017)
  43. [43] Wang, J.: An extensive study on automated Dewey Decimal Classification. Journal of the American Society for Information Science and Technology 60(11), 2269–2286 (2009)
  44. [44] Wang, K., Shen, Z., Huang, C., Wu, C.H., Dong, Y., Kanakia, A.: Microsoft Academic Graph: When experts are not enough. Quantitative Science Studies 1(1), 396–413 (2020)
  45. [45] Wang, S., Hu, T., Xiao, H., Li, Y., Zhang, C., Ning, H., Zhu, R., Li, Z., Ye, X.: GPT, large language models (LLMs) and generative artificial intelligence (GAI) models in geospatial science: a systematic review. International Journal of Digital Earth 17(1), 2353122 (2024)
  46. [46] Young, J.S., Lammert, M.: ChatGPT for classification: Evaluation of an automated course mapping method in academic libraries (2024)
  47. [47] Zhang, C., Tian, L., Chu, H.: Usage frequency and application variety of research methods in library and information science: Continuous investigation from 1991 to 2021. Information Processing & Management 60(6), 103507 (2023)
  48. [48] Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., et al.: A survey of large language models. arXiv preprint arXiv:2303.18223 (2023)