Where's the Structure? A Systematic Literature Review of Empirical Research on Human-AI Collaboration and Hybrid Intelligence for Learning
Pith reviewed 2026-06-28 18:22 UTC · model grok-4.3
The pith
A review of 62 empirical studies finds that human-AI collaboration for learning benefits from structured processes, and it maps the structures currently in use and the gaps that remain.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The review of 62 empirical studies characterizes collaboration processes, their structures, and contexts of application in human-AI collaboration for learning, while extracting emerging design knowledge and research gaps.
What carries the argument
The systematic literature review of empirical studies on human-AI collaboration, used to identify and categorize collaboration structures and processes.
Load-bearing premise
The assumption that the 62 selected studies adequately represent the entire field of human-AI collaboration for learning and that the categorization of structures and gaps is accurate.
What would settle it
Finding a significant number of additional empirical studies with effective unstructured human-AI collaboration, or a replication review reaching substantially different conclusions on structures and gaps.
Artificial intelligence (AI) has been applied across educational contexts to support learning. One approach to such support is "human-AI collaboration" (also termed "hybrid intelligence"), where human(s) and AI components interact to promote human learning. However, as in human-to-human computer-supported collaborative learning (CSCL), unstructured interaction does not necessarily produce an effective learning experience. This paper reports a systematic literature review of empirical studies (N=62) on human-AI collaboration and hybrid intelligence for learning support. The review characterizes collaboration processes, their structures, and contexts of application. It also extracts emerging design knowledge and research gaps. Researchers and technology designers can use these findings as a starting point for structuring more effective AI-enhanced technologies for collaboration, in educational practice and future research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports a systematic literature review of N=62 empirical studies on human-AI collaboration and hybrid intelligence for learning. It aims to characterize collaboration processes, their structures, and contexts of application, extract emerging design knowledge, and identify research gaps to inform the design of AI-enhanced collaborative technologies.
Significance. If the synthesis rests on a rigorous, transparent, and representative sample with clearly documented methods, the extracted structures, design knowledge, and gaps could provide a useful foundation for researchers and designers working at the intersection of AI and computer-supported collaborative learning.
major comments (2)
- [Abstract] Abstract: The abstract states N=62 and high-level goals but supplies no information on search strategy, inclusion criteria, quality assessment, or inter-rater reliability, so it is impossible to judge whether the synthesis supports the stated claims about collaboration processes and research gaps.
- [Methods (or equivalent section describing the review process)] Review process / Methods section: No description of databases searched, search strings, screening process, data extraction protocol, or thematic synthesis method is provided. This information is load-bearing for determining whether the 62 studies constitute a representative sample and whether the reported structures and gaps accurately reflect the literature.
minor comments (1)
- Add a PRISMA flow diagram or equivalent table documenting the study identification, screening, and inclusion steps.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback on the transparency of our systematic review. We agree that the current manuscript version does not adequately document the review process and will revise accordingly to strengthen the paper.
- Referee: [Abstract] The abstract states N=62 and high-level goals but supplies no information on search strategy, inclusion criteria, quality assessment, or inter-rater reliability, so it is impossible to judge whether the synthesis supports the stated claims about collaboration processes and research gaps.
Authors: We agree that the abstract should convey more information about the review's methodological rigor. In the revised version we will expand the abstract (within length constraints) to include a concise statement on the search strategy, inclusion/exclusion criteria, quality assessment, and synthesis approach.
Revision: yes
- Referee: [Methods (or equivalent section describing the review process)] Review process / Methods section: No description of databases searched, search strings, screening process, data extraction protocol, or thematic synthesis method is provided. This information is load-bearing for determining whether the 62 studies constitute a representative sample and whether the reported structures and gaps accurately reflect the literature.
Authors: We acknowledge that the submitted manuscript lacks a sufficiently detailed Methods section. We will add a complete Methods section that reports the databases searched, exact search strings, PRISMA screening process, data extraction protocol, inter-rater reliability statistics, and the thematic synthesis procedure. This addition will directly address the concern about representativeness and allow readers to evaluate the validity of the extracted structures and gaps.
Revision: yes
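The rebuttal promises inter-rater reliability statistics without naming one. As an illustration only, the sketch below computes Cohen's kappa for two coders, which is one common choice for categorical coding. It is an assumption that the authors would use this statistic; the function name `cohen_kappa` and the toy labels are hypothetical and are not data from the reviewed paper.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters chose the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category; kappa is undefined, report full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes for six sources (illustrative only, not from the review).
coder_1 = ["assistant", "assistant", "teammate", "assistant", "teammate", "assistant"]
coder_2 = ["assistant", "teammate", "teammate", "assistant", "teammate", "assistant"]
kappa = cohen_kappa(coder_1, coder_2)
```

With these toy labels the coders agree on five of six items and chance agreement is 0.5, so kappa is about 0.67. A review with more than two coders would need a different statistic, such as Fleiss' kappa or Krippendorff's alpha.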
Circularity Check
No circularity: literature review with no derivations or self-referential claims
This is a systematic literature review synthesizing findings from 62 external empirical studies. No models, equations, parameters, predictions, or uniqueness theorems are derived. The central claims rest on external literature rather than any reduction to the authors' own inputs, fits, or prior self-citations. No load-bearing steps match the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The 62 studies identified through the search strategy are representative of empirical research on human-AI collaboration for learning.
Reference graph
Works this paper leans on
-
[1]
The form combined closed categories (e.g., educational level in IQ1.2; micro-/macro-type of collaboration structure as part of IQ3.1) with open responses (e.g., how is the collaboration process between learner and AI? IQ2.4). All team members independently coded a purposefully selected subset of four sources; disagreements and doubts were discussed until ...
2006
-
[2]
Regarding educational settings (IQ1.2, Figure 3, top-right), most studies were conducted in higher education (57/62 studies)
largely reflects our aim of identifying more established work (and the databases selected). Regarding educational settings (IQ1.2, Figure 3, top-right), most studies were conducted in higher education (57/62 studies). Only a few addressed other levels, such as C.-H. Lin et al.’s (2025) exploration of GenAI for second-language writing in K-12, or were leve...
2025
-
[3]
with versus without
and project-based learning (14 papers), which is consistent with our focus on HAIC/HI and with the fact that projects are often developed through teamwork. Figure 3. Descriptive characteristics of the N=62 reviewed papers. The reviewed papers’ research designs (IQ1.6) also reflect an early-stage, technology-driven field. Most were...
2024
-
[4]
hybrid adaptivity
define HI as the "combination of human and machine intelligence, augmenting human intellect and capabilities instead of replacing them and achieving goals that were unreachable by either humans or machines" (p. 491). Notably, none of the reviewed studies reconceptualized these general definitions for learning-specific contexts. The work of Dellermann and ...
2021
-
[5]
AI assistant
further triangulates researchers' conceptualizations. A large majority (52/62 papers) used an "AI assistant" mode, in which humans offload tasks or expect answers from the AI (e.g., a ChatGPT chatbot). Despite the HAIC/HI framing, only 10 studies portrayed a "teammate AI" — in which the AI operates as a roughly equal team member (e.g., a system promoting ...
2023
-
[6]
collaboration
or as a teammate (Darban, 2024), reflecting the terminological fuzziness typical of early-stage research areas (cf. Borup et al., 2006; Rip & Voß, 2013). An inductive analysis of collaboration setups (IQ2.4) showed that most studies gave learners a vanilla, non-customized AI chatbot for student-initiated "collaboration" (34 studies), with some providing a...
2024
-
[7]
macro scripts/structures
or process-specific (e.g., designated AI use phases, as in Min et al., 2025). Technological scaffolding integrated into system interfaces was less common (6 papers). Another recurring setup inserted the AI as an actor in human-to-human conversations (6 papers; e.g., Gutiérrez-Ferré et al., 2024). While learner initiative dominated, notable exceptions exis...
2025
-
[8]
micro scripts/structures
– and an overlapping set of "micro scripts/structures" (25/38), which scaffold the interaction itself through devices such as sentence starters or question prompts (cf. Kobbe et al., 2007). As noted under IQ2.4, these structures were often delivered socially (15/38, e.g., teacher instructions as in Garro Mena,
2007
-
[9]
In 21 of those 38 studies, the constraints were additionally driven by LLM prompting (as in CodeTutor, Lyu et al.,
but were sometimes also embedded in technology (15/38, e.g., user interface phases/transitions, as in Weber et al., 2025). In 21 of those 38 studies, the constraints were additionally driven by LLM prompting (as in CodeTutor, Lyu et al.,
2025
-
[10]
AI creates/answers, humans refine
"AI creates/answers, humans refine" (46/62 papers): Students delegate tasks or pose questions to the AI, then assess the AI-generated artefacts or answers, deciding whether further refinement is needed and engaging in multiple cycles of human feedback and AI refinement, either by modifying outputs themselves or by issuing additional requests to the AI. In...
2025
-
[11]
Humans create, AI assesses
"Humans create, AI assesses" (18/62 papers): Students carry out a learning task, typically producing an artefact such as a document or drawing, and at various points may request AI assessment and feedback. That feedback may be structured by a pedagogical framework (akin to a micro structure) or constrained loosely, or not at all, e.g., through LLM prompti...
2025
-
[12]
AI participation in human collaboration
"AI participation in human collaboration" (7/62 papers): Students collaborate through computer support (e.g., a chat), and the AI monitors that collaboration, intervening in three ways: (a) providing an assessment of the ongoing human collaboration (Cai et al., 2024; Sankaranarayanan et al.,
2024
-
[13]
coach" or
– for example, a chatbot detecting unequal participation and suggesting more opportunities for less-active students (Cai et al., 2024); (b) answering a question raised by a participating student (Cai et al., 2024), potentially triggering a shift toward structure #2; or (c) playing the role of another team member (5 papers) – for instance, an LLM-based cha...
2024
-
[14]
Beyond these collaboration structures, a key concern in recent literature is the tension among human control vs
AI participation in human collaboration (bottom). Beyond these collaboration structures, a key concern in recent literature is the tension among human control vs. AI automation, and agency (IQ3.2) which has implications for the reliability, safety, and trustworthiness of human-centered AI (HCAI) approaches (Shneiderman, 2020). These tensions are well-reco...
2020
-
[15]
metacognitive laziness
or "metacognitive laziness" (Y. Fan et al., 2025). Regarding human vs. AI control, only in Mlynář (2024) do students exercise full control over the AI tool, as they are the ones actually building the machine learning model. Gyasi (2025) presents another notable case, allowing students to change the AI's "mode of contribution" (see IQ2.3) by switching betw...
2025
-
[16]
we encourage graduate students to reassess their SMART goals and action plans with their human mentors for personalized support
and find advantages in AI agents assuming various roles/personas, thus being perceived as realistic, competent, and dependable partners (Edwards et al., 2025). The reviewed papers offer a wide variety of more concrete design guidelines. These include using gamification to increase motivation (Aslan et al., 2024), having humans double-check AI outputs (“we...
2025
-
[17]
and integrating interventions in wider contexts (e.g., going beyond coding to computational modeling, in Chen et al., 2024). Other proposals include combining AI agents with self-monitoring SRL (Self-Regulated Learning) checklists to reduce machine dominance and increase humanization of feedback (Darvishi et al.,
2024
-
[18]
are particularly good at recognizing errors or misunderstandings in students' tasks
and leveraging AI misrecognition as a learning opportunity rather than merely an error (Song et al., 2022). Since AI systems "are particularly good at recognizing errors or misunderstandings in students' tasks" (Weber et al., 2025, p. 669), instructional and technology designers can use them to spot specific errors and provide targeted formative feedback ...
2022
-
[19]
Authors also stress the need for AI literacy, including prompting skills (6 papers) and broader understanding of AI capabilities and limitations (12 papers)
(10 papers) and using misrecognitions or failures as learning opportunities (Song et al., 2022). Authors also stress the need for AI literacy, including prompting skills (6 papers) and broader understanding of AI capabilities and limitations (12 papers). Other works call for aligning HAIC/HI interventions with overall curriculum (e.g., Hwang et al.,
2022
-
[20]
co-learning
(9 papers), or for creating a community of all related stakeholders, e.g., educators, researchers, and developers (Alier et al., 2025). Worth highlighting on its own is Bosch et al.'s (2025) work, noted above as a rare case of genuine "co-learning", in which both human and AI learn through interaction with each other and the environment. The authors provi...
2025
-
[21]
co-creation
explicitly identify HAIC/HI structures and their associated interactions as a primary target for future research, reinforcing the conclusion from IQ2.4 that the collaborative dimension of HAIC/HI remains peripheral to most reviewed works. Discussion Implications of the Review Our analysis of 62 empirical studies on HAIC/HI for student learning reflects an...
2025
-
[22]
Hybrid intelligence
– remains debatable, given the typical disparity in human and AI goals and the fact that, in the overwhelming majority of cases, AIs do not learn (see Bosch et al., 2025 for an exception). "Hybrid intelligence" is therefore more literally precise, focusing less on the interaction and relationships between humans and AIs, and more...
2025
-
[23]
collaboration
More detailed descriptions of human-AI interactions: as established in (CS)CL research (e.g., Dillenbourg et al., 1996; Stahl, 2006), granular accounts of "collaboration" are essential for understanding learning phenomena in these new settings
1996
-
[24]
Exploration of more complex collaboration structures: the structures surfaced in this review are relatively primitive (see IQ3.1). Emerging theoretical and empirical work is starting to identify more sophisticated micro- and macro-level HAIC/HI structures (e.g., Maya, 2024; Prieto et al., 2023), but empirical testing and comparison (against each other or ...
2024
-
[25]
novelty effects
More longitudinal studies: moving beyond one-shot interventions is necessary to understand HAIC/HI as a potentially distinct form of learning, to rule out "novelty effects", and to examine both its benefits and unwanted side-effects (such as “metacognitive laziness” in Y. Fan et al., 2025)
2025
-
[26]
Decision Support
More rigorous evaluation of learning gains and learner skills, given that assessing the effects of interventions on learning (not merely artifact quality or task efficiency), despite being the ostensible goal of the field, remains comparatively rare (see IQ1.9 above; and Yan, Greiff, et al., 2025 for a broader discussion in the c...
2025
-
[27]
peer/equal
– such as assistant, coach, or teammate – the rarely-encountered "peer/equal" role, in which AI mimics a fellow student with a comparable knowledge level, merits deeper exploration. Adjacent literature reviews identify overlapping recurrent roles: learning partner (Deng & Yu, 2023), motivator, peer/co-learner (Han et al.,
2023
-
[28]
– all of which can be repurposed in a way that is pedagogically sound. Sharples (2023) proposes more specific AI functions for learning conversations: possibility engine, Socratic opponent, collaboration coach… categories that partly overlap our empirically-sourced ones. Other emerging sources describe pedagogical roles for GenAI in...
2023
-
[29]
where is the structure?
can be directly incorporated into prompts in LLM-based HAIC/HI systems. The paper's title question ("where is the structure?") also points to how structures are implemented. Our synthesis (IQ2.4, IQ3.1) reveals a majority of studies relying on socially-implemented structures, such as teacher instructions. This approach is flexible and low-cost but unrelia...
2013
-
[30]
We therefore acknowledge that, by publication, some findings may already be outdated
made for a slower review and writeup. We therefore acknowledge that, by publication, some findings may already be outdated. That said, our unsystematic reading of more recent sources appears to confirm the general trends identified (definitional fuzziness, lack of interaction detail, failure to measure learning – see, e.g., Ong et al., 2026; F. Zhang et a...
2026
-
[31]
may follow. Regarding future research, a priority should be convening groups of HAIC/HI (for learning) experts to engage in pattern synthesis, a practice well established in collaborative learning and learning design research (Baggetun et al., 2004; Goodyear,
2004
-
[32]
with the goal of deriving formal design patterns in the Alexandrian sense. Given the volume of recent work and likely tacit knowledge accumulating among researchers not yet codified in publications, formats such as Mor and colleagues’ "participatory pattern workshops" could be especially productive for building the design knowledge needed to support more ...
-
[33]
https://doi.org/10.48550/ARXIV.2503.16307 Akata, Z., Balliet, D., De Rijke, M., Dignum, F., Dignum, V., Eiben, G., Fokkens, A., Grossi, D., Hindriks, K., Hoos, H., Hung, H., Jonker, C., Monz, C., Neerincx, M., Oliehoek, F., Prakken, H., Schlobach, S., Van Der Gaag, L., Van Harmelen, F., … Welling, M. (2020). A Research Agenda for Hybrid Intelligence: Augm...
-
[34]
https://doi.org/10.1007/s44163-024-00203-7 Alexander, C., Ishikawa, S., Silverstein, M., Jacobson, M., King, I. F., & Angel, S. (1977). A pattern language: Towns, buildings, construction. Oxford University Press. Alier, M., Pereira, J., García-Peñalvo, F. J., Casañ, M. J., & Cabré, J. (2025). LAMB: An open-source software framework to create artificial in...
-
[35]
https://doi.org/10.3390/su15042940 Dillenbourg, P. (1999). What do you mean by “Collaborative Learning”? In P. Dillenbourg (Ed.), Collaborative Learning. Cognitive and Computational Approaches (pp. 1–19). Elsevier Science. Dillenbourg, P. (2013). Design for classroom orchestration. Computers and Education, 69, 485–492. Dillenbour...
-
[36]
https://doi.org/10.1186/s40594-025-00537-3 Fan, Y., Tang, L., Le, H., Shen, K., Tan, S., Zhao, Y., Shen, Y., Li, X., & Gašević, D. (2025). Beware of metacognitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance. British Journal of Educational Technology, 56(2), 489–530. https://doi.org/10.1111/bje...
-
[37]
https://doi.org/10.1186/s41239-023-00426-1 Lee, G.-G., Mun, S., Shin, M.-K., & Zhai, X. (2025). Collaborative Learning with Artificial Intelligence Speakers: Pre-service Elementary Science Teachers’ Responses to the Prototype. Science & Education, 34(2), 847–875. https://doi.org/10.1007/s11191-024-00526-y Lin, C.-H., Zhou, K., Li...
-
[38]
https://doi.org/10.5334/2008-13 Nguyen, A., Hong, Y., Dang, B., & Huang, X. (2024). Human-AI collaboration patterns in AI-assisted academic writing. Studies in Higher Education, 49(5), 847–864. https://doi.org/10.1080/03075079.2024.2323593 Nguyen, A., Ilesanmi, F., Dang, B., Vuorenmaa, E., & Järvelä, S. (2024). Hybrid Intelligenc...
-
[39]
https://doi.org/10.3390/fi16080268 Prieto, L. P., Asensio-Perez, J. I., Munoz-Cristobal, J. A., Dimitriadis, Y. A., Jorrin-Abellan, I. M., & Gomez-Sanchez, E. (2013). Enabling Teachers to Deploy CSCL Designs across Distributed Learning Environments. IEEE Transactions on Learning Technologies, 6(4), 324–336. https://doi.org/10.110...
-
[40]
https://doi.org/10.1613/jair.1.12360 Wiethof, C., & Bittner, E. A. C. (2021). Hybrid Intelligence – Combining the Human in the Loop with the Computer in the Loop: A Systematic Literature Review. International Conference on Information Systems (ICIS). Wollny, S., Schneider, J., Di Mitri, D., Weidlich, J., Rittberger, M., & Drachsler, H. (2021). Are We Ther...
-
[41]
https://doi.org/10.1186/s41239-019-0171-0 Zhai, C., Wibowo, S., & Li, L. D. (2024). The effects of over-reliance on AI dialogue systems on students’ cognitive abilities: A systematic review. Smart Learning Environments, 11(1),
-
[42]
All content was reviewed and verified by the author
https://doi.org/10.1186/s40561-024-00316-7 Zhang, F., Gou, J., Shen, K. N., Camarinha-Matos, L. M., & Wang, Z. (2025). Effects of AI teammates on learning behavior in Human-AI collaboration environments: A perspective on self-regulated learning. Education and Information Technologies, 30(18), 26801–26825. https://doi.org/10.1007/s10639-025-13717-z ...