pith. machine review for the scientific record.

arxiv: 2604.19042 · v1 · submitted 2026-04-21 · 💻 cs.IR

Recognition: unknown

STK-Adapter: Incorporating Evolving Graph and Event Chain for Temporal Knowledge Graph Extrapolation

Boyan Shi, Huaiyu Wan, Junfeng Shen, Shengnan Guo, Shuyuan Zhao, Wei Chen, Weijie Zhang, Xinrui Hou, Youfang Lin

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 02:36 UTC · model grok-4.3

classification 💻 cs.IR
keywords temporal knowledge graph · extrapolation · large language models · mixture of experts · adapter · cross-modality alignment · event chains

The pith

The STK-Adapter integrates spatial-temporal information from evolving graphs and event chains into large language models via mixture-of-experts modules for improved temporal knowledge graph extrapolation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Temporal knowledge graph extrapolation predicts future events from historical facts, but integrating graph structures with large language models often loses key spatial-temporal details and progressively dilutes structural features during fine-tuning. The paper proposes STK-Adapter to bridge this gap by adding three specialized mixture-of-experts components: one for spatial and temporal patterns in the graph, one for semantics in event sequences, and one for deep alignment of the two modalities via guided attention. This setup lets the model retain essential structural information while leveraging the reasoning power of language models. Reported experiments show better performance than existing approaches and effective transfer to new datasets.
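As a concrete anchor, here is a minimal sketch of how such a three-branch adapter could be wired. The class names, dense soft routing, and additive fusion below are illustrative assumptions, not the paper's implementation (which uses TKG-guided attention for the alignment step):

```python
# Hypothetical sketch of a three-branch MoE adapter (not the paper's code).
# Assumed inputs: embeddings from a frozen evolving-graph encoder, already
# aligned to token positions upstream, and token hidden states from the LLM.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEBranch(nn.Module):
    """Dense soft-routed expert mixture (real MoEs often use sparse top-k)."""
    def __init__(self, dim: int, n_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.gate(x), dim=-1)              # (..., n_experts)
        outs = torch.stack([e(x) for e in self.experts], -1)   # (..., dim, n_experts)
        return (outs * weights.unsqueeze(-2)).sum(-1)          # (..., dim)

class STKAdapterSketch(nn.Module):
    """Three branches: graph patterns, event-chain semantics, alignment."""
    def __init__(self, graph_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(graph_dim, llm_dim)  # graph -> LLM semantic space
        self.st_moe = MoEBranch(llm_dim)           # spatial-temporal patterns
        self.ea_moe = MoEBranch(llm_dim)           # event-chain semantics
        self.cma_moe = MoEBranch(llm_dim)          # cross-modality fusion

    def forward(self, graph_emb, event_hidden):
        g = self.st_moe(self.proj(graph_emb))
        e = self.ea_moe(event_hidden)
        # Crude additive fusion as a stand-in for TKG-guided attention.
        return self.cma_moe(g + e)
```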

Core claim

The Spatial-Temporal Knowledge Adapter flexibly combines an evolving graph encoder with a large language model through a Spatial-Temporal MoE that captures structures and patterns, an Event-Aware MoE that models temporal dependencies within event chains, and a Cross-Modality Alignment MoE that performs TKG-guided deep alignment, thereby addressing information loss and feature dilution in TKG extrapolation.

What carries the argument

Spatial-Temporal Knowledge Adapter (STK-Adapter) using three mixture-of-experts modules for spatial-temporal capture, event awareness, and cross-modality alignment.

Load-bearing premise

The proposed mixture-of-experts modules successfully achieve deep cross-modality alignment and preserve the TKG's evolving structural features without introducing new information loss or overfitting during LLM fine-tuning.
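This premise leans on the standard adapter rationale: keep the base LLM frozen so fine-tuning cannot overwrite graph-derived features, and train only the adapter. A minimal sketch of that freezing pattern (generic PEFT practice, not the paper's training code):

```python
# Generic freeze-base/train-adapter pattern from the PEFT literature;
# illustrative of the premise, not the paper's actual training setup.
import torch

def freeze_base_train_adapter(llm: torch.nn.Module, adapter: torch.nn.Module):
    for p in llm.parameters():
        p.requires_grad = False   # base weights receive no gradient updates,
                                  # so pre-trained semantics are not overwritten
    for p in adapter.parameters():
        p.requires_grad = True    # only the adapter learns the new task
    return torch.optim.AdamW(adapter.parameters(), lr=1e-4)
```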

What would settle it

Replicating the reported experiments on the benchmark datasets and finding that STK-Adapter does not significantly outperform prior methods on extrapolation metrics or in cross-dataset generalization.

Figures

Figures reproduced from arXiv: 2604.19042 by Boyan Shi, Huaiyu Wan, Junfeng Shen, Shengnan Guo, Shuyuan Zhao, Wei Chen, Weijie Zhang, Xinrui Hou, Youfang Lin.

Figure 1: Limitations of existing approaches for inte…

Figure 2: The overall architecture of STK-Adapter consists of three MoEs: the ST-MoE, the EA-MoE, and the CMA-MoE.

Figure 3: A comparison of Hit@1 performance between STK-Adapter and MESH on four datasets, using three backbone LLMs: Llama3-8B, Qwen2.5-7B, and Mistral-7B. Adjacent text (§5.2.2, Compatibility Analysis): to assess compatibility with different evolving graph encoders, STK-Adapter is evaluated with four pre-trained encoders: REGCN, TiRGN, CognTKE, and LogCL.

Figure 4: A comparison of Hit@1 performance among STK-Adapter, LLM-DA, and CognTKE in a cross-dataset generalization setting. Adjacent text (§5.5, Cross-Dataset Generalization): zero-shot experiments on the ICE series datasets; following Chen et al. (2025c), models are trained on one dataset and tested on another without fine-tuning, and STK-Adapter is compared with th…

Figure 5: Distribution of expert routing decisions across…

Figure 6: An example of the instruction format. Adjacent text (§B.2, Instruction Construction): instruction tuning enhances the instruction-following capabilities of LLMs by fine-tuning them on curated prompt-response pairs; the STK-Adapter framework formulates TKG extrapolation as a generative instruction-following task, with the specific format illustrated in the figure (a toy builder is sketched after this list).

Figure 7: Study on the proportion λ of the hybrid score, the dimension d of each expert, the number of experts n, and the depth m of neighborhood sampling. Adjacent text (§F.2, Parameter Sensitivity Study): experiments on the ICE14 and WIKI datasets explore four hyperparameters: the hybrid score proportion λ, the number of experts n, the dimension d of each expert, and the depth m of neighborhood sampling…

Figure 8: Visualization of case study. Adjacent text (§F.2, continued): …mance and efficiency; for the number of experts n, performance initially improves as n increases before declining, reflecting that a moderate number of experts helps capture diverse patterns while too many cause insufficient training per expert; a similar trend is observed for the neighborhood sampling depth m, where appropriate sampling depth provides es…
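The §B.2 excerpt above describes casting extrapolation as generative instruction following. Below is a toy prompt builder in that spirit; the template, function name, and quadruple encoding are invented for illustration, since the paper's actual format appears only in its Figure 6:

```python
# Hypothetical instruction builder for TKG extrapolation framed as a
# generative task. The template is invented; the paper's real format
# is shown in its Figure 6.
from typing import List, Tuple

Quad = Tuple[str, str, str, int]  # (subject, relation, object, timestamp)

def build_instruction(history: List[Quad], query: Tuple[str, str, int]) -> str:
    lines = [f"{t}: ({s}, {r}, {o})" for s, r, o, t in history]
    subj, rel, t_q = query
    return (
        "Given the historical facts:\n"
        + "\n".join(lines)
        + f"\nPredict the missing object at time {t_q}: ({subj}, {rel}, ?)"
    )

# Example usage with toy facts:
hist = [("Germany", "consult", "France", 1), ("France", "sign_agreement", "Italy", 2)]
print(build_instruction(hist, ("Germany", "consult", 3)))
```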
read the original abstract

Temporal Knowledge Graph (TKG) extrapolation aims to predict future events based on historical facts. Recent studies have attempted to enhance TKG extrapolation by integrating TKG's evolving structural representations and textual event chains into Large Language Models (LLMs). Yet, two main challenges limit these approaches: (1) The loss of essential spatial-temporal information due to shallow alignment between TKG's graph evolving structural representation and the LLM's semantic space, and (2) the progressive dilution of the TKG's evolving structural features during LLM fine-tuning. To address these challenges, we propose the Spatial-Temporal Knowledge Adapter (STK-Adapter), which flexibly integrates the evolving graph encoder and the LLM to facilitate TKG reasoning. In STK-Adapter, a Spatial-Temporal MoE is designed to capture spatial structures and temporal patterns inherent in TKGs. An Event-Aware MoE is employed to model intricate temporal semantics dependencies within event chains. In addition, a Cross-Modality Alignment MoE is proposed to facilitate deep cross-modality alignment by TKG-guided attention experts. Extensive experiments on benchmark datasets demonstrate that STK-Adapter significantly outperforms state-of-the-art methods and exhibits strong generalization capabilities in cross-dataset task. The code is available at https://github.com/Zhaoshuyuan0246/STK-Adapter.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes the Spatial-Temporal Knowledge Adapter (STK-Adapter) for temporal knowledge graph (TKG) extrapolation. It integrates an evolving graph encoder with large language models (LLMs) using three Mixture-of-Experts (MoE) components: a Spatial-Temporal MoE to capture spatial structures and temporal patterns in TKGs, an Event-Aware MoE to model temporal semantic dependencies in event chains, and a Cross-Modality Alignment MoE with TKG-guided attention experts to achieve deep cross-modality alignment. The approach targets two challenges: loss of spatial-temporal information from shallow alignment, and progressive dilution of evolving structural features during LLM fine-tuning. It reports that extensive experiments on benchmark datasets show significant outperformance over state-of-the-art methods along with strong cross-dataset generalization. Code is released at the provided GitHub link.

Significance. If the empirical claims hold with adequate mechanistic support, the work would offer a modular adapter architecture that better preserves graph evolution when interfacing TKGs with LLMs, potentially improving extrapolation accuracy and generalization in temporal reasoning tasks. The open-sourcing of code supports reproducibility, which strengthens the contribution if the MoE routing and alignment mechanisms prove robust.

major comments (2)
  1. [Methods (MoE subsections)] Methods section describing the three MoE modules: the central claim that Spatial-Temporal MoE, Event-Aware MoE, and Cross-Modality Alignment MoE (with TKG-guided attention experts) resolve shallow alignment and feature dilution rests on an assertion that expert routing and attention prevent LLM overwriting of graph structure, but no equations for routing functions, attention computation, or training objectives are supplied to demonstrate this mechanistically.
  2. [Experiments] Experimental results and ablation sections: the abstract states that STK-Adapter 'significantly outperforms' SOTA methods and shows 'strong generalization' in cross-dataset tasks, yet no quantitative metrics, ablation tables isolating each MoE's contribution, error bars, or details on how the experts are trained appear in the provided description, leaving the load-bearing empirical support unverified.
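For orientation on point 1: equations of the requested kind usually take the standard sparse-gating and guided cross-attention forms below. These are generic templates from the MoE and attention literature, with placeholder symbols (W_g, E_i, H_tkg, H_llm, alpha), not the paper's actual formulation:

```latex
% Generic templates, not the paper's equations: sparse top-k gating,
% TKG-guided cross-attention, and a composite training objective.
\begin{aligned}
  g(x) &= \operatorname{softmax}\big(\operatorname{TopK}(x W_g,\, k)\big),
  \qquad
  y = \sum_{i=1}^{n} g_i(x)\, E_i(x), \\[4pt]
  \operatorname{Attn}(Q, K, V) &= \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V,
  \qquad
  Q = H_{\mathrm{tkg}} W_Q,\; K = H_{\mathrm{llm}} W_K,\; V = H_{\mathrm{llm}} W_V, \\[4pt]
  \mathcal{L} &= \mathcal{L}_{\mathrm{gen}} + \alpha\, \mathcal{L}_{\mathrm{balance}}.
\end{aligned}
```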
minor comments (1)
  1. [Abstract] The abstract refers to 'benchmark datasets' without naming them or the specific metrics used; adding these would improve clarity even if they appear later in the paper.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed comments. We will revise the manuscript to strengthen the methodological rigor and empirical presentation as outlined below.

read point-by-point responses
  1. Referee: [Methods (MoE subsections)] Methods section describing the three MoE modules: the central claim that Spatial-Temporal MoE, Event-Aware MoE, and Cross-Modality Alignment MoE (with TKG-guided attention experts) resolve shallow alignment and feature dilution rests on an assertion that expert routing and attention prevent LLM overwriting of graph structure, but no equations for routing functions, attention computation, or training objectives are supplied to demonstrate this mechanistically.

    Authors: We acknowledge that the current Methods section provides high-level descriptions of the three MoE modules without the explicit mathematical formulations. In the revised manuscript we will add the equations for the expert routing functions (including the gating mechanism), the TKG-guided attention computation within the Cross-Modality Alignment MoE, and the composite training objective. These additions will directly illustrate how the routing and attention mechanisms are designed to preserve evolving graph structure and prevent overwriting during LLM fine-tuning. revision: yes

  2. Referee: [Experiments] Experimental results and ablation sections: the abstract states that STK-Adapter 'significantly outperforms' SOTA methods and shows 'strong generalization' in cross-dataset tasks, yet no quantitative metrics, ablation tables isolating each MoE's contribution, error bars, or details on how the experts are trained appear in the provided description, leaving the load-bearing empirical support unverified.

    Authors: The full manuscript reports experimental results on standard TKG benchmarks and cross-dataset generalization. To make the empirical support fully verifiable, we will expand the Experiments section with complete quantitative tables (including exact metric values), ablation studies that isolate the contribution of each MoE component, error bars computed over multiple random seeds, and additional details on expert training procedures and routing statistics. revision: yes
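As a template for the promised error bars: with per-seed rank lists, Hit@1 and its seed-to-seed spread reduce to a mean and standard deviation, as in this minimal sketch (standard evaluation practice, not the paper's code; the toy ranks are invented):

```python
# Sketch: Hit@1 with mean +/- std over random seeds. `ranks_per_seed[s][q]`
# is the rank of the ground-truth entity for query q under seed s (1 = best).
import numpy as np

def hit_at_k(ranks: np.ndarray, k: int = 1) -> float:
    return float((ranks <= k).mean())

ranks_per_seed = [
    np.array([1, 3, 1, 2]),   # toy ranks, seed 0
    np.array([1, 1, 4, 2]),   # toy ranks, seed 1
    np.array([2, 1, 1, 1]),   # toy ranks, seed 2
]
scores = np.array([hit_at_k(r, k=1) for r in ranks_per_seed])
print(f"Hit@1 = {scores.mean():.3f} +/- {scores.std(ddof=1):.3f}")
```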

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical benchmarks rather than any derivation chain.

full rationale

The paper introduces the STK-Adapter architecture with Spatial-Temporal MoE, Event-Aware MoE, and Cross-Modality Alignment MoE to integrate TKG evolving graphs and event chains into LLMs. Its central claims of outperformance and cross-dataset generalization are asserted via experimental results on benchmark datasets, with no mathematical derivation, first-principles equations, or predictive steps that reduce by construction to fitted inputs, self-citations, or ansatzes. The work contains no load-bearing uniqueness theorems, self-definitional relations, or renamed known results; it is a standard empirical architecture proposal whose validity is externally falsifiable through replication of the reported benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The proposal rests on standard assumptions from the MoE and adapter literature (transformers can be extended with expert routing, attention can align modalities) plus the domain assumption that TKG evolving structure and event chains contain recoverable spatial-temporal signals; no new physical entities or ad-hoc constants are introduced.

axioms (2)
  • domain assumption Mixture-of-experts routing can selectively preserve structural features that would otherwise be diluted during LLM fine-tuning.
    Invoked when the paper states that the Spatial-Temporal MoE and Cross-Modality Alignment MoE solve the dilution problem.
  • domain assumption Deep cross-modality alignment via TKG-guided attention experts is feasible and superior to shallow alignment.
    Central to the design of the third MoE component.
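To make the second axiom concrete, here is a minimal cross-attention sketch in which graph-side states act as queries over LLM token states; this is one plausible reading of "TKG-guided" attention, assumed for illustration rather than taken from the paper:

```python
# Minimal cross-attention with TKG states as queries over LLM token states
# (an assumed reading of "TKG-guided attention", not the paper's module).
import torch
import torch.nn as nn

class TKGGuidedAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, tkg_states, llm_states):
        # tkg_states: (batch, n_nodes, dim); llm_states: (batch, seq, dim)
        out, _ = self.attn(query=tkg_states, key=llm_states, value=llm_states)
        return out  # graph-side states enriched with token-level semantics
```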

pith-pipeline@v0.9.0 · 5564 in / 1660 out tokens · 42118 ms · 2026-05-10T02:36:37.551741+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

80 extracted references · 28 canonical work pages · 4 internal anchors

  1. [1]

    Learning from history: Modeling temporal knowledge graphs with sequential copy-generation networks. Proceedings of the AAAI Conference on Artificial Intelligence.

  2. [2]

    Temporal knowledge graph reasoning with historical contrastive learning. Proceedings of the AAAI Conference on Artificial Intelligence.

  3. [3]

    Temporal knowledge graph reasoning based on evolutional representation learning. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval.

  4. [4]

    Modeling relational data with graph convolutional networks. The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3--7, 2018, Proceedings, 2018.

  5. [5]

    Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.

  6. [6]

    Explainable subgraph reasoning for forecasting on temporal knowledge graphs. International Conference on Learning Representations.

  7. [7]

    TiRGN: Time-guided recurrent graph network with local-global historical patterns for temporal knowledge graph reasoning. IJCAI.

  8. [8]

    Local-global history-aware contrastive learning for temporal knowledge graph reasoning. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 2024.

  9. [9]

    TimeTraveler: Reinforcement learning for temporal knowledge graph forecasting. arXiv preprint arXiv:2109.04101.

  10. [10]

    TLogic: Temporal logical rules for explainable link forecasting on temporal knowledge graphs. Proceedings of the AAAI Conference on Artificial Intelligence.

  11. [11]

    CognTKE: A cognitive temporal knowledge extrapolation framework. Proceedings of the AAAI Conference on Artificial Intelligence.

  12. [12]

    A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology, 2024.

  13. [13]

    Pre-trained language model with prompts for temporal knowledge graph completion. arXiv preprint arXiv:2305.07912.

  14. [17]

    Large language models-guided dynamic adaptation for temporal knowledge graph reasoning. Advances in Neural Information Processing Systems.

  15. [18]

    GraphTranslator: Aligning graph model to large language model for open-ended tasks. Proceedings of the ACM Web Conference 2024.

  16. [19]

    KG-Adapter: Enabling knowledge graph integration in large language models through parameter-efficient fine-tuning. Findings of the Association for Computational Linguistics: ACL 2024.

  17. [21]

    Recurrent event network: Autoregressive structure inference over temporal knowledge graphs. arXiv preprint arXiv:1904.05530.

  18. [22]

    Complex evolutional pattern learning for temporal knowledge graph reasoning. arXiv preprint arXiv:2203.07782.

  19. [23]

    HiSMatch: Historical structure matching based temporal knowledge graph reasoning. arXiv preprint arXiv:2210.09708.

  20. [24]

    RETIA: Relation-entity twin-interact aggregation for temporal knowledge graph extrapolation. 2023 IEEE 39th International Conference on Data Engineering (ICDE), 2023.

  21. [26]

    HyTE: Hyperplane-based temporally aware knowledge graph embedding. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.

  22. [28]

    Are language models actually useful for time series forecasting? Advances in Neural Information Processing Systems.

  23. [29]

    Yimin Deng, Yuxia Wu, Yejing Wang, Guoshuai Zhao, Li Zhu, Qidong Liu, Derong Xu, Zichuan Fu, Xian Wu, Yefeng Zheng, Xiangyu Zhao, and Xueming Qian. 2025. A multi-expert structural-semantic hybrid framework for unveiling historical patterns in temporal knowledge graphs.

  24. [31]

    Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. Adaptive mixtures of local experts.

  25. [32]

    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, et al. 2022. LoRA: Low-rank adaptation of large language models.

  26. [33]

    Parameter-efficient transfer learning for NLP. International Conference on Machine Learning, 2019.

  27. [36]

    Spatial-temporal knowledge distillation for takeaway recommendation. Proceedings of the AAAI Conference on Artificial Intelligence.

  28. [38]

    Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems.

  29. [39]

    Enhancing multi-modal models with heterogeneous MoE adapters for fine-tuning. arXiv preprint arXiv:2503.20633.

  30. [40]

    ST-MoE: Spatio-temporal mixture-of-experts for debiasing in traffic prediction. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management.

  31. [43]

    Switch Transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research.

  32. [44]

    HMoRA: Making LLMs more effective with hierarchical mixture of LoRA experts. The Thirteenth International Conference on Learning Representations.

  33. [45]

    FuseMoE: Mixture-of-experts transformers for fleximodal fusion. Advances in Neural Information Processing Systems.

  34. [46]

    Llama 3 Model Card, 2024.

  35. [48]

    Mistral 7B. arXiv.

  36. [49]

    Jing Wang, Shuo Zhang, and Runzhi Li. DATA INTELLIGENCE. doi:10.3724/2096-7004.di.2024.0023.

  37. [50]

    Peng Xiao, Chao Liu, Wei Jia, and Lijun Dong. DATA INTELLIGENCE. doi:10.3724/2096-7004.di.2025.0023.

  38. [54]

    Dual-view temporal knowledge graph reasoning. Knowledge-Based Systems, 2025.

  39. [55]

    Next-POI recommendation via spatial-temporal knowledge graph contrastive learning and trajectory prompt. IEEE Transactions on Knowledge and Data Engineering.

  40. [56]

    Exploiting multi-attention network with contextual influence for point-of-interest recommendation. Applied Intelligence, 2021.

  41. [58]

    Think how your teammates think: Active inference can benefit decentralized execution. Proceedings of the AAAI Conference on Artificial Intelligence.

  42. [59]

    PUA: Pseudo-features made useful again for robust graph node classification under distribution shift. Pattern Recognition, 2026.

  43. [60]

    AI@Meta. 2024. https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md Llama 3 model card

  44. [61]

    Elizabeth Boschee, Jennifer Lautenschlager, Sean O'Brien, Steve Shellman, James Starz, and Michael Ward. 2015. https://doi.org/10.7910/DVN/28075 ICEWS Coded Event Data

  45. [62]

    He Chang, Jie Wu, Zhulin Tao, Yunshan Ma, Xianglin Huang, and Tat-Seng Chua. 2025. Integrate temporal graph learning into llm-based temporal knowledge graph model. arXiv preprint arXiv:2501.11911

  46. [63]

    Huajun Chen. 2024. https://doi.org/10.3724/2096-7004.di.2024.0001 Large knowledge model: Perspectives and challenges. DATA INTELLIGENCE, 6(3):587--620

  47. [64]

    Wei Chen, Haoyu Huang, Zhiyu Zhang, Tianyi Wang, Youfang Lin, Liang Chang, and Huaiyu Wan. 2025a. Next-poi recommendation via spatial-temporal knowledge graph contrastive learning and trajectory prompt. IEEE Transactions on Knowledge and Data Engineering

  48. [65]

    Wei Chen, Huaiyu Wan, Yuting Wu, Shuyuan Zhao, Jiayaqi Cheng, Yuxin Li, and Youfang Lin. 2024. Local-global history-aware contrastive learning for temporal knowledge graph reasoning. In 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 733--746. IEEE

  49. [66]

    Wei Chen, Yuting Wu, Shengnan Guo, Shuhan Wu, Zhishu Jiang, Youfang Lin, and Huaiyu Wan. 2025b. Dual-view temporal knowledge graph reasoning. Knowledge-Based Systems, page 114330

  50. [67]

    Wei Chen, Yuting Wu, Shuhan Wu, Zhiyu Zhang, Mengqi Liao, Youfang Lin, and Huaiyu Wan. 2025c. Cogntke: A cognitive temporal knowledge extrapolation framework. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 14815--14823

  51. [68]

    Zixiang Chen, Yihe Deng, Yue Wu, Quanquan Gu, and Yuanzhi Li. 2022. Towards understanding mixture of experts in deep learning. arXiv preprint arXiv:2208.02813

  52. [69]

    Shib Sankar Dasgupta, Swayambhu Nath Ray, and Partha Talukdar. 2018. Hyte: Hyperplane-based temporally aware knowledge graph embedding. In Proceedings of the 2018 conference on empirical methods in natural language processing, pages 2001--2011

  53. [70]

    Yimin Deng, Yuxia Wu, Yejing Wang, Guoshuai Zhao, Li Zhu, Qidong Liu, Derong Xu, Zichuan Fu, Xian Wu, Yefeng Zheng, Xiangyu Zhao, and Xueming Qian. 2025. https://aclanthology.org/2025.findings-acl.1056/ A multi-expert structural-semantic hybrid framework for unveiling historical patterns in temporal knowledge graphs . In Findings of the Association for Co...

  54. [71]

    Shihan Dou, Enyu Zhou, Yan Liu, Songyang Gao, Wei Shen, Limao Xiong, Yuhao Zhou, Xiao Wang, Zhiheng Xi, Xiaoran Fan, Shiliang Pu, Jiang Zhu, Rui Zheng, Tao Gui, Qi Zhang, and Xuanjing Huang. 2024. https://doi.org/10.18653/V1/2024.ACL-LONG.106 Loramoe: Alleviating world knowledge forgetting in large language models via moe-style plugin . In Proceedings of ...

  55. [72]

    Yifu Gao, Linbo Qiao, Zhigang Kan, Zhihua Wen, Yongquan He, and Dongsheng Li. 2024. Two-stage generative question answering on temporal knowledge graph using large language models. arXiv preprint arXiv:2402.16568

  56. [73]

    Xing Han, Huy Nguyen, Carl Harris, Nhat Ho, and Suchi Saria. 2024. Fusemoe: Mixture-of-experts transformers for fleximodal fusion. Advances in Neural Information Processing Systems, 37:67850--67900

  57. [74]

    Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for nlp. In International conference on machine learning, pages 2790--2799. PMLR

  58. [75]

    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. https://openreview.net/forum?id=nZeVKeeFYf9 Lora: Low-rank adaptation of large language models. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net

  59. [76]

    Jin Huang, Xingjian Zhang, Qiaozhu Mei, and Jiaqi Ma. 2023. Can llms effectively leverage graph structural information through prompts, and why? arXiv preprint arXiv:2309.16595

  60. [77]

    Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. 1991. https://doi.org/10.1162/neco.1991.3.1.79 Adaptive mixtures of local experts. Neural Computation, 3(1):79--87

  61. [78]

    Albert Qiaochu Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. 2023. https://api.sem...

  62. [79]

    Dong-Ho Lee, Kian Ahrabian, Woojeong Jin, Fred Morstatter, and Jay Pujara. 2023. Temporal knowledge graph forecasting without knowledge using in-context learning. arXiv preprint arXiv:2305.10613

  63. [80]

    Yujia Li, Shiliang Sun, and Jing Zhao. 2022. Tirgn: Time-guided recurrent graph network with local-global historical patterns for temporal knowledge graph reasoning. In IJCAI, pages 2152--2158

  64. [81]

    Zixuan Li, Xiaolong Jin, Wei Li, Saiping Guan, Jiafeng Guo, Huawei Shen, Yuanzhuo Wang, and Xueqi Cheng. 2021. Temporal knowledge graph reasoning based on evolutional representation learning. In Proceedings of the 44th international ACM SIGIR conference on research and development in information retrieval, pages 408--417

  65. [82]

    Mengqi Liao, Wei Chen, Junfeng Shen, Shengnan Guo, and Huaiyu Wan. 2025. Hmora: Making llms more effective with hierarchical mixture of lora experts. In The Thirteenth International Conference on Learning Representations

  66. [83]

    Ruotong Liao, Xu Jia, Yangzhe Li, Yunpu Ma, and Volker Tresp. 2023. Gentkg: Generative forecasting on temporal knowledge graph with large language models. arXiv preprint arXiv:2310.07793

  67. [84]

    Yushan Liu, Yunpu Ma, Marcel Hildebrandt, Mitchell Joblin, and Volker Tresp. 2022. Tlogic: Temporal logical rules for explainable link forecasting on temporal knowledge graphs. In Proceedings of the AAAI conference on artificial intelligence, volume 36, pages 4120--4127

  68. [85]

    Ruilin Luo, Tianle Gu, Haoling Li, Junzhe Li, Zicheng Lin, Jiayi Li, and Yujiu Yang. 2024. Chain of history: Learning and forecasting with llms for temporal knowledge graph completion. arXiv preprint arXiv:2401.06072

  69. [86]

    Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. 2017. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538

  70. [87]

    Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. Advances in neural information processing systems, 27

  71. [88]

    Jiapu Wang, Sun Kai, Linhao Luo, Wei Wei, Yongli Hu, Alan Wee-Chung Liew, Shirui Pan, and Baocai Yin. 2024a. Large language models-guided dynamic adaptation for temporal knowledge graph reasoning. Advances in Neural Information Processing Systems, 37:8384--8410

  72. [89]

    Keyu Wang, Guilin Qi, Jiaoyan Chen, Yi Huang, and Tianxing Wu. 2024b. https://doi.org/10.3724/2096-7004.di.2024.0088 Embedding ontologies via incorporating extensional and intensional knowledge. DATA INTELLIGENCE, 6(4):1222--1241

  73. [90]

    Yaqing Wang, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, and Jianfeng Gao. 2022. Adamix: Mixture-of-adapter for parameter-efficient tuning of large language models. arXiv preprint arXiv:2205.12410, 1(2):4

  74. [91]

    Hao Wu, Shoucheng Song, Chang Yao, Sheng Han, Huaiyu Wan, Youfang Lin, and Kai Lv. 2026. Think how your teammates think: Active inference can benefit decentralized execution. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 29749--29757

  75. [92]

    An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, and 40 others. 2024. Qwen2 technical report. arXiv preprint arXiv:2407.10671

  76. [93]

    Chang Yao, Youfang Lin, Shoucheng Song, Hao Wu, Yuqing Ma, Shang Han, and Kai Lv. 2025. From general relation patterns to task-specific decision-making in continual multi-agent coordination. arXiv preprint arXiv:2507.06004

  77. [94]

    Zihao Yin, Zhihai Wang, Haiyang Liu, Chuanlan Li, Muyun Yao, Shijiang Li, Fangjing Li, Jia Ren, and Yanchao Yang. 2026. Pua: Pseudo-features made useful again for robust graph node classification under distribution shift. Pattern Recognition, page 113185

  78. [95]

    Dacao Zhang, Kun Zhang, Shimao Chu, Le Wu, Xin Li, and Si Wei. 2025a. More: A mixture of low-rank experts for adaptive multi-task learning. arXiv preprint arXiv:2505.22694

  79. [96]

    Zhiyu Zhang, Wei Chen, Youfang Lin, and Huaiyu Wan. 2025b. A generative adaptive replay continual learning model for temporal knowledge graph reasoning. arXiv preprint arXiv:2506.04083

  80. [97]

    Shuyuan Zhao, Wei Chen, Boyan Shi, Liyong Zhou, Shuohao Lin, and Huaiyu Wan. 2025. Spatial-temporal knowledge distillation for takeaway recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 13365--13373