TrimCaching: Parameter-sharing Edge Caching for AI Model Downloading

Fangming Liu; Guanqiao Qu; Jian Li; Kaibin Huang; Qian Chen; Xianhao Chen; Zheng Lin

arxiv: 2404.14204 · v5 · pith:OHB7YQ6Cnew · submitted 2024-04-22 · 💻 cs.NI

TrimCaching: Parameter-sharing Edge Caching for AI Model Downloading

Guanqiao Qu , Zheng Lin , Qian Chen , Jian Li , Fangming Liu , Xianhao Chen , Kaibin Huang This is my paper

Pith reviewed 2026-05-24 02:31 UTC · model grok-4.3

classification 💻 cs.NI

keywords edge cachingAI model downloadingparameter sharingcache hit ratiowireless networksmodel placementsubmodular optimization

0 comments

The pith

Sharing parameter blocks across AI models lets edge networks cache more models and raise hit ratios.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops TrimCaching to place AI models on edge servers by treating shared parameter blocks as reusable storage units rather than caching each model whole. It formulates the placement task as maximizing cache hit ratio while trading off storage use against download latency in multi-edge wireless settings. The authors prove the general problem is submodular maximization under submodular constraints and therefore lacks a constant-factor polynomial algorithm, then supply a (1-ε)/2-approximation for the practical case of a small fixed number of shared blocks and a greedy heuristic for the general case. Simulations show the resulting placements achieve higher hit ratios than conventional content caching that ignores parameter overlap.

Core claim

TrimCaching exploits the fact that many AI models share parameter blocks containing reusable knowledge; modeling this overlap turns model placement into a submodular maximization problem whose solution, via a special-case polynomial algorithm or a greedy method, improves cache hit ratio over non-sharing baselines.

What carries the argument

The parameter-sharing model placement formulation that treats shared blocks as common storage items to maximize hit ratio under latency and storage constraints.

If this is right

Edge servers can store a larger effective catalog of models within the same memory budget.
Download latency for users requesting models with shared blocks drops because fewer unique blocks need transmission.
The approximation algorithms give network operators a concrete way to compute placements without solving the NP-hard general problem.
The framework extends to any set of models whose parameter overlap can be quantified in advance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If parameter sharing turns out to be dynamic rather than fixed, the placement decisions would need periodic recomputation as new models arrive.
The same sharing idea could apply to caching of other composite objects such as container images or dataset shards that contain overlapping files.
Operators might combine TrimCaching with popularity prediction to decide which shared blocks to pre-position on which edges.

Load-bearing premise

A wide range of AI models share a significant proportion of parameter blocks that can be treated as reusable across models.

What would settle it

Running the placement algorithm on a trace of real AI models where the measured shared-block overlap is below the level assumed in the special-case analysis and observing no hit-ratio gain over independent-model caching.

Figures

Figures reproduced from arXiv: 2404.14204 by Fangming Liu, Guanqiao Qu, Jian Li, Kaibin Huang, Qian Chen, Xianhao Chen, Zheng Lin.

**Figure 3.** Figure 3: The TrimCaching framework in a multi-edge scenario. [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: An example of the special case with a small fixed number of shared [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: The process of updating T (eN , w˙ N ) for edge server m, where the first row is the index of the number of cache hits and the first column is the model index in IN . by T (eN , w˙ N ) =    min    T (eN − 1, w˙ N ), T [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: The process of determining Xˆ m,N , where the first row and the first column are the same as those in [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 7.** Figure 7: Cache hit ratio in the special case, where a small fixed number of shared parameter blocks is considered, using the ResNet-based model library. The [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗

**Figure 8.** Figure 8: Cache hit ratio in the special case using the GPT-2-based model library. [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗

**Figure 9.** Figure 9: Cache hit ratio in the general case using the ResNet-based model library. [PITH_FULL_IMAGE:figures/full_fig_p012_9.png] view at source ↗

**Figure 10.** Figure 10: Cache hit ratio and average running time of different algorithms. [PITH_FULL_IMAGE:figures/full_fig_p012_10.png] view at source ↗

**Figure 11.** Figure 11: Impact of user mobility on the cache hit ratio over time. [PITH_FULL_IMAGE:figures/full_fig_p013_11.png] view at source ↗

**Figure 13.** Figure 13: Cache hit ratio comparisons with online placement strategies using [PITH_FULL_IMAGE:figures/full_fig_p014_13.png] view at source ↗

read the original abstract

Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm of edge model caching. In this paper, we develop a novel model placement framework, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observation that a wide range of AI models, such as convolutional neural networks or large language models, can share a significant proportion of parameter blocks containing reusable knowledge, thereby improving storage efficiency. To this end, we formulate a parameter-sharing model placement problem to maximize the cache hit ratio in multi-edge wireless networks by balancing the fundamental tradeoff between storage efficiency and service latency. We show that the formulated problem is a submodular maximization problem with submodular constraints, for which no polynomial-time approximation algorithm exists. To tackle this challenge, we study an important special case, where a small fixed number of parameter blocks are shared across models, which often holds in practice. In such a case, a polynomial-time algorithm with a $\left(1-\epsilon\right)/2$-approximation guarantee is developed. Subsequently, we address the original problem for the general case by developing a greedy algorithm. Simulation results demonstrate that the proposed TrimCaching framework significantly improves the cache hit ratio compared with state-of-the-art content caching without exploiting shared parameters in AI models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

TrimCaching gives a submodular formulation for sharing parameter blocks across AI models in edge caches, with a (1-ε)/2 approximation for the fixed-shared-blocks case and a greedy fallback, plus simulation claims of higher hit rates.

read the letter

The core contribution is a parameter-sharing placement problem that treats reusable blocks across models as a way to cut storage while keeping low latency in multi-edge wireless setups. They prove it is submodular maximization under submodular constraints, develop a polynomial-time (1-ε)/2-approximation when the number of shared blocks is small and fixed, and fall back to greedy for the general case. Simulations are said to beat standard content caching that ignores shared parameters.

Referee Report

3 major / 1 minor

Summary. The paper proposes TrimCaching, a parameter-sharing edge caching framework for AI model downloading in multi-edge wireless networks. It formulates the model placement problem as maximizing cache hit ratio by exploiting shared parameter blocks across models (e.g., CNNs or LLMs), shows the problem is submodular maximization under submodular constraints (no poly-time approximation exists in general), develops a polynomial-time (1-ε)/2-approximation algorithm for the special case of a small fixed number of shared blocks, proposes a greedy algorithm for the general case, and reports via simulations that TrimCaching significantly improves cache hit ratio over state-of-the-art content caching methods that ignore parameter sharing.

Significance. If the algorithmic guarantees and simulation improvements hold, the work addresses a timely problem in edge computing for large AI models by trading off storage efficiency against latency through parameter reuse. The submodular formulation, the special-case approximation guarantee, and the practical observation about fixed shared blocks are strengths that could influence caching designs in 5G/6G networks.

major comments (3)

[Abstract] Abstract: The central simulation claim (significant cache hit ratio improvement) is load-bearing for the paper's contribution, yet the abstract provides no details on simulation setup, number of trials, error bars, specific network parameters, or how submodularity was verified; this prevents assessment of whether the reported gains are robust or reproducible.
[Abstract] Abstract: The key modeling assumption that 'a wide range of AI models... share a significant proportion of parameter blocks... often holds with a small fixed number of shared blocks in practice' is stated without quantification or reference to concrete model families (e.g., specific CNN or LLM parameter overlap statistics); this assumption directly enables the special-case algorithm and must be supported for the (1-ε)/2 guarantee to be practically relevant.
[Abstract] Abstract: The claim that the formulated problem is 'a submodular maximization problem with submodular constraints, for which no polynomial-time approximation algorithm exists' is presented without a proof sketch or reference to the specific submodular constraint functions; verification of both submodularity and the inapproximability result is required to justify moving to the special-case and greedy algorithms.

minor comments (1)

[Abstract] Abstract: The approximation ratio is written as $(1-ε)/2$; clarify whether ε is a user-specified parameter or derived from the number of shared blocks.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and will revise the abstract to better support the claims while respecting length constraints.

read point-by-point responses

Referee: [Abstract] Abstract: The central simulation claim (significant cache hit ratio improvement) is load-bearing for the paper's contribution, yet the abstract provides no details on simulation setup, number of trials, error bars, specific network parameters, or how submodularity was verified; this prevents assessment of whether the reported gains are robust or reproducible.

Authors: We agree that the abstract would benefit from additional details on the simulation results. In the revision, we will incorporate key elements such as the number of Monte Carlo trials, mention of error bars, and primary network parameters (e.g., number of edge servers and model sizes). Submodularity is established theoretically (see Section III); we will add a reference to this section. The complete simulation methodology remains in Section V. revision: yes
Referee: [Abstract] Abstract: The key modeling assumption that 'a wide range of AI models... share a significant proportion of parameter blocks... often holds with a small fixed number of shared blocks in practice' is stated without quantification or reference to concrete model families (e.g., specific CNN or LLM parameter overlap statistics); this assumption directly enables the special-case algorithm and must be supported for the (1-ε)/2 guarantee to be practically relevant.

Authors: We acknowledge that the assumption requires stronger support. We will revise the abstract (or move supporting text to the introduction) to include quantitative examples drawn from the literature on CNNs (e.g., ResNet variants) and LLMs (e.g., BERT/GPT fine-tuning), citing typical shared-block overlap statistics of 30-60%. This will directly justify the practical relevance of the special-case algorithm. revision: yes
Referee: [Abstract] Abstract: The claim that the formulated problem is 'a submodular maximization problem with submodular constraints, for which no polynomial-time approximation algorithm exists' is presented without a proof sketch or reference to the specific submodular constraint functions; verification of both submodularity and the inapproximability result is required to justify moving to the special-case and greedy algorithms.

Authors: The abstract summarizes results whose full proofs appear in Section III, where we prove submodularity of both the objective (cache-hit ratio) and the per-edge storage constraints, and invoke the known inapproximability of submodular maximization under submodular knapsack constraints. We will revise the abstract to add an explicit reference to Section III and briefly identify the constraint functions. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper formulates a parameter-sharing model placement problem as submodular maximization with submodular constraints, develops a polynomial-time approximation for the special case of fixed shared blocks and a greedy algorithm for the general case, then validates via simulation. These steps rest on standard submodular optimization properties and the independent observation about shared AI model parameters; no equation or result reduces by construction to a fitted input, self-citation, or renamed known pattern. Simulations provide external empirical evidence rather than a forced prediction. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests primarily on the domain assumption that AI models share reusable parameter blocks in practice, presented as an observation without independent evidence or proof in the abstract.

axioms (1)

domain assumption A wide range of AI models share a significant proportion of parameter blocks containing reusable knowledge, often with a small fixed number shared across models.
Stated as the key observation enabling the framework in the abstract.

pith-pipeline@v0.9.0 · 5793 in / 1238 out tokens · 26086 ms · 2026-05-24T02:31:43.870656+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages

[1]

TrimCaching: Parameter- sharing AI model caching in wireless edge networks,

G. Qu, Z. Lin, F. Liu, X. Chen, and K. Huang, “TrimCaching: Parameter- sharing AI model caching in wireless edge networks,” in Proc. IEEE Int. Conf. Distrib. Comput. Syst. (ICDCS) , Jersey City, NJ, USA, Jul. 2024, pp. 36–46

work page 2024
[2]

D²-JSCC: Digital deep joint source-channel coding for semantic communications,

J. Huang, K. Yuan, C. Huang, and K. Huang, “D²-JSCC: Digital deep joint source-channel coding for semantic communications,” IEEE J. Sel. Areas Commun., vol. 43, no. 4, pp. 1246–1261, Apr. 2025

work page 2025
[3]

Semantic sleuth: Identifying ponzi contracts via large language models,

C. Wu, J. Chen, Z. Wang, R. Liang, and R. Du, “Semantic sleuth: Identifying ponzi contracts via large language models,” in Proc. 39th IEEE/ACM Int. Conf. Autom. Softw. Eng. , ser. ASE ’24, Oct. 2024, p. 582–593

work page 2024
[4]

Toward full-scene domain generalization in multi-agent collaborative bird’s eye view segmentation for connected and autonomous driving,

S. Hu, Z. Fang, Y . Deng, X. Chen, Y . Fang, and S. Kwong, “Toward full-scene domain generalization in multi-agent collaborative bird’s eye view segmentation for connected and autonomous driving,” IEEE Trans. Intell. Transp. Syst. , vol. 26, no. 2, pp. 1783–1796, Feb. 2025

work page 2025
[5]

Fedhome: Cloud-edge based personalized federated learning for in-home health monitoring,

Q. Wu, X. Chen, Z. Zhou, and J. Zhang, “Fedhome: Cloud-edge based personalized federated learning for in-home health monitoring,” IEEE Trans. Mobile Comput. , vol. 21, no. 8, pp. 2818–2832, Aug. 2022

work page 2022
[6]

Federated learning for smart healthcare: A survey,

D. C. Nguyen, Q.-V . Pham, P. N. Pathirana, M. Ding, A. Seneviratne, Z. Lin, O. Dobre, and W.-J. Hwang, “Federated learning for smart healthcare: A survey,” ACM Comput. Surv. , vol. 55, no. 3, pp. 1–37, Feb. 2022

work page 2022
[7]

Privacy risks in reinforcement learning for household robots,

M. Li, W. Ding, and D. Zhao, “Privacy risks in reinforcement learning for household robots,” in 2024 IEEE International Conference on Robotics and Automation (ICRA) . IEEE, 2024, pp. 5148–5154

work page 2024
[8]

FedMeld: A model-dispersal feder- ated learning framework for space-ground integrated networks,

Q. Chen, X. Chen, and K. Huang, “FedMeld: A model-dispersal feder- ated learning framework for space-ground integrated networks,” arXiv preprint arXiv:2412.17231, 2024

work page arXiv 2024
[9]

EchoHand: High accuracy and presentation attack resistant hand authentication on commodity mobile devices,

C. Wu, J. Chen, K. He, Z. Zhao, R. Du, and C. Zhang, “EchoHand: High accuracy and presentation attack resistant hand authentication on commodity mobile devices,” in Proc. 2022 ACM SIGSAC Conf. Comput. Commun. Secur., ser. CCS ’22, Nov. 2022, p. 2931–2945

work page 2022
[10]

2021, version 18.2.0

3GPP, “3rd generation partnership project; Technical specification group services and system aspects; Study on traffic characteristics and perfor- mance requirements for AI/ML model transfer in 5GS; (Release 18),” 3rd Generation Partnership Project (3GPP), Technical Specification (TS) 22.874, Dec. 2021, version 18.2.0

work page 2021
[11]

Notable site recognition using deep learning on mobile and crowd-sourced imagery,

J. Tan, A. Noulas, D. S ´aez, and R. Schifanella, “Notable site recognition using deep learning on mobile and crowd-sourced imagery,” in Proc. 2020 21st IEEE Int. Conf. Mobile Data Manage. (MDM) , Versailles, France, Aug. 2020, pp. 137–147

work page 2020
[12]

Sense4FL: Vehicular crowdsensing enhanced federated learning for autonomous driving,

Y . Ma, S. Hu, Z. Fang, Y . Ji, Y . Deng, and Y . Fang, “Sense4FL: Vehicular crowdsensing enhanced federated learning for autonomous driving,” arXiv preprint arXiv:2503.17697 , 2025

work page arXiv 2025
[13]

Space-ground fluid AI for 6G edge intelligence,

Q. Chen, Z. Wang, X. Chen, J. Wen, D. Zhou, S. Ji, M. Sheng, and K. Huang, “Space-ground fluid AI for 6G edge intelligence,” arXiv preprint arXiv:2411.15845, 2024

work page arXiv 2024
[14]

A joint learning and communications framework for federated learning over wireless networks,

M. Chen, Z. Yang, W. Saad, C. Yin, H. V . Poor, and S. Cui, “A joint learning and communications framework for federated learning over wireless networks,” IEEE Trans. Wireless Commun. , vol. 20, no. 1, pp. 269–283, Jan. 2021

work page 2021
[15]

Byzantine-robust federated learning via cosine similarity aggregation,

T. Zhu, Z. Guo, C. Yao, J. Tan, S. Dou, W. Wang, and Z. Han, “Byzantine-robust federated learning via cosine similarity aggregation,” Comput. Netw., vol. 254, p. 110730, Dec. 2024

work page 2024
[16]

Ultra- low-latency edge inference for distributed sensing,

Z. Wang, A. E. Kalør, Y . Zhou, P. Popovski, and K. Huang, “Ultra- low-latency edge inference for distributed sensing,” arXiv preprint arXiv:2407.13360, 2024

work page arXiv 2024
[17]

Priori- tized information bottleneck theoretic framework with distributed online learning for edge video analytics,

Z. Fang, S. Hu, J. Wang, Y . Deng, X. Chen, and Y . Fang, “Priori- tized information bottleneck theoretic framework with distributed online learning for edge video analytics,” IEEE/ACM Trans. Netw., pp. 1–17, early access 2025

work page 2025
[18]

Gemel: Model merging for memory-efficient, real-time video analytics at the edge,

A. Padmanabhan, N. Agarwal, A. Iyer, G. Ananthanarayanan, Y . Shu, N. Karianakis, G. H. Xu, and R. Netravali, “Gemel: Model merging for memory-efficient, real-time video analytics at the edge,” in Proc. USENIX Symp. Netw. Syst. Des. Implement. (NSDI) , Boston, MA, USA, Apr. 2023, pp. 973–994

work page 2023
[19]

Pushing large language models to the 6G edge: Vision, challenges, and opportunities,

Z. Lin, G. Qu, Q. Chen, X. Chen, Z. Chen, and K. Huang, “Pushing large language models to the 6G edge: Vision, challenges, and opportunities,” arXiv preprint arXiv:2309.16739 , 2023

work page arXiv 2023
[20]

Transfer learning & fine-tuning,

TenserFlow, “Transfer learning & fine-tuning,” 2023. [Online]. Available: https://www.tensorflow.org/guide/keras/transfer learning# introduction

work page 2023
[21]

A comprehensive survey on transfer learning,

F. Zhuang, Z. Qi, K. Duan, D. Xi, Y . Zhu, H. Zhu, H. Xiong, and Q. He, “A comprehensive survey on transfer learning,” Proc. IEEE, vol. 109, no. 1, pp. 43–76, Jan. 2020

work page 2020
[22]

Spottune: Transfer learning through adaptive fine-tuning,

Y . Guo, H. Shi, A. Kumar, K. Grauman, T. Rosing, and R. Feris, “Spottune: Transfer learning through adaptive fine-tuning,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) , Long Beach, CA, USA, Jun. 2019

work page 2019
[23]

PACP: Priority-aware collaborative perception for connected and autonomous vehicles,

Z. Fang, S. Hu, H. An, Y . Zhang, J. Wang, H. Cao, X. Chen, and Y . Fang, “PACP: Priority-aware collaborative perception for connected and autonomous vehicles,” IEEE Trans. Mobile Comput. , pp. 15 003– 15 018, Dec. 2024

work page 2024
[24]

LoRA: Low-rank adaptation of large language models,

E. J. Hu, yelong shen, P. Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” in Proc. Int. Conf. Learn. Represent. (ICLR) , Apr. 2022

work page 2022
[25]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV , USA, Jun. 2016, pp. 770–778

work page 2016
[26]

Learning multiple layers of features from tiny images,

A. Krizhevsky, G. Hinton et al. , “Learning multiple layers of features from tiny images,” Apr. 2009

work page 2009
[27]

The E2E dataset: New challenges for end-to-end generation,

J. Novikova, O. Du ˇsek, and V . Rieser, “The E2E dataset: New challenges for end-to-end generation,” in Proc. Annu. SIGdial Meeting Disc. Dialogue, Saarbr ¨ucken, Germany, Aug. 2017, pp. 201–206

work page 2017
[28]

Cre- ating training corpora for NLG micro-planners,

C. Gardent, A. Shimorina, S. Narayan, and L. Perez-Beltrachini, “Cre- ating training corpora for NLG micro-planners,” in Proc. Annu. Meet. Assoc. Comput. Linguist. (ACL), Vancouver, Canada, Jul. 2017, pp. 179– 188

work page 2017
[29]

Dart: Open-domain structured data record to text generation,

L. Nan, D. Radev, R. Zhang, A. Rau, A. Sivaprasad, C. Hsieh, X. Tang, A. Vyas, N. Verma, P. Krishna, Y . Liu, N. Irwanto, J. Pan, F. Rahman, A. Zaidi, M. Mutuma, Y . Tarabar, A. Gupta, T. Yu, Y . C. Tan, X. V . Lin, C. Xiong, R. Socher, and N. F. Rajani, “Dart: Open-domain structured data record to text generation,” arXiv preprint arXiv:2007.02871, 2021

work page arXiv 2007
[30]

Femtocaching: Wireless content delivery through distributed caching helpers,

K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, “Femtocaching: Wireless content delivery through distributed caching helpers,” IEEE Trans. Inf. Theory , vol. 59, no. 12, pp. 8402– 8413, Dec. 2013

work page 2013
[31]

On the complexity of optimal content placement in hierarchical caching networks,

K. Poularakis and L. Tassiulas, “On the complexity of optimal content placement in hierarchical caching networks,” IEEE Trans. Commun. , vol. 64, no. 5, pp. 2092–2103, Mar. 2016

work page 2092
[32]

On the complexity of optimal re- quest routing and content caching in heterogeneous cache networks,

M. Dehghan, B. Jiang, A. Seetharam, T. He, T. Salonidis, J. Kurose, D. Towsley, and R. Sitaraman, “On the complexity of optimal re- quest routing and content caching in heterogeneous cache networks,” IEEE/ACM Trans. Netw., vol. 25, no. 3, pp. 1635–1648, Jun. 2017. 15

work page 2017
[33]

Edge-caching wireless networks: Performance analysis and optimization,

T. X. Vu, S. Chatzinotas, and B. Ottersten, “Edge-caching wireless networks: Performance analysis and optimization,” IEEE Trans. Wireless Commun., vol. 17, no. 4, pp. 2827–2839, Apr. 2018

work page 2018
[34]

Caching at the wireless edge: Design aspects, challenges, and future directions,

D. Liu, B. Chen, C. Yang, and A. F. Molisch, “Caching at the wireless edge: Design aspects, challenges, and future directions,” IEEE Commun. Mag., vol. 54, no. 9, pp. 22–28, Sep. 2016

work page 2016
[35]

On energy-efficient edge caching in heterogeneous networks,

F. Gabry, V . Bioglio, and I. Land, “On energy-efficient edge caching in heterogeneous networks,” IEEE J. Sel. Areas Commun. , vol. 34, no. 12, pp. 3288–3298, Dec. 2016

work page 2016
[36]

Delay-minimized edge caching in heterogeneous vehicular networks: A matching-based approach,

H. Wu, J. Chen, W. Xu, N. Cheng, W. Shi, L. Wang, and X. Shen, “Delay-minimized edge caching in heterogeneous vehicular networks: A matching-based approach,” IEEE Trans. Wireless Commun. , vol. 19, no. 10, pp. 6409–6424, Oct. 2020

work page 2020
[37]

Latency minimization for content delivery networks with wireless edge caching,

T. X. Vu, L. Lei, S. Vuppala, A. Kalantari, S. Chatzinotas, and B. Otter- sten, “Latency minimization for content delivery networks with wireless edge caching,” in Proc. IEEE Int. Conf.Commun. (ICC) , Kansas City, MO, USA, May 2018, pp. 1–6

work page 2018
[38]

PartialLoading: User scheduling and bandwidth allocation for parameter-sharing edge inference,

G. Qu, Q. Chen, X. Chen, K. Huang, and Y . Fang, “PartialLoading: User scheduling and bandwidth allocation for parameter-sharing edge inference,” arXiv preprint arXiv:2503.22982 , 2025

work page arXiv 2025
[39]

Edge intel- ligence: Architectures, challenges, and applications,

D. Xu, T. Li, Y . Li, X. Su, S. Tarkoma, and P. Hui, “Edge intel- ligence: Architectures, challenges, and applications,” arXiv preprint arXiv:2003.12172, 2020

work page arXiv 2003
[40]

Cached model-as-a-resource: Provisioning large language model agents for edge intelligence in space-air-ground integrated networks,

M. Xu, D. Niyato, H. Zhang, J. Kang, Z. Xiong, S. Mao, and Z. Han, “Cached model-as-a-resource: Provisioning large language model agents for edge intelligence in space-air-ground integrated networks,” arXiv preprint arXiv:2403.05826, 2024

work page arXiv 2024
[41]

Pipelining split learning in multi-hop edge networks,

W. Wei, Z. Lin, T. Li, X. Li, and X. Chen, “Pipelining split learning in multi-hop edge networks,” arXiv preprint arXiv:2505.04368 , 2025

work page arXiv 2025
[42]

Split learning in 6G edge networks,

Z. Lin, G. Qu, X. Chen, and K. Huang, “Split learning in 6G edge networks,” IEEE Wireless Commun., vol. 31, no. 4, pp. 170–176, Aug. 2024

work page 2024
[43]

QoS-aware placement of deep learning services on the edge with multiple service implemen- tations,

N. Hudson, H. Khamfroush, and D. E. Lucani, “QoS-aware placement of deep learning services on the edge with multiple service implemen- tations,” in Proc. Int. Conf. on Comput. Commun. and Netw. (ICCCN) , Athens, Greece, Jul. 2021, pp. 1–8

work page 2021
[44]

In-situ model downloading to realize versatile edge AI in 6G mobile networks,

K. Huang, H. Wu, Z. Liu, and X. Qi, “In-situ model downloading to realize versatile edge AI in 6G mobile networks,” IEEE Commun. Mag., vol. 30, no. 3, pp. 96–102, 2023

work page 2023
[45]

Mobile edge intelligence for large language models: A contemporary survey,

G. Qu, Q. Chen, W. Wei, Z. Lin, X. Chen, and K. Huang, “Mobile edge intelligence for large language models: A contemporary survey,” IEEE Commun. Surveys Tuts., pp. 1–42, early access 2025

work page 2025
[46]

Efficient multiuser AI downloading via reusable knowledge broadcasting,

H. Wu, Q. Zeng, and K. Huang, “Efficient multiuser AI downloading via reusable knowledge broadcasting,” IEEE Trans. Wireless Commun., vol. 23, no. 8, pp. 10 459–10 472, Aug. 2024

work page 2024
[47]

Collaborative edge caching in context-aware device-to-device networks,

X. Zhao, P. Yuan, H. li, and S. Tang, “Collaborative edge caching in context-aware device-to-device networks,” IEEE Trans. Veh. Technol. , vol. 67, no. 10, pp. 9583–9596, Oct. 2018

work page 2018
[48]

Soft cache hits: Improving performance through recommendation and delivery of related content,

P. Sermpezis, T. Giannakas, T. Spyropoulos, and L. Vigneri, “Soft cache hits: Improving performance through recommendation and delivery of related content,” IEEE J. Sel. Areas Commun. , vol. 36, no. 6, pp. 1300– 1313, Jun. 2018

work page 2018
[49]

Multimedia caching strategies for hetero- geneous application and server environments,

A. Dan and D. Sitaram, “Multimedia caching strategies for hetero- geneous application and server environments,” Multimed. Tools Appl. , vol. 4, pp. 279–312, May 1997

work page 1997
[50]

Jointly optimizing content caching and recommendations in small cell net- works,

L. E. Chatzieleftheriou, M. Karaliopoulos, and I. Koutsopoulos, “Jointly optimizing content caching and recommendations in small cell net- works,” IEEE Trans. Mobile Comput. , vol. 18, no. 1, pp. 125–138, Jan. 2019

work page 2019
[51]

A survey of caching mechanisms in information-centric networking,

M. Zhang, H. Luo, and H. Zhang, “A survey of caching mechanisms in information-centric networking,” IEEE Commun. Surveys Tuts., vol. 17, no. 3, pp. 1473–1499, 3rd Quart. 2015

work page 2015
[52]

Fujishige, Submodular functions and optimization

S. Fujishige, Submodular functions and optimization . New York, NY , USA: Elsevier, 2005

work page 2005
[53]

Lov ´asz, Mathematical Programming The State of the Art: Bonn

L. Lov ´asz, Mathematical Programming The State of the Art: Bonn

work page
[54]

Submodular functions and convexity, pp

Berlin, Heidelberg: Springer, 1983, ch. Submodular functions and convexity, pp. 235–257

work page 1983
[55]

Autotune: Automatically tuning convolutional neural networks for improved transfer learning,

S. S. Basha, S. K. Vinakota, V . Pulabaigari, S. Mukherjee, and S. R. Dubey, “Autotune: Automatically tuning convolutional neural networks for improved transfer learning,” Neural Netw., vol. 133, pp. 112–122, Jan. 2021

work page 2021
[56]

A comprehensive survey on transfer learning,

F. Zhuang, Z. Qi, K. Duan, D. Xi, Y . Zhu, H. Zhu, H. Xiong, and Q. He, “A comprehensive survey on transfer learning,” Proc. IEEE, vol. 109, no. 1, pp. 43–76, Jan. 2021

work page 2021
[57]

Models and pre-trained weights

PyTorch, “Models and pre-trained weights.” [Online]. Available: https://docs.pytorch.org/vision/main/models.html

work page
[58]

Introducing Apple’s on-device and server foundation models,

Apple, “Introducing Apple’s on-device and server foundation models,”

work page
[59]

Available: https://machinelearning.apple.com/research/ introducing-apple-foundation-models

[Online]. Available: https://machinelearning.apple.com/research/ introducing-apple-foundation-models

work page
[60]

Serving customized AI models at scale with LoRA,

IBM, “Serving customized AI models at scale with LoRA,” 2024. [Online]. Available: https://research.ibm.com/blog/LoRAs-explained

work page 2024
[61]

Submodular optimization with submodular cover and submodular knapsack constraints,

R. K. Iyer and J. A. Bilmes, “Submodular optimization with submodular cover and submodular knapsack constraints,” in Proc. Adv. Neural Inform. Process. Syst. (NeurIPS) , Stateline, NV , USA, Dec. 2013, pp. 1–9

work page 2013
[62]

Fast semidifferential-based submod- ular function optimization,

R. Iyer, S. Jegelka, and J. Bilmes, “Fast semidifferential-based submod- ular function optimization,” in Proc. Int. Conf. Mach. Learn. (ICML) , Atlanta, USA, Jun. 2013, pp. 855–863

work page 2013
[63]

3rd generation partnership project; Technical specification group radio access network; NR; Base station (BS) radio transmission and reception; (Release 18),

3GPP, “3rd generation partnership project; Technical specification group radio access network; NR; Base station (BS) radio transmission and reception; (Release 18),” 3rd Generation Partnership Project (3GPP), Technical Specification (TS) 38.104, Dec. 2024, version 18.8.0

work page 2024
[64]

Mobile backhaul: An overview,

GSMA, “Mobile backhaul: An overview,” 2019. [Online]. Available: https://www.gsma.com/futurenetworks/wiki/ mobile-backhaul-an-overview/

work page 2019
[65]

Ultra- dense 5G small cell deployment for fiber and wireless backhaul-aware infrastructures,

A. L. Rezaabad, H. Beyranvand, J. A. Salehi, and M. Maier, “Ultra- dense 5G small cell deployment for fiber and wireless backhaul-aware infrastructures,” IEEE Trans. Veh. Technol., vol. 67, no. 12, pp. 12 231– 12 243, Dec. 2018

work page 2018
[66]

Spectral efficiency analysis of cell-free massive MIMO systems with zero-forcing detector,

P. Liu, K. Luo, D. Chen, and T. Jiang, “Spectral efficiency analysis of cell-free massive MIMO systems with zero-forcing detector,” IEEE Trans. Wireless Commun., vol. 19, no. 2, pp. 795–807, Feb. 2020

work page 2020
[67]

Relative frequency as a determinant of phonetic change,

G. K. Zipf, “Relative frequency as a determinant of phonetic change,” Harvard Studies in Classical Philology , vol. 40, pp. 1–95, 1929

work page 1929
[68]

Social and spatial proactive caching for mobile data offloading,

E. Bas ¸tu˘g, M. Bennis, and M. Debbah, “Social and spatial proactive caching for mobile data offloading,” in Proc. IEEE Int. Conf.Commun. Workshops (ICC Wkshps), Sydney, NSW, Australia, Jun. 2014, pp. 581– 586

work page 2014
[69]

Learning-aided content placement in caching-enabled fog computing systems using thompson sampling,

J. Zhu, X. Huang, and Z. Shao, “Learning-aided content placement in caching-enabled fog computing systems using thompson sampling,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , Barcelona, Spain, May 2020, pp. 5060–5064

work page 2020
[70]

A survey of ICN content naming and in-network caching in 5G and beyond networks,

O. Serhane, K. Yahyaoui, B. Nour, and H. Moungla, “A survey of ICN content naming and in-network caching in 5G and beyond networks,” IEEE Internet Things J. , vol. 8, no. 6, pp. 4081–4104, Mar. 2021

work page 2021
[71]

On caching and routing in information-centric net- works,

A. Seetharam, “On caching and routing in information-centric net- works,” IEEE Commun. Mag. , vol. 56, no. 3, pp. 204–209, Mar. 2018

work page 2018
[72]

Approximability issues for unconstrained and constrained maximization of half-product related functions,

H. Kellerer, R. Sarto Basso, and V . A. Strusevich, “Approximability issues for unconstrained and constrained maximization of half-product related functions,” Theor. Comput. Sci., vol. 659, pp. 64–71, Jan. 2017

work page 2017
[73]

Approximation algorithms for the multiple knapsack problem with assignment restrictions,

M. Dawande, J. Kalagnanam, P. Keskinocak, F. S. Salman, and R. Ravi, “Approximation algorithms for the multiple knapsack problem with assignment restrictions,” J. Comb. Optim. , vol. 4, pp. 171–186, 2000

work page 2000
[74]

A polynomial time approximation scheme for the multiple knapsack problem,

C. Chekuri and S. Khanna, “A polynomial time approximation scheme for the multiple knapsack problem,” SIAM J. Comput. , vol. 35, no. 3, pp. 713–728, 2005. 1 APPENDIX A PROOF OF PROPOSITION 1 We begin by introducing a few statements. For any fea- sible X, let η (X) = {xm,i | xm,i = 1, xm,i ∈ X} denote the set of model caching decisions with xm,i = 1 , IX =...

work page 2005

[1] [1]

TrimCaching: Parameter- sharing AI model caching in wireless edge networks,

G. Qu, Z. Lin, F. Liu, X. Chen, and K. Huang, “TrimCaching: Parameter- sharing AI model caching in wireless edge networks,” in Proc. IEEE Int. Conf. Distrib. Comput. Syst. (ICDCS) , Jersey City, NJ, USA, Jul. 2024, pp. 36–46

work page 2024

[2] [2]

D²-JSCC: Digital deep joint source-channel coding for semantic communications,

J. Huang, K. Yuan, C. Huang, and K. Huang, “D²-JSCC: Digital deep joint source-channel coding for semantic communications,” IEEE J. Sel. Areas Commun., vol. 43, no. 4, pp. 1246–1261, Apr. 2025

work page 2025

[3] [3]

Semantic sleuth: Identifying ponzi contracts via large language models,

C. Wu, J. Chen, Z. Wang, R. Liang, and R. Du, “Semantic sleuth: Identifying ponzi contracts via large language models,” in Proc. 39th IEEE/ACM Int. Conf. Autom. Softw. Eng. , ser. ASE ’24, Oct. 2024, p. 582–593

work page 2024

[4] [4]

Toward full-scene domain generalization in multi-agent collaborative bird’s eye view segmentation for connected and autonomous driving,

S. Hu, Z. Fang, Y . Deng, X. Chen, Y . Fang, and S. Kwong, “Toward full-scene domain generalization in multi-agent collaborative bird’s eye view segmentation for connected and autonomous driving,” IEEE Trans. Intell. Transp. Syst. , vol. 26, no. 2, pp. 1783–1796, Feb. 2025

work page 2025

[5] [5]

Fedhome: Cloud-edge based personalized federated learning for in-home health monitoring,

Q. Wu, X. Chen, Z. Zhou, and J. Zhang, “Fedhome: Cloud-edge based personalized federated learning for in-home health monitoring,” IEEE Trans. Mobile Comput. , vol. 21, no. 8, pp. 2818–2832, Aug. 2022

work page 2022

[6] [6]

Federated learning for smart healthcare: A survey,

D. C. Nguyen, Q.-V . Pham, P. N. Pathirana, M. Ding, A. Seneviratne, Z. Lin, O. Dobre, and W.-J. Hwang, “Federated learning for smart healthcare: A survey,” ACM Comput. Surv. , vol. 55, no. 3, pp. 1–37, Feb. 2022

work page 2022

[7] [7]

Privacy risks in reinforcement learning for household robots,

M. Li, W. Ding, and D. Zhao, “Privacy risks in reinforcement learning for household robots,” in 2024 IEEE International Conference on Robotics and Automation (ICRA) . IEEE, 2024, pp. 5148–5154

work page 2024

[8] [8]

FedMeld: A model-dispersal feder- ated learning framework for space-ground integrated networks,

Q. Chen, X. Chen, and K. Huang, “FedMeld: A model-dispersal feder- ated learning framework for space-ground integrated networks,” arXiv preprint arXiv:2412.17231, 2024

work page arXiv 2024

[9] [9]

EchoHand: High accuracy and presentation attack resistant hand authentication on commodity mobile devices,

C. Wu, J. Chen, K. He, Z. Zhao, R. Du, and C. Zhang, “EchoHand: High accuracy and presentation attack resistant hand authentication on commodity mobile devices,” in Proc. 2022 ACM SIGSAC Conf. Comput. Commun. Secur., ser. CCS ’22, Nov. 2022, p. 2931–2945

work page 2022

[10] [10]

2021, version 18.2.0

3GPP, “3rd generation partnership project; Technical specification group services and system aspects; Study on traffic characteristics and perfor- mance requirements for AI/ML model transfer in 5GS; (Release 18),” 3rd Generation Partnership Project (3GPP), Technical Specification (TS) 22.874, Dec. 2021, version 18.2.0

work page 2021

[11] [11]

Notable site recognition using deep learning on mobile and crowd-sourced imagery,

J. Tan, A. Noulas, D. S ´aez, and R. Schifanella, “Notable site recognition using deep learning on mobile and crowd-sourced imagery,” in Proc. 2020 21st IEEE Int. Conf. Mobile Data Manage. (MDM) , Versailles, France, Aug. 2020, pp. 137–147

work page 2020

[12] [12]

Sense4FL: Vehicular crowdsensing enhanced federated learning for autonomous driving,

Y . Ma, S. Hu, Z. Fang, Y . Ji, Y . Deng, and Y . Fang, “Sense4FL: Vehicular crowdsensing enhanced federated learning for autonomous driving,” arXiv preprint arXiv:2503.17697 , 2025

work page arXiv 2025

[13] [13]

Space-ground fluid AI for 6G edge intelligence,

Q. Chen, Z. Wang, X. Chen, J. Wen, D. Zhou, S. Ji, M. Sheng, and K. Huang, “Space-ground fluid AI for 6G edge intelligence,” arXiv preprint arXiv:2411.15845, 2024

work page arXiv 2024

[14] [14]

A joint learning and communications framework for federated learning over wireless networks,

M. Chen, Z. Yang, W. Saad, C. Yin, H. V . Poor, and S. Cui, “A joint learning and communications framework for federated learning over wireless networks,” IEEE Trans. Wireless Commun. , vol. 20, no. 1, pp. 269–283, Jan. 2021

work page 2021

[15] [15]

Byzantine-robust federated learning via cosine similarity aggregation,

T. Zhu, Z. Guo, C. Yao, J. Tan, S. Dou, W. Wang, and Z. Han, “Byzantine-robust federated learning via cosine similarity aggregation,” Comput. Netw., vol. 254, p. 110730, Dec. 2024

work page 2024

[16] [16]

Ultra- low-latency edge inference for distributed sensing,

Z. Wang, A. E. Kalør, Y . Zhou, P. Popovski, and K. Huang, “Ultra- low-latency edge inference for distributed sensing,” arXiv preprint arXiv:2407.13360, 2024

work page arXiv 2024

[17] [17]

Priori- tized information bottleneck theoretic framework with distributed online learning for edge video analytics,

Z. Fang, S. Hu, J. Wang, Y . Deng, X. Chen, and Y . Fang, “Priori- tized information bottleneck theoretic framework with distributed online learning for edge video analytics,” IEEE/ACM Trans. Netw., pp. 1–17, early access 2025

work page 2025

[18] [18]

Gemel: Model merging for memory-efficient, real-time video analytics at the edge,

A. Padmanabhan, N. Agarwal, A. Iyer, G. Ananthanarayanan, Y . Shu, N. Karianakis, G. H. Xu, and R. Netravali, “Gemel: Model merging for memory-efficient, real-time video analytics at the edge,” in Proc. USENIX Symp. Netw. Syst. Des. Implement. (NSDI) , Boston, MA, USA, Apr. 2023, pp. 973–994

work page 2023

[19] [19]

Pushing large language models to the 6G edge: Vision, challenges, and opportunities,

Z. Lin, G. Qu, Q. Chen, X. Chen, Z. Chen, and K. Huang, “Pushing large language models to the 6G edge: Vision, challenges, and opportunities,” arXiv preprint arXiv:2309.16739 , 2023

work page arXiv 2023

[20] [20]

Transfer learning & fine-tuning,

TenserFlow, “Transfer learning & fine-tuning,” 2023. [Online]. Available: https://www.tensorflow.org/guide/keras/transfer learning# introduction

work page 2023

[21] [21]

A comprehensive survey on transfer learning,

F. Zhuang, Z. Qi, K. Duan, D. Xi, Y . Zhu, H. Zhu, H. Xiong, and Q. He, “A comprehensive survey on transfer learning,” Proc. IEEE, vol. 109, no. 1, pp. 43–76, Jan. 2020

work page 2020

[22] [22]

Spottune: Transfer learning through adaptive fine-tuning,

Y . Guo, H. Shi, A. Kumar, K. Grauman, T. Rosing, and R. Feris, “Spottune: Transfer learning through adaptive fine-tuning,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) , Long Beach, CA, USA, Jun. 2019

work page 2019

[23] [23]

PACP: Priority-aware collaborative perception for connected and autonomous vehicles,

Z. Fang, S. Hu, H. An, Y . Zhang, J. Wang, H. Cao, X. Chen, and Y . Fang, “PACP: Priority-aware collaborative perception for connected and autonomous vehicles,” IEEE Trans. Mobile Comput. , pp. 15 003– 15 018, Dec. 2024

work page 2024

[24] [24]

LoRA: Low-rank adaptation of large language models,

E. J. Hu, yelong shen, P. Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” in Proc. Int. Conf. Learn. Represent. (ICLR) , Apr. 2022

work page 2022

[25] [25]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV , USA, Jun. 2016, pp. 770–778

work page 2016

[26] [26]

Learning multiple layers of features from tiny images,

A. Krizhevsky, G. Hinton et al. , “Learning multiple layers of features from tiny images,” Apr. 2009

work page 2009

[27] [27]

The E2E dataset: New challenges for end-to-end generation,

J. Novikova, O. Du ˇsek, and V . Rieser, “The E2E dataset: New challenges for end-to-end generation,” in Proc. Annu. SIGdial Meeting Disc. Dialogue, Saarbr ¨ucken, Germany, Aug. 2017, pp. 201–206

work page 2017

[28] [28]

Cre- ating training corpora for NLG micro-planners,

C. Gardent, A. Shimorina, S. Narayan, and L. Perez-Beltrachini, “Cre- ating training corpora for NLG micro-planners,” in Proc. Annu. Meet. Assoc. Comput. Linguist. (ACL), Vancouver, Canada, Jul. 2017, pp. 179– 188

work page 2017

[29] [29]

Dart: Open-domain structured data record to text generation,

L. Nan, D. Radev, R. Zhang, A. Rau, A. Sivaprasad, C. Hsieh, X. Tang, A. Vyas, N. Verma, P. Krishna, Y . Liu, N. Irwanto, J. Pan, F. Rahman, A. Zaidi, M. Mutuma, Y . Tarabar, A. Gupta, T. Yu, Y . C. Tan, X. V . Lin, C. Xiong, R. Socher, and N. F. Rajani, “Dart: Open-domain structured data record to text generation,” arXiv preprint arXiv:2007.02871, 2021

work page arXiv 2007

[30] [30]

Femtocaching: Wireless content delivery through distributed caching helpers,

K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, “Femtocaching: Wireless content delivery through distributed caching helpers,” IEEE Trans. Inf. Theory , vol. 59, no. 12, pp. 8402– 8413, Dec. 2013

work page 2013

[31] [31]

On the complexity of optimal content placement in hierarchical caching networks,

K. Poularakis and L. Tassiulas, “On the complexity of optimal content placement in hierarchical caching networks,” IEEE Trans. Commun. , vol. 64, no. 5, pp. 2092–2103, Mar. 2016

work page 2092

[32] [32]

On the complexity of optimal re- quest routing and content caching in heterogeneous cache networks,

M. Dehghan, B. Jiang, A. Seetharam, T. He, T. Salonidis, J. Kurose, D. Towsley, and R. Sitaraman, “On the complexity of optimal re- quest routing and content caching in heterogeneous cache networks,” IEEE/ACM Trans. Netw., vol. 25, no. 3, pp. 1635–1648, Jun. 2017. 15

work page 2017

[33] [33]

Edge-caching wireless networks: Performance analysis and optimization,

T. X. Vu, S. Chatzinotas, and B. Ottersten, “Edge-caching wireless networks: Performance analysis and optimization,” IEEE Trans. Wireless Commun., vol. 17, no. 4, pp. 2827–2839, Apr. 2018

work page 2018

[34] [34]

Caching at the wireless edge: Design aspects, challenges, and future directions,

D. Liu, B. Chen, C. Yang, and A. F. Molisch, “Caching at the wireless edge: Design aspects, challenges, and future directions,” IEEE Commun. Mag., vol. 54, no. 9, pp. 22–28, Sep. 2016

work page 2016

[35] [35]

On energy-efficient edge caching in heterogeneous networks,

F. Gabry, V . Bioglio, and I. Land, “On energy-efficient edge caching in heterogeneous networks,” IEEE J. Sel. Areas Commun. , vol. 34, no. 12, pp. 3288–3298, Dec. 2016

work page 2016

[36] [36]

Delay-minimized edge caching in heterogeneous vehicular networks: A matching-based approach,

H. Wu, J. Chen, W. Xu, N. Cheng, W. Shi, L. Wang, and X. Shen, “Delay-minimized edge caching in heterogeneous vehicular networks: A matching-based approach,” IEEE Trans. Wireless Commun. , vol. 19, no. 10, pp. 6409–6424, Oct. 2020

work page 2020

[37] [37]

Latency minimization for content delivery networks with wireless edge caching,

T. X. Vu, L. Lei, S. Vuppala, A. Kalantari, S. Chatzinotas, and B. Otter- sten, “Latency minimization for content delivery networks with wireless edge caching,” in Proc. IEEE Int. Conf.Commun. (ICC) , Kansas City, MO, USA, May 2018, pp. 1–6

work page 2018

[38] [38]

PartialLoading: User scheduling and bandwidth allocation for parameter-sharing edge inference,

G. Qu, Q. Chen, X. Chen, K. Huang, and Y . Fang, “PartialLoading: User scheduling and bandwidth allocation for parameter-sharing edge inference,” arXiv preprint arXiv:2503.22982 , 2025

work page arXiv 2025

[39] [39]

Edge intel- ligence: Architectures, challenges, and applications,

D. Xu, T. Li, Y . Li, X. Su, S. Tarkoma, and P. Hui, “Edge intel- ligence: Architectures, challenges, and applications,” arXiv preprint arXiv:2003.12172, 2020

work page arXiv 2003

[40] [40]

Cached model-as-a-resource: Provisioning large language model agents for edge intelligence in space-air-ground integrated networks,

M. Xu, D. Niyato, H. Zhang, J. Kang, Z. Xiong, S. Mao, and Z. Han, “Cached model-as-a-resource: Provisioning large language model agents for edge intelligence in space-air-ground integrated networks,” arXiv preprint arXiv:2403.05826, 2024

work page arXiv 2024

[41] [41]

Pipelining split learning in multi-hop edge networks,

W. Wei, Z. Lin, T. Li, X. Li, and X. Chen, “Pipelining split learning in multi-hop edge networks,” arXiv preprint arXiv:2505.04368 , 2025

work page arXiv 2025

[42] [42]

Split learning in 6G edge networks,

Z. Lin, G. Qu, X. Chen, and K. Huang, “Split learning in 6G edge networks,” IEEE Wireless Commun., vol. 31, no. 4, pp. 170–176, Aug. 2024

work page 2024

[43] [43]

QoS-aware placement of deep learning services on the edge with multiple service implemen- tations,

N. Hudson, H. Khamfroush, and D. E. Lucani, “QoS-aware placement of deep learning services on the edge with multiple service implemen- tations,” in Proc. Int. Conf. on Comput. Commun. and Netw. (ICCCN) , Athens, Greece, Jul. 2021, pp. 1–8

work page 2021

[44] [44]

In-situ model downloading to realize versatile edge AI in 6G mobile networks,

K. Huang, H. Wu, Z. Liu, and X. Qi, “In-situ model downloading to realize versatile edge AI in 6G mobile networks,” IEEE Commun. Mag., vol. 30, no. 3, pp. 96–102, 2023

work page 2023

[45] [45]

Mobile edge intelligence for large language models: A contemporary survey,

G. Qu, Q. Chen, W. Wei, Z. Lin, X. Chen, and K. Huang, “Mobile edge intelligence for large language models: A contemporary survey,” IEEE Commun. Surveys Tuts., pp. 1–42, early access 2025

work page 2025

[46] [46]

Efficient multiuser AI downloading via reusable knowledge broadcasting,

H. Wu, Q. Zeng, and K. Huang, “Efficient multiuser AI downloading via reusable knowledge broadcasting,” IEEE Trans. Wireless Commun., vol. 23, no. 8, pp. 10 459–10 472, Aug. 2024

work page 2024

[47] [47]

Collaborative edge caching in context-aware device-to-device networks,

X. Zhao, P. Yuan, H. li, and S. Tang, “Collaborative edge caching in context-aware device-to-device networks,” IEEE Trans. Veh. Technol. , vol. 67, no. 10, pp. 9583–9596, Oct. 2018

work page 2018

[48] [48]

Soft cache hits: Improving performance through recommendation and delivery of related content,

P. Sermpezis, T. Giannakas, T. Spyropoulos, and L. Vigneri, “Soft cache hits: Improving performance through recommendation and delivery of related content,” IEEE J. Sel. Areas Commun. , vol. 36, no. 6, pp. 1300– 1313, Jun. 2018

work page 2018

[49] [49]

Multimedia caching strategies for hetero- geneous application and server environments,

A. Dan and D. Sitaram, “Multimedia caching strategies for hetero- geneous application and server environments,” Multimed. Tools Appl. , vol. 4, pp. 279–312, May 1997

work page 1997

[50] [50]

Jointly optimizing content caching and recommendations in small cell net- works,

L. E. Chatzieleftheriou, M. Karaliopoulos, and I. Koutsopoulos, “Jointly optimizing content caching and recommendations in small cell net- works,” IEEE Trans. Mobile Comput. , vol. 18, no. 1, pp. 125–138, Jan. 2019

work page 2019

[51] [51]

A survey of caching mechanisms in information-centric networking,

M. Zhang, H. Luo, and H. Zhang, “A survey of caching mechanisms in information-centric networking,” IEEE Commun. Surveys Tuts., vol. 17, no. 3, pp. 1473–1499, 3rd Quart. 2015

work page 2015

[52] [52]

Fujishige, Submodular functions and optimization

S. Fujishige, Submodular functions and optimization . New York, NY , USA: Elsevier, 2005

work page 2005

[53] [53]

Lov ´asz, Mathematical Programming The State of the Art: Bonn

L. Lov ´asz, Mathematical Programming The State of the Art: Bonn

work page

[54] [54]

Submodular functions and convexity, pp

Berlin, Heidelberg: Springer, 1983, ch. Submodular functions and convexity, pp. 235–257

work page 1983

[55] [55]

Autotune: Automatically tuning convolutional neural networks for improved transfer learning,

S. S. Basha, S. K. Vinakota, V . Pulabaigari, S. Mukherjee, and S. R. Dubey, “Autotune: Automatically tuning convolutional neural networks for improved transfer learning,” Neural Netw., vol. 133, pp. 112–122, Jan. 2021

work page 2021

[56] [56]

A comprehensive survey on transfer learning,

F. Zhuang, Z. Qi, K. Duan, D. Xi, Y . Zhu, H. Zhu, H. Xiong, and Q. He, “A comprehensive survey on transfer learning,” Proc. IEEE, vol. 109, no. 1, pp. 43–76, Jan. 2021

work page 2021

[57] [57]

Models and pre-trained weights

PyTorch, “Models and pre-trained weights.” [Online]. Available: https://docs.pytorch.org/vision/main/models.html

work page

[58] [58]

Introducing Apple’s on-device and server foundation models,

Apple, “Introducing Apple’s on-device and server foundation models,”

work page

[59] [59]

Available: https://machinelearning.apple.com/research/ introducing-apple-foundation-models

[Online]. Available: https://machinelearning.apple.com/research/ introducing-apple-foundation-models

work page

[60] [60]

Serving customized AI models at scale with LoRA,

IBM, “Serving customized AI models at scale with LoRA,” 2024. [Online]. Available: https://research.ibm.com/blog/LoRAs-explained

work page 2024

[61] [61]

Submodular optimization with submodular cover and submodular knapsack constraints,

R. K. Iyer and J. A. Bilmes, “Submodular optimization with submodular cover and submodular knapsack constraints,” in Proc. Adv. Neural Inform. Process. Syst. (NeurIPS) , Stateline, NV , USA, Dec. 2013, pp. 1–9

work page 2013

[62] [62]

Fast semidifferential-based submod- ular function optimization,

R. Iyer, S. Jegelka, and J. Bilmes, “Fast semidifferential-based submod- ular function optimization,” in Proc. Int. Conf. Mach. Learn. (ICML) , Atlanta, USA, Jun. 2013, pp. 855–863

work page 2013

[63] [63]

3rd generation partnership project; Technical specification group radio access network; NR; Base station (BS) radio transmission and reception; (Release 18),

3GPP, “3rd generation partnership project; Technical specification group radio access network; NR; Base station (BS) radio transmission and reception; (Release 18),” 3rd Generation Partnership Project (3GPP), Technical Specification (TS) 38.104, Dec. 2024, version 18.8.0

work page 2024

[64] [64]

Mobile backhaul: An overview,

GSMA, “Mobile backhaul: An overview,” 2019. [Online]. Available: https://www.gsma.com/futurenetworks/wiki/ mobile-backhaul-an-overview/

work page 2019

[65] [65]

Ultra- dense 5G small cell deployment for fiber and wireless backhaul-aware infrastructures,

A. L. Rezaabad, H. Beyranvand, J. A. Salehi, and M. Maier, “Ultra- dense 5G small cell deployment for fiber and wireless backhaul-aware infrastructures,” IEEE Trans. Veh. Technol., vol. 67, no. 12, pp. 12 231– 12 243, Dec. 2018

work page 2018

[66] [66]

Spectral efficiency analysis of cell-free massive MIMO systems with zero-forcing detector,

P. Liu, K. Luo, D. Chen, and T. Jiang, “Spectral efficiency analysis of cell-free massive MIMO systems with zero-forcing detector,” IEEE Trans. Wireless Commun., vol. 19, no. 2, pp. 795–807, Feb. 2020

work page 2020

[67] [67]

Relative frequency as a determinant of phonetic change,

G. K. Zipf, “Relative frequency as a determinant of phonetic change,” Harvard Studies in Classical Philology , vol. 40, pp. 1–95, 1929

work page 1929

[68] [68]

Social and spatial proactive caching for mobile data offloading,

E. Bas ¸tu˘g, M. Bennis, and M. Debbah, “Social and spatial proactive caching for mobile data offloading,” in Proc. IEEE Int. Conf.Commun. Workshops (ICC Wkshps), Sydney, NSW, Australia, Jun. 2014, pp. 581– 586

work page 2014

[69] [69]

Learning-aided content placement in caching-enabled fog computing systems using thompson sampling,

J. Zhu, X. Huang, and Z. Shao, “Learning-aided content placement in caching-enabled fog computing systems using thompson sampling,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , Barcelona, Spain, May 2020, pp. 5060–5064

work page 2020

[70] [70]

A survey of ICN content naming and in-network caching in 5G and beyond networks,

O. Serhane, K. Yahyaoui, B. Nour, and H. Moungla, “A survey of ICN content naming and in-network caching in 5G and beyond networks,” IEEE Internet Things J. , vol. 8, no. 6, pp. 4081–4104, Mar. 2021

work page 2021

[71] [71]

On caching and routing in information-centric net- works,

A. Seetharam, “On caching and routing in information-centric net- works,” IEEE Commun. Mag. , vol. 56, no. 3, pp. 204–209, Mar. 2018

work page 2018

[72] [72]

Approximability issues for unconstrained and constrained maximization of half-product related functions,

H. Kellerer, R. Sarto Basso, and V . A. Strusevich, “Approximability issues for unconstrained and constrained maximization of half-product related functions,” Theor. Comput. Sci., vol. 659, pp. 64–71, Jan. 2017

work page 2017

[73] [73]

Approximation algorithms for the multiple knapsack problem with assignment restrictions,

M. Dawande, J. Kalagnanam, P. Keskinocak, F. S. Salman, and R. Ravi, “Approximation algorithms for the multiple knapsack problem with assignment restrictions,” J. Comb. Optim. , vol. 4, pp. 171–186, 2000

work page 2000

[74] [74]

A polynomial time approximation scheme for the multiple knapsack problem,

C. Chekuri and S. Khanna, “A polynomial time approximation scheme for the multiple knapsack problem,” SIAM J. Comput. , vol. 35, no. 3, pp. 713–728, 2005. 1 APPENDIX A PROOF OF PROPOSITION 1 We begin by introducing a few statements. For any fea- sible X, let η (X) = {xm,i | xm,i = 1, xm,i ∈ X} denote the set of model caching decisions with xm,i = 1 , IX =...

work page 2005