Test-time scaling for personalized LLMs follows a logarithmic utility curve under oracle selection, but standard reward models suffer from user-level collapse and query-level hacking; a probabilistic reward model with learned variance enables consistent scaling.
Fung, Hailong Yang, and Depei Qian
7 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 7
roles: background 1
polarities: background 1
representative citing papers
ProjRes achieves near-100% accuracy in membership inference on FedLLMs by measuring projection residuals of hidden embeddings on gradient subspaces, outperforming prior methods by up to 75.75% even under differential privacy.
LA-LoRA decouples LoRA matrix updates in DPFL settings to improve robustness to privacy noise, delivering up to 16.83% higher accuracy than prior LoRA variants on Swin-B under strict epsilon=1.
SecureGate reduces PII leakage up to 31.66X in federated LLM fine-tuning via token-gated dual LoRA adapters while preserving utility and achieving perfect routing reliability.
DECA partitions LLM parameters into blocks for sequential block-wise Adam optimization in decentralized non-IID settings to support efficient full-parameter fine-tuning.
FedSpy-LLM uses gradient decomposition and iterative alignment to reconstruct larger batches and longer sequences of training data from LLM gradients in federated settings, including with PEFT methods.
FediLoRA is a lightweight federated LoRA aggregation method that jointly mitigates missing modalities and heterogeneous ranks in collaborative fine-tuning of foundation models.
citing papers explorer
- Test-Time Personalization: A Diagnostic Framework and Probabilistic Fix for Scaling Failures