Measuring the mixing of contextual information in the transformer
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts: UNVERDICTED 3
roles: background 1
polarities: background 1
citing papers explorer
- CachePrune: Privacy-Aware and Fine-Grained KV Cache Sharing for Efficient LLM Inference
  CachePrune enables fine-grained, token-level KV cache reuse across LLM requests by masking sensitive segments, eliminating direct side-channel leakage while cutting TTFT by 4.5x and raising hit rates by 44% versus prior coarse-grained methods.
- Geometry-Calibrated Conformal Abstention for Language Models
  Geometry-calibrated conformal abstention lets language models abstain from uncertain queries with finite-sample guarantees on both participation rate and conditional correctness of answers.
- Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models
  A systematic literature review of explainability in multimodal attention models finds most studies focus on vision-language tasks with attention-based explanations, but evaluation methods lack consistency and modality-specific considerations.