SimDiff combines similarity and difference metrics to prune LLM layers more effectively than cosine similarity alone, retaining over 91% of baseline performance at a 25% pruning ratio on LLaMA2-7B.
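The summary does not spell out SimDiff's actual metrics or scoring rule, so the following is only a minimal sketch of the general idea, assuming per-layer input/output hidden states are available; every name here (layer_redundancy, select_layers_to_prune, alpha) is hypothetical.

```python
import torch
import torch.nn.functional as F

def layer_redundancy(h_in: torch.Tensor, h_out: torch.Tensor, alpha: float = 0.5) -> float:
    """Hypothetical combined score: a layer whose output closely matches its
    input (high cosine similarity, small relative change) is a pruning candidate."""
    cos = F.cosine_similarity(h_in.flatten(1), h_out.flatten(1), dim=1).mean().item()
    rel_diff = ((h_out - h_in).norm() / h_in.norm()).item()
    # Blend the two signals; alpha weights similarity against relative change.
    return alpha * cos + (1.0 - alpha) * (1.0 - rel_diff)

def select_layers_to_prune(scores: list[float], ratio: float = 0.25) -> list[int]:
    """Drop the most redundant `ratio` of layers, e.g. 8 of 32 on LLaMA2-7B."""
    k = int(len(scores) * ratio)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

The point of the difference term is that cosine similarity alone can miss layers that change the magnitude but not the direction of the residual stream; a relative-change term catches those cases.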
7 Pith papers cite this work.
Representative citing papers
The PTR framework profiles a workflow upfront and then executes it deterministically with bounded verification and repair, limiting LM calls to 2-3 while outperforming ReAct in 16 of 24 tested configurations; a hypothetical sketch of this profile-then-execute pattern appears after this list.
BIG-bench ("Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models") is a 204-task benchmark that measures scaling trends, calibration, and absolute limitations of language models across knowledge, reasoning, and social domains.
Extremely quantized LLMs degrade in smoothness, which sparsifies the decoding tree and hurts generation quality; a smoothness-preserving principle delivers gains beyond pure numerical fitting.
LLMs show strong spatial generalization to unseen maps in shortest-path tasks but fail to generalize to longer paths due to recursive instability, with training-data coverage setting hard limits.
A bit-allocation strategy that quantizes MoE models using router norms and per-expert weight variance, claiming higher accuracy and lower cost than prior mixed-precision methods; a hypothetical budgeted-allocation sketch appears after this list.
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
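PTR's actual interfaces are not given here, so this is a generic, hypothetical sketch of the pattern the summary describes: one LM call to profile the workflow upfront, deterministic execution of the steps with no LM involvement, and a bounded verify-and-repair loop; all names and prompts are assumptions.

```python
from typing import Callable

def run_profiled_workflow(llm: Callable[[str], str],
                          task: str,
                          steps: list[Callable[[str], str]],
                          verify: Callable[[str], bool],
                          max_repairs: int = 2) -> str:
    """Hypothetical profile-then-execute loop: one profiling call plus at most
    `max_repairs` repair calls bounds the LM budget regardless of workflow length."""
    # LM call 1: resolve parameters and ambiguities before any execution.
    state = llm(f"Profile this workflow and resolve its parameters upfront: {task}")
    for step in steps:  # deterministic execution, no LM calls
        state = step(state)
    repairs = 0
    while not verify(state) and repairs < max_repairs:
        # Bounded repair: each failed verification costs exactly one more LM call.
        state = llm(f"The output failed verification; repair it: {state}")
        repairs += 1
    return state
```

The contrast with ReAct is that the LM never sits inside the per-step loop, so the call count does not grow with workflow length.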
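The citing paper's actual allocation rule is not reproduced here; below is a hypothetical sketch of one way router norms and weight variance could drive mixed-precision assignment under a fixed average-bit budget (the sensitivity score, the greedy rule, and all names are assumptions).

```python
import numpy as np

def allocate_bits(router_norms, weight_vars, avg_bits: float = 4.0,
                  choices: tuple = (2, 4, 8)) -> np.ndarray:
    """Hypothetical sensitivity score: experts the router activates strongly
    and whose weights vary widely get more bits under a fixed average budget."""
    sens = np.asarray(router_norms) * np.asarray(weight_vars)
    bits = np.full(len(sens), choices[0], dtype=float)  # start at the lowest width
    budget = avg_bits * len(sens) - bits.sum()          # extra bits left to spend
    for i in np.argsort(-sens):                         # most sensitive expert first
        for c in choices[1:]:                           # try larger widths in order
            step = c - bits[i]
            if 0 < step <= budget:
                bits[i] = c
                budget -= step
    return bits.astype(int)
```

With 8 experts and a 4-bit average, this greedy pass puts the two most sensitive experts at 8 bits, the next two at 4, and the rest at 2, exactly meeting the budget.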