Ensembling large language models with process reward-guided tree search for better complex reasoning. arXiv preprint arXiv:2412.15797
4 Pith papers cite this work.
fields: cs.CL (4)
verdicts: UNVERDICTED (4)
citing papers explorer
- Less Diverse, Less Safe: The Indirect But Pervasive Risk of Test-Time Scaling in Large Language Models
  Curtailing diversity in candidate pools for test-time scaling increases unsafe LLM outputs, as demonstrated by a reference-guided reduction protocol that evades standard safety classifiers across open and closed models.
- Scoring, Reasoning, and Selecting the Best! Ensembling Large Language Models via a Peer-Review Process
  LLM-PeerReview ensembles LLMs by scoring responses with LLM-as-Judge and selecting the best via averaging or truth inference, beating Smoothie-Global by 6.9-7.3 points on four datasets.
- The Periodic Table of LLM Reasoning: A Structured Survey of Reasoning Paradigms, Methods, and Failure Modes
  A literature survey that introduces a taxonomy for LLM reasoning paradigms, analyzes methodological trends, and synthesizes failure modes from over 300 papers.
- Harnessing Multiple Large Language Models: A Survey on LLM Ensemble
  A systematic survey of LLM ensemble methods organized into a taxonomy of ensemble-before-inference, ensemble-during-inference, and ensemble-after-inference stages, with review of benchmarks, applications, and future directions.