7 Pith papers cite this work.
Citation roles: baseline (1)
Citation polarities: baseline (1)
Citing papers
-
HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models
HalluWorld is a controlled benchmark using explicit reference world models to automatically label and disentangle hallucinations in LLMs across synthetic environments with varying complexity and observability.
-
Output Vector Editing for Memorization Mitigation in Large Language Models
Output vector editing on MLP neurons suppresses memorization in LLMs by up to 87.9% across 6,831 sequences in OLMo-7B, a 2.7x margin over zero ablation, and an ensemble reaches 96.5% coverage.
-
ARIADNE: Agnostic Routing for Inference-time Adapter DyNamic sElection
ARIADNE routes each query to the best adapter by its proximity to embedding-space centroids, recovering 97.44% of upper-bound performance on 23 NLP tasks and reaching 89.7% adapter-selection accuracy on 44 tasks, without training or access to model internals.
-
LIMO: Less is More for Reasoning
LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that a small set of strategically chosen examples suffices when pre-training has already encoded the relevant domain knowledge.
-
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.
-
A Comparative Study of Controlled Text Generation Systems Using Level-Playing-Field Evaluation Principles
Re-evaluating controlled text generation systems under standardized conditions reveals that many published performance claims do not hold, highlighting the need for consistent evaluation practices.
-
Self-Refine: Iterative Refinement with Self-Feedback
Self-Refine improves LLM task performance by about 20% on average across seven tasks by having the same model iteratively generate, critique, and refine its own responses.