Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
3 Pith papers cite this work. Polarity classification is still indexing.
Citing papers
- Deep Reasoning in General Purpose Agents via Structured Meta-Cognition
  DOLORES, an agent that uses a formal language for meta-reasoning to construct adaptive scaffolds on the fly, outperforms prior scaffolding methods by 24.8% on average across four hard benchmarks and multiple model sizes.
- HumorRank: A Tournament-Based Leaderboard for Evaluating Humor Generation in Large Language Models
  HumorRank ranks nine LLMs on textual humor using GTVH-grounded pairwise tournaments and Adaptive Swiss aggregation on the SemEval-2026 MWAHAHA dataset, finding that mastery of comedic mechanisms matters more than scale.
- Leveraging RAG for Training-Free Alignment of LLMs
  RAG-Pref is a training-free RAG-based alignment technique that conditions LLMs on contrastive preference samples during inference, yielding over 3.7x average improvement in agentic attack refusals when combined with offline methods across five LLMs.