CQC-RAG improves RAG factuality by generating diverse equivalent queries, building query-specific contexts, and selecting answers via cross-query confidence stability, with reported gains of +4.76 pp EM on TriviaQA and +9.12 pp EM on MuSiQue.
How easily do irrelevant inputs skew the responses of large language models?
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 8roles
background 2polarities
background 2representative citing papers
CLORE augments correct on-policy rollouts by deleting repetitive and irrelevant segments then optimizes with auxiliary DPO to improve accuracy-efficiency trade-off on math benchmarks.
Seirênes trains LLMs via adversarial self-play to generate and overcome evolving distractions, producing gains of 7-10 points on math reasoning benchmarks and exposing blind spots in larger models.
PMSR progressively constructs structured reasoning trajectories with dual-scope queries and compositional reasoning to improve knowledge acquisition and answer accuracy in knowledge-intensive VQA.
A gamified system with multiple LLM agents of varied personalities gathers interaction data to produce more effective and interpretable Big Five personality assessments than single-context methods.
E2LLM uses encoder-based soft prompt compression for long contexts to improve LLM reasoning on tasks like summarization and QA while maintaining efficiency.
A cross-modal attention refinement module plus hybrid loss improves robustness of audio-text retrieval on noisy and long-form audio.
A semi-structured thematic synthesis identifies core challenges in FM selection, alignment, prompting, orchestration, testing, deployment, and cross-cutting concerns like observability for production-ready FMware.
citing papers explorer
-
CQC-RAG: Robust Retrieval-Augmented Generation via Cross-Query Consistency
CQC-RAG improves RAG factuality by generating diverse equivalent queries, building query-specific contexts, and selecting answers via cross-query confidence stability, with reported gains of +4.76 pp EM on TriviaQA and +9.12 pp EM on MuSiQue.
-
CLORE: Content-Level Optimization for Reasoning Efficiency
CLORE augments correct on-policy rollouts by deleting repetitive and irrelevant segments then optimizes with auxiliary DPO to improve accuracy-efficiency trade-off on math benchmarks.
-
Seir\^enes: Adversarial Self-Play with Evolving Distractions for LLM Reasoning
Seirênes trains LLMs via adversarial self-play to generate and overcome evolving distractions, producing gains of 7-10 points on math reasoning benchmarks and exposing blind spots in larger models.
-
Progressive Multimodal Search and Reasoning for Knowledge-Intensive Visual Question Answering
PMSR progressively constructs structured reasoning trajectories with dual-scope queries and compositional reasoning to improve knowledge acquisition and answer accuracy in knowledge-intensive VQA.
-
Exploring a Gamified Personality Assessment Method through Interaction with LLM Agents Embodying Different Personalities
A gamified system with multiple LLM agents of varied personalities gathers interaction data to produce more effective and interpretable Big Five personality assessments than single-context methods.
-
E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning
E2LLM uses encoder-based soft prompt compression for long contexts to improve LLM reasoning on tasks like summarization and QA while maintaining efficiency.
-
Robust Audio-Text Retrieval via Cross-Modal Attention and Hybrid Loss
A cross-modal attention refinement module plus hybrid loss improves robustness of audio-text retrieval on noisy and long-form audio.
-
From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap
A semi-structured thematic synthesis identifies core challenges in FM selection, alignment, prompting, orchestration, testing, deployment, and cross-cutting concerns like observability for production-ready FMware.