PersLitEval benchmark shows LLMs perform better on conceptual Persian literature tasks than spelling or word formation, with explained few-shot prompting yielding the strongest results across six models.
Compositional preference models for aligning lms
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
A new pipeline uses interpretability to characterize concepts in preference data and shape rewards via feature or data interventions during LM post-training.
A literature survey that introduces a taxonomy for LLM reasoning paradigms, analyzes methodological trends, and synthesizes failure modes from over 300 papers.
citing papers explorer
-
PersLitEval: Fine-grained Benchmark and Evaluation of LLMs on Persian Literature Questions
PersLitEval benchmark shows LLMs perform better on conceptual Persian literature tasks than spelling or word formation, with explained few-shot prompting yielding the strongest results across six models.
-
Anatomy of Post-Training: Using Interpretability to Characterize Data and Shape the Learning Signal
A new pipeline uses interpretability to characterize concepts in preference data and shape rewards via feature or data interventions during LM post-training.
-
The Periodic Table of LLM Reasoning: A Structured Survey of Reasoning Paradigms, Methods, and Failure Modes
A literature survey that introduces a taxonomy for LLM reasoning paradigms, analyzes methodological trends, and synthesizes failure modes from over 300 papers.