ArgBench unifies 33 existing datasets into a standardized benchmark for testing LLMs across 46 argumentation tasks and analyzes the impact of prompting techniques and model factors on performance.
Smith, Daniel Khashabi, and Hannaneh Hajishirzi
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background (1)
citation-polarity summary: still indexing
representative citing papers
AcquisitionSynthesis uses acquisition functions as rewards to train generators that produce higher-quality synthetic data, delivering 2-7% gains on math, medical QA, and coding tasks with improved robustness to forgetting.
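As a rough illustration of the idea, not AcquisitionSynthesis's actual method: predictive entropy is one standard acquisition function, and repurposing it as an RL reward steers the generator toward examples the student model finds most informative. The function name and interface below are assumptions.

```python
import math
from typing import List

def entropy_acquisition_reward(student_probs: List[float]) -> float:
    """Score a generated example by the student's predictive entropy.

    Acquisition functions from active learning estimate how informative
    an example would be; used as an RL reward, they push the data
    generator toward synthetic examples the student is uncertain about.
    """
    return -sum(p * math.log(p + 1e-12) for p in student_probs)

# A confidently classified example earns a low reward,
# an ambiguous one a high reward.
print(entropy_acquisition_reward([0.98, 0.01, 0.01]))  # ~0.11
print(entropy_acquisition_reward([0.34, 0.33, 0.33]))  # ~1.10
```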
MTA improves LLM knowledge distillation by aligning teacher and student representations along layer-wise trajectories, adapting alignment granularity from words to phrases via dynamic structural and hidden-representation alignment losses.
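A hedged sketch of what a layer-wise trajectory alignment loss could look like; the projection heads and MSE objective are illustrative assumptions, not MTA's exact losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def trajectory_alignment_loss(
    student_states: list,        # per-layer tensors [batch, seq, d_s]
    teacher_states: list,        # per-layer tensors [batch, seq, d_t]
    projections: nn.ModuleList,  # one d_s -> d_t projection per aligned layer
) -> torch.Tensor:
    """Align student and teacher hidden states layer by layer, so the
    student matches the teacher's whole depth-wise trajectory rather
    than only its final-layer outputs."""
    loss = torch.zeros(())
    for s, t, proj in zip(student_states, teacher_states, projections):
        loss = loss + F.mse_loss(proj(s), t.detach())
    return loss / len(projections)
```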
Reinforcement learning with a multi-part reward teaches LLMs to produce independent, meaning-preserving sentence edits that raise argument appropriateness to nearly the level of full rewriting.
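The components and weights below are guesses at what "multi-part" might combine (appropriateness gain, meaning preservation, edit independence); a sketch, not the paper's actual reward.

```python
def edit_reward(
    appropriateness_gain: float,  # judged improvement of the edited argument
    meaning_similarity: float,    # e.g. cosine similarity of sentence embeddings
    edits_are_independent: bool,  # no edit depends on another being applied
    w_app: float = 1.0,
    w_sim: float = 0.5,
    independence_penalty: float = 1.0,
) -> float:
    """Combine several signals into one scalar RL reward: improve
    appropriateness, preserve meaning, and keep edits independent."""
    reward = w_app * appropriateness_gain + w_sim * meaning_similarity
    if not edits_are_independent:
        reward -= independence_penalty
    return reward
```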
Self-Rewarding Language Models shows that iterative self-rewarding via LLM-as-Judge in DPO training on Llama 2 70B improves instruction following and self-evaluation, outperforming GPT-4 on AlpacaEval 2.0.
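A minimal sketch of one self-rewarding iteration, with the model's sampling, LLM-as-Judge scoring, and DPO step injected as callables since those details live outside this summary:

```python
from typing import Callable, List, Tuple

def self_rewarding_iteration(
    prompts: List[str],
    generate: Callable[[str], str],      # sample one response from the current model
    judge: Callable[[str, str], float],  # model-as-judge score for (prompt, response)
    dpo_update: Callable[[List[Tuple[str, str, str]]], None],  # one DPO training pass
    n_samples: int = 4,
) -> None:
    """One iteration: the same model generates candidates, scores them
    itself, and is then DPO-trained on the resulting preference pairs."""
    pairs: List[Tuple[str, str, str]] = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(n_samples)]
        ranked = sorted(candidates, key=lambda c: judge(prompt, c))
        chosen, rejected = ranked[-1], ranked[0]
        if chosen != rejected:  # skip prompts the judge cannot separate
            pairs.append((prompt, chosen, rejected))
    dpo_update(pairs)  # the updated model seeds the next iteration
```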
PRIME enables online process reward model updates in LLM RL using implicit rewards from rollouts and outcome labels, yielding 15.1% average gains on reasoning benchmarks and surpassing a stronger instruct model with 10% of the data.
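PRIME's dense rewards come from a log-likelihood ratio rather than step-level labels; a sketch under the assumption that both log-probability tensors cover the same sampled rollout, with `beta` as a scaling coefficient.

```python
import torch

def implicit_process_rewards(
    rm_logprobs: torch.Tensor,   # [seq_len] token log-probs under the implicit PRM
    ref_logprobs: torch.Tensor,  # [seq_len] token log-probs under a frozen reference
    beta: float = 0.05,
) -> torch.Tensor:
    """Per-token implicit rewards: beta * log(pi_rm / pi_ref).

    The implicit PRM is just a language model updated online with
    outcome labels (did the rollout succeed?), so token-level process
    rewards fall out of the log-ratio with no step annotations."""
    return beta * (rm_logprobs - ref_logprobs)
```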
Small open-weight language models can self-optimize prompts for clinical named entity recognition in dental notes, reaching micro F1 of 0.864 after DPO on Qwen2.5-14B.
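One way to read "self-optimize prompts": the model proposes prompt variants, each is scored on a labeled dev set, and better/worse variants become DPO preference pairs. The interface below is hypothetical; the paper's pipeline may differ.

```python
from typing import Callable, List, Tuple

def prompt_dpo_pairs(
    seed_prompt: str,
    propose: Callable[[str], str],         # model rewrites the prompt
    dev_micro_f1: Callable[[str], float],  # score a prompt on labeled dental notes
    n_variants: int = 8,
) -> List[Tuple[str, str, str]]:
    """Build (context, chosen, rejected) pairs of prompt variants ranked
    by dev-set NER micro F1, ready for a DPO pass over the prompt writer."""
    variants = [propose(seed_prompt) for _ in range(n_variants)]
    ranked = sorted(variants, key=dev_micro_f1)
    if len(ranked) < 2 or ranked[-1] == ranked[0]:
        return []
    return [(seed_prompt, ranked[-1], ranked[0])]
```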
SRA reframes cross-tokenizer LLM distillation as alignment of attention-weighted span centers of mass in a multi-particle dynamical system and reports consistent gains over prior CTKD baselines.
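A minimal sketch of an attention-weighted span center of mass, the quantity SRA reportedly aligns between teacher and student; the tensor interface is an assumption.

```python
import torch

def span_center_of_mass(
    hidden: torch.Tensor,  # [span_len, d] hidden states of the span's tokens
    attn: torch.Tensor,    # [span_len] attention mass on each token
) -> torch.Tensor:
    """Attention-weighted center of mass of a span's token vectors.

    Matching span-level centers sidesteps the token-mismatch problem in
    cross-tokenizer distillation: teacher and student can describe the
    same character span even when their tokenizations disagree."""
    w = attn / attn.sum().clamp_min(1e-8)         # normalize to a distribution
    return (w.unsqueeze(-1) * hidden).sum(dim=0)  # [d]
```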
Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.