IPQA is a new benchmark that measures how well models identify core user intents from history in personalized question answering, finding that performance is poor and declines with greater question complexity.
Peper, Christopher Clarke, Andrew Lee, Parker Hill, Jonathan K
6 Pith papers cite this work.
citing papers explorer
- IPQA: A Benchmark for Core Intent Identification in Personalized Question Answering
  IPQA is a new benchmark that measures how well models identify core user intents from history in personalized question answering, finding that performance is poor and declines with greater question complexity.
- SCOPE: Sequential Conformal Probing for Reliable OOD Rejection in LLM Services
  SCOPE selects readable hidden layers, constructs conformal gates with IND calibration, and uses supermartingale e-processes to certify persistent service-boundary evidence, improving rejection over final-layer detectors across multiple LLMs and boundary conditions.
- Template-assisted Contrastive Learning of Task-oriented Dialogue Sentence Embeddings
  TaDSE learns dialogue sentence embeddings via template-guided self-supervised contrastive learning plus synthetic slot-filling augmentation and reports gains on five downstream benchmarks.
- TextClusterLab: An Integrated Framework for Reliable Text Clustering Studies
  TextClusterLab introduces an LLM-driven generator for synthetic text clustering datasets with tunable attributes and a suitability benchmark for evaluation.
- Training LLMs with Reinforcement Learning for Intent-Aware Personalized Question Answering
  IAP uses RL to train LLMs to explicitly infer and apply implicit user intent in single-turn personalized QA, achieving ~7.5% average macro-score gains over baselines on LaMP-QA.
- Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions