(eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
12 Pith papers cite this work.
citing papers explorer
- CheckMIABench: Firm Foundations For Membership Inference Attacks on Language Models
  CheckMIABench turns LLMs with intermediate checkpoints into clean MIA testbeds by using pre- and post-checkpoint training data from the same distribution. It evaluates published attacks on Pythia and OLMo models and releases an open-source library.
- ProMediate: A Socio-cognitive framework for evaluating proactive agents in multi-party negotiation
  ProMediate introduces a theory-grounded simulation testbed and socio-cognitive metrics to evaluate proactive AI mediator agents in multi-party multi-topic negotiations, with experiments showing a socially intelligent mediator improves consensus change and intervention speed over a generic baseline.
- PeReGrINE: Evaluating Personalized Review Fidelity with User Item Graph Context
  PeReGrINE is a graph-based benchmark that restructures Amazon Reviews 2023 with temporal cutoffs and introduces dissonance analysis to measure how well retrieval-conditioned models match user style and product consensus.
- On Emotion-Sensitive Decision Making of Small Language Model Agents
  Emotional perturbations induced via activation steering systematically alter strategic choices made by small language model agents in cooperative and competitive game templates, yet the resulting behaviors remain unstable and only partially aligned with human patterns.
- Toxic Subword Pruning for Dialogue Response Generation on Large Language Models
  ToxPrune prunes toxic subwords from BPE tokenizers in LLMs to mitigate toxic dialogue responses and improve diversity on both toxic and non-toxic models.
- PEC-Home: Interpretation of Progressively Elliptical Commands in Smart Homes
  Presents the PEC-Home dataset of elliptical smart-home commands and shows that LLMs achieve lower execution accuracy on elliptical inputs than on complete commands, even with access to dialogue history.
- Resonant Minds: Closed-Loop Social Avatars with Theory of Mind
  A dual-agent closed-loop system integrates Theory of Mind reasoning with multimodal video generation to create social avatars that outperform full-information baselines on dialogue quality under information asymmetry.
- RECAP: Transparent Inference-Time Emotion Alignment for Medical Dialogue Systems
  RECAP is an inference-time framework using cognitive appraisal theory to enhance emotional alignment and transparency in medical dialogue systems across model scales.
- Large Language Models as Virtual Survey Respondents: Evaluating Sociodemographic Response Generation
  Introduces PAS and FAS task abstractions plus the LLM-S^3 benchmark to evaluate LLMs on generating sociodemographic survey responses across 11 real datasets and multiple models.
- Creating Multilingual Mental Health Dialogue Datasets: Limits of Persona-Based Localization via Nationality and Language
  Modifying nationality and language parameters in English-centric personas for mental health dialogues introduces clinical inconsistencies across languages and leads LLM judges to assess depression severity inaccurately in non-English dialogues.
- Learn-To-Learn on Arbitrary Textual Conditioning: A Hypernetwork-Driven Meta-Gated LLM
- Strategic Persuasion with Trait-Conditioned Multi-Agent Systems for Iterative Legal Argumentation