CheckMIABench turns LLMs with intermediate checkpoints into clean MIA testbeds by using pre- and post-checkpoint training data drawn from the same distribution. It evaluates published attacks on Pythia and OLMo models and releases an open-source library.
(eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
12 Pith papers cite this work.
representative citing papers
ProMediate introduces a theory-grounded simulation testbed and socio-cognitive metrics to evaluate proactive AI mediator agents in multi-party multi-topic negotiations, with experiments showing a socially intelligent mediator improves consensus change and intervention speed over a generic baseline.
PeReGrINE is a graph-based benchmark that restructures Amazon Reviews 2023 with temporal cutoffs and introduces dissonance analysis to measure how well retrieval-conditioned models match user style and product consensus.
Emotional perturbations induced via activation steering systematically alter strategic choices made by small language model agents in cooperative and competitive game templates, yet the resulting behaviors remain unstable and only partially aligned with human patterns.
ToxPrune prunes toxic subwords from BPE tokenizers in LLMs to mitigate toxic dialogue responses and improve diversity on both toxic and non-toxic models.
Presents the PEC-Home dataset for elliptical smart-home commands and shows that LLMs achieve lower execution accuracy on elliptical inputs than on complete commands, even with access to dialogue history.
A dual-agent closed-loop system integrates Theory of Mind reasoning with multimodal video generation to create social avatars that outperform full-information baselines on dialogue quality under information asymmetry.
RECAP is an inference-time framework using cognitive appraisal theory to enhance emotional alignment and transparency in medical dialogue systems across model scales.
Introduces PAS and FAS task abstractions plus the LLM-S^3 benchmark to evaluate LLMs on generating sociodemographic survey responses across 11 real datasets and multiple models.
Modifying nationality and language parameters in English-centric personas for mental health dialogues introduces clinical inconsistencies across languages and causes LLM judges to perform inaccurately on non-English depression severity assessments.
citing papers explorer
- CheckMIABench: Firm Foundations For Membership Inference Attacks on Language Models