Seven clinician-informed safety criteria enable LLM-as-a-Judge to reach substantial agreement with human consensus (Cohen's κ up to 0.75) on evaluating LLM responses to users demonstrating psychosis.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2roles
background 1polarities
background 1representative citing papers
Agency in sustained human-AI chatbot talks emerges as co-constructed turn-by-turn through boundary-setting and intention-steering, organized in a new 3-by-4 framework of actors and actions.
citing papers explorer
-
Using LLM-as-a-Judge/Jury to Advance Scalable, Clinically-Validated Safety Evaluations of Model Responses to Users Demonstrating Psychosis
Seven clinician-informed safety criteria enable LLM-as-a-Judge to reach substantial agreement with human consensus (Cohen's κ up to 0.75) on evaluating LLM responses to users demonstrating psychosis.
-
Does My Chatbot Have an Agenda? Understanding Human and AI Agency in Human-Human-like Chatbot Interaction
Agency in sustained human-AI chatbot talks emerges as co-constructed turn-by-turn through boundary-setting and intention-steering, organized in a new 3-by-4 framework of actors and actions.