Concept Alignment

Christopher Kello; Ilia Sucholutsky; Polyphony J. Bruna; Sunayana Rane; Thomas L. Griffiths

arxiv: 2401.08672 · v1 · pith:HFPLAOJWnew · submitted 2024-01-09 · 💻 cs.LG · cs.AI · q-bio.NC

Sunayana Rane, Polyphony J. Bruna, Ilia Sucholutsky, Christopher Kello, Thomas L. Griffiths

classification 💻 cs.LG · cs.AI · q-bio.NC

keywords alignment · humans · concept · concepts · systems · align · cognitive · explain

Discussion of AI alignment (alignment between humans and AI systems) has focused on value alignment, broadly referring to creating AI systems that share human values. We argue that before we can even attempt to align values, it is imperative that AI systems and humans align the concepts they use to understand the world. We integrate ideas from philosophy, cognitive science, and deep learning to explain the need for concept alignment, not just value alignment, between humans and machines. We summarize existing accounts of how humans and machines currently learn concepts, and we outline opportunities and challenges in the path towards shared concepts. Finally, we explain how we can leverage the tools already being developed in cognitive science and AI research to accelerate progress towards concept alignment.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Taxonomy of Conceptual Alignment in Human-Robot Dialogue
cs.RO 2026-06 unverdicted novelty 6.0

Introduces a taxonomy and dialogue act schema for bidirectional conceptual alignment in human-robot interaction dialogues.

Investigating Concept Alignment Using Implausible Category Members
cs.AI 2026-05 unverdicted novelty 6.0

AI models misalign with humans on concept boundaries when probed with implausible category members, such as classifying words as vehicles or vegetables as fruit.

Medical Model Synthesis Architectures: A Case Study
cs.AI 2026-05 unverdicted novelty 5.0

The MedMSA framework retrieves knowledge via language models, then builds formal probabilistic models to produce uncertainty-weighted differential diagnoses from symptoms.