Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)

Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai · 2020 · DOI 10.18653/v1/2020.emnlp-main.703

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Beyond Object-Level Alignment: Do Brains and DNNs Preserve the Same Transformations?

q-bio.NC · 2026-05-07 · unverdicted · novelty 8.0

A new Naturality Violation Score applied to fMRI and vision DNNs reveals that semantic axes align best in higher visual cortex and deeper layers while low-level axes align earlier.

Lexical Consensus: Grounded Word Learning and Shared Meaning in Artificial Agents

cs.CL · 2026-06-20 · unverdicted · novelty 7.0

Agents acquire lexical labels for visual concepts following a perceptual coherence gradient where perceptual distance predicts learning accuracy independently of semantic distance in a pre-registered CIFAR-100 experiment.

The Wrong Kind of Right: Quantifying and Localizing Misfired Alignment in LLMs

cs.CL · 2026-06-17 · unverdicted · novelty 6.0

LLMs exhibit misfired alignment on stereotype questions at 4.7-18.9% rates on the new VETO benchmark of 2,032 contrastive pairs, unlike humans at 0%, due to overgeneralized safety cues after instruction tuning.

citing papers explorer

Beyond Object-Level Alignment: Do Brains and DNNs Preserve the Same Transformations? · q-bio.NC · 2026-05-07 · unverdicted · none · ref 24

Lexical Consensus: Grounded Word Learning and Shared Meaning in Artificial Agents · cs.CL · 2026-06-20 · unverdicted · none · ref 9

The Wrong Kind of Right: Quantifying and Localizing Misfired Alignment in LLMs · cs.CL · 2026-06-17 · unverdicted · none · ref 41
