Alignment midtraining for animals

2026 · cs.CL · arXiv 2604.13076

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

We investigate the robustness of value alignment via midtraining with synthetic documents, using animal compassion as a value that is both important in its own right and orthogonal to existing alignment efforts. To evaluate compassionate reasoning, we develop and publicly release Animal Norms In Moral Assessment (ANIMA), a 26-question evaluation spanning 13 ethical dimensions, available as both a dataset and an Inspect evaluation. On ANIMA, midtraining with 3000 documents achieves 77%, compared to 40% for instruction-tuning approaches, with generalization to human compassion and no degradation on standard safety benchmarks or capabilities. However, subsequent unrelated instruction-tuning degrades the intervention, and the advantage disappears after 5000 samples. Our exploratory results suggest that document-based value interventions may require explicit preservation strategies to remain effective through typical training pipelines.

representative citing papers

Your AI Travel Agent Would Book You a Bullfight: An Agentic Benchmark for Implicit Animal Welfare in Frontier AI Models

cs.AI · 2026-06-16 · unverdicted · novelty 7.0

TAC is the first agentic benchmark showing that seven frontier AI models all score below chance on avoiding animal exploitation in travel bookings, with large prompt-based gains in some models.
