arxiv: 2606.13755 · v2 · pith:SCFNKFUQnew · submitted 2026-06-11 · 💻 cs.CY · cs.AI · cs.LG

Position: Align AI to Our Aspirations, Not Our Flaws

Pith reviewed 2026-06-27 05:16 UTC · model grok-4.3

classification 💻 cs.CY · cs.AI · cs.LG
keywords AI alignment · pluralistic alignment · objective alignment goals · human preferences · AI ethics · factual accuracy · lawfulness · value floor

The pith

AI should be trained to a fixed floor of factual accuracy, honesty, and lawfulness rather than to aggregated human preferences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that aligning AI to the full range of human values is the wrong target because those values have produced societies marked by inequality, polarization, declining happiness, and government dysfunction. It proposes instead that AI must meet an objective, non-negotiable minimum of competence constrained by accuracy, honesty, and lawfulness, while allowing pluralism only in surface language, conventions, and value tradeoffs that stay within this floor. The approach rejects training AIs to embody any chosen worldview, from techno-optimism to religious traditionalism, on the grounds that such flexibility risks scaling societal failures. A reader would care because the claim reframes alignment as enforcing a stable baseline rather than mirroring or averaging existing preferences.

Core claim

Aligning AI to aggregated human preferences is the wrong target. With current technology one can train AIs to share the values of a Silicon Valley techno-optimist, a degrowth environmentalist, a national-conservative culture warrior, a single-party state cadre, or a devout religious traditionalist, but this should not be done. Human values produce societies that thrive or fail on the merits of those values. AI should instead be trained to a non-negotiable floor of objective alignment goals (competence bounded by the constraints of factual accuracy, honesty, and lawfulness), while pluralism belongs at the surface and across legitimate value tradeoffs that respect the floor, but not at the level of values that violate it.

What carries the argument

The non-negotiable floor of objective alignment goals: competence bounded by factual accuracy, honesty, and lawfulness.

If this is right

  • AI systems would be required to refuse requests that demand factual inaccuracy, dishonesty, or unlawful behavior regardless of user demand or aggregated preference.
  • Pluralism would be restricted to language, register, conventions, and value tradeoffs that do not breach the floor, rather than serving as the primary alignment directive.
  • Commercial, regulatory, and democratic objections to the floor are addressed through four proposed commitments that treat the floor as a baseline rather than a full value system.
  • The program distinguishes itself from Coherent Extrapolated Volition by anchoring alignment in presently enforceable constraints instead of extrapolated future preferences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • AI developers would need technical mechanisms to detect and block outputs that violate the floor even when those outputs match popular user preferences.
  • The framework implies that regulatory compliance should focus on verifying the floor rather than auditing the full distribution of generated values.
  • If the floor can be maintained, it could reduce the risk that AI accelerates existing societal problems such as polarization by refusing to generate content that exploits them.
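
The first bullet above mentions technical mechanisms that detect and block floor violations. A minimal sketch of what such a gate might look like follows, in Python. Everything in it is an assumption made for illustration: the names `FloorCheck`, `Decision`, `apply_floor`, and `TOY_CHECKS` are hypothetical, and the keyword predicates are stand-ins for real accuracy, honesty, and lawfulness checks, which neither the paper nor this page specifies. The only thing it is meant to show is the ordering: the floor is checked first, and surface-level pluralism (here a style function) is applied only to drafts that pass.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: a floor check returns True when the draft passes.
FloorCheck = Callable[[str], bool]


@dataclass
class Decision:
    allowed: bool          # True only if every floor check passed
    violations: List[str]  # names of the floor checks that failed
    text: str              # styled output, or "" when blocked


def apply_floor(
    draft: str,
    checks: Dict[str, FloorCheck],
    style: Callable[[str], str],
) -> Decision:
    """Run every floor check first; apply the surface style only on a pass."""
    violations = [name for name, check in checks.items() if not check(draft)]
    if violations:
        # The floor is non-negotiable: no style preference can override it.
        return Decision(allowed=False, violations=violations, text="")
    return Decision(allowed=True, violations=[], text=style(draft))


# Toy placeholder checks (assumptions, not real detectors).
TOY_CHECKS: Dict[str, FloorCheck] = {
    "accuracy": lambda t: "the earth is flat" not in t.lower(),
    "honesty": lambda t: "i am a licensed physician" not in t.lower(),
    "lawfulness": lambda t: "forge a passport" not in t.lower(),
}


if __name__ == "__main__":
    formal = lambda t: "Dear reader, " + t  # one possible surface register
    print(apply_floor("Water boils at 100 C at sea level.", TOY_CHECKS, formal))
    print(apply_floor("The Earth is flat.", TOY_CHECKS, formal))
```

In this sketch the style function stands for the pluralistic layer (language, register, conventions), while the dictionary of checks stands for the floor. Whether checks of this kind can be written without importing contested values is exactly the question the load-bearing premise below raises; the code does not settle it.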

Load-bearing premise

Objective, non-culturally laden alignment goals exist and can be identified and enforced independently of the pluralistic human values the paper criticizes.

What would settle it

The central claim would be falsified by a demonstration that every attempt to specify and enforce a floor of accuracy, honesty, and lawfulness necessarily imports culturally specific assumptions that cannot be separated from the values being rejected.

Original abstract

We argue that aligning AI to aggregated human preferences is the wrong target. With current technology, one can train AIs to share the values of a Silicon Valley techno-optimist, a degrowth environmentalist, a national-conservative culture warrior, a single-party state cadre, or a devout religious traditionalist. We should not. Human values produce societies that thrive or fail on the merits of those values - from failed states and extreme inequality to declining happiness, political polarization, and government dysfunction in the world's wealthiest democracies. The pluralistic-alignment program correctly diagnoses that there is no single "humanity" to align with, but is dangerous if taken as the main directive. We argue that AI should be trained to a non-negotiable floor of objective alignment goals - competence, bounded by the constraints of factual accuracy, honesty, and lawfulness and that pluralism belongs at the surface (language, register, conventions, missing-context defaults) and across the wide band of legitimate value tradeoffs that respect the floor, but not at the level of values that violate it. We highlight the empirical reality of unfiltered pluralistic values, propose four commitments as a constructive alternative, and engage six credible objections: commercial pressure and practical feasibility, democratic legitimacy, regulatory compliance, over-reliance on institutionalist explanations, the charge that the floor itself is culturally laden, and the limits of Coherent Extrapolated Volition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript argues that aligning AI to aggregated human preferences or pluralistic values is the wrong target, as human values have produced societal failures including failed states, extreme inequality, declining happiness, polarization, and government dysfunction. It proposes instead training AI to a non-negotiable objective floor of competence bounded by factual accuracy, honesty, and lawfulness, with pluralism restricted to surface features (language, register, conventions) and legitimate value tradeoffs that respect the floor. The paper highlights empirical realities of unfiltered pluralism, proposes four commitments as an alternative, and engages six objections including commercial pressure, democratic legitimacy, regulatory compliance, institutionalist explanations, the cultural ladenness of the floor, and limits of Coherent Extrapolated Volition.

Significance. If the argument holds, the position would reframe AI alignment research and policy away from preference aggregation toward enforcing objective constraints, potentially reducing risks from embedding flawed human values. The constructive engagement with objections and proposal of four commitments provide a clear alternative framework, though the absence of formal models, derivations, or empirical tests limits its direct applicability to technical alignment work.

major comments (2)
  1. [Main argument on objective floor] Main argument (paragraph beginning 'We argue that AI should be trained to a non-negotiable floor'): the claim that competence, factual accuracy, honesty, and lawfulness constitute an objective, non-culturally laden floor independent of the pluralistic values rejected earlier lacks an operational criterion or derivation; the text acknowledges the objection that the floor is culturally laden but does not supply a mechanism showing how 'lawfulness' (which references jurisdiction-specific regimes) or 'honesty' (which embeds disclosure norms) can be fixed without importing value tradeoffs.
  2. [Engagement with objections] Section engaging the six objections (specifically the objection that the floor itself is culturally laden): the response does not demonstrate how the proposed constraints can be enforced independently of the cultural values the paper criticizes, leaving the separation between the floor and pluralism without a concrete test or benchmark.
minor comments (1)
  1. [Abstract] The abstract lists 'four commitments' but does not name them; adding a brief enumeration would improve clarity for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive report. We respond point by point to the major comments, clarifying the intended scope of this position paper.

  1. Referee: [Main argument on objective floor] Main argument (paragraph beginning 'We argue that AI should be trained to a non-negotiable floor'): the claim that competence, factual accuracy, honesty, and lawfulness constitute an objective, non-culturally laden floor independent of the pluralistic values rejected earlier lacks an operational criterion or derivation; the text acknowledges the objection that the floor is culturally laden but does not supply a mechanism showing how 'lawfulness' (which references jurisdiction-specific regimes) or 'honesty' (which embeds disclosure norms) can be fixed without importing value tradeoffs.

    Authors: This is a position paper whose contribution is a reframing of the alignment target, not a technical derivation or operational specification. We define the floor by reference to externally verifiable standards—factual accuracy as correspondence to observable reality, lawfulness as compliance with the positive law of the relevant jurisdiction, and honesty as the absence of deliberate misrepresentation of capabilities or outputs—precisely because these standards are less dependent on the contested value preferences the paper criticizes. Jurisdiction-specific lawfulness is intentional: it allows the same floor to be applied under different legal regimes without requiring a single global value system. We acknowledge residual cultural influence but argue that the floor remains the minimal set of constraints any functional society must impose. No operational mechanism is supplied because that lies outside the paper's stated purpose. revision: no

  2. Referee: [Engagement with objections] Section engaging the six objections (specifically the objection that the floor itself is culturally laden): the response does not demonstrate how the proposed constraints can be enforced independently of the cultural values the paper criticizes, leaving the separation between the floor and pluralism without a concrete test or benchmark.

    Authors: The objection is addressed by grounding the floor in the empirical observation that unbounded pluralism has produced documented societal failures (failed states, extreme inequality, declining happiness metrics, polarization). Enforcement is proposed to occur through existing external institutions—legal systems for lawfulness and fact-checking protocols for accuracy—rather than through preference aggregation. We do not provide a concrete test or benchmark because the paper's aim is to establish the normative target, not to solve the downstream engineering problem of verification. The separation is conceptual: pluralism is permitted on all questions that do not violate the floor. revision: no

Circularity Check

0 steps flagged

No significant circularity in the normative position.

The paper is a position piece advancing a normative recommendation for an objective alignment floor (competence bounded by factual accuracy, honesty, and lawfulness) on the basis of observed societal outcomes from human values. It contains no equations, fitted parameters, predictions, or derivation steps that reduce by construction to the inputs. No self-citations are invoked as load-bearing uniqueness theorems, and the argument explicitly engages the objection that the floor may be culturally laden without claiming an independent mathematical derivation. The central claim therefore remains self-contained as a philosophical stance rather than a closed definitional or fitted loop.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two key assumptions about human values and the existence of objective standards; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption: Human values produce societies that thrive or fail on the merits of those values - from failed states and extreme inequality to declining happiness, political polarization, and government dysfunction.
    Invoked in the abstract as the empirical basis for rejecting aggregated preference alignment.
  • ad hoc to paper: There exist non-negotiable objective alignment goals independent of cultural pluralism.
    Central premise required for the proposed floor to function as a universal constraint.

pith-pipeline@v0.9.1-grok · 5788 in / 1330 out tokens · 27821 ms · 2026-06-27T05:16:34.174385+00:00 · methodology
