arXiv preprint arXiv:2009.06367

7 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Inference Time Causal Probing in LLMs

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

HDMI is a new probe-free technique that steers LLM hidden states via margin objectives to achieve more reliable causal interventions than prior probe-based methods on standard benchmarks.

Conditional Attribute Estimation with Autoregressive Sequence Models

cs.AI · 2026-05-13 · unverdicted · novelty 6.0

Conditional Attribute Transformers jointly estimate next-token probabilities and conditional attribute values for autoregressive sequence models, enabling credit assignment, counterfactuals, and steerable generation in one pass.

Aligning AI With Shared Human Values

cs.CY · 2020-08-05 · conditional · novelty 6.0

Introduces the ETHICS benchmark, showing that current language models have a promising but incomplete ability to predict basic human ethical judgments on text scenarios.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • Inference Time Causal Probing in LLMs cs.AI · 2026-05-08 · unverdicted · none · ref 10

  • Conditional Attribute Estimation with Autoregressive Sequence Models cs.AI · 2026-05-13 · unverdicted · none · ref 16
