pith. machine review for the scientific record. sign in

citation dossier

Inference-time intervention: Eliciting truthful answers from a language model

Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, and Martin Wattenberg · 2023 · arXiv 2306.03341

17Pith papers citing it
17reference links
cs.LGtop field · 7 papers
UNVERDICTEDtop verdict bucket · 16 papers

This arXiv-backed work is queued for full Pith review when it crosses the high-inbound sweep. That review runs reader · skeptic · desk-editor · referee · rebuttal · circularity · lean confirmation · RS check · pith extraction.

read on arXiv PDF

why this work matters in Pith

Pith has found this work in 17 reviewed papers. Its strongest current cluster is cs.LG (7 papers). The largest review-status bucket among citing papers is UNVERDICTED (16 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.

years

2026 15 2023 2

representative citing papers

Deep Minds and Shallow Probes

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Symmetry under affine reparameterizations of hidden coordinates selects a unique hierarchy of shallow coordinate-stable probes and a probe-visible quotient for cross-model transfer.

Minimizing Collateral Damage in Activation Steering

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

Activation steering is cast as constrained optimization that minimizes collateral damage by weighting perturbations according to the empirical second-moment matrix of activations instead of assuming isotropy.

Steering Llama 2 via Contrastive Activation Addition

cs.CL · 2023-12-09 · unverdicted · novelty 6.0

Contrastive Activation Addition steers Llama 2 Chat by adding averaged residual-stream activation differences from contrastive example pairs to control targeted behaviors at inference time.

citing papers explorer

Showing 17 of 17 citing papers.