Recognition: no theorem link
SHIFT: Steering Hidden Intermediates in Flow Transformers
Pith reviewed 2026-05-10 18:30 UTC · model grok-4.3
The pith
SHIFT steers intermediate activations in DiT diffusion models to remove unwanted concepts at inference time without retraining.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SHIFT learns steering vectors from intermediate activations and applies them dynamically to chosen layers and timesteps in DiT models, thereby suppressing target visual concepts, moving generations into desired style domains, or biasing samples toward specific objects, all while preserving prompt adherence and sample quality.
What carries the argument
Steering vectors applied to hidden intermediate activations in the DiT flow transformer to guide concept presence during inference.
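The mechanism admits a compact sketch. This is not the authors' code: the layer set, timestep window, and scale below are illustrative assumptions about how a per-layer, per-timestep steering intervention could be wired into a DiT forward pass.

```python
import numpy as np

def steer_hidden(hidden, steer_vec, *, layer_idx, timestep,
                 layers=frozenset({3, 7}), t_window=(5, 20), scale=-2.0):
    """Add a scaled steering direction to a block's hidden states, but only
    at the selected layers and within the selected timestep window.
    hidden: (tokens, channels) activations; steer_vec: (channels,) direction.
    A negative scale pushes activations away from the concept direction."""
    if layer_idx in layers and t_window[0] <= timestep <= t_window[1]:
        return hidden + scale * steer_vec  # broadcasts over all tokens
    return hidden

# Toy check: the intervention fires only at selected layers/timesteps.
h = np.zeros((4, 8))                     # (tokens, channels)
v = np.ones(8)                           # a learned steering direction
steered = steer_hidden(h, v, layer_idx=3, timestep=10)
untouched = steer_hidden(h, v, layer_idx=1, timestep=10)
```

In a real pipeline this function would run inside a forward hook on the chosen transformer blocks; here it is isolated so the gating logic is visible.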
If this is right
- DiT models can be adapted for concept removal or style control without any retraining step.
- The same steering mechanism works for suppressing targets, adding styles, or changing object content.
- Effective control holds across varied prompts and different target concepts.
- Generation quality and adherence to the original prompt remain largely unchanged.
Where Pith is reading between the lines
- The approach could be tested on other transformer-based image or video generators beyond DiT.
- Deployed systems might use similar vectors for on-the-fly content moderation.
- The vectors may encode interpretable directions in the model's internal representation of scenes.
- Pairing SHIFT with existing prompt techniques could produce finer-grained edits.
Load-bearing premise
Steering vectors can be found that selectively suppress or shift only the intended concept without harming image quality, prompt fidelity, or consistency across prompts and timesteps.
What would settle it
Applying the learned vectors to a fresh set of prompts produces images that either retain the target concept or show clear losses in visual quality and prompt match compared with the unsteered baseline.
Figures
Original abstract
Diffusion models have become leading approaches for high-fidelity image generation. Recent DiT-based diffusion models, in particular, achieve strong prompt adherence while producing high-quality samples. We propose SHIFT, a simple but effective and lightweight framework for concept removal in DiT diffusion models via targeted manipulation of intermediate activations at inference time, inspired by activation steering in large language models. SHIFT learns steering vectors that are dynamically applied to selected layers and timesteps to suppress unwanted visual concepts while preserving the prompt's remaining content and overall image quality. Beyond suppression, the same mechanism can shift generations into a desired style domain or bias samples toward adding or changing target objects. We demonstrate that SHIFT provides effective and flexible control over DiT generation across diverse prompts and targets without time-consuming retraining.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SHIFT, a lightweight inference-time framework for DiT-based diffusion models that learns steering vectors from intermediate activations and applies them dynamically to selected layers and timesteps. The goal is to suppress target visual concepts (or shift style/object presence) while preserving prompt adherence and sample quality, without any retraining or fine-tuning of the underlying model.
Significance. If the empirical claims hold, the work would be significant for practical control of large generative models: it offers a training-free alternative to concept erasure or editing that is both computationally cheap and flexible across prompts. The approach builds on activation-steering ideas from LLMs and adapts them to the diffusion trajectory, which could generalize to other transformer-based generators.
Major comments (2)
- [Method] The central claim that steering vectors can be learned to selectively suppress concepts without side effects on fidelity or adherence rests on an unstated assumption that concepts are additively separable in the chosen intermediate activations. No derivation or analysis is provided showing why this holds for DiT blocks across timesteps; the method description would benefit from an explicit statement of the optimization objective used to obtain the vectors.
- [Experiments] The abstract asserts effectiveness 'across diverse prompts and targets' and preservation of image quality, yet the visible text contains no quantitative metrics, ablation tables, or failure-case analysis. Without reported numbers (e.g., concept-suppression accuracy, FID, CLIP similarity, or human preference scores) it is impossible to assess whether the steering introduces artifacts or prompt drift.
Minor comments (2)
- [Title/Abstract] The title refers to 'Flow Transformers' while the abstract discusses DiT diffusion models; a brief clarification of the relationship (or whether 'Flow' denotes a specific variant) would avoid confusion.
- [Method] Notation for the steering vector application (e.g., which exact layers and timestep ranges are selected) should be formalized with an equation or pseudocode for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our work. We address the two major comments point by point below, clarifying the method and committing to strengthened experimental reporting in the revision.
Point-by-point responses
Referee: [Method] The central claim that steering vectors can be learned to selectively suppress concepts without side effects on fidelity or adherence rests on an unstated assumption that concepts are additively separable in the chosen intermediate activations. No derivation or analysis is provided showing why this holds for DiT blocks across timesteps; the method description would benefit from an explicit statement of the optimization objective used to obtain the vectors.
Authors: The steering vectors are obtained by computing the difference between mean activations conditioned on prompts that include versus exclude the target concept, at the selected layers and timesteps. This direction is then scaled and added during inference. While the approach is primarily empirical and draws from successful activation steering in LLMs, we agree that the manuscript would be improved by an explicit statement of the objective (maximizing concept suppression subject to bounded deviation from the original activation trajectory) and a short discussion of the additive separability assumption. We will add this clarification and a brief supporting analysis in the revised method section. revision: yes
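The difference-of-means construction described in the response can be sketched as follows. This is a hypothetical example, not the authors' implementation; in particular, the unit normalization is an assumption so that a single scalar strength controls the intervention.

```python
import numpy as np

def difference_of_means(acts_with, acts_without):
    """Steering vector for one (layer, timestep) pair: mean activation over
    prompts that contain the target concept minus the mean over prompts that
    do not. acts_*: (num_prompts, channels). The returned direction is
    unit-norm (an assumption), so a caller-chosen scale sets the strength."""
    direction = acts_with.mean(axis=0) - acts_without.mean(axis=0)
    norm = np.linalg.norm(direction)
    return direction / norm if norm > 0 else direction

# Toy check: recover a planted concept direction from noisy activations.
rng = np.random.default_rng(0)
base = rng.normal(size=(64, 16))         # activations without the concept
concept = np.zeros(16)
concept[0] = 1.0                         # planted "concept" direction
vec = difference_of_means(base + 5.0 * concept, base)
```

Because both groups share the same base activations in this toy setup, the difference of means recovers the planted direction exactly; with real prompts the estimate is noisy, which is one reason selective suppression is a nontrivial claim.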
Referee: [Experiments] The abstract asserts effectiveness 'across diverse prompts and targets' and preservation of image quality, yet the visible text contains no quantitative metrics, ablation tables, or failure-case analysis. Without reported numbers (e.g., concept-suppression accuracy, FID, CLIP similarity, or human preference scores) it is impossible to assess whether the steering introduces artifacts or prompt drift.
Authors: The current version emphasizes qualitative results to demonstrate flexibility across prompts and targets. We acknowledge that quantitative support is necessary to substantiate the claims of preserved fidelity and adherence. In the revision we will add tables reporting CLIP-based concept suppression scores, FID and CLIP similarity for image quality and prompt adherence, ablation studies on layer/timestep selection, and a discussion of observed failure cases. revision: yes
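One of the promised metrics can be made concrete. The sketch below is hypothetical, not the authors' protocol: embeddings are assumed to be precomputed by some image/text encoder (e.g. a CLIP-style model), and the interpretation of a score drop is illustrative.

```python
import numpy as np

def concept_similarity(image_embs, concept_emb):
    """Mean cosine similarity between image embeddings (n, d) and a concept
    text embedding (d,). A drop after steering, at roughly unchanged
    prompt similarity, would indicate suppression without prompt drift."""
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    c = concept_emb / np.linalg.norm(concept_emb)
    return float((imgs @ c).mean())

# Toy check: aligned embeddings score ~1.0, orthogonal ones ~0.0.
aligned = np.tile([1.0, 0.0], (5, 1))
ortho = np.tile([0.0, 1.0], (5, 1))
concept = np.array([1.0, 0.0])
score_aligned = concept_similarity(aligned, concept)
score_ortho = concept_similarity(ortho, concept)
```

Reporting this score for steered versus unsteered generations, alongside FID and prompt-side CLIP similarity, would directly test the preservation claims the referee flags.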
Circularity Check
No significant circularity
Full rationale
The paper proposes SHIFT as an empirical framework for steering intermediate activations in DiT models to suppress concepts at inference time. No mathematical derivations, equations, first-principles results, or prediction claims appear in the abstract or described content. The method is framed as learning and applying steering vectors without any self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations that collapse the central claim to its own inputs. The approach is self-contained as a practical technique with no derivation chain to inspect for circularity.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] Beaglehole, D., Radhakrishnan, A., Boix-Adsera, E., Belkin, M.: Toward universal steering and monitoring of AI models. Science 391(6787), 787–792 (2026)
- [2] Brack, M., Schramowski, P., Friedrich, F., Hintersdorf, D., Kersting, K.: The stable artist: Steering semantics in diffusion latent space. arXiv preprint arXiv:2212.06013 (2022)
- [3] Bui, A., Vuong, L., Doan, K., Le, T., Montague, P., Abraham, T., Phung, D.: Erasing undesirable concepts in diffusion models with adversarial preservation. arXiv preprint arXiv:2410.15618 (2024)
- [4] Dhariwal, P., Nichol, A.: Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems 34, 8780–8794 (2021)
- [5] Esser, P., Kulal, S., Blattmann, A., Entezari, R., Müller, J., Saini, H., Levi, Y., Lorenz, D., Sauer, A., Boesel, F., et al.: Scaling rectified flow transformers for high-resolution image synthesis. In: Forty-first International Conference on Machine Learning (2024)
- [6] Gaintseva, T., Ma, C., Liu, Z., Benning, M., Slabaugh, G., Deng, J., Elezi, I.: CASteer: Steering diffusion models for controllable generation. arXiv e-prints, arXiv–2503 (2025)
- [7] Gandikota, R., Materzynska, J., Fiotto-Kaufman, J., Bau, D.: Erasing concepts from diffusion models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 2426–2436 (2023)
- [8] Gandikota, R., Orgad, H., Belinkov, Y., Materzyńska, J., Bau, D.: Unified concept editing in diffusion models. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 5111–5120 (2024)
- [9] Gao, D., Lu, S., Zhou, W., Chu, J., Zhang, J., Jia, M., Zhang, B., Fan, Z., Zhang, W.: EraseAnything: Enabling concept erasure in rectified flow transformers. In: Forty-second International Conference on Machine Learning (2025)
- [10] Hernandez, E., Li, B.Z., Andreas, J.: Inspecting and editing knowledge representations in language models. arXiv preprint arXiv:2304.00740 (2023)
- [11] Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33, 6840–6851 (2020)
- [12] Ho, J., Salimans, T.: Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598 (2022)
- [13] Ilharco, G., Ribeiro, M.T., Wortsman, M., Gururangan, S., Schmidt, L., Hajishirzi, H., Farhadi, A.: Editing models with task arithmetic. arXiv preprint arXiv:2212.04089 (2022)
- [14] Khashabi, D., Lyu, X., Min, S., Qin, L., Richardson, K., Welleck, S., Hajishirzi, H., Khot, T., Sabharwal, A., Singh, S., et al.: Prompt waywardness: The curious case of discretized interpretation of continuous prompts. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2022)
- [15] Kumari, N., Zhang, B., Wang, S.Y., Shechtman, E., Zhang, R., Zhu, J.Y.: Ablating concepts in text-to-image diffusion models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 22691–22702 (2023)
- [16] Labs, B.F., Batifol, S., Blattmann, A., Boesel, F., Consul, S., Diagne, C., Dockhorn, T., English, J., English, Z., Esser, P., et al.: FLUX.1 Kontext: Flow matching for in-context image generation and editing in latent space. arXiv preprint arXiv:2506.15742 (2025)
- [17] Lester, B., Al-Rfou, R., Constant, N.: The power of scale for parameter-efficient prompt tuning. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. pp. 3045–3059 (2021)
- [18] Li, K., Patel, O., Viégas, F., Pfister, H., Wattenberg, M.: Inference-time intervention: Eliciting truthful answers from a language model. Advances in Neural Information Processing Systems 36, 41451–41530 (2023)
- [19] Liu, S., Ye, H., Zou, J.: Reducing hallucinations in large vision-language models via latent space steering. In: The Thirteenth International Conference on Learning Representations (2025)
- [20] Lyu, M., Yang, Y., Hong, H., Chen, H., Jin, X., He, Y., Xue, H., Han, J., Ding, G.: One-dimensional adapter to rule them all: Concepts, diffusion models and erasing applications. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 7559–7568 (2024)
- [21] Meng, K., Bau, D., Andonian, A., Belinkov, Y.: Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems 35, 17359–17372 (2022)
- [22] Peebles, W., Xie, S.: Scalable diffusion models with transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4195–4205 (2023)
- [23] Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., Rombach, R.: SDXL: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952 (2023)
- [24] Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning. pp. 8748–8763. PMLR (2021)
- [25] Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., Liu, P.J.: Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21(140), 1–67 (2020)
- [26] Rando, J., Paleka, D., Lindner, D., Heim, L., Tramèr, F.: Red-teaming the Stable Diffusion safety filter. arXiv preprint arXiv:2210.04610 (2022)
- [27] Rodriguez, P., Blaas, A., Klein, M., Zappella, L., Apostoloff, N., Cuturi, M., Suau, X.: Controlling language and diffusion models by transporting activations. arXiv preprint arXiv:2410.23054 (2024)
- [28] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10684–10695 (2022)
- [29] Schramowski, P., Brack, M., Deiseroth, B., Kersting, K.: Safe latent diffusion: Mitigating inappropriate degeneration in diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 22522–22531 (2023)
- [30] Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C., Wightman, R., Cherti, M., Coombes, T., Katta, A., Mullis, C., Wortsman, M., et al.: LAION-5B: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems 35, 25278–25294 (2022)
- [31] Subramani, N., Suresh, N., Peters, M.E.: Extracting latent steering vectors from pretrained language models. In: Findings of the Association for Computational Linguistics: ACL 2022. pp. 566–581 (2022)
- [33] Turner, A.M., Thiergart, L., Leech, G., Udell, D., Vazquez, J.J., Mini, U., MacDiarmid, M.: Steering language models with activation engineering. arXiv preprint arXiv:2308.10248 (2024)
- [34] Zhang, Y., Jin, E., Dong, Y., Wu, Y., Torr, P., Khakzar, A., Stegmaier, J., Kawaguchi, K.: Minimalist concept erasure in generative models. arXiv preprint arXiv:2507.13386 (2025)
- [35] Zhou, Y., Muresanu, A.I., Han, Z., Paster, K., Pitis, S., Chan, H., Ba, J.: Steering large language models using APE. In: NeurIPS ML Safety Workshop (2022)
- [36] Ziegler, D.M., Stiennon, N., Wu, J., Brown, T.B., Radford, A., Amodei, D., Christiano, P., Irving, G.: Fine-tuning language models from human preferences. arXiv preprint arXiv:1909.08593 (2019)
- [37] Zou, A., Phan, L., Chen, S., Campbell, J., Guo, P., Ren, R., Pan, A., Yin, X., Mazeika, M., Dombrowski, A.K., et al.: Representation engineering: A top-down approach to AI transparency. arXiv preprint arXiv:2310.01405 (2023)
Fig. 9 (caption recovered from the paper): Qualitative results for Van Gogh style erasure on the prompt "Fluffy white cat"; both models tested with DiT-block steering strength 250. First row: original Flux.1[dev] and Flux.1[schnell] generations. Second row: UCE and ESD competitors. Remaining rows: steered generations.