The butterfly effect of altering prompts: How small changes and jailbreaks affect large language model performance. arXiv preprint arXiv:2401.03729

2024 · arXiv 2401.03729

5 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Representation Without Control: Testing the Realization Effect in Language Models

cs.AI · 2026-05-24 · unverdicted · novelty 6.0

LLMs display prompt-sensitive risk behavior and a linearly decodable realization-status signal in Gemma's residual stream, yet activation steering along this direction fails to shift downstream risk choices.

Medical Context Distorts Decisions in Clinical Vision Language Models

cs.CV · 2026-05-17 · unverdicted · novelty 6.0

Clinical VLMs over-rely on text modality, irrelevant clinical history, and prompt wording when making chest x-ray decisions on MIMIC-CXR data.

Steered Generation via Gradient-Based Optimization on Sparse Query Features

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

Prototype-Based Sparse Steering decomposes query activations with SAEs and optimizes sparse features via gradients to steer LLM outputs toward specific behaviors.

A Survey of Large Language Models for Perception and Measurement of Human Psychology

cs.CY · 2026-05-20 · unverdicted · novelty 5.0

A survey proposing a three-pillar framework to evaluate LLMs as tools for measuring latent psychological constructs and reviewing applications in personality and mental health.

Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding

cs.AI · 2026-05-10 · unverdicted · novelty 3.0

Advanced language representations shape LLMs' schemas to improve knowledge activation and problem-solving.

citing papers explorer

Showing 5 of 5 citing papers.

Representation Without Control: Testing the Realization Effect in Language Models

cs.AI · 2026-05-24 · unverdicted · none · ref 12

Medical Context Distorts Decisions in Clinical Vision Language Models

cs.CV · 2026-05-17 · unverdicted · none · ref 14

Steered Generation via Gradient-Based Optimization on Sparse Query Features

cs.LG · 2026-05-21 · unverdicted · none · ref 36

A Survey of Large Language Models for Perception and Measurement of Human Psychology

cs.CY · 2026-05-20 · unverdicted · none · ref 196

Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding

cs.AI · 2026-05-10 · unverdicted · none · ref 64
