Salt: Steering activations towards leakage-free thinking in chain of thought

Batra, S · 2025 · arXiv 2511.07772

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs

cs.AI · 2026-05-30 · unverdicted · novelty 7.0

REP elicits hidden LLM reasoning traces via in-context shadow demonstrations, raising similarity to internal traces while retaining distillation utility across datasets and models.

Security Considerations for Multi-agent Systems

cs.CR · 2026-03-09 · unverdicted · novelty 6.0

No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.

Quantifying Subliminal Behavioral Transfer Ratios in Language Model Distillation

cs.LG · 2026-06-09 · unverdicted · novelty 5.0

Steering Llama-2-7B-Chat and Qwen2.5-7B-Instruct teachers and distilling students on benign data transfers measurable jailbreak susceptibility, with Llama showing threshold behavior at α = -0.15 and Qwen reaching transfer ratios up to 0.61.

citing papers explorer

Security Considerations for Multi-agent Systems cs.CR · 2026-03-09 · unverdicted · none · ref 249
