Overcoming State Inertia in Full-Duplex Spoken Language Models via Activation Steering
Pith reviewed 2026-06-27 13:21 UTC · model grok-4.3
The pith
Activation steering with a perception vector shifts full-duplex speech models from generative to perceptive state during interruptions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Full-duplex spoken language models exhibit stream-specific predictive patterns in hidden representations and dynamically modulate between a generative state aligned with model output and a perceptive state aligned with incoming user input. During abrupt user interruptions the modulation lags, producing a transient bias toward the generative state that causes the model to miss the beginning of the incoming input. Activation steering with a perception vector extracted from the hidden-state analysis shifts the model into the perceptive state and raises response correctness on the Zero-Buffer Benchmark from 28 percent to 45 percent and initial-word occurrence rate from 40 percent to 72 percent on PersonaPlex.
What carries the argument
The perception vector, a direction in activation space obtained by contrasting hidden representations during perceptive versus generative contexts, which when added steers the model's internal predictive focus toward the incoming user stream.
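A minimal sketch of how such a vector could be built and applied, assuming a difference-of-means construction (a common choice in activation steering work). This summary does not specify the paper's layer choice, token averaging, or scaling, so the names `perception_vector` and `steer`, the scale `alpha`, and the synthetic arrays standing in for real hidden states are all hypothetical.

```python
import numpy as np


def perception_vector(perceptive_states, generative_states):
    """Difference of mean hidden states (assumed construction).

    Both inputs have shape (num_samples, hidden_dim).
    """
    return perceptive_states.mean(axis=0) - generative_states.mean(axis=0)


def steer(hidden, vector, alpha=1.0):
    """Add the scaled steering vector to every position of `hidden`.

    `hidden` has shape (seq_len, hidden_dim); `alpha` is a hypothetical scale.
    """
    return hidden + alpha * vector


# Synthetic stand-ins for hidden states collected while listening vs. speaking.
rng = np.random.default_rng(0)
hidden_dim = 8
direction = np.zeros(hidden_dim)
direction[0] = 2.0  # the two toy "states" differ along the first axis
perceptive = rng.normal(size=(200, hidden_dim)) + direction
generative = rng.normal(size=(200, hidden_dim)) - direction

v = perception_vector(perceptive, generative)

# A hidden state drawn from the generative cluster, before and after steering.
h = rng.normal(size=(1, hidden_dim)) - direction
h_steered = steer(h, v, alpha=1.0)

# Projection onto the unit steering direction grows by alpha * ||v||.
unit = v / np.linalg.norm(v)
proj_before = (h @ unit).item()
proj_after = (h_steered @ unit).item()
print(proj_before, proj_after)
```

In a real model the addition would happen inside the forward pass at a chosen layer. Which layer and which scale work best is exactly what the referee's requested ablations would need to establish.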
If this is right
- Improves response correctness from 28 percent to 45 percent and initial-word occurrence rate from 40 percent to 72 percent on PersonaPlex.
- Produces gains across multiple state-of-the-art full-duplex spoken language models.
- Requires no fine-tuning and adds only negligible computational overhead.
- Enables immediate comprehension when user speech begins abruptly, as measured by the Zero-Buffer Benchmark.
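To make the size of the reported PersonaPlex gains concrete, the arithmetic can be sketched as below. Only the four percentages come from the source; the helper name `gains` is hypothetical.

```python
def gains(before_pct, after_pct):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after_pct - before_pct
    relative = 100.0 * absolute / before_pct
    return absolute, relative


correctness = gains(28, 45)
iwor = gains(40, 72)

print(f"correctness: +{correctness[0]} points, {correctness[1]:.1f}% relative")
# prints "correctness: +17 points, 60.7% relative"
print(f"IWOR: +{iwor[0]} points, {iwor[1]:.1f}% relative")
# prints "IWOR: +32 points, 80.0% relative"
```

Even after steering, correctness stays below 50 percent, so the intervention narrows the gap without closing it.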
Where Pith is reading between the lines
- The same vector extraction could be applied to other abrupt context switches such as topic changes or speaker hand-offs.
- Repeated steering during extended conversations might be tested to check whether it preserves coherence outside interruption events.
- Vector-based steering may offer a general route to reduce retraining needs when adapting spoken models to new duplex behaviors.
Load-bearing premise
The hidden-state analysis isolates a single perception vector whose addition reliably shifts the model to the perceptive state without side effects on other behaviors or non-interruption turns.
What would settle it
An experiment in which models given the perception vector show no gain in correctness or initial-word occurrence rate on the Zero-Buffer Benchmark, or show degraded performance on ordinary non-interruption dialogue tasks.
Original abstract
Full-duplex spoken language models (FD-SLMs) enable seamless speech interaction by allowing models to listen and speak simultaneously, yet the internal mechanism by which they coordinate listening and speaking remains underexplored. We analyze the predictive behavior encoded in FD-SLM hidden representations and find that they exhibit stream-specific predictive patterns: during listening, they preferentially predict the incoming user stream, whereas during speaking, they preferentially predict the model output stream. Building on this observation, we show that FD-SLMs dynamically modulate their internal predictive focus between two states: a generative state aligned with model output generation and a perceptive state aligned with incoming user input. However, this modulation can lag behind abrupt changes in conversational context. During user interruptions, the model remains transiently biased toward the generative state before transitioning into the perceptive state, causing it to miss the beginning of the incoming input. We term this delayed internal transition state inertia. To quantify its downstream impact, we introduce the Zero-Buffer Benchmark (ZBB), a diagnostic benchmark for evaluating immediate interruption comprehension when user speech begins abruptly. We evaluate this setting using response correctness and initial-word occurrence rate (IWOR). Finally, we mitigate state inertia through activation steering with a perception vector, a training-free intervention with little additional computational overhead. Across multiple state-of-the-art FD-SLMs, activation steering substantially improves interruption handling; for example, on PersonaPlex, it improves correctness from 28% to 45% and IWOR from 40% to 72% without any fine-tuning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that full-duplex spoken language models exhibit stream-specific predictive patterns in hidden states (preferring user input during listening and model output during speaking) and dynamically switch between generative and perceptive states, but suffer from 'state inertia' during abrupt user interruptions, remaining transiently biased toward generation and missing initial input. It introduces the Zero-Buffer Benchmark (ZBB) to measure this via response correctness and initial-word occurrence rate (IWOR). The authors propose a training-free activation steering intervention using a 'perception vector' derived from hidden-state contrasts to induce the perceptive state, reporting concrete gains such as on PersonaPlex (correctness 28% to 45%, IWOR 40% to 72%) across multiple FD-SLMs without fine-tuning.
Significance. If the central result holds, the work supplies a lightweight, post-training intervention for improving interruption handling in FD-SLMs, a practically relevant capability for natural spoken dialogue systems. The introduction of the ZBB diagnostic benchmark and the empirical demonstration of numeric gains on defined interruption settings constitute positive contributions. The approach extends activation steering techniques with concrete benchmark numbers.
major comments (2)
- [§5 (Evaluation and results)] The reported gains (e.g., PersonaPlex correctness 28%→45%, IWOR 40%→72%) are measured only on the interruption-specific ZBB; no metrics are supplied for non-interruption spoken QA, generation quality on ordinary turns, or latency. This is load-bearing for the central claim that the single perception vector shifts only the generative/perceptive state without side effects.
- [§3 (Hidden-state analysis)] The perception vector is constructed from hidden-state differences between listening and speaking streams, but the manuscript provides no ablation on vector construction details (layer selection, token averaging, or contrast method) and reports no error bars or statistical tests on the numeric improvements, leaving the reliability of the state-transition claim unverified.
minor comments (1)
- [Abstract] The abstract and introduction introduce multiple novel terms ('state inertia', 'perception vector', 'Zero-Buffer Benchmark') without inline definitions or forward references, which reduces immediate clarity.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address the two major comments point by point below, proposing concrete revisions to strengthen the manuscript.
Point-by-point responses
- Referee: [§5 (Evaluation and results)] The reported gains (e.g., PersonaPlex correctness 28%→45%, IWOR 40%→72%) are measured only on the interruption-specific ZBB; no metrics are supplied for non-interruption spoken QA, generation quality on ordinary turns, or latency. This is load-bearing for the central claim that the single perception vector shifts only the generative/perceptive state without side effects.
  Authors: We agree that the current evaluation is centered on the ZBB to isolate the effect of state inertia during abrupt interruptions. The perception vector is constructed specifically from listening-versus-speaking contrasts to target only the state transition. To directly address the concern about potential side effects, we will add experiments in the revision that apply the same steering to non-interruption spoken QA and generation tasks, reporting correctness, generation quality metrics, and latency to confirm that ordinary performance is preserved.
  Revision: yes
- Referee: [§3 (Hidden-state analysis)] The perception vector is constructed from hidden-state differences between listening and speaking streams, but the manuscript provides no ablation on vector construction details (layer selection, token averaging, or contrast method) and reports no error bars or statistical tests on the numeric improvements, leaving the reliability of the state-transition claim unverified.
  Authors: We acknowledge the value of ablations and statistical reporting. In the revised manuscript we will add (i) ablations varying the layer(s) used, token-averaging strategy, and contrast formulation, and (ii) results with error bars across multiple random seeds together with statistical significance tests on the reported improvements.
  Revision: yes
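One way the promised error bars could be produced is a paired percentile bootstrap over benchmark items, sketched below. This is an illustration, not the authors' stated procedure: the function name `paired_bootstrap_ci` is hypothetical, and the per-item outcomes are synthetic, built only to match the reported 28 percent and 45 percent rates under an assumed benchmark size of 100 items.

```python
import numpy as np


def paired_bootstrap_ci(baseline, steered, num_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(steered) - mean(baseline).

    `baseline` and `steered` are per-item 0/1 correctness arrays over the
    same benchmark items, in the same order.
    """
    baseline = np.asarray(baseline, dtype=float)
    steered = np.asarray(steered, dtype=float)
    assert baseline.shape == steered.shape
    rng = np.random.default_rng(seed)
    n = baseline.shape[0]
    # Resample item indices with replacement, keeping each pair together.
    idx = rng.integers(0, n, size=(num_resamples, n))
    diffs = steered[idx].mean(axis=1) - baseline[idx].mean(axis=1)
    tail = (1.0 - level) / 2.0
    low, high = np.quantile(diffs, [tail, 1.0 - tail])
    return float(low), float(high)


# HYPOTHETICAL per-item outcomes: 100 items, 28 correct without steering and
# 45 correct with it. The item count and the item-level overlap are assumptions.
n_items = 100
baseline = np.zeros(n_items)
baseline[:28] = 1
steered = np.zeros(n_items)
steered[:45] = 1  # assumes every item solved at baseline stays solved

observed = steered.mean() - baseline.mean()
low, high = paired_bootstrap_ci(baseline, steered)
print(f"observed gain: {observed:.2f}, 95% CI: [{low:.2f}, {high:.2f}]")
```

Resampling items captures only item-level variance. The seed-to-seed variation the authors mention would need separate runs, and the interval width depends heavily on the real benchmark size, which this summary does not report.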
Circularity Check
No significant circularity detected.
Full rationale
The paper's core chain—identifying stream-specific predictive patterns in hidden states, defining generative/perceptive states and state inertia from those observations, introducing the ZBB benchmark, and deriving a perception vector via hidden-state contrast for activation steering—does not reduce any reported result to a fitted quantity on the evaluation data or to a self-referential definition. The perception vector is computed from observed differences and applied post-hoc; gains on PersonaPlex (correctness 28%→45%, IWOR 40%→72%) are measured outcomes, not quantities forced by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the provided text. The derivation is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: FD-SLM hidden representations encode distinct generative and perceptive predictive states that can be read out from activation patterns.
- Domain assumption: The transition between these states can be accelerated by a fixed linear intervention in activation space.
invented entities (3)
- state inertia: no independent evidence
- perception vector: no independent evidence
- Zero-Buffer Benchmark: no independent evidence