Component Ablation for Efficient Hybrid Language Model Architectures: Performance, Resilience, and Compression Implications

Elies Seguí-Mas; Guillermina Tormo-Carbó; Hector Borobia

arXiv: 2603.22473 · v2 · pith:W4JH44UZ · submitted 2026-03-23 · cs.CL · cs.AI · cs.LG

Keywords: component, hybrid, architectures, language, model, ablation, attention, models

Abstract

Hybrid language models combine softmax attention with linear-time sequence mechanisms such as state-space or linear-attention layers, but the functional contribution of each component type remains insufficiently characterized. We study component-level ablation in two sub-1B hybrid language models, Qwen3.5-0.8B and Falcon-H1-0.5B, using likelihood-based evaluation, downstream benchmarks, layer-wise interventions, random controls, and representation-level diagnostics. Across the tested models, removing either attention or the alternative sequence-processing pathway substantially degrades performance, indicating that both component types contribute to model behavior. Likelihood metrics are especially sensitive to the linear-attention or state-space pathway, while downstream benchmark degradation depends on task and architecture. Layer-wise ablations show that component importance is position-dependent, with the strongest effects concentrated in early or mid-network components rather than uniformly across depth. Random-removal controls further show that hybrid architectures and same-family Transformer baselines degrade differently under structural perturbation. These results suggest that component ablation is a useful diagnostic for understanding hybrid language model architectures. The findings provide evidence relevant to efficient model design, compression, robustness analysis, and deployment decisions in architectures that combine attention with alternative sequence-processing mechanisms.
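The ablation protocol summarized above (type-level removal, layer-wise interventions, and size-matched random controls, scored with a likelihood metric) can be sketched as a small harness. This is a minimal illustration under stated assumptions, not the authors' code. The `Component` labels, the `Evaluator` interface, the use of relative perplexity as the likelihood metric, and the toy layout and penalty in `toy_evaluate` are all hypothetical. A real evaluator would run the model with the listed components disabled, for example by skipping their contribution to the residual stream. The paper's exact intervention is not described here.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Sequence


@dataclass(frozen=True)
class Component:
    """One sequence-mixing component in a hybrid stack (hypothetical labels)."""
    layer: int  # depth index of the block that holds the component
    kind: str   # "attention" or "linear" (state-space / linear-attention pathway)


# Hypothetical evaluator: maps the set of disabled components to per-token
# natural-log probabilities on a fixed held-out text.
Evaluator = Callable[[FrozenSet[Component]], List[float]]


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean log p) over the evaluated tokens."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def ablation_plan(components: List[Component], n_random: int = 5,
                  seed: int = 0) -> Dict[str, FrozenSet[Component]]:
    """Named ablation conditions: type-level, single-component (layer-wise),
    and size-matched random controls for each type-level removal."""
    rng = random.Random(seed)
    plan: Dict[str, FrozenSet[Component]] = {"baseline": frozenset()}
    for kind in sorted({c.kind for c in components}):
        targeted = frozenset(c for c in components if c.kind == kind)
        plan[f"remove_all_{kind}"] = targeted
        # Random control: remove the same number of components, drawn from
        # every type, to separate "which components" from "how many".
        for i in range(n_random):
            plan[f"random_control_{kind}_{i}"] = frozenset(
                rng.sample(components, len(targeted)))
    for c in components:
        plan[f"remove_L{c.layer}_{c.kind}"] = frozenset({c})
    return plan


def run_ablations(components: List[Component], evaluate: Evaluator,
                  **plan_kwargs) -> Dict[str, float]:
    """Perplexity of each condition divided by the unablated baseline."""
    plan = ablation_plan(components, **plan_kwargs)
    base = perplexity(evaluate(plan["baseline"]))
    return {name: perplexity(evaluate(off)) / base for name, off in plan.items()}


# Toy usage: a 4-layer "parallel" layout with both component types per layer.
layout = [Component(layer, kind) for layer in range(4)
          for kind in ("attention", "linear")]


def toy_evaluate(disabled: FrozenSet[Component]) -> List[float]:
    # Stand-in for a real model run: every disabled component lowers each of
    # 10 token log-probs by 0.1 nats (an arbitrary assumption).
    return [-2.0 - 0.1 * len(disabled)] * 10


results = run_ablations(layout, toy_evaluate)
for name in ("remove_all_attention", "remove_L0_linear", "random_control_linear_0"):
    print(f"{name}: {results[name]:.3f}x baseline perplexity")
```

Comparing each `remove_all_*` condition with its `random_control_*` counterparts indicates whether the damage comes from the component type itself or only from how many components were removed. The `remove_L*` entries give a layer-wise importance profile. A same-family Transformer baseline could be run through the same harness with a single component kind.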

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Component-Aware Self-Speculative Decoding in Hybrid Language Models
cs.CL · 2026-05 · unverdicted · novelty 7.0

Component-aware self-speculative decoding achieves high acceptance rates in parallel hybrid models like Falcon-H1 but fails in sequential ones like Qwen3.5, with the gap tied to how components are integrated.

Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design
cs.AI · 2026-05 · unverdicted · novelty 6.0

Multi-agent LLM systems discover new Transformer and hybrid architectures that outperform Llama 3.2 at 1B scale and approach human-designed state-of-the-art results on long-range benchmarks.
Where Should LoRA Go? Component-Type Placement in Hybrid Language Models
cs.CL · 2026-04 · unverdicted · novelty 6.0

Adapting only the attention components with LoRA outperforms full-model adaptation in hybrid LLMs, with recurrent adaptation harming sequential hybrids but helping parallel ones.