From Parameters to Data: A Task-Parameter-Guided Fine-Tuning Pipeline for Efficient LLM Alignment

2026 · cs.LG · arXiv 2605.21558

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Adapting Large Language Models (LLMs) to specialized domains typically incurs high data and computational overhead. While prior efficiency efforts have largely treated data selection and parameter-efficient fine-tuning as isolated processes, our empirical analysis suggests they may be intrinsically coupled. We posit the Strong Map Hypothesis: a sparse subset of attention heads plays a dominant role in task-specific adaptation, acting as keys that unlock specific data patterns. Building on this observation, we propose From Parameters to Data (P2D), a unified framework that leverages these task-sensitive attention heads as a dual compass for both sample mining and structural pruning. To rigorously quantify the total pipeline cost, we introduce the Alignment Efficiency Ratio (AER), a metric that accounts for both selection latency and training time. Mechanistically, P2D identifies critical heads via a lightweight proxy and uses them as a functional filter to curate high-affinity data, establishing a synergistic pipeline. Empirically, by updating merely 10% of attention heads on 10% of the data, P2D achieves an 8.3 percentage point (pp) performance gain over strong baselines and delivers a 7.0x end-to-end time speedup. These results validate that precise parameter-data synchronization eliminates redundancy, offering a new paradigm for efficient alignment.
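
The two-stage selection described in the abstract (pick a sparse set of task-sensitive heads, then use those heads to filter data) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the proxy or the affinity measure, so the per-head scores, the per-sample affinity matrix, and the function names `top_fraction` and `select_heads_and_data` are all hypothetical placeholders filled with random numbers.

```python
import random


def top_fraction(scores, fraction):
    """Return the sorted indices of the highest-scoring `fraction` of items (at least one)."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def select_heads_and_data(head_scores, sample_head_affinity,
                          head_frac=0.10, data_frac=0.10):
    """Hypothetical two-stage selection: heads first, then samples.

    head_scores: one task-sensitivity score per attention head
        (assumed to come from some lightweight proxy, unspecified here).
    sample_head_affinity: one row per sample, one column per head
        (assumed affinity of each sample to each head).
    """
    # Stage 1: keep the top fraction of heads by proxy score.
    heads = top_fraction(head_scores, head_frac)
    # Stage 2: score each sample by its mean affinity to the kept heads
    # (an assumption made for illustration), then keep the top fraction.
    sample_scores = [
        sum(row[h] for h in heads) / len(heads)
        for row in sample_head_affinity
    ]
    samples = top_fraction(sample_scores, data_frac)
    return heads, samples


# Toy data: 40 heads, 200 samples, random scores with a fixed seed.
rng = random.Random(0)
num_heads, num_samples = 40, 200
head_scores = [rng.random() for _ in range(num_heads)]
affinity = [[rng.random() for _ in range(num_heads)] for _ in range(num_samples)]

heads, samples = select_heads_and_data(head_scores, affinity)
print(len(heads), len(samples))  # prints "4 20"
```

With the 10% fractions quoted in the abstract, the toy run keeps 4 of 40 heads and 20 of 200 samples. In a real pipeline only the kept heads would be updated, and only the kept samples would be used for training.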

representative citing papers

Momentum for Reasoning: Dense Intrinsic Signals in Policy Optimization

cs.AI · 2026-06-07 · unverdicted · novelty 6.0

ISPO densifies GRPO rewards with sequence-level informativeness and token-level directional signals from policy probabilities to reduce zero-advantage collapse and hallucinated certainty on math benchmarks.

citing papers explorer

Showing 1 of 1 citing paper.

Momentum for Reasoning: Dense Intrinsic Signals in Policy Optimization

cs.AI · 2026-06-07 · unverdicted · none · ref 3 · internal anchor
