Turning Drift into Constraint: Robust Reasoning Alignment in Non-Stationary Multi-Stream Environments

Xiaoyu Yang , En Yu , Wei Duan , Jie Lu

classification cs.CV cs.AI cs.LG

keywords reasoning alignment drift models model source constraint environments

This paper identifies a critical yet underexplored challenge in reasoning alignment from multiple multi-modal large language models (MLLMs): In non-stationary environments, the diverse reasoning distributions of source models often evolve unpredictably, transmitting systematic biases and drift to the target model. To address this, we formulate multi-source reasoning alignment as a constraint satisfaction problem under concept drift theory. We propose Autonomous Preference Optimization (APO), a novel framework that treats inter-model divergences not as noise, but as dynamic negative constraints. APO operates via a two-stage protocol: first, supervised bootstrapping projects the target model into the capability union of source models; second, constraint-aware optimization synthesizes a consistent consensus manifold by explicitly suppressing drifting trajectories via a multi-negative Plackett-Luce objective. Extensive experiments on chest X-ray interpretation demonstrate that our 7B model achieves superior robustness, outperforming even proprietary source models in average accuracy. Furthermore, we release CXR-MAX, a large-scale benchmark comprising 170,982 reasoning trajectories from seven large-scale MLLMs to facilitate research on reasoning alignment under drift. Code and data are available at: https://github.com/XiaoyuYoung/APO.
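The abstract names a multi-negative Plackett-Luce objective but does not state its form. As a rough illustration only, the sketch below assumes the common top-1 Plackett-Luce likelihood: one preferred (consensus) trajectory competes against several negative (drifting) trajectories, and the loss is the negative log-probability that the preferred one ranks first. The function names, the DPO-style `implicit_reward` score, and the `beta` value are hypothetical and are not taken from the paper or its repository.

```python
import math


def implicit_reward(logp_policy, logp_ref, beta=0.1):
    """Hypothetical DPO-style score: scaled log-ratio of policy to reference."""
    return beta * (logp_policy - logp_ref)


def multi_negative_pl_loss(pos_score, neg_scores):
    """Negative log-probability that the preferred item ranks first
    among itself and all negatives under a Plackett-Luce model."""
    scores = [pos_score] + list(neg_scores)
    m = max(scores)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - pos_score


# Illustrative numbers only: one consensus trajectory, three drifting ones.
consensus = implicit_reward(logp_policy=-12.0, logp_ref=-14.0)
drifting = [
    implicit_reward(lp, ref)
    for lp, ref in [(-11.0, -10.0), (-15.0, -13.5), (-9.0, -9.0)]
]
loss = multi_negative_pl_loss(consensus, drifting)
print(f"loss = {loss:.4f}")
```

With a single negative, this expression reduces to the familiar pairwise logistic (Bradley-Terry) loss; adding negatives enlarges the normalizer, so every drifting trajectory is pushed down at once rather than one pair at a time.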

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Autonomous Drift Learning in Data Streams: A Unified Perspective
cs.LG 2026-05 unverdicted novelty 7.0

A survey proposes a novel 3D taxonomy classifying drifts into time stream, data stream, and model stream categories to unify research on non-stationary autonomous learning.

XrayClaw: Cooperative-Competitive Multi-Agent Alignment for Trustworthy Chest X-ray Diagnosis
cs.CV 2026-04 unverdicted novelty 7.0

XrayClaw deploys cooperative-competitive multi-agent alignment and Competitive Preference Optimization to raise diagnostic accuracy, reasoning fidelity, and generalization on chest X-ray benchmarks.

Towards Robust Endogenous Reasoning: Unifying Drift Adaptation in Non-Stationary Tuning
cs.LG 2026-04 unverdicted novelty 5.0

CPO++ adapts reinforcement fine-tuning of MLLMs to endogenous multi-modal concept drift through counterfactual reasoning and preference optimization, yielding better coherence and cross-domain robustness in safety-cri...