arXiv: 2602.04872 · 2 revisions
Multi-layer Cross-attention is Provably Optimal for Multi-modal In-context Learning