{"total":11,"items":[{"citing_arxiv_id":"2605.28819","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PEFT-Arena: Understanding Parameter-Efficient Finetuning from a Stability-Plasticity Perspective","primary_cat":"cs.LG","submitted_at":"2026-05-27T17:59:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"PEFT-Arena reveals distinct stability-plasticity profiles across PEFT methods, with orthogonal finetuning achieving the best Pareto frontier under comparable parameter budgets, supported by weight-space spectral and activation-space retention analyses.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21468","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories","primary_cat":"cs.LG","submitted_at":"2026-05-20T17:53:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RELEX extrapolates LLM checkpoints from short RLVR prefixes by projecting deltas onto a rank-1 subspace and fitting a linear trend, matching full training performance at 15% of the steps.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22869","ref_index":56,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning","primary_cat":"cs.LG","submitted_at":"2026-05-19T22:11:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FuRA uses block tensor-train factorization with fixed pretrained SVD basis to achieve full-rank spectral preconditioning, outperforming Full FT by +1.37 on LLaMA-3-8B commonsense reasoning and surpassing QLoRA in quantized settings.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19282","ref_index":44,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR","primary_cat":"cs.LG","submitted_at":"2026-05-19T03:00:26+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Pion modifies Muon's Newton-Schulz iterations into a controllable high-pass filter that anchors dominant singular values at 1 while suppressing noisy tails, outperforming Muon and AdamW in VLA and RLVR regimes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12492","ref_index":94,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation","primary_cat":"cs.LG","submitted_at":"2026-05-12T17:59:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Pion is an optimizer that preserves the singular values of weight matrices in LLM training by applying orthogonal equivalence transformations.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"advantage on code generation, achieving the highest ID and OOD scores across both base models. On mathematical finetuning, Pion matches the ID performance of competing optimizers while more effectively preserving OOD capabilities, highlighting its robustness against catastrophic forgetting. Reinforcement learning with verifiable reward (RLVR). We study Pion as an optimizer for RLVR. Our motivation comes from recent observations [94] that RLVR updates largely preserve the spectral structure of pretrained weight matrices, suggesting that RLVR may benefit from optimizers whose update geometry aligns with the underlying matrix structure. Because Pion preserves the weight spectrum during optimization, it is naturally suitable for RLVR training. We therefore compare Pion against AdamW and Muon to assess whether it can improve the performance of RLVR."},{"citing_arxiv_id":"2605.11739","ref_index":103,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation","primary_cat":"cs.CL","submitted_at":"2026-05-12T08:19:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"On-policy distillation gains efficiency from early foresight in module allocation and update directions, which the proposed EffOPD method exploits for 3x faster training with comparable performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10973","ref_index":49,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Rotation-Preserving Supervised Fine-Tuning","primary_cat":"cs.LG","submitted_at":"2026-05-08T20:20:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RPSFT improves the in-domain versus out-of-domain performance trade-off during LLM supervised fine-tuning by penalizing rotations in pretrained singular subspaces as a proxy for loss-sensitive directions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.07330","ref_index":51,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SparseRL-Sync: Lossless Weight Synchronization with ~100x Less Communication","primary_cat":"cs.LG","submitted_at":"2026-05-08T06:34:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"SparseRL-Sync achieves lossless weight synchronization in large-scale RL by sending only changed parameters, reducing communication volume by roughly 100x under observed 99%+ element-level sparsity.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00610","ref_index":43,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Decouple before Integration: Test-time Synthesis of SFT and RLVR Task Vectors","primary_cat":"cs.LG","submitted_at":"2026-05-01T12:20:44+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DoTS decouples SFT and RLVR training then synthesizes their task vectors at inference time to match integrated training results at ~3% compute cost.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.03839","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Understanding and Exploiting Weight Update Sparsity for Communication-Efficient Distributed RL","primary_cat":"cs.LG","submitted_at":"2026-02-03T18:56:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PULSE exploits BF16-invisible sparsity in weight updates to enable over 100x lower communication in distributed RL post-training via compute-visible sparsification.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.18832","ref_index":28,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Geometric Reasoner: Manifold-Informed Latent Foresight Search for Long-Context Reasoning","primary_cat":"cs.LG","submitted_at":"2026-01-25T18:16:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TGR performs manifold-informed latent foresight search to boost trajectory coverage in long-context reasoning tasks by up to 13 AUC points with minimal overhead.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}