{"total":16,"items":[{"citing_arxiv_id":"2606.26986","ref_index":69,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"ReaORE: Reasoning-Guided Progressive Open Relation Extraction Empowered by Large Reasoning Models","primary_cat":"cs.CL","submitted_at":"2026-06-25T12:59:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"ReaORE is a progressive open relation extraction method that applies coarse-to-fine reasoning to improve generalization to unseen relations over clustering or direct LLM generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.26466","ref_index":87,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Soft Token Alignment for Cross-Lingual Reasoning","primary_cat":"cs.CL","submitted_at":"2026-06-25T00:01:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SOLAR aligns soft-token probability mixtures across languages in embedding space during SFT and raises multilingual reasoning accuracy by up to 17.7 points over the base model.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.11853","ref_index":224,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Task-Aware Structured Memory for Dynamic Multi-modal In-Context Learning","primary_cat":"cs.CV","submitted_at":"2026-06-10T09:30:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"TASM proposes a task-aware structured memory framework using task-vector compression, bipartite token merging, and a Core Memory plus Latent Bank hierarchy to enable efficient dynamic multi-modal in-context 
learning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.09932","ref_index":126,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff","primary_cat":"cs.LG","submitted_at":"2026-06-07T17:58:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Excessive SFT reduces LLM plasticity for RL; Rejuvenation restores it via base-anchored fusion and targeted neuron resets, yielding better RL performance and OOD generalization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.08147","ref_index":42,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Biological Reasoning-Informed Regression for Interpretable Regulatory DNA Activity Prediction","primary_cat":"q-bio.GN","submitted_at":"2026-06-06T12:56:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"R3LM trains LLMs via two-stage reasoning-then-regression on a new dataset CRE-ReasonBench with mechanistic traces, achieving SOTA enhancer activity prediction across three cell types with interpretable outputs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22223","ref_index":14,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"How Many Different Outputs Can a Transformer Generate?","primary_cat":"cs.LG","submitted_at":"2026-05-21T09:26:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Transformers are limited to a linearly growing number of accessible output sequences with prompt length, with exponential decay in accessible proportion beyond a 
critical point, even under unbounded context.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17978","ref_index":97,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"AutoVecCoder: Teaching LLMs to Generate Explicitly Vectorized Code","primary_cat":"cs.CL","submitted_at":"2026-05-18T07:33:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AutoVecCoder combines VecPrompt for automated intrinsic knowledge synthesis and VecRL for efficiency-aligned RL to train an 8B LLM that achieves SOTA on SimdBench SSE/AVX subsets and sometimes exceeds -O3 compiler results.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11518","ref_index":40,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration - Learning from Cheap, Optimizing Expensive","primary_cat":"cs.AI","submitted_at":"2026-05-12T04:42:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AutoLLMResearch trains agents in a multi-fidelity LLMConfig-Gym environment formulated as a long-horizon MDP to enable cross-fidelity extrapolation for automating high-cost LLM experiment configurations.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"regions diverge, a property critical for real-world deployment. Result 10: Our method degrades far more gracefully across both stress tests and recovers quickly, showing that text-based reasoning learns a deeper extrapolation strategy that remains robust under adversarial low-fidelity regimes. Broader Impact and Future. This work contributes to broader AI Scientist [40]. 
Our framework can be extended to more LLM configuration scenarios and broadly to any domain where cheap trials can guide expensive decisions (catalyst optimization, etc. [41, 42]). Future directions: 1) expanding our Gym with more tasks and fidelity; 2) multi-objective optimization that balances competing goals. More broadly, toward Recursive LLM Design, as foreshadowed in our introduction, AutoLLMResearch"},{"citing_arxiv_id":"2605.08031","ref_index":19,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Object Hallucination-Free Reinforcement Unlearning for Vision-Language Models","primary_cat":"cs.CV","submitted_at":"2026-05-08T17:19:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"HFRU is a two-stage reinforcement unlearning method operating on the vision encoder with GRPO optimization and an abstraction reward that achieves over 98% forgetting and retention on object and face tasks with negligible hallucination.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Since πref assigns non-zero probability to such sequences, it follows that Zcomp(x, p) > Zpen(x, p). (18) Now consider any hallucinated sequence yh, i.e., a sequence containing tokens in DHallu such that DHallu ∩ O(x) = ∅. By construction, such sequences do not contain valid hypernyms, hence Rabs(yh) = 0. 
Therefore, their numerators remain unchanged: πcomp(yh | x, p) = πref(yh | x, p) exp(Rpen(yh)/β) / Zcomp(x, p). (19) Since Zcomp(x, p) > Zpen(x, p), we obtain πcomp(yh | x, p) < πpen(yh | x, p), ∀(x, p) ∈ Df. (20) Summing over all hallucinated sequences yields Phallu(πcomp | x, p) < Phallu(πpen | x, p), ∀(x, p) ∈ Df. (21) Taking expectation over the forget set Df completes the proof: E(x,p)∼Df [Phallu(πcomp | x, p)] < E(x,p)∼Df [Phallu(πpen | x, p)]. (22) C Implementation Details"},{"citing_arxiv_id":"2605.07145","ref_index":15,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Fine-tuning a vision-language model for fracture-surface morphology recognition","primary_cat":"cond-mat.mtrl-sci","submitted_at":"2026-05-08T02:26:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Fine-tuning Qwen3-VL-32B-Instruct on a curated set of 13k fracture images yields a specialist model achieving 0.92 precision on morphology recognition, outperforming the base model and several proprietary VLMs on a 100-image manual benchmark.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.26020","ref_index":92,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Training Computer Use Agents to Assess the Usability of Graphical User Interfaces","primary_cat":"cs.CL","submitted_at":"2026-04-28T18:04:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"uxCUA is a trained computer use agent that assesses GUI usability more accurately than larger models by learning to prioritize and execute important user interactions on labeled interface 
datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.19144","ref_index":59,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"ReflectMT: Internalizing Reflection for Efficient and High-Quality Machine Translation","primary_cat":"cs.CL","submitted_at":"2026-04-21T06:48:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ReflectMT internalizes reflection via two-stage RL to enable direct high-quality machine translation that outperforms explicit reasoning models like DeepSeek-R1 on WMT24 while using 94% fewer tokens.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09907","ref_index":23,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"From UAV Imagery to Agronomic Reasoning: A Multimodal LLM Benchmark for Plant Phenotyping","primary_cat":"cs.CV","submitted_at":"2026-04-10T21:08:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"PlantXpert benchmark shows fine-tuned VLMs reach up to 78% accuracy on plant phenotyping but scaling gains plateau and quantitative biological reasoning remains weak.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.02881","ref_index":46,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"One Model to Translate Them All? 
A Journey to Mount Doom for Multilingual Model Merging","primary_cat":"cs.CL","submitted_at":"2026-04-03T08:45:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Merging fine-tuned models for multilingual translation fails because fine-tuning redistributes language-specific neurons rather than sharpening them, increasing representational divergence in output-generating layers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.02764","ref_index":86,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models","primary_cat":"cs.CL","submitted_at":"2025-12-02T13:44:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"PEFT-Factory supplies a ready-to-use, extensible codebase that unifies 19 PEFT methods and evaluation pipelines for fine-tuning large autoregressive language models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.00432","ref_index":19,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning","primary_cat":"cs.AI","submitted_at":"2025-07-01T05:23:05+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Math reasoning gains in LLMs rarely transfer to general domains; RL tuning generalizes while SFT causes forgetting and representation drift.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}