EO-Gym supplies an executable multimodal environment and 9k-trajectory benchmark that turns Earth Observation into a tool-using, multi-step reasoning task, revealing that current VLMs struggle on temporal and cross-sensor workflows while fine-tuning lifts Pass@3 from 0.49 to 0.74.
Swift: A scalable lightweight infrastructure for fine-tuning
7 Pith papers cite this work. Polarity classification is still indexing.
years
2026 7verdicts
UNVERDICTED 7representative citing papers
DataPRM is a new process reward model for data analysis agents that detects silent errors via environment interaction and ternary rewards, yielding 7-11% gains on benchmarks and further RL improvements.
GRACE scores reasoning steps via gradient alignment and trajectory consistency to select data subsets that match full performance with 5% of the data on Qwen3-VL-2B-Instruct.
LightThinker++ adds explicit adaptive memory management and a trajectory synthesis pipeline to LLM reasoning, cutting peak token use by ~70% while gaining accuracy in standard and long-horizon agent tasks.
CharTool equips MLLMs with cropping and code tools plus agentic RL on DuoChart data to raise chart-reasoning accuracy by up to 9.78 percent on benchmarks.
Multilingual RAG rerankers exhibit language bias that limits cross-lingual evidence use, and the proposed LAURA method aligns ranking with downstream generation utility to reduce the bias and improve performance.
SparseBalance dynamically adjusts sparsity and batches workloads to load-balance sparse attention training, delivering up to 1.33x speedup and 0.46% better long-context performance on LongBench.
citing papers explorer
-
SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention
SparseBalance dynamically adjusts sparsity and batches workloads to load-balance sparse attention training, delivering up to 1.33x speedup and 0.46% better long-context performance on LongBench.