uxCUA is a trained computer use agent that assesses GUI usability more accurately than larger models by learning to prioritize and execute important user interactions on labeled interface datasets.
hub
L lama F actory: Unified Efficient Fine-Tuning of 100+ Language Models
16 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
ReflectMT internalizes reflection via two-stage RL to enable direct high-quality machine translation that outperforms explicit reasoning models like DeepSeek-R1 on WMT24 while using 94% fewer tokens.
PlantXpert benchmark shows fine-tuned VLMs reach up to 78% accuracy on plant phenotyping but scaling gains plateau and quantitative biological reasoning remains weak.
Merging fine-tuned models for multilingual translation fails because fine-tuning redistributes language-specific neurons rather than sharpening them, increasing representational divergence in output-generating layers.
SOLAR aligns soft-token probability mixtures across languages in embedding space during SFT and raises multilingual reasoning accuracy by up to 17.7 points over the base model.
R3LM trains LLMs via two-stage reasoning-then-regression on a new dataset CRE-ReasonBench with mechanistic traces, achieving SOTA enhancer activity prediction across three cell types with interpretable outputs.
Transformers are limited to a linearly growing number of accessible output sequences with prompt length, with exponential decay in accessible proportion beyond a critical point, even under unbounded context.
AutoVecCoder combines VecPrompt for automated intrinsic knowledge synthesis and VecRL for efficiency-aligned RL to train an 8B LLM that achieves SOTA on SimdBench SSE/AVX subsets and sometimes exceeds -O3 compiler results.
AutoLLMResearch trains agents in a multi-fidelity LLMConfig-Gym environment formulated as a long-horizon MDP to enable cross-fidelity extrapolation for automating high-cost LLM experiment configurations.
HFRU is a two-stage reinforcement unlearning method operating on the vision encoder with GRPO optimization and an abstraction reward that achieves over 98% forgetting and retention on object and face tasks with negligible hallucination.
Math reasoning gains in LLMs rarely transfer to general domains; RL tuning generalizes while SFT causes forgetting and representation drift.
TASM proposes a task-aware structured memory framework using task-vector compression, bipartite token merging, and a Core Memory plus Latent Bank hierarchy to enable efficient dynamic multi-modal in-context learning.
Excessive SFT reduces LLM plasticity for RL; Rejuvenation restores it via base-anchored fusion and targeted neuron resets, yielding better RL performance and OOD generalization.
PEFT-Factory supplies a ready-to-use, extensible codebase that unifies 19 PEFT methods and evaluation pipelines for fine-tuning large autoregressive language models.
ReaORE is a progressive open relation extraction method that applies coarse-to-fine reasoning to improve generalization to unseen relations over clustering or direct LLM generation.
Fine-tuning Qwen3-VL-32B-Instruct on a curated set of 13k fracture images yields a specialist model achieving 0.92 precision on morphology recognition, outperforming the base model and several proprietary VLMs on a 100-image manual benchmark.
citing papers explorer
-
AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration - Learning from Cheap, Optimizing Expensive
AutoLLMResearch trains agents in a multi-fidelity LLMConfig-Gym environment formulated as a long-horizon MDP to enable cross-fidelity extrapolation for automating high-cost LLM experiment configurations.
-
Object Hallucination-Free Reinforcement Unlearning for Vision-Language Models
HFRU is a two-stage reinforcement unlearning method operating on the vision encoder with GRPO optimization and an abstraction reward that achieves over 98% forgetting and retention on object and face tasks with negligible hallucination.