LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models · 2024 · DOI 10.18653/v1/2024.acl-demos.38

16 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Training Computer Use Agents to Assess the Usability of Graphical User Interfaces

cs.CL · 2026-04-28 · unverdicted · novelty 7.0

uxCUA is a trained computer use agent that assesses GUI usability more accurately than larger models by learning to prioritize and execute important user interactions on labeled interface datasets.

ReflectMT: Internalizing Reflection for Efficient and High-Quality Machine Translation

cs.CL · 2026-04-21 · unverdicted · novelty 7.0

ReflectMT internalizes reflection via two-stage RL to enable direct high-quality machine translation that outperforms explicit reasoning models like DeepSeek-R1 on WMT24 while using 94% fewer tokens.

From UAV Imagery to Agronomic Reasoning: A Multimodal LLM Benchmark for Plant Phenotyping

cs.CV · 2026-04-10 · unverdicted · novelty 7.0

PlantXpert benchmark shows fine-tuned VLMs reach up to 78% accuracy on plant phenotyping but scaling gains plateau and quantitative biological reasoning remains weak.

One Model to Translate Them All? A Journey to Mount Doom for Multilingual Model Merging

cs.CL · 2026-04-03 · unverdicted · novelty 7.0

Merging fine-tuned models for multilingual translation fails because fine-tuning redistributes language-specific neurons rather than sharpening them, increasing representational divergence in output-generating layers.

Soft Token Alignment for Cross-Lingual Reasoning

cs.CL · 2026-06-25 · unverdicted · novelty 6.0

SOLAR aligns soft-token probability mixtures across languages in embedding space during SFT and raises multilingual reasoning accuracy by up to 17.7 points over the base model.

Biological Reasoning-Informed Regression for Interpretable Regulatory DNA Activity Prediction

q-bio.GN · 2026-06-06 · unverdicted · novelty 6.0

R3LM trains LLMs via two-stage reasoning-then-regression on a new dataset CRE-ReasonBench with mechanistic traces, achieving SOTA enhancer activity prediction across three cell types with interpretable outputs.

How Many Different Outputs Can a Transformer Generate?

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

The number of output sequences a Transformer can access grows only linearly with prompt length, and the accessible proportion decays exponentially beyond a critical point, even under unbounded context.

AutoVecCoder: Teaching LLMs to Generate Explicitly Vectorized Code

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

AutoVecCoder combines VecPrompt for automated intrinsic knowledge synthesis and VecRL for efficiency-aligned RL to train an 8B LLM that achieves SOTA on SimdBench SSE/AVX subsets and sometimes exceeds -O3 compiler results.

AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration - Learning from Cheap, Optimizing Expensive

cs.AI · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

AutoLLMResearch trains agents in a multi-fidelity LLMConfig-Gym environment formulated as a long-horizon MDP to enable cross-fidelity extrapolation for automating high-cost LLM experiment configurations.

Object Hallucination-Free Reinforcement Unlearning for Vision-Language Models

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

HFRU is a two-stage reinforcement unlearning method operating on the vision encoder with GRPO optimization and an abstraction reward that achieves over 98% forgetting and retention on object and face tasks with negligible hallucination.

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

cs.AI · 2025-07-01 · conditional · novelty 6.0

Math reasoning gains in LLMs rarely transfer to general domains; RL tuning generalizes while SFT causes forgetting and representation drift.

Task-Aware Structured Memory for Dynamic Multi-modal In-Context Learning

cs.CV · 2026-06-10 · unverdicted · novelty 5.0

TASM proposes a task-aware structured memory framework using task-vector compression, bipartite token merging, and a Core Memory plus Latent Bank hierarchy to enable efficient dynamic multi-modal in-context learning.

When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

cs.LG · 2026-06-07 · unverdicted · novelty 5.0

Excessive SFT reduces LLM plasticity for RL; Rejuvenation restores it via base-anchored fusion and targeted neuron resets, yielding better RL performance and OOD generalization.

PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models

cs.CL · 2025-12-02 · unverdicted · novelty 5.0

PEFT-Factory supplies a ready-to-use, extensible codebase that unifies 19 PEFT methods and evaluation pipelines for fine-tuning large autoregressive language models.

ReaORE: Reasoning-Guided Progressive Open Relation Extraction Empowered by Large Reasoning Models

cs.CL · 2026-06-25 · unverdicted · novelty 4.0

ReaORE is a progressive open relation extraction method that applies coarse-to-fine reasoning to improve generalization to unseen relations over clustering or direct LLM generation.

Fine-tuning a vision-language model for fracture-surface morphology recognition

cond-mat.mtrl-sci · 2026-05-08 · unverdicted · novelty 4.0

Fine-tuning Qwen3-VL-32B-Instruct on a curated set of 13k fracture images yields a specialist model achieving 0.92 precision on morphology recognition, outperforming the base model and several proprietary VLMs on a 100-image manual benchmark.
