Title resolution pending

Qwen3 Technical Report · 2025

23 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction

cs.CL · 2026-05-11 · unverdicted · novelty 7.0

ReVision reduces visual token usage by 46% on average in agent trajectories via a learned patch selector and improves success rates by 3% on three benchmarks, showing that history saturation stems from inefficient representations rather than lack of utility.

OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four benchmarks.

Multi-domain Multi-modal Document Classification Benchmark with a Multi-level Taxonomy

cs.CL · 2026-05-11 · unverdicted · novelty 7.0

MMM-Bench is the first benchmark with a 5-level taxonomy, 5,990 multi-modal documents from 12 commercial domains, expert hierarchical annotations, and baselines that reveal four key challenges in multi-domain document classification.

Positional LSH: Binary Block Matrix Approximation for Attention with Linear Biases

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

ALiBi bias is the expectation of positional LSH-induced block masks, yielding spectral and max-norm approximation bounds that reduce long-context biased attention to randomized short-context unbiased attention.

Fin-Bias: Comprehensive Evaluation for LLM Decision-Making under human bias in Finance Domain

cs.CL · 2026-05-09 · unverdicted · novelty 7.0

LLMs copy biased analyst ratings in investment decisions, but a new detection method encourages independent reasoning and can improve stock return predictions beyond human levels.

Agentick: A Unified Benchmark for General Sequential Decision-Making Agents

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

Agentick is a new benchmark for sequential decision-making agents that evaluates RL, LLM, VLM, hybrid, and human approaches across 37 tasks and finds no single method dominates.

Solve the Loop: Attractor Models for Language and Reasoning

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Attractor Models solve for fixed points in transformer embeddings using implicit differentiation to enable stable iterative refinement, delivering better perplexity, accuracy, and efficiency than standard or looped transformers.

Search Your Block Floating Point Scales!

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

ScaleSearch optimizes block floating point scales via fine-grained search to cut quantization error by 27% for NVFP4, improving PTQ by up to 15 points on MATH500 for Qwen3-8B and attention PPL by 0.77 on Llama 3.1 70B.

Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

On-policy distillation gains efficiency from early foresight in module allocation and low-rank update directions, enabling EffOPD to accelerate training by 3x via adaptive extrapolation without extra modules or tuning.

Enhancing Multilingual Counterfactual Generation through Alignment-as-Preference Optimization

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

Macro uses Direct Preference Optimization on composite-scored preference pairs to improve validity of multilingual self-generated counterfactual explanations by 12.55% on average without degrading minimality.

GeoR-Bench: Evaluating Geoscience Visual Reasoning

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

GeoR-Bench shows top multimodal models reach only 42.7% strict accuracy on geoscience visual reasoning tasks while open-source models reach 10.3%, with outputs often visually plausible yet scientifically inaccurate.

Crosslingual On-Policy Self-Distillation for Multilingual Reasoning

cs.CL · 2026-05-10 · unverdicted · novelty 6.0

COPSD improves mathematical reasoning in low-resource languages by having LLMs self-distill from their own high-resource English behavior via token-level divergence on rollouts with privileged crosslingual context.

APCD: Adaptive Path-Contrastive Decoding for Reliable Large Language Model Generation

cs.CL · 2026-05-10 · unverdicted · novelty 6.0

APCD reduces LLM hallucinations by expanding decoding paths adaptively when entropy signals uncertainty and by contrasting divergent paths to control their interaction.

Dynamic Meta-Metrics: Source-Sentence Conditioned Weighting for MT Evaluation

cs.CL · 2026-05-09 · unverdicted · novelty 6.0

Dynamic Meta-Metrics learns source-sentence-conditioned combinations of MT metrics, with MLP-based hard and soft clustering versions outperforming static linear and Gaussian process ensembles on WMT data.

SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

Pruning pretrained MoE models outperforms training from scratch, different compression methods converge after continued pretraining, and combining KD with language modeling loss plus progressive schedules yields a competitive 23A2B model from Qwen3-Next-80A3B.

MARLaaS: Multi-Tenant Asynchronous Reinforcement Learning as a Service

cs.DC · 2026-05-08 · unverdicted · novelty 6.0

MARLaaS enables concurrent RL fine-tuning across up to 32 tasks using LoRA adapters and a disaggregated asynchronous architecture, matching single-task performance while improving accelerator utilization by 4.3x and cutting end-to-end time by 85%.

Rotation-Preserving Supervised Fine-Tuning

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

RPSFT improves the in-domain versus out-of-domain performance trade-off during LLM supervised fine-tuning by penalizing rotations in pretrained singular subspaces as a proxy for loss-sensitive directions.

ExecuTorch -- A Unified PyTorch Solution to Run AI Models On-Device

cs.LG · 2026-05-05 · unverdicted · novelty 6.0

ExecuTorch is a unified PyTorch-native deployment framework that enables seamless on-device execution of AI models across heterogeneous hardware while preserving original PyTorch semantics.

OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

OGLS-SD improves LLM reasoning by using verifiable outcome rewards to guide logit steering that calibrates teacher distributions in on-policy self-distillation, addressing reflection-induced mismatches.

UserGPT Technical Report

cs.IR · 2026-05-09 · unverdicted · novelty 5.0

UserGPT introduces a generative LLM framework with a behavior simulation engine, semantization module, and DF-GRPO post-training that scores 0.7325 on tag prediction and 0.7528 on summary generation on HPR-Bench while compressing records by up to 97.9%.

Unleashing Scalable Context Parallelism for Foundation Models Pre-Training via FCP

cs.DC · 2026-05-08 · unverdicted · novelty 5.0

FCP shards sequences at block level with flexible P2P communication and bin-packing to achieve near-linear scaling up to 256 GPUs and 1.13x-2.21x higher attention MFU in foundation model pre-training.

Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models

cs.CL · 2026-05-12 · unverdicted · novelty 4.0

Qwen-Scope provides open-source sparse autoencoders for Qwen models that function as practical interfaces for steering, evaluation, data workflows, and optimization of large language models.

Qwen Goes Brrr: Off-the-Shelf RAG for Ukrainian Multi-Domain Document Understanding

cs.CL · 2026-05-11 · unverdicted · novelty 3.0

A RAG pipeline with contextual PDF chunking, question-and-answer-aware retrieval and reranking using Qwen3 models reaches 0.96 accuracy on a Ukrainian multi-domain document QA shared task.

citing papers explorer

Showing 23 of 23 citing papers.

ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction cs.CL · 2026-05-11 · unverdicted · none · ref 44
OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents cs.AI · 2026-05-11 · unverdicted · none · ref 26
Multi-domain Multi-modal Document Classification Benchmark with a Multi-level Taxonomy cs.CL · 2026-05-11 · unverdicted · none · ref 22
Positional LSH: Binary Block Matrix Approximation for Attention with Linear Biases cs.LG · 2026-05-10 · unverdicted · none · ref 48
Fin-Bias: Comprehensive Evaluation for LLM Decision-Making under human bias in Finance Domain cs.CL · 2026-05-09 · unverdicted · none · ref 50
Agentick: A Unified Benchmark for General Sequential Decision-Making Agents cs.AI · 2026-05-07 · unverdicted · none · ref 21
Solve the Loop: Attractor Models for Language and Reasoning cs.LG · 2026-05-12 · unverdicted · none · ref 52
Search Your Block Floating Point Scales! cs.LG · 2026-05-12 · unverdicted · none · ref 30
Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation cs.CL · 2026-05-12 · unverdicted · none · ref 39
Enhancing Multilingual Counterfactual Generation through Alignment-as-Preference Optimization cs.CL · 2026-05-12 · unverdicted · none · ref 55
GeoR-Bench: Evaluating Geoscience Visual Reasoning cs.CV · 2026-05-12 · unverdicted · none · ref 30
Crosslingual On-Policy Self-Distillation for Multilingual Reasoning cs.CL · 2026-05-10 · unverdicted · none · ref 18
APCD: Adaptive Path-Contrastive Decoding for Reliable Large Language Model Generation cs.CL · 2026-05-10 · unverdicted · none · ref 41
Dynamic Meta-Metrics: Source-Sentence Conditioned Weighting for MT Evaluation cs.CL · 2026-05-09 · unverdicted · none · ref 23
SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training cs.LG · 2026-05-09 · unverdicted · none · ref 75
MARLaaS: Multi-Tenant Asynchronous Reinforcement Learning as a Service cs.DC · 2026-05-08 · unverdicted · none · ref 44
Rotation-Preserving Supervised Fine-Tuning cs.LG · 2026-05-08 · unverdicted · none · ref 92
ExecuTorch -- A Unified PyTorch Solution to Run AI Models On-Device cs.LG · 2026-05-05 · unverdicted · none · ref 22
OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning cs.LG · 2026-05-12 · unverdicted · none · ref 4
UserGPT Technical Report cs.IR · 2026-05-09 · unverdicted · none · ref 10
Unleashing Scalable Context Parallelism for Foundation Models Pre-Training via FCP cs.DC · 2026-05-08 · unverdicted · none · ref 24
Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models cs.CL · 2026-05-12 · unverdicted · none · ref 15
Qwen Goes Brrr: Off-the-Shelf RAG for Ukrainian Multi-Domain Document Understanding cs.CL · 2026-05-11 · unverdicted · none · ref 31
