Qwen3.5 : Towards native multimodal agents, February 2026

Qwen Team · 2026

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

browse 8 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Adaptive Stopping for Multi-Turn LLM Reasoning

cs.CL · 2026-04-01 · unverdicted · novelty 8.0

MiCP is the first conformal prediction method for multi-turn LLM pipelines that allocates per-turn error budgets to enable adaptive stopping with an overall coverage guarantee, shown to reduce turns and cost on RAG and ReAct benchmarks.

F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

F-GRPO factorizes group-relative policy optimization into generation and ranking phases within one autoregressive sequence, using order-invariant coverage and position-aware utility rewards to improve top-ranked performance on recommendation and multi-hop QA tasks.

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.

AssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agents

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

AssayBench is a new gene-ranking benchmark for phenotypic CRISPR screens that shows zero-shot generalist LLMs outperform both biology-specific LLMs and trainable baselines on adjusted nDCG.

Super Apriel: One Checkpoint, Many Speeds

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

A single 15B supernet checkpoint supports runtime switching between attention mixer placements for multiple decode speed presets while retaining 77-96% quality relative to the teacher model.

ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis

cs.AI · 2026-04-02 · unverdicted · novelty 6.0 · 2 refs

ATBench is a new trajectory-level benchmark with 1,000 diverse and realistic scenarios for assessing safety in LLM agents.

Lossless Anti-Distillation Sampling

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

LADS is a sampling method that keeps benign user generations statistically identical to the original model while forcing correlated samples across a distiller's multiple accounts, provably worsening their generalization via uniform convergence bounds.

Structured Recurrent Mixers for Massively Parallelized Sequence Generation

cs.CL · 2026-05-09 · 2 refs

citing papers explorer

Showing 8 of 8 citing papers.

Adaptive Stopping for Multi-Turn LLM Reasoning cs.CL · 2026-04-01 · unverdicted · none · ref 18
MiCP is the first conformal prediction method for multi-turn LLM pipelines that allocates per-turn error budgets to enable adaptive stopping with an overall coverage guarantee, shown to reduce turns and cost on RAG and ReAct benchmarks.
F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking cs.LG · 2026-05-13 · unverdicted · none · ref 36
F-GRPO factorizes group-relative policy optimization into generation and ranking phases within one autoregressive sequence, using order-invariant coverage and position-aware utility rewards to improve top-ranked performance on recommendation and multi-hop QA tasks.
LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues cs.CL · 2026-05-12 · unverdicted · none · ref 89
LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.
AssayBench: An Assay-Level Virtual Cell Benchmark for LLMs and Agents cs.LG · 2026-05-11 · unverdicted · none · ref 22
AssayBench is a new gene-ranking benchmark for phenotypic CRISPR screens that shows zero-shot generalist LLMs outperform both biology-specific LLMs and trainable baselines on adjusted nDCG.
Super Apriel: One Checkpoint, Many Speeds cs.LG · 2026-04-21 · unverdicted · none · ref 43
A single 15B supernet checkpoint supports runtime switching between attention mixer placements for multiple decode speed presets while retaining 77-96% quality relative to the teacher model.
ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis cs.AI · 2026-04-02 · unverdicted · none · ref 28 · 2 links
ATBench is a new trajectory-level benchmark with 1,000 diverse and realistic scenarios for assessing safety in LLM agents.
Lossless Anti-Distillation Sampling cs.LG · 2026-05-12 · unverdicted · none · ref 112
LADS is a sampling method that keeps benign user generations statistically identical to the original model while forcing correlated samples across a distiller's multiple accounts, provably worsening their generalization via uniform convergence bounds.
Structured Recurrent Mixers for Massively Parallelized Sequence Generation cs.CL · 2026-05-09 · unreviewed · ref 55 · 2 links

Qwen3.5 : Towards native multimodal agents, February 2026

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer