super hub Mixed citations

write newline

" write newline "" before

Mixed citation behavior. Most common role is background (50%).

192 Pith papers citing it

Background 50% of classified citations

browse 192 citing papers more from " write newline "" before

hub tools

JSON dossier citing papers JSON

citation-role summary

background 9 other 5 method 2

citation-polarity summary

background 8 unclear 6 use method 2

claims ledger

background Flesch-Kincaid Grade Level 8.97 9.08 -0.11 -0.1673 -0.1528 Table 5: Textual complexity metrics and their correlation with frequency. Corr. denotes correlation. We use nlp = spacy.load("en_core_web_sm") for calculation. Bin Range N BLEU(HF) BLEU(LF)∆BLEU(HF-LF) chrF(HF) chrF(LF)∆chrF(HF-LF) Strict Depth Match 144 20.82 16.04 +4.78 48.73 43.86 +4.87 [0%,5%) 144 20.82 16.04 +4.78 48.73 43.86 +4.87 [5%,10%) 6 22.45 14.79 +7.65 49.76 49.19 +0.57 [10%,15%) 71 19.12 15.38 +3.74 46.19 44.71 +1.47 [15%,2

authors

" write newline "" before

co-cited works

representative citing papers

SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents

cs.CL · 2025-12-08 · accept · novelty 8.0

SwissGov-RSD is the first naturalistic cross-lingual document-level benchmark with human token-level semantic difference annotations, on which both LLMs and encoders show a large performance gap relative to simpler settings.

BEAVER: An Enterprise Benchmark for Text-to-SQL

cs.CL · 2024-09-03 · unverdicted · novelty 8.0

BEAVER is the first text-to-SQL benchmark from private enterprise data warehouses, revealing SOTA agentic frameworks achieve only 10.8% accuracy on complex real-world queries.

Investigating More Explainable and Partition-Free Compositionality Estimation for LLMs: A Rule-Generation Perspective

cs.AI · 2026-04-30 · unverdicted · novelty 7.0

A rule-generation perspective lets LLMs write programs as rules for data mapping and applies complexity theory to estimate their compositionality, tested on string-to-grid tasks.

OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory

cs.CL · 2026-04-29 · unverdicted · novelty 7.0

OCR-Memory encodes agent trajectories as images with visual anchors and retrieves verbatim text via locate-and-transcribe, yielding gains on long-horizon benchmarks under strict context limits.

From Chatbots to Confidants: A Cross-Cultural Study of LLM Adoption for Emotional Support

cs.CL · 2026-04-28 · unverdicted · novelty 7.0 · 2 refs

Cross-cultural survey of 4,641 participants shows LLM emotional support adoption varies widely by country and demographics, with socioeconomic status as strongest predictor of trust and use, and English-speaking nations more accepting than others in Europe.

PushupBench: Your VLM is not good at counting pushups

cs.CV · 2026-04-25 · unverdicted · novelty 7.0

VLMs reach only 42.1% exact accuracy on counting pushups in videos, with weaker models exploiting modal counts, and 1k-sample fine-tuning transfers gains to MVBench, PerceptionTest, and TVBench.

Fine-tuning vs. In-context Learning in Large Language Models: A Formal Language Learning Perspective

cs.CL · 2026-04-25 · conditional · novelty 7.0 · 2 refs

A controlled formal language task reveals fine-tuning outperforms in-context learning on in-distribution generalization but equals it on out-of-distribution, with ICL showing greater sensitivity to model size and tokenization.

StoryTR: Narrative-Centric Video Temporal Retrieval with Theory of Mind Reasoning

cs.AI · 2026-04-25 · unverdicted · novelty 7.0

StoryTR is a new benchmark and agentic data pipeline that adds explicit Theory of Mind reasoning chains to train smaller video retrieval models, yielding a 15% relative IoU gain over larger baselines on narrative content.

Evaluating Temporal Consistency in Multi-Turn Language Models

cs.CL · 2026-04-24 · unverdicted · novelty 7.0

Language models frequently violate temporal scope stability in multi-turn dialogues by drifting toward present-day assumptions even when they possess the correct facts.

BERAG: Bayesian Ensemble Retrieval-Augmented Generation for Knowledge-based Visual Question Answering

cs.CL · 2026-04-24 · unverdicted · novelty 7.0

BERAG applies Bayesian ensemble weighting of individual documents via token-by-token posterior updates in retrieval-augmented generation, yielding gains on knowledge-based visual QA tasks.

How Tokenization Limits Phonological Knowledge Representation in Language Models and How to Improve Them

cs.CL · 2026-04-18 · unverdicted · novelty 7.0

Subword tokenization impairs phonological knowledge encoding in LMs, but an IPA-based fine-tuning method restores it with minimal impact on other capabilities.

BIASEDTALES-ML: A Multilingual Dataset for Analyzing Narrative Attribute Distributions in LLM-Generated Stories

cs.CL · 2026-04-18 · unverdicted · novelty 7.0

BiasedTales-ML provides a parallel multilingual corpus of LLM-generated children's stories that reveals substantial cross-lingual differences in narrative attributes not captured by English-centric analyses.

Conjunctive Prompt Attacks in Multi-Agent LLM Systems

cs.MA · 2026-04-17 · unverdicted · novelty 7.0

Conjunctive prompt attacks split adversarial elements across agents and routing paths in multi-agent LLM systems, evading isolated defenses and succeeding through topology-aware optimization.

VisPCO: Visual Token Pruning Configuration Optimization via Budget-Aware Pareto-Frontier Learning for Vision-Language Models

cs.CV · 2026-04-16 · unverdicted · novelty 7.0

VisPCO uses continuous relaxation, straight-through estimators, and budget-aware Pareto-frontier learning to automatically discover optimal visual token pruning configurations that approximate grid-search results across VLMs and benchmarks.

HintPilot: LLM-based Compiler Hint Synthesis for Code Optimization

cs.SE · 2026-04-16 · unverdicted · novelty 7.0

HintPilot synthesizes semantics-preserving compiler hints via retrieval-augmented LLM generation and profiling-guided refinement, delivering up to 6.88x geometric mean speedup over -Ofast on PolyBench and HumanEval-CPP while preserving correctness.

Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization

cs.CR · 2026-04-16 · unverdicted · novelty 7.0

R²A uses a hybrid ensemble surrogate router and suffix optimization to significantly increase black-box LLM router selection of expensive models across query distributions.

ADAPT: Benchmarking Commonsense Planning under Unspecified Affordance Constraints

cs.AI · 2026-04-16 · unverdicted · novelty 7.0

ADAPT augments planners with affordance reasoning to raise task success in environments with unspecified and time-varying object affordances, and a LoRA-finetuned VLM backend beats GPT-4o on the new DynAfford benchmark.

Schema Key Wording as an Instruction Channel in Structured Generation under Constrained Decoding

cs.CL · 2026-04-16 · unverdicted · novelty 7.0

Schema-key wording functions as an implicit instruction channel under constrained decoding, with experiments showing that rephrasing only the keys can substantially change accuracy on math benchmarks while prompt, model, structure, and decoding remain unchanged.

SPAGBias: Uncovering and Tracing Structured Spatial Gender Bias in Large Language Models

cs.CL · 2026-04-16 · unverdicted · novelty 7.0

SPAGBias reveals that LLMs form nuanced gender associations with specific urban micro-spaces that exceed real-world distributions and produce failures in planning and descriptive tasks.

Controlling Authority Retrieval: A Missing Retrieval Objective for Authority-Governed Knowledge

cs.IR · 2026-04-15 · unverdicted · novelty 7.0

CAR is a new retrieval objective that targets the currently active authority set rather than most-similar documents, with theorems on coverage conditions and evaluations showing two-stage methods outperform dense retrieval on authority-governed datasets.

Why Multimodal In-Context Learning Lags Behind? Unveiling the Inner Mechanisms and Bottlenecks

cs.CV · 2026-04-15 · unverdicted · novelty 7.0

Multimodal ICL lags text-only ICL in few-shot settings due to weak cross-modal reasoning alignment and unreliable task mapping transfer, with an inference-stage method proposed to strengthen transfer.

Teaching LLMs Human-Like Editing of Inappropriate Argumentation via Reinforcement Learning

cs.CL · 2026-04-14 · unverdicted · novelty 7.0

Reinforcement learning with a multi-part reward teaches LLMs to output independent, meaning-preserving sentence edits that raise argument appropriateness close to full rewriting.

Calibrated Confidence Estimation for Tabular Question Answering

cs.CL · 2026-04-14 · unverdicted · novelty 7.0

Tabular QA LLMs are overconfident, but Multi-Format Agreement using Markdown/HTML/JSON/CSV variants improves AUROC to 0.80 and cuts calibration error by 44-63% at lower cost than sampling.

EgoEsportsQA: An Egocentric Video Benchmark for Perception and Reasoning in Esports

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

EgoEsportsQA is a new egocentric video QA benchmark from esports matches that shows state-of-the-art Video-LLMs reach only 71.58% accuracy and struggle more with tactical reasoning than basic perception.

citing papers explorer

Showing 23 of 23 citing papers after filters.

Investigating More Explainable and Partition-Free Compositionality Estimation for LLMs: A Rule-Generation Perspective cs.AI · 2026-04-30 · unverdicted · none · ref 2
A rule-generation perspective lets LLMs write programs as rules for data mapping and applies complexity theory to estimate their compositionality, tested on string-to-grid tasks.
StoryTR: Narrative-Centric Video Temporal Retrieval with Theory of Mind Reasoning cs.AI · 2026-04-25 · unverdicted · none · ref 29
StoryTR is a new benchmark and agentic data pipeline that adds explicit Theory of Mind reasoning chains to train smaller video retrieval models, yielding a 15% relative IoU gain over larger baselines on narrative content.
ADAPT: Benchmarking Commonsense Planning under Unspecified Affordance Constraints cs.AI · 2026-04-16 · unverdicted · none · ref 25
ADAPT augments planners with affordance reasoning to raise task success in environments with unspecified and time-varying object affordances, and a LoRA-finetuned VLM backend beats GPT-4o on the new DynAfford benchmark.
RADD: Retrieval-Augmented Discrete Diffusion for Multi-Modal Knowledge Graph Completion cs.AI · 2026-04-28 · unverdicted · none · ref 2
RADD decouples retrieval and reranking in multi-modal KGC via a relation-aware KGE retriever and conditional discrete denoiser, reporting state-of-the-art results on three benchmarks.
Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models cs.AI · 2026-04-18 · unverdicted · none · ref 2
Omni-modal LLMs exhibit visual preference that emerges in mid-to-late layers, enabling hallucination detection without task-specific training.
CAMO: An Agentic Framework for Automated Causal Discovery from Micro Behaviors to Macro Emergence in LLM Agent Simulations cs.AI · 2026-04-16 · unverdicted · none · ref 2
CAMO automates causal discovery in LLM agent simulations by converting hypotheses to computable factors, learning minimal causal subgraphs around an emergent target, and using internal counterfactual probing to orient edges.
MARS$^2$: Scaling Multi-Agent Tree Search via Reinforcement Learning for Code Generation cs.AI · 2026-04-16 · unverdicted · none · ref 38
MARS² integrates multi-agent collaboration with tree-structured search in RL to boost code generation by increasing exploratory diversity and using path-level group advantages for credit assignment.
Preventing Safety Drift in Large Language Models via Coupled Weight and Activation Constraints cs.AI · 2026-04-14 · unverdicted · none · ref 2
Coupled constraints on weight updates in a safety subspace and regularization of SAE-identified safety features preserve LLM refusal behaviors during fine-tuning better than weight-only or activation-only methods.
GAM: Hierarchical Graph-based Agentic Memory for LLM Agents cs.AI · 2026-04-14 · unverdicted · none · ref 2
GAM decouples event-level memory encoding from topic-level consolidation in LLM agents using hierarchical graphs to reduce interference and improve long-term coherence and retrieval.
FACT-E: Causality-Inspired Evaluation for Trustworthy Chain-of-Thought Reasoning cs.AI · 2026-04-12 · unverdicted · none · ref 58
FACT-E uses controlled perturbations as an instrumental signal to measure intra-chain faithfulness in CoT reasoning and combines it with answer consistency to select trustworthy trajectories.
MONETA: Multimodal Industry Classification through Geographic Information with Multi Agent Systems cs.AI · 2026-04-09 · unverdicted · none · ref 50
MONETA is the first multimodal benchmark for industry classification using text and geographic sources, with MLLM baselines at 62-74% accuracy and up to 22.8% gains from multi-turn context enrichment and explanations.
Label Effects: Shared Heuristic Reliance in Trust Assessment by Humans and LLM-as-a-Judge cs.AI · 2026-04-07 · unverdicted · none · ref 55
Both humans and LLMs trust content more when labeled human-authored than AI-generated, with LLMs showing denser attention to labels and higher uncertainty under AI labels, mirroring human heuristic patterns.
CodeDistiller: Automatically Generating Code Libraries for Scientific Coding Agents cs.AI · 2025-11-30 · conditional · none · ref 27
CodeDistiller distills 250 materials-science GitHub repositories into vetted code libraries that improve the accuracy and scientific soundness of experiments generated by ASD agents.
WiseMind: a knowledge-guided multi-agent framework for accurate and empathetic psychiatric diagnosis cs.AI · 2025-02-28 · unverdicted · none · ref 2
WiseMind is a dual-agent LLM system with DSM-5 knowledge graph guidance that reaches 85.6% top-1 diagnostic accuracy on simulated and real psychiatric conversations while producing supportive responses.
Assessing Y-Axis Influence: Bias in Multimodal Language Models on Chart-to-Table Translation cs.AI · 2026-04-27 · unverdicted · none · ref 26
Y-axis features such as major tick digit length, number of ticks, value range, and format introduce significant biases in multimodal models during chart-to-table tasks, with y-axis prompting improving performance for some models.
From Coarse to Fine: Self-Adaptive Hierarchical Planning for LLM Agents cs.AI · 2026-04-25 · unverdicted · none · ref 37
AdaPlan-H enables LLM agents to generate self-adaptive hierarchical plans that adjust detail level to task difficulty, improving success rates in multi-step tasks.
Rethinking Math Reasoning Evaluation: A Robust LLM-as-a-Judge Framework Beyond Symbolic Rigidity cs.AI · 2026-04-24 · unverdicted · none · ref 39
An LLM-as-a-judge evaluation framework for math reasoning outperforms symbolic methods by accurately assessing diverse answer representations and formats.
E3-TIR: Enhanced Experience Exploitation for Tool-Integrated Reasoning cs.AI · 2026-04-10 · unverdicted · none · ref 2
E3-TIR integrates expert prefixes, guided branches, and self-exploration via mix policy optimization to deliver 6% better tool-use performance with under 10% of the usual synthetic data and 1.46x ROI.
Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing cs.AI · 2026-04-09 · unverdicted · none · ref 2
SAVeR adds self-auditing of internal beliefs in LLM agents via persona-based candidates and constraint-guided repairs, improving faithfulness on six benchmarks without hurting task performance.
SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking cs.AI · 2026-04-09 · unverdicted · none · ref 43
SAT reduces reasoning tokens by up to 40% across multiple large reasoning models and benchmarks by adaptively pruning steps based on difficulty while maintaining or improving accuracy.
Enhancing Mental Health Counseling Support in Bangladesh using Culturally-Grounded Knowledge cs.AI · 2026-04-16 · unverdicted · none · ref 34
A clinically validated knowledge graph built for Bangladeshi stressors and interventions improves LLM counseling responses over standard RAG in contextual relevance and clinical appropriateness.
Towards Trustworthy Report Generation: A Deep Research Agent with Progressive Confidence Estimation and Calibration cs.AI · 2026-04-07 · unverdicted · none · ref 32
A deep research agent incorporates progressive confidence estimation and calibration to produce trustworthy reports with transparent confidence scores on claims.
Thinking Fast, Thinking Wrong: Intuitiveness Modulates LLM Counterfactual Reasoning in Policy Evaluation cs.AI · 2026-04-12 · unreviewed · ref 2

write newline

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer