Mixed citations

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2511–2522 (Association for Computational Linguistics, Singapore, 2023)

Traian Rebedea, Razvan Dinu, Makesh Narsimhan Sreedhar, Christopher Parisien, Jonathan Cohen · 2023 · DOI 10.18653/v1/2023.emnlp-

Mixed citation behavior. Most common role is background (60%).

30 Pith papers citing it

Background: 60% of classified citations


citation-role summary

background: 3 · method: 2

citation-polarity summary

background: 3 · use method: 2

representative citing papers

CrossPool: Efficient Multi-LLM Serving for Cold MoE Models through KV-Cache and Weight Disaggregation

cs.DC · 2026-06-23 · unverdicted · novelty 7.0 · 2 refs

CrossPool separates weights and KV-cache into distinct GPU pools plus a planner, virtualizer, and layer-wise scheduler to cut P99 time-between-tokens by up to 10.4x versus prior kvcached multi-LLM systems.

QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving

cs.AI · 2026-06-04 · unverdicted · novelty 7.0

QCFuse achieves full-prefill quality in RAG with 1.7x average prefill speedup over full prefill and 1.5x over ProphetKV via compressed query-aware cache fusion.

Hijacking Agent Memory: Stealthy Trojan Attacks Through Conversational Interaction

cs.CR · 2026-05-28 · unverdicted · novelty 7.0

MemPoison enables stealthy memory poisoning in LLM agents via dialogue by using semantic relational bridges, entity masquerading, and joint embedding optimization to bypass selective extraction and rewriting, achieving up to 0.95 attack success rate.

SilentRetrieval: Hijacking Retrieval-Augmented Generation via Semantically-Preserving Adversarial Data Poisoning

cs.CR · 2026-05-27 · unverdicted · novelty 7.0

SilentRetrieval is a data poisoning attack achieving 84.6% HR@10 and 57.5% ASR-LLM on Natural Questions via coordinated beam search and trigger fusion while preserving document fluency.

Layer-wise Token Compression for Efficient Document Reranking

cs.IR · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

Layer-wise Token Compression applies adaptive token pooling at middle transformer layers of cross-encoder rerankers, preserving MS MARCO ranking quality while raising QPS by up to 25% on passages and 116% on documents, with further gains on listwise LLM rerankers and a regularizing effect on long inputs.

Code Generation by Differential Test Time Scaling

cs.SE · 2026-05-19 · unverdicted · novelty 7.0

DiffCodeGen clusters code candidates by behavioral similarity on fuzzing-synthesized inputs and selects the medoid of the largest cluster, matching or exceeding prior test-time scaling methods at far lower token and time cost.

IPQA: A Benchmark for Core Intent Identification in Personalized Question Answering

cs.CL · 2025-10-27 · conditional · novelty 7.0

IPQA is a new benchmark that measures how well models identify core user intents from history in personalized question answering, finding that performance is poor and declines with greater question complexity.

From tools to thieves: Measuring and understanding public perceptions of AI through crowdsourced metaphors

cs.CY · 2025-01-29 · unverdicted · novelty 7.0

Crowdsourced metaphors show rising anthropomorphism and warmth toward AI that predict trust and adoption, with notable demographic differences.

ActPlane: Programmable OS-Level Policy Enforcement for Agent Harnesses

cs.OS · 2026-06-23 · unverdicted · novelty 6.0 · 2 refs

ActPlane introduces an OS-kernel policy engine using an information-flow control DSL and eBPF to enforce agent harness policies, achieving better compliance on indirect paths with 1.9-8.4% overhead.

Ontology-constrained multi-LLM scoring of hypothesis support in the predictive processing literature

q-bio.NC · 2026-05-23 · unverdicted · novelty 6.0

A multi-LLM council scores predictive processing papers on an expert ontology, maps results in 3D hypothesis space, and introduces a dispersion metric showing greater spread in global versus local oddball paradigms.

Understanding Wacky Weights: A Dissection of SPLADE's Learned Term Importance

cs.IR · 2026-05-19 · conditional · novelty 6.0

SPLADE models produce wacky expansion terms whose prevalence rises with larger vocabularies and falls with stricter sparsity; these terms primarily aid in-domain retrieval rather than out-of-domain generalization.

Designing for Being-With: Presence Without Personhood in Conversational Human-AI Interaction

cs.HC · 2026-05-16 · unverdicted · novelty 6.0

The paper introduces bounded relational presence as a designable, tunable, and withdrawable quality of conversational AI that supports engagement while avoiding claims of personhood or human equivalence.

Structural Generalization on SLOG without Hand-Written Rules

cs.CL · 2026-04-28 · unverdicted · novelty 6.0 · 2 refs

A neural cellular automaton learns compositional rules from data alone to achieve structural generalization on the SLOG semantic parsing benchmark, reaching 67.3% accuracy and fully succeeding on 11 of 17 categories.

Learning Evidence of Depression Symptoms via Prompt Induction

cs.CL · 2026-04-27 · unverdicted · novelty 6.0

Symptom Induction compresses labeled data into interpretable guidelines that improve LLM classification of depression symptoms in text, outperforming zero-shot, in-context, and fine-tuning approaches with gains on rare symptoms and cross-disease generalization.

GRASP: Grounded CoT Reasoning with Dual-Stage Optimization for Multimodal Sarcasm Target Identification

cs.CL · 2026-04-10 · unverdicted · novelty 6.0

GRASP improves multimodal sarcasm target identification by anchoring visual regions in grounded chain-of-thought reasoning and using dual-stage optimization on a new balanced dataset.

Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework

cs.CL · 2026-04-02 · unverdicted · novelty 6.0

A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art methods on standard tasks.

MirrorBench: A Benchmark to Evaluate Conversational User-Proxy Agents for Human-Likeness

cs.AI · 2026-01-13 · unverdicted · novelty 6.0

MirrorBench defines a reproducible benchmark combining lexical metrics (MATTR, Yule's K, HD-D) and LLM-judge metrics with calibration controls to measure human-likeness of user-proxy agents across four datasets.

MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications

cs.AI · 2025-11-17 · unverdicted · novelty 6.0

MM-Telco creates multimodal benchmarks for telecom and demonstrates that fine-tuned LLMs and VLMs achieve significant performance gains on domain-specific tasks.

We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback

cs.CV · 2025-04-24 · unverdicted · novelty 6.0

NeuS-E is a post-generation refinement method that uses neuro-symbolic analysis of a formal video representation to detect and correct semantic and temporal inconsistencies in text-to-video outputs, improving prompt alignment by nearly 40%.

CoCoMUT: A Tool for Code-Context Mining and Automated Dataset Generation

cs.SE · 2026-06-30 · unverdicted · novelty 5.0

CoCoMUT is a reusable pipeline that discovers project structure, constructs call graphs, extracts source, reconciles bytecode to source, and emits versioned JSON datasets of method contexts, demonstrated on 20 Java repositories with 97.8% reconciliation and 99% audit accuracy.

Resonant Minds: Closed-Loop Social Avatars with Theory of Mind

cs.CV · 2026-06-04 · unverdicted · novelty 5.0

A dual-agent closed-loop system integrates Theory of Mind reasoning with multimodal video generation to create social avatars that outperform full-information baselines on dialogue quality under information asymmetry.

EgoCoT-Bench: Benchmarking Grounded and Verifiable Operation-Centric Chain of Thought Reasoning for MLLMs

cs.CV · 2026-05-19 · unverdicted · novelty 5.0

EgoCoT-Bench provides 3,172 verifiable QA pairs across perception, anticipation, and reasoning tasks on egocentric videos, revealing that many MLLMs give answer-correct but evidence-inconsistent explanations.

TextClusterLab: An Integrated Framework for Reliable Text Clustering Studies

cs.IR · 2026-05-17 · unverdicted · novelty 5.0

TextClusterLab introduces an LLM-driven generator for synthetic text clustering datasets with tunable attributes and a suitability benchmark for evaluation.

Not All RAGs Are Created Equal: A Component-Wise Empirical Study for Software Engineering Tasks

cs.SE · 2026-05-14 · unverdicted · novelty 5.0

Retriever-side choices, particularly the retrieval algorithm, exert more influence on RAG performance than generator selection across code generation, summarization, and repair tasks.

citing papers explorer

Showing 27 of 27 citing papers after filters.

CrossPool: Efficient Multi-LLM Serving for Cold MoE Models through KV-Cache and Weight Disaggregation cs.DC · 2026-06-23 · unverdicted · none · ref 1 · 2 links
CrossPool separates weights and KV-cache into distinct GPU pools plus a planner, virtualizer, and layer-wise scheduler to cut P99 time-between-tokens by up to 10.4x versus prior kvcached multi-LLM systems.
QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving cs.AI · 2026-06-04 · unverdicted · none · ref 3
QCFuse achieves full-prefill quality in RAG with 1.7x average prefill speedup over full prefill and 1.5x over ProphetKV via compressed query-aware cache fusion.
Hijacking Agent Memory: Stealthy Trojan Attacks Through Conversational Interaction cs.CR · 2026-05-28 · unverdicted · none · ref 56
MemPoison enables stealthy memory poisoning in LLM agents via dialogue by using semantic relational bridges, entity masquerading, and joint embedding optimization to bypass selective extraction and rewriting, achieving up to 0.95 attack success rate.
SilentRetrieval: Hijacking Retrieval-Augmented Generation via Semantically-Preserving Adversarial Data Poisoning cs.CR · 2026-05-27 · unverdicted · none · ref 37
SilentRetrieval is a data poisoning attack achieving 84.6% HR@10 and 57.5% ASR-LLM on Natural Questions via coordinated beam search and trigger fusion while preserving document fluency.
Layer-wise Token Compression for Efficient Document Reranking cs.IR · 2026-05-20 · unverdicted · none · ref 21 · 2 links
Layer-wise Token Compression applies adaptive token pooling at middle transformer layers of cross-encoder rerankers, preserving MS MARCO ranking quality while raising QPS by up to 25% on passages and 116% on documents, with further gains on listwise LLM rerankers and a regularizing effect on long inputs.
Code Generation by Differential Test Time Scaling cs.SE · 2026-05-19 · unverdicted · none · ref 90
DiffCodeGen clusters code candidates by behavioral similarity on fuzzing-synthesized inputs and selects the medoid of the largest cluster, matching or exceeding prior test-time scaling methods at far lower token and time cost.
From tools to thieves: Measuring and understanding public perceptions of AI through crowdsourced metaphors cs.CY · 2025-01-29 · unverdicted · none · ref 1
Crowdsourced metaphors show rising anthropomorphism and warmth toward AI that predict trust and adoption, with notable demographic differences.
ActPlane: Programmable OS-Level Policy Enforcement for Agent Harnesses cs.OS · 2026-06-23 · unverdicted · none · ref 43 · 2 links
ActPlane introduces an OS-kernel policy engine using an information-flow control DSL and eBPF to enforce agent harness policies, achieving better compliance on indirect paths with 1.9-8.4% overhead.
Ontology-constrained multi-LLM scoring of hypothesis support in the predictive processing literature q-bio.NC · 2026-05-23 · unverdicted · none · ref 82
A multi-LLM council scores predictive processing papers on an expert ontology, maps results in 3D hypothesis space, and introduces a dispersion metric showing greater spread in global versus local oddball paradigms.
Designing for Being-With: Presence Without Personhood in Conversational Human-AI Interaction cs.HC · 2026-05-16 · unverdicted · none · ref 2
The paper introduces bounded relational presence as a designable, tunable, and withdrawable quality of conversational AI that supports engagement while avoiding claims of personhood or human equivalence.
Structural Generalization on SLOG without Hand-Written Rules cs.CL · 2026-04-28 · unverdicted · none · ref 13 · 2 links
A neural cellular automaton learns compositional rules from data alone to achieve structural generalization on the SLOG semantic parsing benchmark, reaching 67.3% accuracy and fully succeeding on 11 of 17 categories.
Learning Evidence of Depression Symptoms via Prompt Induction cs.CL · 2026-04-27 · unverdicted · none · ref 13
Symptom Induction compresses labeled data into interpretable guidelines that improve LLM classification of depression symptoms in text, outperforming zero-shot, in-context, and fine-tuning approaches with gains on rare symptoms and cross-disease generalization.
GRASP: Grounded CoT Reasoning with Dual-Stage Optimization for Multimodal Sarcasm Target Identification cs.CL · 2026-04-10 · unverdicted · none · ref 19
GRASP improves multimodal sarcasm target identification by anchoring visual regions in grounded chain-of-thought reasoning and using dual-stage optimization on a new balanced dataset.
Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework cs.CL · 2026-04-02 · unverdicted · none · ref 45
A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art methods on standard tasks.
MirrorBench: A Benchmark to Evaluate Conversational User-Proxy Agents for Human-Likeness cs.AI · 2026-01-13 · unverdicted · none · ref 24
MirrorBench defines a reproducible benchmark combining lexical metrics (MATTR, Yule's K, HD-D) and LLM-judge metrics with calibration controls to measure human-likeness of user-proxy agents across four datasets.
MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications cs.AI · 2025-11-17 · unverdicted · none · ref 22
MM-Telco creates multimodal benchmarks for telecom and demonstrates that fine-tuned LLMs and VLMs achieve significant performance gains on domain-specific tasks.
We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback cs.CV · 2025-04-24 · unverdicted · none · ref 11
NeuS-E is a post-generation refinement method that uses neuro-symbolic analysis of a formal video representation to detect and correct semantic and temporal inconsistencies in text-to-video outputs, improving prompt alignment by nearly 40%.
CoCoMUT: A Tool for Code-Context Mining and Automated Dataset Generation cs.SE · 2026-06-30 · unverdicted · none · ref 21
CoCoMUT is a reusable pipeline that discovers project structure, constructs call graphs, extracts source, reconciles bytecode to source, and emits versioned JSON datasets of method contexts, demonstrated on 20 Java repositories with 97.8% reconciliation and 99% audit accuracy.
Resonant Minds: Closed-Loop Social Avatars with Theory of Mind cs.CV · 2026-06-04 · unverdicted · none · ref 24
A dual-agent closed-loop system integrates Theory of Mind reasoning with multimodal video generation to create social avatars that outperform full-information baselines on dialogue quality under information asymmetry.
EgoCoT-Bench: Benchmarking Grounded and Verifiable Operation-Centric Chain of Thought Reasoning for MLLMs cs.CV · 2026-05-19 · unverdicted · none · ref 52
EgoCoT-Bench provides 3,172 verifiable QA pairs across perception, anticipation, and reasoning tasks on egocentric videos, revealing that many MLLMs give answer-correct but evidence-inconsistent explanations.
TextClusterLab: An Integrated Framework for Reliable Text Clustering Studies cs.IR · 2026-05-17 · unverdicted · none · ref 42
TextClusterLab introduces an LLM-driven generator for synthetic text clustering datasets with tunable attributes and a suitability benchmark for evaluation.
Not All RAGs Are Created Equal: A Component-Wise Empirical Study for Software Engineering Tasks cs.SE · 2026-05-14 · unverdicted · none · ref 42
Retriever-side choices, particularly the retrieval algorithm, exert more influence on RAG performance than generator selection across code generation, summarization, and repair tasks.
ChipLingo: A Systematic Training Framework for Large Language Models in EDA cs.LG · 2026-04-30 · unverdicted · none · ref 23
ChipLingo trains LLMs on EDA data via corpus construction, domain-adaptive pretraining, and RAG scenario alignment, reaching 59.7% accuracy with an 8B model and 70.02% with a 32B model on a new internal EDA benchmark.
Evaluating the Practical Effectiveness of LLM-Driven Index Tuning with Microsoft Database Tuning Advisor cs.DB · 2026-03-10 · unverdicted · none · ref 50
LLMs can outperform DTA on index recommendations for some workloads but remain less reliable and face practical adoption challenges.
Defending against Backdoor Attacks via Module Switching cs.CR · 2025-04-08 · unverdicted · none · ref 16
Module-switching defense disrupts backdoors more effectively than weight averaging with fewer models and remains robust even when some models share the same backdoors.
Conversational Query Engine for Mixed-Modality Heterogeneous Enterprise Data Sources cs.IR · 2026-06-15 · unverdicted · none · ref 18
COGNI is a production conversational BI system with indexing, routing, retrieval, and caching layers that reports 88-94% accuracy metrics on internal enterprise benchmarks for mixed structured and unstructured data.
From Binary Groundedness to Support Relations: Towards a Reader-Centred Taxonomy for Comprehension of AI Output cs.HC · 2026-04-09 · unverdicted · none · ref 24
Binary groundedness judgments in AI evaluations should be replaced by a reader-centered taxonomy of support relations that distinguishes syntactic and interpretive moves between generated statements and source documents.
