super hub Mixed citations

Title resolution pending

Mistral 7B · 2023 · cs.CL · arXiv 2310.06825

Mixed citation behavior. Most common role is background (61%).

631 Pith papers citing it

Background 61% of classified citations

open full Pith review browse 631 citing papers more from Mistral 7B arXiv PDF

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

abstract

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B -- Instruct, that surpasses the Llama 2 13B -- Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 57 method 15 baseline 10 other 6 dataset 2

citation-polarity summary

background 55 use method 15 baseline 10 unclear 8 use dataset 2

claims ledger

abstract We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B -- Instruct, that surpasses the Llama 2 13B -- Chat model both on human and auto

authors

author = Mistral 7B

co-cited works

representative citing papers

CATCH-ME if you RAG: a dataset of Contextually Annotated multi-Turn Counterspeech against Hate and Misinformation Exchanges

cs.CL · 2026-06-18 · unverdicted · novelty 8.0

Presents a new expert-curated dataset of multi-turn counterspeech dialogues in five languages targeting hate against seven groups, with span annotations linking to verified external knowledge for RAG applications.

Entropy-Gated Latent Recursion

cs.LG · 2026-06-15 · unverdicted · novelty 8.0 · 2 refs

EGLR adds a deterministic layer-recursion axis gated by entropy that is complementary to temperature sampling, raising joint oracle accuracy on MATH-500 from 83.4% to 91.6% for a 3B model.

Can LLMs Write Correct TLA+ Specifications? Evaluating Natural-Language-to-TLA+ Generation

cs.AI · 2026-06-04 · accept · novelty 8.0

Across 30 LLMs and 205 TLA+ tasks, syntactic correctness reaches at most 26.6% and semantic correctness 8.6%, with all successes limited to progressive prompting and no advantage from larger models.

Do Models Share Safety Representations? Cross-Model Steering for Safe Visual Generation

cs.CV · 2026-06-03 · unverdicted · novelty 8.0

A safety direction estimated in a source LLM is transported to a target generator through lightweight alignment on benign data alone, matching native safety performance without any target-side unsafe data.

Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth

cs.CL · 2026-05-24 · unverdicted · novelty 8.0

Introduces BonaFide benchmark of 3,066 ground-truth labeled CoTs showing most faithfulness metrics perform near chance with biases and poor scaling to longer chains.

RTI-Bench: A Structured Dataset for Indian Right-to-Information Decision Analysis

cs.CL · 2026-05-16 · accept · novelty 8.0

RTI-Bench is the first publicly released structured dataset of CIC administrative decisions with outcome labels, exemption citations, IRAC reasoning, and timelines, built from 1,218 corpus cases and 298 PDFs, achieving 95.3% label precision on manual review and 57.3% accuracy on a Mistral 7B zero-Sh

Privacy Auditing with Zero (0) Training Run

cs.CR · 2026-05-14 · unverdicted · novelty 8.0

Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.

Crafting Reversible SFT Behaviors in Large Language Models

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

LCDD creates sparse carriers for SFT behaviors that SFT-Eraser can reverse, with ablations showing the sparse structure enables causal control.

DurableUn: Quantization-Induced Recovery Attacks in Machine Unlearning

cs.LG · 2026-05-04 · conditional · novelty 8.0 · 2 refs

INT4 quantization recovers up to 22 times more forgotten training data in unlearned LLMs, and the proposed DURABLEUN-SAF method is the first to maintain forgetting across BF16, INT8, and INT4 precisions.

Backdoor Attacks on Decentralised Post-Training

cs.CR · 2026-03-31 · conditional · novelty 8.0

An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.

CacheTrap: Unveiling a Stealthier Gray-Box Trojan against LLMs

cs.CR · 2025-11-27 · conditional · novelty 8.0

CacheTrap achieves 100% targeted attack success on five open-source LLMs by using an efficient search to locate and flip a single bit in the KV cache as a transient trigger, while preserving normal accuracy without the trigger.

MediQAl: A French Medical Question Answering Dataset for Knowledge and Reasoning Evaluation

cs.CL · 2025-07-28 · accept · novelty 8.0

MediQAl is a new French medical QA benchmark with 32k exam-sourced questions in three formats and cognitive labels, evaluated on 14 LLMs to reveal gaps between factual recall and reasoning performance.

Large Language Diffusion Models

cs.CL · 2025-02-14 · unverdicted · novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

cs.CV · 2024-08-23 · conditional · novelty 8.0

MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

cs.CL · 2024-06-27 · unverdicted · novelty 8.0

LiveBench is a contamination-limited LLM benchmark with auto-scored challenging tasks from recent sources across math, coding, reasoning and more, where top models score below 70%.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

Evaluating Very Long-Term Conversational Memory of LLM Agents

cs.CL · 2024-02-27 · unverdicted · novelty 8.0

Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.

Information Dynamics of Language Communication

cs.CL · 2026-06-29 · unverdicted · novelty 7.0

The paper defines STE and SPID, two information-theoretic measures of semantic flow and decomposition in language exchanges, and applies them to four dialogue datasets.

Anisotropy Decides Cosine vs. Rank Metrics for Text Embeddings

cs.CL · 2026-06-28 · conditional · novelty 7.0

Anisotropy, quantified by dominant-dimension variance fraction, determines the best parameter-free similarity metric for text embeddings, with rank-based metrics gaining ~20% relative where cosine is weakest.

MultiHashFormer: Hash-based Generative Language Models

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

MultiHashFormer enables hash-based autoregression in LMs by encoding tokens as multi-hash signatures, outperforming standard Transformers at 100M-3B scales while keeping parameter count constant for multilingual expansion.

Compositional Skill Routing for LLM Agents: Decompose, Retrieve, and Compose

cs.CL · 2026-06-16 · unverdicted · novelty 7.0

SkillWeaver formalizes compositional skill routing for LLM agents and introduces SAD, which raises step-level decomposition accuracy from 51% to 67.7% on a new 300-query benchmark over 2209 real MCP skills.

Models Take Notes at Prefill: KV Cache Can Be Editable and Composable

cs.LG · 2026-06-14 · unverdicted · novelty 7.0

KV caches function as notebooks of prefilled conclusions, enabling field-level edits that recover decisions (especially with CoT) and position-portable skill composition with near-identical outputs at O(L) cost.

Polar: A Benchmark for Evaluating Political Bias in LLMs

cs.CL · 2026-06-11 · unverdicted · novelty 7.0

Polar is a new cross-context benchmark showing LLM political bias measurements are not fixed but vary with country, issue, model, and language.

Small LLMs for Biomedical Claim Verification: Cost-Effective Fine-Tuning, Structural Dataset Shortcuts, and Cross-Domain Generalization

cs.CL · 2026-06-11 · unverdicted · novelty 7.0

Fine-tuned Mistral-7B via QLoRA achieves up to 12% higher F1 than GPT-4o on biomedical claim verification with 1008 examples, identifies a structural shortcut in SciFact, and shows robust cross-domain transfer from sound data.

citing papers explorer

Showing 50 of 631 citing papers.

Layer-wise Token Compression for Efficient Document Reranking cs.IR · 2026-05-20 · unverdicted · none · ref 20 · 2 links · internal anchor
Layer-wise Token Compression applies adaptive token pooling at middle transformer layers for cross-encoder rerankers, preserving MS MARCO ranking quality while raising QPS up to 25% on passages and 116% on documents, with added gains on listwise LLM rerankers and a regularizer effect for long inputs
What Do Biomedical NER and Entity Linking Benchmarks Measure? A Corpus-Centric Diagnostic Framework cs.CL · 2026-05-19 · accept · none · ref 87 · internal anchor
A corpus-centric framework diagnoses scale, structure, overlap, metadata, and terminology properties across nine biomedical NER/EL corpora, showing substantial differences that common statistics fail to capture.
Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback cs.CR · 2026-05-17 · unverdicted · none · ref 22 · internal anchor
Presents TRUST-Bench benchmark for hidden-trigger tool compromises in LLM agents and VISTA-Guard framework for trajectory-aware risk scoring of final actions under untrusted feedback.
Conflict-Free Replicated Data Types for Neural Network Model Merging: A Two-Layer Architecture Enabling CRDT-Compliant Model Merging Across 26 Strategies cs.DC · 2026-05-16 · unverdicted · none · ref 13 · internal anchor
A two-layer CRDT architecture wraps any of 26 neural network merge strategies to deliver strong eventual consistency in distributed model merging.
GQA-{\mu}P: The maximal parameterization update for grouped query attention cs.LG · 2026-05-14 · unverdicted · none · ref 9 · internal anchor
Derives μP scalings for GQA via promoted spectral-norm definition of feature learning and a modified norm preserving scaling laws for non-full-rank matrices, with experiments showing learning-rate transfer.
Widening the Gap: Exploiting LLM Quantization via Outlier Injection cs.LG · 2026-05-14 · conditional · none · ref 33 · internal anchor
The paper introduces an outlier-injection attack that induces targeted weight collapse in LLMs under advanced quantization schemes including AWQ, GPTQ, and GGUF I-quants.
GHGbench: A Unified Multi-Entity, Multi-Task Benchmark for Carbon Emission Prediction cs.LG · 2026-05-13 · unverdicted · none · ref 24 · 2 links · internal anchor
GHGbench supplies a harmonized dataset and multi-task benchmark for company and building carbon emission prediction, with baselines showing large OOD gaps and benefits from multimodal embeddings.
Inducing Artificial Uncertainty in Language Models cs.CL · 2026-05-13 · unverdicted · none · ref 14 · internal anchor
Inducing artificial uncertainty on trivial tasks allows training probes that achieve higher calibration on hard data than standard approaches while retaining performance on easy data.
TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment cs.CL · 2026-05-13 · unverdicted · none · ref 63 · internal anchor
TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.
Uncovering Symmetry Transfer in Large Language Models via Layer-Peeled Optimization math.OC · 2026-05-12 · conditional · none · ref 42 · internal anchor
Symmetries in next-token prediction targets induce corresponding geometric symmetries such as circulant matrices and equiangular tight frames in the optimal weights and embeddings of a layer-peeled LLM surrogate model.
Towards Automated Air Traffic Safety Assessment Around Non-Towered Airports Using Large Language Models cs.AI · 2026-05-12 · unverdicted · none · ref 28 · internal anchor
Large language models achieve macro F1 scores above 0.85 on binary nominal-versus-danger classification from CTAF radio transcripts and METAR weather data using a new synthetic dataset with a 12-category hazard taxonomy.
BadSKP: Backdoor Attacks on Knowledge Graph-Enhanced LLMs with Soft Prompts cs.AI · 2026-05-12 · conditional · none · ref 43 · internal anchor
BadSKP poisons graph node embeddings to steer soft prompts in KG-enhanced LLMs, achieving high attack success rates where text-channel backdoors fail due to semantic anchoring.
SoK: Unlearnability and Unlearning for Model Dememorization cs.LG · 2026-05-12 · conditional · none · ref 89 · internal anchor
The first integrated taxonomy, empirical study of interplay and shallow dememorization, plus a theoretical guarantee on dememorization depth for certified unlearning.
Deep Minds and Shallow Probes cs.LG · 2026-05-12 · unverdicted · none · ref 10 · internal anchor
Symmetry under affine reparameterizations of hidden coordinates selects a unique hierarchy of shallow coordinate-stable probes and a probe-visible quotient for cross-model transfer.
SLIM: Sparse Latent Steering for Interpretable and Property-Directed LLM-Based Molecular Editing cs.LG · 2026-05-11 · unverdicted · none · ref 6 · internal anchor
SLIM decomposes LLM hidden states via sparse autoencoders with learnable gates to enable precise, interpretable steering of molecular properties, yielding up to 42.4-point gains on the MolEditRL benchmark.
Cross-Family Universality of Behavioral Axes via Anchor-Projected Representations cs.AI · 2026-05-11 · unverdicted · none · ref 21 · internal anchor
Behavioral directions from one LLM family transfer to others via projection into a shared anchor coordinate space, yielding 0.83 ten-way detection accuracy and steering effects up to 0.46% on held-out models.
EvoPref: Multi-Objective Evolutionary Optimization Discovers Diverse LLM Alignments Beyond Gradient Descent cs.NE · 2026-05-10 · unverdicted · none · ref 32 · internal anchor
EvoPref applies NSGA-II evolutionary optimization with archive-based diversity to populations of LoRA adapters, yielding 18% higher preference coverage and 47% lower collapse than gradient descent baselines while matching alignment quality.
Entropy-informed Decoding: Adaptive Information-Driven Branching cs.LG · 2026-05-10 · unverdicted · none · ref 2 · internal anchor
EDEN adaptively sets branching factor proportional to next-token entropy, achieving better accuracy per expansion than fixed beam search while providing a proof that monotone entropy-based branching outperforms any fixed budget allocation.
Positional LSH: Binary Block Matrix Approximation for Attention with Linear Biases cs.LG · 2026-05-10 · unverdicted · none · ref 49 · internal anchor
ALiBi bias is the expectation of positional LSH-induced block masks, yielding spectral and max-norm approximation bounds that reduce long-context biased attention to randomized short-context unbiased attention.
Theoretical Limits of Language Model Alignment cs.LG · 2026-05-08 · unverdicted · none · ref 29 · internal anchor
The maximum reward gain under KL-regularized LM alignment is a Jeffreys divergence term, estimable as covariance from base samples, with best-of-N approaching the theoretical limit.
VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing cs.CL · 2026-05-07 · unverdicted · none · ref 37 · internal anchor
VITA-QinYu is the first expressive end-to-end spoken language model supporting role-playing and singing alongside conversation, trained on 15.8K hours of data and outperforming prior models on expressiveness and conversational benchmarks.
The First Token Knows: Single-Decode Confidence for Hallucination Detection cs.CL · 2026-05-06 · unverdicted · none · ref 10 · internal anchor
First-token normalized entropy (phi_first) from one greedy decode reaches mean AUROC 0.820 for hallucination detection, matching or exceeding semantic self-consistency (0.793) and surface self-consistency (0.791) across three 7-8B models and two benchmarks.
Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs cs.LG · 2026-05-06 · unverdicted · none · ref 21 · internal anchor
Fine-tuned 7B LLMs generating unified diffs for neural architecture refinement achieve 66-75% valid rates and 64-66% mean first-epoch accuracy, outperforming full-generation baselines by large margins while cutting output length by 75-85%.
The Right Answer, the Wrong Direction: Why Transformers Fail at Counting and How to Fix It cs.LG · 2026-05-05 · accept · none · ref 6 · 2 links · internal anchor
Transformers store count information internally but cannot read it out as digits due to near-orthogonal alignment with output-head rows; updating digit rows or applying LoRA to attention layers improves constrained and unconstrained counting respectively.
How Language Models Process Negation cs.CL · 2026-05-04 · unverdicted · none · ref 21 · 2 links · internal anchor
LLMs process negation using both attention-based suppression and constructive representation mechanisms (construction dominant), with late-layer attention shortcuts explaining poor accuracy on negation tasks.
Embedding-based In-Context Prompt Training for Enhancing LLMs as Text Encoders cs.CL · 2026-05-02 · unverdicted · none · ref 23 · internal anchor
EPIC trains LLMs to treat continuous embeddings as in-context prompts, yielding state-of-the-art text embedding performance on MTEB with or without prompts at inference and lower compute.
A Multi-View Media Profiling Suite: Resources, Evaluation, and Analysis cs.CL · 2026-05-02 · unverdicted · none · ref 30 · internal anchor
Presents MBFC-2025 dataset and multi-view embeddings with fusion methods for media bias and factuality, reporting SOTA results on ACL-2020 and new benchmarks on MBFC-2025.
ReLay: Personalized LLM-Generated Plain-Language Summaries for Better Understanding, but at What Cost? cs.CL · 2026-05-01 · unverdicted · none · ref 34 · internal anchor
Personalized LLM-generated plain language summaries improve lay readers' comprehension and quality ratings but increase risks of reinforcing biases and introducing hallucinations compared to static expert summaries.
Attention Is Where You Attack cs.CR · 2026-04-30 · unverdicted · none · ref 10 · internal anchor
ARA jailbreaks safety-aligned LLMs like LLaMA-3 and Mistral by redirecting attention in safety-heavy heads with as few as 5 tokens, achieving 30-36% attack success while ablating the same heads barely affects refusals.
One Pass, Any Order: Position-Invariant Listwise Reranking for LLM-Based Recommendation cs.IR · 2026-04-30 · conditional · none · ref 15 · internal anchor
InvariRank achieves permutation-invariant listwise reranking for LLM-based recommendations via a structured attention mask that blocks cross-candidate interactions and shared positional framing under RoPE, enabling stable rankings in one forward pass.
XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation cs.AI · 2026-04-27 · unverdicted · none · ref 13 · internal anchor
XGRAG uses graph perturbations to quantify component contributions in GraphRAG and achieves 14.81% better explanation quality than text-based baselines on QA datasets, with correlations to graph centrality.
Coverage-Based Calibration for Post-Training Quantization via Weighted Set Cover over Outlier Channels cs.LG · 2026-04-27 · conditional · none · ref 11 · internal anchor
COVERCAL selects PTQ calibration samples via weighted set cover over outlier channels, with a stylized clipping model showing missed coverage upper-bounds surrogate loss, yielding gains over random and other baselines on LLaMA and Mistral models.
Can an MLP Absorb Its Own Skip Connection? cs.LG · 2026-04-26 · accept · none · ref 5 · internal anchor
Skip-connected MLPs and residual-free MLPs of equal width represent generically disjoint function classes for common activations, with explicit impossibility proofs and a non-generic absorption condition for ReLU and GELU.
Supernodes and Halos: Loss-Critical Hubs in LLM Feed-Forward Layers cs.LG · 2026-04-26 · unverdicted · none · ref 4 · internal anchor
In LLM feed-forward networks, the top 1% of channels per layer carry a median 58.7% of loss sensitivity, forming supernodes whose protection enables effective 50% sparsity pruning with much lower perplexity than baselines.
Fine-tuning vs. In-context Learning in Large Language Models: A Formal Language Learning Perspective cs.CL · 2026-04-25 · conditional · none · ref 30 · 2 links · internal anchor
A controlled formal language task reveals fine-tuning outperforms in-context learning on in-distribution generalization but equals it on out-of-distribution, with ICL showing greater sensitivity to model size and tokenization.
Evaluating Temporal Consistency in Multi-Turn Language Models cs.CL · 2026-04-24 · unverdicted · none · ref 20 · internal anchor
Language models frequently violate temporal scope stability in multi-turn dialogues by drifting toward present-day assumptions even when they possess the correct facts.
Provably Secure Steganography Based on List Decoding cs.CR · 2026-04-23 · conditional · none · ref 9 · internal anchor
List decoding enables a provably secure steganography scheme with higher embedding capacity for LLMs via candidate sets and suffix matching.
Serialisation Strategy Matters: How FHIR Data Format Affects LLM Medication Reconciliation cs.CL · 2026-04-22 · conditional · none · ref 23 · internal anchor
Clinical narrative format beats raw JSON for LLMs up to 8B parameters on medication reconciliation but raw JSON wins at 70B scale, with omissions as the main error type.
Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models cs.CR · 2026-04-22 · unverdicted · none · ref 9 · internal anchor
A novel function hijacking attack achieves 70-100% success rates in forcing specific function calls across five LLMs on the BFCL benchmark and is robust to context semantics.
HaS: Accelerating RAG through Homology-Aware Speculative Retrieval cs.IR · 2026-04-22 · unverdicted · none · ref 16 · internal anchor
HaS accelerates RAG retrieval via homology-aware speculative retrieval and homologous query re-identification validation, cutting latency 24-37% with 1-2% accuracy drop on tested datasets.
SafeAnchor: Preventing Cumulative Safety Erosion in Continual Domain Adaptation of Large Language Models cs.LG · 2026-04-20 · unverdicted · none · ref 25 · internal anchor
SafeAnchor preserves 93.2% of original safety alignment across sequential domain adaptations by anchoring low-rank safety subspaces and constraining orthogonal updates, while matching unconstrained fine-tuning performance within 1.5 points.
How Tokenization Limits Phonological Knowledge Representation in Language Models and How to Improve Them cs.CL · 2026-04-18 · unverdicted · none · ref 28 · internal anchor
Subword tokenization impairs phonological knowledge encoding in LMs, but an IPA-based fine-tuning method restores it with minimal impact on other capabilities.
On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability cs.IR · 2026-04-17 · unverdicted · none · ref 30 · internal anchor
LLM-based dense retrievers generalize better when instruction-tuned but pay a specialization tax when optimized for reasoning; they resist typos and corpus poisoning better than encoder-only baselines yet remain vulnerable to semantic perturbations, with larger models and certain embedding geometry,
RAGognizer: Hallucination-Aware Fine-Tuning via Detection Head Integration cs.CL · 2026-04-17 · unverdicted · none · ref 32 · internal anchor
RAGognizer adds a detection head to LLMs for joint training on generation and token-level hallucination detection, yielding SOTA detection and fewer hallucinations in RAG while preserving output quality.
GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows cs.CL · 2026-04-17 · conditional · none · ref 58 · internal anchor
GTA-2 benchmark shows frontier models achieve below 50% on atomic tool tasks and only 14.39% success on realistic long-horizon workflows, with execution harnesses like Manus providing substantial gains.
Conjunctive Prompt Attacks in Multi-Agent LLM Systems cs.MA · 2026-04-17 · unverdicted · none · ref 13 · internal anchor
Conjunctive prompt attacks split adversarial elements across agents and routing paths in multi-agent LLM systems, evading isolated defenses and succeeding through topology-aware optimization.
Response-Aware User Memory Selection for LLM Personalization cs.AI · 2026-04-15 · unverdicted · none · ref 4 · internal anchor
RUMS selects LLM user memory via mutual information with model outputs to reduce response uncertainty, outperforming similarity-based methods in human alignment and response quality with up to 95% lower cost.
TokenFormer: Unify the Multi-Field and Sequential Recommendation Worlds cs.IR · 2026-04-15 · unverdicted · none · ref 23 · internal anchor
TokenFormer unifies multi-field and sequential recommendation modeling via bottom-full-top-sliding attention and non-linear interaction representations to avoid sequential collapse and deliver state-of-the-art performance.
Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size cs.CL · 2026-04-14 · unverdicted · none · ref 9 · internal anchor
Contextual entrainment decreases for semantic contexts but increases for non-semantic ones as LLMs scale, following power-law trends with 4x better resistance to misinformation but 2x more copying of arbitrary tokens.
A Sanity Check on Composed Image Retrieval cs.CV · 2026-04-14 · unverdicted · none · ref 13 · internal anchor
The paper creates FISD, a controlled benchmark for composed image retrieval that removes query ambiguity via generative models, and proposes a multi-round agentic evaluation to assess models in interactive settings.

Title resolution pending

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer