archive

Every paper Pith has read.

8,185 papers in cs.AI · page 1

cs.CV 2026-05-14 reviewed

Memory bank preserves characters across 48-shot gaps in video
EntityBench: Towards Entity-Consistent Long-Range Multi-Shot Video Generation

Meng Wei +3
cs.CV 2026-05-14 reviewed

One token unifies agentic and latent visual reasoning
ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

Pheng-Ann Heng +3
cs.LG 2026-05-14 reviewed

FutureSim shows top AI agents predict events at 25% accuracy
FutureSim: Replaying World Events to Evaluate Adaptive Agents

Ameya Prabhu +7
cs.CV 2026-05-14 reviewed

New index catches 3D geometry errors in video generators
Quantitative Video World Model Evaluation for Geometric-Consistency

Jiaxin Wu +4
cs.LG 2026-05-14 reviewed

Shodh-MoE uses sparse expert routing to curb negative transfer in multi-physics models
Eradicating Negative Transfer in Multi-Physics Foundation Models via Sparse Mixture-of-Experts Routing

Arastu Sharma +1
cs.AI 2026-05-14 reviewed

Pairwise votes raise LLM code Elo by 405 points
OpenDeepThink: Parallel Reasoning via Bradley–Terry Aggregation

Huanzhi Mao +5
cs.CL 2026-05-14 reviewed

EHR tables sharpen timing in text-based clinical timelines
Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment

Jeremy C. Weiss +3
cs.CL 2026-05-14 reviewed

Memory model lets LLMs add knowledge without retraining
MeMo: Memory as a Model

Alfred Wei Lun Leong +8
cs.RO 2026-05-14 reviewed

Single model tops VLM and world benchmarks while ranking near first on robot actions
Pelican-Unified 1.0: A Unified Embodied Intelligence Model for Understanding, Reasoning, Imagination and Action

Che Liu +26
cs.AI 2026-05-14 reviewed

APWA scales agent workflows by parallelizing non-communicating subproblems
APWA: A Distributed Architecture for Parallelizable Agentic Workflows

Alina Oprea +4
cs.HC 2026-05-14 reviewed

Students treat AI as a quick fix but want a long-term cultural support companion
Understanding How International Students in the U.S. Are Using Conversational AI to Support Cross-Cultural Adaptation

Anisa Callis +5
cs.AI 2026-05-14 reviewed

Citations miss key context in agent graph answers
Why Neighborhoods Matter: Traversal Context and Provenance in Agentic GraphRAG

Maximilian von Zastrow +2
stat.ML 2026-05-14 reviewed

Optimal logging policies minimize OPE error via reward-coverage balance
Logging Policy Design for Off-Policy Evaluation

Connor Douglas +2
cs.AI 2026-05-14 reviewed

Dual-Dimensional Consistency (DDC) balances budget and quality in adaptive inference-time scaling
Dual-Dimensional Consistency: Balancing Budget and Quality in Adaptive Inference-Time Scaling

Bo Li +5
cs.NE 2026-05-14 reviewed

Taxonomy sorts SNN training rules and adds shared testbed
NeuroTrain: Surveying Local Learning Rules for Spiking Neural Networks with an Open Benchmarking Framework

Alessandro Savino +4
cs.SD 2026-05-14 reviewed

SpeakerLLM turns speaker verification into natural-language reasoning
SpeakerLLM: A Speaker-Specialized Audio-LLM for Speaker Understanding and Verification Reasoning

Ha-Jin Yu +4
cs.OS 2026-05-14 reviewed

LLM tunes Linux knobs for 72 percent stable gain over defaults
SemaTune: Semantic-Aware Online OS Tuning with Large Language Models

Georgios Liargkovas +3
cs.LG 2026-05-14 reviewed

128 random demos suffice for strong RLVR results
Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance

Alexander G. Schwing +2
cs.CV 2026-05-14 reviewed

Geometry-first method cuts satellite-to-street 3D error by 23 percent
Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image

Bin Tan +8
cs.CV 2026-05-14 reviewed

MicroscopyMatching offers a ready-to-use framework for microscopy image analysis across diverse conditions
MicroscopyMatching: Towards a Ready-to-use Framework for Microscopy Image Analysis in Diverse Conditions

Haoxuan Qu +5
cs.SE 2026-05-14 reviewed

Viverra adds verified assertions to LLM-generated C code
Viverra: Text-to-Code with Guarantees

Haoze Wu +3
cs.AI 2026-05-14 reviewed

Survey ties LLM multi-agent collaboration to failure attribution and self-evolution
Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

Bifan Wei +17
cs.AI 2026-05-14 reviewed

BiFedKD raises ECG accuracy 3.5 percent with 40 percent less communication
BiFedKD: Bidirectional Federated Knowledge Distillation Framework for Non-IID and Long-Tailed ECG Monitoring

Hen-Wei Huang +2
cs.CV 2026-05-14 reviewed

The paper presents the Closed-Loop Visual Reasoning (CLVR) framework that integrates…
Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning

Hanbo Cheng +4
cs.AI 2026-05-14 reviewed

Decomposing traces boosts AI agent diagnosis accuracy up to 12x
Holistic Evaluation and Failure Diagnosis of AI Agents

Alon Mecilati +14
cs.AI 2026-05-14 reviewed

The paper presents a fixed six-stage deterministic workflow that confines language model…
A Deterministic Agentic Workflow for HS Tariff Classification: Multi-Dimensional Rule Reasoning with Interpretable Decisions

Dongjiang Zhuang +6
cs.CV 2026-05-14 reviewed

Model reads cell types and protein levels from label-free images
Towards Label-Free Single-Cell Phenotyping Using Multi-Task Learning

Ardhendu Behera +1
cs.CV 2026-05-14 reviewed

Vision features align LLM text with clinical data for stroke prognosis
Vision-Core Guided Contrastive Learning for Balanced Multi-modal Prognosis Prediction of Stroke

Guanjie Wang +7
cs.CV 2026-05-14 reviewed

VLMs fail to locate hidden functional objects from task instructions
SceneFunRI: Reasoning the Invisible for Task-Driven Functional Object Localization

Gueter Josmy Faure +4
cs.CV 2026-05-14 reviewed

Vision framework with physical priors lifts water level accuracy
Vision-Based Water Level and Flow Estimation

ZhiXin Sun
cs.CV 2026-05-14 reviewed

RefineCAM improves high-resolution CAMs for CNN explanations
How to Evaluate and Refine your CAM

Alessandra Stramiglio +3
cs.CV 2026-05-14 reviewed

Multi-label benchmark shows MLLMs still miss full emotion mixes
MultiEmo-Bench: Multi-label Visual Emotion Analysis for Multi-modal Large Language Models

Mo Fan +5
cs.LG 2026-05-14 reviewed

Learned potential reweights bridges to improve generative fidelity
Action-Inspired Generative Models

Debnath Pal +1
cs.LG 2026-05-14 reviewed

Neural solvers reach energy parity after 158,000 deployments
An Amortized Efficiency Threshold for Comparing Neural and Heuristic Solvers in Combinatorial Optimization

Sohaib Afifi
cs.CV 2026-05-14 reviewed

Internal masking cuts hallucinations in vision-language models
Do We Really Need External Tools to Mitigate Hallucinations? SIRA: Shared-Prefix Internal Reconstruction of Attribution

Junzhe Chen +5
cs.LG 2026-05-14 reviewed

Min-Max-IRL reaches fast O(n^{-1}) rates without exploration
Fast Rates for Inverse Reinforcement Learning

Andreas Schlaginhaufen +1
cs.LG 2026-05-14 reviewed

SAM worsens DRL backdoors while other fixes reduce them
Angel or Demon: Investigating the Plasticity Interventions' Impact on Backdoor Threats in Deep Reinforcement Learning

Chunyi Zhou +6
cs.CV 2026-05-14 reviewed

Aggregated vectors make different financial docs look identical
A Picture is Worth a Thousand Words? An Empirical Study of Aggregation Strategies for Visual Financial Document Retrieval

Ho Hung Lim +1
cs.AI 2026-05-14 reviewed

Segment annotations raise LLM reasoning accuracy
Prompt Segmentation and Annotation Optimisation: Controlling LLM Behaviour via Optimised Segment-Level Annotations

Anjin Liu +7
cs.AI 2026-05-14 reviewed

PyCSP3-Scheduling compiles scheduling abstractions to standard constraints
PyCSP3-Scheduling: A Scheduling Extension for PyCSP3

Sohaib Afifi
cs.LG 2026-05-14 reviewed

Action tokens carry the training signal in agentic RL
Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy

David Wipf +9
cs.AI 2026-05-14 reviewed

Crowdsourcing platform collects multimodal data for embodied AI training
TeachAnything: A Multimodal Crowdsourcing Platform for Training Embodied AI Agents in Symmetrical Reality

Rongkai Liu +3
cs.SD 2026-05-14 reviewed

Drum MIDI becomes audio matching any reference timbre
Break-the-Beat! Controllable MIDI-to-Drum Audio Synthesis

Chihiro Nagashima +11
cs.LG 2026-05-14 reviewed

Bandits recover multi-objective prompts more efficiently
Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits

Chengshuai Shi +4
cs.AI 2026-05-14 reviewed

LLMs are complacent, not sycophantic, due to training design
Complacent, Not Sycophantic: Reframing Large Language Models and Designing AI Literacy for Complacent Machines

Federico Germani +1
cs.LG 2026-05-14 reviewed

LLMs top out at 46 percent exact match on medication choices
RxEval: A Prescription-Level Benchmark for Evaluating LLM Medication Recommendation

Changmiao Wang +6
cs.AI 2026-05-14 reviewed

Fine-tuned AI host beats LLMs with 23% more informative live-sales responses
VerbalValue: A Socially Intelligent Virtual Host for Sales-Driven Live Commerce

Yuyan Chen
cs.AI 2026-05-14 reviewed

Coherent strategy trumps high spending in LLM agent benchmark
Cattle Trade: A Multi-Agent Benchmark for LLM Bluffing, Bidding, and Bargaining

Clemens Müller +1
cs.CV 2026-05-14 reviewed

Removal-coherence (RC) metrics align object-removal scores with human perception
PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media

Daiguo Zhou +8
cs.CL 2026-05-14 reviewed

Many perfect LLM scores hide dimension-level intent failures
Dimension-Level Intent Fidelity Evaluation for Large Language Models: Evidence from Structured Prompt Ablation

Gang Peng