archive
Every paper Pith has read. Search by title, abstract, or pith.
8185 papers in cs.AI · page 1
-
Memory bank preserves characters across 48-shot gaps in video
EntityBench: Towards Entity-Consistent Long-Range Multi-Shot Video Generation
-
One token unifies agentic and latent visual reasoning
ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both
-
FutureSim shows top AI agents predict events at 25% accuracy
FutureSim: Replaying World Events to Evaluate Adaptive Agents
-
New index catches 3D geometry errors in video generators
Quantitative Video World Model Evaluation for Geometric-Consistency
-
This paper introduces Shodh-MoE
Eradicating Negative Transfer in Multi-Physics Foundation Models via Sparse Mixture-of-Experts Routing
-
Pairwise votes raise LLM code Elo by 405 points
OpenDeepThink: Parallel Reasoning via Bradley-Terry Aggregation
-
EHR tables sharpen timing in text-based clinical timelines
Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment
-
Memory model lets LLMs add knowledge without retraining
MeMo: Memory as a Model
-
Single model tops VLM and world benchmarks while ranking near first on robot actions
Pelican-Unified 1.0: A Unified Embodied Intelligence Model for Understanding, Reasoning, Imagination and Action
-
APWA scales agent workflows by parallelizing non-communicating subproblems
APWA: A Distributed Architecture for Parallelizable Agentic Workflows
-
Students treat AI as quick fix but want long-term cultural support companion
Understanding How International Students in the U.S. Are Using Conversational AI to Support Cross-Cultural Adaptation
-
Citations miss key context in agent graph answers
Why Neighborhoods Matter: Traversal Context and Provenance in Agentic GraphRAG
-
Optimal logging policies minimize OPE error via reward-coverage balance
Logging Policy Design for Off-Policy Evaluation
-
The paper proposes Dual-Dimensional Consistency (DDC)
Dual-Dimensional Consistency: Balancing Budget and Quality in Adaptive Inference-Time Scaling
-
Taxonomy sorts SNN training rules and adds shared testbed
NeuroTrain: Surveying Local Learning Rules for Spiking Neural Networks with an Open Benchmarking Framework
-
SpeakerLLM turns speaker verification into natural-language reasoning
SpeakerLLM: A Speaker-Specialized Audio-LLM for Speaker Understanding and Verification Reasoning
-
LLM tunes Linux knobs for 72 percent stable gain over defaults
SemaTune: Semantic-Aware Online OS Tuning with Large Language Models
-
128 random demos suffice for strong RLVR results
Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance
-
Geometry-first method cuts satellite-to-street 3D error by 23 percent
Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
-
The paper introduces MicroscopyMatching
MicroscopyMatching: Towards a Ready-to-use Framework for Microscopy Image Analysis in Diverse Conditions
-
Viverra adds verified assertions to LLM-generated C code
Viverra: Text-to-Code with Guarantees
-
Survey ties LLM agent collaboration to failure detection and self-fix
Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems
-
BiFedKD raises ECG accuracy 3.5 percent with 40 percent less communication
BiFedKD: Bidirectional Federated Knowledge Distillation Framework for Non-IID and Long-Tailed ECG Monitoring
-
The paper presents the Closed-Loop Visual Reasoning (CLVR) framework that integrates…
Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning
-
Decomposing traces boosts AI agent diagnosis accuracy up to 12x
Holistic Evaluation and Failure Diagnosis of AI Agents
-
The paper presents a fixed six-stage deterministic workflow that confines language model…
A Deterministic Agentic Workflow for HS Tariff Classification: Multi-Dimensional Rule Reasoning with Interpretable Decisions
-
Model reads cell types and protein levels from label-free images
Towards Label-Free Single-Cell Phenotyping Using Multi-Task Learning
-
Vision features align LLM text with clinical data for stroke prognosis
Vision-Core Guided Contrastive Learning for Balanced Multi-modal Prognosis Prediction of Stroke
-
VLMs fail to locate hidden functional objects from task instructions
SceneFunRI: Reasoning the Invisible for Task-Driven Functional Object Localization
-
Vision framework with physical priors lifts water level accuracy
Vision-Based Water Level and Flow Estimation
-
RefineCAM improves high-resolution CAMs for CNN explanations
How to Evaluate and Refine your CAM
-
Multi-label benchmark shows MLLMs still miss full emotion mixes
MultiEmo-Bench: Multi-label Visual Emotion Analysis for Multi-modal Large Language Models
-
Learned potential reweights bridges to improve generative fidelity
Action-Inspired Generative Models
-
Neural solvers reach energy parity after 158000 deployments
An Amortized Efficiency Threshold for Comparing Neural and Heuristic Solvers in Combinatorial Optimization
-
Internal masking cuts hallucinations in vision-language models
Do We Really Need External Tools to Mitigate Hallucinations? SIRA: Shared-Prefix Internal Reconstruction of Attribution
-
Min-Max-IRL reaches fast O(n^{-1}) rates without exploration
Fast Rates for Inverse Reinforcement Learning
-
SAM worsens DRL backdoors while other fixes reduce them
Angel or Demon: Investigating the Plasticity Interventions' Impact on Backdoor Threats in Deep Reinforcement Learning
-
Aggregated vectors make different financial docs look identical
A Picture is Worth a Thousand Words? An Empirical Study of Aggregation Strategies for Visual Financial Document Retrieval
-
Segment annotations raise LLM reasoning accuracy
Prompt Segmentation and Annotation Optimisation: Controlling LLM Behaviour via Optimised Segment-Level Annotations
-
PyCSP3 Scheduling compiles abstractions to standard constraints
PyCSP3-Scheduling: A Scheduling Extension for PyCSP3
-
Action tokens carry the training signal in agentic RL
Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy
-
Crowdsourcing platform collects multimodal data for embodied AI training
TeachAnything: A Multimodal Crowdsourcing Platform for Training Embodied AI Agents in Symmetrical Reality
-
Drum MIDI becomes audio matching any reference timbre
Break-the-Beat! Controllable MIDI-to-Drum Audio Synthesis
-
Bandits recover multi-objective prompts more efficiently
Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits
-
LLMs are complacent, not sycophantic, due to training design
Complacent, Not Sycophantic: Reframing Large Language Models and Designing AI Literacy for Complacent Machines
-
LLMs top out at 46 percent exact match on medication choices
RxEval: A Prescription-Level Benchmark for Evaluating LLM Medication Recommendation
-
Fine-tuned AI host beats LLMs with 23% more informative live-sales responses
VerbalValue: A Socially Intelligent Virtual Host for Sales-Driven Live Commerce
-
Coherent strategy trumps high spending in LLM agent benchmark
Cattle Trade: A Multi-Agent Benchmark for LLM Bluffing, Bidding, and Bargaining
-
RC metrics align object removal scores with human perception
PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media
-
Many perfect LLM scores hide dimensional intent failures
Dimension-Level Intent Fidelity Evaluation for Large Language Models: Evidence from Structured Prompt Ablation