pith. machine review for the scientific record.

archive

Every paper Pith has read.

8185 papers in cs.AI · page 1

  1. cs.CV 2026-05-14 reviewed
    Memory bank preserves characters across 48-shot gaps in video

    EntityBench: Towards Entity-Consistent Long-Range Multi-Shot Video Generation

    Meng Wei +3

  2. cs.CV 2026-05-14 reviewed
    One token unifies agentic and latent visual reasoning

    ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

    Pheng-Ann Heng +3

  3. cs.LG 2026-05-14 reviewed
    FutureSim shows top AI agents predict events at 25% accuracy

    FutureSim: Replaying World Events to Evaluate Adaptive Agents

    Ameya Prabhu +7

  4. cs.CV 2026-05-14 reviewed
    New index catches 3D geometry errors in video generators

    Quantitative Video World Model Evaluation for Geometric-Consistency

    Jiaxin Wu +4

  5. cs.LG 2026-05-14 reviewed
    This paper introduces Shodh-MoE

    Eradicating Negative Transfer in Multi-Physics Foundation Models via Sparse Mixture-of-Experts Routing

    Arastu Sharma +1

  6. cs.AI 2026-05-14 reviewed
    Pairwise votes raise LLM code Elo by 405 points

    OpenDeepThink: Parallel Reasoning via Bradley-Terry Aggregation

    Huanzhi Mao +5

  7. cs.CL 2026-05-14 reviewed
    EHR tables sharpen timing in text-based clinical timelines

    Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment

    Jeremy C. Weiss +3

  8. cs.CL 2026-05-14 reviewed
    Memory model lets LLMs add knowledge without retraining

    MeMo: Memory as a Model

    Alfred Wei Lun Leong +8

  9. cs.RO 2026-05-14 reviewed
    Single model tops VLM and world benchmarks while ranking near first on robot actions

    Pelican-Unified 1.0: A Unified Embodied Intelligence Model for Understanding, Reasoning, Imagination and Action

    Che Liu +26

  10. cs.AI 2026-05-14 reviewed
    APWA scales agent workflows by parallelizing non-communicating subproblems

    APWA: A Distributed Architecture for Parallelizable Agentic Workflows

    Alina Oprea +4

  11. cs.HC 2026-05-14 reviewed
    Students treat AI as quick fix but want long-term cultural support companion

    Understanding How International Students in the U.S. Are Using Conversational AI to Support Cross-Cultural Adaptation

    Anisa Callis +5

  12. cs.AI 2026-05-14 reviewed
    Citations miss key context in agent graph answers

    Why Neighborhoods Matter: Traversal Context and Provenance in Agentic GraphRAG

    Maximilian von Zastrow +2

  13. stat.ML 2026-05-14 reviewed
    Optimal logging policies minimize OPE error via reward-coverage balance

    Logging Policy Design for Off-Policy Evaluation

    Connor Douglas +2

  14. cs.AI 2026-05-14 reviewed
    The paper proposes Dual-Dimensional Consistency (DDC)

    Dual-Dimensional Consistency: Balancing Budget and Quality in Adaptive Inference-Time Scaling

    Bo Li +5

  15. cs.NE 2026-05-14 reviewed
    Taxonomy sorts SNN training rules and adds shared testbed

    NeuroTrain: Surveying Local Learning Rules for Spiking Neural Networks with an Open Benchmarking Framework

    Alessandro Savino +4

  16. cs.SD 2026-05-14 reviewed
    SpeakerLLM turns speaker verification into natural-language reasoning

    SpeakerLLM: A Speaker-Specialized Audio-LLM for Speaker Understanding and Verification Reasoning

    Ha-Jin Yu +4

  17. cs.OS 2026-05-14 reviewed
    LLM tunes Linux knobs for 72 percent stable gain over defaults

    SemaTune: Semantic-Aware Online OS Tuning with Large Language Models

    Georgios Liargkovas +3

  18. cs.LG 2026-05-14 reviewed
    128 random demos suffice for strong RLVR results

    Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance

    Alexander G. Schwing +2

  19. cs.CV 2026-05-14 reviewed
    Geometry-first method cuts satellite-to-street 3D error by 23 percent

    Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image

    Bin Tan +8

  20. cs.CV 2026-05-14 reviewed
    The paper introduces MicroscopyMatching

    MicroscopyMatching: Towards a Ready-to-use Framework for Microscopy Image Analysis in Diverse Conditions

    Haoxuan Qu +5

  21. cs.SE 2026-05-14 reviewed
    Viverra adds verified assertions to LLM-generated C code

    Viverra: Text-to-Code with Guarantees

    Haoze Wu +3

  22. cs.AI 2026-05-14 reviewed
    Survey ties LLM agent collaboration to failure detection and self-fix

    Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

    Bifan Wei +17

  23. cs.AI 2026-05-14 reviewed
    BiFedKD raises ECG accuracy 3.5 percent with 40 percent less communication

    BiFedKD: Bidirectional Federated Knowledge Distillation Framework for Non-IID and Long-Tailed ECG Monitoring

    Hen-Wei Huang +2

  24. cs.CV 2026-05-14 reviewed
    The paper presents the Closed-Loop Visual Reasoning (CLVR) framework that integrates…

    Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning

    Hanbo Cheng +4

  25. cs.AI 2026-05-14 reviewed
    Decomposing traces boosts AI agent diagnosis accuracy up to 12x

    Holistic Evaluation and Failure Diagnosis of AI Agents

    Alon Mecilati +14

  26. cs.AI 2026-05-14 reviewed
    The paper presents a fixed six-stage deterministic workflow that confines language model…

    A Deterministic Agentic Workflow for HS Tariff Classification: Multi-Dimensional Rule Reasoning with Interpretable Decisions

    Dongjiang Zhuang +6

  27. cs.CV 2026-05-14 reviewed
    Model reads cell types and protein levels from label-free images

    Towards Label-Free Single-Cell Phenotyping Using Multi-Task Learning

    Ardhendu Behera +1

  28. cs.CV 2026-05-14 reviewed
    Vision features align LLM text with clinical data for stroke prognosis

    Vision-Core Guided Contrastive Learning for Balanced Multi-modal Prognosis Prediction of Stroke

    Guanjie Wang +7

  29. cs.CV 2026-05-14 reviewed
    VLMs fail to locate hidden functional objects from task instructions

    SceneFunRI: Reasoning the Invisible for Task-Driven Functional Object Localization

    Gueter Josmy Faure +4

  30. cs.CV 2026-05-14 reviewed
    Vision framework with physical priors lifts water level accuracy

    Vision-Based Water Level and Flow Estimation

    ZhiXin Sun

  31. cs.CV 2026-05-14 reviewed
    RefineCAM improves high-resolution CAMs for CNN explanations

    How to Evaluate and Refine your CAM

    Alessandra Stramiglio +3

  32. cs.CV 2026-05-14 reviewed
    Multi-label benchmark shows MLLMs still miss full emotion mixes

    MultiEmo-Bench: Multi-label Visual Emotion Analysis for Multi-modal Large Language Models

    Mo Fan +5

  33. cs.LG 2026-05-14 reviewed
    Learned potential reweights bridges to improve generative fidelity

    Action-Inspired Generative Models

    Debnath Pal +1

  34. cs.LG 2026-05-14 reviewed
    Neural solvers reach energy parity after 158000 deployments

    An Amortized Efficiency Threshold for Comparing Neural and Heuristic Solvers in Combinatorial Optimization

    Sohaib Afifi

  35. cs.CV 2026-05-14 reviewed
    Internal masking cuts hallucinations in vision-language models

    Do We Really Need External Tools to Mitigate Hallucinations? SIRA: Shared-Prefix Internal Reconstruction of Attribution

    Junzhe Chen +5

  36. cs.LG 2026-05-14 reviewed
    Min-Max-IRL reaches fast O(n^{-1}) rates without exploration

    Fast Rates for Inverse Reinforcement Learning

    Andreas Schlaginhaufen +1

  37. cs.LG 2026-05-14 reviewed
    SAM worsens DRL backdoors while other fixes reduce them

    Angel or Demon: Investigating the Plasticity Interventions' Impact on Backdoor Threats in Deep Reinforcement Learning

    Chunyi Zhou +6

  38. cs.CV 2026-05-14 reviewed
    Aggregated vectors make different financial docs look identical

    A Picture is Worth a Thousand Words? An Empirical Study of Aggregation Strategies for Visual Financial Document Retrieval

    Ho Hung Lim +1

  39. cs.AI 2026-05-14 reviewed
    Segment annotations raise LLM reasoning accuracy

    Prompt Segmentation and Annotation Optimisation: Controlling LLM Behaviour via Optimised Segment-Level Annotations

    Anjin Liu +7

  40. cs.AI 2026-05-14 reviewed
    PyCSP3 Scheduling compiles abstractions to standard constraints

    PyCSP3-Scheduling: A Scheduling Extension for PyCSP3

    Sohaib Afifi

  41. cs.LG 2026-05-14 reviewed
    Action tokens carry the training signal in agentic RL

    Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy

    David Wipf +9

  42. cs.AI 2026-05-14 reviewed
    Crowdsourcing platform collects multimodal data for embodied AI training

    TeachAnything: A Multimodal Crowdsourcing Platform for Training Embodied AI Agents in Symmetrical Reality

    Rongkai Liu +3

  43. cs.SD 2026-05-14 reviewed
    Drum MIDI becomes audio matching any reference timbre

    Break-the-Beat! Controllable MIDI-to-Drum Audio Synthesis

    Chihiro Nagashima +11

  44. cs.LG 2026-05-14 reviewed
    Bandits recover multi-objective prompts more efficiently

    Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits

    Chengshuai Shi +4

  45. cs.AI 2026-05-14 reviewed
    LLMs are complacent not sycophantic due to training design

    Complacent, Not Sycophantic: Reframing Large Language Models and Designing AI Literacy for Complacent Machines

    Federico Germani +1

  46. cs.LG 2026-05-14 reviewed
    LLMs top out at 46 percent exact match on medication choices

    RxEval: A Prescription-Level Benchmark for Evaluating LLM Medication Recommendation

    Changmiao Wang +6

  47. cs.AI 2026-05-14 reviewed
    Fine-tuned AI host beats LLMs with 23% more informative live-sales responses

    VerbalValue: A Socially Intelligent Virtual Host for Sales-Driven Live Commerce

    Yuyan Chen

  48. cs.AI 2026-05-14 reviewed
    Coherent strategy trumps high spending in LLM agent benchmark

    Cattle Trade: A Multi-Agent Benchmark for LLM Bluffing, Bidding, and Bargaining

    Clemens Müller +1

  49. cs.CV 2026-05-14 reviewed
    RC metrics align object removal scores with human perception

    PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media

    Daiguo Zhou +8

  50. cs.CL 2026-05-14 reviewed
    Many perfect LLM scores hide dimensional intent failures

    Dimension-Level Intent Fidelity Evaluation for Large Language Models: Evidence from Structured Prompt Ablation

    Gang Peng