pith. machine review for the scientific record.

archive

Every paper Pith has read. Search by title, abstract, or pith.

4138 papers in cs.CL · page 1

  1. cs.CV 2026-05-14 reviewed
    One token unifies agentic and latent visual reasoning

    ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

    Pheng-Ann Heng +3

  2. cs.LG 2026-05-14 reviewed
    FutureSim shows top AI agents predict events at 25% accuracy

    FutureSim: Replaying World Events to Evaluate Adaptive Agents

    Ameya Prabhu +7

  3. cs.CL 2026-05-14 reviewed
    Grep beats vector search in most agentic tasks

    Is Grep All You Need? How Agent Harnesses Reshape Agentic Search

    Akhil Kasturi +4

  4. cs.CR 2026-05-14 reviewed
    Length alone triggers LLM backdoors to leak secrets

    MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

    Ahmed Salem +4

  5. cs.CL 2026-05-14 reviewed
    EHR tables sharpen timing in text-based clinical timelines

    Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment

    Jeremy C. Weiss +3

  6. cs.CL 2026-05-14 reviewed
    Memory model lets LLMs add knowledge without retraining

    MeMo: Memory as a Model

    Alfred Wei Lun Leong +8

  7. cs.CR 2026-05-14 reviewed
    The paper builds a 507-leaf taxonomy of LLM inference attacks from 932 recent security…

    Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

    Alexey A. Shvets +3

  8. cs.CL 2026-05-14 reviewed
    The paper presents a framework that converts existing text-based tool-calling benchmarks…

    From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents

    Jonas Robertson +5

  9. cs.LG 2026-05-14 reviewed
    128 random demos suffice for strong RLVR results

    Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance

    Alexander G. Schwing +2

  10. cs.AI 2026-05-14 reviewed
    Decomposing traces boosts AI agent diagnosis accuracy up to 12x

    Holistic Evaluation and Failure Diagnosis of AI Agents

    Alon Mecilati +14

  11. cs.CV 2026-05-14 reviewed
    Internal masking cuts hallucinations in vision-language models

    Do We Really Need External Tools to Mitigate Hallucinations? SIRA: Shared-Prefix Internal Reconstruction of Attribution

    Junzhe Chen +5

  12. cs.CL 2026-05-14 reviewed
    Terminal anchors extend LLM context to 64K from short sequences

    EndPrompt: Efficient Long-Context Extension via Terminal Anchoring

    Dawei Yin +11

  13. cs.CL 2026-05-14 reviewed
    Denoising paths supply low-cost uncertainty scores for language diffusion models

    Uncertainty Quantification for Large Language Diffusion Models

    Artem Shelmanov +5

  14. cs.SE 2026-05-14 reviewed
    ML classifier beats rules at spotting BDD refactoring chances

    Mining Subscenario Refactoring Opportunities in Behaviour-Driven Software Test Suites: ML Classifiers and LLM-Judge Baselines

    Ali Hassaan Mughal +2

  15. cs.SE 2026-05-14 reviewed
    Memory agent keeps repo documentation consistent

    Remember Your Trace: Memory-Guided Long-Horizon Agentic Framework for Consistent and Hierarchical Repository-Level Code Documentation

    Changkyu Choi +4

  16. cs.LG 2026-05-14 reviewed
    Action tokens carry the training signal in agentic RL

    Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy

    David Wipf +9

  17. cs.CL 2026-05-14 reviewed
    CIPO turns LLM failures into better reasoning

    Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards

    Boxi Cao +8

  18. cs.CL 2026-05-14 reviewed
    Optimal control reformulation gives language models fast parallel sampling at high quality

    Language Generation as Optimal Control: Closed-Loop Diffusion in Latent Control Space

    Liang Lin +5

  19. cs.CL 2026-05-14 reviewed
    Many perfect LLM scores hide dimensional intent failures

    Dimension-Level Intent Fidelity Evaluation for Large Language Models: Evidence from Structured Prompt Ablation

    Gang Peng

  20. cs.CL 2026-05-14 reviewed
    LLM memory systems hit only 46% on group conversations

    GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations

    Evgeniy Gabrilovich +5

  21. cs.CL 2026-05-14 reviewed
    Ming glossaries used flexible Chinese characters to approximate foreign sounds

    Cross-Linguistic Transcription and Phonological Representation in the Huìtóngguǎnxì Huáyíyìyǔ

    Ji-eun Kim

  22. cs.SE 2026-05-14 reviewed
    Stale code snippets make models output outdated helpers

    When Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Context

    Haobin Pan +4

  23. cs.CL 2026-05-14 reviewed
    Probe shows RAG follows wrong context in 85 percent of conflict cases

    Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict

    Huan Xu +6

  24. cs.LG 2026-05-14 reviewed
    Guardrails adapt from sparse noisy failures via conservative induction

    LiSA: Lifelong Safety Adaptation via Conservative Policy Induction

    Bharath Chandrasekhar +8

  25. cs.LG 2026-05-14 reviewed
    Orthogonal projection isolates hallucination signals in LLM answers

    When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition

    Erhu Feng +2

  26. cs.CV 2026-05-14 reviewed
    Adaptive gate skips reasoning for simple multimodal inputs

    Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture

    Guanghao Zhang +4

  27. cs.CL 2026-05-14 reviewed
    Calculus finds optimal vocabulary size for ASR

    A Calculus-Based Framework for Determining Vocabulary Size in End-to-End ASR

    Sunil Kumar Kopparapu

  28. cs.SE 2026-05-14 reviewed
    Agents resolve 45 percent of chained package upgrades

    SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades

    Chaozheng Wang +7

  29. cs.CL 2026-05-14 reviewed
    New scores track whether unlearning works across languages

    Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation

    Hyeonjin Kim +3

  30. cs.CL 2026-05-14 reviewed
    Three-tier memory raises recommender hit rate 26 percent

    Agentic Recommender System with Hierarchical Belief-State Memory

    Benyu Zhang +10

  31. cs.LG 2026-05-14 reviewed
    Synthetic queries trigger up to 5x higher LLM failure rates

    NodeSynth: Socially Aligned Synthetic Data for AI Evaluation

    Darlene Neal +7

  32. cs.CL 2026-05-14 reviewed
    Synthetic augmentation lifts defense classification to 58% accuracy

    Mitigating Data Scarcity in Psychological Defense Classification with Context-Aware Synthetic Augmentation

    Hoang-Thuy-Duong Vu +2

  33. cs.CL 2026-05-14 reviewed
    Geometry scores pick shallow layers for diffusion insertion in transformers

    Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement

    Hyoungjoon Lee +2

  34. cs.CL 2026-05-14 reviewed
    Semantic RL adds low-resource languages without erasing prior skills

    Reinforcement Learning with Semantic Rewards Enables Low-Resource Language Expansion without Alignment Tax

    Guixian Xu +9

  35. cs.HC 2026-05-14 reviewed
    Short concern texts track with activity drops and sleep issues

    A Formative Study of Brief Affective Text as a Complement to Wearable Sensing for Longitudinal Student Health Monitoring

    Christopher Danforth +9

  36. cs.CL 2026-05-14 reviewed
    LLM filter and clustering finds 41 manipulative narrative clusters

    LLM-based Detection of Manipulative Political Narratives

    Florian Steuber +2

  37. cs.CL 2026-05-14 reviewed
    Transformers score German texts on left-right scale

    Ideology Prediction of German Political Texts

    Florian Steuber +3

  38. cs.LG 2026-05-14 reviewed
  39. cs.CL 2026-05-14 reviewed
    Exact prefix factorization removes errors in diffusion language models

    Factorization-Error-Free Discrete Diffusion Language Model via Speculative Decoding

    Hang Yuan +3

  40. cs.LG 2026-05-14 reviewed
    Simple diversity penalty in KV scorer beats complex designs

    Minimal-Intervention KV Retention: A Design-Space Study and a Diversity-Penalty Survivor

    Libo Sun +3

  41. cs.CR 2026-05-14 reviewed
    Hidden noise stops vision-language models learning real content

    To See is Not to Learn: Protecting Multimodal Data from Unauthorized Fine-Tuning of Large Vision-Language Model

    Chengshuai Zhao +4

  42. cs.CR 2026-05-14 reviewed
    Web agents should plan before seeing page content

    Web Agents Should Adopt the Plan-Then-Execute Paradigm

    Annabella Chow +7

  43. cs.LG 2026-05-14 reviewed
    MetaMoE combines independently trained expert models into one Mixture-of-Experts system…

    MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification

    Shuhao Chen +2

  44. cs.CL 2026-05-14 reviewed
    Agent harnesses allow unsafe actions even with correct final outputs

    Auditing Agent Harness Safety

    Chengzhi Liu +10

  45. cs.AI 2026-05-14 reviewed
    Hypergraph reasoner hits 94.7% on supply chain RCA

    Hypergraph Enterprise Agentic Reasoner over Heterogeneous Business Systems

    Cheng Cheng +10

  46. cs.CL 2026-05-14 reviewed
    Spelling and test design confound KVL word difficulty ratings

    What Makes Words Hard? Sakura at BEA 2026 Shared Task on Vocabulary Difficulty Prediction

    Adam Nohejl +5

  47. cs.LG 2026-05-14 reviewed
    Active learners raise NDCG@10 per call in PRP reranking

    Active Learners as Efficient PRP Rerankers

    Francisco Nattero Santiago Mauricio Barron Bucolo +4

  48. cs.LG 2026-05-14 reviewed
    Transformer predicts next disease with 0.871 median AUC across 896 categories

    DT-Transformer: A Foundation Model for Disease Trajectory Prediction on a Real-world Health System

    Andrew R Weckstein +3

  49. cs.LG 2026-05-14 reviewed
    Small mismatches in LLM RL rollout and optimization cause collapse

    Diagnosing Training Inference Mismatch in LLM Reinforcement Learning

    Geoffrey Fox +7

  50. cs.LG 2026-05-14 reviewed
    Prefill-only adapters deliver 1.9x throughput for 512 users

    PreFT: Prefill-only finetuning for efficient inference

    Andrew Lanpouthakoun +6