pith.

Leshem Choshen

Identifiers

  • name variant Leshem Choshen 0.60 · backfill

Papers (17)

  1. MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs cs.AI · 2026 · author #47
  2. Instructions Shape Production of Language, not Processing cs.CL · 2026 · author #2
  3. Growing Pains: Extensible and Efficient LLM Benchmarking Via Fixed Parameter Calibration cs.CL · 2026 · author #7
  4. General Agent Evaluation cs.AI · 2026 · author #12
  5. When AI Benchmarks Plateau: A Systematic Study of Benchmark Saturation cs.AI · 2026 · author #12
  6. A Latent Variable Framework for Scaling Laws in Large Language Models stat.AP · 2025 · author #5
  7. Global PIQA: Evaluating Commonsense Reasoning Across 100+ Languages and Cultures cs.CL · 2025 · author #195
  8. Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty cs.LG · 2025 · author #5
  9. LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users cs.CL · 2025 · author #5
  10. DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation cs.CL · 2025 · author #6
  11. Holmes: A Benchmark to Assess the Linguistic Competence of Language Models cs.CL · 2024 · author #3
  12. Are You Convinced? Choosing the More Convincing Evidence with a Siamese Network cs.LG · 2019 · author #3
  13. Learning to combine Grammatical Error Corrections cs.CL · 2019 · author #3
  14. The Language of Legal and Illegal Activity on the Darknet cs.CL · 2019 · author #1
  15. Automatic Metric Validation for Grammatical Error Correction cs.CL · 2018 · author #1
  16. DORA The Explorer: Directed Outreaching Reinforcement Action-Selection cs.LG · 2018 · author #1
  17. Reference-less Measure of Faithfulness for Grammatical Error Correction cs.CL · 2018 · author #1

Mentions

  • 2512.06553 #5 · arxiv_oai · confidence 0.70 Leshem Choshen
  • 2602.16763 #12 · arxiv_oai · confidence 0.70 Leshem Choshen
  • 2510.24081 #195 · arxiv_oai · confidence 0.70 Leshem Choshen
  • 2605.29512 #47 · arxiv_oai · confidence 0.70 Leshem Choshen
  • 2507.16806 #5 · arxiv_oai · confidence 0.70 Leshem Choshen

Frequent Coauthors