MemGym unifies agent gyms into a memory benchmark with isolated scoring across tool-use, research, coding, and computer-use regimes plus a lightweight reward model for tractable coding evaluation.
Mixed citations
Title resolution pending
Mixed citation behavior. Most common role is background (60%).
citation-role summary
citation-polarity summary
representative citing papers
LLM adoption in science follows a compressing inverted-U trajectory where release year predicts time-to-peak and lifespan better than model attributes.
A comprehensive survey of code-switched NLP research with LLMs across modalities, covering 327 studies, 15+ tasks, 30+ datasets, and 80+ languages while outlining challenges and a future roadmap.
Scideator enables facet-based scientific ideation through LLM-driven extraction, human-guided recombination, analogous retrieval, and facet-grounded novelty verification, showing significantly higher creativity support than a baseline LLM in a user study with CS researchers.
The authors propose a retrieval-augmented framework that grounds AI exposure labels for 18,796 O*NET occupation-task pairs in retrieved news and academic abstracts, outperforming zero-shot prompting in 72% of disagreements and aligning better with observed real-world usage.
Analogical reasoning increases LLM solution diversity by 90-173% and novelty rate to over 50%, delivering up to 13-fold gains on biomedical tasks including perturbation prediction and cell communication.
Adversarial compromise of tool outputs misleads agentic AI via breadth and depth attacks, revealing that epistemic and navigational robustness are distinct and often trade off against each other.
Only 39% of LLM safety benchmark repositories run without modification, 6% include ethical warnings, and adoption tracks author prominence and runnability rather than code quality metrics.
Attribution gradients consolidate citation evidence and enable incremental unfolding of secondary sources, leading to deeper engagement in a lab study of critical reading tasks for AI answers.
Prompt chaining with off-the-shelf LLMs outperforms in-context learning and BERT for 1st- and 2nd-level classification on the ORKG taxonomy using the FORC dataset, but struggles at the 3rd level.
Lit2Vec delivers a documented, reproducible pipeline that extracts and annotates a large licensed chemistry paper corpus from S2ORC with paragraph embeddings and subfield labels.
Omakase monitors project documents to infer timely queries and distills research reports into actionable suggestions that users rated significantly more useful than raw reports.
A literature survey on abstract concept recognition in videos that catalogs prior tasks and datasets while advocating for foundation models and reuse of decades of community experience.
citing papers explorer
-
Beyond Monolingual Assumptions: A Survey of Code-Switched NLP in the Era of Large Language Models across Modalities
A comprehensive survey of code-switched NLP research with LLMs across modalities, covering 327 studies, 15+ tasks, 30+ datasets, and 80+ languages while outlining challenges and a future roadmap.