13 Pith papers cite this work.
Citing papers
-
MatClaw: An Autonomous Code-First LLM Agent for End-to-End Materials Exploration
MatClaw shows a code-first LLM agent autonomously generating and executing workflows for ML force field training, Curie temperature prediction, and parameter search on CuInP2S6. The agent succeeds at the coding work but still requires interventions where tacit domain knowledge is needed.
-
Task-Dependent Evaluation of LLM Output Homogenization: A Taxonomy-Guided Framework
Proposes a task taxonomy for functional diversity in LLM outputs, validates it via user study, introduces targeted sampling to boost diversity only where needed, and presents evidence that the diversity-quality tradeoff may be an artifact of task-agnostic measurement.
-
Resampling-based multi-resolution false discovery exceedance control
A resampling-based extension of maxT that delivers simultaneous FDX control over all thresholds and enables data-dependent confidence envelopes for the first time.
-
A Markov Categorical Framework for Language Modeling
A Markov category framework for language models provides an information-theoretic rationale for speculative decoding and shows that a quadratic surrogate to negative log-likelihood induces generalized CCA alignment in linear-softmax heads after normalization.
-
$C^1$-robust homoclinic tangencies
Blenders constructed via $C^r$-small perturbations of heterodimensional cycles generate $C^1$-robust tangencies, and unfolding a homoclinic tangency produces uncountably many robust examples under the stated conditions, answering Bonatti-Díaz.
-
Learning Strategic Value and Cooperation in Multi-Player Stochastic Games through Side Payments
Introduces HS-S (which aggregates dynamic threat powers) and Coco-S (defined as fixed points of the statewise HS Bellman operator) for stochastic games. It proves the two coincide for two players but disagree for three, establishes uniqueness via extended axioms and topological degree theory, and gives sampling estimators.
-
Superminds Test: Actively Evaluating Collective Intelligence of Agent Society via Probing Agents
Large-scale experiments on two million agents reveal that collective intelligence does not emerge from scale alone, because interactions among the agents are sparse and shallow.
-
Pioneer Agent: Continual Improvement of Small Language Models in Production
Pioneer Agent automates the full lifecycle of adapting and continually improving small language models via diagnosis-driven data synthesis and regression-constrained retraining, delivering gains of 1.6-83.8 points on benchmarks and large lifts in production-style tasks.
-
Efficient RL Training for LLMs with Experience Replay
Well-designed experience replay buffers reduce inference compute in LLM RL post-training while maintaining or improving performance and preserving policy entropy.
-
Hierarchical Planning with Latent World Models
Hierarchical planning over multi-scale latent world models achieves 70% success on real robotic pick-and-place with goal-only input, where flat models achieve 0%, while cutting planning compute by up to 4x in simulation.
-
Temporal structure of the language hierarchy within small cortical patches
Small cortical patches multiplex phonetic, syllabic, and lexical representations during speech production via dynamic temporal coding.
-
Procedural Knowledge at Scale Improves Reasoning
Reasoning Memory decomposes reasoning trajectories into 32 million subquestion-subroutine pairs and retrieves them via in-thought prompts to improve language model performance on math, science, and coding benchmarks by up to 19.2%.
-
MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction
MetaEmbed trains a fixed number of learnable Meta Tokens to produce granularity-organized multi-vector embeddings that support test-time scaling in multimodal retrieval.