Long-range dependency in integer multiplication is a mirage from 1D representation; a 2D grid reduces it to local 3x3 operations, letting a 321-parameter neural cellular automaton generalize perfectly to inputs 683 times longer than training while Transformers fail.
Chain of code: Reasoning with a language model-augmented code emulator
7 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
CoT prompting in LLM4Code shows mixed robustness that depends on model family, task structure, and perturbations destabilizing structural anchors, leading to trajectory deformations like lengthening, branching, and simplification.
MR-Coupler leverages functional coupling analysis and LLMs to generate valid metamorphic test cases for over 90% of tasks while detecting 44% of real bugs, outperforming baselines by 64.90% in validity and 36.56% in false-alarm reduction.
PruneTIR prunes erroneous tool-call trajectories during LLM inference via three trigger-based components to raise Pass@1 accuracy and efficiency while shortening context.
The survey proposes the LIFE framework to unify fragmented research on collaboration, failure attribution, and self-evolution in LLM multi-agent systems into a progression toward self-organizing intelligence.
A systematic survey categorizes prompt engineering methods for LLMs and VLMs by application area, summarizing methodologies, applications, models, datasets, strengths, and limitations for each technique along with a taxonomy and summary table.
citing papers explorer
-
On the Mirage of Long-Range Dependency, with an Application to Integer Multiplication
Long-range dependency in integer multiplication is a mirage from 1D representation; a 2D grid reduces it to local 3x3 operations, letting a 321-parameter neural cellular automaton generalize perfectly to inputs 683 times longer than training while Transformers fail.
-
The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment
An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
-
Structural Anchors and Reasoning Fragility:Understanding CoT Robustness in LLM4Code
CoT prompting in LLM4Code shows mixed robustness that depends on model family, task structure, and perturbations destabilizing structural anchors, leading to trajectory deformations like lengthening, branching, and simplification.
-
MR-Coupler: Automated Metamorphic Test Generation via Functional Coupling Analysis
MR-Coupler leverages functional coupling analysis and LLMs to generate valid metamorphic test cases for over 90% of tasks while detecting 44% of real bugs, outperforming baselines by 64.90% in validity and 36.56% in false-alarm reduction.
-
PruneTIR: Inference-Time Tool Call Pruning for Effective yet Efficient Tool-Integrated Reasoning
PruneTIR prunes erroneous tool-call trajectories during LLM inference via three trigger-based components to raise Pass@1 accuracy and efficiency while shortening context.
-
Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems
The survey proposes the LIFE framework to unify fragmented research on collaboration, failure attribution, and self-evolution in LLM multi-agent systems into a progression toward self-organizing intelligence.
-
A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications
A systematic survey categorizes prompt engineering methods for LLMs and VLMs by application area, summarizing methodologies, applications, models, datasets, strengths, and limitations for each technique along with a taxonomy and summary table.