Compositionality decomposed: How do neural networks generalise? Journal of Artificial Intelligence Research, 67:757–795
6 Pith papers cite this work.
citing papers explorer
- KamonBench: A Grammar-Based Dataset for Evaluating Compositional Factor Recovery in Vision-Language Models
  KamonBench is a grammar-based dataset of 20,000 synthetic Japanese crests with multi-format annotations that enables direct evaluation of factor recovery beyond caption accuracy in vision-language models.
- Investigating More Explainable and Partition-Free Compositionality Estimation for LLMs: A Rule-Generation Perspective
  A rule-generation perspective lets LLMs write programs as rules for data mapping and applies complexity theory to estimate their compositionality, tested on string-to-grid tasks.
- Unveiling the Visual Counting Bottleneck in Vision-Language Models
  VLMs fail at visual counting extrapolation because they cannot project visual magnitudes onto symbolic tokens, despite intact perceptual representations, supporting a fractured magnitude hypothesis.
- Model Collapse as Cultural Evolution
  Iterated learning theory predicts and LLM experiments confirm non-monotonic compositionality during self-training, reframing model collapse as cultural transmission with matching human regularization patterns.
- Benchmarking Compositional Generalisation for Machine Learning Interatomic Potentials
  A new benchmark finds that state-of-the-art ML interatomic potentials struggle with compositional generalization, producing errors an order of magnitude higher on unseen molecular combinations than on training-like cases.
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
  ALiBi enables transformers trained on length-1024 sequences to extrapolate to length-2048 with the same perplexity as a sinusoidal model trained on 2048, while training 11% faster and using 11% less memory.