arxiv: 2101.11911 · v1 · pith:RIAJYRWKnew · submitted 2021-01-28 · 💻 cs.CL · cs.CV

The Role of Syntactic Planning in Compositional Image Captioning

classification 💻 cs.CL cs.CV
keywords captioning compositional generalization image images syntactic different generalizing

Image captioning has focused on generalizing to images drawn from the same distribution as the training set, and not to the more challenging problem of generalizing to different distributions of images. Recently, Nikolaus et al. (2019) introduced a dataset to assess compositional generalization in image captioning, where models are evaluated on their ability to describe images with unseen adjective-noun and noun-verb compositions. In this work, we investigate different methods to improve compositional generalization by planning the syntactic structure of a caption. Our experiments show that jointly modeling tokens and syntactic tags enhances generalization in both RNN- and Transformer-based models, while also improving performance on standard metrics.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Compositionality Emerges in a Narrow Depth-Connectivity Regime: Architecture Constraints and Solution Manifolds

    cs.LG 2026-06 unverdicted novelty 6.0

    Compositionality emerges in neural networks only in a narrow depth-connectivity regime, with gradient descent converging to fractured solutions outside it.

  2. A Systematic Study of Behavioral Cloning for Scientific Data Annotation

    cs.HC 2026-05 unverdicted novelty 6.0

    Introduces 9 synthetic annotation tasks and benchmarks for behavioral cloning, finding hierarchical skill learning, scaling benefits, effective multi-task pretraining, and shared internal representations of task phase...