GiLT augments Transformers with semantic dependency graphs by modulating attention to improve syntactic generalization while keeping perplexity competitive and enabling better finetuning on downstream tasks.
Dependency Transformer Grammars: Integrating Dependency Structures into Transformer Language Models
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CL 2years
2026 2representative citing papers
Varying the number of simultaneous parses in RNNGs increases predicted garden-path effects but does not fully reconcile LM surprisal with human reading times.
citing papers explorer
-
Why are language models less surprised than humans? Testing the Parse Multiplicity Mismatch Hypothesis
Varying the number of simultaneous parses in RNNGs increases predicted garden-path effects but does not fully reconcile LM surprisal with human reading times.