Structural Guidance for Transformer Language Models
3 Pith papers cite this work.
fields
cs.CL 3
representative citing papers
Varying the number of simultaneous parses in RNNGs increases predicted garden-path effects but does not fully reconcile LM surprisal with human reading times.
Semantic constituency graphs outperform syntactic constituency and dependency structures from seven formalisms when added to a Transformer for language modeling.
citing papers explorer
- GiLT: Augmenting Transformer Language Models with Dependency Graphs
  GiLT augments Transformers with semantic dependency graphs by modulating attention to improve syntactic generalization while keeping perplexity competitive and enabling better finetuning on downstream tasks.
- Why are language models less surprised than humans? Testing the Parse Multiplicity Mismatch Hypothesis
  Varying the number of simultaneous parses in RNNGs increases predicted garden-path effects but does not fully reconcile LM surprisal with human reading times.
- Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling
  Semantic constituency graphs outperform syntactic constituency and dependency structures from seven formalisms when added to a Transformer for language modeling.