pith. machine review for the scientific record.

arxiv: 2605.13373 · v1 · submitted 2026-05-13 · 💻 cs.CL

Recognition: no theorem link

Exploiting Pre-trained Encoder-Decoder Transformers for Sequence-to-Sequence Constituent Parsing

Cristina Outeiriño Cid, Daniel Fernández-González

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 19:53 UTC · model grok-4.3

classification 💻 cs.CL
keywords constituent parsing · sequence-to-sequence · pre-trained transformers · BART · T5 · linearized trees · syntactic parsing · encoder-decoder models

The pith

Pre-trained encoder-decoder models like BART and T5, when fine-tuned to output linearized parse trees, outperform earlier sequence-to-sequence parsers and compete with specialized constituent parsers on continuous data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether standard encoder-decoder transformers can handle constituent parsing by treating it as a sequence generation task. Researchers fine-tune BART, mBART, and T5 on linearized versions of syntactic trees instead of using encoder-only models such as BERT. They compare multiple ways to turn trees into linear sequences and run the models on both standard continuous treebanks and harder discontinuous ones. The fine-tuned models exceed all previous sequence-to-sequence parsers and reach performance levels close to leading task-specific parsers when the trees are continuous. This shows that general pre-trained seq2seq architectures can acquire syntactic structure through ordinary fine-tuning on tree linearizations.
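To make the setup concrete, here is a minimal sketch of the tree-to-sequence conversion the approach depends on. The bracket convention and the toy tree are illustrative assumptions, not the paper's exact linearization formats:

```python
# Minimal sketch of one common linearization: top-down bracketing of a
# constituent tree. The paper compares several formats; this toy variant
# only illustrates the tree -> token-sequence conversion.

from typing import Union

# A tree is a (label, child, ...) tuple; terminals are plain strings.
Tree = Union[tuple, str]

def linearize(tree: Tree) -> str:
    """Flatten a constituent tree into a bracketed token string."""
    if isinstance(tree, str):  # terminal word
        return tree
    label, *children = tree
    return f"({label} " + " ".join(linearize(c) for c in children) + ")"

# "The cat sleeps" with a simple NP/VP analysis.
tree = ("S", ("NP", ("DT", "The"), ("NN", "cat")), ("VP", ("VBZ", "sleeps")))
print(linearize(tree))  # (S (NP (DT The) (NN cat)) (VP (VBZ sleeps)))
```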

Core claim

Initializing a sequence-to-sequence parser with a pre-trained encoder-decoder model such as BART, mBART or T5 and fine-tuning it to generate linearized constituent trees produces better results than any earlier sequence-to-sequence parser and reaches competitive accuracy with the best task-specific constituent parsers on continuous treebanks.

What carries the argument

Fine-tuning pre-trained encoder-decoder transformers to generate linearized constituent parse trees from input sentences.
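As a rough illustration of what that fine-tuning looks like in practice, the sketch below uses the Hugging Face transformers API; the checkpoint, target format, and hyperparameters are assumptions for demonstration, not the paper's reported setup:

```python
# Sketch of one fine-tuning step for a seq2seq constituent parser,
# assuming the Hugging Face transformers library (>= 4.21 for
# text_target). Checkpoint and hyperparameters are illustrative only.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

sentence = "The cat sleeps"
target = "(S (NP (DT The) (NN cat)) (VP (VBZ sleeps)))"  # linearized tree

inputs = tokenizer(sentence, return_tensors="pt")
labels = tokenizer(text_target=target, return_tensors="pt").input_ids

# Standard teacher-forced cross-entropy over the linearized-tree tokens.
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()

# At inference time, parsing is ordinary decoding.
pred_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(pred_ids[0], skip_special_tokens=True))
```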

If this is right

  • Sequence-to-sequence parsing can now draw directly on general-purpose pre-trained encoder-decoder models rather than requiring custom encoder-only initializations.
  • Performance varies with the choice of linearization strategy, with some formats working better for continuous trees than for discontinuous ones.
  • The same fine-tuning recipe applies across languages when mBART is used, opening the method to multilingual treebanks.
  • No architectural changes beyond standard fine-tuning are needed to reach competitive continuous parsing accuracy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The result suggests that syntactic information is already latent in the pre-training objectives of large encoder-decoder models and does not require separate syntactic pre-training.
  • Developers of new structured-prediction systems could adopt the same fine-tuning pattern for tasks such as semantic role labeling or discourse parsing.
  • If linearization methods improve, the same models might close the remaining gap on discontinuous parsing without new architectures.
  • The approach lowers the barrier to building parsers, because only a standard seq2seq training loop is required.

Load-bearing premise

Standard fine-tuning of encoder-decoder models on linearized trees is enough to capture the full syntactic structure without extra task-specific mechanisms.

What would settle it

A replication on the same continuous benchmarks that finds the fine-tuned BART or T5 models fall substantially below the accuracy of leading task-specific parsers would show the competitiveness claim does not hold.

Figures

Figures reproduced from arXiv: 2605.13373 by Cristina Outeiriño Cid, Daniel Fernández-González.

Figure 1. Examples of continuous (a) and discontinuous (b) constituent trees, and different …
Figure 2. Transition sequence used by a transition-based parser to construct the discon…
Figure 3. Transition sequence used by a transition-based parser to construct the discon…
Figure 4. F-score achieved by continuous linearizations on BART.
Figure 5. F-score of the lexicalized in-order linearization across pre-trained en…
Figure 6. F-score of lexicalized and non-lexicalized discontinuous linearizations on …
Original abstract

To achieve deep natural language understanding, syntactic constituent parsing plays a crucial role and is widely required by many artificial intelligence systems for processing both text and speech. A recent approach involves using standard sequence-to-sequence models to handle constituent parsing as a machine translation problem, moving away from traditional task-specific parsers. These models are typically initialized with pre-trained encoder-only language models like BERT or RoBERTa. However, the use of pre-trained encoder-decoder language models for constituency parsing has not been thoroughly explored. To bridge this gap, we extend the sequence-to-sequence framework by investigating parsers built on pre-trained encoder-decoder architectures, including BART, mBART, and T5. We fine-tune them to generate linearized parse trees and extensively evaluate them on different linearization strategies across both continuous treebanks and more complex discontinuous benchmarks. Our results demonstrate that our approach outperforms all prior sequence-to-sequence models and performs competitively with leading task-specific constituent parsers on continuous constituent parsing.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, simulated authors' rebuttal, circularity audit, and an axiom-and-free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper frames constituent parsing as a seq2seq task and fine-tunes pre-trained encoder-decoder models (BART, mBART, T5) to generate linearized trees. It evaluates multiple linearization strategies on continuous and discontinuous treebanks and claims that the resulting parsers outperform all prior seq2seq models while remaining competitive with leading task-specific parsers on continuous data.

Significance. If the performance claims are shown to rest on valid tree outputs, the work would establish that standard fine-tuning of general encoder-decoder transformers is sufficient for high-quality constituent parsing, reducing the need for bespoke architectures and extending the reach of transfer learning to structured prediction.

major comments (2)
  1. [Abstract / Experimental Setup] No mechanism (constrained decoding, a validity filter, or post-processing) is described to guarantee that generated strings are well-formed trees. Because seq2seq generation can produce mismatched or unbalanced brackets, the reported F1 scores and the claim of outperforming prior seq2seq parsers cannot be evaluated without evidence that invalid outputs are negligible.
  2. [Results] The manuscript provides no statistical significance tests, confidence intervals, or error analysis against prior seq2seq baselines. Without these, the claim of competitiveness with task-specific parsers on continuous treebanks is only moderately supported.
minor comments (1)
  1. [Method] Clarify the exact linearization format used for each model with a short example in the method section; the current description leaves the bracket conventions and non-terminal ordering ambiguous.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We will revise the manuscript to address the concerns about ensuring well-formed tree outputs and adding statistical analysis.

read point-by-point responses
  1. Referee: [Abstract / Experimental Setup] No mechanism (constrained decoding, a validity filter, or post-processing) is described to guarantee that generated strings are well-formed trees. Because seq2seq generation can produce mismatched or unbalanced brackets, the reported F1 scores and the claim of outperforming prior seq2seq parsers cannot be evaluated without evidence that invalid outputs are negligible.

    Authors: We agree that the current version does not describe any mechanism for guaranteeing well-formed trees. In practice our fine-tuned models produced very few invalid bracket sequences, but this was neither quantified nor explained. We will revise the experimental setup section to include a post-processing validity filter that discards mismatched bracket strings (a minimal sketch of such a filter appears after these responses), report the exact percentage of invalid outputs (observed to be under 1% on all treebanks), and confirm that all reported F1 scores are computed only on valid trees. This addition directly supports the reliability of the performance claims. Revision: yes.

  2. Referee: [Results] The manuscript provides no statistical significance tests, confidence intervals, or error analysis against prior seq2seq baselines. Without these, the claim of competitiveness with task-specific parsers on continuous treebanks is only moderately supported.

    Authors: We acknowledge that the manuscript lacks statistical significance testing and confidence intervals. We will add bootstrap confidence intervals (1,000 resamples; a sketch of the procedure appears below) for all reported F1 scores and paired significance tests (McNemar's test) against the strongest prior seq2seq baselines. These results, together with a brief error analysis highlighting the main remaining error types, will be added to the results section to strengthen the competitiveness claims on continuous treebanks. Revision: yes.
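The rebuttal's validity filter is not specified beyond "discards mismatched bracket strings"; under that assumption, a minimal sketch of such a check might look like this:

```python
# Hypothetical post-processing validity filter in the spirit of the
# rebuttal: keep only generated strings whose brackets balance and form
# a single rooted tree. The authors' actual filter is not described.

def is_well_formed(linearized: str) -> bool:
    """True iff parentheses balance and enclose exactly one tree."""
    depth = 0
    closed_root = False
    for ch in linearized:
        if ch == "(":
            if closed_root:      # a second tree after the first closed
                return False
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:        # closing bracket with no match
                return False
            if depth == 0:
                closed_root = True
    return depth == 0 and closed_root

outputs = ["(S (NP The cat) (VP sleeps))", "(S (NP The cat) (VP sleeps"]
valid = [s for s in outputs if is_well_formed(s)]
print(f"kept {len(valid)} of {len(outputs)} outputs")  # kept 1 of 2
```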

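Likewise, the promised bootstrap intervals could follow the standard sentence-level resampling recipe; the sketch below assumes per-sentence (matched, gold, predicted) bracket counts are available, which the paper does not report:

```python
# Sketch of sentence-level bootstrap confidence intervals for bracketing
# F1, matching the 1,000-resample plan in the rebuttal. The per-sentence
# counts and the toy data below are assumptions for illustration.

import random

def corpus_f1(counts):
    """Micro-averaged F1 over (matched, gold, predicted) bracket counts."""
    m = sum(c[0] for c in counts)
    g = sum(c[1] for c in counts)
    p = sum(c[2] for c in counts)
    prec, rec = m / p, m / g
    return 2 * prec * rec / (prec + rec)

def bootstrap_ci(counts, resamples=1000, alpha=0.05, seed=0):
    """Percentile CI from resampling sentences with replacement."""
    rng = random.Random(seed)
    scores = sorted(
        corpus_f1([rng.choice(counts) for _ in counts])
        for _ in range(resamples)
    )
    return (scores[int(alpha / 2 * resamples)],
            scores[int((1 - alpha / 2) * resamples) - 1])

counts = [(9, 10, 10), (7, 8, 9), (10, 10, 11), (6, 8, 8)]  # toy data
lo, hi = bootstrap_ci(counts)
print(f"F1 = {corpus_f1(counts):.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```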
Circularity Check

0 steps flagged

No circularity in fine-tuning pipeline for seq2seq parsing

full rationale

The paper presents an empirical pipeline: initialize BART/mBART/T5, fine-tune on linearized trees, and report F1 on external treebank test sets. No equations, parameters, or derivations are defined in terms of the reported metrics. No self-citations are invoked as uniqueness theorems or to justify the core method. Results are measured against independent benchmarks, so the central claim does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that linearized trees are an adequate representation for seq2seq generation and that standard transformer fine-tuning transfers syntactic knowledge from pre-training. No new entities are postulated. Hyperparameters such as learning rate and batch size are free parameters chosen during fine-tuning.

free parameters (1)
  • fine-tuning hyperparameters
    Learning rate, batch size, and number of epochs are selected to optimize validation performance on the parsing task.
axioms (1)
  • domain assumption: Linearized parse trees preserve sufficient syntactic information for the model to learn the underlying structure via sequence generation; a round-trip check of this assumption is sketched after the ledger.
    Invoked when the authors convert trees to sequences for training and evaluation.
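
One way to probe that axiom is a round-trip check: linearize a tree, parse the string back, and compare. The sketch below uses a space-separated variant of the toy bracket format from earlier and is an assumption about the encoding, not the paper's procedure:

```python
# Round-trip sketch for the lossless-linearization assumption: a tree
# serialized to a bracket string should be recoverable from the string.
# Tokenization here is simplified (space-separated brackets).

def delinearize(tokens, i=0):
    """Parse '( LABEL child ... )' tokens back into a nested tuple."""
    assert tokens[i] == "("
    label = tokens[i + 1]
    i += 2
    children = []
    while tokens[i] != ")":
        if tokens[i] == "(":
            child, i = delinearize(tokens, i)
        else:
            child, i = tokens[i], i + 1
        children.append(child)
    return (label, *children), i + 1

s = "( S ( NP ( DT The ) ( NN cat ) ) ( VP ( VBZ sleeps ) ) )"
tree, _ = delinearize(s.split())
print(tree == ("S", ("NP", ("DT", "The"), ("NN", "cat")),
               ("VP", ("VBZ", "sleeps"))))  # True
```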

pith-pipeline@v0.9.0 · 5471 in / 1223 out tokens · 46300 ms · 2026-05-14T19:53:57.933858+00:00 · methodology


Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 2 internal anchors
