pith. machine review for the scientific record.

arxiv: 2605.13373 · v1 · submitted 2026-05-13 · 💻 cs.CL

Recognition: no theorem link

Exploiting Pre-trained Encoder-Decoder Transformers for Sequence-to-Sequence Constituent Parsing

Cristina Outeiriño Cid, Daniel Fernández-González

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 19:53 UTC · model grok-4.3

classification 💻 cs.CL
keywords constituent parsing · sequence-to-sequence · pre-trained transformers · BART · T5 · linearized trees · syntactic parsing · encoder-decoder models

The pith

Pre-trained encoder-decoder models like BART and T5, when fine-tuned to output linearized parse trees, outperform earlier sequence-to-sequence parsers and compete with specialized constituent parsers on continuous data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether standard encoder-decoder transformers can handle constituent parsing by treating it as a sequence generation task. Researchers fine-tune BART, mBART, and T5 on linearized versions of syntactic trees instead of using encoder-only models such as BERT. They compare multiple ways to turn trees into linear sequences and run the models on both standard continuous treebanks and harder discontinuous ones. The fine-tuned models exceed all previous sequence-to-sequence parsers and reach performance levels close to leading task-specific parsers when the trees are continuous. This shows that general pre-trained seq2seq architectures can acquire syntactic structure through ordinary fine-tuning on tree linearizations.
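To make the setup concrete, here is a minimal sketch of the tree-to-sequence conversion the approach depends on. The bracket convention and the toy tree are illustrative assumptions, not the paper's exact linearization formats:

```python
# Minimal sketch of one common linearization: top-down bracketing of a
# constituent tree. The paper compares several formats; this toy variant
# only illustrates the tree -> token-sequence conversion.

from typing import Union

# A tree is a (label, child, ...) tuple; terminals are plain strings.
Tree = Union[tuple, str]

def linearize(tree: Tree) -> str:
    """Flatten a constituent tree into a bracketed token string."""
    if isinstance(tree, str):  # terminal word
        return tree
    label, *children = tree
    return f"({label} " + " ".join(linearize(c) for c in children) + ")"

# "The cat sleeps" with a simple NP/VP analysis.
tree = ("S", ("NP", ("DT", "The"), ("NN", "cat")), ("VP", ("VBZ", "sleeps")))
print(linearize(tree))  # (S (NP (DT The) (NN cat)) (VP (VBZ sleeps)))
```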

Core claim

Initializing a sequence-to-sequence parser with a pre-trained encoder-decoder model such as BART, mBART or T5 and fine-tuning it to generate linearized constituent trees produces better results than any earlier sequence-to-sequence parser and reaches competitive accuracy with the best task-specific constituent parsers on continuous treebanks.

What carries the argument

Fine-tuning pre-trained encoder-decoder transformers to generate linearized constituent parse trees from input sentences.
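As a rough illustration of what that fine-tuning looks like in practice, the sketch below uses the Hugging Face transformers API; the checkpoint, target format, and hyperparameters are assumptions for demonstration, not the paper's reported setup:

```python
# Sketch of one fine-tuning step for a seq2seq constituent parser,
# assuming the Hugging Face transformers library (>= 4.21 for
# text_target). Checkpoint and hyperparameters are illustrative only.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

sentence = "The cat sleeps"
target = "(S (NP (DT The) (NN cat)) (VP (VBZ sleeps)))"  # linearized tree

inputs = tokenizer(sentence, return_tensors="pt")
labels = tokenizer(text_target=target, return_tensors="pt").input_ids

# Standard teacher-forced cross-entropy over the linearized-tree tokens.
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()

# At inference time, parsing is ordinary decoding.
pred_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(pred_ids[0], skip_special_tokens=True))
```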

If this is right

  • Sequence-to-sequence parsing can now draw directly on general-purpose pre-trained encoder-decoder models rather than requiring custom encoder-only initializations.
  • Performance varies with the choice of linearization strategy, with some formats working better for continuous trees than for discontinuous ones.
  • The same fine-tuning recipe applies across languages when mBART is used, opening the method to multilingual treebanks.
  • No architectural changes beyond standard fine-tuning are needed to reach competitive continuous parsing accuracy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The result suggests that syntactic information is already latent in the pre-training objectives of large encoder-decoder models and does not require separate syntactic pre-training.
  • Developers of new structured-prediction systems could adopt the same fine-tuning pattern for tasks such as semantic role labeling or discourse parsing.
  • If linearization methods improve, the same models might close the remaining gap on discontinuous parsing without new architectures.
  • The approach lowers the barrier to building parsers, because only a standard seq2seq training loop is required.

Load-bearing premise

Standard fine-tuning of encoder-decoder models on linearized trees is enough to capture the full syntactic structure without extra task-specific mechanisms.

What would settle it

A replication on the same continuous benchmarks that finds the fine-tuned BART or T5 models fall substantially below the accuracy of leading task-specific parsers would show the competitiveness claim does not hold.

Figures

Figures reproduced from arXiv: 2605.13373 by Cristina Outeiriño Cid, Daniel Fernández-González.

Figure 1. Examples of continuous (a) and discontinuous (b) constituent trees, and different …
Figure 2. Transition sequence used by a transition-based parser to construct the discon…
Figure 3. Transition sequence used by a transition-based parser to construct the discon…
Figure 4. F-score achieved by continuous linearizations on BART.
Figure 5. F-score of the lexicalized in-order linearization across pre-trained en…
Figure 6. F-score of lexicalized and non-lexicalized discontinuous linearizations on …
Original abstract

To achieve deep natural language understanding, syntactic constituent parsing plays a crucial role and is widely required by many artificial intelligence systems for processing both text and speech. A recent approach involves using standard sequence-to-sequence models to handle constituent parsing as a machine translation problem, moving away from traditional task-specific parsers. These models are typically initialized with pre-trained encoder-only language models like BERT or RoBERTa. However, the use of pre-trained encoder-decoder language models for constituency parsing has not been thoroughly explored. To bridge this gap, we extend the sequence-to-sequence framework by investigating parsers built on pre-trained encoder-decoder architectures, including BART, mBART, and T5. We fine-tune them to generate linearized parse trees and extensively evaluate them on different linearization strategies across both continuous treebanks and more complex discontinuous benchmarks. Our results demonstrate that our approach outperforms all prior sequence-to-sequence models and performs competitively with leading task-specific constituent parsers on continuous constituent parsing.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, simulated authors' rebuttal, circularity audit, and an axiom-and-free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper frames constituent parsing as a seq2seq task and fine-tunes pre-trained encoder-decoder models (BART, mBART, T5) to generate linearized trees. It evaluates multiple linearization strategies on continuous and discontinuous treebanks and claims that the resulting parsers outperform all prior seq2seq models while remaining competitive with leading task-specific parsers on continuous data.

Significance. If the performance claims are shown to rest on valid tree outputs, the work would establish that standard fine-tuning of general encoder-decoder transformers is sufficient for high-quality constituent parsing, reducing the need for bespoke architectures and extending the reach of transfer learning to structured prediction.

major comments (2)
  1. [Abstract / Experimental Setup] No mechanism (constrained decoding, a validity filter, or post-processing) is described to guarantee that generated strings are well-formed trees. Because seq2seq generation can produce mismatched or unbalanced brackets, the reported F1 scores and the claim of outperforming prior seq2seq parsers cannot be evaluated without evidence that invalid outputs are negligible.
  2. [Results] The manuscript provides no statistical significance tests, confidence intervals, or error analysis against prior seq2seq baselines. Without these, the claim of competitiveness with task-specific parsers on continuous treebanks is only moderately supported.
minor comments (1)
  1. [Method] Clarify the exact linearization format used for each model with a short example in the method section; the current description leaves the bracket conventions and non-terminal ordering ambiguous.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We will revise the manuscript to address the concerns about ensuring well-formed tree outputs and adding statistical analysis.

read point-by-point responses
  1. Referee: [Abstract / Experimental Setup] No mechanism (constrained decoding, a validity filter, or post-processing) is described to guarantee that generated strings are well-formed trees. Because seq2seq generation can produce mismatched or unbalanced brackets, the reported F1 scores and the claim of outperforming prior seq2seq parsers cannot be evaluated without evidence that invalid outputs are negligible.

    Authors: We agree that the current version does not describe any mechanism for guaranteeing well-formed trees. In practice our fine-tuned models produced very few invalid bracket sequences, but this was neither quantified nor explained. We will revise the experimental setup section to include a post-processing validity filter that discards mismatched bracket strings (a minimal sketch of such a filter appears after these responses), report the exact percentage of invalid outputs (observed to be under 1% on all treebanks), and confirm that all reported F1 scores are computed only on valid trees. This addition directly supports the reliability of the performance claims. Revision: yes.

  2. Referee: [Results] The manuscript provides no statistical significance tests, confidence intervals, or error analysis against prior seq2seq baselines. Without these, the claim of competitiveness with task-specific parsers on continuous treebanks is only moderately supported.

    Authors: We acknowledge that the manuscript lacks statistical significance testing and confidence intervals. We will add bootstrap confidence intervals (1,000 resamples; a sketch of the procedure appears below) for all reported F1 scores and paired significance tests (McNemar's test) against the strongest prior seq2seq baselines. These results, together with a brief error analysis highlighting the main remaining error types, will be added to the results section to strengthen the competitiveness claims on continuous treebanks. Revision: yes.
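The rebuttal's validity filter is not specified beyond "discards mismatched bracket strings"; under that assumption, a minimal sketch of such a check might look like this:

```python
# Hypothetical post-processing validity filter in the spirit of the
# rebuttal: keep only generated strings whose brackets balance and form
# a single rooted tree. The authors' actual filter is not described.

def is_well_formed(linearized: str) -> bool:
    """True iff parentheses balance and enclose exactly one tree."""
    depth = 0
    closed_root = False
    for ch in linearized:
        if ch == "(":
            if closed_root:      # a second tree after the first closed
                return False
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:        # closing bracket with no match
                return False
            if depth == 0:
                closed_root = True
    return depth == 0 and closed_root

outputs = ["(S (NP The cat) (VP sleeps))", "(S (NP The cat) (VP sleeps"]
valid = [s for s in outputs if is_well_formed(s)]
print(f"kept {len(valid)} of {len(outputs)} outputs")  # kept 1 of 2
```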

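Likewise, the promised bootstrap intervals could follow the standard sentence-level resampling recipe; the sketch below assumes per-sentence (matched, gold, predicted) bracket counts are available, which the paper does not report:

```python
# Sketch of sentence-level bootstrap confidence intervals for bracketing
# F1, matching the 1,000-resample plan in the rebuttal. The per-sentence
# counts and the toy data below are assumptions for illustration.

import random

def corpus_f1(counts):
    """Micro-averaged F1 over (matched, gold, predicted) bracket counts."""
    m = sum(c[0] for c in counts)
    g = sum(c[1] for c in counts)
    p = sum(c[2] for c in counts)
    prec, rec = m / p, m / g
    return 2 * prec * rec / (prec + rec)

def bootstrap_ci(counts, resamples=1000, alpha=0.05, seed=0):
    """Percentile CI from resampling sentences with replacement."""
    rng = random.Random(seed)
    scores = sorted(
        corpus_f1([rng.choice(counts) for _ in counts])
        for _ in range(resamples)
    )
    return (scores[int(alpha / 2 * resamples)],
            scores[int((1 - alpha / 2) * resamples) - 1])

counts = [(9, 10, 10), (7, 8, 9), (10, 10, 11), (6, 8, 8)]  # toy data
lo, hi = bootstrap_ci(counts)
print(f"F1 = {corpus_f1(counts):.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```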
Circularity Check

0 steps flagged

No circularity in fine-tuning pipeline for seq2seq parsing

full rationale

The paper presents an empirical pipeline: initialize BART/mBART/T5, fine-tune on linearized trees, and report F1 on external treebank test sets. No equations, parameters, or derivations are defined in terms of the reported metrics. No self-citations are invoked as uniqueness theorems or to justify the core method. Results are measured against independent benchmarks, so the central claim does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that linearized trees are an adequate representation for seq2seq generation and that standard transformer fine-tuning transfers syntactic knowledge from pre-training. No new entities are postulated. Hyperparameters such as learning rate and batch size are free parameters chosen during fine-tuning.

free parameters (1)
  • fine-tuning hyperparameters
    Learning rate, batch size, and number of epochs are selected to optimize validation performance on the parsing task.
axioms (1)
  • domain assumption: Linearized parse trees preserve sufficient syntactic information for the model to learn the underlying structure via sequence generation; a round-trip check of this assumption is sketched after the ledger.
    Invoked when the authors convert trees to sequences for training and evaluation.
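
One way to probe that axiom is a round-trip check: linearize a tree, parse the string back, and compare. The sketch below uses a space-separated variant of the toy bracket format from earlier and is an assumption about the encoding, not the paper's procedure:

```python
# Round-trip sketch for the lossless-linearization assumption: a tree
# serialized to a bracket string should be recoverable from the string.
# Tokenization here is simplified (space-separated brackets).

def delinearize(tokens, i=0):
    """Parse '( LABEL child ... )' tokens back into a nested tuple."""
    assert tokens[i] == "("
    label = tokens[i + 1]
    i += 2
    children = []
    while tokens[i] != ")":
        if tokens[i] == "(":
            child, i = delinearize(tokens, i)
        else:
            child, i = tokens[i], i + 1
        children.append(child)
    return (label, *children), i + 1

s = "( S ( NP ( DT The ) ( NN cat ) ) ( VP ( VBZ sleeps ) ) )"
tree, _ = delinearize(s.split())
print(tree == ("S", ("NP", ("DT", "The"), ("NN", "cat")),
               ("VP", ("VBZ", "sleeps"))))  # True
```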

pith-pipeline@v0.9.0 · 5471 in / 1223 out tokens · 46300 ms · 2026-05-14T19:53:57.933858+00:00 · methodology


Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 2 internal anchors
