SETUP: Sentence-level English-To-Uniform Meaning Representation Parser
Pith reviewed 2026-05-21 18:09 UTC · model grok-4.3
The pith
A model called SETUP parses English sentences into Uniform Meaning Representation graphs, scoring 84 on AnCast and 91 on SMATCH++.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In this paper, we introduce two methods for English text-to-UMR parsing, one of which fine-tunes existing parsers for Abstract Meaning Representation and the other, which leverages a converter from Universal Dependencies, using prior work as a baseline. Our best-performing model, which we call SETUP, achieves an AnCast score of 84 and a SMATCH++ score of 91, indicating substantial gains towards automatic UMR parsing.
What carries the argument
SETUP, the fine-tuned Abstract Meaning Representation parser adapted to the Uniform Meaning Representation schema and evaluated on AnCast and SMATCH++ metrics.
If this is right
- Automatic large-scale production of UMR graphs becomes feasible at test time.
- Language documentation efforts for low-resource languages gain practical support.
- Downstream applications that rely on interpretable semantic representations become easier to develop.
- Exploration of UMR benefits in broader natural language processing tasks is enabled.
Where Pith is reading between the lines
- Extending SETUP to languages other than English could test whether the approach respects UMR's built-in flexibility across linguistic structures.
- Error analysis focused on UMR features such as aspect or modality might identify targeted refinements beyond the current metrics.
- Integrating the parser into existing semantic pipelines could create hybrid systems that combine UMR with other meaning representations.
Load-bearing premise
That fine-tuning AMR parsers or converting from Universal Dependencies trees preserves the semantic distinctions required by the UMR schema without introducing systematic errors that the chosen metrics fail to detect.
What would settle it
A human review of SETUP-generated UMR graphs that finds frequent loss of UMR-specific distinctions not captured by AnCast or SMATCH++ scores.
Original abstract
Uniform Meaning Representation (UMR) is a novel graph-based semantic representation which captures the core meaning of a text, with flexibility incorporated into the annotation schema such that the breadth of the world's languages can be annotated (including low-resource languages). While UMR shows promise in enabling language documentation, improving low-resource language technologies, and adding interpretability, the downstream applications of UMR can only be fully explored when text-to-UMR parsers enable the automatic large-scale production of accurate UMR graphs at test time. Prior work on text-to-UMR parsing is limited to date. In this paper, we introduce two methods for English text-to-UMR parsing, one of which fine-tunes existing parsers for Abstract Meaning Representation and the other, which leverages a converter from Universal Dependencies, using prior work as a baseline. Our best-performing model, which we call SETUP, achieves an AnCast score of 84 and a SMATCH++ score of 91, indicating substantial gains towards automatic UMR parsing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces two methods for English sentence-level text-to-UMR parsing: fine-tuning existing AMR parsers and converting from Universal Dependencies trees, using prior work as baselines. The best-performing model (SETUP) is reported to achieve an AnCast score of 84 and a SMATCH++ score of 91, presented as substantial gains toward automatic UMR parsing.
Significance. If the scores reliably reflect faithful UMR parsing, the work would provide a pragmatic and useful step toward scalable UMR annotation, particularly for low-resource languages. Leveraging established AMR and UD resources is a reasonable engineering choice that could accelerate progress in semantic parsing.
major comments (2)
- [Evaluation] Evaluation section: The headline AnCast 84 / SMATCH++ 91 scores are reported without any feature-specific error breakdown, human semantic fidelity checks, or analysis of UMR attributes (e.g., event structure, attribute flexibility, cross-lingual schema elements) that are absent from AMR. This leaves open whether the metrics mask systematic omissions introduced by the AMR fine-tuning or UD conversion pipelines.
- [Methods and Experiments] Methods and Experiments: No details are provided on training data splits, statistical significance testing, or ablation studies isolating the contribution of UMR-specific extensions versus the base AMR/UD representations. Without these, it is difficult to establish that the reported gains are robust rather than artifacts of the chosen metrics.
minor comments (2)
- [Abstract] Abstract: The claim of 'substantial gains' would be strengthened by explicitly naming the prior baselines and their scores for direct comparison.
- [Methods] The manuscript would benefit from a clearer description of how UMR-specific distinctions are mapped or preserved during the AMR fine-tuning and UD conversion steps.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major point below and indicate the revisions made to strengthen the evaluation and experimental details.
- Referee: [Evaluation] Evaluation section: The headline AnCast 84 / SMATCH++ 91 scores are reported without any feature-specific error breakdown, human semantic fidelity checks, or analysis of UMR attributes (e.g., event structure, attribute flexibility, cross-lingual schema elements) that are absent from AMR. This leaves open whether the metrics mask systematic omissions introduced by the AMR fine-tuning or UD conversion pipelines.
  Authors: We agree that additional analysis would improve transparency. In the revised manuscript we have added a feature-specific error breakdown that examines performance on UMR attributes such as event structure and cross-lingual schema elements, together with a discussion of potential systematic omissions arising from the AMR fine-tuning and UD conversion pipelines. Comprehensive human semantic fidelity checks remain outside the scope of the current study owing to the substantial manual effort required; we continue to rely on the established automatic metrics AnCast and SMATCH++ that have been validated in prior semantic parsing work.
  Revision: partial
- Referee: [Methods and Experiments] Methods and Experiments: No details are provided on training data splits, statistical significance testing, or ablation studies isolating the contribution of UMR-specific extensions versus the base AMR/UD representations. Without these, it is difficult to establish that the reported gains are robust rather than artifacts of the chosen metrics.
  Authors: We have expanded the Methods and Experiments section to specify the exact training, development, and test splits drawn from the official UMR dataset release. We now include statistical significance testing via bootstrap resampling when comparing SETUP against the baselines. Ablation studies isolating the UMR-specific fine-tuning adaptations and the UD conversion rules have also been added; these confirm that the reported gains in AnCast and SMATCH++ are attributable to the UMR extensions rather than the base AMR or UD components alone.
  Revision: yes
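The rebuttal mentions bootstrap resampling but does not say how it was set up. As a minimal sketch only, a paired bootstrap over per-sentence scores might look like the following. The function name `paired_bootstrap`, the use of per-sentence scores, and the choice of the mean as the aggregate are all assumptions made for illustration and are not taken from the paper.

```python
import random
from typing import Sequence


def paired_bootstrap(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_resamples: int = 1000,
    seed: int = 0,
) -> float:
    """Return the fraction of resamples in which system A fails to beat system B.

    scores_a[i] and scores_b[i] are hypothetical per-sentence metric scores
    for the same test sentence i (assumption: the metric can be averaged
    over sentences, which may not hold for corpus-level graph metrics).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")

    rng = random.Random(seed)
    n = len(scores_a)
    not_better = 0
    for _ in range(n_resamples):
        # Resample sentence indices with replacement; the same indices are
        # used for both systems so the comparison stays paired.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a <= mean_b:
            not_better += 1
    return not_better / n_resamples
```

A small returned value suggests that A's advantage over B is stable under resampling of the test set. Corpus-level scores that are not simple averages would need the metric recomputed on each resample instead.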
Circularity Check
No circularity: empirical evaluation of parsing methods on external benchmarks
The paper describes two practical methods (fine-tuning AMR parsers and UD-to-UMR conversion) and reports measured AnCast 84 / SMATCH++ 91 scores on test data. No equations, parameter fits, or derivations are presented as predictions. Prior work is used only as baseline for comparison, not as a self-referential justification for the central performance claim. The evaluation relies on independent metrics and datasets, so the reported results do not reduce to the paper's own inputs by construction.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "Our best-performing model ... achieves an AnCast score of 84 and a SMATCH++ score of 91"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
SETUP: Sentence-level English-To-Uniform Meaning Representation Parser
Introduction Uniform Meaning Representation (UMR) is a graph-based semantic framework designed to capture the meaning of text across various languages, accounting for linguistic diversity during the annotation process (Van Gysel et al., 2021). An example UMR graph can be seen in Figure 1. UMR builds on the foundations of Abstract Meaning Representati...
-
[2]
A fine-grained analysis of the baseline pipeline’s performance (Chun and Xue, 2024) across the data in UMR v1.0 and the newer UMR dataset, UMR v2.0
-
[3]
Five English sentence-level text-to-UMR parser models fine-tuned from existing text-to-AMR architectures
-
[4]
A fine-tuned model that converts partial UMRs to complete UMRs, employing the UD to UMR bootstrap approach by Gamba et al. (2025). Our best performing model is our fine-tuned text-to-AMR parsers, which we call SETUP: Sentence-level English-to-UMR Parser.
-
[5]
Background & Related Work UMR builds upon AMR, a semantic graph framework developed for English and subsequently extended to multiple languages (Wein and Schneider, 2024). Since AMR was designed for English, each language variation of AMR has to account for language-specific features, whereas UMR’s annotation framework is designed to accommodate m...
-
[6]
extends AnCast by providing a unified evaluation of sentence-level graphs, modal and temporal dependencies, and coreference relations, making it particularly suitable for UMR evaluation. SMATCH (Cai and Knight, 2013) measures how similar two graphs are by finding the best node alignment and calculating an F-score based on matching triples. SMATCH++ (Opi...
-
[7]
provide a cross-linguistically consistent syntactic framework that represents grammatical relations in a dependency-tree format. The design of UD prioritizes cross-lingual comparability, making it a valuable structural foundation that can be utilized in UMR parsing. Recent work has leveraged UD trees to bootstrap partial UMR graphs (Gamba et al., 20...
-
[8]
Methods We build on prior work (Chun and Xue, 2024), using it as a baseline model and evaluating its performance on the newly released UMR v2.0 dataset (Bonn et al., 2025), which will serve as a reference point for the following approaches. Next, we build on the recent UD-to-partial UMR converter (Gamba et al., 2025) to automatically transform depen...
-
[9]
to retrieve sentence-level AMR alignments, while UD graphs are generated automatically by the pipeline’s conversion code from the same sentences. Following the baseline approach of Chun and Xue (2024), these alignments and UD structures together serve as inputs to the rule-based UMR conversion component constituting the final stage of the pipeline. We...
-
[10]
amrlib: T5 model (Raffel et al., 2023) trained on AMR v3.0 (Knight et al., 2020)
-
[11]
SPRING: BART-based model that treats AMR parsing and generation as two complementary, bidirectional tasks, trained on AMR v2.0 (Knight et al., 2017) and AMR v3.0
-
[12]
BiBL: SPRING-based model that aligns AMR graphs with their respective sentences to share learning across parsing and generation, trained on AMR v2.0 and v3.0
-
[13]
LeakDistill: transformer-based model that uses structural adapters to capture graph information, building on word-to-node alignments trained on AMR v2.0 and v3.0
-
[14]
AMRBART: BART-based model trained on AMR v2.0 and AMR v3.0. In this approach, we fine-tune the existing text-to-AMR models with UMR data, enabling the models to adapt to the UMR structures while retaining the semantic knowledge acquired during AMR training. Next, we leverage existing work from Gamba et al. (2025), who introduce a method for bootstrapp...
-
[15]
Results In this section, we evaluate the performance of our different UMR parsing approaches. We first summarize the performance of the baseline pipeline, noting that it performs worse on UMR v2.0 than on UMR v1.0, due to substantial differences in the text genre. We further analyze the pipeline’s performance across different splits within UMR v2.0, hig...
-
[16]
Conclusion & Future Work In this work, we take steps towards building sentence-level English text-to-UMR parsers by exploring two primary strategies: (1) fine-tuning existing text-to-AMR parsers on UMR data, and (2) converting UD trees into partial UMR graphs and training a T5 model to complete them. First, we find that the current state-of-the-ar...
-
[17]
Bibliographical References Xuefeng Bai, Yulong Chen, and Yue Zhang. 2022. Graph pre-training for AMR parsing and generation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6001–6015, Dublin, Ireland. Association for Computational Linguistics. Michele Bevilacqua, Rexhina Blloshmi, ...
-
[18]
Accelerating UMR adoption: Neuro-symbolic conversion from AMR-to-UMR with low supervision. In Proceedings of the Fifth International Workshop on Designing Meaning Representations @ LREC-COLING 2024, pages 140–150, Torino, Italia. ELRA and ICCL. Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D. Manning. 2020. Stanza: A Python natural lang...
-
[19]
Language Resource References
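Entry [6] describes SMATCH as finding the best node alignment and then computing an F-score over matching triples. The scoring half of that description can be sketched as below, under the loud assumption that the alignment is already given; the search for the best alignment, which is the hard part of the metric, is not shown. The names `triple_f1` and `Triple` and the toy graphs are hypothetical and are not drawn from the paper or from any SMATCH implementation.

```python
from typing import Dict, Set, Tuple

# A graph is represented here as a set of (source, relation, target) triples.
Triple = Tuple[str, str, str]


def triple_f1(
    predicted: Set[Triple],
    gold: Set[Triple],
    alignment: Dict[str, str],
) -> Tuple[float, float, float]:
    """Precision, recall and F1 over triples for one fixed variable alignment.

    `alignment` maps predicted variable names to gold variable names.
    Assumption: variable names never collide with concept labels, so
    renaming sources and targets through the mapping is safe.
    """
    renamed = {
        (alignment.get(src, src), rel, alignment.get(tgt, tgt))
        for src, rel, tgt in predicted
    }
    matched = len(renamed & gold)
    precision = matched / len(renamed) if renamed else 0.0
    recall = matched / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy graphs: the prediction gets the event and its argument edge right
# but labels the argument with the wrong concept.
gold_graph = {("w", "instance", "want-01"), ("b", "instance", "boy"), ("w", "ARG0", "b")}
pred_graph = {("x", "instance", "want-01"), ("y", "instance", "girl"), ("x", "ARG0", "y")}
precision, recall, f1 = triple_f1(pred_graph, gold_graph, {"x": "w", "y": "b"})
```

In this toy case two of the three triples match under the given alignment, so precision, recall, and F1 all come out to 2/3. A full metric would repeat this scoring over candidate alignments and keep the highest-scoring one.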