HULAT2 at MER-TRANS 2026: Governed Multi-Agent Simplification for Spanish Easy-to-Read Generation

Lourdes Moreno; Marco Antonio Sanchez-Escudero; Miguel Dom\'inguez-G\'omez; Paloma Mart\'inez

arxiv: 2607.02381 · v1 · pith:CB7MT6TEnew · submitted 2026-07-02 · 💻 cs.CL

HULAT2 at MER-TRANS 2026: Governed Multi-Agent Simplification for Spanish Easy-to-Read Generation

Lourdes Moreno , Paloma Mart\'inez , Marco Antonio Sanchez-Escudero , Miguel Dom\'inguez-G\'omez This is my paper

Pith reviewed 2026-07-03 14:32 UTC · model grok-4.3

classification 💻 cs.CL

keywords Easy-to-Read generationmulti-agent workflowtext simplificationSpanish machine translationshared taskquality signalsLangGraph

0 comments

The pith

Signal-guided multi-agent routing outperforms linear regeneration for Spanish Easy-to-Read generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports three automatic runs submitted to the Spanish track of the MER-TRANS 2026 shared task on Easy-to-Read translation. Two runs employ a LangGraph multi-agent workflow that combines Gemini 2.5 Flash and RigoChat-7B-v2, uses parallel generation, and routes edits via Event-Condition-Action logic driven by five internal quality signals. The third run is a linear generate-evaluate-regenerate baseline using only RigoChat with prompt engineering and LoRA adaptation. On the official SARI metric, the base multi-agent run scores 44.0543, the version with added lexical support scores 43.1049, and the baseline scores 38.5136. The authors conclude that the signal-guided routing produced higher reference-based scores than the linear baseline and that the lexical layer did not yield further gains.

Core claim

In the Spanish track of MER-TRANS 2026, the signal-guided multi-agent workflow achieved the highest SARI score of 44.0543 among the team's submissions and outperformed the linear regeneration baseline at 38.5136, while addition of a lexical-support layer produced a lower score of 43.1049.

What carries the argument

Event-Condition-Action routing driven by internal quality signals for semantic fidelity, readability, lexical simplicity, syntactic clarity and factual consistency.

If this is right

Signal-guided multi-agent routing yields higher SARI than linear regeneration in this Easy-to-Read setting.
An added lexical-support layer based on glossaries does not raise reference-based scores.
Traceable decisions from the routing allow controlled editing during generation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Segment-level and document-level breakdowns could show whether the SARI gains correspond to improved readability for target readers.
The same routing logic could be applied to other languages in the shared task to test cross-lingual consistency.
Human adequacy ratings would be needed to check whether higher automatic scores reflect better factual consistency and user suitability.

Load-bearing premise

The five internal quality signals serve as reliable proxies for the official SARI and BERTScore metrics.

What would settle it

A measured correlation near zero between the internal signals and official SARI scores on the same test segments would show that the routing decisions do not track the leaderboard metric.

Figures

Figures reproduced from arXiv: 2607.02381 by Lourdes Moreno, Marco Antonio Sanchez-Escudero, Miguel Dom\'inguez-G\'omez, Paloma Mart\'inez.

**Figure 1.** Figure 1: LangGraph-based multi-agent workflow used for RUN1 and RUN2 in the MER-TRANS 2026 Spanish submissions. 3.3. Multi-Agent Workflow RUN1 and RUN2 were generated with the same LangGraph-based multi-agent architecture. LangGraph was used to implement a stateful workflow with specialised nodes, conditional routing, retry loops and traceable intermediate decisions [34]. The system combines Gemini 2.5 Flash [35] a… view at source ↗

read the original abstract

This paper describes the participation of HULAT2-UC3M in the Spanish track of MER-TRANS 2026, a shared task on multilingual Easy-to-Read translation. Three fully automatic Spanish runs were submitted. RUN1 and RUN2 used a LangGraph-based multi-agent workflow combining Gemini 2.5 Flash and RigoChat-7B-v2, parallel generation strategies, internal quality signals, Event-Condition-Action routing, controlled editing and traceable decisions. RUN1 used the base workflow, while RUN2 activated an additional lexical-support layer based on a glossary and lexical resources. RUN3 was a RigoChat-based generate-evaluate-regenerate baseline with prompt engineering and LoRA-based adaptation. The official leaderboard reports BLEU-Orig, BLEU-Gold, SARI and BERTScore. During development, additional internal signals were also inspected, including semantic fidelity, readability, lexical simplicity, syntactic clarity and factual consistency. According to official SARI, RUN1 was the best HULAT2 run, with 44.0543 points, followed by RUN2 with 43.1049 and RUN3 with 38.5136. These results indicate that, in this task setting, signal-guided multi-agent routing outperformed the linear regeneration baseline. They also show that adding lexical support did not automatically improve reference-based scores. Further segment-level and document-level analysis are required to assess readability, factual consistency and user-oriented adequacy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The SARI edge for the multi-agent runs is probably from including Gemini rather than from the routing or signals, since the baseline sticks to the weaker model only.

read the letter

The main thing here is that the reported SARI win for the multi-agent system is hard to credit to the routing because the better runs include Gemini 2.5 Flash while the baseline does not.

This is a participation report for the Spanish track of MER-TRANS 2026 on Easy-to-Read generation. The authors describe a LangGraph workflow that combines Gemini and RigoChat-7B-v2 with parallel generation, internal quality signals like semantic fidelity and factual consistency, and Event-Condition-Action routing for decisions. They submitted three runs, with RUN1 as the full multi-agent setup, RUN2 adding glossary lookup, and RUN3 as a simpler RigoChat generate-evaluate-regenerate baseline with LoRA. The official leaderboard shows RUN1 at 44.0543 SARI, RUN2 at 43.1049, and RUN3 at 38.5136.

The paper does a solid job of documenting the system components and the leaderboard outcomes without overinterpreting them. It also flags that the lexical support layer did not raise the reference-based scores.

The soft spot is the absence of any ablation that holds the model fixed, which leaves the performance delta open to the alternative explanation that Gemini simply does better on this task. There are no error bars, statistical comparisons, or segment-level results mentioned, and the abstract gives no evidence that the internal signals actually predict the official metrics.

This paper is mainly for other teams in the same shared task who want to see what a multi-agent entry looked like. It has little to offer readers outside that setting.

I would not bring it to reading group or cite it. It does not merit sending out for peer review in this form.

Referee Report

2 major / 2 minor

Summary. The manuscript reports HULAT2's participation in the Spanish track of the MER-TRANS 2026 shared task on Easy-to-Read translation. It describes three runs: RUN1 and RUN2 implement a LangGraph multi-agent workflow using Gemini 2.5 Flash and RigoChat-7B-v2 with parallel generation, internal quality signals (semantic fidelity, readability, lexical simplicity, syntactic clarity, factual consistency), Event-Condition-Action routing, and controlled editing; RUN2 adds a lexical-support glossary layer. RUN3 is a linear generate-evaluate-regenerate baseline using only RigoChat-7B-v2 with LoRA adaptation. Official leaderboard results are reported: RUN1 achieves the highest SARI (44.0543), followed by RUN2 (43.1049) and RUN3 (38.5136). The authors conclude that signal-guided multi-agent routing outperformed the linear baseline and that lexical support did not improve reference-based scores.

Significance. The work contributes a detailed system description and official shared-task scores for a governed multi-agent approach to Spanish text simplification. If the SARI advantage can be isolated to the routing mechanism, it would support the value of internal quality signals and traceable multi-agent control in simplification pipelines. The explicit comparison of multiple runs and the use of official metrics are strengths. However, the absence of controlled ablations limits the ability to draw firm conclusions about the routing component.

major comments (2)

[Results] Results: The SARI difference between RUN1 (44.0543) and RUN3 (38.5136) is presented as evidence that signal-guided multi-agent routing outperformed the linear regeneration baseline, yet RUN1 combines Gemini 2.5 Flash with RigoChat-7B-v2 while RUN3 uses only RigoChat-7B-v2 with LoRA; no ablation holding the underlying LLMs fixed is reported, so the performance delta cannot be attributed to the Event-Condition-Action routing, parallel generation, or internal signals rather than model capability.
[Methods] Methods: The Event-Condition-Action routing decisions depend on internal quality signals whose correlation with the official SARI and BERTScore metrics is not validated or quantified anywhere in the manuscript, leaving open whether these signals are reliable proxies for the leaderboard outcomes.

minor comments (2)

[Abstract] Abstract and Results: No error bars, segment-level breakdowns, or statistical tests are provided for the reported score differences, which weakens the strength of the outperformance claim.
[Conclusion] The manuscript states that 'further segment-level and document-level analysis are required' but does not include any such analysis.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on result attribution and signal validation. We respond point by point below.

read point-by-point responses

Referee: [Results] Results: The SARI difference between RUN1 (44.0543) and RUN3 (38.5136) is presented as evidence that signal-guided multi-agent routing outperformed the linear regeneration baseline, yet RUN1 combines Gemini 2.5 Flash with RigoChat-7B-v2 while RUN3 uses only RigoChat-7B-v2 with LoRA; no ablation holding the underlying LLMs fixed is reported, so the performance delta cannot be attributed to the Event-Condition-Action routing, parallel generation, or internal signals rather than model capability.

Authors: We agree that the SARI difference cannot be attributed solely to the routing, parallel generation, or internal signals, as RUN1 and RUN3 employ different underlying models. The manuscript will be revised to clarify this limitation and rephrase the conclusion to state that the multi-agent workflow with mixed models achieved the highest score among our submissions, without claiming isolation of the routing effect. revision: yes
Referee: [Methods] Methods: The Event-Condition-Action routing decisions depend on internal quality signals whose correlation with the official SARI and BERTScore metrics is not validated or quantified anywhere in the manuscript, leaving open whether these signals are reliable proxies for the leaderboard outcomes.

Authors: The manuscript describes the internal signals and their use in routing but provides no quantitative correlation with official metrics. We will revise to explicitly note this as a limitation and add a brief discussion of how the signals were inspected during development, without claiming they are validated proxies. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical shared-task reporting

full rationale

The paper reports participation in the MER-TRANS 2026 shared task by describing three runs (RUN1/RUN2 multi-agent workflows vs RUN3 linear baseline) and citing official leaderboard scores (SARI 44.0543 vs 38.5136). No equations, derivations, fitted parameters presented as predictions, or self-citation chains appear. The central claim rests on external competition metrics rather than any internal reduction to inputs or ansatzes. This is self-contained empirical reporting against an external benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivation or theoretical model; the work is an empirical system description for a competition. No free parameters, axioms, or invented entities are introduced.

pith-pipeline@v0.9.1-grok · 5818 in / 1080 out tokens · 22445 ms · 2026-07-03T14:32:33.210757+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

41 extracted references · 21 canonical work pages

[1]

International Organization for Standardization, ISO 21801-1:2020 Cognitive accessibility – Part 1: General guidelines, https://www.iso.org/standard/71711.html, 2020

2020
[2]

International Organization for Standardization, ISO 24495-1:2023 Plain language – Part 1: Govern- ing principles and guidelines, https://www.iso.org/standard/78907.html, 2023

2023
[3]

Lectura Fácil

Asociación Española de Normalización, UNE 153101:2018 EX. Lectura Fácil. Pautas y recomenda- ciones para la elaboración de documentos, https://www.une.org/, 2018

2018
[4]

Nomura, G

M. Nomura, G. S. Nielsen, B. Tronbacke, Guidelines for easy-to-read materials, https://www.ifla. org/wp-content/uploads/2019/05/assets/hq/publications/professional-report/120.pdf, 2010

2019
[5]

Inclusion Europe, Information for all: European standards for making information easy to read and understand, https://www.inclusion-europe.eu/easy-to-read-standards-guidelines/, 2009

2009
[6]

Saggion, Automatic Text Simplification, Synthesis Lectures on Human Language Technologies, Springer Cham, 2017

H. Saggion, Automatic Text Simplification, Synthesis Lectures on Human Language Technologies, Springer Cham, 2017. URL: https://link.springer.com/book/10.1007/978-3-031-02166-4. doi: 10. 1007/978-3-031-02166-4

work page doi:10.1007/978-3-031-02166-4 2017
[7]

Nisioi, S

S. Nisioi, S. Štajner, S. P. Ponzetto, L. P. Dinu, Exploring neural text simplification models, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Vancouver, Canada, 2017, pp. 85–91. URL: https: //aclanthology.org/P17-2014/. doi:10.18653/v1/P17-2014

work page doi:10.18653/v1/p17-2014 2017
[8]

Martínez, A

P. Martínez, A. Ramos, L. Moreno, Exploring large language models to generate easy to read content, Frontiers in Computer Science 6 (2024) 1394705. URL: https://doi.org/10.3389/fcomp.2024.1394705. doi:10.3389/fcomp.2024.1394705

work page doi:10.3389/fcomp.2024.1394705 2024
[9]

Jan Melechovsky, Abhinaba Roy, and Dorien Herremans

K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, BLEU: A method for automatic evaluation of machine translation, in: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, 2002, pp. 311–318. URL: https://aclanthology.org/P02-1040/. doi:10.3115/1073083.1073135

work page doi:10.3115/1073083.1073135 2002
[10]

W. Xu, C. Napoles, E. Pavlick, Q. Chen, C. Callison-Burch, Optimizing statistical machine transla- tion for text simplification, Transactions of the Association for Computational Linguistics 4 (2016) 401–415. URL: https://aclanthology.org/Q16-1029/. doi:10.1162/tacl_a_00107

work page doi:10.1162/tacl_a_00107 2016
[11]

Zhang, V

T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, Y. Artzi, BERTScore: Evaluating text generation with BERT, in: International Conference on Learning Representations, 2020. URL: https://openreview. net/forum?id=SkeHuCVFDr

2020
[12]

Alva-Manchego, C

F. Alva-Manchego, C. Scarton, L. Specia, The (Un)Suitability of automatic evaluation metrics for text simplification, Computational Linguistics 47 (2021) 861–889. doi:10.1162/coli_a_00418

work page doi:10.1162/coli_a_00418 2021
[13]

Maddela, Y

M. Maddela, Y. Dou, D. Heineman, W. Xu, LENS: A learnable evaluation metric for text sim- plification, in: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Toronto, Canada, 2023, pp. 16383–16408. URL: https://aclanthology.org/2023.acl-long.905/. doi:10.18653/v1/2023.acl-long.905

work page doi:10.18653/v1/2023.acl-long.905 2023
[14]

Maddela, F

M. Maddela, F. Alva-Manchego, Adapting sentence-level automatic metrics for document-level simplification evaluation, in: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, As- sociation for Computational Linguistics, Albuquerque, New Mexico, 2025, pp. 64...

work page doi:10.18653/v1/2025.naacl-long.327 2025
[15]

Cripwell, J

L. Cripwell, J. Legrand, C. Gardent, Evaluating document simplification: On the importance of separately assessing simplicity and meaning preservation, in: Proceedings of the 3rd Workshop on Tools and Resources for People with READing Difficulties (READI) @ LREC-COLING 2024, ELRA and ICCL, Torino, Italia, 2024, pp. 1–14. URL: https://aclanthology.org/2024...

2024
[16]

Agrawal, M

S. Agrawal, M. Carpuat, Do text simplification systems preserve meaning? a human evaluation via reading comprehension, Transactions of the Association for Computational Linguistics 12 (2024) 432–448. URL: https://aclanthology.org/2024.tacl-1.24/. doi:10.1162/tacl_a_00653

work page doi:10.1162/tacl_a_00653 2024
[17]

Carrer, A

L. Carrer, A. Säuberli, M. Kappus, S. Ebling, Towards holistic human evaluation of automatic text simplification, in: Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, ELRA and ICCL, Torino, Italia, 2024, pp. 71–80. URL: https://aclanthology.org/2024.humeval-1.7/

2024
[18]

Saggion, S

H. Saggion, S. Štajner, D. Ferrés, K. C. Sheang, M. Shardlow, K. North, M. Zampieri, Findings of the TSAR-2022 shared task on multilingual lexical simplification, in: Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 271–283....

work page doi:10.18653/v1/2022.tsar-1.31 2022
[19]

Štajner, H

S. Štajner, H. Saggion, M. Shardlow, F. Alva-Manchego (Eds.), Proceedings of the Second Workshop on Text Simplification, Accessibility and Readability, INCOMA Ltd., Shoumen, Bulgaria, Varna, Bulgaria, 2023. URL: https://aclanthology.org/2023.tsar-1.0/

2023
[20]

Shardlow, H

M. Shardlow, H. Saggion, F. Alva-Manchego, M. Zampieri, K. North, S. Štajner, R. Stodden (Eds.), Proceedings of the Third Workshop on Text Simplification, Accessibility and Readabil- ity (TSAR 2024), Association for Computational Linguistics, Miami, Florida, USA, 2024. URL: https://aclanthology.org/2024.tsar-1.0/. doi:10.18653/v1/2024.tsar-1.0

work page doi:10.18653/v1/2024.tsar-1.0 2024
[21]

Shardlow, F

M. Shardlow, F. Alva-Manchego, K. North, R. Stodden, H. Saggion, N. Khallaf, A. Hayakawa (Eds.), Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025), Association for Computational Linguistics, Suzhou, China, 2025. URL: https://aclanthology. org/2025.tsar-1.0/. doi:10.18653/v1/2025.tsar-1.0

work page doi:10.18653/v1/2025.tsar-1.0 2025
[22]

Alva-Manchego, R

F. Alva-Manchego, R. Stodden, J. M. Imperial, A. Barayan, K. North, H. Tayyar Madabushi, Findings of the TSAR 2025 shared task on readability-controlled text simplification, in: Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025), Association for Computational Linguistics, Suzhou, China, 2025, pp. 116–130. ...

work page doi:10.18653/v1/2025.tsar-1.8 2025
[23]

Wilkens, R

R. Wilkens, R. Cardon, A. Todirascu, N. Gala (Eds.), Proceedings of the 3rd Workshop on Tools and Resources for People with READing Difficulties (READI) @ LREC-COLING 2024, ELRA and ICCL, Torino, Italia, 2024. URL: https://aclanthology.org/2024.readi-1.0/

2024
[24]

Botella-Gil, I

B. Botella-Gil, I. Espinosa-Zaragoza, A. Bonet-Jover, M. Madina, L. Molino Piñar, P. Moreda, I. Gonzalez-Dios, M. T. Martín-Valdivia, L. A. Ureña-López, Overview of clears at iberlef 2025: Challenge for plain language and easy-to-read adaptation for spanish texts, Procesamiento del Lenguaje Natural (2025) 393–400. doi:10.26342/2025-75-28

work page doi:10.26342/2025-75-28 2025
[25]

Saggion, M

H. Saggion, M. Tareh, N. Khallaf, D. Adanza, S. Bott, N. Pérez-Rojas, A. Rascón-Alcaina, S. Szasz, Overview of MER-TRANS at IberLEF 2026: First shared task on multilingual easy-to-read transla- tion, Procesamiento del Lenguaje Natural 77 (2026)

2026
[26]

Bonet-Jover, J

A. Bonet-Jover, J. Á. González-Barba, L. Chiruzzo, Overview of IberLEF 2026: Natural language processing challenges for spanish and other iberian languages, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2026), co-located with the 42nd Conference of the Spanish Society for Natural Language Processing (SEPLN 2026), CEUR-WS.org, 2026

2026
[27]

S. Bott, V. Riegler, H. Saggion, A. Rascón Alcaina, N. Khallaf, A multilingual human annotated corpus of original and easy-to-read texts to support access to democratic participatory processes, in: Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), European Language Resources Association (ELRA), Palma, Mallorca, Spain, ...

work page doi:10.63317/4b56cza6e7zk 2026
[28]

Khallaf, S

N. Khallaf, S. Sharoff, Automatic difficulty classification of Arabic sentences, in: N. Habash, H. Bouamor, H. Hajj, W. Magdy, W. Zaghouani, F. Bougares, N. Tomeh, I. Abu Farha, S. Touileb (Eds.), Proceedings of the Sixth Arabic Natural Language Processing Workshop, Association for Computational Linguistics, Kyiv, Ukraine (Virtual), 2021, pp. 105–114. URL...

2021
[29]

MER-TRANS Organizers, MER-TRANS 2026: Multilingual easy-to-read translation shared task, https://lastus-taln-upf.github.io/mertrans-iberlef-2026/, 2026

2026
[30]

LaSTUS-TALN-UPF, MER-TRANS Evaluator 2026: Evaluator for shared-task submissions, https: //github.com/LaSTUS-TALN-UPF/mertrans-evaluator-2026, 2026

2026
[31]

HULAT-UC3M, Bilingual Simplification Metrics: A unified evaluation framework for text simplifi- cation (ES/EN), https://github.com/hulat-group/bilingual_simplification_metrics, 2026

2026
[32]

Campillos-Llanos, A

L. Campillos-Llanos, A. R. Terroba Reinares, S. Zakhir Puig, A. Valverde-Mateos, A. Capllonch- Carrión, Building a comparable corpus and a benchmark for spanish medical text simplification, Procesamiento del Lenguaje Natural (2022) 189–196. doi:10.26342/2022-69-16

work page doi:10.26342/2022-69-16 2022
[33]

Moreno, P

L. Moreno, P. Martínez, A human-in/on-the-loop framework for accessible text generation, in: Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), European Language Resources Association (ELRA), Palma, Mallorca, Spain, 2026, pp. 7236–7247. URL: https://lrec.elra.info/lrec2026-main-574. doi:10.63317/2gtngp2nmx63

work page doi:10.63317/2gtngp2nmx63 2026
[34]

URL: https://docs.langchain.com/oss/python/langgraph/ overview

LangChain, LangGraph overview, 2026. URL: https://docs.langchain.com/oss/python/langgraph/ overview

2026
[35]

URL: https://ai.google.dev/gemini-api/docs/ models/gemini-2.5-flash

Google AI for Developers, Gemini 2.5 flash, 2026. URL: https://ai.google.dev/gemini-api/docs/ models/gemini-2.5-flash

2026
[36]

Santamaría Gómez, G

G. Santamaría Gómez, G. García Subies, P. Gutiérrez Ruiz, M. González Valero, N. Fuertes, H. Montoro Zamorano, C. Muñoz Sanz, L. Rosado Plaza, N. Aldama García, D. Betancur Sánchez, K. Sushkova, M. Guerrero Nieto, Á. Barbero Jiménez, RigoChat 2: An adapted language model to spanish using a bounded dataset and reduced hardware, 2025. URL: https://arxiv.org...

work page arXiv 2025
[37]

Moreno, P

L. Moreno, P. Martínez, AIGov-Access: AI Governance for Accessibility-Oriented Text Adaptation — MER-TRANS 2026 Profile, 2026. URL: https://doi.org/10.5281/zenodo.20855013. doi: 10.5281/ zenodo.20855013

work page doi:10.5281/zenodo.20855013 2026
[38]

E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, LoRA: Low-rank adaptation of large language models, in: International Conference on Learning Representations,
[39]

URL: https://openreview.net/forum?id=nZeVKeeFYf9
[40]

Beauchemin, H

D. Beauchemin, H. Saggion, R. Khoury, MeaningBERT: Assessing meaning preservation between sentences, Frontiers in Artificial Intelligence 6 (2023). URL: https://www.frontiersin.org/articles/ 10.3389/frai.2023.1223924. doi:10.3389/frai.2023.1223924

work page doi:10.3389/frai.2023.1223924 2023
[41]

Fernández Huerta, Medidas sencillas de lecturabilidad, Consigna 214 (1959) 29–32

J. Fernández Huerta, Medidas sencillas de lecturabilidad, Consigna 214 (1959) 29–32. A. Public Resources Used for Calibration and Lexical Support The following public resources were used during system calibration, prompt refinement or glossary construction: • Plena Inclusión España, The Spanish Constitution in Easy-to-Read Format, Easy-to-Read publication...

work page arXiv 1959

[1] [1]

International Organization for Standardization, ISO 21801-1:2020 Cognitive accessibility – Part 1: General guidelines, https://www.iso.org/standard/71711.html, 2020

2020

[2] [2]

International Organization for Standardization, ISO 24495-1:2023 Plain language – Part 1: Govern- ing principles and guidelines, https://www.iso.org/standard/78907.html, 2023

2023

[3] [3]

Lectura Fácil

Asociación Española de Normalización, UNE 153101:2018 EX. Lectura Fácil. Pautas y recomenda- ciones para la elaboración de documentos, https://www.une.org/, 2018

2018

[4] [4]

Nomura, G

M. Nomura, G. S. Nielsen, B. Tronbacke, Guidelines for easy-to-read materials, https://www.ifla. org/wp-content/uploads/2019/05/assets/hq/publications/professional-report/120.pdf, 2010

2019

[5] [5]

Inclusion Europe, Information for all: European standards for making information easy to read and understand, https://www.inclusion-europe.eu/easy-to-read-standards-guidelines/, 2009

2009

[6] [6]

Saggion, Automatic Text Simplification, Synthesis Lectures on Human Language Technologies, Springer Cham, 2017

H. Saggion, Automatic Text Simplification, Synthesis Lectures on Human Language Technologies, Springer Cham, 2017. URL: https://link.springer.com/book/10.1007/978-3-031-02166-4. doi: 10. 1007/978-3-031-02166-4

work page doi:10.1007/978-3-031-02166-4 2017

[7] [7]

Nisioi, S

S. Nisioi, S. Štajner, S. P. Ponzetto, L. P. Dinu, Exploring neural text simplification models, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Vancouver, Canada, 2017, pp. 85–91. URL: https: //aclanthology.org/P17-2014/. doi:10.18653/v1/P17-2014

work page doi:10.18653/v1/p17-2014 2017

[8] [8]

Martínez, A

P. Martínez, A. Ramos, L. Moreno, Exploring large language models to generate easy to read content, Frontiers in Computer Science 6 (2024) 1394705. URL: https://doi.org/10.3389/fcomp.2024.1394705. doi:10.3389/fcomp.2024.1394705

work page doi:10.3389/fcomp.2024.1394705 2024

[9] [9]

Jan Melechovsky, Abhinaba Roy, and Dorien Herremans

K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, BLEU: A method for automatic evaluation of machine translation, in: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, 2002, pp. 311–318. URL: https://aclanthology.org/P02-1040/. doi:10.3115/1073083.1073135

work page doi:10.3115/1073083.1073135 2002

[10] [10]

W. Xu, C. Napoles, E. Pavlick, Q. Chen, C. Callison-Burch, Optimizing statistical machine transla- tion for text simplification, Transactions of the Association for Computational Linguistics 4 (2016) 401–415. URL: https://aclanthology.org/Q16-1029/. doi:10.1162/tacl_a_00107

work page doi:10.1162/tacl_a_00107 2016

[11] [11]

Zhang, V

T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, Y. Artzi, BERTScore: Evaluating text generation with BERT, in: International Conference on Learning Representations, 2020. URL: https://openreview. net/forum?id=SkeHuCVFDr

2020

[12] [12]

Alva-Manchego, C

F. Alva-Manchego, C. Scarton, L. Specia, The (Un)Suitability of automatic evaluation metrics for text simplification, Computational Linguistics 47 (2021) 861–889. doi:10.1162/coli_a_00418

work page doi:10.1162/coli_a_00418 2021

[13] [13]

Maddela, Y

M. Maddela, Y. Dou, D. Heineman, W. Xu, LENS: A learnable evaluation metric for text sim- plification, in: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Toronto, Canada, 2023, pp. 16383–16408. URL: https://aclanthology.org/2023.acl-long.905/. doi:10.18653/v1/2023.acl-long.905

work page doi:10.18653/v1/2023.acl-long.905 2023

[14] [14]

Maddela, F

M. Maddela, F. Alva-Manchego, Adapting sentence-level automatic metrics for document-level simplification evaluation, in: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, As- sociation for Computational Linguistics, Albuquerque, New Mexico, 2025, pp. 64...

work page doi:10.18653/v1/2025.naacl-long.327 2025

[15] [15]

Cripwell, J

L. Cripwell, J. Legrand, C. Gardent, Evaluating document simplification: On the importance of separately assessing simplicity and meaning preservation, in: Proceedings of the 3rd Workshop on Tools and Resources for People with READing Difficulties (READI) @ LREC-COLING 2024, ELRA and ICCL, Torino, Italia, 2024, pp. 1–14. URL: https://aclanthology.org/2024...

2024

[16] [16]

Agrawal, M

S. Agrawal, M. Carpuat, Do text simplification systems preserve meaning? a human evaluation via reading comprehension, Transactions of the Association for Computational Linguistics 12 (2024) 432–448. URL: https://aclanthology.org/2024.tacl-1.24/. doi:10.1162/tacl_a_00653

work page doi:10.1162/tacl_a_00653 2024

[17] [17]

Carrer, A

L. Carrer, A. Säuberli, M. Kappus, S. Ebling, Towards holistic human evaluation of automatic text simplification, in: Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, ELRA and ICCL, Torino, Italia, 2024, pp. 71–80. URL: https://aclanthology.org/2024.humeval-1.7/

2024

[18] [18]

Saggion, S

H. Saggion, S. Štajner, D. Ferrés, K. C. Sheang, M. Shardlow, K. North, M. Zampieri, Findings of the TSAR-2022 shared task on multilingual lexical simplification, in: Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 271–283....

work page doi:10.18653/v1/2022.tsar-1.31 2022

[19] [19]

Štajner, H

S. Štajner, H. Saggion, M. Shardlow, F. Alva-Manchego (Eds.), Proceedings of the Second Workshop on Text Simplification, Accessibility and Readability, INCOMA Ltd., Shoumen, Bulgaria, Varna, Bulgaria, 2023. URL: https://aclanthology.org/2023.tsar-1.0/

2023

[20] [20]

Shardlow, H

M. Shardlow, H. Saggion, F. Alva-Manchego, M. Zampieri, K. North, S. Štajner, R. Stodden (Eds.), Proceedings of the Third Workshop on Text Simplification, Accessibility and Readabil- ity (TSAR 2024), Association for Computational Linguistics, Miami, Florida, USA, 2024. URL: https://aclanthology.org/2024.tsar-1.0/. doi:10.18653/v1/2024.tsar-1.0

work page doi:10.18653/v1/2024.tsar-1.0 2024

[21] [21]

Shardlow, F

M. Shardlow, F. Alva-Manchego, K. North, R. Stodden, H. Saggion, N. Khallaf, A. Hayakawa (Eds.), Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025), Association for Computational Linguistics, Suzhou, China, 2025. URL: https://aclanthology. org/2025.tsar-1.0/. doi:10.18653/v1/2025.tsar-1.0

work page doi:10.18653/v1/2025.tsar-1.0 2025

[22] [22]

Alva-Manchego, R

F. Alva-Manchego, R. Stodden, J. M. Imperial, A. Barayan, K. North, H. Tayyar Madabushi, Findings of the TSAR 2025 shared task on readability-controlled text simplification, in: Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025), Association for Computational Linguistics, Suzhou, China, 2025, pp. 116–130. ...

work page doi:10.18653/v1/2025.tsar-1.8 2025

[23] [23]

Wilkens, R

R. Wilkens, R. Cardon, A. Todirascu, N. Gala (Eds.), Proceedings of the 3rd Workshop on Tools and Resources for People with READing Difficulties (READI) @ LREC-COLING 2024, ELRA and ICCL, Torino, Italia, 2024. URL: https://aclanthology.org/2024.readi-1.0/

2024

[24] [24]

Botella-Gil, I

B. Botella-Gil, I. Espinosa-Zaragoza, A. Bonet-Jover, M. Madina, L. Molino Piñar, P. Moreda, I. Gonzalez-Dios, M. T. Martín-Valdivia, L. A. Ureña-López, Overview of clears at iberlef 2025: Challenge for plain language and easy-to-read adaptation for spanish texts, Procesamiento del Lenguaje Natural (2025) 393–400. doi:10.26342/2025-75-28

work page doi:10.26342/2025-75-28 2025

[25] [25]

Saggion, M

H. Saggion, M. Tareh, N. Khallaf, D. Adanza, S. Bott, N. Pérez-Rojas, A. Rascón-Alcaina, S. Szasz, Overview of MER-TRANS at IberLEF 2026: First shared task on multilingual easy-to-read transla- tion, Procesamiento del Lenguaje Natural 77 (2026)

2026

[26] [26]

Bonet-Jover, J

A. Bonet-Jover, J. Á. González-Barba, L. Chiruzzo, Overview of IberLEF 2026: Natural language processing challenges for spanish and other iberian languages, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2026), co-located with the 42nd Conference of the Spanish Society for Natural Language Processing (SEPLN 2026), CEUR-WS.org, 2026

2026

[27] [27]

S. Bott, V. Riegler, H. Saggion, A. Rascón Alcaina, N. Khallaf, A multilingual human annotated corpus of original and easy-to-read texts to support access to democratic participatory processes, in: Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), European Language Resources Association (ELRA), Palma, Mallorca, Spain, ...

work page doi:10.63317/4b56cza6e7zk 2026

[28] [28]

Khallaf, S

N. Khallaf, S. Sharoff, Automatic difficulty classification of Arabic sentences, in: N. Habash, H. Bouamor, H. Hajj, W. Magdy, W. Zaghouani, F. Bougares, N. Tomeh, I. Abu Farha, S. Touileb (Eds.), Proceedings of the Sixth Arabic Natural Language Processing Workshop, Association for Computational Linguistics, Kyiv, Ukraine (Virtual), 2021, pp. 105–114. URL...

2021

[29] [29]

MER-TRANS Organizers, MER-TRANS 2026: Multilingual easy-to-read translation shared task, https://lastus-taln-upf.github.io/mertrans-iberlef-2026/, 2026

2026

[30] [30]

LaSTUS-TALN-UPF, MER-TRANS Evaluator 2026: Evaluator for shared-task submissions, https: //github.com/LaSTUS-TALN-UPF/mertrans-evaluator-2026, 2026

2026

[31] [31]

HULAT-UC3M, Bilingual Simplification Metrics: A unified evaluation framework for text simplifi- cation (ES/EN), https://github.com/hulat-group/bilingual_simplification_metrics, 2026

2026

[32] [32]

Campillos-Llanos, A

L. Campillos-Llanos, A. R. Terroba Reinares, S. Zakhir Puig, A. Valverde-Mateos, A. Capllonch- Carrión, Building a comparable corpus and a benchmark for spanish medical text simplification, Procesamiento del Lenguaje Natural (2022) 189–196. doi:10.26342/2022-69-16

work page doi:10.26342/2022-69-16 2022

[33] [33]

Moreno, P

L. Moreno, P. Martínez, A human-in/on-the-loop framework for accessible text generation, in: Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), European Language Resources Association (ELRA), Palma, Mallorca, Spain, 2026, pp. 7236–7247. URL: https://lrec.elra.info/lrec2026-main-574. doi:10.63317/2gtngp2nmx63

work page doi:10.63317/2gtngp2nmx63 2026

[34] [34]

URL: https://docs.langchain.com/oss/python/langgraph/ overview

LangChain, LangGraph overview, 2026. URL: https://docs.langchain.com/oss/python/langgraph/ overview

2026

[35] [35]

URL: https://ai.google.dev/gemini-api/docs/ models/gemini-2.5-flash

Google AI for Developers, Gemini 2.5 flash, 2026. URL: https://ai.google.dev/gemini-api/docs/ models/gemini-2.5-flash

2026

[36] [36]

Santamaría Gómez, G

G. Santamaría Gómez, G. García Subies, P. Gutiérrez Ruiz, M. González Valero, N. Fuertes, H. Montoro Zamorano, C. Muñoz Sanz, L. Rosado Plaza, N. Aldama García, D. Betancur Sánchez, K. Sushkova, M. Guerrero Nieto, Á. Barbero Jiménez, RigoChat 2: An adapted language model to spanish using a bounded dataset and reduced hardware, 2025. URL: https://arxiv.org...

work page arXiv 2025

[37] [37]

Moreno, P

L. Moreno, P. Martínez, AIGov-Access: AI Governance for Accessibility-Oriented Text Adaptation — MER-TRANS 2026 Profile, 2026. URL: https://doi.org/10.5281/zenodo.20855013. doi: 10.5281/ zenodo.20855013

work page doi:10.5281/zenodo.20855013 2026

[38] [38]

E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, LoRA: Low-rank adaptation of large language models, in: International Conference on Learning Representations,

[39] [39]

URL: https://openreview.net/forum?id=nZeVKeeFYf9

[40] [40]

Beauchemin, H

D. Beauchemin, H. Saggion, R. Khoury, MeaningBERT: Assessing meaning preservation between sentences, Frontiers in Artificial Intelligence 6 (2023). URL: https://www.frontiersin.org/articles/ 10.3389/frai.2023.1223924. doi:10.3389/frai.2023.1223924

work page doi:10.3389/frai.2023.1223924 2023

[41] [41]

Fernández Huerta, Medidas sencillas de lecturabilidad, Consigna 214 (1959) 29–32

J. Fernández Huerta, Medidas sencillas de lecturabilidad, Consigna 214 (1959) 29–32. A. Public Resources Used for Calibration and Lexical Support The following public resources were used during system calibration, prompt refinement or glossary construction: • Plena Inclusión España, The Spanish Constitution in Easy-to-Read Format, Easy-to-Read publication...

work page arXiv 1959