pith. machine review for the scientific record.

arxiv: 2605.10714 · v1 · submitted 2026-05-11 · 💻 cs.CL · cs.AI

Recognition: 2 theorem links · Lean Theorem

Why Low-Resource NLP Needs More Than Cross-Lingual Transfer: Lessons Learned from Luxembourgish

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:17 UTC · model grok-4.3

classification: 💻 cs.CL · cs.AI
keywords: low-resource NLP · cross-lingual transfer · Luxembourgish · language-specific data · multilingual models · NLP pipelines

The pith

Cross-lingual transfer succeeds in low-resource NLP only when paired with high-quality, task-aligned target-language data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper uses Luxembourgish as a concrete case to show that transfer from high-resource languages improves task performance but requires sufficient high-quality labeled data in the target language to deliver strong results. Limited target-language resources fall short of competitive performance on their own; those same resources achieve their best outcomes only when placed inside a cross-lingual training or adaptation framework. The authors therefore treat transfer and language-specific data creation as interdependent parts of one pipeline rather than rival strategies, and they close with guidelines for deciding how much effort to allocate to each component in practice.

Core claim

Cross-lingual transfer can substantially improve target-language performance, but its success depends critically on the availability of sufficiently high-quality, task-aligned target-language data. Such resources, particularly in low-resource settings, are typically too limited in scale to drive strong performance on their own. Instead, they reach their full potential only when leveraged within a cross-lingual framework. Cross-lingual transfer and language-specific efforts therefore function as complementary components of a sustainable low-resource NLP pipeline.

What carries the argument

The interdependence between cross-lingual transfer gains and the scale and task alignment of target-language labeled data, demonstrated through Luxembourgish experiments and prior results.
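To see what this interdependence would look like operationally, here is a minimal sketch of the three training regimes the argument contrasts: transfer-only (train on high-resource source data, evaluate on Luxembourgish), target-only (train on a small Luxembourgish set alone), and combined. This is not the paper's actual setup; the model, the binary sentiment task, and every example sentence are illustrative assumptions.

    # Minimal sketch of the three regimes; all data below is a toy stand-in.
    import numpy as np
    from datasets import Dataset, concatenate_datasets
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    MODEL = "xlm-roberta-base"  # any multilingual encoder would do
    tok = AutoTokenizer.from_pretrained(MODEL)

    def toy(texts, labels):  # stand-in for real corpora
        return Dataset.from_dict({"text": texts, "label": labels})

    # Hypothetical data: source_ds would really hold thousands of DE/FR
    # examples, target_ds a few hundred Luxembourgish ones.
    source_ds = toy(["Das Wetter ist schön.", "Der Film war schrecklich."], [1, 0])
    target_ds = toy(["D'Wieder ass schéin.", "De Film war schlecht."], [1, 0])
    lb_test   = toy(["D'Iessen war immens gutt.", "De Service war schlecht."], [1, 0])

    def encode(batch):
        return tok(batch["text"], truncation=True, padding="max_length", max_length=64)

    def accuracy(p):  # fraction of correct predictions on the eval set
        return {"acc": float((np.argmax(p.predictions, -1) == p.label_ids).mean())}

    def fine_tune(train_ds, name):
        model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
        trainer = Trainer(
            model=model,
            args=TrainingArguments(output_dir=name, num_train_epochs=1,
                                   per_device_train_batch_size=2, report_to=[]),
            train_dataset=train_ds.map(encode, batched=True, remove_columns=["text"]),
            eval_dataset=lb_test.map(encode, batched=True, remove_columns=["text"]),
            compute_metrics=accuracy,
        )
        trainer.train()
        return trainer.evaluate()["eval_acc"]

    # The paper's claim predicts: combined > transfer_only and combined > target_only.
    for name, ds in [("transfer_only", source_ds), ("target_only", target_ds),
                     ("combined", concatenate_datasets([source_ds, target_ds]))]:
        print(name, fine_tune(ds, name))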

If this is right

  • Low-resource pipelines must budget for both data collection in the target language and cross-lingual model use rather than choosing one.
  • Existing small target-language datasets should be aligned to tasks where cross-lingual signals are available.
  • Development plans for new languages should start by measuring how much task-aligned data already exists before scaling transfer methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Languages farther from high-resource relatives may require proportionally more target data before transfer becomes useful.
  • Resource allocation decisions could be guided by first estimating the minimum target-data threshold needed for a given language pair; a toy version of that estimate is sketched after this list.
  • The same complementarity may appear in other modalities such as speech or code when similar transfer and data constraints apply.
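To make the threshold idea in the second bullet concrete, here is a minimal sketch that fits a log-linear learning curve to a few pilot runs and solves for the data size at which a desired score would be reached. All numbers are invented for illustration; the paper proposes no such model, and real learning curves need not be log-linear.

    # Hedged sketch: estimate the target-data size needed for a desired score.
    import numpy as np

    sizes  = np.array([100, 300, 1000, 3000])   # labeled LB examples (hypothetical pilots)
    scores = np.array([0.58, 0.66, 0.73, 0.79]) # dev accuracy with transfer (hypothetical)

    # Fit score ≈ a + b * log(n); polyfit returns [slope, intercept].
    b, a = np.polyfit(np.log(sizes), scores, deg=1)

    target = 0.75
    n_needed = np.exp((target - a) / b)  # invert the fitted curve
    print(f"estimated examples for accuracy {target}: ~{n_needed:.0f}")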

Load-bearing premise

The mutual dependence observed for Luxembourgish, a language close to several high-resource ones, also holds for low-resource languages that are more distant or that lack even minimal task-aligned data.

What would settle it

A low-resource language achieving strong task performance with purely cross-lingual transfer and no target-language labeled examples at all, or reaching the same performance level with target-language data alone and no transfer.

Figures

Figures reproduced from arXiv: 2605.10714 by Fred Philippy, Jacques Klein, Siwen Guo, Tegawendé F. Bissyandé.

Figure 1
Figure 1: Radar chart illustrating structural and sociotechnical dimensions associated with NLP inclusion and cross-lingual transfer: cultural/geographical proximity to high-resource languages, lexical similarity, typological similarity, relative digital presence, and socioeconomic context. Luxembourgish approximates an upper bound among lower-resource languages, combining structural proximity with strong i… view at source ↗
Figure 2
Figure 2: PCA projection of concatenated syntactic, phonological, inventory, genetic, and geographical representations for each language. Each point denotes a language; spatial proximity reflects overall linguistic similarity. Colors indicate the logarithm of the number of Wikipedia articles (resource proxy). Luxembourgish is located within a dense cluster of predominantly mid- to high-resource languages. More detail… view at source ↗
Figure 3
Figure 3: Estimated number of speakers vs. number of Wikipedia articles (top) / Common Crawl pages (bottom) across languages. Each point represents a language (both axes shown on a log scale). Shaded quadrants indicate languages with (i) fewer speakers and fewer articles (bottom left), (ii) more speakers but fewer articles (bottom right), (iii) fewer speakers but more articles (top left), and (iv) more speakers and … view at source ↗
Figure 4
Figure 4: Distribution of cosine similarities between EN-LB sentence pairs. More details are provided in Appendix A.4. As a result, building usable parallel corpora for low-resource languages frequently requires human intervention, not necessarily in the form of full manual translation, but through targeted guidance and language-aware constraints that improve mining precision. A concrete example is presented by Phi… view at source ↗
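Figure 4 summarizes similarity-based mining of EN-LB sentence pairs. Below is a hedged sketch of the generic technique: embed candidates with a multilingual sentence encoder and keep pairs above a cosine threshold. The encoder choice, the threshold, and the sentences are assumptions, and the caption itself warns that raw similarity needs language-aware constraints on top.

    # Hedged sketch of cosine-similarity mining; model and threshold are assumptions.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("sentence-transformers/LaBSE")

    en = ["The weather is nice today.", "Parliament adopted the new law."]
    lb = ["D'Wieder ass haut schéin.", "D'Chamber huet dat neit Gesetz ugeholl."]

    emb_en = model.encode(en, convert_to_tensor=True, normalize_embeddings=True)
    emb_lb = model.encode(lb, convert_to_tensor=True, normalize_embeddings=True)

    sims = util.cos_sim(emb_en, emb_lb)  # |en| x |lb| cosine matrix
    for i, row in enumerate(sims):
        j = int(row.argmax())            # best LB candidate for each EN sentence
        if float(row[j]) > 0.8:          # illustrative threshold, not the paper's
            print(f"{en[i]!r} <-> {lb[j]!r} (cos={float(row[j]):.2f})")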
read the original abstract

Cross-lingual transfer has become a central paradigm for extending natural language processing (NLP) technologies to low-resource languages. By leveraging supervision from high-resource languages, multilingual language models can achieve strong task performance with little or no labeled target-language data. However, it remains unclear to what extent cross-lingual transfer can substitute for language-specific efforts. In this paper, we synthesize prior research findings and data collection results on Luxembourgish, which, despite its typological proximity to high-resource languages and its presence in a multilingual context, remains insufficiently represented in modern NLP technologies. Across findings, we observe a fundamental interdependence between cross-lingual transfer and language-specific efforts. Cross-lingual transfer can substantially improve target-language performance, but its success depends critically on the availability of sufficiently high-quality, task-aligned target-language data. At the same time, such resources, particularly in low-resource settings, are typically too limited in scale to drive strong performance on their own. Instead, such resources reach their full potential only when leveraged within a cross-lingual framework. We therefore argue that cross-lingual transfer and language-specific efforts should not be viewed as competing alternatives. Instead, they function as complementary components of a sustainable low-resource NLP pipeline. Based on these insights, we provide practical guidelines for integrating and balancing cross-lingual transfer with language-specific development in sustainable low-resource NLP pipelines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript synthesizes prior research findings and new data collection results on Luxembourgish to argue that cross-lingual transfer and language-specific efforts are not competing alternatives but complementary components of a sustainable low-resource NLP pipeline. It claims that transfer can substantially improve target-language performance yet depends critically on high-quality, task-aligned target data, while such limited resources reach their full potential only when leveraged within a cross-lingual framework. The paper concludes by offering practical guidelines for integrating and balancing the two approaches.

Significance. If the observed interdependence generalizes, the work would be significant for shifting low-resource NLP away from over-reliance on transfer alone toward balanced, sustainable pipelines that value even modest target-language resources. The synthesis of prior findings with Luxembourgish-specific data collection provides concrete, actionable lessons and highlights limits of transfer in isolation. Strengths include the explicit complementarity framing and practical guidelines, which could inform resource allocation decisions.

major comments (1)
  1. [Abstract] The central claim of a 'fundamental interdependence' between cross-lingual transfer and language-specific efforts is synthesized from Luxembourgish results, yet the manuscript provides no parallel experiments, ablations, or comparisons on languages lacking typological proximity to high-resource languages (e.g., German/French) or with no task-aligned data at all. This leaves the claimed necessity of complementarity (rather than simple scaling or transfer alone) specific to the observed case and does not establish the general pipeline recommendation.
minor comments (1)
  1. [Abstract] The abstract and conclusion could more explicitly qualify the scope of the claims (e.g., 'for languages with some multilingual pretraining overlap') to avoid overgeneralization.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. The comment raises a valid point about the scope of our claims, which we address below by clarifying the paper's positioning as lessons from Luxembourgish combined with synthesis of prior work. We have made partial revisions to the abstract and added explicit discussion of limitations and generalizability.

read point-by-point responses
  1. Referee: [Abstract] The central claim of a 'fundamental interdependence' between cross-lingual transfer and language-specific efforts is synthesized from Luxembourgish results, yet the manuscript provides no parallel experiments, ablations, or comparisons on languages lacking typological proximity to high-resource languages (e.g., German/French) or with no task-aligned data at all. This leaves the claimed necessity of complementarity (rather than simple scaling or transfer alone) specific to the observed case and does not establish the general pipeline recommendation.

    Authors: We agree that the manuscript's new empirical contributions focus on Luxembourgish, a language with typological proximity to German and French. However, the central claim is not derived solely from Luxembourgish but from synthesizing prior research findings across a broader set of low-resource languages (including more distant ones) together with the Luxembourgish case studies. Luxembourgish was deliberately selected as a 'best-case' scenario for cross-lingual transfer due to its similarity and multilingual context; the fact that even here high-quality target data proves essential strengthens rather than weakens the complementarity argument. For settings with no task-aligned data, the practical guidelines explicitly recommend minimal initial data collection to bootstrap effective transfer. We have revised the abstract to replace 'fundamental interdependence' with 'observed interdependence in the Luxembourgish context and supported by prior work' and added a dedicated limitations paragraph discussing the need for future validation on typologically distant languages. This does not alter the actionable recommendations but better bounds their evidential basis.

    revision: partial

Circularity Check

0 steps flagged

No circularity: empirical synthesis from Luxembourgish observations and prior work

full rationale

The paper derives its central claim of interdependence between cross-lingual transfer and language-specific efforts directly from synthesized prior research findings plus new Luxembourgish data collection results, as stated in the abstract. No self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the argument to unverified internal premises appear in the provided text. The derivation remains inductive and externally grounded in observable performance patterns rather than a constructed equivalence to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a qualitative synthesis and position paper; it introduces no mathematical models, fitted parameters, new axioms, or invented entities.

pith-pipeline@v0.9.0 · 5558 in / 1184 out tokens · 63678 ms · 2026-05-12T04:17:25.101345+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: the paper's claim is directly supported by a theorem in the formal canon.
  • supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: the paper appears to rely on the theorem as machinery.
  • contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

300 extracted references · 300 canonical work pages · 1 internal anchor
