Harnessing Linguistic Dissimilarity for Language Generalization on Unseen Low-Resource Varieties
Pith reviewed 2026-05-08 17:09 UTC · model grok-4.3
The pith
A two-stage framework that harnesses linguistic dissimilarity improves generalization to unseen low-resource language varieties, achieving an average 54.62% gain in dependency parsing across 10 varieties.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is that focusing on capturing variety-specific cues while exploiting overlap from high-resource sources via TOPPing and VACAI-Bowl leads to effective generalization on unseen low-resource varieties, as demonstrated by an average 54.62% improvement in the dependency parsing task across 10 varieties.
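The headline figure reads most naturally as a macro-average of per-variety relative gains. The review does not give the baseline or the per-variety scores, so the sketch below only illustrates the arithmetic with invented numbers:

```python
def average_relative_gain(baseline, proposed):
    """Macro-average percent improvement across varieties.

    Both inputs are per-variety scores (e.g., LAS for dependency parsing).
    The scores below are hypothetical; the paper's actual per-variety
    numbers are not given in this review.
    """
    gains = [(p - b) / b * 100.0 for b, p in zip(baseline, proposed)]
    return sum(gains) / len(gains)

# Toy scores for 4 hypothetical varieties (not the paper's data):
baseline = [20.0, 35.0, 40.0, 50.0]
proposed = [32.0, 52.5, 56.0, 70.0]
print(round(average_relative_gain(baseline, proposed), 2))  # 47.5
```

Note that a macro-average of relative gains weights a large jump on a weak variety heavily, which is one reason the referee asks for baselines to be signposted.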
What carries the argument
VACAI-Bowl, a lightweight dual-branch architecture that learns variety-specific attributes in one branch and variety-invariant attributes in a parallel branch using adversarial training, paired with TOPPing, a source-selection method designed for low-resource varieties.
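The review names the components but not their internals. A minimal sketch of the standard mechanics such a dual-branch design could use, with a manually written gradient-reversal step, is below; all dimensions, the linear branches, and the function names are assumptions for illustration, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_BR, BATCH = 16, 8, 32   # hypothetical sizes; not specified in the review

h = rng.normal(size=(BATCH, D_IN))                  # stand-in for shared encoder output
W_spec = rng.normal(scale=0.1, size=(D_IN, D_BR))   # variety-specific branch
W_inv = rng.normal(scale=0.1, size=(D_IN, D_BR))    # variety-invariant branch

z_spec = h @ W_spec   # fed to the task head plus a variety-identification objective
z_inv = h @ W_inv     # fed to a variety classifier through gradient reversal

def grl_forward(z):
    """Gradient-reversal layer: identity in the forward pass."""
    return z

def grl_backward(grad_out, lam=1.0):
    """Backward pass: flip the sign so the branch *un*-learns variety cues."""
    return -lam * grad_out

# The variety classifier's gradient w.r.t. z_inv is reversed before it
# reaches W_inv, pushing z_inv toward variety-invariant features.
g = rng.normal(size=z_inv.shape)   # pretend classifier gradient
assert np.allclose(grl_forward(z_inv), z_inv)
assert np.allclose(grl_backward(g), -g)
```

The interesting design choice, on this reading, is that the specific branch is *not* reversed: dissimilarity signals are preserved there while the parallel branch is adversarially stripped of them.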
If this is right
- Improved accuracy on dependency parsing serves as evidence for better performance on other downstream tasks.
- The method enables generalization to varieties outside the training data by balancing specific and shared features.
- Source selection with TOPPing enhances the effectiveness of transfer from high-resource varieties.
- Adversarial training in the invariant branch helps isolate transferable attributes across varieties.
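TOPPing's actual scoring criterion is not described in this review. As a placeholder, the sketch below ranks candidate high-resource sources by cosine similarity over lang2vec-style typological vectors; the vectors, variety names, and top-k criterion are all invented, and the real method may weigh dissimilarity as well:

```python
import numpy as np

def select_sources(target_vec, candidates, k=2):
    """Return the k candidate varieties most similar to the target.

    `candidates` maps variety name -> typological feature vector.
    Plain similarity ranking for illustration only; TOPPing's criterion
    is not specified in the review.
    """
    def cosine(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    ranked = sorted(candidates,
                    key=lambda name: cosine(target_vec, candidates[name]),
                    reverse=True)
    return ranked[:k]

target = np.array([1.0, 0.0, 0.2])          # invented target-variety features
candidates = {
    "source_A": np.array([0.9, 0.1, 0.2]),
    "source_B": np.array([0.0, 1.0, 0.0]),
    "source_C": np.array([0.8, 0.0, 0.3]),
}
print(select_sources(target, candidates))   # ['source_A', 'source_C']
```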
Where Pith is reading between the lines
- Similar gains might appear in other NLP tasks such as named entity recognition or machine translation if tested.
- The framework could inform training strategies for multilingual models to handle dialectal variation more robustly.
- Applying it to additional low-resource varieties beyond the evaluated set would test its broader applicability.
- Emphasizing dissimilarity might reduce reliance on massive parallel corpora for cross-lingual transfer.
Load-bearing premise
That strong results on dependency parsing will translate to other tasks and that the method will work for low-resource varieties truly outside the evaluated group.
What would settle it
A test showing little or no improvement in dependency parsing or another task when the method is applied to a low-resource variety not among the original ten would undermine the claim.
Original abstract
Low-resource language varieties used by specific groups remain neglected in the development of multilingual language models. A great deal of cross-lingual research focuses on inter-lingual transfer, which strives to align allied varieties and minimize the differences between them. For low-resource varieties, however, linguistic dissimilarity is also an important cue that allows generalization to unseen varieties. Unlike prior approaches, we propose a two-stage Language Generalization framework that focuses on capturing variety-specific cues while also exploiting the rich overlap offered by a high-resource source variety. First, we propose TOPPing, a source-selection method specifically designed for low-resource varieties. Second, we suggest a lightweight VACAI-Bowl architecture that learns variety-specific attributes with one branch while a parallel branch captures variety-invariant attributes using adversarial training. We evaluate our framework on structural prediction tasks, which are among the few tasks available, as a proxy for performance on other downstream tasks. Using VACAI-Bowl with TOPPing yields an average 54.62% improvement in the dependency parsing task across 10 low-resource varieties.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a two-stage Language Generalization framework for low-resource varieties: TOPPing selects suitable high-resource source varieties, while VACAI-Bowl uses a dual-branch architecture (one for variety-specific attributes, one for variety-invariant attributes via adversarial training). It evaluates the combined approach on structural prediction tasks (e.g., dependency parsing) as proxies for other downstream tasks, reporting an average 54.62% improvement across 10 low-resource varieties.
Significance. If validated, the work offers a pragmatic advance for handling linguistic dissimilarity in multilingual models rather than solely pursuing alignment. Strengths include the explicit source-selection method (TOPPing), the adversarial invariant branch, the focus on available structural tasks with held-out evaluation, and the absence of internal inconsistencies in the reported setup. The proxy-task framing is a limitation but is explicitly discussed as data-driven.
Major comments (1)
- [Evaluation section] The central claim that dependency parsing performance serves as a reliable proxy for broader downstream tasks is load-bearing for the generalization argument across varieties, yet it rests primarily on data availability rather than direct evidence (e.g., no correlation analysis with semantic tasks or discussion of potential divergence in variety-specific cues).
Minor comments (2)
- [Abstract] The reported 54.62% average improvement would be more convincing if the abstract briefly noted the baselines, number of runs, or statistical tests used; these details appear in the full experiments but should be signposted early.
- [§3, framework description] The interaction between the variety-specific branch and the adversarial invariant branch could be clarified with a concise equation for the combined loss to aid reproducibility.
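One plausible form of the equation this comment asks for, assuming a gradient-reversal setup; the weights $\alpha$, $\beta$, $\lambda$ and the loss names are hypothetical, and the paper may define the combination differently:

```latex
\mathcal{L} \;=\; \mathcal{L}_{\text{task}}
  \;+\; \alpha\, \mathcal{L}_{\text{spec}}\!\left(z_{s}\right)
  \;+\; \beta\, \mathcal{L}_{\text{adv}}\!\left(\mathrm{GRL}_{\lambda}\!\left(z_{i}\right)\right),
```

where $z_{s}$ and $z_{i}$ are the outputs of the variety-specific and variety-invariant branches, and $\mathrm{GRL}_{\lambda}$ is a gradient-reversal layer that passes activations unchanged forward while scaling gradients by $-\lambda$ backward.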
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the major comment below and outline planned revisions to clarify the evaluation methodology.
Point-by-point responses
- Referee: [Evaluation section] The central claim that dependency parsing performance serves as a reliable proxy for broader downstream tasks is load-bearing for the generalization argument across varieties, yet it rests primarily on data availability rather than direct evidence (e.g., no correlation analysis with semantic tasks or discussion of potential divergence in variety-specific cues).
  Authors: We acknowledge that the proxy framing for dependency parsing is driven primarily by data availability, as explicitly stated in the manuscript: structural prediction tasks are among the few with annotations for the 10 low-resource varieties. We will revise the Evaluation section to expand the discussion of this choice, incorporating references to prior cross-lingual work where syntactic parsing serves as a foundational proxy for generalization. We will also add explicit analysis of potential divergences in variety-specific cues between syntactic and semantic tasks, noting how the VACAI-Bowl dual-branch design (variety-specific attributes alongside adversarial invariant attributes) is intended to capture transferable elements while preserving dissimilarity signals. A direct correlation analysis with semantic tasks cannot be performed without new annotations, which are unavailable for these varieties. Revision: partial.
  Not addressed: direct quantitative correlation analysis between dependency parsing and semantic tasks, due to the lack of annotated data for the evaluated low-resource varieties.
Circularity Check
No significant circularity in empirical framework
Full rationale
The paper presents a two-stage empirical framework (TOPPing for source selection followed by VACAI-Bowl architecture) and reports measured performance gains on dependency parsing across 10 held-out low-resource varieties. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text; the 54.62% improvement is framed as an experimental outcome rather than a quantity derived by construction from the method's own inputs. The derivation chain is therefore self-contained as standard proposal-plus-evaluation.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: dependency parsing performance serves as a reliable proxy for other downstream tasks.
Invented entities (2)
- TOPPing: no independent evidence
- VACAI-Bowl: no independent evidence