Recognition: unknown
Anthropogenic Regional Adaptation in Multimodal Vision-Language Model
Pith reviewed 2026-05-10 16:24 UTC · model grok-4.3
The pith
Regional data filtering and model merging lets vision-language models gain cultural relevance in specific areas while keeping global performance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Anthropogenic Regional Adaptation is a paradigm that optimizes vision-language model relevance to specific regional contexts while retaining global generalization; it is realized through GG-EZ, which applies regional data filtering followed by model merging, and produces 5-15 percent gains in cultural relevance metrics for Southeast Asia across three model architectures with over 98 percent preservation of global performance.
What carries the argument
Anthropogenic Regional Adaptation, implemented by the GG-EZ method that combines regional data filtering with model merging to balance local cultural alignment against retained worldwide capabilities.
If this is right
- The same filtering-and-merging recipe applies without modification to large vision-language models, text-to-image diffusion models, and vision-language embedding models.
- Cultural relevance metrics in Southeast Asia can rise 5-15 percent while global task performance stays at or above 98 percent of the unadapted model.
- Model merging after regional filtering provides a lighter alternative to full retraining for achieving regional alignment.
- The approach can be repeated for other geographic regions using analogous local data subsets.
Where Pith is reading between the lines
- If the method generalizes, organizations could maintain a single global base model and produce lightweight regional variants on demand rather than training separate models from scratch.
- The technique might reduce the risk of cultural misalignment in deployed systems by allowing periodic regional updates without restarting the entire training pipeline.
- Testing on languages and visual traditions outside Southeast Asia would clarify whether the gains depend on the specific data characteristics of that region.
Load-bearing premise
Regional data filtering plus model merging will consistently raise cultural relevance scores without creating new biases or hidden performance losses that the chosen metrics miss.
What would settle it
Running the same GG-EZ procedure on additional vision-language architectures or regions and finding either cultural relevance gains below 5 percent, global performance falling under 98 percent of baseline, or new qualitative failures on local content would falsify the central effectiveness claim.
Figures
read the original abstract
While the field of vision-language (VL) has achieved remarkable success in integrating visual and textual information across multiple languages and domains, there is still no dedicated framework for assessing human-centric alignment in vision-language systems. We offer two contributions to address this gap. First, we introduce Anthropogenic Regional Adaptation: a novel paradigm that aims to optimize model relevance to specific regional contexts while ensuring the retention of global generalization capabilities. Second, we present a simple, but effective adaptation method named Geographical-generalization-made-easy (GG-EZ), which utilizes regional data filtering and model merging. Through comprehensive experiments on 3 VL architectures: large vision-language models, text-to-image diffusion models, and vision-language embedding models, and a case study in Southeast Asia (SEA) regional adaptation, we demonstrate the importance of Anthropogenic Regional Adaptation and the effectiveness of GG-EZ, showing 5-15% gains in cultural relevance metrics across SEA while maintaining over 98% of global performance and even occasionally surpassing it. Our findings establish Anthropogenic Regional Alignment as a foundational paradigm towards applicability of multimodal vision-language models in diverse regions and demonstrate a simple-yet-effective baseline method that optimizes regional value alignment while preserving global generalization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Anthropogenic Regional Adaptation as a paradigm for optimizing multimodal vision-language models for specific regional contexts (e.g., Southeast Asia) while preserving global generalization. It proposes GG-EZ, a method based on regional data filtering followed by model merging, and reports results from experiments across three VL architectures (large vision-language models, text-to-image diffusion models, and vision-language embedding models) claiming 5-15% gains in cultural relevance metrics with retention of over 98% global performance.
Significance. If the empirical results prove robust and reproducible, the work could meaningfully advance practical deployment of VL models in culturally diverse settings by offering a simple baseline for regional value alignment. The emphasis on retaining global capabilities alongside regional gains addresses a relevant gap in current multimodal alignment research.
major comments (2)
- Abstract: The central quantitative claims (5-15% gains in cultural relevance metrics and >98% retention of global performance) are presented without any description of the exact metrics, baselines, statistical significance tests, or controls for confounding factors, preventing assessment of whether the data support the effectiveness of GG-EZ.
- Abstract: The GG-EZ method is described only at high level ('regional data filtering and model merging') with no specification of the merging operator, regional data selection criteria, or the precise global benchmarks used to certify performance retention; this directly undermines evaluation of the weakest assumption that the approach avoids unintended global capability erosion.
minor comments (1)
- Abstract: Inconsistent terminology at the end of the abstract ('Anthropogenic Regional Alignment' instead of 'Anthropogenic Regional Adaptation' as used in the title and earlier text).
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract. The comments highlight opportunities to improve clarity for readers evaluating the claims. We address each point below and will revise the abstract in the resubmission to incorporate additional specificity while maintaining its concise format.
read point-by-point responses
-
Referee: Abstract: The central quantitative claims (5-15% gains in cultural relevance metrics and >98% retention of global performance) are presented without any description of the exact metrics, baselines, statistical significance tests, or controls for confounding factors, preventing assessment of whether the data support the effectiveness of GG-EZ.
Authors: We agree that the abstract would benefit from greater specificity on these elements to facilitate immediate evaluation. The full manuscript details the cultural relevance metrics (SEA-specific expert-annotated scores and automated proxies), baselines (vanilla VL models and alternative adaptation techniques), statistical significance (paired t-tests with p < 0.05 reported in results tables), and controls (matched global task sets and ablation studies) in Sections 3 and 4. To address the comment, we will revise the abstract to briefly reference the metric types, use of statistical testing, and control benchmarks. This change will be incorporated in the next version. revision: yes
-
Referee: Abstract: The GG-EZ method is described only at high level ('regional data filtering and model merging') with no specification of the merging operator, regional data selection criteria, or the precise global benchmarks used to certify performance retention; this directly undermines evaluation of the weakest assumption that the approach avoids unintended global capability erosion.
Authors: We acknowledge that the high-level phrasing in the abstract leaves key implementation details implicit. The manuscript specifies the merging operator (task-vector weighted averaging), regional data selection (metadata-based filtering combined with cultural relevance thresholding), and global benchmarks (VQA, captioning on COCO, and standard VL classification tasks) in Section 2.2 and the experimental protocol. We will update the abstract to include concise references to these aspects (e.g., noting 'task-vector merging' and 'retention verified on global VL benchmarks') to better substantiate the no-erosion claim. This revision will be made. revision: yes
Circularity Check
No circularity: empirical method validation with no derivations or self-referential reductions
full rationale
The paper introduces Anthropogenic Regional Adaptation as a paradigm and GG-EZ as a method (regional filtering + merging), then reports experimental gains on SEA cultural metrics while retaining global performance across three VL architectures. No equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. Claims rest on direct experimental reporting rather than reducing by construction to inputs or prior author work. This is a standard empirical contribution with independent content.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Regional cultural contexts can be effectively captured and optimized through data filtering and model merging without degrading global capabilities.
- domain assumption Existing VL architectures are amenable to the same adaptation strategy.
invented entities (1)
-
Anthropogenic Regional Adaptation
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Transactions of the Association for Computational Linguistics9, 1116– 1131 (2021)
Adelani, D.I., Abbott, J., Neubig, G., D’Souza, D., Kreutzer, J., Lignos, C., Palen- Michel, C., Rijhwani, S., et al.: Masakhaner: Named entity recognition for african languages. Transactions of the Association for Computational Linguistics9, 1116– 1131 (2021)
2021
-
[2]
Towards Measuring and Modeling ``Culture'' in LLM s: A Survey
Adilazuarda, M.F., Mukherjee, S., Lavania, P., Singh, S.S., Aji, A.F., O’Neill, J., Modi, A., Choudhury, M.: Towards measuring and modeling “culture” in LLMs: A survey. In: Al-Onaizan, Y., Bansal, M., Chen, Y.N. (eds.) Proceed- ings of the 2024 Conference on Empirical Methods in Natural Language Process- ing. pp. 15763–15784. Association for Computational...
-
[3]
In: Potdar, S., Rojas- Barahona, L., Montella, S
Agarwal, A., Meghwani, H., Patel, H.L., Sheng, T., Ravi, S., Roth, D.: Aligning LLMs for multilingual consistency in enterprise applications. In: Potdar, S., Rojas- Barahona, L., Montella, S. (eds.) Proceedings of the 2025 Conference on Empirical MethodsinNaturalLanguageProcessing:IndustryTrack.pp.117–137.Association for Computational Linguistics, Suzhou ...
2025
-
[4]
Agrawal, P., Antoniak, S., Hanna, E.B., Bout, B., Chaplot, D., Chudnovsky, J., Costa, D., Monicault, B.D., Garg, S., Gervet, T., Ghosh, S., Héliou, A., Jacob, P., Jiang, A.Q., Khandelwal, K., Lacroix, T., Lample, G., Casas, D.L., Lavril, T., Scao, T.L., Lo, A., Marshall, W., Martin, L., Mensch, A., Muddireddy, P., Nemy- chnikova, V., Pellat, M., Platen, P...
work page internal anchor Pith review arXiv 2024
-
[5]
Alam, N., Kanjula, K.R., Guthikonda, S., Chung, T., Vegesna, B.K.S., Das, A., Susevski,A.,Chan,R.S.Y.,Uddin,S.M.I.,Islam,S.B.,Santhosh,R.,A,S.,Sharma, D., Liu, C., Chaturvedi, I., Winata, G.I., S, A., Mukherjee, S., Aji, A.F.: Maya: An instruction finetuned multilingual multimodal model (2024),https://arxiv. org/abs/2412.07112
-
[6]
arXiv preprint arXiv:2512.05959 (2025)
Anugraha, D., Irawan, P.A., Singh, A., Lee, E.S.A., Winata, G.I.: M4-rag: A massive-scale multilingual multi-cultural multimodal rag. arXiv preprint arXiv:2512.05959 (2025)
-
[7]
a critical analysis of globalization indices
Axel, D., Noel, G., Pim, M., Lotte, V.B.: Measuring globalization opening the black box. a critical analysis of globalization indices. Journal of Globalization Studies 1(1), 166–185 (2010)
2010
-
[8]
Bai, S., Cai, Y., Chen, R., Chen, K., Chen, X., Cheng, Z., Deng, L., Ding, W., Gao, C., Ge, C., Ge, W., Guo, Z., Huang, Q., Huang, J., Huang, F., Hui, B., Jiang, S., Li, Z., Li, M., Li, M., Li, K., Lin, Z., Lin, J., Liu, X., Liu, J., Liu, C., Liu, Y., Liu, D., 16 Cahyawijaya et al. Liu, S., Lu, D., Luo, R., Lv, C., Men, R., Meng, L., Ren, X., Ren, X., Son...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[9]
Bai, S., Chen, K., Liu, X., Wang, J., Ge, W., Song, S., Dang, K., Wang, P., Wang, S., Tang, J., Zhong, H., Zhu, Y., Yang, M., Li, Z., Wan, J., Wang, P., Ding, W., Fu, Z., Xu, Y., Ye, J., Zhang, X., Xie, T., Cheng, Z., Zhang, H., Yang, Z., Xu, H., Lin, J.: Qwen2.5-vl technical report (2025),https://arxiv.org/abs/2502.13923
work page internal anchor Pith review Pith/arXiv arXiv 2025
- [10]
-
[11]
In: Chiruzzo, L., Ritter, A., Wang, L
Cahyawijaya, S., Chen, D., Bang, Y., Khalatbari, L., Wilie, B., Ji, Z., Ishii, E., Fung, P.: High-dimension human value representation in large language models. In: Chiruzzo, L., Ritter, A., Wang, L. (eds.) Proceedings of the 2025 Confer- ence of the Nations of the Americas Chapter of the Association for Computa- tional Linguistics: Human Language Technol...
-
[12]
Cahyawijaya, S., Lovenia, H., Aji, A.F., Winata, G., Wilie, B., Koto, F., Ma- hendra, R., Wibisono, C., Romadhony, A., Vincentio, K., Santoso, J., Moeljadi, D., Wirawan, C., Hudi, F., Wicaksono, M.S., Parmonangan, I., Alfina, I., Putra, I.F., Rahmadani, S., Oenang, Y., Septiandri, A., Jaya, J., Dhole, K., Suryani, A., Putri, R.A., Su, D., Stevens, K., Nit...
-
[13]
In: Park, J.C., Arase, Y., Hu, B., Lu, W., Wijaya, D., Purwarianti, A., Krisnadhi, A.A
Cahyawijaya, S., Lovenia, H., Koto, F., Adhista, D., Dave, E., Oktavianti, S., Akbar, S., Lee, J., Shadieq, N., Cenggoro, T.W., Linuwih, H., Wilie, B., Muri- dan, G., Winata, G., Moeljadi, D., Aji, A.F., Purwarianti, A., Fung, P.: Nu- saWrites: Constructing high-quality corpora for underrepresented and extremely low-resource languages. In: Park, J.C., Ara...
-
[15]
Cahyawijaya, S., Lovenia, H., Moniz, J.R.A., Wong, T.H., Farhansyah, M.R., Maung, T.T., Hudi, F., Anugraha, D., Habibi, M.R.S., Qorib, M.R., Agarwal, A., Imperial, J.M., Patel, H.L., Feliren, V., Nasution, B.I., Rufino, M.A., Winata, G.I., Rajagede, R.A., Catalan, C.R., Imam, M.F.M., Pattnayak, P., Pranida, S.Z., Pratama, K., Bangera, Y., Na-Thalang, A., ...
-
[16]
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) pp
Cahyawijaya, S., Lovenia, H., Moniz, J.R.A., Wong, T.H., Farhansyah, M.R., Maung, T.T., Hudi, F., Anugraha, D., Habibi, M.R.S., Qorib, M.R., et al.: Sea-vl: A multicultural vision-language dataset for southeast asia. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) pp. 18685–18717 (2025)
2025
-
[17]
In: Duh, K., Gomez, H., Bethard, S
Cecilia Liu, C., Koto, F., Baldwin, T., Gurevych, I.: Are multilingual LLMs culturally-diverse reasoners? an investigation into multicultural proverbs and say- ings. In: Duh, K., Gomez, H., Bethard, S. (eds.) Proceedings of the 2024 Con- ference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Vo...
2024
-
[18]
https://doi.org/10.18653/v1/2024.naacl-long.112,https://aclanthology
Association for Computational Linguistics, Mexico City, Mexico (Jun 2024). https://doi.org/10.18653/v1/2024.naacl-long.112,https://aclanthology. org/2024.naacl-long.112/
-
[19]
Cohere, T., :, Aakanksha, Ahmadian, A., Ahmed, M., Alammar, J., Alizadeh, M., Alnumay, Y., Althammer, S., Arkhangorodsky, A., Aryabumi, V., Aumiller, D., Avalos, R., Aviv, Z., Bae, S., Baji, S., Barbet, A., Bartolo, M., Bebensee, B., Beladia, N., Beller-Morales, W., Bérard, A., Berneshawi, A., Bialas, A., Blunsom, P., Bobkin, M., Bongale, A., Braun, S., B...
-
[20]
Dash, S., Nan, Y., Dang, J., Ahmadian, A., Singh, S., Smith, M., Venkitesh, B., Shmyhlo, V., Aryabumi, V., Beller-Morales, W., Pekmez, J., Ozuzu, J., Richemond, P.H., Locatelli, A., Frosst, N., Blunsom, P., Gomez, A., Zhang, I., Fadaee, M., Govindassamy, M., Roy, S., Gallé, M., Ermis, B., Üstün, A., Hooker,S.:Ayavision:Advancingthefrontierofmultilingualmu...
2026
-
[21]
Deitke, M., Clark, C., Lee, S., Tripathi, R., Yang, Y., Park, J.S., Salehi, M., Muen- nighoff, N., Lo, K., Soldaini, L., Lu, J., Anderson, T., Bransom, E., Ehsani, K., Ngo, H., Chen, Y., Patel, A., Yatskar, M., Callison-Burch, C., Head, A., Hen- drix, R., Bastani, F., VanderBilt, E., Lambert, N., Chou, Y., Chheda, A., Sparks, J., Skjonsberg, S., Schmitz, ...
work page internal anchor Pith review arXiv 2024
-
[22]
Figge, L., Martens, P.: Globalisation continues: The maastricht globalisation index revisited and updated. Globalizations11(6), 875–893 (2014).https://doi.org/ 10.1080/14747731.2014.887389,https://doi.org/10.1080/14747731.2014. 887389
-
[23]
Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Let- man, A., Mathur, A., Schelten, A., Vaughan, A., Yang, A., Fan, A., Goyal, A., Hartshorn, A., Yang, A., Mitra, A., Sravankumar, A., Korenev, A., Hinsvark, A., Rao, A., Zhang, A., Rodriguez, A., Gregerson, A., Spataru, A., Roziere, B., Biron, B., Tang, B., Chern, B., Cauchete...
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[24]
Available: http://dx.doi.org/10.1038/s41586-025-09422-z
Guo, D., Yang, D., Zhang, H., Song, J., Wang, P., Zhu, Q., Xu, R., Zhang, R., Ma, S., Bi, X., Zhang, X., Yu, X., Wu, Y., Wu, Z.F., Gou, Z., Shao, Z., Li, Z., Gao, Z., Liu, A., Xue, B., Wang, B., Wu, B., Feng, B., Lu, C., Zhao, C., Deng, C., Ruan, C., Dai, D., Chen, D., Ji, D., Li, E., Lin, F., Dai, F., Luo, F., Hao, G., Chen, G., Li, G., Zhang, H., Xu, H....
- [25]
-
[26]
The Review of International Organizations14(3), 543–574 (Jan 2019)
Gygli, S., Haelg, F., Potrafke, N., Sturm, J.E.: The kof globalisation index – re- visited. The Review of International Organizations14(3), 543–574 (Jan 2019). https://doi.org/10.1007/s11558-019-09344-2,http://dx.doi.org/10.1007/ s11558-019-09344-2
-
[27]
Haelg, F.: The kof globalisation index – a multidimensional approach to glob- alisation. Jahrbücher für Nationalökonomie und Statistik240(5), 691–696 (Sep 2019).https://doi.org/10.1515/jbnst- 2019- 0045,http://dx.doi.org/10. 1515/jbnst-2019-0045
- [28]
-
[29]
Hu, X., Wang, R., Fang, Y., Fu, B., Cheng, P., Yu, G.: Ella: Equip diffusion models with llm for enhanced semantic alignment (2024),https://arxiv.org/abs/2403. 05135
2024
-
[30]
10685–10706 (2024)
Jain,A.,etal.:Ai4bharatindicbert:Amonolingualbertmodelforindianlanguages pp. 10685–10706 (2024)
2024
- [31]
-
[32]
In: Rogers, A., Boyd-Graber, J., Okazaki, N
Kabra, A., Liu, E., Khanuja, S., Aji, A.F., Winata, G., Cahyawijaya, S., Aremu, A., Ogayo, P., Neubig, G.: Multi-lingual and multi-cultural figurative language understanding. In: Rogers, A., Boyd-Graber, J., Okazaki, N. (eds.) Findings of the Association for Computational Linguistics: ACL 2023. pp. 8269–8284. As- sociation for Computational Linguistics, T...
-
[33]
10685–10706 (2024)
Khan, F., et al.: Ai4bharat indicbert: A monolingual bert model for indian lan- guages pp. 10685–10706 (2024)
2024
-
[34]
Command-A-Translate: Raising the Bar of Machine Translation with Difficulty Filtering
Kocmi, T., Arkhangorodsky, A., Berard, A., Blunsom, P., Cahyawijaya, S., De- haze, T., Fadaee, M., Frosst, N., Galle, M., Gomez, A., Govindarajan, N., Ko, W.Y., Kreutzer, J., Marchisio, K., Üstün, A., Vincent, S., Zhang, I.: Command-a- translate: Raising the bar of machine translation with difficulty filtering. In: Had- dow, B., Kocmi, T., Koehn, P., Monz...
-
[35]
In: Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles (2023)
Kwon, W., Li, Z., Zhuang, S., Sheng, Y., Zheng, L., Yu, C.H., Gonzalez, J.E., Zhang, H., Stoica, I.: Efficient memory management for large language model serv- ing with pagedattention. In: Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles (2023)
2023
-
[36]
Labs, C.: Aya vision benchmark (2025),https://huggingface.co/datasets/ CohereLabs/AyaVisionBench
2025
-
[37]
Liu, C.C., Gurevych, I., Korhonen, A.: Culturally aware and adapted NLP: A taxonomy and a survey of the state of the art. Transactions of the Association for Computational Linguistics13, 652–689 (2025).https://doi.org/10.1162/tacl_ a_00760,https://aclanthology.org/2025.tacl-1.31/ 22 Cahyawijaya et al
-
[38]
Decoupled Weight Decay Regularization
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv (2017). https://doi.org/10.48550/arxiv.1711.05101
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1711.05101 2017
-
[39]
In: Al-Onaizan, Y., Bansal, M., Chen, Y.N
Lovenia, H., Mahendra, R., Akbar, S.M., Miranda, L.J.V., Santoso, J., Aco, E., Fadhilah, A., Mansurov, J., Imperial, J.M., Kampman, O.P., Moniz, J.R.A., Habibi, M.R.S., Hudi, F., Montalan, R., Ignatius, R., Lopo, J.A., Nixon, W., Karls- son, B.F., Jaya, J., Diandaru, R., Gao, Y., Amadeus, P., Wang, B., Cruz, J.C.B., Whitehouse, C., Parmonangan, I.H., Khel...
-
[40]
Martens, P., Raza, M.: The maastricht globalisation index: An update, pp. 279–
-
[41]
Nova Science Publishers, United States (2009)
2009
-
[42]
In: The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2024), https://openreview.net/forum?id=E18kRXTGmV
Mogrovejo, D.O.R., Lyu, C., Wibowo, H.A., Góngora, S., Mandal, A., Purkayastha, S., Ortiz-Barajas, J.G., Cueva, E.V., Baek, J., Jeong, S., Hamed, I., Yong, Z.X., Lim, Z.W., Silva, P.M., Dunstan, J., Jouitteau, M., MEUR, D.L., Nwatu, J., Bat- nasan, G., Otgonbold, M.E., Gochoo, M., Ivetta, G., Benotti, L., Alemany, L.A., Maina, H., Geng, J., Torrent, T.T.,...
2024
-
[43]
In: Sawaf, H., El-Beltagy, S., Zaghouani, W., Magdy, W., Abdelali, A., Tomeh, N., Abu Farha, I., Habash, N., Khalifa, S., Keleg, A., Haddad, H., Zitouni, I., Mrini, K., Almatham, R
Mohamed, A., Alwajih, F., Nagoudi, E.M.B., Inciarte, A., Abdul-Mageed, M.: Violet: A vision-language model for Arabic image captioning with gemini decoder. In: Sawaf, H., El-Beltagy, S., Zaghouani, W., Magdy, W., Abdelali, A., Tomeh, N., Abu Farha, I., Habash, N., Khalifa, S., Keleg, A., Haddad, H., Zitouni, I., Mrini, K., Almatham, R. (eds.) Proceedings ...
2023
-
[44]
In: Ku, L.W., Martins, A., Srikumar, V
Naous,T.,Ryan,M.J.,Ritter,A.,Xu,W.:Havingbeerafterprayer?measuringcul- tural bias in large language models. In: Ku, L.W., Martins, A., Srikumar, V. (eds.) Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 16366–16393. Association for Computa- Anthropogenic Regional Adaptation 23 tional Lin...
2024
-
[45]
In: Inui, K., Sakti, S., Wang, H., Wong, D.F., Bhattacharyya, P., Banerjee, B., Ekbal, A., Chakraborty, T., Singh, D.P
Ng, R., Nguyen, T.N., Yuli, H., Chia, T.N., Yi, L.W., Leong, W.Q., Yong, X., Ngui, J.G.,Susanto,Y.,Cheng,N.,Rengarajan,H.,Limkonchotiwat,P.,Hulagadri,A.V., Teng, K.W., Tong, Y.Y., Siow, B., Teo, W.Y., Meng, T.C., Ong, B., Ong, Z.H., Montalan, J.R., Chan, A., Antonyrex, S., Lee, R., Choa, E., Tat-Wee, D.O., Liu, B.J.D., Tjhi, W.C., Cambria, E., Teo, L.: SE...
2025
-
[46]
Nguyen, T.S., Qorib, M.R., Ng, H.T.: Openseal: Good, fast, and cheap construction of an open-source southeast asian llm via parallel data (2026),https://arxiv. org/abs/2602.02266
-
[47]
Math-Shepherd: Verify and reinforce LLMs step-by-step without human annotations
Nguyen, X.P., Zhang, W., Li, X., Aljunied, M., Hu, Z., Shen, C., Chia, Y.K., Li, X., Wang, J., Tan, Q., Cheng, L., Chen, G., Deng, Y., Yang, S., Liu, C., Zhang, H., Bing, L.: SeaLLMs - large language models for Southeast Asia. In: Cao, Y., Feng, Y., Xiong, D. (eds.) Proceedings of ACL. pp. 294–304 (Aug 2024). https://doi.org/10.18653/v1/2024.acl- demos.28...
-
[48]
Nyandwi,J.D.D.,Song,Y.,Khanuja,S.,Neubig,G.:Groundingmultilingualmulti- modalLLMswithculturalknowledge.In:Christodoulopoulos,C.,Chakraborty,T., Rose, C., Peng, V. (eds.) Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. pp. 24187–24231. Association for Computational Linguistics, Suzhou, China (Nov 2025).https://doi.or...
-
[49]
Nyborg, J., Pelletier, C., Lefèvre, S., Assent, I.: Timematch: Unsupervised cross- region adaptation by temporal shift estimation. ISPRS Journal of Photogram- metry and Remote Sensing188, 301–313 (2022).https://doi.org/https: //doi.org/10.1016/j.isprsjprs.2022.04.018,https://www.sciencedirect. com/science/article/pii/S0924271622001216
-
[50]
Patel, H.L., Agarwal, A., Das, A., Kumar, B., Panda, S., Pattnayak, P., Rafi, T.H., Kumar, T., Chae, D.K.: Sweeval: Do llms really swear? a safety benchmark for testing limits for enterprise use. In: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volu...
2025
-
[51]
Singapore, A.: Sea-lion (southeast asian languages in one network): A family of large language models for southeast asia.https://github.com/aisingapore/ sealion(2024)
2024
-
[52]
arXiv preprint arXiv:2412.03555 (2024) 1
Steiner, A.e.a.: Paligemma 2: A family of versatile vlms for transfer. arXiv preprint arXiv:2412.03555 (2024)
-
[53]
Gemini: A Family of Highly Capable Multimodal Models
Team, G., Anil, R., Borgeaud, S., Alayrac, J.B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A.M., Hauth, A., Millican, K., Silver, D., Johnson, M., Antonoglou, I., Schrittwieser, J., Glaese, A., Chen, J., Pitler, E., Lillicrap, T., Lazaridou, A., Firat, O., Molloy, J., Isard, M., Barham, P.R., Hennigan, T., Lee, B., Viola, F., Reynolds, M., Xu, Y., Doherty,...
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[54]
Team, G., Kamath, A., Ferret, J., Pathak, S., Vieillard, N., Merhej, R., Perrin, S., Matejovicova, T., Ramé, A., Rivière, M., Rouillard, L., Mesnard, T., Cideron, G., bastien Grill, J., Ramos, S., Yvinec, E., Casbon, M., Pot, E., Penchev, I., Liu, G., Visin, F., Kenealy, K., Beyer, L., Zhai, X., Tsitsulin, A., Busa-Fekete, R., Feng, A., Sachdeva, N., Cole...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[55]
No Language Left Behind: Scaling Human-Centered Machine Translation
Team, N., Costa-jussà, M.R., Cross, J., Çelebi, O., Elbayad, M., Heafield, K., Hef- fernan,K.,Kalbassi,E.,Lam,J.,Licht,D.,Maillard,J.,Sun,A.,Wang,S.,Wenzek, G., Youngblood, A., Akula, B., Barrault, L., Gonzalez, G.M., Hansanti, P., Hoff- man, J., Jarrett, S., Sadagopan, K.R., Rowe, D., Spruit, S., Tran, C., Andrews, P., Ayan, N.F., Bhosale, S., Edunov, S....
work page internal anchor Pith review arXiv 2022
-
[56]
In: Gu, J., Fu, T.J.R., Hudson, D., Celikyilmaz, A., Wang, W
Urailertprasert, N., Limkonchotiwat, P., Suwajanakorn, S., Nutanong, S.: SEA- VQA: Southeast Asian cultural context dataset for visual question answering. In: Gu, J., Fu, T.J.R., Hudson, D., Celikyilmaz, A., Wang, W. (eds.) Proceedings of the 3rd Workshop on Advances in Language and Vision Research (ALVR). pp. 173–
-
[57]
https://doi.org/10.18653/v1/2024.alvr-1.15,https://aclanthology.org/ 2024.alvr-1.15/
Association for Computational Linguistics, Bangkok, Thailand (Aug 2024). https://doi.org/10.18653/v1/2024.alvr-1.15,https://aclanthology.org/ 2024.alvr-1.15/
-
[58]
arXiv preprint arXiv:2411.02538 (2025) Anthropogenic Regional Adaptation 25
Verma, S., Khanuja, M.S.U.R., Kumar, V., Murthy, R., Sen, J.: Milu: A multi-task indic language understanding benchmark. arXiv preprint arXiv:2411.02538 (2025) Anthropogenic Regional Adaptation 25
-
[59]
Wang, H., Yao, Y., Liu, J., Zhang, X., Zhao, Y., Li, S., Liu, Z., Zhang, X., Zeng, Y.: Unsupervised cross-regional and cross-year adaptation by climate in- dicator discrepancy for crop classification. Journal of Remote Sensing5(Jan 2025).https://doi.org/10.34133/remotesensing.0439,http://dx.doi.org/ 10.34133/remotesensing.0439
-
[60]
5997–6023 (2024)
Wang, J., Adelani, D.I., et al.: Afrimte and africomet: Enhancing comet to embrace under-resourced african languages pp. 5997–6023 (2024)
2024
-
[61]
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Wang, P., Bai, S., Tan, S., Wang, S., Fan, Z., Bai, J., Chen, K., Liu, X., Wang, J., Ge, W., Fan, Y., Dang, K., Du, M., Ren, X., Men, R., Liu, D., Zhou, C., Zhou, J., Lin, J.: Qwen2-vl: Enhancing vision-language model’s perception of the world at any resolution. arXiv preprint arXiv:2409.12191 (2024)
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[62]
Wang, Y., Zang, Y., Li, H., Jin, C., Wang, J.: Unified reward model for multimodal understanding and generation (2026),https://arxiv.org/abs/2503.05236
work page internal anchor Pith review arXiv 2026
-
[63]
In: Vlachos, A., Augenstein, I
Winata, G.I., Aji, A.F., Cahyawijaya, S., Mahendra, R., Koto, F., Romadhony, A., Kurniawan, K., Moeljadi, D., Prasojo, R.E., Fung, P., Baldwin, T., Lau, J.H., Sennrich, R., Ruder, S.: NusaX: Multilingual parallel sentiment dataset for 10 Indonesian local languages. In: Vlachos, A., Augenstein, I. (eds.) Proceedings of the 17th Conference of the European C...
-
[64]
In: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Winata, G.I., Hudi, F., Irawan, P.A., Anugraha, D., Putri, R.A., Yutong, W., Nohejl, A., Prathama, U.A., Ousidhoum, N., Amriani, A., et al.: Worldcuisines: A massive-scale benchmark for multilingual and multicultural visual question an- swering on global cuisines. In: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Associa...
2025
-
[65]
arXiv preprint arXiv:2405.14133 (2024)
Winata, G.I., et al.: Worldcuisines: A benchmark dataset for multilingual and multicultural image classification. arXiv preprint arXiv:2405.14133 (2024)
-
[66]
Wu, X., Hao, Y., Sun, K., Chen, Y., Zhu, F., Zhao, R., Li, H.: Human preference score v2: A solid benchmark for evaluating human preferences of text-to-image synthesis (2023),https://arxiv.org/abs/2306.09341
work page internal anchor Pith review arXiv 2023
-
[67]
Xu, J., Huang, Y., Cheng, J., Yang, Y., Xu, J., Wang, Y., Duan, W., Yang, S., Jin, Q., Li, S., Teng, J., Yang, Z., Zheng, W., Liu, X., Zhang, D., Ding, M., Zhang, X., Gu, X., Huang, S., Huang, M., Tang, J., Dong, Y.: Visionreward: Fine-grained multi-dimensional human preference learning for image and video generation (2026),https://arxiv.org/abs/2412.21059
-
[68]
Advances in Neural Information Processing Systems36, 15903–15935 (2023)
Xu, J., Liu, X., Wu, Y., Tong, Y., Li, Q., Ding, M., Tang, J., Dong, Y.: Imagere- ward: Learning and evaluating human preferences for text-to-image generation. Advances in Neural Information Processing Systems36, 15903–15935 (2023)
2023
-
[69]
arXiv preprint arXiv:2410.16153 , year=
Yue, X., Song, Y., Asai, A., Kim, S., de Dieu Nyandwi, J., Khanuja, S., Kan- tharuban, A., Sutawika, L., Ramamoorthy, S., Neubig, G.: Pangea: A fully open multilingual multimodal llm for 39 languages. arXiv preprint arXiv:2410.16153 (2024),https://arxiv.org/abs/2410.16153
-
[70]
Zhang, Y., Li, M., Long, D., Zhang, X., Lin, H., Yang, B., Xie, P., Yang, A., Liu, D., Lin, J., Huang, F., Zhou, J.: Qwen3 embedding: Advancing text embedding and reranking through foundation models (2025),https://arxiv.org/abs/2506. 05176 Appendix A Assessment for SEA languages Translation Quality Table 6:Human evaluation of English→5 SEA languages trans...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.