LaViSA: A Language and Vision Structural Ambiguity Benchmark

Koichiro Yoshino; Lee Sangmyeong; Shun Inadumi

arxiv: 2606.19552 · v1 · pith:ZKQYAIYYnew · submitted 2026-06-17 · 💻 cs.CL

LaViSA: A Language and Vision Structural Ambiguity Benchmark

Lee Sangmyeong , Shun Inadumi , Koichiro Yoshino This is my paper

Pith reviewed 2026-06-26 20:39 UTC · model grok-4.3

classification 💻 cs.CL

keywords structural ambiguityvision-language modelsbenchmarksentence disambiguationvisual groundingmultimodal evaluationsyntactic ambiguity

0 comments

The pith

Vision-language models use images to resolve some sentence ambiguities but still fail on specific types and subtle distinctions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LaViSA, a benchmark containing ambiguous sentences, their disambiguated versions, and corresponding images across seven structural ambiguity categories. It evaluates a range of vision-language models to determine how effectively they use visual scenes to select the correct interpretation. Results indicate that models gain some benefit from the images yet continue to struggle with particular ambiguity kinds and fine visual semantic differences. This setup matters because structural ambiguity is a core challenge in language understanding, and visual grounding offers one potential way to address it in real-world multimodal settings.

Core claim

LaViSA evaluates the ability of VLMs to resolve structural ambiguity leveraging visual scenes. It consists of ambiguous sentences, their disambiguated sentences, and corresponding images of these disambiguated sentences across seven ambiguity categories. Experimental results show that although recent VLMs can leverage visual scenes to resolve structural ambiguity to some extent, they still struggle with certain ambiguity types and visually subtle semantic distinctions, indicating remaining limitations in resolving structural ambiguity using visual scenes.

What carries the argument

LaViSA benchmark, which supplies structurally ambiguous sentences paired with disambiguated interpretations and their unique visual depictions across seven categories.

If this is right

VLMs exhibit uneven performance across the seven ambiguity categories, with some types remaining harder even when visuals are provided.
Visually subtle semantic distinctions continue to limit model accuracy regardless of overall model scale or reasoning ability.
Current visual-linguistic integration in models is insufficient for full resolution of structural ambiguity in all cases.
Targeted improvements in how models map visual details to syntactic alternatives would be required to close the observed gaps.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The benchmark could be extended to dynamic scenes or video to test whether motion cues further aid disambiguation.
Results point toward the value of training objectives that explicitly reward alignment between syntactic parses and visual features.
Similar datasets in additional languages would allow testing whether the observed limitations are language-specific or general.
The findings connect to broader efforts to make multimodal systems robust when linguistic input alone is under-specified.

Load-bearing premise

The images created for each disambiguated sentence accurately and uniquely represent only that interpretation without introducing new visual ambiguities or biases.

What would settle it

Re-running the model evaluations on a fresh set of images for the same sentences, or having independent raters judge whether each image uniquely matches only one interpretation, would show whether performance differences stem from the benchmark design itself.

Figures

Figures reproduced from arXiv: 2606.19552 by Koichiro Yoshino, Lee Sangmyeong, Shun Inadumi.

**Figure 2.** Figure 2: Overview of the visual disambiguation task [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Examples from LaViSA across ambiguity categories. Each ambiguous sentence is paired with two or three disambiguated sentences and corresponding visual scenes. A direct evaluation methodology for visual disambiguation would be to require VLMs to generate disambiguated sentences. However, automatic evaluation metrics that compare generated sentences with references, such as BERTScore (Zhang et al., 2020), … view at source ↗

**Figure 4.** Figure 4: Data collection process of LaViSA. Each step [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: Error cases involving visually salient but structurally misleading objects. Pink annotations indicate the [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: Error cases from the Ellip category. The definition of pink and purple follows that of Figure [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

read the original abstract

Structural ambiguity arises when a single sentence admits multiple valid interpretations due to its syntactic structure, posing a fundamental challenge for language understanding. Visual scenes serve as useful cues for resolving such ambiguity, and Vision and Language Models (VLMs) need to be capable of deriving possible semantic interpretations from visual scenes. We introduce Language and Vision Structural Ambiguity (LaViSA), a benchmark designed to evaluate the ability of VLMs to resolve structural ambiguity leveraging visual scenes. LaViSA consists of ambiguous sentences, their disambiguated sentences, and corresponding images of these disambiguated sentences across seven ambiguity categories. Using LaViSA, we conduct a comprehensive evaluation of diverse VLMs, including both proprietary and open-source models with varying parameter scales and reasoning capabilities. Experimental results show that although recent VLMs can leverage visual scenes to resolve structural ambiguity to a some extent, they still struggle with certain ambiguity types and visually subtle semantic distinctions, indicating remaining limitations in resolving structural ambiguity using visual scenes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LaViSA adds a new benchmark with seven ambiguity categories and image-sentence pairs, but the reported model struggles rest on images that may not cleanly isolate the intended readings.

read the letter

The main thing to know is that this paper creates LaViSA, a benchmark of ambiguous sentences, their disambiguated versions, and paired images across seven structural ambiguity categories, then tests a range of VLMs on it. The headline result is that models can use visuals to resolve some ambiguity but still fall short on certain types and subtle distinctions.

What is new is the explicit seven-category breakdown and the release of matched image-sentence data for this exact task. Prior work touched on visual disambiguation, but this structured collection did not exist. The paper does a reasonable job of running the same protocol across proprietary and open models of different sizes, which gives a consistent picture of where current systems are weak.

The soft spot is the image construction. The central claim that models struggle with structural ambiguity using visual scenes only holds if each image uniquely supports one reading and does not introduce its own ambiguities or biases. The abstract gives no information on how the images were created, checked for uniqueness, or validated by multiple annotators. If an image allows the alternative parse or adds unrelated visual cues, the error rates cannot be attributed cleanly to the intended limitation. This is a load-bearing assumption, and without those checks the results are hard to interpret.

This paper is for people working on multimodal language understanding who need a focused test set for structural ambiguity. A reader building or evaluating VLMs would get practical value from the dataset if the construction details hold up. It deserves a serious referee because a properly validated benchmark in this niche can be useful even if the current version needs tightening on the visual side. I would send it to review to see the full construction protocol and any agreement statistics.

Referee Report

2 major / 2 minor

Summary. The paper introduces LaViSA, a benchmark of ambiguous sentences, their disambiguated variants, and corresponding images across seven structural ambiguity categories. It evaluates a range of VLMs (proprietary and open-source) and reports that models can leverage visual scenes to resolve ambiguity to some extent but continue to struggle with certain categories and visually subtle semantic distinctions.

Significance. If the image-sentence pairs are shown to cleanly isolate the intended readings, the benchmark would offer a useful diagnostic for multimodal structural ambiguity resolution, an area where current VLMs remain limited; the work also supplies an explicit dataset that could support future controlled experiments.

major comments (2)

[§3 and §4] §3 (Benchmark Construction) and §4 (Image Generation): the manuscript provides no inter-annotator agreement scores, no validation protocol confirming that each image uniquely supports only the target disambiguation, and no checks that images avoid introducing new visual ambiguities or exploitable biases. Because every performance number flows through these image-sentence pairs, this validation gap directly undermines attribution of model errors to limitations in 'resolving structural ambiguity using visual scenes.'
[§5] §5 (Experiments): results are summarized at the level of 'struggle with certain ambiguity types' without per-category accuracy tables, statistical significance tests, or error analysis that isolates whether failures stem from syntactic ambiguity versus low-level visual recognition. This makes it impossible to assess whether the central claim holds after controlling for image quality.

minor comments (2)

[Abstract] Abstract: 'to a some extent' is a typographical error.
[Related Work] The paper does not cite prior benchmarks on syntactic ambiguity (e.g., PP-attachment or coordination ambiguity datasets) or recent VLM work on visual grounding of syntax.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to improve the reporting of validation procedures and experimental details.

read point-by-point responses

Referee: [§3 and §4] §3 (Benchmark Construction) and §4 (Image Generation): the manuscript provides no inter-annotator agreement scores, no validation protocol confirming that each image uniquely supports only the target disambiguation, and no checks that images avoid introducing new visual ambiguities or exploitable biases. Because every performance number flows through these image-sentence pairs, this validation gap directly undermines attribution of model errors to limitations in 'resolving structural ambiguity using visual scenes.'

Authors: We acknowledge that the manuscript does not explicitly report inter-annotator agreement or a formal validation protocol. During benchmark construction, multiple annotators reviewed each image-sentence pair to confirm alignment with the target disambiguation and to flag potential new visual ambiguities, but these steps were not documented in detail. We will add a subsection to §3 describing the full annotation and validation process, including agreement scores and bias checks. This revision will strengthen the basis for attributing model performance to structural ambiguity resolution. revision: yes
Referee: [§5] §5 (Experiments): results are summarized at the level of 'struggle with certain ambiguity types' without per-category accuracy tables, statistical significance tests, or error analysis that isolates whether failures stem from syntactic ambiguity versus low-level visual recognition. This makes it impossible to assess whether the central claim holds after controlling for image quality.

Authors: We agree that more granular reporting would strengthen the experimental section. While the manuscript includes some category-level observations, we will expand §5 to include complete per-category accuracy tables, statistical significance tests across models and categories, and a dedicated error analysis distinguishing visual recognition issues from ambiguity resolution failures. These additions will better support evaluation of the central claims. revision: yes

Circularity Check

0 steps flagged

Empirical benchmark evaluation with no derivation chain

full rationale

The paper introduces LaViSA as a benchmark consisting of ambiguous sentences, disambiguated sentences, and corresponding images across seven categories, then reports direct empirical evaluations of VLMs on this dataset. No equations, fitted parameters, predictions derived from inputs, or self-citation chains appear in the manuscript. All reported results are measurements on the constructed benchmark rather than reductions of claims to prior self-defined quantities, rendering the work self-contained with no circularity in any derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The benchmark rests on the domain assumption that visual scenes can disambiguate syntactic structure and that the seven categories are representative; no free parameters or invented entities are introduced.

axioms (1)

domain assumption Visual scenes provide sufficient cues to resolve syntactic structural ambiguity in language.
Stated in the abstract as the core premise for the benchmark design.

pith-pipeline@v0.9.1-grok · 5701 in / 1136 out tokens · 15303 ms · 2026-06-26T20:39:53.585851+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

82 extracted references · 10 canonical work pages

[1]

and Ullman, Jeffrey D

Aho, Alfred V. and Ullman, Jeffrey D. , title =. 1972 , isbn =

1972
[2]

Publications Manual , year =
[3]

Chandra and Dexter C

Ashok K. Chandra and Dexter C. Kozen and Larry J. Stockmeyer , year =. Alternation , journal =
[4]

Scalable training of

Andrew, Galen and Gao, Jianfeng , booktitle=. Scalable training of. 2007 , url =

2007
[5]

1997 , publisher =

Dan Gusfield , title =. 1997 , publisher =

1997
[6]

Tetreault , year =

Mohammad Sadegh Rasooli and Joel R. Tetreault , year =
[7]

Journal of Machine Learning Research , year =

Rie Kubota Ando and Tong Zhang , title =. Journal of Machine Learning Research , year =
[8]

Studia Linguistica , year=

Ambiguity in Linguistics1 , author=. Studia Linguistica , year=
[9]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , year=

Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality , author=. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , year=
[10]

Proceedings of the 11th International Conference on Learning Representations , year=

When and why vision-language models behave like bags-of-words, and what to do about it? , author=. Proceedings of the 11th International Conference on Learning Representations , year=
[11]

Do You See What

Berzak, Yevgeni and Barbu, Andrei and Harari, Daniel and Katz, Boris and Ullman, Shimon , booktitle =. Do You See What. 2015 , url =. doi:10.18653/v1/D15-1172 , pages =

work page doi:10.18653/v1/d15-1172 2015
[12]

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics , year =

Resolving Ambiguities in Text-to-Image Generative Models , author =. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics , year =
[13]

Computer Science , volume=

Improving image generation with better captions , author=. Computer Science , volume=. 2023 , url=

2023
[14]

2023 , howpublished=

Hierarchical Text-Conditional Image Generation with CLIP Latents , author=. 2023 , howpublished=

2023
[15]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , year=

Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn , title =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , year=
[16]

Proceedings of the 38th International Conference on Machine Learning , pages =

Learning Transferable Visual Models From Natural Language Supervision , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , volume =

2021
[17]

Proceedings of the IEEE/CVF International Conference on Computer Vision , year =

Zhai, Xiaohua and Mustafa, Basil and Kolesnikov, Alexander and Beyer, Lucas , title =. Proceedings of the IEEE/CVF International Conference on Computer Vision , year =
[18]

2024 , howpublished=

Building and better understanding vision-language models: insights and future directions , author=. 2024 , howpublished=

2024
[19]

2023 , url =

Tang, Yingtian and Yamada, Yutaro and Zhang, Yoyo and Yildirim, Ilker , booktitle =. 2023 , url =. doi:10.18653/v1/2023.emnlp-main.886 , pages =

work page doi:10.18653/v1/2023.emnlp-main.886 2023
[20]

An Examination of the Compositionality of Large Generative Vision-Language Models

Ma, Teli and Li, Rong and Liang, Junwei. An Examination of the Compositionality of Large Generative Vision-Language Models. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2024. doi:10.18653/v1/2024.naacl-long.39

work page doi:10.18653/v1/2024.naacl-long.39 2024
[21]

2024 , url=

Irene Huang and Wei Lin and Muhammad Jehanzeb Mirza and Jacob A Hansen and Sivan Doveh and Victor Ion Butoi and Roei Herzig and Assaf Arbelle and Hilde Kuehne and Trevor Darrell and Chuang Gan and Aude Oliva and Rogerio Feris and Leonid Karlinsky , booktitle=. 2024 , url=

2024
[22]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , year =

Mitra, Chancharik and Huang, Brandon and Darrell, Trevor and Herzig, Roei , title =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , year =
[23]

Holographic

Yamaki, Ryosuke and Taniguchi, Tadahiro and Mochihashi, Daichi , booktitle =. Holographic. 2023 , url =

2023
[24]

Michael Tschannen and Alexey Gritsenko and Xiao Wang and Muhammad Ferjad Naeem and Ibrahim M. Alabdulmohsin and Nikhil Parthasarathy and Talfan Evans and Lucas Beyer and Ye Xia and Basil Mustafa and Olivier H'enaff and Jeremiah Harmsen and Andreas Steiner and Xiao-Qi Zhai , year=
[25]

Demystifying

Hu Xu and Saining Xie and Xiaoqing Tan and Po-Yao Huang and Russell Howes and Vasu Sharma and Shang-Wen Li and Gargi Ghosh and Luke Zettlemoyer and Christoph Feichtenhofer , booktitle=. Demystifying. 2024 , url=

2024
[26]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , year =

Cherti, Mehdi and Beaumont, Romain and Wightman, Ross and Wortsman, Mitchell and Ilharco, Gabriel and Gordon, Cade and Schuhmann, Christoph and Schmidt, Ludwig and Jitsev, Jenia , title =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , year =
[27]

Quan Sun and Yuxin Fang and Ledell Yu Wu and Xinlong Wang and Yue Cao , year=
[28]

Weinberger and Yoav Artzi , booktitle=

Tianyi Zhang and Varsha Kishore and Felix Wu and Kilian Q. Weinberger and Yoav Artzi , booktitle=. 2020 , url=

2020
[29]

2013 , url =

Banarescu, Laura and Bonial, Claire and Cai, Shu and Georgescu, Madalina and Griffitt, Kira and Hermjakob, Ulf and Knight, Kevin and Koehn, Philipp and Palmer, Martha and Schneider, Nathan , booktitle =. 2013 , url =

2013
[30]

2013 , url =

Cai, Shu and Knight, Kevin , booktitle =. 2013 , url =

2013
[31]

Bleu: a method for automatic evaluation of machine translation

Papineni, Kishore and Roukos, Salim and Ward, Todd and Zhu, Wei-Jing , booktitle =. 2002 , url =. doi:10.3115/1073083.1073135 , pages =

work page doi:10.3115/1073083.1073135 2002
[32]

Liu, Haotian and Li, Chunyuan and Li, Yuheng and Li, Bo and Zhang, Yuanhan and Shen, Sheng and Lee, Yong Jae , year=
[33]

2025 , howpublished=

Gemma 3 Technical Report , author=. 2025 , howpublished=

2025
[34]

Shuai Bai and Keqin Chen and Xuejing Liu and Jialin Wang and Wenbin Ge and Sibo Song and Kai Dang and Peng Wang and Shijie Wang and Jun Tang and Humen Zhong and Yuanzhi Zhu and Mingkun Yang and Zhaohai Li and Jianqiang Wan and Pengfei Wang and Wei Ding and Zheren Fu and Yiheng Xu and Jiabo Ye and Xi Zhang and Tianbao Xie and Zesen Cheng and Hang Zhang and...
[35]

Pixtral 12

Pravesh Agrawal and Szymon Antoniak and Emma Bou Hanna and Devendra Singh Chaplot and Jessica Chudnovsky and Saurabh Garg and Th. Pixtral 12. 2024 , howpublished=

2024
[36]

2024 , howpublished=

2024
[37]

Proceedings of the 12th Conference of the

Learning to Interpret Utterances Using Dialogue History , author =. Proceedings of the 12th Conference of the. 2009 , url =

2009
[38]

Speech Recognition and Meaning Interpretation: Towards Disambiguation of Structurally Ambiguous Spoken Utterances in

Widiaputri, Ruhiyah and Purwarianti, Ayu and Lestari, Dessi and Azizah, Kurniawati and Tanaya, Dipta and Sakti, Sakriani , booktitle =. Speech Recognition and Meaning Interpretation: Towards Disambiguation of Structurally Ambiguous Spoken Utterances in. 2023 , url =. doi:10.18653/v1/2023.emnlp-main.1045 , pages =

work page doi:10.18653/v1/2023.emnlp-main.1045 2023
[39]

Proceedings of the 31st International Conference on Computational Linguistics , year =

Does Vision Accelerate Hierarchical Generalization in Neural Language Learners? , author =. Proceedings of the 31st International Conference on Computational Linguistics , year =
[40]

Frontiers in Psychology , year=

Why Is There So Much More Research on Vision Than on Any Other Sensory Modality? , author=. Frontiers in Psychology , year=
[41]

User Intent Recognition and Satisfaction with Large Language Models: A User Study with

Anna Bodonhelyi and Efe Bozkir and Shuo Yang and Enkelejda Kasneci and Gjergji Kasneci , howpublished =. User Intent Recognition and Satisfaction with Large Language Models: A User Study with. 2024 , url =

2024
[42]

2024 , howpublished =

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context , author =. 2024 , howpublished =

2024
[43]

Gemini 3.1

Google , howpublished=. Gemini 3.1. 2025 , url=

2025
[44]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , year=

Deep Residual Learning for Image Recognition , author=. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , year=
[45]

An Image is Worth

Alexey Dosovitskiy and Lucas Beyer and Alexander Kolesnikov and Dirk Weissenborn and Xiaohua Zhai and Thomas Unterthiner and Mostafa Dehghani and Matthias Minderer and Georg Heigold and Sylvain Gelly and Jakob Uszkoreit and Neil Houlsby , booktitle=. An Image is Worth. 2020 , url=

2020
[46]

Zhuang Liu and Hanzi Mao and Chaozheng Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie , booktitle=. A. 2022 , pages=

2022
[47]

Zettlemoyer and Xinlei Chen and Zhuang Liu and Saining Xie and Wen-tau Yih and Shang-Wen Li and Hu Xu , year=

Yung-Sung Chuang and Yang Li and Dong Wang and Ching-Feng Yeh and Kehan Lyu and Ramya Raghavendra and James Glass and Lifei Huang and Jason Weston and Luke S. Zettlemoyer and Xinlei Chen and Zhuang Liu and Saining Xie and Wen-tau Yih and Shang-Wen Li and Hu Xu , year=. Meta
[48]

Journal of the American Statistical Association , year=

Probable Inference, the Law of Succession, and Statistical Inference , author=. Journal of the American Statistical Association , year=
[49]

When are Lemons Purple? The Concept Association Bias of

Yutaro Yamada and Yingtian Tang and Ilker Yildirim , booktitle =. When are Lemons Purple? The Concept Association Bias of. 2023 , pages =

2023
[50]

2025 , number=

Jiwan Chung and Seungwon Lim and Sangkyu Lee and Youngjae Yu , booktitle=. 2025 , number=

2025
[51]

2025 , url =

Xiaolong Wang and Zhaolu Kang and Wangyuxuan Zhai and Xinyue Lou and Yunghwei Lai and Ziyue Wang and Yawen Wang and Kaiyu Huang and Yile Wang and Peng Li and Yang Liu , booktitle =. 2025 , url =

2025
[52]

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages =

Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you! , author =. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages =. 2024 , url =

2024
[53]

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation , year =

Shun Inadumi and Seiya Kawano and Akishige Yuguchi and Yasutomo Kawanishi and Koichiro Yoshino , title =. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation , year =

2024
[54]

OpenAI , howpublished=
[55]

2025 , url=

Shuai Bai and Yuxuan Cai and Ruizhe Chen and Keqin Chen and Xiong-Hui Chen and Zesen Cheng and Lianghao Deng and Wei Ding and Rongyao Fang and Chang Gao and Chunjiang Ge and Wenbin Ge and Zhifang Guo and Qidong Huang and Jie Huang and Fei Huang and Binyuan Hui and Shutong Jiang and Zhaohai Li and Mingsheng Li and Mei Li and Kaixin Li and Zicheng Lin and J...

2025
[56]

2020 , booktitle =

A Simple Framework for Contrastive Learning of Visual Representations , author =. 2020 , booktitle =

2020
[57]

Proceedings of the 34th International Conference on Neural Information Processing Systems , year =

Supervised Contrastive Learning , author =. Proceedings of the 34th International Conference on Neural Information Processing Systems , year =
[58]

Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing , pages =

Learning Semantic Representations of Users and Products for Document Level Sentiment Classification , author=. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing , pages =. 2015 , url=

2015
[59]

Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation , booktitle=

Tellex, Stefanie and Kollar, Thomas and Dickerson, Steven and Walter, Matthew and Banerjee, Ashis and Teller, Seth and Roy, Nicholas , year=. Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation , booktitle=
[60]

Proceedings of the Robotics Science and Systems , year=

Asking for Help Using Inverse Semantics , author=. Proceedings of the Robotics Science and Systems , year=
[61]

2020 , pages=

Mohit Shridhar and Jesse Thomason and Daniel Gordon and Yonatan Bisk and Winson Han and Roozbeh Mottaghi and Luke Zettlemoyer and Dieter Fox , booktitle=. 2020 , pages=

2020
[62]

Artificial Intelligence , year=

Connecting language to the world , author=. Artificial Intelligence , year=. doi:https://doi.org/10.1016/j.artint.2005.06.002 , url =

work page doi:10.1016/j.artint.2005.06.002 2005
[63]

2005 , doi =

Word sense disambiguation with pictures , journal =. 2005 , doi =

2005
[64]

2005 , doi =

Choosing words in computer-generated weather forecasts , journal =. 2005 , doi =

2005
[65]

2005 , doi =

Semiotic schemas: A framework for grounding language in action and perception , journal =. 2005 , doi =

2005
[66]

2024 , howpublished=

DeepSeek-V3 Technical Report , author=. 2024 , howpublished=

2024
[67]

Bo Li and Yuanhan Zhang and Dong Guo and Renrui Zhang and Feng Li and Hao Zhang and Kaichen Zhang and Yanwei Li and Ziwei Liu and Chunyuan Li , year=
[68]

Structural Ambiguity and Lexical Relations

Hindle, Donald and Rooth, Mats. Structural Ambiguity and Lexical Relations. Computational Linguistics. 1993

1993
[69]

Aspects of the Theory of Syntax , year =

Noam Chomsky , publisher =. Aspects of the Theory of Syntax , year =
[70]

Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with

Generating Disambiguating Paraphrases for Structurally Ambiguous Sentences , author =. Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with. 2016 , url =. doi:10.18653/v1/W16-1718 , pages =

work page doi:10.18653/v1/w16-1718 2016
[71]

Why Did the Chicken Cross the Road? Rephrasing and Analyzing Ambiguous Questions in

Stengel-Eskin, Elias and Guallar-Blasco, Jimena and Zhou, Yi and Van Durme, Benjamin , booktitle =. Why Did the Chicken Cross the Road? Rephrasing and Analyzing Ambiguous Questions in. 2023 , url =. doi:10.18653/v1/2023.acl-long.569 , pages =

work page doi:10.18653/v1/2023.acl-long.569 2023
[72]

Challenges of Syntactic Ambiguity in ESL Learners: A Case Study of QUEST Nawabshah , volume =

Chandio, Ali and Oad, Shan and Tunio, Wazir , journal =. Challenges of Syntactic Ambiguity in ESL Learners: A Case Study of QUEST Nawabshah , volume =. 2025 , pages =

2025
[73]

When the

Samuel Joseph Amouyal and Aya Meltzer-Asscher and Jonathan Berant , booktitle=. When the. 2025 , url =. doi:10.18653/v1/2025.acl-long.403 , pages =

work page doi:10.18653/v1/2025.acl-long.403 2025
[74]

2024 , howpublished=

Syntactic Surprisal From Neural Models Predicts, But Underestimates, Human Processing Difficulty From Syntactic Ambiguities , author=. 2024 , howpublished=

2024
[75]

Language, Cognition and Neuroscience , year=

On the parsing of garden-path sentences , author=. Language, Cognition and Neuroscience , year=
[76]

Quarterly Journal of Experimental Psychology , year=

The interaction of visual and linguistic saliency during syntactic ambiguity resolution , author=. Quarterly Journal of Experimental Psychology , year=
[77]

International Conference on Language Resources and Evaluation , year=

Adding Syntactic Annotations to Flickr30k Entities Corpus for Multimodal Ambiguous Prepositional-Phrase Attachment Resolution , author=. International Conference on Language Resources and Evaluation , year=
[78]

2025 , volume =

Kankan Zhou and Eason Lai and Kyriakos Mouratidis and Jing Jiang , booktitle=. 2025 , volume =. doi:10.18653/v1/2025.acl-long.1337 , pages =

work page doi:10.18653/v1/2025.acl-long.1337 2025
[79]

A Safety Report on

Xingjun Ma and Yixu Wang and Hengyuan Xu and Yutao Wu and Yifan Ding and Yunhan Zhao and Zilong Wang and Jiabin Hua and Ming Wen and Jianan Liu and Ranjie Duan and Yifeng Gao and Yingshui Tan and Yunhao Chen and Hui Xue and Xin Wang and Wei Cheng and Jingjing Chen and Zuxuan Wu and Bo Li and Yu-Gang Jiang , year=. A Safety Report on
[80]

Xiang An and Yin Xie and Kaicheng Yang and Wenkang Zhang and Xiuwei Zhao and Zhengxue Cheng and Yirui Wang and Songcen Xu and Changrui Chen and Chun Yat Wu and Huajie Tan and Chunyuan Li and Jing Yang and Jiecao Yu and Xiyao Wang and Bin Qin and Yumeng Wang and Zizhen Yan and Ziyong Feng and Ziwei Liu and Bo Li and Jiankang Deng , year=

Showing first 80 references.

[1] [1]

and Ullman, Jeffrey D

Aho, Alfred V. and Ullman, Jeffrey D. , title =. 1972 , isbn =

1972

[2] [2]

Publications Manual , year =

[3] [3]

Chandra and Dexter C

Ashok K. Chandra and Dexter C. Kozen and Larry J. Stockmeyer , year =. Alternation , journal =

[4] [4]

Scalable training of

Andrew, Galen and Gao, Jianfeng , booktitle=. Scalable training of. 2007 , url =

2007

[5] [5]

1997 , publisher =

Dan Gusfield , title =. 1997 , publisher =

1997

[6] [6]

Tetreault , year =

Mohammad Sadegh Rasooli and Joel R. Tetreault , year =

[7] [7]

Journal of Machine Learning Research , year =

Rie Kubota Ando and Tong Zhang , title =. Journal of Machine Learning Research , year =

[8] [8]

Studia Linguistica , year=

Ambiguity in Linguistics1 , author=. Studia Linguistica , year=

[9] [9]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , year=

Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality , author=. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , year=

[10] [10]

Proceedings of the 11th International Conference on Learning Representations , year=

When and why vision-language models behave like bags-of-words, and what to do about it? , author=. Proceedings of the 11th International Conference on Learning Representations , year=

[11] [11]

Do You See What

Berzak, Yevgeni and Barbu, Andrei and Harari, Daniel and Katz, Boris and Ullman, Shimon , booktitle =. Do You See What. 2015 , url =. doi:10.18653/v1/D15-1172 , pages =

work page doi:10.18653/v1/d15-1172 2015

[12] [12]

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics , year =

Resolving Ambiguities in Text-to-Image Generative Models , author =. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics , year =

[13] [13]

Computer Science , volume=

Improving image generation with better captions , author=. Computer Science , volume=. 2023 , url=

2023

[14] [14]

2023 , howpublished=

Hierarchical Text-Conditional Image Generation with CLIP Latents , author=. 2023 , howpublished=

2023

[15] [15]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , year=

Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn , title =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , year=

[16] [16]

Proceedings of the 38th International Conference on Machine Learning , pages =

Learning Transferable Visual Models From Natural Language Supervision , author =. Proceedings of the 38th International Conference on Machine Learning , pages =. 2021 , volume =

2021

[17] [17]

Proceedings of the IEEE/CVF International Conference on Computer Vision , year =

Zhai, Xiaohua and Mustafa, Basil and Kolesnikov, Alexander and Beyer, Lucas , title =. Proceedings of the IEEE/CVF International Conference on Computer Vision , year =

[18] [18]

2024 , howpublished=

Building and better understanding vision-language models: insights and future directions , author=. 2024 , howpublished=

2024

[19] [19]

2023 , url =

Tang, Yingtian and Yamada, Yutaro and Zhang, Yoyo and Yildirim, Ilker , booktitle =. 2023 , url =. doi:10.18653/v1/2023.emnlp-main.886 , pages =

work page doi:10.18653/v1/2023.emnlp-main.886 2023

[20] [20]

An Examination of the Compositionality of Large Generative Vision-Language Models

Ma, Teli and Li, Rong and Liang, Junwei. An Examination of the Compositionality of Large Generative Vision-Language Models. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2024. doi:10.18653/v1/2024.naacl-long.39

work page doi:10.18653/v1/2024.naacl-long.39 2024

[21] [21]

2024 , url=

Irene Huang and Wei Lin and Muhammad Jehanzeb Mirza and Jacob A Hansen and Sivan Doveh and Victor Ion Butoi and Roei Herzig and Assaf Arbelle and Hilde Kuehne and Trevor Darrell and Chuang Gan and Aude Oliva and Rogerio Feris and Leonid Karlinsky , booktitle=. 2024 , url=

2024

[22] [22]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , year =

Mitra, Chancharik and Huang, Brandon and Darrell, Trevor and Herzig, Roei , title =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , year =

[23] [23]

Holographic

Yamaki, Ryosuke and Taniguchi, Tadahiro and Mochihashi, Daichi , booktitle =. Holographic. 2023 , url =

2023

[24] [24]

Michael Tschannen and Alexey Gritsenko and Xiao Wang and Muhammad Ferjad Naeem and Ibrahim M. Alabdulmohsin and Nikhil Parthasarathy and Talfan Evans and Lucas Beyer and Ye Xia and Basil Mustafa and Olivier H'enaff and Jeremiah Harmsen and Andreas Steiner and Xiao-Qi Zhai , year=

[25] [25]

Demystifying

Hu Xu and Saining Xie and Xiaoqing Tan and Po-Yao Huang and Russell Howes and Vasu Sharma and Shang-Wen Li and Gargi Ghosh and Luke Zettlemoyer and Christoph Feichtenhofer , booktitle=. Demystifying. 2024 , url=

2024

[26] [26]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , year =

Cherti, Mehdi and Beaumont, Romain and Wightman, Ross and Wortsman, Mitchell and Ilharco, Gabriel and Gordon, Cade and Schuhmann, Christoph and Schmidt, Ludwig and Jitsev, Jenia , title =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , year =

[27] [27]

Quan Sun and Yuxin Fang and Ledell Yu Wu and Xinlong Wang and Yue Cao , year=

[28] [28]

Weinberger and Yoav Artzi , booktitle=

Tianyi Zhang and Varsha Kishore and Felix Wu and Kilian Q. Weinberger and Yoav Artzi , booktitle=. 2020 , url=

2020

[29] [29]

2013 , url =

Banarescu, Laura and Bonial, Claire and Cai, Shu and Georgescu, Madalina and Griffitt, Kira and Hermjakob, Ulf and Knight, Kevin and Koehn, Philipp and Palmer, Martha and Schneider, Nathan , booktitle =. 2013 , url =

2013

[30] [30]

2013 , url =

Cai, Shu and Knight, Kevin , booktitle =. 2013 , url =

2013

[31] [31]

Bleu: a method for automatic evaluation of machine translation

Papineni, Kishore and Roukos, Salim and Ward, Todd and Zhu, Wei-Jing , booktitle =. 2002 , url =. doi:10.3115/1073083.1073135 , pages =

work page doi:10.3115/1073083.1073135 2002

[32] [32]

Liu, Haotian and Li, Chunyuan and Li, Yuheng and Li, Bo and Zhang, Yuanhan and Shen, Sheng and Lee, Yong Jae , year=

[33] [33]

2025 , howpublished=

Gemma 3 Technical Report , author=. 2025 , howpublished=

2025

[34] [34]

Shuai Bai and Keqin Chen and Xuejing Liu and Jialin Wang and Wenbin Ge and Sibo Song and Kai Dang and Peng Wang and Shijie Wang and Jun Tang and Humen Zhong and Yuanzhi Zhu and Mingkun Yang and Zhaohai Li and Jianqiang Wan and Pengfei Wang and Wei Ding and Zheren Fu and Yiheng Xu and Jiabo Ye and Xi Zhang and Tianbao Xie and Zesen Cheng and Hang Zhang and...

[35] [35]

Pixtral 12

Pravesh Agrawal and Szymon Antoniak and Emma Bou Hanna and Devendra Singh Chaplot and Jessica Chudnovsky and Saurabh Garg and Th. Pixtral 12. 2024 , howpublished=

2024

[36] [36]

2024 , howpublished=

2024

[37] [37]

Proceedings of the 12th Conference of the

Learning to Interpret Utterances Using Dialogue History , author =. Proceedings of the 12th Conference of the. 2009 , url =

2009

[38] [38]

Speech Recognition and Meaning Interpretation: Towards Disambiguation of Structurally Ambiguous Spoken Utterances in

Widiaputri, Ruhiyah and Purwarianti, Ayu and Lestari, Dessi and Azizah, Kurniawati and Tanaya, Dipta and Sakti, Sakriani , booktitle =. Speech Recognition and Meaning Interpretation: Towards Disambiguation of Structurally Ambiguous Spoken Utterances in. 2023 , url =. doi:10.18653/v1/2023.emnlp-main.1045 , pages =

work page doi:10.18653/v1/2023.emnlp-main.1045 2023

[39] [39]

Proceedings of the 31st International Conference on Computational Linguistics , year =

Does Vision Accelerate Hierarchical Generalization in Neural Language Learners? , author =. Proceedings of the 31st International Conference on Computational Linguistics , year =

[40] [40]

Frontiers in Psychology , year=

Why Is There So Much More Research on Vision Than on Any Other Sensory Modality? , author=. Frontiers in Psychology , year=

[41] [41]

User Intent Recognition and Satisfaction with Large Language Models: A User Study with

Anna Bodonhelyi and Efe Bozkir and Shuo Yang and Enkelejda Kasneci and Gjergji Kasneci , howpublished =. User Intent Recognition and Satisfaction with Large Language Models: A User Study with. 2024 , url =

2024

[42] [42]

2024 , howpublished =

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context , author =. 2024 , howpublished =

2024

[43] [43]

Gemini 3.1

Google , howpublished=. Gemini 3.1. 2025 , url=

2025

[44] [44]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , year=

Deep Residual Learning for Image Recognition , author=. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , year=

[45] [45]

An Image is Worth

Alexey Dosovitskiy and Lucas Beyer and Alexander Kolesnikov and Dirk Weissenborn and Xiaohua Zhai and Thomas Unterthiner and Mostafa Dehghani and Matthias Minderer and Georg Heigold and Sylvain Gelly and Jakob Uszkoreit and Neil Houlsby , booktitle=. An Image is Worth. 2020 , url=

2020

[46] [46]

Zhuang Liu and Hanzi Mao and Chaozheng Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie , booktitle=. A. 2022 , pages=

2022

[47] [47]

Zettlemoyer and Xinlei Chen and Zhuang Liu and Saining Xie and Wen-tau Yih and Shang-Wen Li and Hu Xu , year=

Yung-Sung Chuang and Yang Li and Dong Wang and Ching-Feng Yeh and Kehan Lyu and Ramya Raghavendra and James Glass and Lifei Huang and Jason Weston and Luke S. Zettlemoyer and Xinlei Chen and Zhuang Liu and Saining Xie and Wen-tau Yih and Shang-Wen Li and Hu Xu , year=. Meta

[48] [48]

Journal of the American Statistical Association , year=

Probable Inference, the Law of Succession, and Statistical Inference , author=. Journal of the American Statistical Association , year=

[49] [49]

When are Lemons Purple? The Concept Association Bias of

Yutaro Yamada and Yingtian Tang and Ilker Yildirim , booktitle =. When are Lemons Purple? The Concept Association Bias of. 2023 , pages =

2023

[50] [50]

2025 , number=

Jiwan Chung and Seungwon Lim and Sangkyu Lee and Youngjae Yu , booktitle=. 2025 , number=

2025

[51] [51]

2025 , url =

Xiaolong Wang and Zhaolu Kang and Wangyuxuan Zhai and Xinyue Lou and Yunghwei Lai and Ziyue Wang and Yawen Wang and Kaiyu Huang and Yile Wang and Peng Li and Yang Liu , booktitle =. 2025 , url =

2025

[52] [52]

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages =

Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you! , author =. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , pages =. 2024 , url =

2024

[53] [53]

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation , year =

Shun Inadumi and Seiya Kawano and Akishige Yuguchi and Yasutomo Kawanishi and Koichiro Yoshino , title =. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation , year =

2024

[54] [54]

OpenAI , howpublished=

[55] [55]

2025 , url=

Shuai Bai and Yuxuan Cai and Ruizhe Chen and Keqin Chen and Xiong-Hui Chen and Zesen Cheng and Lianghao Deng and Wei Ding and Rongyao Fang and Chang Gao and Chunjiang Ge and Wenbin Ge and Zhifang Guo and Qidong Huang and Jie Huang and Fei Huang and Binyuan Hui and Shutong Jiang and Zhaohai Li and Mingsheng Li and Mei Li and Kaixin Li and Zicheng Lin and J...

2025

[56] [56]

2020 , booktitle =

A Simple Framework for Contrastive Learning of Visual Representations , author =. 2020 , booktitle =

2020

[57] [57]

Proceedings of the 34th International Conference on Neural Information Processing Systems , year =

Supervised Contrastive Learning , author =. Proceedings of the 34th International Conference on Neural Information Processing Systems , year =

[58] [58]

Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing , pages =

Learning Semantic Representations of Users and Products for Document Level Sentiment Classification , author=. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing , pages =. 2015 , url=

2015

[59] [59]

Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation , booktitle=

Tellex, Stefanie and Kollar, Thomas and Dickerson, Steven and Walter, Matthew and Banerjee, Ashis and Teller, Seth and Roy, Nicholas , year=. Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation , booktitle=

[60] [60]

Proceedings of the Robotics Science and Systems , year=

Asking for Help Using Inverse Semantics , author=. Proceedings of the Robotics Science and Systems , year=

[61] [61]

2020 , pages=

Mohit Shridhar and Jesse Thomason and Daniel Gordon and Yonatan Bisk and Winson Han and Roozbeh Mottaghi and Luke Zettlemoyer and Dieter Fox , booktitle=. 2020 , pages=

2020

[62] [62]

Artificial Intelligence , year=

Connecting language to the world , author=. Artificial Intelligence , year=. doi:https://doi.org/10.1016/j.artint.2005.06.002 , url =

work page doi:10.1016/j.artint.2005.06.002 2005

[63] [63]

2005 , doi =

Word sense disambiguation with pictures , journal =. 2005 , doi =

2005

[64] [64]

2005 , doi =

Choosing words in computer-generated weather forecasts , journal =. 2005 , doi =

2005

[65] [65]

2005 , doi =

Semiotic schemas: A framework for grounding language in action and perception , journal =. 2005 , doi =

2005

[66] [66]

2024 , howpublished=

DeepSeek-V3 Technical Report , author=. 2024 , howpublished=

2024

[67] [67]

Bo Li and Yuanhan Zhang and Dong Guo and Renrui Zhang and Feng Li and Hao Zhang and Kaichen Zhang and Yanwei Li and Ziwei Liu and Chunyuan Li , year=

[68] [68]

Structural Ambiguity and Lexical Relations

Hindle, Donald and Rooth, Mats. Structural Ambiguity and Lexical Relations. Computational Linguistics. 1993

1993

[69] [69]

Aspects of the Theory of Syntax , year =

Noam Chomsky , publisher =. Aspects of the Theory of Syntax , year =

[70] [70]

Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with

Generating Disambiguating Paraphrases for Structurally Ambiguous Sentences , author =. Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with. 2016 , url =. doi:10.18653/v1/W16-1718 , pages =

work page doi:10.18653/v1/w16-1718 2016

[71] [71]

Why Did the Chicken Cross the Road? Rephrasing and Analyzing Ambiguous Questions in

Stengel-Eskin, Elias and Guallar-Blasco, Jimena and Zhou, Yi and Van Durme, Benjamin , booktitle =. Why Did the Chicken Cross the Road? Rephrasing and Analyzing Ambiguous Questions in. 2023 , url =. doi:10.18653/v1/2023.acl-long.569 , pages =

work page doi:10.18653/v1/2023.acl-long.569 2023

[72] [72]

Challenges of Syntactic Ambiguity in ESL Learners: A Case Study of QUEST Nawabshah , volume =

Chandio, Ali and Oad, Shan and Tunio, Wazir , journal =. Challenges of Syntactic Ambiguity in ESL Learners: A Case Study of QUEST Nawabshah , volume =. 2025 , pages =

2025

[73] [73]

When the

Samuel Joseph Amouyal and Aya Meltzer-Asscher and Jonathan Berant , booktitle=. When the. 2025 , url =. doi:10.18653/v1/2025.acl-long.403 , pages =

work page doi:10.18653/v1/2025.acl-long.403 2025

[74] [74]

2024 , howpublished=

Syntactic Surprisal From Neural Models Predicts, But Underestimates, Human Processing Difficulty From Syntactic Ambiguities , author=. 2024 , howpublished=

2024

[75] [75]

Language, Cognition and Neuroscience , year=

On the parsing of garden-path sentences , author=. Language, Cognition and Neuroscience , year=

[76] [76]

Quarterly Journal of Experimental Psychology , year=

The interaction of visual and linguistic saliency during syntactic ambiguity resolution , author=. Quarterly Journal of Experimental Psychology , year=

[77] [77]

International Conference on Language Resources and Evaluation , year=

Adding Syntactic Annotations to Flickr30k Entities Corpus for Multimodal Ambiguous Prepositional-Phrase Attachment Resolution , author=. International Conference on Language Resources and Evaluation , year=

[78] [78]

2025 , volume =

Kankan Zhou and Eason Lai and Kyriakos Mouratidis and Jing Jiang , booktitle=. 2025 , volume =. doi:10.18653/v1/2025.acl-long.1337 , pages =

work page doi:10.18653/v1/2025.acl-long.1337 2025

[79] [79]

A Safety Report on

Xingjun Ma and Yixu Wang and Hengyuan Xu and Yutao Wu and Yifan Ding and Yunhan Zhao and Zilong Wang and Jiabin Hua and Ming Wen and Jianan Liu and Ranjie Duan and Yifeng Gao and Yingshui Tan and Yunhao Chen and Hui Xue and Xin Wang and Wei Cheng and Jingjing Chen and Zuxuan Wu and Bo Li and Yu-Gang Jiang , year=. A Safety Report on

[80] [80]

Xiang An and Yin Xie and Kaicheng Yang and Wenkang Zhang and Xiuwei Zhao and Zhengxue Cheng and Yirui Wang and Songcen Xu and Changrui Chen and Chun Yat Wu and Huajie Tan and Chunyuan Li and Jing Yang and Jiecao Yu and Xiyao Wang and Bin Qin and Yumeng Wang and Zizhen Yan and Ziyong Feng and Ziwei Liu and Bo Li and Jiankang Deng , year=