Recognition: no theorem link
CommonWhy: A Dataset for Evaluating Entity-Based Causal Commonsense Reasoning in Large Language Models
Pith reviewed 2026-05-14 20:29 UTC · model grok-4.3
The pith
CommonWhy introduces 15,000 why questions that test whether LLMs can combine specific entity facts with causal commonsense inference
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper presents CommonWhy, a dataset of 15,000 why questions that evaluate LLMs on entity-based commonsense reasoning about causal relationships. Every query is answerable from information already present in the Wikidata knowledge graph, turning the task into a KGQA benchmark that targets causal inference rather than simple fact lookup. Tests on state-of-the-art LLMs and LLM-based KGQA methods show frequent factual hallucinations together with failures to perform the required causal reasoning.
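To make the Wikidata-answerability claim concrete, here is a minimal retrieval sketch, not the paper's pipeline: it asks the public Wikidata SPARQL endpoint for the causal neighbours of an entity. The property IDs P828 ("has cause") and P1542 ("has effect") and the placeholder QID are assumptions of this sketch; the paper's actual property inventory is not specified in the text available here.

```python
# Minimal sketch: fetch candidate causal facts for an entity from Wikidata,
# following the paper's premise that supporting knowledge sits in the public KG.
# The QID passed in is a placeholder; P828 ("has cause") and P1542 ("has effect")
# are standard Wikidata properties, but whether CommonWhy relies on exactly these
# properties is an assumption of this sketch.
import requests

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

def causal_neighbours(qid: str) -> list[dict]:
    query = f"""
    SELECT ?prop ?valueLabel WHERE {{
      VALUES ?prop {{ wdt:P828 wdt:P1542 }}   # has cause / has effect
      wd:{qid} ?prop ?value .
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}"""
    resp = requests.get(
        WIKIDATA_SPARQL,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "commonwhy-sketch/0.1"},
    )
    resp.raise_for_status()
    return resp.json()["results"]["bindings"]

# Usage (hypothetical QID): facts = causal_neighbours("Q12345")
```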
What carries the argument
The CommonWhy dataset of why questions, which forces models to retrieve entity facts from Wikidata and then apply causal commonsense to generate answers and explanations
If this is right
- LLMs will continue to generate factually incorrect answers on causal why questions until the underlying retrieval and inference failures are addressed.
- KGQA systems built on LLMs will underperform on tasks that require causal chaining rather than direct fact lookup.
- Explanation quality will remain low because models cannot reliably trace causal links between entities.
- New evaluation protocols for LLMs must include open-ended why questions to expose gaps hidden by true/false formats.
Where Pith is reading between the lines
- Hybrid systems that explicitly connect LLMs to knowledge graphs may need separate modules for causal chaining to reach reliable performance.
- The dataset could be extended to track whether improvements on CommonWhy also improve performance on other forms of abductive reasoning outside Wikidata.
- Future work could measure how much training data volume is required before models stop hallucinating the causal relations tested here.
Load-bearing premise
The questions require genuine integration of entity facts with causal commonsense reasoning instead of being solvable through superficial patterns learned in training.
What would settle it
A model that produces correct answers and explanations on most CommonWhy questions while avoiding factual hallucinations would show that current shortcomings are not as widespread as claimed.
original abstract
To effectively interact with the real world, Large Language Models (LLMs) require entity-based commonsense reasoning, a challenging task that necessitates integrating factual knowledge about specific entities with commonsense inference. Existing datasets for evaluating LLM entity-based commonsense reasoning have largely focused on True/False or multiple-choice questions, leaving the explicit assessment of the model's ability in abductive reasoning about causes and effects and generating explanations largely unexamined. In this work, we introduce CommonWhy, a dataset of 15,000 why questions designed to evaluate entity-based commonsense reasoning about causal relationships in LLMs. CommonWhy also serves as a Knowledge Graph Question Answering (KGQA) benchmark, as all supporting knowledge required to answer its queries is available in the Wikidata knowledge graph. Unlike existing KGQA datasets, which primarily test fact retrieval, CommonWhy targets causal commonsense reasoning, establishing a new paradigm for KGQA evaluation. Experiments with state-of-the-art LLMs and LLM-based KGQA methods reveal their significant shortcomings, including frequent factual hallucinations and failures in causal reasoning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces CommonWhy, a dataset of 15,000 why-questions for assessing entity-based causal commonsense reasoning in LLMs. The questions are designed to require integration of specific entity facts from Wikidata with abductive causal inference, serving also as a KGQA benchmark beyond simple fact retrieval. Experiments with SOTA LLMs and KGQA methods highlight shortcomings like hallucinations and causal reasoning failures.
Significance. Should the dataset construction and evaluation hold up under scrutiny, this work could provide a useful new benchmark for probing LLMs on causal reasoning tasks that combine factual knowledge with commonsense, addressing a gap in existing True/False or multiple-choice datasets. The emphasis on generating explanations is a positive aspect.
major comments (3)
- [Dataset Construction] Dataset Construction section: insufficient detail is provided on the question generation process from Wikidata entities and the validation steps used to confirm that questions require genuine integration of entity facts with causal commonsense rather than surface-level patterns.
- [Experiments] Experiments section: no ablation studies (e.g., with vs. without KG access, or performance on paraphrased vs. original questions) are reported to demonstrate that failures cannot be explained by memorized patterns or superficial cues, which is load-bearing for interpreting results as evidence of reasoning deficits.
- [Results] Results and Analysis: quantitative breakdowns of error types (hallucinations vs. causal failures) and inter-annotator agreement for any human validation are missing, weakening support for the claim of significant shortcomings.
minor comments (1)
- [Abstract] Abstract: the exact train/test split and any filtering criteria for the 15,000 questions should be stated explicitly for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight areas where additional detail and analysis will strengthen the paper. We will revise the manuscript to expand the dataset construction description, incorporate ablation studies, and provide quantitative error breakdowns along with inter-annotator agreement metrics. Below we address each major comment.
point-by-point responses
- Referee: [Dataset Construction] Dataset Construction section: insufficient detail is provided on the question generation process from Wikidata entities and the validation steps used to confirm that questions require genuine integration of entity facts with causal commonsense rather than surface-level patterns.
Authors: We agree that the Dataset Construction section would benefit from greater detail. In the revised manuscript we will expand this section with a step-by-step account of entity selection from Wikidata, the hybrid template- and LLM-assisted question generation procedure, and the multi-stage validation protocol (including explicit criteria and examples) used to verify that each question necessitates integration of specific entity facts with abductive causal reasoning rather than surface-level lexical patterns. revision: yes
- Referee: [Experiments] Experiments section: no ablation studies (e.g., with vs. without KG access, or performance on paraphrased vs. original questions) are reported to demonstrate that failures cannot be explained by memorized patterns or superficial cues, which is load-bearing for interpreting results as evidence of reasoning deficits.
Authors: We concur that ablation studies are important for ruling out alternative explanations. We will add these experiments to the revised manuscript, specifically reporting performance with and without KG retrieval access as well as results on paraphrased question variants. These additions will help isolate whether observed shortcomings stem from reasoning deficits rather than memorization or superficial cues. revision: yes
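As a rough illustration of the kind of ablation being promised, and not anything taken from the manuscript, the sketch below scores the same items with and without supporting triples prepended to the prompt. The item fields, the substring-match scorer, and the `generate` callable are assumptions standing in for the paper's actual setup.

```python
# Rough sketch of a with/without-KG ablation, assuming each CommonWhy item
# carries a question, a gold answer, and its supporting Wikidata triples.
# `generate` stands in for whatever LLM interface is actually used; it is not
# an API from the paper, and substring matching is only a crude proxy scorer.
from typing import Callable

def ablation_accuracy(items: list[dict], generate: Callable[[str], str],
                      use_kg: bool) -> float:
    correct = 0
    for item in items:
        context = ""
        if use_kg:
            # Prepend the supporting facts only in the KG-access condition.
            facts = "\n".join(item["supporting_triples"])
            context = f"Known facts:\n{facts}\n\n"
        prompt = f"{context}Question: {item['question']}\nAnswer with a short explanation."
        prediction = generate(prompt)
        correct += int(item["gold_answer"].lower() in prediction.lower())
    return correct / len(items)

# The gap between use_kg=True and use_kg=False runs helps separate retrieval
# failures from causal-reasoning failures.
```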
- Referee: [Results] Results and Analysis: quantitative breakdowns of error types (hallucinations vs. causal failures) and inter-annotator agreement for any human validation are missing, weakening support for the claim of significant shortcomings.
Authors: We will revise the Results and Analysis section to include quantitative error-type breakdowns derived from manual inspection of a representative sample of model outputs, explicitly separating factual hallucinations from causal reasoning failures. We will also report inter-annotator agreement statistics for the human validation steps performed during dataset construction and error categorization. revision: yes
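For concreteness, here is a minimal sketch of the agreement statistic mentioned above. The three-way error label set is an assumption of this sketch; Cohen's kappa itself is the standard chance-corrected agreement measure, computed here with scikit-learn.

```python
# Minimal sketch: two annotators label a sample of model errors as
# "hallucination", "causal_failure", or "other", and Cohen's kappa measures
# their chance-corrected agreement. The label set is an assumption; only the
# kappa computation is standard.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["hallucination", "causal_failure", "causal_failure", "other"]
annotator_b = ["hallucination", "causal_failure", "other", "other"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level
```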
Circularity Check
No significant circularity in dataset creation and evaluation
full rationale
The paper introduces the CommonWhy dataset of 15,000 why-questions grounded in Wikidata entity facts and causal commonsense, then reports empirical results on LLMs and KGQA methods. No equations, derivations, parameter fitting, or load-bearing self-citations appear in the provided text. All claims rest on direct experimental outcomes rather than on conclusions that hold by construction of the dataset, so the work is self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: All supporting knowledge required to answer the queries is available in the Wikidata knowledge graph.
discussion (0)