{"total":26,"items":[{"citing_arxiv_id":"2606.27527","ref_index":28,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Large Language Model Teaches Visual Students: Cross-Modality Transfer of Fine-Grained Conceptual Knowledge","primary_cat":"cs.CV","submitted_at":"2026-06-25T20:19:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LaViD distills LLM conceptual knowledge to vision models via LLM-generated MCQ soft labels, outperforming vision-language distillation baselines on fine-grained benchmarks while improving robustness on spurious correlation datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.09881","ref_index":28,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Toward Calibrated, Fair, and accurate Deepfake Detection","primary_cat":"cs.LG","submitted_at":"2026-06-03T05:44:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14169","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"BOOKMARKS: Efficient Active Storyline Memory for Role-playing","primary_cat":"cs.CL","submitted_at":"2026-05-13T22:48:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"BOOKMARKS introduces searchable bookmarks as reusable answers to storyline questions, enabling active initialization and passive synchronization for more consistent role-playing agent memory than recurrent 
summarization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10640","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Towards Understanding Continual Factual Knowledge Acquisition of Language Models: From Theory to Algorithm","primary_cat":"cs.CL","submitted_at":"2026-05-11T14:28:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Theoretical analysis of continual factual knowledge acquisition shows data replay stabilizes pretrained knowledge by shifting convergence dynamics while regularization only slows forgetting, leading to the STOC method for attention-based replay selection.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05459","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Privacy Without Losing Place: A Paradigm for Private Retrieval in Spatial RAGs","primary_cat":"cs.CR","submitted_at":"2026-05-06T21:33:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"PAS encodes locations via relative anchors and bins to deliver roughly 370-400m adversarial error in spatial RAG while retaining over half the baseline retrieval performance and keeping generation quality robust.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01359","ref_index":44,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Structural Ranking of the Cognitive Plausibility of Computational Models of Analogy and Metaphors with the Minimal Cognitive Grid","primary_cat":"cs.AI","submitted_at":"2026-05-02T10:08:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A 
formalized Minimal Cognitive Grid ranks computational models of analogy and metaphor by alignment with cognitive theories using Functional/Structural Ratio, Generality, and Performance Match dimensions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.18124","ref_index":49,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"TLoRA: Task-aware Low Rank Adaptation of Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-04-20T11:43:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.15945","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"RAGognizer: Hallucination-Aware Fine-Tuning via Detection Head Integration","primary_cat":"cs.CL","submitted_at":"2026-04-17T11:07:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"RAGognizer adds a detection head to LLMs for joint training on generation and token-level hallucination detection, yielding SOTA detection and fewer hallucinations in RAG while preserving output quality.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"losses only on the response tokens (t > L). The causal language modeling loss is $\\mathcal{L}_{\\mathrm{CE}} = -\\sum_{t=L+1}^{T} \\log P(x_t \\mid x_{<t}; \\Theta^{*}, \\theta_{\\mathrm{LoRA}})$, and the token-level hallucination detection loss is $\\mathcal{L}_{\\mathrm{BCE}} = \\sum_{t=L+1}^{T} \\mathrm{BCE}(\\hat{p}_t, y_{\\mathrm{det},t})$, where $\\hat{p}_t$ is the probability predicted by the detection head $f_\\phi$. Let $\\mathcal{D}$ denote the training distribution. 
The joint training objective is $\\min_{\\theta_{\\mathrm{LoRA}},\\phi} \\mathbb{E}_{(x, y_{\\mathrm{det}}) \\sim \\mathcal{D}} [\\mathcal{L}_{\\mathrm{CE}} + \\lambda \\mathcal{L}_{\\mathrm{BCE}}]$ (3), where $\\lambda$ controls the trade-off between generation quality and hallucination detection performance. By integrating hallucination supervision into training, RAGognizer steers the LLM toward representations that enhance the separability of faithful and hallucinatory states, while preserving generation quality. 3.3 Training Setup We fine-tune the base LLMs using LoRA to ensure parameter efficiency while keeping the pre-trained backbone frozen."},{"citing_arxiv_id":"2604.05732","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Graph Topology Information Enhanced Heterogeneous Graph Representation Learning","primary_cat":"cs.LG","submitted_at":"2026-04-07T11:35:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ToGRL learns high-quality graph structures from raw heterogeneous graphs via a two-stage topology extraction process and prompt tuning, outperforming prior methods on five datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.14172","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Tug-of-War within A Decade: Conflict Resolution in Vulnerability Analysis via Teacher-Guided Retrieval-Augmented Generations","primary_cat":"cs.CL","submitted_at":"2026-03-25T07:32:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CRVA-TGRAG combines parent-document segmentation, ensemble retrieval, and teacher-guided fine-tuning to mitigate knowledge conflicts and improve accuracy in LLM-based CVE vulnerability 
analysis.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.06179","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"ARIA: Adaptive Retrieval Intelligence Assistant -- A Multimodal RAG Framework for Domain-Specific Engineering Education","primary_cat":"cs.IR","submitted_at":"2026-02-04T01:08:24+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ARIA is a multimodal RAG framework that filters domain-specific questions with 97.5% accuracy and outperforms ChatGPT-5 on pedagogical quality for a university civil engineering course.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.02626","ref_index":19,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Understanding New-Knowledge-Induced Factual Hallucinations in LLMs: Analysis and Interpretation","primary_cat":"cs.CL","submitted_at":"2025-11-04T14:55:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Fine-tuning on new knowledge induces propagating hallucinations in LLMs by weakening attention to key entities, with mitigation via reintroducing known knowledge during later training stages.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.17934","ref_index":25,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM","primary_cat":"cs.CL","submitted_at":"2025-10-20T15:40:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AtlasKV integrates billion-scale KGs into LLMs parametrically with sub-linear complexity and low memory by converting triples 
into key-value representations handled by the model's attention.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.08461","ref_index":30,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Adapting Vision-Language Models for Neutrino Event Classification in High-Energy Physics","primary_cat":"cs.LG","submitted_at":"2025-09-10T10:07:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Fine-tuned LLaMA 3.2 VLM outperforms CNN baselines on neutrino event classification while adding interpretability via language reasoning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.11008","ref_index":34,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"CounterBench: Evaluating and Improving Counterfactual Reasoning in Large Language Models","primary_cat":"cs.CL","submitted_at":"2025-02-16T06:19:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces CounterBench benchmark and CoIn iterative reasoning method showing LLMs perform near random on formal counterfactual tasks but improve substantially with guided backtracking.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.02737","ref_index":210,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model","primary_cat":"cs.CL","submitted_at":"2025-02-04T21:43:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, 
Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2401.18059","ref_index":131,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval","primary_cat":"cs.CL","submitted_at":"2024-01-31T18:30:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2401.00761","ref_index":56,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Identifying the Achilles' Heel: An Iterative Method for Dynamically Uncovering Factual Errors in Large Language Models","primary_cat":"cs.SE","submitted_at":"2024-01-01T14:02:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"HalluHunter is a knowledge-graph and rule-based NLP framework that iteratively generates single- and multi-hop questions to uncover factual errors in LLMs, triggering errors in up to 55% of cases on nine models while preserving coverage.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2310.11511","ref_index":88,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection","primary_cat":"cs.CL","submitted_at":"2023-10-17T18:18:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Self-RAG trains LLMs to adaptively retrieve passages 
on demand and self-critique using reflection tokens, outperforming ChatGPT and retrieval-augmented Llama2 on QA, reasoning, and fact verification.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2309.11495","ref_index":162,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Chain-of-Verification Reduces Hallucination in Large Language Models","primary_cat":"cs.CL","submitted_at":"2023-09-20T17:50:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Chain-of-Verification reduces hallucinations in large language models by drafting responses, planning independent verification questions, answering them separately, and generating a final verified output.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2207.05608","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Inner Monologue: Embodied Reasoning through Planning with Language Models","primary_cat":"cs.RO","submitted_at":"2022-07-12T15:20:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLMs form an inner monologue from closed-loop language feedback to improve high-level instruction completion in simulated and real robotic rearrangement and kitchen manipulation tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":", HRL [2]), effective high-level reasoning about complex tasks also requires semantic knowledge and understanding of the world. One of the remarkable observations in recent machine learning research is that large language models (LLMs) can not only generate fluent textual descriptions, but also appear to have rich internalized knowledge about the world [3, 4, 5, 6, 7]. 
When appropriately conditioned (e.g., prompted), they can even carry out some degree of deduction and respond to questions that appear to require reasoning and inference [8, 9, 10, 11, 12, 13]. This raises an intriguing possibility: beyond their ability to interpret natural language instructions, can language models further serve as reasoning models that combine multiple sources of feedback and become interactive"},{"citing_arxiv_id":"2205.00445","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning","primary_cat":"cs.CL","submitted_at":"2022-05-01T11:01:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MRKL is a modular neuro-symbolic architecture that integrates LLMs with external knowledge and discrete reasoning to overcome limitations of pure neural language models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2204.00598","ref_index":116,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language","primary_cat":"cs.CV","submitted_at":"2022-04-01T17:43:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Socratic Models compose zero-shot multimodal reasoning by prompting pretrained language and vision models to exchange information and enable new capabilities without finetuning.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Wu, and A. Zisserman. A short note on the kinetics-700-2020 human action dataset. arXiv preprint arXiv:2010.10864, 2020. [115] A. Kuznetsova, H. Rom, N. Alldrin, J. Uijlings, I. Krasin, J. Pont-Tuset, S. 
Kamali, S. Popov, M. Malloci, A. Kolesnikov, et al. The open images dataset v4. International Journal of Computer Vision, 128(7): 1956-1981, 2020. [116] F. Petroni, T. Rocktäschel, P. Lewis, A. Bakhtin, Y. Wu, A. H. Miller, and S. Riedel. Language models as knowledge bases? arXiv preprint arXiv:1909.01066, 2019. [117] P. Agarwal, A. Betancourt, V. Panagiotou, and N. Díaz-Rodríguez. Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models."},{"citing_arxiv_id":"2002.08155","ref_index":51,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"CodeBERT: A Pre-Trained Model for Programming and Natural Languages","primary_cat":"cs.CL","submitted_at":"2020-02-19T13:09:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CodeBERT pre-trains a bimodal model on code and text pairs plus unimodal data to achieve state-of-the-art results on natural language code search and code documentation generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2002.08910","ref_index":64,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"How Much Knowledge Can You Pack Into the Parameters of a Language Model?","primary_cat":"cs.CL","submitted_at":"2020-02-10T18:55:58+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Fine-tuned language models store knowledge in parameters to answer questions competitively with retrieval-based open-domain QA systems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2002.08909","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"REALM: Retrieval-Augmented Language Model 
Pre-Training","primary_cat":"cs.CL","submitted_at":"2020-02-10T18:40:59+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"REALM augments language-model pre-training with an unsupervised retriever over Wikipedia documents and reports 4-16% absolute gains on open-domain QA benchmarks over prior implicit and explicit knowledge methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}