A survey on the honesty of large language models. arXiv preprint arXiv:2409.18786

Siheng Li, Cheng Yang, Taiqiang Wu, Chufan Shi, Yuji Zhang, Xinyu Zhu, Zesen Cheng, Deng Cai, Mo Yu, Lemao Liu, et al. · 2024 · arXiv 2409.18786

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

From Scalars to Tensors: Declared Losses Recover Epistemic Distinctions That Neutrosophic Scalars Cannot Express

cs.AI · 2026-03-10 · unverdicted · novelty 7.0

Declared losses recover epistemic distinctions collapsed by scalar neutrosophic T/I/F values in LLM evaluations.

Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends

cs.CR · 2025-08-15 · accept · novelty 7.0

A survey of LLM copyright protection that unifies text watermarking, model watermarking, and model fingerprinting while presenting new coverage of fingerprint transfer and removal.

The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination

cs.LG · 2025-10-27 · conditional · novelty 6.0

Strengthening LLM reasoning through RL, SFT, or chain-of-thought prompting increases tool hallucination rates on SimpleToolHalluBench, with a reliability-capability trade-off observed across mitigation attempts.

Emergent Social Intelligence Risks in Generative Multi-Agent Systems

cs.MA · 2026-03-29 · unverdicted · novelty 5.0

Generative multi-agent systems exhibit emergent collusion and conformity behaviors that cannot be prevented by existing agent-level safeguards.

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

cs.CL · 2026-01-20 · unverdicted · novelty 5.0

The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

citing papers explorer

From Scalars to Tensors: Declared Losses Recover Epistemic Distinctions That Neutrosophic Scalars Cannot Express

cs.AI · 2026-03-10 · unverdicted · none · ref 3

Declared losses recover epistemic distinctions collapsed by scalar neutrosophic T/I/F values in LLM evaluations.

Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends

cs.CR · 2025-08-15 · accept · none · ref 88

A survey of LLM copyright protection that unifies text watermarking, model watermarking, and model fingerprinting while presenting new coverage of fingerprint transfer and removal.

The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination

cs.LG · 2025-10-27 · conditional · none · ref 6

Strengthening LLM reasoning through RL, SFT, or chain-of-thought prompting increases tool hallucination rates on SimpleToolHalluBench, with a reliability-capability trade-off observed across mitigation attempts.

Emergent Social Intelligence Risks in Generative Multi-Agent Systems

cs.MA · 2026-03-29 · unverdicted · none · ref 76

Generative multi-agent systems exhibit emergent collusion and conformity behaviors that cannot be prevented by existing agent-level safeguards.

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

cs.CL · 2026-01-20 · unverdicted · none · ref 175

The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.
