A survey on the honesty of large language models. arXiv preprint arXiv:2409.18786
5 Pith papers cite this work.
Citation roles: background (2)
Citation polarities: background (2)
Citing papers
- From Scalars to Tensors: Declared Losses Recover Epistemic Distinctions That Neutrosophic Scalars Cannot Express
  Declared losses recover epistemic distinctions collapsed by scalar neutrosophic T/I/F values in LLM evaluations.
- Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends
  A survey of LLM copyright protection that unifies text watermarking, model watermarking, and model fingerprinting while presenting new coverage of fingerprint transfer and removal.
- The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination
  Strengthening LLM reasoning through RL, SFT, or chain-of-thought prompting increases tool hallucination rates on SimpleToolHalluBench, with a reliability-capability trade-off observed across mitigation attempts.
- Emergent Social Intelligence Risks in Generative Multi-Agent Systems
  Generative multi-agent systems exhibit emergent collusion and conformity behaviors that cannot be prevented by existing agent-level safeguards.
- Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models
  The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.