AI Safety Landscape for Large Language Models: Taxonomy, State-of-the-art, and Future Directions

2024 · cs.AI · arXiv 2408.12935

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

AI Safety is an emerging area of critical importance to the safe adoption and deployment of AI systems. With the rapid proliferation of AI and especially with the recent advancement of Generative AI (or GAI), the technology ecosystem behind the design, development, adoption, and deployment of AI systems has drastically changed, broadening the scope of AI Safety to address impacts on public safety and national security. In this paper, we propose a novel architectural framework for understanding and analyzing AI Safety, defining its characteristics from three perspectives: Trustworthy AI, Responsible AI, and Safe AI. We provide an extensive review of current research and advancements in AI safety from these perspectives, highlighting their key challenges and mitigation approaches. Through examples from state-of-the-art technologies, particularly Large Language Models (LLMs), we present innovative mechanisms, methodologies, and techniques for designing and testing AI safety. Our goal is to promote advancement in AI safety research, and ultimately enhance people's trust in digital transformation.

representative citing papers

Black-box, Adaptive, Efficient, Transferable, Harmful, Applicable... Attacks Are All You Need to Break LLMs

cs.CR · 2026-06-02 · unverdicted · novelty 6.0

IHO is a new black-box jailbreak attack for LLMs that is adaptive, efficient, transferable across models and behaviors, and effective even against layered defenses without modification.

Limitations on Accurate, Trusted, Human-level Reasoning

cs.LG · 2025-09-25 · unverdicted · novelty 6.0

An accurate and trusted AI system cannot achieve human-level reasoning because there exist tasks easily solvable by humans but not by the system.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Limitations on Accurate, Trusted, Human-level Reasoning

cs.LG · 2025-09-25 · unverdicted · none · ref 14 · internal anchor

An accurate and trusted AI system cannot achieve human-level reasoning because there exist tasks easily solvable by humans but not by the system.
