arXiv preprint arXiv:2205.12628 , year=

· 2022 · arXiv 2205.12628

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning

cs.LG · 2024-04-08 · conditional · novelty 8.0

NPO enables stable unlearning of 50%+ training data in LLMs on TOFU by making collapse exponentially slower than gradient ascent, preserving sensible outputs where prior methods fail.

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

cs.CL · 2023-04-03 · accept · novelty 8.0

Pythia releases 16 identically trained LLMs with full checkpoints and data tools to study training dynamics, scaling, memorization, and bias in language models.

BEAVER: An Efficient Deterministic LLM Verifier

cs.AI · 2025-12-05 · unverdicted · novelty 7.0

BEAVER is the first practical deterministic verifier that maintains sound probability bounds on LLM safety properties using token tries and frontier data structures, finding 2-3x more violations than sampling at 1/10 the compute.

PrivUn: Unveiling Latent Ripple Effects and Shallow Forgetting in Privacy Unlearning

cs.LG · 2026-04-23 · unverdicted · novelty 6.0

PrivUn shows privacy unlearning in LLMs produces gradient-driven ripple effects and only shallow forgetting across layers, with new strategies proposed for deeper removal.

To Lie or Not to Lie? Investigating The Biased Spread of Global Lies by LLMs

cs.CL · 2026-04-08 · unverdicted · novelty 6.0

LLMs propagate misinformation more in lower-resource languages and lower-HDI countries, with input safety classifiers and retrieval-augmented fact-checking showing cross-lingual and regional gaps.

Revisiting the Past: Data Unlearning with Model State History

cs.LG · 2025-06-26 · unverdicted · novelty 5.0

MSA performs data unlearning in LLMs by arithmetic operations on prior model checkpoints to remove targeted datapoint influence, with experiments showing competitive or better results than existing unlearning methods.

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

cs.AI · 2025-10-27 · unverdicted · novelty 4.0

A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.

Large Language Model Agent: A Survey on Methodology, Applications and Challenges

cs.CL · 2025-03-27 · accept · novelty 3.0

A survey that deconstructs LLM agent systems via a methodology-centered taxonomy linking design principles to emergent behaviors, applications, and challenges.

citing papers explorer

Showing 8 of 8 citing papers.

Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning cs.LG · 2024-04-08 · conditional · none · ref 10
NPO enables stable unlearning of 50%+ training data in LLMs on TOFU by making collapse exponentially slower than gradient ascent, preserving sensible outputs where prior methods fail.
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling cs.CL · 2023-04-03 · accept · none · ref 225
Pythia releases 16 identically trained LLMs with full checkpoints and data tools to study training dynamics, scaling, memorization, and bias in language models.
BEAVER: An Efficient Deterministic LLM Verifier cs.AI · 2025-12-05 · unverdicted · none · ref 25
BEAVER is the first practical deterministic verifier that maintains sound probability bounds on LLM safety properties using token tries and frontier data structures, finding 2-3x more violations than sampling at 1/10 the compute.
PrivUn: Unveiling Latent Ripple Effects and Shallow Forgetting in Privacy Unlearning cs.LG · 2026-04-23 · unverdicted · none · ref 16
PrivUn shows privacy unlearning in LLMs produces gradient-driven ripple effects and only shallow forgetting across layers, with new strategies proposed for deeper removal.
To Lie or Not to Lie? Investigating The Biased Spread of Global Lies by LLMs cs.CL · 2026-04-08 · unverdicted · none · ref 16
LLMs propagate misinformation more in lower-resource languages and lower-HDI countries, with input safety classifiers and retrieval-augmented fact-checking showing cross-lingual and regional gaps.
Revisiting the Past: Data Unlearning with Model State History cs.LG · 2025-06-26 · unverdicted · none · ref 19
MSA performs data unlearning in LLMs by arithmetic operations on prior model checkpoints to remove targeted datapoint influence, with experiments showing competitive or better results than existing unlearning methods.
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges cs.AI · 2025-10-27 · unverdicted · none · ref 141
A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.
Large Language Model Agent: A Survey on Methodology, Applications and Challenges cs.CL · 2025-03-27 · accept · none · ref 226
A survey that deconstructs LLM agent systems via a methodology-centered taxonomy linking design principles to emergent behaviors, applications, and challenges.

arXiv preprint arXiv:2205.12628 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer