Detecting Personal Information in Training Corpora: an Analysis

Nishant Subramani, Sasha Luccioni, Jesse Dodge, Margaret Mitchell · 2023 · DOI 10.18653/v1/2023.trustnlp-1.18

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Clean Me If You Can: A Large Collection of Real-World Addresses for Data Cleaning Benchmarking

cs.DB · 2026-06-30 · unverdicted · novelty 7.0 · 2 refs

Releases a large real-world dataset of dirty postal addresses with ground truth for benchmarking data cleaning algorithms.

Detecting Sensitive Personal Information in Japanese Pre-Training Corpora for Large Language Models

cs.CL · 2026-06-10 · unverdicted · novelty 6.0

The paper constructs an SCPI dataset via LLM-based annotation and trains classifiers to detect sensitive personal information in Japanese pre-training corpora, claiming this is the first such exploration.

"I understand your perspective": LLM Persuasion and Sycophancy through the Lens of Communicative Action Theory

cs.CL · 2026-06-06 · unverdicted · novelty 5.0

LLMs outperform humans in expressing illocutionary intents and sycophancy in successful persuasive counter-arguments from ChangeMyView, with crowd workers preferring LLM versions.

A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models

cs.CL · 2024-01-02 · accept · novelty 4.0

A survey that compiles and taxonomizes more than 32 existing hallucination mitigation techniques for LLMs while analyzing their challenges and limitations.

citing papers explorer

Clean Me If You Can: A Large Collection of Real-World Addresses for Data Cleaning Benchmarking

cs.DB · 2026-06-30 · unverdicted · none · ref 39 · 2 links

Detecting Sensitive Personal Information in Japanese Pre-Training Corpora for Large Language Models

cs.CL · 2026-06-10 · unverdicted · none · ref 8

"I understand your perspective": LLM Persuasion and Sycophancy through the Lens of Communicative Action Theory

cs.CL · 2026-06-06 · unverdicted · none · ref 295
