pith:UN5FDI2D
Beyond Content: A Comprehensive Speech Toxicity Dataset and Detection Framework Incorporating Paralinguistic Cues
A dual-head model that separates paralinguistic from textual sources of toxicity raises Macro-F1 by a relative 21.1% over the strongest baseline in speech toxicity detection.
arxiv:2605.15984 v1 · 2026-05-15 · cs.SD · cs.AI · cs.CR
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{UN5FDI2DAO7OZBQRRJMQAV7BIH}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Leveraging paralinguistic features significantly improves detection performance. Our method consistently outperforms existing baselines across multiple evaluation metrics, with a 21.1% relative improvement in Macro-F1 score and a 13.0% relative gain in accuracy over the strongest baseline.
The human annotations that distinguish textual content toxicity from paralinguistic origins are accurate, consistent, and capture the intended distinction without substantial label noise or annotator bias.
The ToxiAlert-Bench dataset and a dual-head neural network detect toxic speech by distinguishing textual from paralinguistic sources, reporting relative gains of 21.1% in Macro-F1 and 13.0% in accuracy over the strongest baseline.
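The gains quoted in the claims are relative improvements, not absolute percentage points. The sketch below illustrates the difference. The two scores are hypothetical placeholders chosen for illustration and are not taken from the paper, and `relative_improvement` is a helper name introduced here.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Return the gain of `new` over `baseline` as a fraction of `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# Hypothetical scores, for illustration only (NOT values from the paper).
baseline_f1 = 0.500
model_f1 = 0.6055

rel = relative_improvement(baseline_f1, model_f1)
abs_points = (model_f1 - baseline_f1) * 100

print(f"relative gain: {rel:.1%}")
print(f"absolute gain: {abs_points:.2f} points")
```

With these placeholder numbers, a relative gain of about 21% corresponds to roughly half as many absolute points, which is why the two figures should not be used interchangeably.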
Receipt and verification
| First computed | 2026-05-20T00:01:47.730425Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
a37a51a34303beec86118a590057e141ea41710a2b239434c1607b3535c65d86
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/UN5FDI2DAO7OZBQRRJMQAV7BIH \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a37a51a34303beec86118a590057e141ea41710a2b239434c1607b3535c65d86
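The same check can be done offline in Python once the canonical record is in hand. This is a minimal sketch that mirrors the one-liner above: it serializes the record with sorted keys and compact separators, then hashes the UTF-8 bytes with SHA-256. The function name `canonical_sha256` is an illustrative choice, and the record literal is copied from the canonical record JSON on this page. The digest should be compared against the canonical hash listed here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using sorted keys and compact separators, as in the shell one-liner."""
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Canonical record as published on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "f688d55f20f9b84bcbe5499b4bc6f1f94995db945adb06193e936c2415b17266",
        "cross_cats_sorted": ["cs.AI", "cs.CR"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.SD",
        "submitted_at": "2026-05-15T14:17:19Z",
        "title_canon_sha256": "2fcbffd859c0b3dc74618da49d89e6b1f258fc52cfd7b40363a59d4fc4e8e800",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.15984", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields were written or parsed. Only the values and the key names matter.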
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "f688d55f20f9b84bcbe5499b4bc6f1f94995db945adb06193e936c2415b17266",
"cross_cats_sorted": [
"cs.AI",
"cs.CR"
],
"license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
"primary_cat": "cs.SD",
"submitted_at": "2026-05-15T14:17:19Z",
"title_canon_sha256": "2fcbffd859c0b3dc74618da49d89e6b1f258fc52cfd7b40363a59d4fc4e8e800"
},
"schema_version": "1.0",
"source": {
"id": "2605.15984",
"kind": "arxiv",
"version": 1
}
}