How close is ChatGPT to human experts? Comparison corpus, evaluation, and detection

2023 · arXiv 2301.07597

16 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset: 2 · background: 1

citation-polarity summary

use dataset: 2 · background: 1

representative citing papers

Segmenting Human-LLM Co-authored Text via Change Point Detection

cs.CL · 2026-05-05 · unverdicted · novelty 7.0

Adapts change point detection to segment human-LLM co-authored text using weighted and generalized algorithms with minimax optimality and strong empirical results against baselines.

ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability

cs.CL · 2025-02-17 · unverdicted · novelty 7.0

ExaGPT uses span-level similarity retrieval from human and LLM datastores to detect machine-generated text while supplying the matching spans as human-interpretable evidence, achieving up to 37-point accuracy gains over prior interpretable detectors at 1% FPR.

MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text

cs.CL · 2026-05-07 · unverdicted · novelty 6.0

MELD is a multi-task AI-text detector using auxiliary heads, uncertainty-weighted losses, EMA distillation, and pairwise ranking that reaches 99.9% TPR at 1% FPR on a new held-out benchmark while remaining competitive on the RAID leaderboard.

More Aligned, Less Diverse? Analyzing the Grammar and Lexicon of Two Generations of LLMs

cs.CL · 2026-05-07 · unverdicted · novelty 6.0

Newer LLMs exhibit reduced syntactic and lexical diversity in English news text generation compared to older models, as measured by HPSG grammar and diversity metrics from ecology and information theory, while human-authored text shows little change.

LLM Output Detectability and Task Performance Can be Jointly Optimized

cs.CL · 2026-05-02 · unverdicted · novelty 6.0

PUPPET jointly optimizes LLM outputs for high detectability and task performance via RL rewards from a detector and a task evaluator, outperforming watermarking on tasks while matching detectability.

Rethinking Publication: A Certification Framework for AI-Enabled Research

cs.AI · 2026-04-23 · unverdicted · novelty 6.0 · 2 refs

A two-layer certification framework decouples knowledge validity from human authorship to accommodate AI-enabled research in existing publication systems.

All in One: A Unified Synthetic Data Pipeline for Multimodal Video Understanding

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

A unified synthetic data generation pipeline produces unlimited annotated multimodal video data across multiple tasks, enabling models trained mostly on synthetic data to generalize effectively to real-world video understanding benchmarks.

Beyond the Final Actor: Modeling the Dual Roles of Creator and Editor for Fine-Grained LLM-Generated Text Detection

cs.CL · 2026-04-06 · unverdicted · novelty 6.0 · 2 refs

RACE applies rhetorical structure analysis to model creator and editor roles separately for four-class fine-grained detection of LLM-generated text.

GigaCheck: Detecting LLM-generated Content via Object-Centric Span Localization

cs.CL · 2024-10-31 · unverdicted · novelty 6.0

GigaCheck detects LLM-generated text at both document and span levels by combining fine-tuned language-model embeddings with a DETR-like architecture that treats generated intervals as detectable objects.

Hidden Human-Like Nature of Machine-Generated Texts: Theory and Detection Enhancement

cs.CL · 2026-05-22 · unverdicted · novelty 5.0

Reveals hidden human-like spans in machine-generated texts that raise detection complexity and proposes a stacked enhancement framework with hard-EM optimization to improve detectors across LLMs.

Multi-Level Contextual Token Relation Modeling for Machine-Generated Text Detection

cs.CL · 2026-05-15 · unverdicted · novelty 5.0

A multi-level framework that models local and global relations among token detection scores to improve machine-generated text detection with low overhead.

Feature-Augmented Transformers for Robust AI-Text Detection Across Domains and Generators

cs.CL · 2026-05-05 · conditional · novelty 5.0

Feature-augmented DeBERTa-v3-base with attention-based fusion reaches 85.9% balanced accuracy on the multi-domain M4 benchmark under fixed-threshold evaluation, outperforming zero-shot baselines by up to 7.22 points.

Please Make it Sound like Human: Encoder-Decoder vs. Decoder-Only Transformers for AI-to-Human Text Style Transfer

cs.CL · 2026-04-13 · unverdicted · novelty 5.0

BART-large outperforms Mistral-7B in AI-to-human style transfer with higher reference similarity scores and far fewer parameters, while showing that marker shift can reflect overshoot rather than accurate transfer.

LifeAlign: Lifelong Alignment for Large Language Models with Memory-Augmented Focalized Preference Optimization

cs.CL · 2025-09-21 · unverdicted · novelty 5.0

LifeAlign uses focalized preference optimization and short-to-long memory consolidation via dimensionality reduction to let LLMs align with new preferences while retaining prior knowledge.

Is Human-Like Text Liked by Humans? Multilingual Human Detection and Preference Against AI

cs.CL · 2025-02-17 · conditional · novelty 5.0

Humans detect AI-generated text at 87.6% accuracy across 9 languages and 9 domains, outperforming prior near-random results, and do not always prefer human-written text when the source is unclear.

A Survey of Large Language Models

cs.CL · 2023-03-31 · accept · novelty 3.0

This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.

citing papers explorer

Segmenting Human-LLM Co-authored Text via Change Point Detection
cs.CL · 2026-05-05 · unverdicted · none · ref 4

ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability
cs.CL · 2025-02-17 · unverdicted · none · ref 11

MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text
cs.CL · 2026-05-07 · unverdicted · none · ref 12

More Aligned, Less Diverse? Analyzing the Grammar and Lexicon of Two Generations of LLMs
cs.CL · 2026-05-07 · unverdicted · none · ref 15

LLM Output Detectability and Task Performance Can be Jointly Optimized
cs.CL · 2026-05-02 · unverdicted · none · ref 20

Rethinking Publication: A Certification Framework for AI-Enabled Research
cs.AI · 2026-04-23 · unverdicted · none · ref 13 · 2 links

All in One: A Unified Synthetic Data Pipeline for Multimodal Video Understanding
cs.CV · 2026-04-14 · unverdicted · none · ref 33

Beyond the Final Actor: Modeling the Dual Roles of Creator and Editor for Fine-Grained LLM-Generated Text Detection
cs.CL · 2026-04-06 · unverdicted · none · ref 1 · 2 links

GigaCheck: Detecting LLM-generated Content via Object-Centric Span Localization
cs.CL · 2024-10-31 · unverdicted · none · ref 22

Hidden Human-Like Nature of Machine-Generated Texts: Theory and Detection Enhancement
cs.CL · 2026-05-22 · unverdicted · none · ref 10

Multi-Level Contextual Token Relation Modeling for Machine-Generated Text Detection
cs.CL · 2026-05-15 · unverdicted · none · ref 8

Feature-Augmented Transformers for Robust AI-Text Detection Across Domains and Generators
cs.CL · 2026-05-05 · conditional · none · ref 3

Please Make it Sound like Human: Encoder-Decoder vs. Decoder-Only Transformers for AI-to-Human Text Style Transfer
cs.CL · 2026-04-13 · unverdicted · none · ref 3

LifeAlign: Lifelong Alignment for Large Language Models with Memory-Augmented Focalized Preference Optimization
cs.CL · 2025-09-21 · unverdicted · none · ref 10

Is Human-Like Text Liked by Humans? Multilingual Human Detection and Preference Against AI
cs.CL · 2025-02-17 · conditional · none · ref 11

A Survey of Large Language Models
cs.CL · 2023-03-31 · accept · none · ref 187
