DRCD: a Chinese Machine Reading Comprehension Dataset

2018 · cs.CL · arXiv 1806.00920

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

In this paper, we introduce DRCD (Delta Reading Comprehension Dataset), an open-domain Traditional Chinese machine reading comprehension (MRC) dataset. The dataset is intended to serve as a standard Chinese machine reading comprehension dataset that can be used as a source dataset in transfer learning. It contains 10,014 paragraphs from 2,108 Wikipedia articles and 30,000+ questions generated by annotators. We build a baseline model that achieves an F1 score of 89.59%. Human performance reaches an F1 score of 93.30%.

representative citing papers

KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking

cs.CL · 2026-06-22 · unverdicted · novelty 5.0

KaLM-Reranker-V1 introduces a fast but not late-interaction reranker that decouples passage pre-encoding from query processing via encoder-decoder architecture and cross-attention to achieve efficiency and competitive performance.

End-to-end Contrastive Language-Speech Pretraining Model For Long-form Spoken Question Answering

cs.SD · 2025-11-12 · unverdicted · novelty 5.0

CLSR is an end-to-end contrastive language-speech retriever using an intermediate text-like conversion step to improve retrieval of relevant segments from long audio for spoken question answering.

A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio

cs.CL · 2024-09-10 · unverdicted · novelty 2.0

Empirical practice of continual pre-training Llama-3 models with optimized additional language mixture ratios to enhance Chinese capabilities, showing gains in benchmarks and domains like math and coding.

