DRCD: a Chinese Machine Reading Comprehension Dataset
3 Pith papers cite this work.
abstract
In this paper, we introduce DRCD (Delta Reading Comprehension Dataset), an open-domain Traditional Chinese machine reading comprehension (MRC) dataset. The dataset aims to be a standard Chinese MRC dataset that can serve as a source dataset in transfer learning. It contains 10,014 paragraphs from 2,108 Wikipedia articles and more than 30,000 questions generated by annotators. We build a baseline model that achieves an F1 score of 89.59%; human performance reaches an F1 score of 93.30%.
citing papers explorer
- KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking
  KaLM-Reranker-V1 is a fast reranker that does not rely on late interaction: it decouples passage pre-encoding from query processing through an encoder-decoder architecture with cross-attention, achieving efficiency and competitive performance.
- End-to-end Contrastive Language-Speech Pretraining Model For Long-form Spoken Question Answering
  CLSR is an end-to-end contrastive language-speech retriever that uses an intermediate text-like conversion step to improve retrieval of relevant segments from long audio for spoken question answering.
- A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio
  An empirical study of continual pre-training of Llama-3 models with optimized additional language mixture ratios to enhance Chinese capabilities, showing gains on benchmarks and in domains such as math and coding.