RACE: Large-scale ReAding Comprehension Dataset From Examinations
15 Pith papers cite this work.
representative citing papers
-
Queryable LoRA: Instruction-Regularized Routing Over Shared Low-Rank Update Atoms
Queryable LoRA adds dynamic routing over shared low-rank atoms with attention and language-instruction regularization to make parameter-efficient fine-tuning more adaptive across inputs and layers.
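As a rough illustration of the routing idea only (not the paper's implementation), the sketch below mixes a small pool of shared low-rank atoms per input with a learned softmax router and adds the mixed update to a frozen linear layer. The atom count, rank, router form, and omission of the language-instruction regularizer are all assumptions.

```python
# Minimal sketch, assuming a softmax router over shared (A_i, B_i) atoms;
# the instruction regularizer from the paper is not modeled here.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedAtomLoRALinear(nn.Module):
    def __init__(self, d_in, d_out, n_atoms=8, rank=4):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)                # frozen pretrained layer
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(n_atoms, rank, d_in) * 0.01)  # down-projections
        self.B = nn.Parameter(torch.zeros(n_atoms, d_out, rank))        # up-projections
        self.router = nn.Linear(d_in, n_atoms)            # input-conditioned routing logits

    def forward(self, x):                                  # x: (batch, d_in)
        weights = F.softmax(self.router(x), dim=-1)        # (batch, n_atoms)
        down = torch.einsum('bd,krd->bkr', x, self.A)      # per-atom rank-r projection
        up = torch.einsum('bkr,kor->bko', down, self.B)    # per-atom update
        delta = (weights.unsqueeze(-1) * up).sum(dim=1)    # mix atom updates per input
        return self.base(x) + delta

layer = SharedAtomLoRALinear(d_in=64, d_out=64)
print(layer(torch.randn(2, 64)).shape)                     # torch.Size([2, 64])
```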
-
Are LLM Uncertainty and Correctness Encoded by the Same Features? A Functional Dissociation via Sparse Autoencoders
Uncertainty and correctness in LLMs are encoded by distinct feature populations, with suppression of confounded features improving accuracy and reducing entropy.
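A generic sketch of what "suppressing features" means in a sparse-autoencoder decomposition: encode an activation, zero out a chosen index set (standing in for the confounded features), and decode back. The SAE weights, ReLU encoder, and feature indices here are placeholders, not the paper's trained model.

```python
# Toy feature-suppression example with random placeholder SAE weights.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_feat = 64, 256
W_enc = rng.standard_normal((d_model, d_feat)) / np.sqrt(d_model)
W_dec = rng.standard_normal((d_feat, d_model)) / np.sqrt(d_feat)

def suppress_features(activation, feature_ids):
    feats = np.maximum(activation @ W_enc, 0.0)   # sparse feature activations
    feats[list(feature_ids)] = 0.0                # ablate the targeted features
    return feats @ W_dec                          # reconstruct the edited activation

h = rng.standard_normal(d_model)
h_edit = suppress_features(h, feature_ids={3, 17, 42})
print(np.linalg.norm(h - h_edit))                 # how much the activation changed
```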
-
SimDiff: Depth Pruning via Similarity and Difference
SimDiff uses similarity and difference metrics to prune LLM layers more effectively than cosine similarity alone, retaining over 91% performance at 25% pruning on LLaMA2-7B.
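One plausible reading of combining similarity and difference signals for depth pruning is sketched below: score each layer by the cosine similarity between its input and output hidden states and by the magnitude of their difference, then prune the most redundant layers first. The exact metrics and weighting used by SimDiff may differ.

```python
# Illustrative layer-redundancy scoring; alpha and the metric mix are assumptions.
import numpy as np

def layer_scores(hidden_in, hidden_out, alpha=0.5):
    """hidden_in/out: (n_layers, n_tokens, d) activations before/after each layer."""
    scores = []
    for h_in, h_out in zip(hidden_in, hidden_out):
        cos = np.mean(np.sum(h_in * h_out, -1) /
                      (np.linalg.norm(h_in, axis=-1) * np.linalg.norm(h_out, axis=-1) + 1e-8))
        diff = np.mean(np.linalg.norm(h_out - h_in, axis=-1))
        # High similarity and small difference -> the layer changes little and is a pruning candidate.
        scores.append(alpha * cos - (1 - alpha) * diff)
    return np.array(scores)

hidden_in = np.random.randn(12, 16, 64)
hidden_out = hidden_in + 0.1 * np.random.randn(12, 16, 64)
prune_order = np.argsort(-layer_scores(hidden_in, hidden_out))  # most redundant first
print(prune_order[:3])
```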
-
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeek-V2 delivers top-tier open-source LLM performance using only 21B active parameters by compressing the KV cache 93.3% and cutting training costs 42.5% via MLA and DeepSeekMoE.
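A toy sketch of the KV-cache compression idea behind Multi-head Latent Attention: keys and values are down-projected to a small shared latent per token, only the latent is cached, and K/V are re-expanded at attention time. The dimensions, single projection, and omission of RoPE handling are simplifications, not DeepSeek-V2's actual configuration.

```python
# Minimal latent KV-cache sketch; sizes are illustrative assumptions.
import torch
import torch.nn as nn

class LatentKVCache(nn.Module):
    def __init__(self, d_model=64, d_latent=8):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent, bias=False)   # compress per token
        self.up_k = nn.Linear(d_latent, d_model, bias=False)   # reconstruct keys
        self.up_v = nn.Linear(d_latent, d_model, bias=False)   # reconstruct values
        self.cache = []                                         # stores only latents

    def append(self, h):                  # h: (batch, d_model) hidden state of the new token
        self.cache.append(self.down(h))   # cache d_latent floats instead of 2*d_model

    def expanded_kv(self):
        latents = torch.stack(self.cache, dim=1)                # (batch, seq, d_latent)
        return self.up_k(latents), self.up_v(latents)

kv = LatentKVCache()
for _ in range(5):
    kv.append(torch.randn(2, 64))
k, v = kv.expanded_kv()
print(k.shape, v.shape)   # torch.Size([2, 5, 64]) torch.Size([2, 5, 64])
```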
-
GAIA: a benchmark for General AI Assistants
The GAIA benchmark shows humans reaching 92% accuracy on conceptually simple real-world questions while current AI systems reach only 15%, and proposes closing this gap as a key milestone toward general AI.
-
The Power of Scale for Parameter-Efficient Prompt Tuning
Prompt tuning matches full model tuning performance on large language models while tuning only a small fraction of parameters and improves robustness to domain shifts.
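A minimal sketch of the soft-prompt mechanism: a small matrix of learnable prompt embeddings is prepended to the frozen model's input embeddings, and only those prompt parameters receive gradients. The model, prompt length, and dimensions here are placeholders rather than the paper's setup.

```python
# Soft prompt tuning sketch; the frozen LM is represented by placeholder embeddings.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, n_prompt_tokens=20, d_model=128):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_prompt_tokens, d_model) * 0.02)

    def forward(self, input_embeds):                  # input_embeds: (batch, seq, d_model)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)  # prepend learned prompt tokens

soft_prompt = SoftPrompt()
frozen_embeds = torch.randn(4, 10, 128)               # would come from a frozen LM's embedding layer
print(soft_prompt(frozen_embeds).shape)               # torch.Size([4, 30, 128])
```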
-
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
ALBERT reduces BERT parameters via embedding factorization and layer sharing, adds inter-sentence coherence pretraining, and reaches SOTA on GLUE, RACE, and SQuAD with fewer parameters than BERT-large.
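The two parameter reductions can be sketched directly: factorize the V x H token embedding into V x E and E x H with E much smaller than H, and reuse a single Transformer encoder layer at every depth. Sizes below are illustrative, not ALBERT's released configurations.

```python
# Factorized embeddings plus cross-layer weight sharing, in miniature.
import torch
import torch.nn as nn

class FactorizedSharedEncoder(nn.Module):
    def __init__(self, vocab=30000, d_embed=128, d_model=768, n_layers=12):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_embed)              # V x E instead of V x H
        self.proj = nn.Linear(d_embed, d_model)               # E x H projection
        self.shared_layer = nn.TransformerEncoderLayer(        # one set of weights...
            d_model, nhead=12, batch_first=True)
        self.n_layers = n_layers

    def forward(self, ids):                                    # ids: (batch, seq)
        h = self.proj(self.tok(ids))
        for _ in range(self.n_layers):                         # ...applied at every depth
            h = self.shared_layer(h)
        return h

model = FactorizedSharedEncoder()
print(model(torch.randint(0, 30000, (2, 16))).shape)           # torch.Size([2, 16, 768])
```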
-
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
BoolQ introduces naturally occurring yes/no questions as a challenging benchmark where BERT, transferred from MultiNLI and then fine-tuned on BoolQ, reaches 80.4% accuracy against 90% human performance.
-
Attention to Mamba: A Recipe for Cross-Architecture Distillation
A two-stage distillation recipe converts a Pythia-1B Transformer into a Mamba model that preserves performance with perplexity 14.11 versus the teacher's 13.86.
-
Representation Engineering: A Top-Down Approach to AI Transparency
Representation engineering uses population-level representations in deep neural networks to monitor and manipulate cognitive phenomena like honesty and harmlessness, providing simple effective baselines for LLM safety.
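In the spirit of that approach, a toy reading/steering vector can be built from the difference of mean hidden states under two contrasting conditions (e.g. honest vs. dishonest instructions) and then added to new activations. The random data below only stands in for activations that a real setup would collect from an LLM.

```python
# Difference-of-means concept direction and a simple steering function.
import numpy as np

honest_acts = np.random.randn(100, 512) + 0.5     # placeholder activations, condition A
dishonest_acts = np.random.randn(100, 512) - 0.5  # placeholder activations, condition B

direction = honest_acts.mean(0) - dishonest_acts.mean(0)
direction /= np.linalg.norm(direction)             # unit-norm concept direction

def steer(hidden, strength=2.0):
    """Shift a hidden state along the concept direction; the dot product monitors it."""
    return hidden + strength * direction

h = np.random.randn(512)
print(float(h @ direction), float(steer(h) @ direction))
```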
-
BloombergGPT: A Large Language Model for Finance
BloombergGPT is a 50B parameter LLM trained on a 708B token mixed financial and general dataset that outperforms prior models on financial benchmarks while preserving general LLM performance.
-
PaLM: Scaling Language Modeling with Pathways
PaLM 540B demonstrates continued benefits of scale, setting new few-shot SOTA results on hundreds of benchmarks and outperforming average human performance on BIG-bench.
-
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
DeepSeekMoE 2B matches GShard 2.9B performance and approaches a dense 2B model; the 16B version matches LLaMA2-7B at 40% compute by using fine-grained expert segmentation plus shared experts.
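A sketch of the routing pattern described: many small routed experts chosen per token by a top-k softmax gate, plus a few shared experts that every token always passes through. Expert counts, sizes, and the absence of load-balancing losses are simplifications rather than DeepSeekMoE's configuration.

```python
# Fine-grained routed experts plus always-on shared experts, in miniature.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FineGrainedMoE(nn.Module):
    def __init__(self, d_model=64, d_expert=32, n_routed=16, n_shared=2, top_k=4):
        super().__init__()
        make = lambda: nn.Sequential(nn.Linear(d_model, d_expert), nn.GELU(),
                                     nn.Linear(d_expert, d_model))
        self.routed = nn.ModuleList(make() for _ in range(n_routed))
        self.shared = nn.ModuleList(make() for _ in range(n_shared))
        self.gate = nn.Linear(d_model, n_routed)
        self.top_k = top_k

    def forward(self, x):                                    # x: (n_tokens, d_model)
        out = sum(e(x) for e in self.shared)                 # shared experts see every token
        probs = F.softmax(self.gate(x), dim=-1)
        top_p, top_i = probs.topk(self.top_k, dim=-1)        # sparse routing decisions
        for k in range(self.top_k):
            for e_idx in range(len(self.routed)):
                mask = top_i[:, k] == e_idx                   # tokens routed to this expert
                if mask.any():
                    out[mask] += top_p[mask, k, None] * self.routed[e_idx](x[mask])
        return out

moe = FineGrainedMoE()
print(moe(torch.randn(8, 64)).shape)                          # torch.Size([8, 64])
```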
-
PaLM 2 Technical Report
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
-
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek LLM 67B exceeds LLaMA-2 70B on code, mathematics and reasoning benchmarks after pre-training on 2 trillion tokens and alignment via SFT and DPO.