Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord · 2018

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · dataset: 1

citation-polarity summary

background: 1 · use dataset: 1

representative citing papers

MoEITS: A Green AI approach for simplifying MoE-LLMs

cs.LG · 2026-04-12 · unverdicted · novelty 7.0

MoEITS is an information-theoretic algorithm for pruning experts in MoE-LLMs that produces models with higher accuracy and greater size reduction than prior state-of-the-art methods on Mixtral 8x7B, Qwen1.5-2.7B, and DeepSeek-V2-Lite.

WizardLM: Empowering large pre-trained language models to follow complex instructions

cs.CL · 2023-04-24 · conditional · novelty 7.0

WizardLM uses LLM-driven iterative rewriting to generate complex instruction data and fine-tunes LLaMA to reach over 90% of ChatGPT capacity on 17 of 29 evaluated skills.

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

cs.CV · 2023-03-28 · conditional · novelty 7.0

LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.

STS: Efficient Sparse Attention with Speculative Token Sparsity

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

STS repurposes draft-model attention scores from speculative decoding to build token-and-head-wise sparsity masks, delivering 2.67x speedup at ~90% sparsity on NarrativeQA with negligible accuracy loss.

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

cs.CL · 2024-06-25 · unverdicted · novelty 6.0

FineWeb is a curated 15T-token web dataset that produces stronger LLMs than prior open collections, while its educational subset sharply improves performance on MMLU and ARC benchmarks.

Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive

cs.CL · 2024-02-20 · conditional · novelty 6.0

DPOP is a new loss function that prevents DPO from lowering preferred-response likelihoods. It outperforms standard DPO on diverse datasets and on MT-Bench, and it enables Smaug-72B to exceed 80% on the Open LLM Leaderboard.

Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization

cs.LG · 2026-05-19 · unverdicted · novelty 5.0

Quant.npu provides a fully static quantization pipeline for on-device LLMs on NPUs by combining rotation matrices, bit-width-aware initialization, two-stage selective optimization, and adaptive mixed precision.

STELLA: A Multimodal LLM for Protein Functional Annotation via Unified Sequence-Structure Encoding

q-bio.BM · 2025-06-04 · unverdicted · novelty 5.0

STELLA aligns ESM3 bimodal sequence-structure encodings with Llama-3.1-8B text modeling to claim state-of-the-art results on protein functional description prediction and enzyme-catalyzed reaction prediction.

LoRA-FA: Efficient and Effective Low Rank Representation Fine-tuning

cs.CL · 2023-08-07 · unverdicted · novelty 5.0

LoRA-FA freezes LoRA's A matrix and trains only B with gradient corrections to approximate full fine-tuning gradients more closely.

citing papers explorer

Showing 9 of 9 citing papers.

MoEITS: A Green AI approach for simplifying MoE-LLMs
cs.LG · 2026-04-12 · unverdicted · none · ref 10

WizardLM: Empowering large pre-trained language models to follow complex instructions
cs.CL · 2023-04-24 · conditional · none · ref 11

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
cs.CV · 2023-03-28 · conditional · none · ref 10

STS: Efficient Sparse Attention with Speculative Token Sparsity
cs.LG · 2026-05-15 · unverdicted · none · ref 6

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
cs.CL · 2024-06-25 · unverdicted · none · ref 11

Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
cs.CL · 2024-02-20 · conditional · none · ref 17

Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization
cs.LG · 2026-05-19 · unverdicted · none · ref 8

STELLA: A Multimodal LLM for Protein Functional Annotation via Unified Sequence-Structure Encoding
q-bio.BM · 2025-06-04 · unverdicted · none · ref 35

LoRA-FA: Efficient and Effective Low Rank Representation Fine-tuning
cs.CL · 2023-08-07 · unverdicted · none · ref 15
