Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
6 Pith papers cite this work.
Citing papers
- GAIA: a benchmark for General AI Assistants. On conceptually simple real-world questions, humans reach 92% accuracy while current AI systems reach only 15%; the paper proposes closing this gap as a key milestone toward general AI.
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model. DPO reparameterizes the reward in terms of the policy itself, so the RLHF objective reduces to a simple binary classification loss on preference pairs, with no sampling and no separate reward model (its loss is written out after this list).
- Making Every Verified Token Count: Adaptive Verification for MoE Speculative Decoding. EVICT adaptively truncates draft trees in MoE speculative decoding, combining drafter signals with profiled costs to keep only cost-effective prefixes, and delivers up to 2.35x speedup over autoregressive decoding.
- SCURank: Ranking Multiple Candidate Summaries with Summary Content Units for Enhanced Summarization. SCURank ranks candidate summaries using Summary Content Units and outperforms ROUGE-based and LLM-based ranking methods in summarization distillation.
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. Uptraining existing multi-head attention checkpoints into grouped-query attention models achieves quality close to multi-head attention at speeds close to multi-query inference, using only about 5% of the original pretraining compute (a short sketch of grouped-query attention follows the list).
- Generating Query-Focused Summarization Datasets from Query-Free Summarization Datasets. An evidence-based model generates queries for query-free datasets, and summaries produced from the generated queries achieve ROUGE scores competitive with those produced from the original queries.
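For reference on the DPO item above, the binary classification objective it mentions can be written as the following loss over preference pairs (x, y_w, y_l), where \(\pi_\theta\) is the policy being trained, \(\pi_{\mathrm{ref}}\) the frozen reference policy, and \(\beta\) a temperature on the implicit reward; this restates the loss from the DPO paper itself, not anything specific to this page:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\, \pi_{\mathrm{ref}}) \;=\; -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$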
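And for the GQA item, here is a minimal sketch of grouped-query attention, assuming PyTorch; the function name, head counts, and shapes are illustrative choices for this page, not taken from the paper:

```python
# Minimal sketch of grouped-query attention (GQA), assuming PyTorch.
# Head counts and shapes below are illustrative, not from the paper.
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, num_kv_heads):
    """q: (batch, num_q_heads, seq, head_dim); k, v: (batch, num_kv_heads, seq, head_dim)."""
    batch, num_q_heads, seq, head_dim = q.shape
    group_size = num_q_heads // num_kv_heads  # query heads per KV head
    # Repeat each KV head so every query head in a group shares the same K/V.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    return F.softmax(scores, dim=-1) @ v

# Example: 8 query heads sharing 2 KV heads (MHA would use 8 KV heads, MQA 1).
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = grouped_query_attention(q, k, v, num_kv_heads=2)
assert out.shape == (1, 8, 16, 64)
```

Setting num_kv_heads equal to the number of query heads recovers standard multi-head attention, while num_kv_heads=1 recovers multi-query attention; GQA interpolates between the two, which is what lets uptrained checkpoints trade a small quality gap for most of the multi-query inference speedup.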