hub Canonical reference

Attention is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, Illia Polosukhin · 2017

Canonical reference. 70% of citing Pith papers cite this work as background.

30 Pith papers citing it

Background 70% of classified citations

browse 30 citing papers

hub tools

JSON dossier citing papers JSON

citation-role summary

background 7 method 3

citation-polarity summary

background 7 use method 3

representative citing papers

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning

cs.AI · 2023-06-05 · conditional · novelty 8.0

LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.

Federated LoRA Fine-Tuning for LLMs via Collaborative Alignment

stat.ML · 2026-05-20 · unverdicted · novelty 7.0

CLAIR recovers the shared LoRA subspace and detects contaminated clients in heterogeneous federated settings through structured low-rank plus block-sparse decomposition, with theoretical recovery guarantees and empirical gains over local fine-tuning.

Rodrigues Network for Learning Robot Actions

cs.RO · 2025-06-03 · unverdicted · novelty 7.0

Proposes Rodrigues Network using a learnable Neural Rodrigues Operator to add kinematic inductive biases for improved robot action learning and prediction.

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

cs.AI · 2024-07-01 · accept · novelty 7.0

WE-MATH benchmark reveals most LMMs rely on rote memorization for visual math while GPT-4o has shifted toward knowledge generalization.

CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment

cs.AI · 2026-05-05 · unverdicted · novelty 6.0

CASCADE enables LLMs to continually adapt at deployment via case-based episodic memory and contextual bandits, improving macro-averaged success by 20.9% over zero-shot on 16 tasks spanning medicine, law, code, and robotics.

PEPS: Positional Encoding Projected Sampling -- Extended

cs.CV · 2026-04-27 · unverdicted · novelty 6.0

PEPS decomposes positional encodings into projected points with unique frequency-dependent motions to support more efficient learned grid-based encodings in INRs, outperforming prior methods on image, texture, and SDF tasks with often 25% fewer parameters.

Graph Concept Bottleneck Models

cs.LG · 2025-08-19 · unverdicted · novelty 6.0

GraphCBMs extend concept bottleneck models by building latent concept graphs to model correlations between concepts, yielding better image classification accuracy, more informative structure for interpretability, and stronger intervention results.

AttnTrace: Contextual Attribution of Prompt Injection and Knowledge Corruption

cs.CL · 2025-08-05 · unverdicted · novelty 6.0

AttnTrace is an attention-weight-based context traceback method for LLMs that claims higher accuracy and efficiency than prior art like TracLLM while aiding prompt injection detection.

The Generalization Ridge: Information Flow in Natural Language Generation

cs.CL · 2025-07-07 · unverdicted · novelty 6.0

InfoRidge reveals a non-monotonic pattern in which predictive mutual information between hidden states and outputs peaks in intermediate layers before declining in final layers.

PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention

cs.CL · 2025-06-16 · unverdicted · novelty 6.0

PrefixMemory-Tuning decouples the prefix from attention to overcome performance limits of traditional prefix-tuning and reaches competitive results with modern PEFT methods on LLM adaptation benchmarks.

Block-wise Adaptive Caching for Accelerating Diffusion Policy

cs.AI · 2025-06-16 · unverdicted · novelty 6.0

BAC accelerates transformer-based Diffusion Policy up to 3x by block-level adaptive feature caching using an Adaptive Caching Scheduler and Bubbling Union Algorithm to control error propagation.

Emu3: Next-Token Prediction is All You Need

cs.CV · 2024-09-27 · unverdicted · novelty 6.0

Emu3 shows that next-token prediction on a unified discrete token space for text, images, and video lets a single transformer outperform task-specific models such as SDXL and LLaVA-1.6 in multimodal generation and perception.

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

cs.CL · 2024-06-25 · unverdicted · novelty 6.0

FineWeb is a curated 15T-token web dataset that produces stronger LLMs than prior open collections, while its educational subset sharply improves performance on MMLU and ARC benchmarks.

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

cs.CV · 2024-05-14 · conditional · novelty 6.0

Hunyuan-DiT is a new multi-resolution diffusion transformer that achieves state-of-the-art Chinese text-to-image generation through custom architecture, data pipelines, and multimodal caption refinement.

SGLang: Efficient Execution of Structured Language Model Programs

cs.AI · 2023-12-12 · conditional · novelty 6.0

SGLang is a new system that speeds up structured LLM programs by up to 6.4x using RadixAttention for KV cache reuse and compressed finite state machines for output decoding.

GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts

cs.AI · 2023-09-19 · unverdicted · novelty 6.0

GPTFuzz is a black-box fuzzing framework that mutates seed jailbreak templates to automatically generate effective attacks, achieving over 90% success rates on models including ChatGPT and Llama-2.

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

cs.CV · 2022-06-22 · unverdicted · novelty 6.0

Scaling an autoregressive Transformer to 20B parameters for text-to-image generation using image token sequences achieves new SOTA zero-shot FID of 7.23 and fine-tuned FID of 3.22 on MS-COCO.

CoCa: Contrastive Captioners are Image-Text Foundation Models

cs.CV · 2022-05-04 · accept · novelty 6.0

CoCa unifies contrastive and generative pretraining in one image-text model to reach 86.3% zero-shot ImageNet accuracy and new state-of-the-art results on multiple downstream benchmarks.

HoloMotion-1 Technical Report

cs.RO · 2026-05-14 · unverdicted · novelty 5.0 · 2 refs

HoloMotion-1 trains a MoE Transformer policy on hybrid video and MoCap motion data to achieve robust zero-shot tracking that transfers directly to real humanoid robots.

StreamPhy: Streaming Inference of High-Dimensional Physical Dynamics via State Space Models

cs.LG · 2026-05-08 · unverdicted · novelty 5.0 · 2 refs

StreamPhy introduces an end-to-end streaming framework using state-space models and an expressive FT-FiLM decoder to infer continuous physical dynamics from irregular sparse data, claiming 48% better accuracy and 20-100X faster inference than diffusion baselines.

What Deserves Memory: Adaptive Memory Distillation for LLM Agents

cs.AI · 2025-08-05 · unverdicted · novelty 5.0

NEMORI is an adaptive memory distillation framework for LLM agents that transforms raw interactions into narratives and extracts insights via prediction error to decide what deserves retention.

PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

cs.CV · 2024-04-25 · conditional · novelty 5.0

A temporal pooling layer added to LLaVA smooths video feature distributions and lifts performance on dense video captioning and QA to new SOTA levels without extra parameters.

TrustLLM: Trustworthiness in Large Language Models

cs.CL · 2024-01-10 · unverdicted · novelty 5.0

TrustLLM defines eight trustworthiness principles, creates a six-dimension benchmark, and evaluates 16 LLMs showing proprietary models generally lead but some open-source ones are close while over-calibration can hurt utility.

Mixtral of Experts

cs.LG · 2024-01-08 · unverdicted · novelty 5.0

Mixtral 8x7B is a sparse MoE LLM activating 2 of 8 experts per layer that matches or exceeds Llama 2 70B and GPT-3.5 on benchmarks while using only 13B active parameters.

citing papers explorer

Showing 7 of 7 citing papers after filters.

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning cs.AI · 2023-06-05 · conditional · none · ref 66
LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? cs.AI · 2024-07-01 · accept · none · ref 3
WE-MATH benchmark reveals most LMMs rely on rote memorization for visual math while GPT-4o has shifted toward knowledge generalization.
CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment cs.AI · 2026-05-05 · unverdicted · none · ref 47
CASCADE enables LLMs to continually adapt at deployment via case-based episodic memory and contextual bandits, improving macro-averaged success by 20.9% over zero-shot on 16 tasks spanning medicine, law, code, and robotics.
Block-wise Adaptive Caching for Accelerating Diffusion Policy cs.AI · 2025-06-16 · unverdicted · none · ref 31
BAC accelerates transformer-based Diffusion Policy up to 3x by block-level adaptive feature caching using an Adaptive Caching Scheduler and Bubbling Union Algorithm to control error propagation.
SGLang: Efficient Execution of Structured Language Model Programs cs.AI · 2023-12-12 · conditional · none · ref 51
SGLang is a new system that speeds up structured LLM programs by up to 6.4x using RadixAttention for KV cache reuse and compressed finite state machines for output decoding.
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts cs.AI · 2023-09-19 · unverdicted · none · ref 58
GPTFuzz is a black-box fuzzing framework that mutates seed jailbreak templates to automatically generate effective attacks, achieving over 90% success rates on models including ChatGPT and Llama-2.
What Deserves Memory: Adaptive Memory Distillation for LLM Agents cs.AI · 2025-08-05 · unverdicted · none · ref 1
NEMORI is an adaptive memory distillation framework for LLM agents that transforms raw interactions into narratives and extracts insights via prediction error to decide what deserves retention.

Attention is all you need

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer