LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.
hub Canonical reference
Attention is all you need
Canonical reference. 70% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
CLAIR recovers the shared LoRA subspace and detects contaminated clients in heterogeneous federated settings through structured low-rank plus block-sparse decomposition, with theoretical recovery guarantees and empirical gains over local fine-tuning.
Proposes Rodrigues Network using a learnable Neural Rodrigues Operator to add kinematic inductive biases for improved robot action learning and prediction.
WE-MATH benchmark reveals most LMMs rely on rote memorization for visual math while GPT-4o has shifted toward knowledge generalization.
CASCADE enables LLMs to continually adapt at deployment via case-based episodic memory and contextual bandits, improving macro-averaged success by 20.9% over zero-shot on 16 tasks spanning medicine, law, code, and robotics.
PEPS decomposes positional encodings into projected points with unique frequency-dependent motions to support more efficient learned grid-based encodings in INRs, outperforming prior methods on image, texture, and SDF tasks with often 25% fewer parameters.
GraphCBMs extend concept bottleneck models by building latent concept graphs to model correlations between concepts, yielding better image classification accuracy, more informative structure for interpretability, and stronger intervention results.
AttnTrace is an attention-weight-based context traceback method for LLMs that claims higher accuracy and efficiency than prior art like TracLLM while aiding prompt injection detection.
InfoRidge reveals a non-monotonic pattern in which predictive mutual information between hidden states and outputs peaks in intermediate layers before declining in final layers.
PrefixMemory-Tuning decouples the prefix from attention to overcome performance limits of traditional prefix-tuning and reaches competitive results with modern PEFT methods on LLM adaptation benchmarks.
BAC accelerates transformer-based Diffusion Policy up to 3x by block-level adaptive feature caching using an Adaptive Caching Scheduler and Bubbling Union Algorithm to control error propagation.
Emu3 shows that next-token prediction on a unified discrete token space for text, images, and video lets a single transformer outperform task-specific models such as SDXL and LLaVA-1.6 in multimodal generation and perception.
FineWeb is a curated 15T-token web dataset that produces stronger LLMs than prior open collections, while its educational subset sharply improves performance on MMLU and ARC benchmarks.
Hunyuan-DiT is a new multi-resolution diffusion transformer that achieves state-of-the-art Chinese text-to-image generation through custom architecture, data pipelines, and multimodal caption refinement.
SGLang is a new system that speeds up structured LLM programs by up to 6.4x using RadixAttention for KV cache reuse and compressed finite state machines for output decoding.
GPTFuzz is a black-box fuzzing framework that mutates seed jailbreak templates to automatically generate effective attacks, achieving over 90% success rates on models including ChatGPT and Llama-2.
Scaling an autoregressive Transformer to 20B parameters for text-to-image generation using image token sequences achieves new SOTA zero-shot FID of 7.23 and fine-tuned FID of 3.22 on MS-COCO.
CoCa unifies contrastive and generative pretraining in one image-text model to reach 86.3% zero-shot ImageNet accuracy and new state-of-the-art results on multiple downstream benchmarks.
HoloMotion-1 trains a MoE Transformer policy on hybrid video and MoCap motion data to achieve robust zero-shot tracking that transfers directly to real humanoid robots.
StreamPhy introduces an end-to-end streaming framework using state-space models and an expressive FT-FiLM decoder to infer continuous physical dynamics from irregular sparse data, claiming 48% better accuracy and 20-100X faster inference than diffusion baselines.
NEMORI is an adaptive memory distillation framework for LLM agents that transforms raw interactions into narratives and extracts insights via prediction error to decide what deserves retention.
A temporal pooling layer added to LLaVA smooths video feature distributions and lifts performance on dense video captioning and QA to new SOTA levels without extra parameters.
TrustLLM defines eight trustworthiness principles, creates a six-dimension benchmark, and evaluates 16 LLMs showing proprietary models generally lead but some open-source ones are close while over-calibration can hurt utility.
Mixtral 8x7B is a sparse MoE LLM activating 2 of 8 experts per layer that matches or exceeds Llama 2 70B and GPT-3.5 on benchmarks while using only 13B active parameters.
citing papers explorer
-
LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning
LIBERO is a new benchmark for lifelong robot learning that evaluates transfer of declarative, procedural, and mixed knowledge across 130 manipulation tasks with provided demonstration data.
-
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
WE-MATH benchmark reveals most LMMs rely on rote memorization for visual math while GPT-4o has shifted toward knowledge generalization.
-
CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment
CASCADE enables LLMs to continually adapt at deployment via case-based episodic memory and contextual bandits, improving macro-averaged success by 20.9% over zero-shot on 16 tasks spanning medicine, law, code, and robotics.
-
Block-wise Adaptive Caching for Accelerating Diffusion Policy
BAC accelerates transformer-based Diffusion Policy up to 3x by block-level adaptive feature caching using an Adaptive Caching Scheduler and Bubbling Union Algorithm to control error propagation.
-
SGLang: Efficient Execution of Structured Language Model Programs
SGLang is a new system that speeds up structured LLM programs by up to 6.4x using RadixAttention for KV cache reuse and compressed finite state machines for output decoding.
-
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
GPTFuzz is a black-box fuzzing framework that mutates seed jailbreak templates to automatically generate effective attacks, achieving over 90% success rates on models including ChatGPT and Llama-2.
-
What Deserves Memory: Adaptive Memory Distillation for LLM Agents
NEMORI is an adaptive memory distillation framework for LLM agents that transforms raw interactions into narratives and extracts insights via prediction error to decide what deserves retention.