mega hub Canonical reference

LLaMA: Open and Efficient Foundation Language Models

· 2023 · cs.CL · arXiv 2302.13971

Canonical reference. 82% of citing Pith papers cite this work as background.

1301 Pith papers citing it

Background 82% of classified citations

open full Pith review browse 1301 citing papers arXiv PDF

abstract

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 207 method 20 baseline 8 other 6 dataset 1 extension 1

citation-polarity summary

background 199 use method 21 unclear 13 baseline 7 extend 1 support 1 use dataset 1

claims ledger

abstract We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

mega hub controls

export citing contexts JSON export graph JSON export full bundle JSON open full Pith review annotated reader queued

Recognition alignment

counterfactual ablation

If this work disappeared, these are the nearest dependency candidates in Pith, weighted toward method, dataset, baseline, and extension contexts where available. This is a structural signal, not a retraction verdict.

co-cited works

representative citing papers

Tight Sample Complexity of Transformers

cs.LG · 2026-06-08 · unverdicted · novelty 8.0

Depth-L transformers with W parameters have VC dimension Theta(L W log(T W)), yielding matching O(L W log((T+T')W)) upper and Omega(L W log((T+T')W/L)) lower bounds on sample complexity for chain-of-thought learning.

SVHalluc: Benchmarking Speech-Vision Hallucination in Audio-Visual Large Language Models

eess.AS · 2026-05-31 · unverdicted · novelty 8.0

SVHalluc benchmark shows open-source audio-visual LLMs achieve near-random accuracy on semantic and temporal speech-vision alignment tasks while Gemini 2.5 Pro performs substantially better.

Privacy Auditing with Zero (0) Training Run

cs.CR · 2026-05-14 · unverdicted · novelty 8.0

Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.

Effective Context in Transformers: An Analysis of Fragmentation and Tokenization

cs.LG · 2026-05-13 · unverdicted · novelty 8.0

Fragmentation strictly raises optimal finite-context log-loss on Markov sources while tokenization can make a short token window equivalent to a longer source window under reliability and compression conditions.

Grid Games: The Power of Multiple Grids for Quantizing Large Language Models

cs.LG · 2026-05-12 · accept · novelty 8.0

Allowing each quantization group to select among multiple 4-bit grids improves accuracy over single-grid FP4 for both post-training and pre-training of LLMs.

Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

Adaptive scheduling of interventions in discrete diffusion language models, timed to attribute-specific commitment schedules discovered with sparse autoencoders, delivers precise multi-attribute steering up to 93% strength while preserving generation quality.

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

SignSGD provably beats SGD by a factor of d under sparse noise via matched ℓ1-norm upper and lower bounds, with an equivalent result for Muon on matrices, and this predicts faster GPT-2 pretraining.

Backdoor Attacks on Decentralised Post-Training

cs.CR · 2026-03-31 · conditional · novelty 8.0 · 2 refs

An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.

Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers

cs.SE · 2025-06-16 · conditional · novelty 8.0

First study of 1,899 MCP servers finds eight distinct vulnerabilities (only three traditional), 7.2% with general issues, 5.5% with tool poisoning, and 66% with code smells, urging MCP-specific security practices.

BEAVER: An Enterprise Benchmark for Text-to-SQL

cs.CL · 2024-09-03 · unverdicted · novelty 8.0

BEAVER is the first text-to-SQL benchmark from private enterprise data warehouses, revealing SOTA agentic frameworks achieve only 10.8% accuracy on complex real-world queries.

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

cs.CV · 2024-08-23 · conditional · novelty 8.0

MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

cs.CR · 2024-06-19 · unverdicted · novelty 8.0

AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.

AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments

cs.HC · 2024-05-13 · conditional · novelty 8.0

AgentClinic is a multimodal agent benchmark demonstrating that LLM diagnostic accuracy on MedQA drops to below one-tenth in sequential clinical simulations, with Claude-3.5 leading and large tool-use differences across models.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

Bridging Language and Items for Retrieval and Recommendation: Benchmarking LLMs as Semantic Encoders

cs.IR · 2024-03-06 · unverdicted · novelty 8.0

BLaIR is a new benchmark and 570M-review dataset showing that LLM performance rankings on recommendation tasks have little correlation with rankings on general embedding benchmarks like MTEB.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

cs.LG · 2023-12-01 · unverdicted · novelty 8.0

Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

cs.CL · 2023-11-27 · unverdicted · novelty 8.0

MMMU provides 11.5K heterogeneous college-level multimodal questions that current models solve at 56-59% accuracy, establishing a new standard for expert multimodal evaluation.

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

cs.CL · 2023-05-17 · accept · novelty 8.0

Tree of Thoughts enables language models to solve complex planning tasks by generating, evaluating, and searching over coherent intermediate thoughts in a tree, raising Game of 24 success from 4% to 74% with GPT-4.

API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

cs.CL · 2023-04-14 · conditional · novelty 8.0

API-Bank is a new benchmark and training dataset for tool-augmented LLMs that shows fine-tuned models can approach GPT-3.5 tool-use effectiveness.

Instruction Tuning with GPT-4

cs.CL · 2023-04-06 · unverdicted · novelty 8.0

GPT-4-generated instruction data produces superior zero-shot performance in finetuned LLaMA models versus prior state-of-the-art data.

HERMES: A Multi-Granularity Labeling Substrate for Pre-training Data Mixtures

cs.LG · 2026-07-02 · unverdicted · novelty 7.0

HERMES provides a reusable hierarchical labeling substrate for pre-training data that reveals granularity-specific effects in data mixing rules during model training.

OntoLearner: A Modular Python Library for Ontology Learning with Large Language Models

cs.AI · 2026-07-02 · unverdicted · novelty 7.0

OntoLearner supplies the first cross-domain ontology collection and benchmarking infrastructure for LLM-driven ontology learning, finding that failure scales with ontological complexity instead of model size.

Agentic generation of verifiable rules for deterministic, self-expanding reaction classification

cs.AI · 2026-07-01 · unverdicted · novelty 7.0

Multi-agent LLMs generate and verify 14,073 deterministic reaction rules from 665,901 patents, enabling 97.7% classification of unseen reactions with finer resolution than fixed proprietary systems.

Probing Memorization of Tabular In-Context Learning

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

A new probing framework detects moderate parametric memorization signals in tabular in-context learning models under single-task fine-tuning, strongest on low-cardinality tasks, but signals largely disappear under realistic training.

citing papers explorer

Showing 50 of 1301 citing papers.

Echo-LoRA: Parameter-Efficient Fine-Tuning via Cross-Layer Representation Injection cs.LG · 2026-05-05 · unverdicted · none · ref 15 · internal anchor
Echo-LoRA raises average performance on eight commonsense reasoning benchmarks by 3.0 to 5.7 points over standard LoRA by using a training-only cross-layer echo representation that is discarded after training.
VIDA: A dataset for Visually Dependent Ambiguity in Multimodal Machine Translation cs.CL · 2026-05-03 · unverdicted · none · ref 54 · internal anchor
VIDA provides 2,500 visually-dependent ambiguous translation examples and span-level disambiguation metrics; CoT-SFT on LVLMs improves out-of-distribution performance over standard SFT.
A Multi-View Media Profiling Suite: Resources, Evaluation, and Analysis cs.CL · 2026-05-02 · unverdicted · none · ref 61 · internal anchor
Presents MBFC-2025 dataset and multi-view embeddings with fusion methods for media bias and factuality, reporting SOTA results on ACL-2020 and new benchmarks on MBFC-2025.
One Pass, Any Order: Position-Invariant Listwise Reranking for LLM-Based Recommendation cs.IR · 2026-04-30 · conditional · none · ref 28 · internal anchor
InvariRank achieves permutation-invariant listwise reranking for LLM-based recommendations via a structured attention mask that blocks cross-candidate interactions and shared positional framing under RoPE, enabling stable rankings in one forward pass.
Low Rank Adaptation for Adversarial Perturbation cs.LG · 2026-04-30 · unverdicted · none · ref 87 · internal anchor
Adversarial perturbations possess an inherently low-rank structure that enables more efficient and effective black-box adversarial attacks via subspace projection.
Quantifying Prior Dominance in RAG Systems cs.CL · 2026-04-29 · unverdicted · none · ref 23 · internal anchor
Introduces NCU metric using token log-probabilities and finds small language models match or outperform larger ones in strict factual RAG extraction, while commercial APIs show high prior dominance and negative transfer.
Adaptive and Fine-grained Module-wise Expert Pruning for Efficient LoRA-MoE Fine-Tuning cs.LG · 2026-04-29 · unverdicted · none · ref 4 · internal anchor
DMEP prunes experts module-by-module in LoRA-MoE and removes load balancing after pruning, cutting trainable parameters 35-43% and raising throughput ~10% while matching or exceeding uniform baselines on reasoning tasks.
Membership Inference Attacks Against Video Large Language Models cs.CR · 2026-04-29 · unverdicted · none · ref 36 · internal anchor
A temperature-perturbed black-box attack infers video training membership in VideoLLMs with 0.68 AUC by exploiting sharper generation behavior on member samples.
Spectral Selection in Symmetric Self-Attention Dynamics math.DS · 2026-04-28 · unverdicted · none · ref 28 · internal anchor
Symmetric self-attention dynamics select the dominant eigendirection of V, producing homogeneous alignment when one positive eigenvalue dominates or sign-split polarization when V is negative definite.
Three Models of RLHF Annotation: Extension, Evidence, and Authority cs.CY · 2026-04-28 · unverdicted · none · ref 53 · internal anchor
RLHF should decompose annotations into dimensions each matched to one of three models—extension, evidence, or authority—instead of applying a single unified pipeline.
Incompressible Knowledge Probes: Estimating Black-Box LLM Parameter Counts via Factual Capacity cs.LG · 2026-04-27 · unverdicted · none · ref 18 · internal anchor
Incompressible Knowledge Probes enable log-linear estimation of LLM parameter counts from factual accuracy on obscure questions, showing continued scaling of knowledge capacity across open and closed models.
Can an MLP Absorb Its Own Skip Connection? cs.LG · 2026-04-26 · accept · none · ref 10 · internal anchor
Skip-connected MLPs and residual-free MLPs of equal width represent generically disjoint function classes for common activations, with explicit impossibility proofs and a non-generic absorption condition for ReLU and GELU.
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework cs.CR · 2026-04-25 · unverdicted · none · ref 24 · internal anchor
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
Fine-tuning vs. In-context Learning in Large Language Models: A Formal Language Learning Perspective cs.CL · 2026-04-25 · conditional · none · ref 66 · 2 links · internal anchor
A controlled formal language task reveals fine-tuning outperforms in-context learning on in-distribution generalization but equals it on out-of-distribution, with ICL showing greater sensitivity to model size and tokenization.
RAG-Reflect: Agentic Retrieval-Augmented Generation with Reflections for Comment-Driven Code Maintenance on Stack Overflow cs.SE · 2026-04-24 · unverdicted · none · ref 44 · internal anchor
RAG-Reflect achieves F1=0.78 on valid comment-edit prediction using retrieval-augmented reasoning and self-reflection, outperforming baselines and approaching fine-tuned models without retraining.
Shard the Gradient, Scale the Model: Serverless Federated Aggregation via Gradient Partitioning cs.DC · 2026-04-23 · unverdicted · none · ref 4 · internal anchor
GradsSharding shards gradients for serverless federated aggregation to support arbitrarily large models with identical results to traditional methods and cost savings above 500 MB gradient size.
VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis cs.RO · 2026-04-23 · unverdicted · none · ref 9 · internal anchor
VistaBot integrates 4D geometry estimation and spatiotemporal view synthesis into action policies to improve cross-view generalization by 2.6-2.8x on a new VGS metric in simulation and real tasks.
Locating acts of mechanistic reasoning in student team conversations with mechanistic machine learning physics.ed-ph · 2026-04-23 · unverdicted · none · ref 23 · internal anchor
A probabilistic model with domain-aligned inductive bias detects acts of mechanistic reasoning in student conversations and shows improved generalization to unseen students and novel contexts.
Misinformation Span Detection in Videos via Audio Transcripts cs.CL · 2026-04-23 · unverdicted · none · ref 60 · internal anchor
New datasets and language model classifiers enable detection of misinformation spans in video transcripts with an F1 score of 0.68.
VARestorer: One-Step VAR Distillation for Real-World Image Super-Resolution cs.CV · 2026-04-23 · unverdicted · none · ref 12 · internal anchor
VARestorer converts a text-to-image VAR model into a fast one-step real-world image super-resolution model via distribution matching distillation and pyramid image conditioning.
Breaking MCP with Function Hijacking Attacks: Novel Threats for Function Calling and Agentic Models cs.CR · 2026-04-22 · unverdicted · none · ref 19 · internal anchor
A novel function hijacking attack achieves 70-100% success rates in forcing specific function calls across five LLMs on the BFCL benchmark and is robust to context semantics.
SimDiff: Depth Pruning via Similarity and Difference cs.AI · 2026-04-21 · unverdicted · none · ref 8 · internal anchor
SimDiff uses similarity and difference metrics to prune LLM layers more effectively than cosine similarity alone, retaining over 91% performance at 25% pruning on LLaMA2-7B.
Learning Posterior Predictive Distributions for Node Classification from Synthetic Graph Priors cs.LG · 2026-04-21 · unverdicted · none · ref 280 · internal anchor
NodePFN pre-trains on synthetic graphs with controllable homophily and causal feature-label models to achieve 71.27 average accuracy on 23 node classification benchmarks without graph-specific training.
Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control cs.LG · 2026-04-21 · conditional · none · ref 78 · internal anchor
Local linearity of LLM layers enables LQR-based closed-loop activation steering with theoretical tracking guarantees.
Faster by Design: Interactive Aerodynamics via Neural Surrogates Trained on Expert-Validated CFD cs.LG · 2026-04-20 · unverdicted · none · ref 33 · internal anchor
A graph-based neural operator trained on expert-validated race-car CFD data reaches accuracy levels usable for early-stage interactive aerodynamic design exploration.
Long-Text-to-Image Generation via Compositional Prompt Decomposition cs.CV · 2026-04-20 · unverdicted · none · ref 40 · internal anchor
PRISM lets pre-trained text-to-image models handle long prompts by breaking them into compositional parts, predicting noise separately, and merging outputs via energy-based conjunction, matching fine-tuned models while generalizing better to prompts over 500 tokens.
Region-Grounded Report Generation for 3D Medical Imaging: A Fine-Grained Dataset and Graph-Enhanced Framework cs.CV · 2026-04-20 · conditional · none · ref 52 · internal anchor
Introduces the first large-scale 3D PET/CT dataset with fine-grained RoI annotations for Vietnamese and a graph-enhanced HiRRA framework that achieves SOTA report generation by modeling RoI dependencies.
LookasideVLN: Direction-Aware Aerial Vision-and-Language Navigation cs.CV · 2026-04-19 · unverdicted · none · ref 35 · internal anchor
LookasideVLN improves aerial vision-and-language navigation by encoding directional cues from instructions into an egocentric graph and lightweight knowledge base, outperforming prior methods like CityNavAgent even with single-step lookahead.
Cognitive Policy-Driven LLM for Diagnosis and Intervention of Cognitive Distortions in Emotional Support Conversation cs.CL · 2026-04-19 · unverdicted · none · ref 4 · internal anchor
The CogBiasESC dataset and CoPoLLM framework enable LLMs to diagnose cognitive distortions and apply interventions in emotional support conversations, outperforming baselines on accuracy, effectiveness, and safety.
Modeling Multi-Dimensional Cognitive States in Large Language Models under Cognitive Crowding cs.CL · 2026-04-19 · unverdicted · none · ref 43 · internal anchor
CognitiveBench reveals LLMs suffer representation overlap on joint cognitive tasks due to hierarchical structure; HyCoLLM in hyperbolic space fixes the mismatch and outperforms GPT-4o with far fewer parameters.
Understanding and Enforcing Weight Disentanglement in Task Arithmetic cs.AI · 2026-04-18 · unverdicted · none · ref 41 · internal anchor
Task-Feature Specialization explains weight disentanglement in task arithmetic and leads to orthogonality, which OrthoReg enforces to enhance performance of model composition methods.
On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability cs.IR · 2026-04-17 · unverdicted · none · ref 69 · internal anchor
LLM-based dense retrievers generalize better when instruction-tuned but pay a specialization tax when optimized for reasoning; they resist typos and corpus poisoning better than encoder-only baselines yet remain vulnerable to semantic perturbations, with larger models and certain embedding geometry,
RAGognizer: Hallucination-Aware Fine-Tuning via Detection Head Integration cs.CL · 2026-04-17 · unverdicted · none · ref 30 · internal anchor
RAGognizer adds a detection head to LLMs for joint training on generation and token-level hallucination detection, yielding SOTA detection and fewer hallucinations in RAG while preserving output quality.
IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning cs.AI · 2026-04-16 · unverdicted · none · ref 31 · internal anchor
IG-Search computes step-level information gain rewards from policy probabilities to improve credit assignment in RL training for search-augmented QA, yielding 1.6-point gains over trajectory-level baselines on multi-hop tasks.
Beyond Visual Cues: Semantic-Driven Token Filtering and Expert Routing for Anytime Person ReID cs.CV · 2026-04-16 · unverdicted · none · ref 40 · internal anchor
STFER uses LVLM-generated identity-consistent semantic text to drive visual token filtering and expert routing for improved any-time person re-identification under clothing changes and modality shifts.
Does RL Expand the Capability Boundary of LLM Agents? A PASS@(k,T) Analysis cs.LG · 2026-04-16 · unverdicted · none · ref 17 · internal anchor
RL expands the capability boundary of LLM agents on compositional tool-use tasks, shown by non-converging pass curves at large k with increasing T, while SFT regresses it and the effect is absent on simpler tasks.
Prompt-Guided Image Editing with Masked Logit Nudging in Visual Autoregressive Models cs.CV · 2026-04-16 · unverdicted · none · ref 42 · internal anchor
Masked Logit Nudging aligns visual autoregressive model logits with source token maps under target prompts inside cross-attention masks, delivering top image editing results on PIE benchmarks and strong reconstructions on COCO and OpenImages while running faster than diffusion approaches.
Don't Let the Video Speak: Audio-Contrastive Preference Optimization for Audio-Visual Language Models cs.CV · 2026-04-15 · unverdicted · none · ref 47 · internal anchor
Audio-Contrastive Preference Optimization (ACPO) mitigates audio hallucination in AVLMs via output-contrastive and input-contrastive objectives that enforce faithful audio grounding.
Reinforcement Learning via Value Gradient Flow cs.LG · 2026-04-15 · unverdicted · none · ref 65 · internal anchor
VGF solves behavior-regularized RL by transporting particles from a reference distribution to the value-induced optimal policy via discrete value-guided gradient flow.
PipeLive: Efficient Live In-place Pipeline Parallelism Reconfiguration for Dynamic LLM Serving cs.DC · 2026-04-14 · unverdicted · none · ref 26 · internal anchor
PipeLive enables live pipeline parallelism reconfiguration for LLMs via KV cache redesign and VM-migration-inspired patching, cutting TTFT by 2.5x and reconfiguration time to under 10ms.
PR-MaGIC: Prompt Refinement Via Mask Decoder Gradient Flow For In-Context Segmentation cs.CV · 2026-04-13 · unverdicted · none · ref 27 · internal anchor
PR-MaGIC refines prompts in in-context segmentation via test-time gradient flow from the mask decoder plus top-1 selection, yielding better masks across benchmarks without training.
Robust Explanations for User Trust in Enterprise NLP Systems cs.CL · 2026-04-13 · unverdicted · none · ref 4 · internal anchor
Decoder LLMs produce substantially more stable explanations than encoder models, with 73% lower top-token flip rates on average and further 44% gains from 7B to 70B scale.
MIXAR: Scaling Autoregressive Pixel-based Language Models to Multiple Languages and Scripts cs.CL · 2026-04-13 · unverdicted · none · ref 28 · internal anchor
MIXAR is the first autoregressive pixel-based language model for eight languages and scripts, with empirical gains on multilingual tasks, robustness to unseen languages, and further improvements when scaled to 0.5B parameters.
Winner-Take-All Spiking Transformer for Language Modeling cs.NE · 2026-04-13 · unverdicted · none · ref 12 · internal anchor
Winner-take-all spiking self-attention replaces softmax in spiking transformers to support language modeling on 16 datasets with spike-driven, energy-efficient architectures.
Semantic-Geometric Dual Compression: Training-Free Visual Token Reduction for Ultra-High-Resolution Remote Sensing Understanding cs.CV · 2026-04-13 · unverdicted · none · ref 36 · internal anchor
DualComp uses a lightweight router to split visual token compression into a semantic stream with size-adaptive clustering and a geometric stream with path-tracing recovery, enabling low-cost high-fidelity UHR remote sensing interpretation.
A Benchmark and Multi-Agent System for Instruction-driven Cinematic Video Compilation cs.CV · 2026-04-12 · unverdicted · none · ref 36 · internal anchor
CineAgents is a multi-agent system that builds hierarchical narrative memory via script reverse-engineering and uses iterative planning to produce instruction-driven cinematic video compilations with better coherence than prior methods.
EgoTL: Egocentric Think-Aloud Chains for Long-Horizon Tasks cs.CV · 2026-04-10 · unverdicted · none · ref 44 · internal anchor
EgoTL provides a new egocentric dataset with think-aloud chains and metric labels that benchmarks VLMs on long-horizon tasks and improves their planning, reasoning, and spatial grounding after finetuning.
Envisioning the Future, One Step at a Time cs.CV · 2026-04-10 · unverdicted · none · ref 101 · internal anchor
An autoregressive diffusion model on sparse point trajectories predicts multi-modal future scene dynamics from single images with orders-of-magnitude faster sampling than dense video simulators while matching accuracy.
Phonemes vs. Projectors: An Investigation of Speech-Language Interfaces for LLM-based ASR eess.AS · 2026-04-10 · unverdicted · none · ref 26 · internal anchor
Phoneme-based interfaces match or surpass projector-based ones for LLM ASR, especially in low-resource languages, and a BPE-phoneme hybrid offers additional improvements.
Learning Vision-Language-Action World Models for Autonomous Driving cs.CV · 2026-04-10 · unverdicted · none · ref 54 · internal anchor
VLA-World improves autonomous driving by using action-guided future image generation followed by reflective reasoning over the imagined scene to refine trajectories.

LLaMA: Open and Efficient Foundation Language Models

hub tools

citation-role summary

citation-polarity summary

claims ledger

mega hub controls

Recognition alignment

counterfactual ablation

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer