Fragmentation strictly raises the optimal finite-context log-loss on Markov sources, while tokenization can make a short token window equivalent to a longer source window under reliability and compression conditions.
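The claim is easiest to parse with a minimal formalization of "optimal finite-context log-loss"; the notation below is ours, not necessarily the paper's. For a stationary source, the best achievable log-loss with a length-k context window is the conditional entropy, which is non-increasing in k and, for an order-m Markov source, flat beyond k = m:

```latex
L_k \;=\; H\!\left(X_0 \,\middle|\, X_{-k}^{-1}\right),
\qquad L_1 \ge L_2 \ge \cdots,
\qquad L_k = L_m \ \text{for all } k \ge m .
```

Read this way, fragmentation stretches the same source history over more symbols, so a fixed window covers less of it and the attainable log-loss rises; a good tokenizer does the reverse, letting a short token window stand in for a longer stretch of source symbols.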
Super hub
LLaMA: Open and Efficient Foundation Language Models
Canonical reference. 88% of citing Pith papers cite this work as background.
abstract
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
representative citing papers
Allowing each quantization group to select among multiple 4-bit grids improves accuracy over single-grid FP4 for both post-training and pre-training of LLMs (a minimal grid-selection sketch appears after this list).
Adaptive scheduling of interventions in discrete diffusion language models, timed to attribute-specific commitment schedules discovered with sparse autoencoders, delivers precise multi-attribute steering up to 93% strength while preserving generation quality.
SignSGD provably beats SGD by a factor of d under sparse noise via matched ℓ1-norm upper and lower bounds, with an equivalent result for Muon on matrices, and this predicts faster GPT-2 pretraining (a one-line update sketch appears after this list).
An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.
AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.
Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.
Tree of Thoughts enables language models to solve complex planning tasks by generating, evaluating, and searching over coherent intermediate thoughts in a tree, raising Game of 24 success from 4% to 74% with GPT-4.
GPT-4-generated instruction data yields better zero-shot performance in finetuned LLaMA models than instruction data generated by prior state-of-the-art models.
A proposer-solver agent pair achieves supervised-level video temporal grounding and fine-grained captioning from 2.5K unlabeled videos via self-reinforcing evolution.
BlockVLA accelerates autoregressive VLA models by 3.3x using block diffusion finetuning, with faster training convergence and better early performance on long-horizon robotic tasks.
A parallel multi-turn medical dialogue dataset spanning English and nine Indic languages is created from synthetic consultations to enable personalized AI healthcare interactions.
LiteLVLM prunes visual tokens for pixel grounding by reversing CLIP visual-text similarity to retain referent region tokens, outperforming prior methods by over 5% with 22% speedup and 2.3x memory reduction without any training.
Probabilistic circuits have an output bottleneck with convex probability combinations and a context bottleneck limited to fixed vtree-aligned partitions, making them less expressive than Transformers for language data with heterogeneous dependencies, though decomposable PCs are strictly more capable.
MindVLA-U1 introduces a unified streaming VLA with shared backbone, framewise memory, and language-guided action diffusion that surpasses human drivers on WOD-E2E planning metrics.
Large language models achieve macro F1 scores above 0.85 on binary nominal-versus-danger classification from CTAF radio transcripts and METAR weather data using a new synthetic dataset with a 12-category hazard taxonomy.
Gender bias and factual gender knowledge are severely entangled in language model circuits and neurons, making neuron ablation an unreliable method for debiasing.
MoLA infers a mixture of latent actions from generated future videos via modality-aware inverse dynamics models to improve robot manipulation policies.
Pretrained LLMs adapted via convolutional projections and LoRA act as efficient frozen backbones for sensor-based human activity recognition, delivering strong data efficiency and cross-dataset transfer.
DistractMIA performs output-only black-box membership inference on vision-language models by inserting semantic distractors and measuring shifts in generated text responses.
StepCodeReasoner aligns code reasoning with verifiable stepwise execution traces via print anchors and bi-level GRPO reinforcement learning, reaching SOTA results on CRUXEval (91.1%) and LiveCodeBench (86.5%) for a 7B model.
Anchor-guided variance-aware reward modeling uses two response-level anchors to resolve non-identifiability in Gaussian models of pluralistic preferences, yielding provable identification, a joint training objective, and improved RLHF performance.
Temporarily reducing the learning rate on upper-layer query and key projections during early GPT pretraining prevents premature attention specialization and improves model performance (a parameter-group sketch appears after this list).
V-ABS is an action-observer beam search method with entropy-based adaptive weighting and an 80k-sample SFT dataset that delivers 19.7% average gains on visual reasoning tasks for MLLMs.
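The multi-grid FP4 entry above reduces to a per-group argmin over candidate level sets. A minimal NumPy sketch follows; the grids are illustrative stand-ins, not the paper's actual 4-bit formats:

```python
import numpy as np

# Illustrative 16-level (4-bit) grids; the cited paper's actual grids differ.
GRIDS = [
    np.array([-6, -4, -3, -2, -1.5, -1, -0.5, 0,
              0.5, 1, 1.5, 2, 3, 4, 6, 8.0]),     # FP4-like, dense near zero
    np.linspace(-8, 8, 16),                        # uniform 4-bit grid
]

def quantize_group(w, grids=GRIDS):
    """Quantize one weight group with whichever grid gives the lowest MSE.

    The group is scaled so its max magnitude maps onto the grid's range,
    snapped to the nearest level, rescaled, and the grid with the smallest
    reconstruction error wins.
    """
    best_err, best_q = None, None
    for g in grids:
        scale = np.max(np.abs(w)) / np.max(np.abs(g)) + 1e-12
        q = g[np.argmin(np.abs(w[:, None] / scale - g[None, :]), axis=1)] * scale
        err = float(np.mean((w - q) ** 2))
        if best_err is None or err < best_err:
            best_err, best_q = err, q
    return best_q
```

In a real quantizer the chosen grid index would be stored per group alongside the scale.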
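For the signSGD entry, the update itself is one line; here is a minimal PyTorch sketch (the ℓ1-norm bounds and the Muon matrix variant are analysis, not extra code):

```python
import torch

@torch.no_grad()
def signsgd_step(params, lr=1e-3):
    """One signSGD step: move each parameter by -lr * sign(gradient).

    Every coordinate moves by exactly lr regardless of gradient magnitude,
    which is what makes the l1 (rather than l2) geometry the right lens
    for the bounds mentioned above.
    """
    for p in params:
        if p.grad is not None:
            p.add_(p.grad.sign(), alpha=-lr)
```

Usage is the usual loop: compute the loss, call loss.backward(), then signsgd_step(model.parameters()).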
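The premature-specialization entry amounts to a temporary per-parameter-group learning rate. A sketch, assuming PyTorch parameter names like layers.11.attn.q_proj.weight (the names and split fraction are illustrative):

```python
import re

def build_param_groups(model, n_layers, base_lr=3e-4, qk_lr=3e-5, top_frac=0.5):
    """Give upper-layer query/key projections a reduced LR for early pretraining.

    Once the early phase ends, the caller restores the second group's lr to
    base_lr via optimizer.param_groups.
    """
    cutoff = int(n_layers * (1 - top_frac))  # layer index >= cutoff is "upper"
    qk_upper, rest = [], []
    for name, p in model.named_parameters():
        m = re.search(r"layers\.(\d+)\.", name)
        is_upper_qk = (m is not None and int(m.group(1)) >= cutoff
                       and ("q_proj" in name or "k_proj" in name))
        (qk_upper if is_upper_qk else rest).append(p)
    return [{"params": rest, "lr": base_lr},
            {"params": qk_upper, "lr": qk_lr}]
```

The groups feed straight into torch.optim.AdamW(build_param_groups(model, n_layers=12)).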
citing papers explorer
- One Pass, Any Order: Position-Invariant Listwise Reranking for LLM-Based Recommendation
InvariRank achieves permutation-invariant listwise reranking for LLM-based recommendations via a structured attention mask that blocks cross-candidate interactions and shared positional framing under RoPE, enabling stable rankings in one forward pass (a mask sketch appears after this list).
- On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability
LLM-based dense retrievers generalize better when instruction-tuned but pay a specialization tax when optimized for reasoning; they resist typos and corpus poisoning better than encoder-only baselines yet remain vulnerable to semantic perturbations, with robustness further shaped by model scale and embedding geometry.
- Fusion and Alignment Enhancement with Large Language Models for Tail-item Sequential Recommendation
FAERec fuses collaborative ID embeddings with LLM semantic embeddings using adaptive gating and dual-level alignment to enhance tail-item sequential recommendations.
- LARAG: Link-Aware Retrieval Strategy for RAG Systems in Hyperlinked Technical Documentation
LARAG improves RAG answer quality on hyperlinked technical documentation by using author-defined links for retrieval, achieving higher BERTScore while using fewer chunks and tokens than standard embedding-based RAG.
- An Embarrassingly Simple Graph Heuristic Reveals Shortcut-Solvable Benchmarks for Sequential Recommendation
A simple graph heuristic without training or sequence encoders matches or outperforms trained generative recommenders on 10 of 14 sequential recommendation benchmarks by exploiting local transition and feature shortcuts (a transition-count sketch appears after this list).
- TimeMM: Time-as-Operator Spectral Filtering for Dynamic Multimodal Recommendation
TimeMM proposes a time-as-operator spectral filtering framework with adaptive mixing and modality routing to model non-stationary multimodal user preferences in recommendation systems.
- Disagreement as Signals: Dual-view Calibration for Sequential Recommendation Denoising
DC4SR improves sequential recommendation denoising by iteratively calibrating LLM semantic priors and model learning posteriors using their disagreement as a signal for better alignment with true user interests.
- From Tokens to Concepts: Leveraging SAE for SPLADE
SAE-SPLADE substitutes SPLADE's backbone vocabulary with SAE-derived semantic concepts and matches standard SPLADE performance with better efficiency on in- and out-of-domain tasks.
- Modular Representation Compression: Adapting LLMs for Efficient and Effective Recommendations
LLMs exhibit a mid-layer representation advantage for recommendations; MARC compresses representations modularly to reduce costs while improving performance, as shown in a large-scale online advertising deployment.
- UniRec: Bridging the Expressive Gap between Generative and Discriminative Recommendation via Chain-of-Attribute
UniRec bridges the expressive gap in generative recommendation by prefixing semantic ID sequences with structured attribute tokens, recovering explicit feature crossing and yielding +22.6% HR@50 gains plus online lifts in PVCTR, orders, and GMV.
- Automating Categorization of Scientific Texts with In-Context Learning and Prompt-Chaining in Large Language Models
Prompt chaining with off-the-shelf LLMs outperforms in-context learning and BERT for 1st- and 2nd-level classification on the ORKG taxonomy using the FORC dataset, but struggles at the 3rd level.
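The InvariRank entry above hinges on a structured attention mask, which is easy to sketch; this is our reconstruction from the one-line summary (the shared-RoPE positional framing is a separate ingredient not shown):

```python
import torch

def one_pass_rerank_mask(prompt_len, cand_lens):
    """Attention mask (True = may attend) for order-invariant listwise reranking.

    Prompt tokens attend causally among themselves; each candidate token sees
    the full prompt and attends causally within its own candidate block, but
    never across candidates, so a candidate's representation cannot depend on
    where it sits in the list.
    """
    total = prompt_len + sum(cand_lens)
    mask = torch.zeros(total, total, dtype=torch.bool)
    mask[:prompt_len, :prompt_len] = torch.ones(prompt_len, prompt_len).tril().bool()
    mask[prompt_len:, :prompt_len] = True  # every candidate sees the shared prompt
    start = prompt_len
    for n in cand_lens:
        mask[start:start + n, start:start + n] = torch.ones(n, n).tril().bool()
        start += n
    return mask
```

With this mask, permuting the candidate order only permutes the candidate blocks; it never changes which context a candidate can see.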
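The shortcut-benchmark entry is also easy to make concrete: one plausible instance of the "local transition shortcut" is a pure transition-count recommender, sketched below with no training at all:

```python
from collections import Counter, defaultdict

def fit_transitions(train_sequences):
    """Count adjacent item-to-item transitions over all training sequences."""
    trans = defaultdict(Counter)
    for seq in train_sequences:
        for a, b in zip(seq, seq[1:]):
            trans[a][b] += 1
    return trans

def recommend(trans, last_item, k=10):
    """Rank candidate next items by how often they followed the last item."""
    return [item for item, _ in trans[last_item].most_common(k)]
```

If a heuristic this simple matches trained sequence encoders on a benchmark, the benchmark is measuring local transition statistics rather than sequence modeling, which is the entry's point.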