mega hub Canonical reference

LLaMA: Open and Efficient Foundation Language Models

· 2023 · cs.CL · arXiv 2302.13971

Canonical reference. 82% of citing Pith papers cite this work as background.

1301 Pith papers citing it

Background 82% of classified citations

open full Pith review browse 1301 citing papers arXiv PDF

abstract

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 207 method 20 baseline 8 other 6 dataset 1 extension 1

citation-polarity summary

background 199 use method 21 unclear 13 baseline 7 extend 1 support 1 use dataset 1

claims ledger

abstract We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

mega hub controls

export citing contexts JSON export graph JSON export full bundle JSON open full Pith review annotated reader queued

Recognition alignment

counterfactual ablation

If this work disappeared, these are the nearest dependency candidates in Pith, weighted toward method, dataset, baseline, and extension contexts where available. This is a structural signal, not a retraction verdict.

co-cited works

representative citing papers

Tight Sample Complexity of Transformers

cs.LG · 2026-06-08 · unverdicted · novelty 8.0

Depth-L transformers with W parameters have VC dimension Theta(L W log(T W)), yielding matching O(L W log((T+T')W)) upper and Omega(L W log((T+T')W/L)) lower bounds on sample complexity for chain-of-thought learning.

SVHalluc: Benchmarking Speech-Vision Hallucination in Audio-Visual Large Language Models

eess.AS · 2026-05-31 · unverdicted · novelty 8.0

SVHalluc benchmark shows open-source audio-visual LLMs achieve near-random accuracy on semantic and temporal speech-vision alignment tasks while Gemini 2.5 Pro performs substantially better.

Privacy Auditing with Zero (0) Training Run

cs.CR · 2026-05-14 · unverdicted · novelty 8.0

Zero-Run auditing supplies valid lower bounds on differential privacy parameters from fixed member and non-member datasets by modeling and correcting distribution-shift confounding via causal-inference techniques.

Effective Context in Transformers: An Analysis of Fragmentation and Tokenization

cs.LG · 2026-05-13 · unverdicted · novelty 8.0

Fragmentation strictly raises optimal finite-context log-loss on Markov sources while tokenization can make a short token window equivalent to a longer source window under reliability and compression conditions.

Grid Games: The Power of Multiple Grids for Quantizing Large Language Models

cs.LG · 2026-05-12 · accept · novelty 8.0

Allowing each quantization group to select among multiple 4-bit grids improves accuracy over single-grid FP4 for both post-training and pre-training of LLMs.

Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

Adaptive scheduling of interventions in discrete diffusion language models, timed to attribute-specific commitment schedules discovered with sparse autoencoders, delivers precise multi-attribute steering up to 93% strength while preserving generation quality.

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

cs.LG · 2026-05-07 · unverdicted · novelty 8.0

SignSGD provably beats SGD by a factor of d under sparse noise via matched ℓ1-norm upper and lower bounds, with an equivalent result for Muon on matrices, and this predicts faster GPT-2 pretraining.

Backdoor Attacks on Decentralised Post-Training

cs.CR · 2026-03-31 · conditional · novelty 8.0 · 2 refs

An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.

Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers

cs.SE · 2025-06-16 · conditional · novelty 8.0

First study of 1,899 MCP servers finds eight distinct vulnerabilities (only three traditional), 7.2% with general issues, 5.5% with tool poisoning, and 66% with code smells, urging MCP-specific security practices.

BEAVER: An Enterprise Benchmark for Text-to-SQL

cs.CL · 2024-09-03 · unverdicted · novelty 8.0

BEAVER is the first text-to-SQL benchmark from private enterprise data warehouses, revealing SOTA agentic frameworks achieve only 10.8% accuracy on complex real-world queries.

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

cs.CV · 2024-08-23 · conditional · novelty 8.0

MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

cs.CR · 2024-06-19 · unverdicted · novelty 8.0

AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.

AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments

cs.HC · 2024-05-13 · conditional · novelty 8.0

AgentClinic is a multimodal agent benchmark demonstrating that LLM diagnostic accuracy on MedQA drops to below one-tenth in sequential clinical simulations, with Claude-3.5 leading and large tool-use differences across models.

ORPO: Monolithic Preference Optimization without Reference Model

cs.CL · 2024-03-12 · conditional · novelty 8.0

ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.

Bridging Language and Items for Retrieval and Recommendation: Benchmarking LLMs as Semantic Encoders

cs.IR · 2024-03-06 · unverdicted · novelty 8.0

BLaIR is a new benchmark and 570M-review dataset showing that LLM performance rankings on recommendation tasks have little correlation with rankings on general embedding benchmarks like MTEB.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

cs.LG · 2023-12-01 · unverdicted · novelty 8.0

Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

cs.CL · 2023-11-27 · unverdicted · novelty 8.0

MMMU provides 11.5K heterogeneous college-level multimodal questions that current models solve at 56-59% accuracy, establishing a new standard for expert multimodal evaluation.

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

cs.CL · 2023-05-17 · accept · novelty 8.0

Tree of Thoughts enables language models to solve complex planning tasks by generating, evaluating, and searching over coherent intermediate thoughts in a tree, raising Game of 24 success from 4% to 74% with GPT-4.

API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

cs.CL · 2023-04-14 · conditional · novelty 8.0

API-Bank is a new benchmark and training dataset for tool-augmented LLMs that shows fine-tuned models can approach GPT-3.5 tool-use effectiveness.

Instruction Tuning with GPT-4

cs.CL · 2023-04-06 · unverdicted · novelty 8.0

GPT-4-generated instruction data produces superior zero-shot performance in finetuned LLaMA models versus prior state-of-the-art data.

HERMES: A Multi-Granularity Labeling Substrate for Pre-training Data Mixtures

cs.LG · 2026-07-02 · unverdicted · novelty 7.0

HERMES provides a reusable hierarchical labeling substrate for pre-training data that reveals granularity-specific effects in data mixing rules during model training.

OntoLearner: A Modular Python Library for Ontology Learning with Large Language Models

cs.AI · 2026-07-02 · unverdicted · novelty 7.0

OntoLearner supplies the first cross-domain ontology collection and benchmarking infrastructure for LLM-driven ontology learning, finding that failure scales with ontological complexity instead of model size.

Agentic generation of verifiable rules for deterministic, self-expanding reaction classification

cs.AI · 2026-07-01 · unverdicted · novelty 7.0

Multi-agent LLMs generate and verify 14,073 deterministic reaction rules from 665,901 patents, enabling 97.7% classification of unseen reactions with finer resolution than fixed proprietary systems.

Probing Memorization of Tabular In-Context Learning

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

A new probing framework detects moderate parametric memorization signals in tabular in-context learning models under single-task fine-tuning, strongest on low-cardinality tasks, but signals largely disappear under realistic training.

citing papers explorer

Showing 50 of 142 citing papers after filters.

Backdoor Attacks on Decentralised Post-Training cs.CR · 2026-03-31 · conditional · none · ref 12 · 2 links · internal anchor
An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequent safety training.
Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers cs.SE · 2025-06-16 · conditional · none · ref 136 · internal anchor
First study of 1,899 MCP servers finds eight distinct vulnerabilities (only three traditional), 7.2% with general issues, 5.5% with tool poisoning, and 66% with code smells, urging MCP-specific security practices.
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? cs.CV · 2024-08-23 · conditional · none · ref 62 · internal anchor
MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments cs.HC · 2024-05-13 · conditional · none · ref 23 · internal anchor
AgentClinic is a multimodal agent benchmark demonstrating that LLM diagnostic accuracy on MedQA drops to below one-tenth in sequential clinical simulations, with Claude-3.5 leading and large tool-use differences across models.
ORPO: Monolithic Preference Optimization without Reference Model cs.CL · 2024-03-12 · conditional · none · ref 50 · internal anchor
ORPO performs preference alignment during supervised fine-tuning via a monolithic odds ratio penalty, allowing 7B models to outperform larger state-of-the-art models on alignment benchmarks.
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs cs.CL · 2023-04-14 · conditional · none · ref 17 · internal anchor
API-Bank is a new benchmark and training dataset for tool-augmented LLMs that shows fine-tuned models can approach GPT-3.5 tool-use effectiveness.
Training-free Task Classification for Multi-Task Model Merging cs.LG · 2026-06-21 · conditional · none · ref 72 · internal anchor
SiM enables training-free routing in multi-task model merging by scoring test inputs via projection residuals onto SVD-based task manifolds precomputed from small support sets.
CheckMIABench: Firm Foundations For Membership Inference Attacks on Language Models cs.LG · 2026-06-16 · conditional · none · ref 284 · internal anchor
CheckMIABench converts LLMs with intermediate checkpoints into clean MIA testbeds by using pre- and post-checkpoint training data from the same distribution and evaluates published attacks on Pythia and OLMo models while releasing an open-source library.
In Defense of Information Leakage in Concept-based Models cs.LG · 2026-06-09 · conditional · none · ref 133 · internal anchor
Concept-based models can use controlled 'benign' information leakage to remain accurate and intervenable under real-world concept incompleteness by reframing their training objective.
APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing cs.DC · 2026-06-07 · conditional · none · ref 25 · 2 links · internal anchor
APEX4 co-designs pure INT4 GEMM kernels with ρ-aware granularity adaptation to deliver up to 2.09× end-to-end speedup on GPUs with low ρ while keeping LLaMA-2-70B perplexity within 0.63 of FP16.
TrustMargin: Training-Free Arbitration between Parametric Memory and Retrieved Evidence in Large Language Models cs.CL · 2026-06-07 · conditional · none · ref 15 · internal anchor
TrustMargin arbitrates between direct and RAG answers from a frozen LLM by combining a parametric-prior margin and an evidence-binding margin computed from model likelihoods, improving results on 2WikiMQA and CWQA.
EvoCut: Multi-Layer Evolution-Aware Visual Token Compression for Efficient Large Vision-Language Models cs.CV · 2026-06-01 · conditional · none · ref 35 · internal anchor
EvoCut is a training-free visual token compression technique that identifies important tokens via multi-layer evolution deviation, retaining 11.1% tokens with 94.4% average performance preserved on LLaVA-1.5-7B.
Repetition Mismatch: Why Data Mixture Experiments Don't Scale and How to Fix Them cs.LG · 2026-05-29 · conditional · none · ref 63 · 2 links · internal anchor
Repetition rate mismatch between small-scale proxies and target budgets is the main reason data mixture experiments do not scale; a subsampling procedure that equalizes repetition rates recovers optimal mixtures from 1/16-scale experiments.
Head-Aware Key-Value Compression for Efficient Autoregressive Image Generation cs.CV · 2026-05-20 · conditional · none · ref 32 · internal anchor
HeadKV compresses KV cache for autoregressive image generation via head-aware budget allocation, early head-type identification from consistent patterns, and stratified token eviction.
Modality-Decoupled Online Recursive Editing cs.LG · 2026-05-19 · conditional · none · ref 16 · internal anchor
M-ORE decouples text and visual update statistics in MLLMs and applies recursive low-rank edits in an orthogonal subspace to reduce cross-modal conflict and long-horizon interference.
Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm cs.LG · 2026-05-14 · conditional · none · ref 88 · internal anchor
A framework to identify and convert foldable layer normalizations to RMSNorm for exact equivalence and faster inference in deep neural networks.
Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent cs.LG · 2026-05-07 · conditional · none · ref 46 · internal anchor
Multi-layer transformers can implement in-context logistic regression by performing normalized gradient descent steps layer by layer, obtained via supervised training of a single attention layer followed by recurrent application with convergence and OOD guarantees.
When Graph Language Models Go Beyond Memorization cs.LG · 2026-05-07 · conditional · none · ref 7 · internal anchor
Large-scale graph language models acquire structural regularities beyond memorization, with subgraph rank correlations persisting after bootstrap and novel-subset controls, especially for high-frequency patterns.
Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost cs.AI · 2026-05-07 · conditional · none · ref 142 · internal anchor
Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization cs.CR · 2026-05-06 · conditional · none · ref 4 · internal anchor
TAGO performs sparse jailbreak optimization on audio LMs by retaining only high-gradient-energy tokens, preserving near-full ASR at 25% retention across three models.
One Pass, Any Order: Position-Invariant Listwise Reranking for LLM-Based Recommendation cs.IR · 2026-04-30 · conditional · none · ref 28 · internal anchor
InvariRank achieves permutation-invariant listwise reranking for LLM-based recommendations via a structured attention mask that blocks cross-candidate interactions and shared positional framing under RoPE, enabling stable rankings in one forward pass.
Fine-tuning vs. In-context Learning in Large Language Models: A Formal Language Learning Perspective cs.CL · 2026-04-25 · conditional · none · ref 66 · 2 links · internal anchor
A controlled formal language task reveals fine-tuning outperforms in-context learning on in-distribution generalization but equals it on out-of-distribution, with ICL showing greater sensitivity to model size and tokenization.
Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control cs.LG · 2026-04-21 · conditional · none · ref 78 · internal anchor
Local linearity of LLM layers enables LQR-based closed-loop activation steering with theoretical tracking guarantees.
Region-Grounded Report Generation for 3D Medical Imaging: A Fine-Grained Dataset and Graph-Enhanced Framework cs.CV · 2026-04-20 · conditional · none · ref 52 · internal anchor
Introduces the first large-scale 3D PET/CT dataset with fine-grained RoI annotations for Vietnamese and a graph-enhanced HiRRA framework that achieves SOTA report generation by modeling RoI dependencies.
SUPERGLASSES: Benchmarking Vision Language Models as Intelligent Agents for AI Smart Glasses cs.CV · 2026-02-26 · conditional · none · ref 41 · internal anchor
SUPERGLASSES is the first VQA benchmark built from actual smart glasses data, and SUPERLENS is an agent using automatic object detection, query decoupling, and multimodal search that outperforms GPT-4o by 2.19% on it.
SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips cs.DC · 2026-01-28 · conditional · none · ref 17 · internal anchor
SuperInfer improves TTFT SLO attainment by up to 74.7% on GH200 Superchips via SLO-aware rotary scheduling (RotaSched) and full-duplex KV cache rotation (DuplexKV) over NVLink-C2C while preserving TBT and throughput.
Large Video Planner Enables Generalizable Robot Control cs.RO · 2025-12-17 · conditional · none · ref 79 · internal anchor
A video foundation model trained on human demonstrations generates zero-shot plans that convert to executable robot actions on novel scenes and tasks.
On Interaction Effects in Greybox Fuzzing cs.SE · 2025-10-22 · conditional · none · ref 44 · internal anchor
MuoFuzz improves greybox fuzzing by learning mutator sequence interactions to select effective orders, outperforming AFL++ and MOPT on coverage and unique bugs in FuzzBench and MAGMA.
PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts cs.CL · 2025-06-06 · conditional · none · ref 36 · internal anchor
PuzzleWorld benchmark reveals state-of-the-art AI models solve only 18% of complex puzzlehunt problems with 40% stepwise accuracy, matching novices but trailing enthusiasts, while fine-tuning on traces yields modest gains.
Gen-n-Val: Agentic Image Data Generation and Validation cs.CV · 2025-06-05 · conditional · none · ref 33 · internal anchor
Gen-n-Val uses LLM and VLLM agents with Layer Diffusion and TextGrad to generate and validate synthetic instance data, cutting invalid samples from 50% to 7% and improving rare-class performance on LVIS and COCO benchmarks.
Sundial: A Family of Highly Capable Time Series Foundation Models cs.LG · 2025-02-02 · conditional · none · ref 22 · internal anchor
Sundial uses TimeFlow Loss for native pre-training of Transformers on continuous time series from TimeBench, achieving SOTA point and probabilistic forecasting with millisecond inference.
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads cs.CL · 2024-10-14 · conditional · none · ref 48 · internal anchor
DuoAttention identifies retrieval heads requiring full KV cache and streaming heads using constant-length cache to reduce memory and latency in long-context LLM inference.
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation cs.CV · 2024-06-10 · conditional · none · ref 34 · internal anchor
Scaled vanilla autoregressive models based on Llama achieve 2.18 FID on ImageNet 256x256 image generation, beating popular diffusion models without visual inductive biases.
SpinQuant: LLM quantization with learned rotations cs.LG · 2024-05-26 · conditional · none · ref 20 · internal anchor
SpinQuant learns optimal rotations to enable accurate 4-bit quantization of LLM weights, activations, and KV cache, reducing the zero-shot gap to full precision to 2.9 points on LLaMA-2 7B.
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? cs.CV · 2024-03-21 · conditional · none · ref 55 · internal anchor
MathVerse is a benchmark that tests multi-modal LLMs on visual math by providing each problem in six versions with progressively less diagram and text information to measure true visual understanding.
KTO: Model Alignment as Prospect Theoretic Optimization cs.LG · 2024-02-02 · conditional · none · ref 20 · internal anchor
KTO aligns LLMs by directly maximizing prospect-theoretic utility on binary signals and matches or exceeds preference-based methods like DPO from 1B to 30B parameters.
Hallucination is Inevitable: An Innate Limitation of Large Language Models cs.CL · 2024-01-22 · conditional · none · ref 66 · internal anchor
Hallucinations are inevitable in LLMs because they cannot learn all computable functions according to learning theory.
Self-Rewarding Language Models cs.CL · 2024-01-18 · conditional · none · ref 78 · internal anchor
Iterative self-rewarding via LLM-as-Judge in DPO training on Llama 2 70B improves instruction following and self-evaluation, outperforming GPT-4 on AlpacaEval 2.0.
LRM: Large Reconstruction Model for Single Image to 3D cs.CV · 2023-11-08 · conditional · none · ref 31 · internal anchor
LRM is a large transformer that predicts a NeRF directly from a single image after training on a million-object multi-view dataset.
Detecting Pretraining Data from Large Language Models cs.CL · 2023-10-25 · conditional · none · ref 95 · internal anchor
Min-K% Prob detects pretraining data in LLMs by flagging outlier low-probability words in text, achieving 7.4% better performance than prior methods on the new WIKIMIA benchmark.
Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation cs.CL · 2023-10-10 · conditional · none · ref 20 · internal anchor
Varying decoding strategies such as temperature and sampling methods jailbreaks safety alignments in open-source LLMs, raising misalignment from 0% to over 95% at 30x lower cost than prior attacks.
Time-LLM: Time Series Forecasting by Reprogramming Large Language Models cs.LG · 2023-10-03 · conditional · none · ref 110 · internal anchor
Time-LLM reprograms frozen LLMs for time series forecasting via text prototypes and Prompt-as-Prefix, outperforming specialized models in standard, few-shot, and zero-shot settings.
Efficient Memory Management for Large Language Model Serving with PagedAttention cs.LG · 2023-09-12 · conditional · none · ref 56 · internal anchor
PagedAttention achieves near-zero waste in LLM key-value cache memory and enables 2-4x higher serving throughput than prior systems.
QLoRA: Efficient Finetuning of Quantized LLMs cs.LG · 2023-05-23 · conditional · none · ref 57 · internal anchor
QLoRA finetunes 4-bit quantized LLMs via LoRA adapters to match full-precision performance while using far less memory, enabling 65B-scale training on single GPUs and producing Guanaco models near ChatGPT level.
LIMA: Less Is More for Alignment cs.CL · 2023-05-18 · conditional · none · ref 50 · internal anchor
Fine-tuning a 65B model on 1,000 high-quality examples produces output that humans rate as good as or better than GPT-4 in 43% of cases, indicating most capabilities come from pretraining.
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning cs.CV · 2023-05-11 · conditional · none · ref 41 · internal anchor
Instruction tuning of BLIP-2 with an instruction-aware Query Transformer delivers state-of-the-art zero-shot performance on held-out vision-language datasets and strong finetuned results on downstream tasks.
VideoChat: Chat-Centric Video Understanding cs.CV · 2023-05-10 · conditional · none · ref 42 · internal anchor
VideoChat integrates video models and LLMs via a learnable interface for chat-based spatiotemporal and causal video reasoning, trained on a new video-centric instruction dataset.
WizardLM: Empowering large pre-trained language models to follow complex instructions cs.CL · 2023-04-24 · conditional · none · ref 39 · internal anchor
WizardLM uses LLM-driven iterative rewriting to generate complex instruction data and fine-tunes LLaMA to reach over 90% of ChatGPT capacity on 17 of 29 evaluated skills.
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention cs.CV · 2023-03-28 · conditional · none · ref 173 · internal anchor
LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.
Beyond the Prompt: Jailbreaking Function-Calling LLMs via Simulated Moderation Traces cs.CR · 2026-07-01 · conditional · none · ref 23 · internal anchor
SMT achieves the highest attack success rate and HarmScore on commercial function-calling LLMs from five providers by using simulated moderation traces in multi-turn trajectories, outperforming baselines with near-minimal queries.

LLaMA: Open and Efficient Foundation Language Models

hub tools

citation-role summary

citation-polarity summary

claims ledger

mega hub controls

Recognition alignment

counterfactual ablation

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer