Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.
hub Mixed citations
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Mixed citation behavior. Most common role is background (33%).
hub tools
citation-role summary
citation-polarity summary
representative citing papers
CheckMIABench converts LLMs with intermediate checkpoints into clean MIA testbeds by using pre- and post-checkpoint training data from the same distribution and evaluates published attacks on Pythia and OLMo models while releasing an open-source library.
LOFT unifies orthogonal PEFT by treating adaptation as low-rank subspace rotation and adds task-aware support selection that improves efficiency under fixed budgets.
EvoPref applies NSGA-II evolutionary optimization with archive-based diversity to populations of LoRA adapters, yielding 18% higher preference coverage and 47% lower collapse than gradient descent baselines while matching alignment quality.
Prologue adds a small set of learnable tokens trained exclusively with AR cross-entropy loss to decouple generation from reconstruction in autoregressive visual models, yielding lower gFID on ImageNet 256x256.
PSR models that estimate token-specific steering coefficients from activations outperform standard activation steering and compare favorably to prompting on steering benchmarks.
CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.
Q-Align trains LMMs on discrete text-defined levels for visual scoring, achieving SOTA on IQA, IAA, and VQA while unifying the tasks in OneAlign.
OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.
Prompt tuning matches full model tuning performance on large language models while tuning only a small fraction of parameters and improves robustness to domain shifts.
ProtoKV maintains a fixed-capacity summary state for far history in streaming video, improving accuracy by up to 12.5 points in long-delay query scenarios compared to token-retention methods.
Proposes CBCM for diffusion-based spurious attribute mining and DCD for cross-projection debiasing, claiming SOTA worst-group accuracy on four benchmarks while tuning at most 0.22% of parameters.
LLMs recover dominant binomial orders from corpora but align less closely with exact preference distributions, with preference strength partially encoded in middle-to-late layers and manipulable via steering.
Soft-prompt tuning with 10 vectors improves format compliance on LLM benchmarks and provides a low-cost proxy for comparing base models.
Empirical benchmarks show distribution similarity between adaptation and pretraining data increases practical privacy leakage in DP-adapted LLMs at fixed theoretical guarantees, with LoRA providing strongest protection for OOD cases.
A decoder is trained on 1010 style features to map style representations back to prompts, outperforming direct LLM prompting on style recovery, imitation, and steering tasks.
CRAFT is a Pareto-front prompt optimizer that allocates scarce LLM validation calls to candidates near the current front using accuracy- and cost-oriented generators plus NSGA-II retention.
Transformers are limited to a linearly growing number of accessible output sequences with prompt length, with exponential decay in accessible proportion beyond a critical point, even under unbounded context.
Prefill-only adaptation of LLMs yields 1.9x higher throughput for 512 adapters on Llama 3.1 70B with near-parity performance on RL tasks and recoverable loss on SFT.
Summing outputs from separately trained QLoRA PEFT modules provides strong performance for attribute-controlled text generation, often matching or exceeding single-task modules even on single-attribute tests.
QD-LLM applies neuroevolution to prompt embeddings within a quality-diversity framework, producing 46% higher coverage and 41% higher QD-score than QDAIF on HumanEval, MBPP, and creative writing benchmarks.
Memory Inception is a training-free method that injects latent KV banks at chosen layers to steer LLMs, achieving superior control-drift balance and up to 118x storage reduction on personality and structured-reasoning tasks.
Autoregressive generation modeled as a Markov process over tokens allows new knowledge to be incorporated by extending the state space with a token-to-dictionary mapping whose sample complexity is linear in the number of mapped existing tokens, realized via embedding tuning that induces zero forget.
ClusterRAG applies density-based clustering to user profiles for collaborative retrieval in personalized RAG and reports best performance on LaMP tasks by combining target and similar-user profiles.
citing papers explorer
-
ClusterRAG: Cluster-Based Collaborative Filtering for Personalized Retrieval-Augmented Generation
ClusterRAG applies density-based clustering to user profiles for collaborative retrieval in personalized RAG and reports best performance on LaMP tasks by combining target and similar-user profiles.