Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.
hub Mixed citations
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Mixed citation behavior. Most common role is background (33%).
hub tools
citation-role summary
citation-polarity summary
representative citing papers
CheckMIABench converts LLMs with intermediate checkpoints into clean MIA testbeds by using pre- and post-checkpoint training data from the same distribution and evaluates published attacks on Pythia and OLMo models while releasing an open-source library.
LOFT unifies orthogonal PEFT by treating adaptation as low-rank subspace rotation and adds task-aware support selection that improves efficiency under fixed budgets.
EvoPref applies NSGA-II evolutionary optimization with archive-based diversity to populations of LoRA adapters, yielding 18% higher preference coverage and 47% lower collapse than gradient descent baselines while matching alignment quality.
Prologue adds a small set of learnable tokens trained exclusively with AR cross-entropy loss to decouple generation from reconstruction in autoregressive visual models, yielding lower gFID on ImageNet 256x256.
PSR models that estimate token-specific steering coefficients from activations outperform standard activation steering and compare favorably to prompting on steering benchmarks.
CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.
Q-Align trains LMMs on discrete text-defined levels for visual scoring, achieving SOTA on IQA, IAA, and VQA while unifying the tasks in OneAlign.
OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.
Prompt tuning matches full model tuning performance on large language models while tuning only a small fraction of parameters and improves robustness to domain shifts.
PAW compiles natural language fuzzy function specifications into parameter-efficient adapters for a small frozen interpreter, matching 32B model performance with 50x less memory.
ProtoKV maintains a fixed-capacity summary state for far history in streaming video, improving accuracy by up to 12.5 points in long-delay query scenarios compared to token-retention methods.
Proposes CBCM for diffusion-based spurious attribute mining and DCD for cross-projection debiasing, claiming SOTA worst-group accuracy on four benchmarks while tuning at most 0.22% of parameters.
LLMs recover dominant binomial orders from corpora but align less closely with exact preference distributions, with preference strength partially encoded in middle-to-late layers and manipulable via steering.
Soft-prompt tuning with 10 vectors improves format compliance on LLM benchmarks and provides a low-cost proxy for comparing base models.
Empirical benchmarks show distribution similarity between adaptation and pretraining data increases practical privacy leakage in DP-adapted LLMs at fixed theoretical guarantees, with LoRA providing strongest protection for OOD cases.
A decoder is trained on 1010 style features to map style representations back to prompts, outperforming direct LLM prompting on style recovery, imitation, and steering tasks.
CRAFT is a Pareto-front prompt optimizer that allocates scarce LLM validation calls to candidates near the current front using accuracy- and cost-oriented generators plus NSGA-II retention.
Transformers are limited to a linearly growing number of accessible output sequences with prompt length, with exponential decay in accessible proportion beyond a critical point, even under unbounded context.
Prefill-only adaptation of LLMs yields 1.9x higher throughput for 512 adapters on Llama 3.1 70B with near-parity performance on RL tasks and recoverable loss on SFT.
Summing outputs from separately trained QLoRA PEFT modules provides strong performance for attribute-controlled text generation, often matching or exceeding single-task modules even on single-attribute tests.
QD-LLM applies neuroevolution to prompt embeddings within a quality-diversity framework, producing 46% higher coverage and 41% higher QD-score than QDAIF on HumanEval, MBPP, and creative writing benchmarks.
Memory Inception is a training-free method that injects latent KV banks at chosen layers to steer LLMs, achieving superior control-drift balance and up to 118x storage reduction on personality and structured-reasoning tasks.
Autoregressive generation modeled as a Markov process over tokens allows new knowledge to be incorporated by extending the state space with a token-to-dictionary mapping whose sample complexity is linear in the number of mapped existing tokens, realized via embedding tuning that induces zero forget.
citing papers explorer
No citing papers match the current filters.