Adaptive scheduling of interventions in discrete diffusion language models, timed to attribute-specific commitment schedules discovered with sparse autoencoders, delivers precise multi-attribute steering at up to 93% strength while preserving generation quality.
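As a rough sketch of the timing idea (the function names, window representation, and fixed strength below are illustrative assumptions, not the paper's method): a steering vector for each attribute is injected only while the current denoising step falls inside that attribute's commitment window.

```python
import numpy as np

def steer_hidden(hidden, step, schedules, steering_vecs, strength=0.5):
    """Add an attribute's steering vector only while the denoising step
    lies inside that attribute's commitment window (hypothetically
    discovered from sparse-autoencoder feature trajectories)."""
    out = hidden.copy()
    for attr, (start, end) in schedules.items():
        if start <= step <= end:
            out += strength * steering_vecs[attr]
    return out

dim = 4
steering_vecs = {"sentiment": np.ones(dim), "formality": 2.0 * np.ones(dim)}
schedules = {"sentiment": (2, 5), "formality": (6, 9)}   # step windows

hidden = np.zeros(dim)
before_window = steer_hidden(hidden, 1, schedules, steering_vecs)  # no window active
inside_window = steer_hidden(hidden, 3, schedules, steering_vecs)  # sentiment window
```

With non-overlapping windows, each attribute is steered in isolation; overlapping windows would simply sum their vectors, which is one plausible reading of "multi-attribute" steering.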
Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9.
10 Pith papers cite this work.
representative citing papers
LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs such as LLaMA3 8B across standard benchmarks and surpasses GPT-4o on a reversal poem-completion task.
Marginal-conditioned bridges enable training-free sampling from Flow Language Models by drawing clean one-hot endpoints from factorized posteriors and using Ornstein-Uhlenbeck bridges, preserving token marginals and reducing denoising error versus conditional-mean bridges.
Probabilistic circuits have an output bottleneck (convex combinations of probabilities) and a context bottleneck (conditioning limited to fixed vtree-aligned partitions), making them less expressive than transformers for language data with heterogeneous dependencies, though decomposable PCs are strictly more capable.
SOMA estimates a local response manifold from early turns and adapts a small surrogate model via divergence-maximizing prompts and localized LoRA fine-tuning for efficient multi-turn serving.
AdaPreLoRA pairs the Adafactor diagonal Kronecker preconditioner on the full weight matrix with a closed-form factor-space solve that selects the update minimizing an H_t-weighted imbalance, yielding competitive results on GPT-2, Mistral-7B, Qwen2-7B and diffusion personalization tasks.
A DBM-based architecture learns consumer beliefs to enable consistent prediction and counterfactual inference for marketing interventions, outperforming baselines on heterogeneous treatment effects in simulation.
Random label bridge training aligns LLM parameters with vision tasks, and partial training of certain layers often suffices due to their foundational properties.
Advanced language representations shape LLMs' schemas to improve knowledge activation and problem-solving.
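The marginal-conditioned bridge idea in the Flow Language Models summary above can be caricatured as follows. This sketch uses a generic endpoint-pinned bridge with Euler discretization; the drift form, `theta`-free parameterization, and `sigma` are assumptions for illustration, not the paper's Ornstein-Uhlenbeck construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_endpoint(marginals):
    # Draw a clean one-hot endpoint from a factorized posterior
    # over a toy 3-token vocabulary.
    k = rng.choice(len(marginals), p=marginals)
    return np.eye(len(marginals))[k]

def bridge_step(x, endpoint, t, dt, sigma=0.1):
    # One Euler step of a bridge SDE pinned to the sampled endpoint;
    # with sigma=0 the path deterministically reaches the endpoint at t=1.
    drift = (endpoint - x) / max(1.0 - t, dt)
    noise = sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x + drift * dt + noise

marginals = np.array([0.7, 0.2, 0.1])
e = sample_endpoint(marginals)
x = np.full(3, 1.0 / 3.0)          # start from the uniform point
dt = 0.01
for i in range(100):
    x = bridge_step(x, e, i * dt, dt, sigma=0.0)
```

Because the endpoint is a one-hot draw from the factorized posterior, averaging many such paths would recover the token marginals, which is the property the summary highlights.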
citing papers explorer
- Large Language Diffusion Models
- Superposition Yields Robust Neural Scaling