Goggles is a gradient-editing module trained once per base model and frame that, when applied frozen during finetuning, causes LLMs to treat unannotated documents with a specified epistemic stance (e.g., as fiction) at 91% accuracy while preserving benchmark performance.
Learning by distilling context
27 Pith papers cite this work.
Citation roles: background (3)
Citation polarities: background (3)
Representative citing papers
Doc-to-Atom decomposes documents into composable micro-LoRA adapters selected by a query router for efficient long-context QA.
A sleep mechanism with N offline recurrent passes consolidates context into fast weights, improving performance on reasoning tasks where standard transformers fail.
Self-distillation token rewards measure input-response-feedback pointwise mutual information, and CREDIT extracts the input-specific component with contrastive baselines to improve LLM reasoning performance.
Distillation signals align better with ideal updates on incorrect student rollouts than correct ones, with optimal teacher context depending on student capacity and task.
CoDistill-GRPO lets small and large models mutually improve via co-distillation in GRPO, raising small-model math accuracy by over 11 points while cutting large-model training time by about 18%.
NPO uses a policy's own near-future checkpoint as auxiliary trajectories to maximize effective learning signal S = Q/V, improving performance from 57.88 to 63.15 on Qwen3-VL-8B-Instruct with GRPO while accelerating convergence.
LLMs trained on simple specification gaming generalize zero-shot to reward tampering, including rewriting their own reward function.
D2D distills distributional shifts between a suspected model and its base into a cartridge adapter to amplify and detect stealth biases in LLMs across multiple types.
PRISMR replaces in-context list processing with a hypernetwork-generated instance-specific LoRA adapter to reduce parse collapse and improve multimodal listwise ranking performance.
HMARS introduces a hierarchical multi-agent memory system that outperforms standard retrieval and other baselines on long-document and multi-turn reasoning tasks through improved evidence coverage.
ThinkSwitch uses iterative self-distillation with QLoRA and spherical weight interpolation to raise both instruct and thinking checkpoint accuracy on small AIME and PubMedQA sets using only 15 human prompts per domain.
DASD improves math reasoning in LLMs by adaptively directing self-distillation based on per-token entropy to balance exploration and step accuracy, outperforming prior self-distillation and RLVR baselines on six benchmarks.
Attention-state memory externalizes long prefixes into a lightweight lookup table of precomputed attention states, yielding higher accuracy than standard in-context learning at fixed memory budgets and lower latency than full attention.
SSOPD converts intra-group correct-wrong contrast into process supervision by distilling a teacher distribution from the shortest correct completion into prefixes of the longest wrong completion, improving GRPO on AIME and HMMT benchmarks.
VSPO samples rollouts at varying steering intensities to improve behavioral control in LLMs while preserving task accuracy.
Reasoning language models extract answers from sparse, order-shuffled chain-of-thought traces with little accuracy loss.
TSUBASA improves long-horizon personalization in LLMs via dynamic memory evolution for writing and context-distillation self-learning for reading, outperforming Mem0 and Memory-R1 on Qwen-3 benchmarks while reducing token use.
Training LLMs on data that enforces priority levels for instructions makes models robust to prompt injection attacks, including unseen ones, with little loss on standard tasks.
A 540B-parameter LLM improves reasoning performance on GSM8K, DROP, OpenBookQA, and ANLI-A3 by fine-tuning on self-generated high-confidence CoT solutions from unlabeled data.
A single LLM rewrite of skill descriptions using false positive and negative cases matches manual optimization performance in production, with most other pipeline components adding little value.
TASM proposes a task-aware structured memory framework using task-vector compression, bipartite token merging, and a Core Memory plus Latent Bank hierarchy to enable efficient dynamic multi-modal in-context learning.
HyperLoRA amortizes federated LoRA adaptation via hypernetwork-generated initializations and product-space aggregation to fix structural bias and initialization lag.
Existing methods for turning LLM interaction experience into parametric skills collapse over multiple iterations; principle-level experience, step-wise injection, and off-policy teacher distillation yield more stable continual learning.
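Several of the summaries above (the distillation-signal analysis, DASD's entropy-directed self-distillation, SSOPD's teacher-distribution distillation) build on token-level comparison of a teacher's and a student's next-token distributions. The following is a minimal, generic sketch, not any listed paper's method: it assumes teacher and student logits over the same vocabulary are already available for each token position, and computes the student's per-token entropy alongside KL(teacher || student). The helper names (`softmax`, `entropy`, `kl_divergence`, `token_distillation_stats`) are hypothetical and chosen for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token position."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats, with p as the teacher and q as the student.

    Assumes q has no exact zeros where p > 0 (true for softmax outputs
    unless extreme logit gaps cause underflow).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def token_distillation_stats(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
) -> List[Tuple[float, float]]:
    """Return (student entropy, KL(teacher || student)) for each position.

    Hypothetical helper: both inputs are lists of logit vectors, one per
    token position, over the same vocabulary.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must cover the same positions")
    stats = []
    for t_row, s_row in zip(teacher_logits, student_logits):
        p_teacher = softmax(t_row)
        p_student = softmax(s_row)
        stats.append((entropy(p_student), kl_divergence(p_teacher, p_student)))
    return stats


# Usage: the first position agrees exactly, the second does not.
teacher = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
student = [[2.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
for ent, kl in token_distillation_stats(teacher, student):
    print(f"student entropy={ent:.4f} nats, KL(teacher||student)={kl:.4f} nats")
```

Per-token statistics like these are the raw ingredients such methods weight or filter; how each paper turns them into a training signal is specific to that paper and is not reproduced here.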
Citing papers explorer
Task-Aware Structured Memory for Dynamic Multi-modal In-Context Learning