Decoupled weight decay regularization
Mixed citation behavior. The most common citation role is background (60%).
Years: 2026 (10)
Verdicts: UNVERDICTED (10)
Representative citing papers
-
Inverse Design for Conditional Distribution Matching
Defines Conditional Distribution Matching (CDM) as finding inputs whose induced conditional distributions match a target distribution and proposes the MLGD-F inference-time algorithm using pretrained diffusion models to solve it without retraining.
-
LaTER: Efficient Test-Time Reasoning via Latent Exploration and Explicit Verification
LaTER cuts LLM token usage by 16-33% on reasoning benchmarks, first exploring in latent space and then switching to explicit CoT verification; its training-free version shows gains such as 70% to 73.3% on AIME 2025.
-
Stateful Agent Backdoor
A stateful backdoor for LLM agents, modeled as a Mealy machine with a decomposition framework, enables incremental malicious actions across sessions and achieves an 80-95% attack success rate on four models.
-
How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models
A fitted iso-depth scaling law measures that one recurrence in looped transformers is worth r^0.46 unique blocks in validation loss.
-
SpeakerLLM: A Speaker-Specialized Audio-LLM for Speaker Understanding and Verification Reasoning
SpeakerLLM unifies speaker profiling, recording-condition understanding, and structured verification reasoning in an audio-LLM via a hierarchical tokenizer and decision traces.
-
MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling
MILM fine-tunes LLMs on XML-encoded multimodal irregular time series via a two-stage process that exploits informative sampling patterns to achieve top performance on EHR classification datasets.
-
MetaColloc: Optimization-Free PDE Solving via Meta-Learned Basis Functions
MetaColloc meta-learns a universal set of neural basis functions offline so that new PDEs can be solved at test time with a single linear solve instead of per-equation neural-network optimization.
-
DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification
DeltaRubric decomposes multimodal preference evaluation into self-generated planning and verification steps within a single model, producing large accuracy improvements on VL-RewardBench via multi-role reinforcement learning.
-
Information-Preserving Domain Transfer with Unlabeled Data in Misspecified Simulation-Based Inference
SPIN performs bidirectional domain transfer in SBI to retain parameter mutual information from unlabeled real observations, improving real-world posterior inference under increasing misspecification.
-
Signal Reshaping for GRPO in Weak-Feedback Agentic Code Repair
Reshaping outcome rewards, process signals, and rollout comparability in GRPO raises strict compile-and-semantic accuracy in agentic code repair from 0.385 to 0.535 under weak feedback.