4 Pith papers cite this work.
citation-role summary: background (1)
citation-polarity summary: background (1)
citing papers explorer
- Teaching Language Models How to Code Like Learners: Conversational Serialization for Student Simulation
  Training open-weight LLMs on conversational serializations of authentic student programming submissions produces artificial learners that better replicate real debugging behavior than code-only baselines or prompted large models.
- Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
  Diffusion models improve generation quality via inference-time search over noise candidates guided by verifiers and algorithms, yielding gains beyond denoising step scaling on class- and text-conditioned benchmarks.
- LambdaPO: A Lambda Style Policy Optimization for Reasoning Language Models
  LambdaPO introduces pairwise preference-based advantage estimation and a semantic density reward to extract more optimization signal from trajectory groups than GRPO's monolithic baseline.
- Diffusion-APO: Trajectory-Aware Direct Preference Alignment for Video Diffusion Transformers
  Diffusion-APO synchronizes training noise with inference trajectories in video diffusion models to improve preference alignment and visual quality.