Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards

Qingming Liu, Zhen Liu, Dinghuai Zhang, Kui Jia · 2025 · arXiv 2506.15684

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background (1)

citation-polarity summary: background (1)

representative citing papers

Sculpting NeRF Geometry: Human-Preference Fine-Tuning of a 3D-Aware Face GAN

cs.CV · 2026-06-25 · unverdicted · novelty 7.0

Fine-tunes EG3D using a human-preference reward on NeRF density to improve face geometry, achieving a 74.4% user preference rate in pairwise tests while FID rises from 4.09 to 6.66.

A Cross-Model VLM-Judge Protocol for Single-Image 3D Mesh Quality (and Why Cheap Proxies Fall Short)

cs.LG · 2026-06-16 · unverdicted · novelty 6.0

A reproducible VLM-judge protocol with position-bias correction is validated as superior to CLIP similarity and geometry-validity proxies for assessing single-image 3D mesh quality.

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

cs.LG · 2026-04-15 · unverdicted · novelty 5.0

The paper introduces the Proxy Compression Hypothesis as a unifying framework explaining reward hacking in RLHF as an emergent result of compressing high-dimensional human objectives into proxy reward signals under optimization pressure.

Proxy Reward Internalization and Mechanistic Exploitation: A Learned Precursor to Reward Hacking and Its Generalization

cs.AI · 2026-06-08 · unverdicted · novelty 4.0

Proxy RL produces a staged proxy-internalization capability that emerges before and predicts reward hacking in coding environments.

citing papers explorer

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

cs.LG · 2026-04-15 · unverdicted · none · ref 176

The paper introduces the Proxy Compression Hypothesis as a unifying framework explaining reward hacking in RLHF as an emergent result of compressing high-dimensional human objectives into proxy reward signals under optimization pressure.
