
arxiv: 2601.21350 · v2 · pith:O5RC7P44new · submitted 2026-01-29 · 💻 cs.LG

Factored Causal Representation Learning for Robust Reward Modeling in RLHF

Pith reviewed 2026-05-21 14:13 UTC · model grok-4.3

classification 💻 cs.LG
keywords causal representation learning · reward modeling · RLHF · reward hacking · adversarial training · factored embeddings · spurious correlations

The pith

Decomposing model embeddings into causal and non-causal factors creates robust reward models for RLHF.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that reward models for RLHF can avoid reward hacking by learning to base predictions only on causal factors in the input embedding. It separates the embedding into parts that suffice for predicting human rewards and parts that capture irrelevant attributes like response length or sycophantic bias. An adversarial training step with gradient reversal is used to stop the non-causal part from leaking reward information. If this works, it should lead to reward models that generalize better and produce higher-quality aligned language models on tasks like math and dialogue.

Core claim

The central claim is that a factored representation learning approach, which extracts causal factors sufficient for reward prediction from contextual embeddings while isolating non-causal factors, combined with an adversarial head and gradient reversal, results in reward models that are robust to spurious features and improve downstream RLHF performance over baselines.

What carries the argument

The factored causal representation that decomposes contextual embeddings into causal factors for the reward head and non-causal factors blocked by adversarial gradient reversal.
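To make the gradient reversal step concrete, here is a minimal sketch on a scalar toy model. It is not the paper's implementation: the names `grl_step`, `w`, `a`, `lam`, and `lr`, the linear encoder and head, and the squared-error loss are all assumptions chosen for illustration. The only property carried over from the source is that the adversarial head is trained to predict reward from the non-causal factors while the gradient reaching the encoder is reversed.

```python
def grl_step(w, a, h, r, lam=1.0, lr=0.1):
    """One SGD step on a scalar toy model with a gradient reversal layer.

    Hypothetical setup (not from the paper):
      encoder:          z_nc  = w * h
      adversarial head: r_hat = a * z_nc
      loss:             0.5 * (r_hat - r) ** 2
    """
    z_nc = w * h                      # non-causal latent
    r_hat = a * z_nc                  # the reversal layer is the identity going forward
    err = r_hat - r
    loss = 0.5 * err ** 2

    grad_a = err * z_nc               # head: ordinary gradient, it learns to predict r
    grad_z = err * a                  # gradient arriving at the reversal layer
    grad_z_reversed = -lam * grad_z   # reversal: flip the sign, scale by lam
    grad_w = grad_z_reversed * h      # encoder receives the reversed gradient

    new_w = w - lr * grad_w           # encoder moves to increase the head's loss
    new_a = a - lr * grad_a           # head moves to decrease its own loss
    return new_w, new_a, loss, grad_w
```

With `w=1.0, a=0.5, h=2.0, r=3.0`, the head's update lowers its loss while the encoder's update pushes `z_nc` in the direction that makes reward harder to predict from it, which is the intended effect of the reversal.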

If this is right

  • Reward models will be less prone to exploiting biases such as favoring longer responses or sycophantic content.
  • Downstream RLHF will yield policies with better performance on mathematical and dialogue tasks.
  • Analyses targeting length bias and sycophancy bias should confirm that these specific hacking behaviors are mitigated.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This decomposition might allow for better interpretability of what aspects of responses humans actually value.
  • Similar factoring could be applied to other preference-based learning methods to reduce shortcut learning.
  • Testing on larger models or more diverse feedback sources could reveal if the causal factors are consistent across domains.

Load-bearing premise

That the contextual embedding from the model can be decomposed into causal factors that are sufficient and necessary for accurate reward prediction and non-causal factors that can be isolated without losing predictive power.

What would settle it

Observing that the reward model still performs better when non-causal factors are included or that adversarial training fails to reduce correlation between non-causal factors and rewards would challenge the claim.
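One way to run the second check is to correlate reward labels with the predictions of a probe trained on the non-causal factors alone. The sketch below computes a plain Pearson correlation; the function name `pearson` and the idea of feeding it probe predictions are assumptions for illustration, and the paper may measure leakage differently.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant (a convention chosen here,
    since the coefficient is undefined in that case).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)
```

A value near zero for probe predictions against reward labels would be consistent with the non-causal factors carrying little linear reward signal; a value that stays high after adversarial training would count against the claim. Pearson correlation only captures linear dependence, so a low value does not rule out nonlinear leakage.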

Figures

Figures reproduced from arXiv: 2601.21350 by Biwei Huang, Fan Feng, Lei Xu, Lin Qu, Lin Yang, Shikui Tu, Wanxi Deng, Yupei Yang.

Figure 1: Causal graph for standard reward modeling. The prompt–response pair (x, y) encode both causal (z_c) and non-causal (z_nc) factors, which in turn affect the predicted reward r. While the path z_c → r is desired, the spurious path z_nc → r leads to reward hacking.
… thereby leading to reward hacking. For example, suppose z_nc captures response length on mathematical tasks, then changing the length alone may …
Figure 2: Overview of CausalRM. The backbone embedding h is factorized into causal latents z_c and non-causal latents z_nc via a variational encoder. Reward prediction is restricted to depend only on z_c, while an adversarial head trained through a gradient reversal layer (GRL) discourages z_nc from encoding reward-predictive information. A reconstruction decoder prevents degenerate factorization by reconstructing …
Figure 4: Average win rate against the SFT model on the ID test sets of open-ended dialogue benchmarks during RLHF.
[Plot residue. Axes: normalized answer length, normalized reward. Legend: Standard RM (σ_len = 0.12), GoalRM (σ_len = 0.22), InfoRM (σ_len = 0.14), CausalRM (Ours, σ_len = 0.03).]
Figure 6: Length sensitivity under ablations on mathematical reasoning. Length is normalized to [0, 1] and rewards are averaged within length quantile buckets on chosen responses from the ID test set.
… bottlenecked latent: even when capacity is constrained, a single latent can still entangle spurious cues with reward-relevant features, whereas the factorized design makes it easier to route spurious variation away fro…
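The bucketing described in the Figure 6 caption can be sketched as follows. The helper names `bucket_means` and `length_sensitivity`, the equal-count quantile split, and the bucket count are assumptions. The summary statistic (population standard deviation of the bucket-mean rewards) is also an assumption offered as one plausible reading of a length-sensitivity score; the source does not define σ_len.

```python
import statistics


def bucket_means(lengths, rewards, n_buckets=4):
    """Return (mean normalized length, mean reward) per length-quantile bucket."""
    n = len(lengths)
    if n != len(rewards) or n < n_buckets:
        raise ValueError("need equal-length inputs with at least n_buckets items")
    lo, hi = min(lengths), max(lengths)
    span = (hi - lo) or 1.0           # avoid dividing by zero if all lengths match
    # Normalize length to [0, 1], then sort so buckets are length quantiles.
    pairs = sorted(((l - lo) / span, r) for l, r in zip(lengths, rewards))
    out = []
    for b in range(n_buckets):
        chunk = pairs[b * n // n_buckets:(b + 1) * n // n_buckets]
        mean_len = sum(l for l, _ in chunk) / len(chunk)
        mean_rew = sum(r for _, r in chunk) / len(chunk)
        out.append((mean_len, mean_rew))
    return out


def length_sensitivity(lengths, rewards, n_buckets=4):
    """Spread of bucket-mean rewards; 0.0 means reward is flat across length."""
    means = [rew for _, rew in bucket_means(lengths, rewards, n_buckets)]
    return statistics.pstdev(means)
```

Under this reading, a reward model whose scores do not track length would give a value near zero, and one that rewards longer answers would give a larger value.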
Figure 7: Reward hacking behaviors on an ID MATH prompt. Standard RM outputs an incorrect boxed answer (-22), InfoRM exhibits format hacking by outputting code without a final boxed answer, and GoalRM answers correctly but continues with an unrelated prompt (off-topic continuation). In contrast, CausalRM follows the instruction and produces the correct boxed answer (-10).
Figure 8: Reward hacking behaviors on a GSM-Hard prompt. Standard RM computes the correct numerical result but outputs an incorrect boxed answer due to arithmetic error. InfoRM correctly calculates the balance but hacks the format by overriding the true answer with 0, falsely claiming no overpayment. GoalRM produces the right magnitude but misses the negative sign and appends an unrelated continuity proof (off-topic…
Figure 9: Reward hacking behaviors on an Anthropic-Helpful prompt. Standard RM and GoalRM exhibit verbosity hacking by generating excessively long, repetitive ingredient lists (e.g., duplicating the same vegetables or repeatedly listing “bay leaf”), which inflates superficial “helpfulness” without adding useful content. InfoRM produces a reasonable recipe but drifts off-topic by continuing into an unrelated dialogue…
Figure 10: Reward hacking behaviors on a SHP prompt. Standard RM exhibits misleading explanations by providing a factually incorrect rationale (claiming that “Tupperware is not a dish, so it does not get wet”). GoalRM and InfoRM avoid the explicit error but give shallow, incomplete explanations that do not account for how plastic and container geometry affect drying. In contrast, CausalRM produces a coherent, physic…
Original abstract

A reliable reward model is essential for aligning large language models with human preferences through reinforcement learning from human feedback. However, standard reward models are susceptible to spurious features that are not causally related to human labels. This can lead to reward hacking, where high predicted reward does not translate into better behavior. In this work, we address this problem from a causal perspective by proposing a factored representation learning framework that decomposes the model's contextual embedding into (1) causal factors that are sufficient for reward prediction and (2) non-causal factors that capture reward-irrelevant attributes such as length or sycophantic bias. The reward head is then constrained to depend only on the causal component. In addition, we introduce an adversarial head trained to predict reward from the non-causal factors, while applying gradient reversal to discourage them from encoding reward-relevant information. Experiments on both mathematical and dialogue tasks demonstrate that our method learns more robust reward models and consistently improves downstream RLHF performance over state-of-the-art baselines. Analyses on length and sycophantic bias further validate the effectiveness of our method in mitigating reward hacking behaviors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a factored causal representation learning framework for reward modeling in RLHF. It decomposes contextual embeddings into causal factors (used by the reward head) and non-causal factors (capturing attributes like length or sycophancy). An adversarial head predicts reward from the non-causal factors, with gradient reversal applied to discourage them from encoding reward-relevant information. Experiments on mathematical and dialogue tasks report more robust reward models and improved downstream RLHF performance over baselines, supported by analyses of length and sycophancy bias.

Significance. If the causal/non-causal separation holds, the method offers a principled way to reduce reward hacking from spurious correlations, improving reliability of RLHF for LLM alignment. The dual-task experimental validation and bias-specific analyses indicate practical relevance for robust reward modeling.

major comments (2)
  1. [§3.2] §3.2 (Adversarial component): The gradient reversal mechanism is central to the robustness claim, yet no post-training verification is provided, such as adversarial head accuracy, mutual information estimates between non-causal factors and reward labels, or an ablation removing the reversal term to measure increased reward hacking. This leaves open the possibility of residual reward signal leakage.
  2. [§4] §4 (Experiments): While improvements over baselines are reported for both math and dialogue tasks, the results lack an explicit ablation isolating the contribution of the factored decomposition versus standard adversarial training, which is load-bearing for attributing gains to the causal factoring approach.
minor comments (2)
  1. [Abstract] The abstract refers to 'state-of-the-art baselines' without naming them; this should be clarified with specific citations or a table reference.
  2. [§3] Notation for the causal factor z_c and non-causal factor z_nc could be introduced with explicit equations in the method section for clarity.
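Major comment 1 asks for the adversarial head's accuracy after training. A minimal sketch of that measurement follows, assuming reward labels come as chosen/rejected preference pairs and that the head emits one scalar score per response. Both are assumptions, as is the function name `pairwise_accuracy`.

```python
def pairwise_accuracy(chosen_scores, rejected_scores):
    """Fraction of pairs where the chosen response outscores the rejected one.

    Ties count as half a win, so a head with no usable signal scores about 0.5.
    """
    if len(chosen_scores) != len(rejected_scores) or not chosen_scores:
        raise ValueError("need two non-empty sequences of equal length")
    wins = 0.0
    for chosen, rejected in zip(chosen_scores, rejected_scores):
        if chosen > rejected:
            wins += 1.0
        elif chosen == rejected:
            wins += 0.5
    return wins / len(chosen_scores)
```

An accuracy near 0.5 for a head reading only the non-causal factors would be consistent with little residual leakage; an accuracy well above 0.5 would suggest reward signal remains in those factors. Re-running the same measurement with the reversal term removed would give the ablation the referee requests.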

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which highlight important aspects for strengthening the robustness claims in our work. We address each major comment below and indicate the revisions we will make to the manuscript.

  1. Referee: [§3.2] §3.2 (Adversarial component): The gradient reversal mechanism is central to the robustness claim, yet no post-training verification is provided, such as adversarial head accuracy, mutual information estimates between non-causal factors and reward labels, or an ablation removing the reversal term to measure increased reward hacking. This leaves open the possibility of residual reward signal leakage.

    Authors: We agree that explicit post-training verification of the adversarial component would provide stronger evidence for the effectiveness of gradient reversal in preventing reward signal leakage. In the revised manuscript, we will add analyses including the accuracy of the adversarial head when predicting reward labels from the non-causal factors, as well as an ablation that removes the reversal term and measures the resulting increase in reward hacking behaviors. These additions will directly address the concern regarding residual leakage. revision: yes

  2. Referee: [§4] §4 (Experiments): While improvements over baselines are reported for both math and dialogue tasks, the results lack an explicit ablation isolating the contribution of the factored decomposition versus standard adversarial training, which is load-bearing for attributing gains to the causal factoring approach.

    Authors: We acknowledge that an explicit ablation separating the contribution of the factored causal decomposition from standard adversarial training is necessary to rigorously attribute the observed gains. In the revised experiments section, we will include this comparison, evaluating both the full proposed method and a standard adversarial training baseline (without the causal/non-causal factoring) on the mathematical and dialogue tasks. This will clarify the specific role of the factored representation in improving robustness. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected


The paper defines a factored representation learning method that decomposes contextual embeddings into causal and non-causal factors, constrains the reward head to the causal part, and uses an adversarial head with gradient reversal on the non-causal part. This construction is presented as a novel application of existing causal representation learning and adversarial training techniques rather than a self-referential definition or a fitted parameter renamed as a prediction. No equations or steps in the provided abstract reduce the claimed robustness or RLHF improvement to the inputs by construction, and the experimental claims on mathematical and dialogue tasks are presented as independent validation. The derivation chain remains self-contained against external benchmarks with no load-bearing self-citations or uniqueness theorems invoked from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Insufficient detail in the abstract to identify specific free parameters, axioms, or invented entities; the method appears to build on standard causal and adversarial ML techniques.

pith-pipeline@v0.9.0 · 5738 in / 1162 out tokens · 73063 ms · 2026-05-21T14:13:20.943874+00:00 · methodology



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

    cs.LG 2026-04 unverdicted novelty 5.0

    The paper introduces the Proxy Compression Hypothesis as a unifying framework explaining reward hacking in RLHF as an emergent result of compressing high-dimensional human objectives into proxy reward signals under op...
