
arxiv: 2605.15726 · v1 · pith:6LX4WER7new · submitted 2026-05-15 · 💻 cs.AI · cs.CL

Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR

Pith reviewed 2026-05-20 19:14 UTC · model grok-4.3

classification 💻 cs.AI cs.CL
keywords reinforcement learning · verifiable rewards · exploration · large language models · math reasoning · strategy nudging · RLVR · GRPO

The pith

Conditioning rollouts on lightweight strategy contexts enables efficient diverse exploration in RLVR without oracle supervision or brute-force scaling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to overcome the exploration bottleneck in reinforcement learning with verifiable rewards for language models, where policies improve only on sampled trajectories. It proposes NudgeRL, which uses Strategy Nudging to condition each rollout on lightweight strategy-level contexts that induce diverse reasoning paths. A unified objective then decomposes rewards into inter- and intra-context parts while adding a distillation term to transfer useful behaviors back to the base policy. This structured approach outperforms standard GRPO even when the baseline uses up to eight times more rollouts and beats oracle-guided methods on average across five math benchmarks. The core idea is that context-driven diversity offers a scalable alternative to both massive rollout increases and privileged-information techniques.

Core claim

NudgeRL introduces Strategy Nudging, which conditions each rollout on lightweight, strategy-level contexts to induce diverse reasoning trajectories without relying on expensive oracle supervision. It pairs this with a unified objective that decomposes the reward signal into inter- and intra-context components and incorporates a distillation objective to transfer discovered behaviors back to the base policy. Empirically, the method outperforms standard GRPO with up to 8 times larger rollout budgets and outperforms an oracle-guided RL baseline on average across five challenging math benchmarks, showing that structured, context-driven exploration can replace both brute-force rollout scaling and feasibility-oriented methods based on privileged information.

What carries the argument

Strategy Nudging: conditioning each rollout on lightweight, strategy-level contexts to induce diverse reasoning trajectories without oracle supervision or expensive signals.

If this is right

  • Decomposing rewards into inter- and intra-context components allows the policy to learn both from diversity across strategies and consistency within each.
  • Distillation transfers newly discovered behaviors from nudged trajectories back into the base policy without permanent context dependence.
  • Structured exploration scales better than increasing rollout count, delivering higher performance at lower compute for the same training budget.
  • The method works without privileged oracle information, making it applicable where feasibility signals are unavailable.
  • Context-driven diversity provides a practical substitute for brute-force sampling in RLVR settings.
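The first bullet can be made concrete with a small sketch. The summary does not give the paper's exact formula, so the decomposition below is an assumption chosen only to match the behavior described in the Figure 2 caption: an intra-context term (reward minus the mean of its own context) plus an inter-context term (context mean minus the global mean) weighted by a hypothetical parameter `lam`, standing in for the caption's λ ∈ (1, 2]. The function name and signature are illustrative and are not taken from the NudgeRL code.

```python
from collections import defaultdict
from statistics import mean


def inter_intra_advantages(rewards, contexts, lam=1.5):
    """Assumed form: A_i = (r_i - mu_c) + lam * (mu_c - mu).

    rewards  : per-rollout scalar rewards for one problem
    contexts : per-rollout strategy-context labels (same length)
    lam      : hypothetical weight on the inter-context term
    """
    if len(rewards) != len(contexts):
        raise ValueError("rewards and contexts must have the same length")

    # Group rewards by the strategy context that produced them.
    by_context = defaultdict(list)
    for r, c in zip(rewards, contexts):
        by_context[c].append(r)

    global_mean = mean(rewards)
    context_mean = {c: mean(rs) for c, rs in by_context.items()}

    advantages = []
    for r, c in zip(rewards, contexts):
        intra = r - context_mean[c]            # success relative to its own context
        inter = context_mean[c] - global_mean  # how reliable the context is overall
        advantages.append(intra + lam * inter)
    return advantages


# Strategy A succeeds once in four rollouts, Strategy B three times in four.
rewards = [1, 0, 0, 0, 1, 1, 1, 0]
contexts = ["A"] * 4 + ["B"] * 4
adv = inter_intra_advantages(rewards, contexts, lam=1.5)
print("success under A:", adv[0], "success under B:", adv[4])
```

Under this assumed form, `lam = 1` collapses to a plain mean-centered group baseline (without the standard-deviation normalization GRPO typically applies), while `lam > 1` gives a success from the more reliable context a larger advantage than a rare success from a weak one, which is the ordering the caption describes.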

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same lightweight-context idea could be tested in non-math domains such as code generation or scientific reasoning to check whether strategy nudging generalizes beyond the reported benchmarks.
  • If the strategy contexts can be generated automatically rather than hand-specified, the framework would require even less human design.
  • Combining NudgeRL with existing reward-shaping or curriculum methods might compound the exploration gains.
  • The inter/intra-context decomposition suggests a general pattern for turning any diversity source into a usable training signal in policy optimization.

Load-bearing premise

Lightweight strategy-level contexts are sufficient to induce meaningfully diverse reasoning trajectories without oracle supervision or expensive additional signals.

What would settle it

A controlled run of NudgeRL with the strategy contexts removed or randomized that fails to show gains over plain GRPO at matched rollout budgets would falsify the claim that the nudging mechanism drives the efficiency improvement.

Figures

Figures reproduced from arXiv: 2605.15726 by Chanuk Lee, Minki Kang, Sangwoo Park, Sung Ju Hwang.

Figure 1: Concept: Improving exploration diversity through Strategy Nudging. (a) Naive sampling methods (e.g., GRPO) often collapse to a dominant reasoning mode, limiting the exploration of the reasoning space. (b) NudgeRL introduces Strategy Nudging, which appends lightweight strategy to the input, forcing the model to traverse diverse reasoning modes. (c) As a result, Strategy Nudging significantly increases the n…
Figure 2: Overview of the NudgeRL learning mechanism. (a) Inter-Intra Group Advantage: Demonstrates credit assignment that emphasizes reliable contexts (i.e., λ ∈ (1, 2]). A successful rollout from a consistently high-reward context (Strategy B) receives a larger positive advantage than a rare success from a low-reward context (Strategy A). (b) Self-distillation: Illustrates bridging the train-test gap. High-quality…
Figure 3: Training dynamics and evaluation performance on Qwen3-4B-Instruct. (a) EMA-smoothed training reward with decay factor 0.99. (b, c) Average pass@1 and pass@k on AIME24/25, estimated from 64 sampled rollouts using the unbiased estimator.
… smaller rollout budget. On Olmo3-7B-Instruct-SFT, NudgeRL likewise improves over the best GRPO result, achieving 0.285 compared to 0.281 at 32 rollouts. These results indica…
Figure 4: NudgeRL internalizes effective test-time strategies. Across 32 rollouts on an AIME25 problem, GRPO yields only incorrect and truncated trajectories. Conversely, NudgeRL produces 6 correct solutions using the shoelace formula.
[Plot: x-axis Step; series Context Reward, Hinted Reward, Dropout Reward]
Figure 5: Training dynamics. We report time-weighted EMA reward mean (0.99) with and without context.
[Plot: y-axis Average pass@1; panels (a) Ablation on p_drop, (b) Ablation on sampling (Random vs. Top ranked hint sampling)]
Figure 7: Ablation on learning and ϵ_high scaling results. We report Average pass@1 estimated using 128 rollouts on the AIME24/25, AMC23, and MATH500 datasets.
As shown in Fig. 6b, random sampling consistently outperforms top-ranked selection in terms of pass@1. While top-ranked contexts ensure more correctness, they tend to concentrate on a narrow set of reasoning strategies. In contrast, random sampling induces a broader dist…
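Two quantities recur in the captions above: pass@k computed with "the unbiased estimator" and EMA-smoothed reward with decay 0.99. The sketch below shows both under stated assumptions. The pass@k form is the combinatorial estimator from "Evaluating Large Language Models Trained on Code" (Chen et al., 2021), which is presumably what the captions refer to. The EMA is a plain per-step exponential moving average; the "time-weighted" variant mentioned for Figure 5 is not specified in this summary and is not reproduced here. Function names are illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n : number of sampled rollouts for a problem
    c : number of those rollouts that are correct
    k : sample budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k incorrect samples exist,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def ema(values, decay=0.99):
    """Plain exponential moving average, seeded with the first value."""
    smoothed = []
    state = None
    for v in values:
        state = v if state is None else decay * state + (1.0 - decay) * v
        smoothed.append(state)
    return smoothed


# For k = 1 the estimator reduces to the fraction of correct rollouts.
print(pass_at_k(n=64, c=6, k=1))
print(pass_at_k(n=64, c=6, k=8))
print(ema([0.0, 1.0, 1.0], decay=0.5))
```

Averaging `pass_at_k` over problems gives the benchmark-level number; using n larger than k (for example 64 rollouts for small k) is what keeps the variance of the estimate low.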
Original abstract

Reinforcement learning with verifiable rewards (RLVR) has emerged as a scalable paradigm for improving the reasoning capabilities of large language models. However, its effectiveness is fundamentally limited by exploration: the policy can only improve on trajectories it has already sampled. While increasing the number of rollouts alleviates this issue, such brute-force scaling is computationally expensive, and existing approaches that modify the optimization objective provide limited control over what is explored. In this work, we propose NudgeRL, a framework for structured and diversity-driven exploration in RLVR. Our approach introduces Strategy Nudging, which conditions each rollout on lightweight, strategy-level contexts to induce diverse reasoning trajectories without relying on expensive oracle supervision. To effectively learn from such structured exploration, we further propose a unified objective, which decomposes the reward signal into inter- and intra-context components and incorporates a distillation objective to transfer discovered behaviors back to the base policy. Empirically, NudgeRL outperforms standard GRPO with up to 8 times larger rollout budgets, while outperforming oracle-guided RL baseline on average across five challenging math benchmarks. These results demonstrate that structured, context-driven exploration can serve as an efficient and scalable alternative to both brute-force rollout scaling and feasibility-oriented methods based on privileged information. Our code is available at https://github.com/tally0818/NudgeRL.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes NudgeRL, a framework for structured and diversity-driven exploration in reinforcement learning with verifiable rewards (RLVR) applied to large language models. It introduces Strategy Nudging, which conditions each rollout on lightweight strategy-level contexts to induce diverse reasoning trajectories without expensive oracle supervision. A unified objective is proposed that decomposes the reward signal into inter- and intra-context components and incorporates a distillation term to transfer discovered behaviors back to the base policy. Empirically, the work claims that NudgeRL outperforms standard GRPO even when the latter uses up to 8 times larger rollout budgets, while also outperforming an oracle-guided RL baseline on average across five challenging math benchmarks.

Significance. If the central claims hold after addressing the empirical and methodological details, this work offers a meaningful advance in efficient exploration for RLVR, providing a scalable alternative to brute-force rollout scaling and to methods that rely on privileged oracle information. The public release of code supports reproducibility and positions the contribution as a practical step toward controlled diversity in LLM reasoning optimization.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (Empirical Evaluation): The central performance claims—outperformance of GRPO with 8× rollout budgets and superiority to the oracle-guided baseline—are presented without reported statistical significance tests, exact benchmark splits, number of independent runs, variance measures, or ablation controls on the inter-/intra-context decomposition and distillation terms. These omissions are load-bearing because the soundness of the efficiency and superiority arguments rests directly on the reliability of the reported gains.
  2. [§3] §3 (Strategy Nudging and Unified Objective): The description of how lightweight strategy-level contexts are generated and selected must explicitly demonstrate that the process does not draw on model capabilities equivalent to those exploited by the oracle-guided baseline. Without this clarification the comparison to the oracle baseline risks circularity, directly affecting the claim that the gains arise from the proposed nudging mechanism rather than hidden privileged signals.
minor comments (2)
  1. [§2.2] §2.2: Define the precise mathematical form of the inter-context and intra-context reward terms at first use to avoid ambiguity when they appear in the unified objective.
  2. [Figure 2] Figure 2: Add explicit labels for the context encoder, reward decomposition, and distillation loss components so that the diagram can be followed without reference to the main text.
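Major comment 1 asks for independent runs, variance measures, and significance tests. As a minimal sketch of what such a check could look like, the block below computes a paired t statistic over per-seed scores for two methods, using only the standard library. The scores are made-up placeholders for illustration, not results from the paper, and the function name is hypothetical. Converting the statistic to a p-value requires the Student t distribution (for instance via `scipy.stats.ttest_rel`), which is left out to keep the example dependency-free.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples a_i, b_i.

    Each pair should come from the same seed or benchmark so that
    the per-pair difference is meaningful.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    if len(scores_a) < 2:
        raise ValueError("need at least two pairs")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")

    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Placeholder per-seed pass@1 scores (illustrative only, NOT from the paper).
method_scores = [0.60, 0.62, 0.59, 0.61]
baseline_scores = [0.57, 0.58, 0.58, 0.57]
t, dof = paired_t_statistic(method_scores, baseline_scores)
print(f"t = {t:.3f} with {dof} degrees of freedom")
```

With only a handful of seeds the test has little power, so reporting the per-seed differences and their standard deviation alongside the statistic is usually more informative than the p-value alone.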

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We address each major comment point by point below, indicating where revisions will be made to improve clarity and rigor.

Point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Empirical Evaluation): The central performance claims—outperformance of GRPO with 8× rollout budgets and superiority to the oracle-guided baseline—are presented without reported statistical significance tests, exact benchmark splits, number of independent runs, variance measures, or ablation controls on the inter-/intra-context decomposition and distillation terms. These omissions are load-bearing because the soundness of the efficiency and superiority arguments rests directly on the reliability of the reported gains.

    Authors: We agree that the current empirical presentation would be strengthened by greater statistical detail and explicit ablations. In the revised manuscript we will report the number of independent runs conducted for each experiment, include variance measures (standard deviations across runs), specify the exact benchmark splits employed, and add statistical significance testing (e.g., paired t-tests) for the reported performance differences. We will also insert a dedicated ablation subsection that isolates the contributions of the inter-context reward term, the intra-context reward term, and the distillation objective. These additions directly address the load-bearing nature of the claims and will be placed in an expanded §4. revision: yes

  2. Referee: [§3] §3 (Strategy Nudging and Unified Objective): The description of how lightweight strategy-level contexts are generated and selected must explicitly demonstrate that the process does not draw on model capabilities equivalent to those exploited by the oracle-guided baseline. Without this clarification the comparison to the oracle baseline risks circularity, directly affecting the claim that the gains arise from the proposed nudging mechanism rather than hidden privileged signals.

    Authors: We appreciate the referee’s emphasis on avoiding any appearance of circularity. The revised §3 will contain an expanded subsection that explicitly describes the context-generation procedure: lightweight strategy contexts are produced by prompting the base policy itself with concise meta-instructions that elicit high-level reasoning directives (for example, “explore an algebraic approach” or “consider a proof by contradiction”). No ground-truth solutions, verified trajectories, or oracle-level supervision are provided at any stage. We will contrast this process with the oracle-guided baseline, which supplies direct access to expert solution paths, and will include illustrative prompts and pseudocode to make the distinction unambiguous. This clarification will confirm that performance gains derive from the structured nudging and unified objective rather than hidden privileged information. revision: yes
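The context-generation procedure described in the second response can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the two strategy strings are the examples quoted in the response, the prompt template is invented, strategies are drawn uniformly at random (the Figure 7 text reports that random sampling beat top-ranked selection), and `p_drop` is interpreted here as the probability of leaving a rollout un-nudged, which the summary does not confirm.

```python
import random

# Example directives quoted in the rebuttal; a real run would generate
# these by prompting the base policy with meta-instructions.
STRATEGY_CONTEXTS = [
    "Explore an algebraic approach.",
    "Consider a proof by contradiction.",
]


def build_nudged_prompts(problem, contexts, num_rollouts, p_drop=0.0, seed=None):
    """Return a list of (context_or_None, prompt) pairs, one per rollout.

    p_drop is an ASSUMED semantics: with this probability the rollout
    gets the bare problem and no strategy context.
    """
    if not contexts:
        raise ValueError("at least one strategy context is required")
    rng = random.Random(seed)

    prompts = []
    for _ in range(num_rollouts):
        if rng.random() < p_drop:
            prompts.append((None, problem))
        else:
            hint = rng.choice(contexts)  # uniform random strategy
            # Hypothetical template: the strategy is appended to the input.
            prompts.append((hint, f"{problem}\n\nStrategy hint: {hint}"))
    return prompts


for hint, prompt in build_nudged_prompts(
    "Find all real x with x^2 - 5x + 6 = 0.", STRATEGY_CONTEXTS,
    num_rollouts=4, p_drop=0.25, seed=0,
):
    print(repr(hint), "->", prompt.replace("\n", " "))
```

Keeping the context label next to each prompt is what makes a per-context reward breakdown possible later, and it also makes clear that nothing beyond the problem text and a short directive reaches the policy, which is the distinction from an oracle-guided baseline that the response emphasizes.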

Circularity Check

0 steps flagged

New conditioning and objective components supply independent exploration without reduction to fitted inputs

full rationale

The derivation introduces Strategy Nudging via lightweight strategy-level contexts and a unified objective decomposing rewards into inter-/intra-context terms plus distillation. These are presented as novel additions rather than quantities defined by or fitted to the target performance metrics. Comparisons to GRPO (with scaled rollouts) and an oracle-guided baseline supply external grounding. No load-bearing step reduces a claimed prediction or uniqueness result to a self-citation, fitted parameter, or definitional tautology; the central claims retain independent content from the proposed mechanisms.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claims rest on the untested premise that short strategy contexts reliably produce diverse trajectories and that the decomposed objective transfers those behaviors effectively; no free parameters or invented entities are explicitly quantified in the abstract.

free parameters (1)
  • strategy context design
    Choice of lightweight strategy-level contexts and how they are encoded is introduced without reported fitting procedure or sensitivity analysis.
axioms (1)
  • domain assumption Conditioning on strategy contexts induces diverse reasoning trajectories
    Invoked as the core mechanism of Strategy Nudging.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
