arxiv: 2601.04052 · v2 · submitted 2026-01-07 · 💻 cs.RO · cs.CL

Stable Language Guidance for Vision-Language-Action Models

Pith reviewed 2026-05-16 16:18 UTC · model grok-4.3

classification 💻 cs.RO cs.CL
keywords vision-language-action · modality collapse · robotic manipulation · language robustness · residual steering · semantic posterior

The pith

Residual Semantic Steering keeps vision-language-action models robust to changes in instruction phrasing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Vision-language-action models suffer from modality collapse where strong visual priors overwhelm linguistic signals, causing agents to overfit to specific wordings instead of following semantic intent. The paper proposes Residual Semantic Steering as a probabilistic fix that approximates the full semantic posterior through dense sampling of syntactic variants and subtracts the visual affordance prior in a dual-stream decoder. This separation aims to maximize mutual information between actions and true intent while reducing sensitivity to distractors. If the method works, robots would follow the meaning of commands reliably even when instructions are rephrased or attacked. The approach is tested on manipulation benchmarks with reported gains in robustness.
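To make the collapse concrete, here is a toy calculation. It is a minimal sketch, not the paper's analysis. It computes the mutual information I(A; L) between a discrete action A and an instruction L for two hypothetical policies. The joint tables and the helper `mutual_information` are illustrative assumptions.

```python
import math

def mutual_information(joint):
    """Mutual information I(A; L) in bits for a joint table p(a, l).

    Rows index actions, columns index instructions.
    """
    total = sum(sum(row) for row in joint)
    p = [[v / total for v in row] for row in joint]
    p_action = [sum(row) for row in p]
    p_instr = [sum(col) for col in zip(*p)]
    mi = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            if v > 0:  # the term 0 * log(0) is taken as 0
                mi += v * math.log2(v / (p_action[i] * p_instr[j]))
    return mi

# Hypothetical joint tables: 2 actions (rows) x 2 instructions (columns).
# Collapsed policy: the action distribution is the same for both instructions.
collapsed = [[0.45, 0.45],
             [0.05, 0.05]]
# Grounded policy: the instruction fully determines the action.
grounded = [[0.5, 0.0],
            [0.0, 0.5]]

mi_collapsed = mutual_information(collapsed)  # approximately 0 bits
mi_grounded = mutual_information(grounded)    # 1 bit
```

In the collapsed table the action carries essentially no information about the instruction, which is the failure the summary above describes. The grounded table reaches the maximum of one bit for two equally likely instructions.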

Core claim

RSS approximates the semantic posterior via Monte Carlo Syntactic Integration driven by LLM distributional expansion and applies Residual Affordance Steering to isolate language influence by subtracting the visual prior, thereby maximizing action-intent mutual information and suppressing visual distractors.
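One way to read Monte Carlo Syntactic Integration is as a training loss averaged over sampled paraphrases of a seed instruction, rather than evaluated on one fixed phrasing. The sketch below assumes that reading. The function `mcsi_loss`, the toy policy `toy_nll`, and the three paraphrases are hypothetical stand-ins; the paper's actual objective is not reproduced on this page.

```python
import math
import random

def mcsi_loss(nll, observation, expert_action, variants, num_samples, rng):
    """Monte Carlo estimate of E_l[-log pi(a* | o, l)] over instruction variants.

    Variants are sampled uniformly with replacement (a simplifying assumption).
    """
    sampled = [rng.choice(variants) for _ in range(num_samples)]
    return sum(nll(observation, l, expert_action) for l in sampled) / num_samples

def toy_nll(observation, instruction, expert_action):
    # Hypothetical brittle policy: confident only when it sees the verb "open".
    p_expert = 0.9 if "open" in instruction.lower() else 0.5
    return -math.log(p_expert)

# Hypothetical paraphrases of one seed instruction (stand-ins for LLM output).
variants = [
    "Open the middle drawer of the cabinet",
    "Pull the cabinet's middle drawer out",
    "Slide the middle drawer of the cabinet outward",
]

rng = random.Random(0)
loss = mcsi_loss(toy_nll, observation=None, expert_action=0,
                 variants=variants, num_samples=64, rng=rng)
single_phrase_loss = toy_nll(None, variants[0], 0)
```

The single-phrasing loss only sees the wording the toy policy handles well. The averaged loss also charges the policy for the phrasings it handles poorly, which is the pressure toward phrasing-invariant behavior that the core claim attributes to this component.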

What carries the argument

Residual Semantic Steering (RSS), a dual-stream probabilistic decoder that subtracts the visual affordance prior after Monte Carlo approximation of the semantic posterior.
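The subtraction can be pictured as guidance-style extrapolation: take the vision-only prediction, then add a scaled copy of the difference between the language-conditioned and vision-only predictions. The sketch below assumes that form with a steering coefficient `gamma`. It is not confirmed here whether the paper applies it to logits, denoising outputs, or something else. The paper's appendix does state that choosing γ ≫ 1 linearly amplifies the linguistic contribution without altering the visual affordance landscape, and the linear scores below mirror that. All numbers and function names are hypothetical.

```python
def residual_steer(conditioned, prior, gamma):
    """Return prior + gamma * (conditioned - prior), elementwise."""
    return [p + gamma * (c - p) for c, p in zip(conditioned, prior)]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical linear action scores for three candidate actions.
visual_term = [2.0, 0.5, -1.0]    # stands in for the visual affordance prior
language_term = [-0.2, 0.4, 0.1]  # stands in for the language contribution

prior = visual_term  # vision-only stream
conditioned = [v + l for v, l in zip(visual_term, language_term)]  # vision + language stream

# gamma = 1 reproduces the ordinary conditioned prediction.
unsteered_choice = argmax(residual_steer(conditioned, prior, gamma=1.0))
# gamma = 5 scales only the language residual; the visual term is unchanged.
steered_choice = argmax(residual_steer(conditioned, prior, gamma=5.0))
```

With `gamma=1.0` the visually salient action 0 wins even though the language term favors action 1. With `gamma=5.0` the language residual is large enough to change the choice to action 1. The same arithmetic also shows the risk the referee raises: anything the two streams share sits in the prior and is never amplified.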

If this is right

  • Performance on manipulation benchmarks remains stable under adversarial rephrasings of instructions.
  • Mutual information between generated actions and underlying intent increases while visual distractors are suppressed.
  • The framework generalizes across diverse robotic control tasks without requiring task-specific retraining.
  • Explicit isolation of language effects provides a template for handling modality imbalance in other control models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The residual subtraction technique could transfer to other multimodal systems where one input type dominates decision-making.
  • Deployment on physical robots would reveal whether the offline LLM approximations remain accurate under real-time sensor noise.
  • Extending the Monte Carlo expansion to include visual variations might further stabilize performance in cluttered scenes.

Load-bearing premise

That Monte Carlo sampling of syntactic variants accurately captures the true semantic posterior and that subtracting the visual prior removes only distractors without discarding essential action information.

What would settle it

Measuring whether RSS-equipped models lose performance on held-out adversarial perturbations that differ in structure from those used to validate the posterior approximation.
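A minimal sketch of how such a check could be scored, assuming rollouts are logged as success/failure per perturbation family. The family names follow the R1 to R4 variants in the figure captions. The helper names are hypothetical, and the outcome lists are placeholders rather than results from the paper.

```python
def success_rate(outcomes):
    """Success rate in percent for a list of boolean rollout outcomes."""
    return 100.0 * sum(outcomes) / len(outcomes)

def held_out_gap(results, held_out):
    """SR on families seen during validation minus SR on held-out families."""
    seen = [o for fam, outs in results.items() if fam not in held_out for o in outs]
    unseen = [o for fam in held_out for o in results[fam]]
    return success_rate(seen) - success_rate(unseen)

# Placeholder outcomes (True = task success); these are NOT measurements from the paper.
results = {
    "R1-Distraction":     [True, True, True, False],
    "R2-Common Sense":    [True, True, False, False],
    "R3-Reasoning Chain": [True, True, True, True],
    "R4-Confusion":       [True, False, False, False],
}

gap = held_out_gap(results, held_out={"R4-Confusion"})  # percentage points
```

A gap near zero would support the premise that the sampled variants cover the relevant linguistic neighborhood. A large positive gap would suggest the robustness is specific to the perturbation structures seen during validation.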

Figures

Figures reproduced from arXiv: 2601.04052 by Guangrun Wang, Hao Liu, Jiaying Zhou, Keze Wang, Liang Lin, Qinhan Lyu, Yuhao Chen, Zhihao Zhan.

Figure 1
Taxonomy of Language Instruction Perturbations. We identify three distinct failure modes in VLA instruction following: (1) Destructive Instruction Overwriting, where critical semantic tokens are lost or masked (e.g., masking the drawer location); (2) Obfuscated Instruction Reinterpretation, where the model fails to ground synonymous or verbose descriptions (e.g., “beverage container” vs. “mug”); and (3)…
Figure 2
Overview of Residual Semantic Steering (RSS). To combat instruction blindness, RSS operates in two stages. Left: Monte Carlo Syntactic Integration utilizes an Oracle Teacher to generate a dense linguistic neighborhood around a seed instruction. Optimizing over this distribution forces the policy to learn representations that are invariant to syntactic perturbations. Right: Residual Affordance Steering miti…
Figure 3
Comparison on the LIBERO variant R3-Reasoning Chain. In the "open the top drawer and put the bowl inside" task, our model consistently outperforms the baseline under reasoning-chain–perturbed instructions, demonstrating a stronger ability to follow multi-step semantic constraints and accurately complete the task despite increased linguistic complexity.
Figure 4
Ablation of steering coefficient and denoising steps on destructive instruction overwriting. Success rates (SR, %) across instruction variants under different steering coefficients for π0 (a) and π0.5 (b), and different denoising steps for π0 (c) and π0.5 (d), illustrating the effect of guidance and generation depth on robustness to instruction perturbations.
Figure 5
Training loss curves. We report the training loss trajectories of different model variants throughout optimization. RAS: Residual Affordance Steering; MCSI: Monte Carlo Syntactic Integration.
Figure 6
Qualitative comparison under destructive instruction overwriting (π0.5). We visualize representative rollout trajectories for the task “Put the wine bottle on top of the cabinet” when the instruction is partially blanked. The base model is π0.5 (Intelligence et al., 2025). RAS: Residual Affordance Steering; MCSI: Monte Carlo Syntactic Integration.
Figure 7
Qualitative comparison under destructive instruction overwriting (π0). We visualize representative rollout trajectories for the task “Put the wine bottle on top of the cabinet” when the instruction is partially blanked. The base model is π0 (Black et al., 2024). RAS: Residual Affordance Steering; MCSI: Monte Carlo Syntactic Integration.
Figure 8
R1-Distraction. The instruction is augmented with task-irrelevant conversational or contextual content, such as background descriptions or auxiliary remarks, while keeping the core action and target unchanged.
Figure 9
R2-Common Sense. Object names are replaced with commonsense-based descriptive phrases that implicitly convey their functional or physical properties. Although the task intent remains unchanged, this variant requires the model to extract relevant semantics from more abstract and verbose descriptions.
Figure 10
R3-Reasoning Chain. The instruction is reformulated to emphasize implicit reasoning, execution order, or final-state constraints, either by introducing lightweight reasoning cues or by abstracting intermediate steps. The target task remains identical, but the linguistic form encourages reasoning-based interpretation.
Figure 11
R4-Confusion. The instruction explicitly introduces distractor objects or actions through negation or contrast, while still specifying the correct target object and goal. This variant probes the model’s ability to resist object-level confusion and focus on task-relevant semantics.
Original abstract

Vision-Language-Action (VLA) models have demonstrated impressive capabilities in generalized robotic control; however, they remain notoriously brittle to linguistic perturbations. We identify a critical "modality collapse" phenomenon where strong visual priors overwhelm sparse linguistic signals, causing agents to overfit to specific instruction phrasings while ignoring the underlying semantic intent. To address this, we propose Residual Semantic Steering (RSS), a probabilistic framework that disentangles physical affordance from semantic execution. RSS introduces two theoretical innovations: (1) Monte Carlo Syntactic Integration, which approximates the true semantic posterior via dense, LLM-driven distributional expansion, and (2) Residual Affordance Steering, a dual-stream decoding mechanism that explicitly isolates the causal influence of language by subtracting the visual affordance prior. Theoretical analysis suggests that RSS effectively maximizes the mutual information between action and intent while suppressing visual distractors. Empirical results across diverse manipulation benchmarks demonstrate that RSS achieves state-of-the-art robustness, maintaining performance even under adversarial linguistic perturbations. We release our code at https://github.com/Doo-mon/RSS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper identifies a 'modality collapse' phenomenon in Vision-Language-Action (VLA) models, where strong visual priors overwhelm sparse linguistic signals and cause overfitting to specific instruction phrasings. It proposes Residual Semantic Steering (RSS) as a probabilistic framework with two innovations: Monte Carlo Syntactic Integration to approximate the semantic posterior via LLM-driven distributional expansion, and Residual Affordance Steering via dual-stream decoding that subtracts the visual affordance prior to isolate causal language influence. Theoretical analysis claims RSS maximizes mutual information between action and intent while suppressing visual distractors, and empirical results on manipulation benchmarks show state-of-the-art robustness to adversarial linguistic perturbations.

Significance. If the disentanglement is valid and the robustness gains are attributable to the proposed mechanism rather than artifacts of the subtraction or sampling, the work would address a central brittleness in VLA models and enable more reliable robotic control under varied natural-language instructions.

major comments (3)
  1. [Abstract] The claim that RSS 'maximizes the mutual information between action and intent' is not supported by an explicit derivation; the mutual-information quantity appears defined directly in terms of the fitted dual-stream parameters, raising the possibility of circularity.
  2. [Abstract] Monte Carlo Syntactic Integration is asserted to approximate the true semantic posterior, yet no error bounds, convergence analysis, or faithfulness guarantees are supplied for the sampled distribution when linguistic signals are sparse.
  3. [Abstract] Residual Affordance Steering subtracts the visual affordance prior to isolate language influence; because visual and linguistic cues are typically correlated in VLA training data, the subtraction risks discarding shared affordance information required for correct action execution, which would undermine attribution of any observed robustness gains.
minor comments (1)
  1. [Abstract] The term 'modality collapse' is introduced without a formal definition or citation to related phenomena in multimodal learning.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments, which have helped us improve the clarity and rigor of our work. Below we address each major comment point by point.

Point-by-point responses
  1. Referee: [Abstract] The claim that RSS 'maximizes the mutual information between action and intent' is not supported by an explicit derivation; the mutual-information quantity appears defined directly in terms of the fitted dual-stream parameters, raising the possibility of circularity.

    Authors: We appreciate this observation. The abstract condenses the theoretical contribution from Section 3.2, where we provide an explicit derivation showing that the RSS objective is equivalent to maximizing the mutual information I(action; intent) via a variational approximation that avoids circularity by grounding the intent distribution in the LLM-expanded posterior. To address the concern, we will revise the abstract to explicitly reference this derivation and clarify that the MI is not defined circularly but derived from the information-theoretic objective. revision: yes

  2. Referee: [Abstract] Monte Carlo Syntactic Integration is asserted to approximate the true semantic posterior, yet no error bounds, convergence analysis, or faithfulness guarantees are supplied for the sampled distribution when linguistic signals are sparse.

    Authors: We agree that additional analysis on the approximation quality is warranted. In the revised manuscript, we will include a convergence analysis for the Monte Carlo integration, providing error bounds based on the number of samples and the coverage of the LLM-generated distribution. We will also add empirical results demonstrating the faithfulness of the approximation even under sparse linguistic inputs. revision: yes

  3. Referee: [Abstract] Residual Affordance Steering subtracts the visual affordance prior to isolate language influence; because visual and linguistic cues are typically correlated in VLA training data, the subtraction risks discarding shared affordance information required for correct action execution, which would undermine attribution of any observed robustness gains.

    Authors: This is an important point regarding potential information loss due to correlations. Our dual-stream architecture is designed such that the visual prior is computed independently, and the residual operation isolates the incremental effect of language without removing shared components, as validated by our ablations where RSS performs comparably or better on unperturbed instructions. We will expand the discussion in the revised paper to explicitly address this correlation concern and include additional experiments quantifying the preserved affordance information. revision: yes
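For the second response, one standard form such an error bound could take is Hoeffding's inequality applied to the Monte Carlo average of a bounded quantity. This is an assumption about the shape of the promised analysis, which is not shown on this page. The symbols are illustrative: $l_1, \dots, l_N$ are i.i.d. paraphrases drawn from the LLM-generated distribution $q$, and $\pi(a \mid o, l) \in [0, 1]$ is the policy's probability of action $a$ given observation $o$ and instruction $l$.

```latex
\Pr\left( \left| \frac{1}{N} \sum_{i=1}^{N} \pi(a \mid o, l_i)
  - \mathbb{E}_{l \sim q}\left[ \pi(a \mid o, l) \right] \right| \ge \epsilon \right)
\le 2 \exp\left( -2 N \epsilon^{2} \right)
```

A bound of this kind controls only the sampling error relative to $q$. It says nothing about how far $q$ is from the true distribution over phrasings of the intent, which is the faithfulness part of the referee's concern.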

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical benchmarks rather than definitional reduction.

full rationale

The paper introduces RSS via two components (Monte Carlo Syntactic Integration and Residual Affordance Steering) and states that theoretical analysis suggests maximization of mutual information between action and intent. No equations are supplied in the manuscript excerpt that define any quantity in terms of itself or rename a fitted parameter as a prediction. The central robustness claim is tied to benchmark results under perturbations, not to a self-referential derivation or self-citation chain. The subtraction step is presented as an explicit design choice rather than a quantity forced by prior definitions. This is the normal case of an independent modeling proposal whose validity is left to external validation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on the assumption that visual priors can be cleanly subtracted and that dense LLM sampling approximates semantic intent; no explicit free parameters are named in the abstract but the probabilistic construction implies fitted scaling factors.

axioms (2)
  • [domain assumption] Visual affordance prior can be subtracted from the joint prediction to isolate the causal influence of language
    Core of the Residual Affordance Steering mechanism
  • [standard math] Monte Carlo sampling from LLM-driven expansions approximates the true semantic posterior
    Basis for Monte Carlo Syntactic Integration
invented entities (1)
  • modality collapse (no independent evidence)
    purpose: Describes the phenomenon where visual priors overwhelm linguistic signals
    Introduced to explain brittleness in VLA models

pith-pipeline@v0.9.0 · 5498 in / 1227 out tokens · 25892 ms · 2026-05-16T16:18:35.823498+00:00 · methodology


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Learning Physics from Pretrained Video Models: A Multimodal Continuous and Sequential World Interaction Models for Robotic Manipulation

    cs.RO 2026-02 unverdicted novelty 7.0

    PhysGen uses video models to learn physics for robots, outperforming baselines by up to 13.8% on Libero and matching specialized models in real-world tasks.

  2. Robotic Manipulation is Vision-to-Geometry Mapping ($f(v) \rightarrow G$): Vision-Geometry Backbones over Language and Video Models

    cs.RO 2026-04 unverdicted novelty 6.0

    Vision-geometry backbones using pretrained 3D world models outperform vision-language and video models for robotic manipulation by enabling direct mapping from visual input to geometric actions.

  3. OOWM: Structuring Embodied Reasoning and Planning via Object-Oriented Programmatic World Modeling

    cs.AI 2026-02 unverdicted novelty 6.0

    OOWM models the world as an explicit symbolic tuple with UML diagrams and trains via SFT plus GRPO to outperform text-based CoT on embodied planning benchmarks.
