arxiv: 2504.11468 · v1 · pith:O672LEEWnew · submitted 2025-04-10 · 💻 cs.CL

SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Hardy Chen, Haoqin Tu, Fali Wang, Hui Liu, Xianfeng Tang, Xinya Du, Yuyin Zhou, Cihang Xie

Pith reviewed 2026-05-17 15:39 UTC · model grok-4.3

classification 💻 cs.CL

keywords supervised fine-tuning · reinforcement learning · vision-language models · reasoning paths · pseudo reasoning · multimodal reasoning · training order

The pith

SFT induces pseudo reasoning paths that undermine subsequent RL in vision-language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines the common practice of using supervised fine-tuning before reinforcement learning to train large vision-language models for step-by-step reasoning. It argues that SFT causes models to imitate expert traces in ways that produce long, hesitant, and sometimes inaccurate steps. These imitative patterns then interfere with the model's ability to develop more flexible and effective reasoning during the RL phase. The authors support this by creating a new dataset of visual reasoning examples and running controlled comparisons of training orders and methods. Their findings indicate that RL approaches can foster more natural reasoning when not preceded by standard SFT.

Core claim

The paper establishes that supervised fine-tuning on expert-generated reasoning traces induces pseudo reasoning paths. These paths may appear similar to native reasoning but consist of prolonged, hesitant, less informative steps and incorrect reasoning. This imitation helps models learn output formats yet locks them into rigid modes that reduce the gains from later reinforcement learning. In contrast, reinforcement learning that directly optimizes with combined perception and cognition signals produces more adaptive reasoning behavior without the same imitative constraints.

What carries the argument

Pseudo reasoning paths: imitative step-by-step traces copied from expert models during supervised fine-tuning that resemble but fail to match the flexible, informative reasoning that reinforcement learning can develop.

If this is right

SFT teaches basic reasoning formats but restricts models from improving beyond imitative patterns in subsequent RL stages.
RL with rewards that combine perception accuracy and cognitive step quality supports the emergence of genuine adaptive reasoning.
Models trained without prior SFT or with careful RL setups reach higher accuracy on visual reasoning benchmarks than those following the standard SFT-first sequence.
The order of training methods matters because early imitation can embed structural habits that later optimization struggles to overwrite.
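The idea of a reward that combines perception and cognition signals can be sketched as a weighted blend. This is a minimal illustration under stated assumptions, not the paper's reward module: the function name `mixed_reward`, the two component scores, and the weight `w_perception` are all hypothetical.

```python
def mixed_reward(perception_score: float,
                 cognition_score: float,
                 w_perception: float = 0.5) -> float:
    """Blend two reward signals, each expected to lie in [0, 1].

    All names and the default weight are hypothetical; the source does not
    state how its perception and cognition signals are combined.
    """
    if not 0.0 <= w_perception <= 1.0:
        raise ValueError("w_perception must be in [0, 1]")
    for name, score in (("perception_score", perception_score),
                        ("cognition_score", cognition_score)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    # Convex combination: the result also stays in [0, 1].
    return w_perception * perception_score + (1.0 - w_perception) * cognition_score
```

A convex combination keeps the blended reward on the same scale as its inputs, which makes runs with different weights easier to compare.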

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Curating SFT data to remove or shorten hesitant steps might reduce the negative carry-over effect into RL without discarding the format-learning benefit.
The same imitation problem could appear when training language-only models on chain-of-thought data, suggesting the finding is not limited to vision inputs.
Reusing the six-step dataset pipeline for new domains would allow direct tests of whether the pseudo-path effect generalizes beyond the original visual tasks.

Load-bearing premise

The performance differences between SFT-then-RL and RL-only training arise specifically from the induction of pseudo reasoning paths rather than from differences in data difficulty, reward design, or training details.

What would settle it

Run SFT-then-RL and RL-only training on the exact same data splits, reward functions, and hyperparameters, then inspect the generated reasoning traces for hesitation length and error rates; if the performance gap vanishes or hesitation does not appear after SFT, the central claim would not hold.
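The trace inspection described above could start from a simple surface measure of hesitation. The sketch below counts hesitation cues per 100 words in a generated reasoning trace. The marker list and the function name `hesitation_stats` are assumptions made for illustration; a real study would need to validate the cues against human judgments.

```python
import re

# Hypothetical hesitation cues; not taken from the paper.
HESITATION_MARKERS = ("wait", "hmm", "let me double-check",
                      "actually", "on second thought")

def hesitation_stats(trace: str) -> dict:
    """Count hesitation cues and normalize by trace length in words."""
    # Keep alphabetic tokens (with apostrophes and hyphens), lowercased.
    words = re.findall(r"[A-Za-z'-]+", trace.lower())
    text = " ".join(words)
    hits = sum(
        len(re.findall(r"\b" + re.escape(marker) + r"\b", text))
        for marker in HESITATION_MARKERS
    )
    n = len(words)
    return {
        "num_words": n,
        "hesitation_hits": hits,
        "hits_per_100_words": 100.0 * hits / n if n else 0.0,
    }
```

Comparing the distribution of `hits_per_100_words` between SFT-then-RL and RL-only checkpoints on the same prompts would give one rough signal; it does not measure correctness, so error rates would still need separate scoring.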

Original abstract

This work revisits the dominant supervised fine-tuning (SFT) then reinforcement learning (RL) paradigm for training Large Vision-Language Models (LVLMs), and reveals a key finding: SFT can significantly undermine subsequent RL by inducing "pseudo reasoning paths" imitated from expert models. While these paths may resemble the native reasoning paths of RL models, they often involve prolonged, hesitant, less informative steps, and incorrect reasoning. To systematically study this effect, we introduce VLAA-Thinking, a new multimodal dataset designed to support reasoning in LVLMs. Constructed via a six-step pipeline involving captioning, reasoning distillation, answer rewrite and verification, VLAA-Thinking comprises high-quality, step-by-step visual reasoning traces for SFT, along with a more challenging RL split from the same data source. Using this dataset, we conduct extensive experiments comparing SFT, RL and their combinations. Results show that while SFT helps models learn reasoning formats, it often locks aligned models into imitative, rigid reasoning modes that impede further learning. In contrast, building on the Group Relative Policy Optimization (GRPO) with a novel mixed reward module integrating both perception and cognition signals, our RL approach fosters more genuine, adaptive reasoning behavior. Notably, our model VLAA-Thinker, based on Qwen2.5VL 3B, achieves top-1 performance on Open LMM Reasoning Leaderboard (https://huggingface.co/spaces/opencompass/Open_LMM_Reasoning_Leaderboard) among 4B scale LVLMs, surpassing the previous state-of-the-art by 1.8%. We hope our findings provide valuable insights in developing reasoning-capable LVLMs and can inform future research in this area.
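GRPO scores each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value baseline. A minimal sketch of that group-relative normalization follows. The function name, the `eps` constant, and the choice of population standard deviation are assumptions for illustration, and the clipped policy objective and KL term are omitted.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` is an assumed small constant that avoids division by zero
    when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]
```

When all responses in a group earn the same reward, every advantage is zero and the group contributes no gradient signal, which is one reason a harder RL split (where rollouts disagree) can be more useful for this kind of training.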

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SFT on distilled reasoning traces appears to create rigid imitative patterns that limit later RL gains in LVLMs, backed by a new dataset and a small leaderboard win, but the experiments leave room for other explanations like data splits or hyperparameters.

The key point from this paper is that SFT on expert reasoning traces can lock models into less flexible reasoning styles that then limit what RL can achieve afterward. They created VLAA-Thinking, a multimodal dataset with step-by-step visual reasoning examples made through a six-step pipeline involving captioning, distillation from experts, answer rewriting, and verification. They split it into SFT-friendly parts and harder ones for RL. Then they ran comparisons using Group Relative Policy Optimization with a new reward that mixes perception and cognition signals. Their RL-first approach on the Qwen2.5-VL 3B model ends up with better results than the standard SFT then RL sequence. The final model tops the Open LMM Reasoning Leaderboard for models around 4B parameters by a small margin (1.8%).

What stands out is the observation that SFT helps with format but can lead to prolonged, hesitant, or incorrect steps that RL has trouble correcting. This is a practical warning for anyone scaling up reasoning training in vision-language systems. The fact that they ship a new dataset and a working recipe adds value beyond just the negative finding on SFT.

The soft spot is that the experiments may not have fully isolated the cause. The abstract does not confirm that the SFT-then-RL and RL-only conditions were identical in data difficulty distribution, exact reward weights, or training hyperparameters. If those differed, the gap could come from that instead of the pseudo-reasoning effect specifically. More detailed ablations would help pin it down.

This work is aimed at people training reasoning LVLMs, especially at smaller scales where efficiency matters. Anyone looking for alternatives to the dominant pipeline or needing a new reasoning dataset could find it useful. I would send it to peer review. The core question is timely, the results are reported clearly enough to spark discussion, and the dataset is a concrete contribution even if the causal story needs more support.

Referee Report

1 major / 1 minor

Summary. The paper claims that SFT can significantly undermine subsequent RL for training reasoning LVLMs by inducing 'pseudo reasoning paths' (prolonged, hesitant, less informative, or incorrect traces imitated from experts). It introduces the VLAA-Thinking dataset, constructed via a six-step pipeline of captioning, reasoning distillation, answer rewrite, and verification, with an SFT split and a more challenging RL split. Experiments compare SFT, RL, and combinations using GRPO with a novel mixed perception+cognition reward; the resulting VLAA-Thinker (Qwen2.5VL 3B) achieves top-1 on the Open LMM Reasoning Leaderboard among 4B-scale models, surpassing prior SOTA by 1.8%.

Significance. If substantiated, the result would challenge the standard SFT-then-RL pipeline for multimodal reasoning models and emphasize the value of direct RL with carefully designed rewards for fostering adaptive rather than imitative behavior. The introduction of VLAA-Thinking and the leaderboard result constitute concrete contributions to the empirical study of training order effects in LVLMs.

major comments (1)

The central attribution of the SFT-then-RL vs. RL-only performance gap to induction of pseudo-reasoning paths is load-bearing for the main claim, yet the manuscript provides no explicit statement that the two regimes were matched on data difficulty distribution, exact formulation of the mixed reward, GRPO hyperparameters, learning-rate schedules, or batching. Without such controls, alternative explanations (data difficulty, reward design, or hyperparameter differences) cannot be ruled out.

minor comments (1)

The abstract and experimental description mention 'extensive experiments' and leaderboard gains but omit details on baseline controls, statistical significance testing, or ablations isolating the mixed reward; these should be added for reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed feedback. The concern regarding experimental controls is important for strengthening the attribution of results to pseudo-reasoning paths, and we address it directly below with plans for revision.

Referee: The central attribution of the SFT-then-RL vs. RL-only performance gap to induction of pseudo-reasoning paths is load-bearing for the main claim, yet the manuscript provides no explicit statement that the two regimes were matched on data difficulty distribution, exact formulation of the mixed reward, GRPO hyperparameters, learning-rate schedules, or batching. Without such controls, alternative explanations (data difficulty, reward design, or hyperparameter differences) cannot be ruled out.

Authors: We agree that explicit documentation of controls is essential to support the central claim. In our experiments, the SFT-then-RL and RL-only regimes were matched on the following: both drew from the identical VLAA-Thinking dataset with the same difficulty distribution (using the SFT split for format learning where applicable and the more challenging RL split for the reinforcement phase in both settings); the mixed perception+cognition reward was formulated and weighted identically; and GRPO was executed with the same hyperparameters, learning-rate schedules, and batch sizes. We will revise the manuscript by adding a dedicated subsection (and accompanying table) in the Experiments section that explicitly lists these matched settings. This addition will directly address the referee's concern and help rule out alternative explanations. revision: yes
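A matched-settings table of the kind promised here can be checked mechanically. The sketch below compares two run configurations on a list of keys that must agree and reports any that differ. The helper name, the key names, and the example values are hypothetical placeholders, not settings reported by the paper.

```python
def config_mismatches(cfg_a: dict, cfg_b: dict, keys) -> dict:
    """Return {key: (value_a, value_b)} for every listed key that differs.

    A key missing from one config shows up as None on that side.
    """
    return {
        k: (cfg_a.get(k), cfg_b.get(k))
        for k in keys
        if cfg_a.get(k) != cfg_b.get(k)
    }

# Hypothetical keys that both regimes would need to share.
MATCHED_KEYS = ("rl_split", "reward_weights", "learning_rate", "batch_size")

# Placeholder values for illustration only.
sft_then_rl = {"rl_split": "rl-v1", "reward_weights": (0.5, 0.5),
               "learning_rate": 1e-6, "batch_size": 128, "sft_init": True}
rl_only = {"rl_split": "rl-v1", "reward_weights": (0.5, 0.5),
           "learning_rate": 1e-6, "batch_size": 128, "sft_init": False}

# Only the intended difference (sft_init) is excluded from MATCHED_KEYS,
# so an empty result means the RL phases were configured identically.
mismatches = config_mismatches(sft_then_rl, rl_only, MATCHED_KEYS)
```

Publishing the output of such a check alongside the results would let readers confirm that the SFT initialization is the only intended difference between the two regimes.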

Circularity Check

0 steps flagged

No significant circularity in empirical comparison study

This paper is an empirical investigation that introduces the VLAA-Thinking dataset via a six-step pipeline and compares SFT, RL, and combined training regimes on LVLMs using GRPO with a mixed perception-cognition reward. The central claims rest on experimental results, qualitative inspection of reasoning traces, and leaderboard performance rather than any mathematical derivation chain, equations, or fitted parameters that reduce to inputs by construction. No self-definitional steps, uniqueness theorems, or ansatzes imported via self-citation are present in the provided text. The work is self-contained against external benchmarks such as the Open LMM Reasoning Leaderboard and is replicable given the dataset and model details.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical study on training paradigms; no mathematical axioms, free parameters, or invented entities beyond standard machine-learning assumptions about reward signals and policy optimization.

pith-pipeline@v0.9.0 · 5644 in / 1115 out tokens · 92163 ms · 2026-05-17T15:39:06.693023+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Cost.FunctionalEquation washburn_uniqueness_aczel

Relation between the paper passage and the cited Recognition theorem: unclear

SFT can significantly undermine subsequent RL by inducing "pseudo reasoning paths" imitated from expert models. While these paths may resemble the native reasoning paths of RL models, they often involve prolonged, hesitant, less informative steps, and incorrect reasoning.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 20 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Learning Agentic Policy from Action Guidance
cs.CL 2026-05 unverdicted novelty 7.0

ActGuide-RL uses human action data as plan-style guidance in mixed-policy RL to overcome exploration barriers in LLM agents, matching SFT+RL performance on search benchmarks without cold-start training.

CGC: Compositional Grounded Contrast for Fine-Grained Multi-Image Understanding
cs.CV 2026-04 unverdicted novelty 7.0

CGC improves fine-grained multi-image understanding in MLLMs by constructing contrastive training instances from existing single-image annotations and adding a rule-based spatial reward, achieving SOTA on MIG-Bench an...

Understanding the Role of Hallucination in Reinforcement Post-Training of Multimodal Reasoning Models
cs.LG 2026-04 unverdicted novelty 7.0

RL post-training on hallucination-forced multimodal data improves reasoning performance and can outperform standard training.

Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space
cs.CV 2025-12 unverdicted novelty 7.0

DMLR performs dynamic visual-textual interleaving in latent space using confidence-guided latent policy gradient optimization and a dynamic visual injection strategy, yielding improved multimodal reasoning on benchmarks.

Asking like Socrates: Socrates helps VLMs understand remote sensing images
cs.CV 2025-11 unverdicted novelty 7.0

RS-EoT uses a SocraticAgent self-play system and two-stage RL to train VLMs for genuine iterative reasoning and visual inspection on remote sensing VQA and grounding tasks, achieving SOTA results.

Latent Visual Reasoning
cs.CV 2025-09 unverdicted novelty 7.0

Latent Visual Reasoning enables autoregressive generation of latent visual states that reconstruct critical image tokens, yielding gains on perception-heavy VQA benchmarks such as 71.67% on MMVP.

Teacher-Guided Policy Optimization for LLM Distillation
cs.LG 2026-05 unverdicted novelty 6.0

TGPO improves on-policy LLM distillation by using teacher predictions conditioned on student rollouts to supply informative guidance when the two distributions diverge.

One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models
cs.LG 2026-04 unverdicted novelty 6.0

Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursion Model on ARC-AGI.

Characterizing Model-Native Skills
cs.AI 2026-04 conditional novelty 6.0

Recovering an orthogonal basis from model activations yields a model-native skill characterization that improves reasoning Pass@1 by up to 41% via targeted data selection and supports inference steering, outperforming...

Generalization in LLM Problem Solving: The Case of the Shortest Path
cs.AI 2026-04 unverdicted novelty 6.0

LLMs show strong spatial generalization to unseen maps in shortest-path tasks but fail length scaling due to recursive instability, with data coverage setting hard limits.

Watch Before You Answer: Learning from Visually Grounded Post-Training
cs.CV 2026-04 unverdicted novelty 6.0

Filtering post-training data to visually grounded questions improves VLM video understanding performance by up to 6.2 points using 69% of the data.

Teaching an Agent to Sketch One Part at a Time
cs.AI 2026-03 unverdicted novelty 6.0

A multi-modal LM agent is trained to produce vector sketches part-by-part via supervised fine-tuning and process-reward RL on the new ControlSketch-Part dataset with automatic part annotations.

DeepEyesV2: Toward Agentic Multimodal Model
cs.CV 2025-11 unverdicted novelty 6.0

DeepEyesV2 uses a two-stage cold-start plus reinforcement learning pipeline to produce an agentic multimodal model that adaptively invokes tools and outperforms direct RL on real-world reasoning benchmarks.

WebSailor: Navigating Super-human Reasoning for Web Agent
cs.CL 2025-07 conditional novelty 6.0

WebSailor trains open-source web agents to match proprietary performance on complex information-seeking tasks by generating high-uncertainty scenarios and using a new RL method called DUPO.

How You Begin is How You Reason: Driving Exploration in RLVR via Prefix-Tuned Priors
cs.AI 2026-05 unverdicted novelty 5.0

IMAX trains soft prefixes with an InfoMax reward to drive diverse exploration in RLVR, yielding up to 11.60% gains in Pass@4 over standard RLVR across model scales.

Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models
cs.CV 2026-04 unverdicted novelty 5.0

HDPO reframes tool efficiency as a conditional objective within accurate trajectories, enabling Metis to reduce tool invocations by orders of magnitude while raising reasoning accuracy.

TeamPath: Building MultiModal Pathology Experts with Reasoning AI Copilots
q-bio.QM 2025-11 unverdicted novelty 5.0

TeamPath introduces a reinforcement-learning-powered multimodal AI copilot for pathology that generates reasoned diagnoses and integrates image and transcriptomic data.

Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode Selection for General Visual Reasoning
cs.AI 2025-09 unverdicted novelty 5.0

MoVT unifies different visual reasoning modes in a single model and uses the AdaVaR two-stage framework with supervised cold-start and RL via AdaGRPO to enable context-adaptive mode selection, yielding consistent gain...

Failure Makes the Agent Stronger: Enhancing Accuracy through Structured Reflection for Reliable Tool Interactions
cs.CV 2025-09 unverdicted novelty 5.0

Structured reflection makes error diagnosis and repair an explicit trainable step that improves reliability and reduces redundant calls in tool-using LLM agents.

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence
cs.AI 2025-07 accept novelty 4.0

The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
