pith. sign in

arxiv: 2606.16590 · v2 · pith:WV4SDCLHnew · submitted 2026-06-15 · 💻 cs.LG · cs.AI· q-bio.NC

Infant Spontaneous Movement Noise Improves Exploration in Deep RL

Pith reviewed 2026-06-27 04:15 UTC · model grok-4.3

classification 💻 cs.LG cs.AIq-bio.NC
keywords reinforcement learningexploration noiseinfant spontaneous movementscolored noisedeep RLdevelopmental trajectoriesaction noisespectral exponent
0
0 comments X

The pith

Progressively increasing the correlation of action noise to match infant movement statistics improves exploration efficiency in deep RL.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether exploration noise drawn from the statistics of infant spontaneous movements can outperform standard white noise in deep reinforcement learning. Infant end-effector velocity data show colored noise whose spectral exponent rises with age; the authors implement a schedule that gradually increases the temporal correlation of the agent's action noise to follow the same trajectory. Across multiple RL environments this produces more structured trajectories and faster learning than fixed white or colored noise baselines. The work treats human motor development as a source of inductive bias for artificial exploration mechanisms.

Core claim

By calibrating the power spectral density exponent of action noise to increase over the course of training in the same manner observed in infants' end-effector velocities, the resulting temporally correlated noise generates exploratory behavior with improved state-space coverage and higher learning efficiency than temporally uncorrelated white noise or static colored noise in several standard RL benchmarks.

What carries the argument

A training-time schedule that raises the spectral exponent of the exploration noise process to replicate the developmental increase measured in infant end-effector velocity power spectra.

If this is right

  • Structured exploratory trajectories emerge from the time-varying colored noise.
  • Learning curves improve relative to conventional white-noise and fixed colored-noise strategies.
  • The same noise schedule works across multiple distinct RL environments.
  • Human developmental statistics can supply concrete design choices for agent learning mechanisms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same developmental-noise principle could be tested in continuous-control robotic tasks where state-space coverage is costly.
  • Alternative developmental trajectories drawn from other motor or cognitive milestones might yield further efficiency gains.
  • If the benefit scales with task complexity, the approach would be most valuable in high-dimensional or sparse-reward settings.

Load-bearing premise

The specific age-dependent increase in spectral exponent seen in human infant movements supplies a useful inductive bias for reinforcement-learning exploration rather than being incidental to biology.

What would settle it

Re-running the RL experiments with a constant spectral exponent (no developmental increase) and obtaining equal or worse sample efficiency than white-noise baselines would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.16590 by and Jochen Triesch, Francisco Cruz, Francisco M. L\'opez, Markus R. Ernst, Matej Hoffmann.

Figure 1
Figure 1. Figure 1: Overview of our work. Spontaneous movement velocities are extracted from videos of infants and analyzed to obtain [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Behavioral results. A: Individual PSDs obtained from [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Dynamics of different noise colors. Top: Examples of short time series generated through spectral shaping with noise [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Aggregated RL results. Violins and error bars indicate [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
read the original abstract

Exploration in deep reinforcement learning (RL) is commonly implemented as temporally uncorrelated white noise. However, recent works show that temporally correlated colored noise can improve exploration efficiency by producing smooth trajectories with better coverage of the state space. We inquire whether action noise inspired by infant spontaneous movements can also improve exploration in deep RL. We find that the power spectral densities of babies' end-effector velocities follow a colored noise process where the spectral exponent increases with age. Inspired by this developmental pattern, we introduce a mechanism that progressively increases the temporal auto-correlation of exploration noise during RL training, matching the infant statistics. Experiments across several RL environments show that infant-inspired noise produces structured exploratory behavior and can improve learning efficiency compared to conventional exploration strategies. These findings suggest that human motor and cognitive development can provide useful guidance for designing learning mechanisms in artificial agents. Our code is available at https://github.com/trieschlab/baby-noise-rl.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes an exploration mechanism for deep RL in which action noise is generated as colored noise whose spectral exponent (and thus temporal auto-correlation) is progressively increased during training to match the developmental trajectory observed in the power spectral densities of infant end-effector velocities. Experiments across several RL environments are reported to demonstrate that this infant-inspired schedule yields structured exploratory trajectories and improved sample efficiency relative to standard white-noise or fixed colored-noise baselines.

Significance. If the reported gains are robust and the infant trajectory is shown to be necessary, the work would indicate that developmental statistics from human motor behavior can supply a useful inductive bias for exploration design in artificial agents. The public release of code supports reproducibility.

major comments (1)
  1. [Experiments] Experiments section: the central claim that matching the specific infant developmental increase in spectral exponent supplies a useful inductive bias requires evidence that this trajectory outperforms simpler monotonic schedules. No comparison is presented against a linear ramp of correlation, a fixed colored noise with the same average exponent, or a random schedule with matched average correlation; without these controls the infant data are not shown to be load-bearing.
minor comments (1)
  1. Abstract: quantitative performance deltas, statistical tests, and environment details are absent; these should be added so readers can immediately assess the magnitude of the reported improvement.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their constructive feedback and for recognizing the potential significance of the work. We address the major comment below.

read point-by-point responses
  1. Referee: [Experiments] Experiments section: the central claim that matching the specific infant developmental increase in spectral exponent supplies a useful inductive bias requires evidence that this trajectory outperforms simpler monotonic schedules. No comparison is presented against a linear ramp of correlation, a fixed colored noise with the same average exponent, or a random schedule with matched average correlation; without these controls the infant data are not shown to be load-bearing.

    Authors: We agree that the manuscript would be strengthened by direct comparisons demonstrating that the specific infant developmental trajectory is necessary for the reported benefits, rather than any monotonic increase or matched-average schedule. In the revised version we will add these controls: a linear ramp of the spectral exponent, fixed colored noise using the average exponent from the infant data, and a randomized schedule with matched average correlation. The new results will be reported in the Experiments section. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation stands independent of biological inspiration

full rationale

The paper introduces a noise schedule whose parameters are taken directly from external infant movement measurements and then tests the resulting agent performance against standard baselines in multiple RL environments. No derivation, fitted parameter, or uniqueness theorem is presented whose output is definitionally identical to its input. The central claim (improved exploration efficiency) is evaluated by direct comparison to conventional strategies rather than by any self-referential construction. Self-citations, if present, are not load-bearing for the result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the empirical observation that infant end-effector velocity spectra follow a colored-noise process whose exponent increases with age; this observation is treated as an external fact imported into the RL design.

axioms (1)
  • domain assumption Infant spontaneous movements produce end-effector velocities whose power spectral density follows a colored noise process with age-dependent spectral exponent
    Stated directly in the abstract as the basis for the noise schedule

pith-pipeline@v0.9.1-grok · 5703 in / 1164 out tokens · 34872 ms · 2026-06-27T04:15:39.219778+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

30 extracted references

  1. [1]

    Helpless infants are learning a foundation model,

    R. Cusack, M. Ranzato, and C. J. Charvet, “Helpless infants are learning a foundation model,”Trends in Cognitive Sciences, vol. 28, no. 8, pp. 726–738, 2024

  2. [2]

    Continuous control with deep reinforcement learning,

    T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. M. O. Heess, T. Erez, Y . Tassa, D. Silver, and D. P. Wierstra, “Continuous control with deep reinforcement learning,” in4th International Conference on Learning Representation, 2016. TABLE I: Average AUC scores for different algorithms and environments, z-scored across multiple noises and seeds. Noise types com...

  3. [3]

    Addressing function approxi- mation error in actor-critic methods,

    S. Fujimoto, H. Hoof, and D. Meger, “Addressing function approxi- mation error in actor-critic methods,” inInternational conference on machine learning. PMLR, 2018, pp. 1587–1596

  4. [4]

    Soft actor-critic: Off- policy maximum entropy deep reinforcement learning with a stochastic actor,

    T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off- policy maximum entropy deep reinforcement learning with a stochastic actor,” inInternational conference on machine learning. PMLR, 2018, pp. 1861–1870

  5. [5]

    Colored noise in dynamical systems,

    P. H ¨aunggi and P. Jung, “Colored noise in dynamical systems,” Advances in chemical physics, vol. 89, pp. 239–326, 1994

  6. [6]

    Sample-efficient cross-entropy method for real-time planning,

    C. Pinneri, S. Sawant, S. Blaes, J. Achterhold, J. Stueckler, M. Rolinek, and G. Martius, “Sample-efficient cross-entropy method for real-time planning,” inConference on Robot Learning. PMLR, 2021, pp. 1049– 1065

  7. [7]

    Action noise in off-policy deep reinforcement learning: Impact on exploration and performance,

    J. Hollenstein, S. Auddy, M. Saveriano, E. Renaudo, and J. Piater, “Action noise in off-policy deep reinforcement learning: Impact on exploration and performance,”Transactions on Machine Learning Research, 2022

  8. [8]

    Pink noise is all you need: Colored noise exploration in deep reinforcement learning,

    O. Eberhard, J. Hollenstein, C. Pinneri, and G. Martius, “Pink noise is all you need: Colored noise exploration in deep reinforcement learning,” inThe Eleventh International Conference on Learning Representations (ICLR), 2023

  9. [9]

    Colored noise in ppo: Improved exploration and performance through correlated action sampling,

    J. J. Hollenstein, G. Martius, and J. H. Piater, “Colored noise in ppo: Improved exploration and performance through correlated action sampling,” inProceedings of the AAAI Conference on Artificial Intelligence, 2024, pp. 12 466–12 472

  10. [10]

    Pink noise LQR: How does colored noise affect the optimal policy in RL?

    J. Hollenstein, M. Zaric, S. Tosatto, and J. Piater, “Pink noise LQR: How does colored noise affect the optimal policy in RL?” inICML 2024 Workshop: Foundations of Reinforcement Learning and Control– Connections and Perspectives, 2024

  11. [11]

    Coordination, control and skill,

    K. M. Newell, “Coordination, control and skill,” inAdvances in psychology. Elsevier, 1985, vol. 27, pp. 295–317

  12. [12]

    Stride- to-stride time intervals are independently affected by the temporal pattern and probability distribution of visual cues,

    P. C. Raffalt, J. H. Sommerfeld, N. Stergiou, and A. D. Likens, “Stride- to-stride time intervals are independently affected by the temporal pattern and probability distribution of visual cues,”Neuroscience letters, vol. 792, p. 136909, 2023

  13. [13]

    Frequency-specific fractal analysis of postural control accounts for control strategies,

    P. Gilfriche, V . Deschodt-Arsac, E. Blons, and L. M. Arsac, “Frequency-specific fractal analysis of postural control accounts for control strategies,”Frontiers in physiology, vol. 9, p. 293, 2018

  14. [14]

    Scale-free brain activity: past, present, and future,

    B. J. He, “Scale-free brain activity: past, present, and future,”Trends in cognitive sciences, vol. 18, no. 9, pp. 480–487, 2014

  15. [15]

    Thelen and L

    E. Thelen and L. B. Smith,A dynamic systems approach to the development of cognition and action. MIT press, 1994

  16. [16]

    Development as a dynamic system,

    L. B. Smith and E. Thelen, “Development as a dynamic system,” Trends in cognitive sciences, vol. 7, no. 8, pp. 343–348, 2003

  17. [17]

    Early human motor development: From variation to the ability to vary and adapt,

    M. Hadders-Algra, “Early human motor development: From variation to the ability to vary and adapt,”Neuroscience & Biobehavioral Reviews, vol. 90, pp. 411–427, 2018

  18. [18]

    Intrinsic motivation systems for autonomous mental development,

    P.-Y . Oudeyer, F. Kaplan, and V . V . Hafner, “Intrinsic motivation systems for autonomous mental development,”IEEE transactions on evolutionary computation, vol. 11, no. 2, pp. 265–286, 2007

  19. [19]

    Lessons from infant learning for unsupervised machine learning,

    L. Zaadnoordijk, T. R. Besold, and R. Cusack, “Lessons from infant learning for unsupervised machine learning,”Nature Machine Intelligence, vol. 4, no. 6, pp. 510–520, 2022

  20. [20]

    Time to aug- ment self-supervised visual representation learning,

    A. Aubret, M. R. Ernst, C. Teuli `ere, and J. Triesch, “Time to aug- ment self-supervised visual representation learning,” inThe Eleventh International Conference on Learning Representations (ICLR), 2023

  21. [21]

    Openpose: Realtime multi-person 2d pose estimation using part affinity fields,

    Z. Cao, G. Hidalgo Martinez, T. Simon, S. Wei, and Y . A. Sheikh, “Openpose: Realtime multi-person 2d pose estimation using part affinity fields,”IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019

  22. [22]

    On generating power law noise

    J. Timmer and M. Koenig, “On generating power law noise.”Astron- omy and Astrophysics, v. 300, p. 707, vol. 300, p. 707, 1995

  23. [23]

    The nature and perception of fluctuations in human musical rhythms,

    H. Hennig, R. Fleischmann, A. Fredebohm, Y . Hagmayer, J. Nagler, A. Witt, F. J. Theis, and T. Geisel, “The nature and perception of fluctuations in human musical rhythms,”PloS one, vol. 6, no. 10, p. e26457, 2011

  24. [24]

    Long-range correlations in human standing,

    M. Duarte and V . M. Zatsiorsky, “Long-range correlations in human standing,”Physics Letters A, vol. 283, no. 1-2, pp. 124–128, 2001

  25. [25]

    Stable-baselines3: Reliable reinforcement learning im- plementations,

    A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, and N. Dormann, “Stable-baselines3: Reliable reinforcement learning im- plementations,”Journal of Machine Learning Research, vol. 22, no. 268, pp. 1–8, 2021

  26. [26]

    Gymnasium: A standard interface for reinforcement learning environments,

    M. Towers, A. Kwiatkowski, J. U. Balis, G. D. Cola, T. Deleu, M. Goul ˜ao, K. Andreas, M. Krimmel, A. KG, R. D. L. Perez- Vicente, J. K. Terry, A. Pierr ´e, S. V . Schulhoff, J. J. Tai, H. Tan, and O. G. Younis, “Gymnasium: A standard interface for reinforcement learning environments,” inThe Thirty-ninth Annual Conference on Neural Information Processing ...

  27. [27]

    ‘Helpless’ infants are active, goal-directed agents: response to Cusack et al

    M. Zettersten, R. Foushee, and M. K. Goddu, “‘Helpless’ infants are active, goal-directed agents: response to Cusack et al.”Trends in Cognitive Sciences, 2025

  28. [28]

    MIMo: A multimodal infant model for studying cognitive development,

    D. Mattern, P. Schumacher, F. M. L ´opez, M. C. Raabe, M. R. Ernst, A. Aubret, and J. Triesch, “MIMo: A multimodal infant model for studying cognitive development,”IEEE Transactions on Cognitive and Developmental Systems, vol. 16, no. 4, pp. 1291–1301, 2024

  29. [29]

    MIMo grows! simulating body and sensory development in a mul- timodal infant model,

    F. M. L ´opez, M. Lenz, M. G. Fedozzi, A. Aubret, and J. Triesch, “MIMo grows! simulating body and sensory development in a mul- timodal infant model,” in2025 IEEE International Conference on Development and Learning (ICDL). IEEE, 2025, pp. 1–6

  30. [30]

    Embodiment shapes rolling behavior in a multimodal infant model,

    L. Philipp, F. M. L ´opez, and J. Triesch, “Embodiment shapes rolling behavior in a multimodal infant model,” in2026 IEEE International Conference on Development and Learning (ICDL). IEEE, 2026, pp. 1–7