pith. machine review for the scientific record.

arxiv: 2604.10351 · v2 · submitted 2026-04-11 · 💻 cs.RO

Recognition: unknown

Trajectory-based actuator identification via differentiable simulation


Pith reviewed 2026-05-10 15:18 UTC · model grok-4.3

classification 💻 cs.RO
keywords actuator identification · differentiable simulation · system identification · robot dynamics · sim-to-real transfer · trajectory optimization · PD control

The pith

Differentiable simulation recovers accurate actuator models from joint trajectories without torque sensing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper establishes that actuator dynamics can be identified accurately by matching observed joint motions in a differentiable simulator, using only encoder data and no torque or current measurements. The approach optimizes both actuator parameters and simulator settings by backpropagating trajectory errors, supporting everything from simple parametric models to neural networks. A reader would care because high-fidelity actuation models are essential for closing the simulation-to-reality gap in robotics, where poor models lead to policies that fail on hardware. On real high-gear-ratio actuators with embedded controllers, this yields position errors nearly half those of a baseline trained on dedicated test-stand data. The method also improves downstream tasks, such as training locomotion policies that travel farther and stay straighter.

Core claim

We formulate actuator identification as an optimization problem that minimizes the mismatch between simulated and measured joint trajectories by differentiating through the simulator. This torque-sensor-free procedure recovers parameters for a high-gear-ratio actuator that achieve a mean absolute position error of 7.54 mrad on held-out real trajectories, compared to 14.20 mrad for a supervised baseline. When these models are used to train locomotion policies, the robot travels 46% farther with 75% less rotational deviation.
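One way to write the trajectory-matching objective described above is sketched below. The notation is an assumption made for illustration: $\theta$ stands for actuator parameters, $\phi$ for simulator parameters, $f_\phi$ for one simulator step, $\tau_\theta$ for the actuator model, and $w_v$ for a velocity weight. The squared-error form is also assumed; the page reports mean absolute position error as the evaluation metric and does not state the training loss.

```latex
\min_{\theta,\,\phi}\; \mathcal{L}(\theta,\phi)
  = \frac{1}{T}\sum_{t=1}^{T}
    \Big( \big\| q^{\mathrm{sim}}_t(\theta,\phi) - q^{\mathrm{meas}}_t \big\|^2
        + w_v \,\big\| \dot q^{\mathrm{sim}}_t(\theta,\phi) - \dot q^{\mathrm{meas}}_t \big\|^2 \Big),
\qquad
\big(q^{\mathrm{sim}}_{t+1},\, \dot q^{\mathrm{sim}}_{t+1}\big)
  = f_\phi\!\Big(q^{\mathrm{sim}}_t,\, \dot q^{\mathrm{sim}}_t,\,
      \tau_\theta\big(q^{\mathrm{cmd}}_t,\, q^{\mathrm{sim}}_t,\, \dot q^{\mathrm{sim}}_t\big)\Big)
```

Because each simulated state depends on all earlier states, the gradient of $\mathcal{L}$ with respect to $\theta$ and $\phi$ has to be propagated through the whole rollout, which is what differentiating through the simulator provides.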

What carries the argument

Gradient-based optimization of actuator parameters by backpropagating through a differentiable dynamics simulator, using only position and velocity trajectory data.
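A toy version of this idea can be sketched in plain Python. Everything here is an assumption made for illustration, not the paper's implementation: the 1-DoF joint model, the lumped parameters `kp` (effective position gain) and `b` (lumped damping), the known inertia, and the synthetic "measured" trajectory. In place of reverse-mode automatic differentiation, the sketch propagates forward sensitivities of the position by hand, and it uses a damped Gauss-Newton update rather than whatever optimizer the authors use.

```python
def simulate(kp, b, cmd, inertia=0.05, dt=0.002):
    """Roll out a 1-DoF joint and the sensitivities of q to (kp, b)."""
    q = v = 0.0
    dq = [0.0, 0.0]  # d q / d(kp, b)
    dv = [0.0, 0.0]  # d v / d(kp, b)
    qs, sens = [], []
    for u in cmd:
        acc = (kp * (u - q) - b * v) / inertia
        # Derivatives of acc, evaluated at the state before the update.
        dacc = [((u - q) - kp * dq[0] - b * dv[0]) / inertia,
                (-v - kp * dq[1] - b * dv[1]) / inertia]
        v += dt * acc            # semi-implicit Euler step
        q += dt * v
        for i in range(2):       # same step applied to the sensitivities
            dv[i] += dt * dacc[i]
            dq[i] += dt * dv[i]
        qs.append(q)
        sens.append((dq[0], dq[1]))
    return qs, sens


def loss_and_system(theta, cmd, q_meas):
    """Mean squared position error, its gradient, and the Gauss-Newton matrix."""
    qs, sens = simulate(theta[0], theta[1], cmd)
    n = len(qs)
    loss, g, h = 0.0, [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]
    for q, s, qm in zip(qs, sens, q_meas):
        r = q - qm
        loss += r * r / n
        for i in range(2):
            g[i] += 2.0 * r * s[i] / n
            for j in range(2):
                h[i][j] += 2.0 * s[i] * s[j] / n
    return loss, g, h


def fit(theta, cmd, q_meas, iters=50):
    """Damped Gauss-Newton; a step is kept only if it lowers the loss."""
    lam = 1e-3
    loss, g, h = loss_and_system(theta, cmd, q_meas)
    for _ in range(iters):
        a11, a22, a12 = h[0][0] * (1 + lam), h[1][1] * (1 + lam), h[0][1]
        det = a11 * a22 - a12 * a12
        step = [(-g[0] * a22 + g[1] * a12) / det,
                (g[0] * a12 - g[1] * a11) / det]
        trial = [theta[0] + step[0], theta[1] + step[1]]
        new_loss, new_g, new_h = loss_and_system(trial, cmd, q_meas)
        if new_loss < loss:
            theta, loss, g, h = trial, new_loss, new_g, new_h
            lam *= 0.5
        else:
            lam *= 4.0
    return theta, loss


# Synthetic data: step commands and a trajectory generated with kp=20, b=1.5.
cmd = [0.5 if (k // 500) % 2 == 0 else -0.3 for k in range(1500)]
q_meas, _ = simulate(20.0, 1.5, cmd)
theta0 = [12.0, 1.0]
loss0, _, _ = loss_and_system(theta0, cmd, q_meas)
theta, loss = fit(theta0, cmd, q_meas)
print(theta, loss0, loss)
```

Only positions enter the loss; no torque signal is used anywhere. In this toy setting the inertia is fixed, which is what makes the two remaining parameters recoverable from motion.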

If this is right

  • Reduces mean absolute position error on held-out trajectories from 14.20 mrad to 7.54 mrad.
  • Increases travel distance by 46% in real-robot locomotion experiments.
  • Reduces rotational deviation by 75% relative to policies trained with baseline actuator models.
  • Works with both structured parameterizations and neural actuator mappings in one pipeline.
  • Requires no torque sensors, current measurements, or internal controller access.
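The headline error figures can be cross-checked with a few lines of arithmetic. The two error values come from the abstract; the helper name `mrad_to_deg` is introduced here only for illustration.

```python
import math

baseline_mrad = 14.20    # supervised stand-trained baseline (abstract)
identified_mrad = 7.54   # best trajectory-based model (abstract)

improvement_factor = baseline_mrad / identified_mrad
relative_reduction = 1.0 - identified_mrad / baseline_mrad


def mrad_to_deg(x_mrad):
    """Convert milliradians to degrees."""
    return math.degrees(x_mrad / 1000.0)


print(f"{improvement_factor:.2f}x")   # 1.88x, matching the abstract
print(f"{relative_reduction:.1%}")    # 46.9%
print(mrad_to_deg(identified_mrad), mrad_to_deg(baseline_mrad))
```

The 1.88 factor corresponds to a reduction of about 47%, consistent with the summary's "nearly half".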

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This method might allow continuous online adaptation of actuator models during robot operation using only its normal movements.
  • Similar differentiable identification could apply to other components like sensors or linkages from motion data.
  • Combining this with reinforcement learning could create end-to-end trainable simulation environments that improve policy transfer.
  • The reduction in error suggests that trajectory diversity matters more than steady-state torque data for capturing dynamic actuator behavior.

Load-bearing premise

Errors in simulated joint trajectories are sufficient to uniquely recover the true actuator dynamics without direct torque or internal state measurements.

What would settle it

If an identified model matched held-out motion well yet diverged significantly from independent torque measurements on the same actuator, that would falsify the claim that motion alone uniquely recovers the actuator dynamics.
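A minimal sketch of how such a failure could look, under assumptions that are not taken from the paper: a 1-DoF joint with inertia, an embedded PD controller (`kp`, `kd`), and viscous friction. Scaling all four parameters by the same factor leaves the motion unchanged while doubling the controller torque, so encoder data alone cannot tell the two models apart.

```python
def rollout(inertia, kp, kd, friction, cmd, dt=0.002):
    """Simulate a 1-DoF joint; return positions and PD torques."""
    q = v = 0.0
    qs, taus = [], []
    for u in cmd:
        tau = kp * (u - q) - kd * v             # torque from the embedded PD
        acc = (tau - friction * v) / inertia
        v += dt * acc                           # semi-implicit Euler step
        q += dt * v
        qs.append(q)
        taus.append(tau)
    return qs, taus


cmd = [0.5 if (k // 500) % 2 == 0 else -0.3 for k in range(1500)]

# Hypothetical parameter sets: model_b is model_a with every value doubled.
model_a = dict(inertia=0.05, kp=20.0, kd=1.0, friction=0.5)
model_b = {name: 2.0 * value for name, value in model_a.items()}

qa, ta = rollout(cmd=cmd, **model_a)
qb, tb = rollout(cmd=cmd, **model_b)

max_q_gap = max(abs(x - y) for x, y in zip(qa, qb))
max_tau_gap = max(abs(x - y) for x, y in zip(ta, tb))
print(max_q_gap, max_tau_gap)
```

Here the position gap is negligible while the torque gap is large. A real actuator adds nonlinear effects that may break this symmetry, so the sketch shows only that the ambiguity is possible, not that it occurs in the paper's setup.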

Figures

Figures reproduced from arXiv: 2604.10351 by Egor Davydenko, Ekaterina Chaikovskaia, Roman Gorbachev, Vyacheslav Kovalev.

[Figures 1 to 9: images omitted]
Original abstract

Accurate actuation models are critical for bridging the gap between simulation and real robot behavior, yet obtaining high-fidelity actuator dynamics typically requires dedicated test stands and torque sensing. We present a trajectory-based actuator identification method that uses differentiable simulation to fit system-level actuator models from encoder motion alone. Identification is posed as a trajectory-matching problem: given commanded joint positions and measured joint angles and velocities, we optimize actuator and simulator parameters by backpropagating through the simulator, without torque sensors, current/voltage measurements, or access to embedded motor-control internals. The framework supports multiple model classes, ranging from compact structured parameterizations to neural actuator mappings, within a unified optimization pipeline. On held-out real-robot trajectories for a high-gear-ratio actuator with an embedded PD controller, the proposed torque-sensor-free identification achieves much tighter trajectory alignment than a supervised stand-trained baseline dominated by steady-state data, reducing mean absolute position error from 14.20 mrad to as low as 7.54 mrad (1.88 times). Finally, we demonstrate downstream impact for the same actuator class in a real-robot locomotion study: training policies with the refined actuator model increases travel distance by 46% and reduces rotational deviation by 75% relative to the baseline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a trajectory-based actuator identification technique that leverages differentiable simulation to optimize actuator and simulator parameters by minimizing the discrepancy between simulated and observed joint trajectories using only encoder data. It reports improved held-out trajectory matching for a high-gear-ratio actuator with embedded PD control, reducing mean absolute position error from 14.20 mrad to 7.54 mrad, and shows that policies trained with the identified model achieve 46% greater travel distance and 75% less rotational deviation in real-robot locomotion experiments.

Significance. Should the approach reliably recover actuator dynamics rather than merely overfitting to specific closed-loop trajectories, it would offer a practical alternative to sensor-intensive identification methods, facilitating more accurate simulation-based policy training without specialized hardware. The inclusion of downstream real-world validation strengthens the potential impact for robotics applications involving sim-to-real transfer.

major comments (2)
  1. [Abstract] Abstract: The central claim of torque-sensor-free 'actuator identification' rests on trajectory matching via backpropagation through the simulator. However, for high-gear-ratio actuators with embedded PD controllers, the optimization of parameters (friction, stiffness, damping, effective gains) to match position/velocity trajectories alone does not establish uniqueness; multiple parameter sets can reproduce the same closed-loop behavior, so the held-out error reduction (14.20 mrad to 7.54 mrad) demonstrates improved fitting but not recovery of true open-loop dynamics.
  2. [§4 (Experiments)] §4 (Experiments): The comparison to the supervised stand-trained baseline is undermined by the baseline being 'dominated by steady-state data'; without explicit details on data splitting, trajectory distribution matching, or whether the baseline had access to equivalent dynamic trajectories, it is unclear whether the reported gains are attributable to the differentiable method or to differences in training data coverage.
minor comments (2)
  1. [Abstract] Abstract and §3 (Method): The manuscript does not report statistical significance (e.g., standard errors or p-values) for the error reductions or the 46%/75% policy improvements, nor does it specify convergence criteria or initialization strategy for the parameter optimization.
  2. [§3 (Method)] §3 (Method): While the framework is said to support multiple model classes, the exact parameterization of the structured models (e.g., how friction and damping terms are defined) and the neural mapping architecture are not detailed enough for full reproducibility.
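The uncertainty reporting requested in the first minor comment could take a form like the sketch below: a per-trajectory mean absolute error, then a mean and standard error across held-out trajectories. The function names and the toy numbers are hypothetical; the paper's evaluation code is not shown on this page.

```python
import math
import statistics


def trajectory_mae(q_sim, q_meas):
    """Mean absolute position error over one trajectory."""
    if len(q_sim) != len(q_meas) or not q_sim:
        raise ValueError("trajectories must be non-empty and equal length")
    return sum(abs(s - m) for s, m in zip(q_sim, q_meas)) / len(q_sim)


def mae_with_standard_error(pairs):
    """Mean and standard error of per-trajectory MAE across trajectories."""
    maes = [trajectory_mae(q_sim, q_meas) for q_sim, q_meas in pairs]
    mean = statistics.mean(maes)
    if len(maes) < 2:
        return mean, float("nan")  # standard error undefined for one sample
    return mean, statistics.stdev(maes) / math.sqrt(len(maes))


# Toy data: two held-out trajectories of two samples each (radians).
pairs = [([0.0, 0.0], [0.01, 0.03]),
         ([0.0, 0.0], [0.02, 0.06])]
mean_mae, se_mae = mae_with_standard_error(pairs)
print(mean_mae, se_mae)
```

Treating each trajectory as one sample avoids counting strongly correlated time steps as independent observations.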

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We address each major comment point-by-point below, clarifying the scope of our claims and committing to revisions that strengthen the manuscript's presentation of data details and limitations.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim of torque-sensor-free 'actuator identification' rests on trajectory matching via backpropagation through the simulator. However, for high-gear-ratio actuators with embedded PD controllers, the optimization of parameters (friction, stiffness, damping, effective gains) to match position/velocity trajectories alone does not establish uniqueness; multiple parameter sets can reproduce the same closed-loop behavior, so the held-out error reduction (14.20 mrad to 7.54 mrad) demonstrates improved fitting but not recovery of true open-loop dynamics.

    Authors: We agree that the optimization yields parameters that reproduce observed closed-loop trajectories and does not guarantee uniqueness or recovery of the underlying open-loop dynamics; this is an inherent limitation of torque-sensor-free identification from position/velocity data alone. Our contribution focuses on practical sim-to-real utility: the resulting model improves held-out trajectory prediction and, more importantly, yields policies with 46% greater real-world travel distance. We will revise the abstract and §1 to explicitly state that the method targets closed-loop trajectory fidelity rather than claiming recovery of true open-loop parameters. revision: yes

  2. Referee: [§4 (Experiments)] §4 (Experiments): The comparison to the supervised stand-trained baseline is undermined by the baseline being 'dominated by steady-state data'; without explicit details on data splitting, trajectory distribution matching, or whether the baseline had access to equivalent dynamic trajectories, it is unclear whether the reported gains are attributable to the differentiable method or to differences in training data coverage.

    Authors: We acknowledge that additional details are required to substantiate the baseline comparison. The stand-trained baseline used data from a dedicated test stand that included dynamic trajectories, but the collection protocol resulted in a distribution skewed toward steady-state conditions. Our method leverages a wider range of operational dynamic trajectories. In the revision we will expand §4 with explicit descriptions of data collection protocols, train/validation splits, trajectory statistics (e.g., velocity histograms), and coverage metrics for both datasets to enable direct assessment of whether performance differences arise from the identification method or data distribution. revision: yes
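The coverage statistics promised in this response could be computed along the lines below. This is a sketch under stated assumptions: the 0.05 rad/s steady-state threshold, the bin edges, and both function names are hypothetical, and the velocity lists are made-up toy values rather than data from the paper.

```python
def steady_state_fraction(velocities, threshold=0.05):
    """Fraction of samples whose |velocity| is below threshold (rad/s)."""
    if not velocities:
        raise ValueError("empty velocity log")
    return sum(1 for v in velocities if abs(v) < threshold) / len(velocities)


def speed_histogram(velocities, edges):
    """Counts of |velocity| in half-open bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in velocities:
        speed = abs(v)
        for i in range(len(counts)):
            if edges[i] <= speed < edges[i + 1]:
                counts[i] += 1
                break  # samples outside all bins are ignored
    return counts


# Toy logs: a stand dataset that is mostly at rest and a more dynamic one.
stand_velocities = [0.0, 0.01, -0.02, 0.0, 1.0]
robot_velocities = [0.5, -1.2, 0.0, 2.0]
edges = [0.0, 0.05, 1.0, 3.0]

print(steady_state_fraction(stand_velocities), speed_histogram(stand_velocities, edges))
print(steady_state_fraction(robot_velocities), speed_histogram(robot_velocities, edges))
```

Reporting these numbers for both datasets side by side would let a reader judge how much of the performance gap could come from data coverage rather than from the identification method.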

Circularity Check

0 steps flagged

No circularity: empirical trajectory fitting with held-out validation

Full rationale

The paper frames actuator identification as an optimization problem that minimizes position/velocity trajectory mismatch by back-propagating through a differentiable simulator. Parameters are fitted on training trajectories and evaluated on held-out real-robot data, with further downstream policy training results reported on physical hardware. This is a standard data-driven fitting procedure whose reported error reductions (e.g., 14.20 mrad to 7.54 mrad) and locomotion improvements are measured against an external baseline on independent test trajectories. No step reduces by definition to its own inputs, no fitted quantity is relabeled as a prediction, and no load-bearing self-citation chain is invoked. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the simulator being differentiable and on trajectory data being informative enough to constrain actuator parameters; no new physical entities are introduced.

free parameters (2)
  • actuator model parameters
    Optimized via backpropagation to match observed trajectories; specific values not stated in abstract.
  • simulator parameters
    Jointly optimized with actuator parameters in the unified pipeline.
axioms (2)
  • domain assumption The simulation dynamics are differentiable with respect to actuator and simulator parameters
    Required to enable gradient-based optimization through the simulator.
  • domain assumption Observed joint trajectories contain sufficient excitation to identify the actuator dynamics
    Necessary for the trajectory-matching objective to recover meaningful parameters without torque data.


