A Proprioceptive-Only Benchmark for Quadruped State Estimation: ATE, RPE, and Runtime Trade-offs Between Filters and Smoothers
Pith reviewed 2026-05-13 00:58 UTC · model grok-4.3
The pith
IEKF and invariant smoother achieve lower long-term trajectory error than MUSE on proprioceptive quadruped data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
On the CYN-1 sequence, the relative pose errors remain broadly similar across MUSE, the invariant extended Kalman filter, and the invariant smoother. The invariant extended Kalman filter and invariant smoother produce lower absolute trajectory error than MUSE. Computation times per update differ among the three, making the accuracy versus latency trade-offs visible when all methods run on the same fixed hardware and software stack.
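For readers less familiar with the two metrics, they can be written as follows, using the standard formulation from the RGB-D SLAM benchmark of Sturm et al. [10], which appears in the paper's reference list. This is a sketch only: the symbols ($P_i$ for estimated poses, $Q_i$ for reference poses, $S$ for the rigid alignment, $\Delta$ for the RPE interval) are notation introduced here, and the exact alignment and interval used in the paper are not stated on this page.

```latex
% Absolute Trajectory Error: compare aligned absolute poses
F_i = Q_i^{-1} S P_i, \qquad
\mathrm{ATE}_{\mathrm{RMSE}} =
  \left( \frac{1}{n} \sum_{i=1}^{n} \left\| \operatorname{trans}(F_i) \right\|^2 \right)^{1/2}

% Relative Pose Error: compare relative motion over a window of \Delta poses
E_i = \left( Q_i^{-1} Q_{i+\Delta} \right)^{-1} \left( P_i^{-1} P_{i+\Delta} \right), \qquad
\mathrm{RPE}_{\mathrm{RMSE}}(\Delta) =
  \left( \frac{1}{m} \sum_{i=1}^{m} \left\| \operatorname{trans}(E_i) \right\|^2 \right)^{1/2},
  \quad m = n - \Delta

% Rotational part of the RPE: rotation angle of E_i
\angle(E_i) = \arccos\!\left( \frac{\operatorname{tr}\!\left(\operatorname{rot}(E_i)\right) - 1}{2} \right)
```

Because $E_i$ only compares motion over a window of length $\Delta$, a slow drift barely changes the RPE but accumulates in the ATE, which is consistent with the pattern reported for CYN-1.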
What carries the argument
Side-by-side reporting of absolute trajectory error, relative pose error, and per-update runtime for MUSE, IEKF, and IS on the same proprioceptive dataset sequence.
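As an illustration of the long-term metric, a minimal position-only ATE computation might look like the sketch below. It assumes the estimated and reference trajectories are already time-synchronized N×3 arrays and uses a rigid (rotation plus translation, no scale) Kabsch alignment. The function names `align_rigid` and `ate_rmse` are hypothetical, and this is not the code from the paper's repository.

```python
import numpy as np

def align_rigid(est, ref):
    """Least-squares rotation R and translation t mapping est onto ref (Kabsch, no scale)."""
    mu_e, mu_r = est.mean(axis=0), ref.mean(axis=0)
    H = (est - mu_e).T @ (ref - mu_r)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_r - R @ mu_e
    return R, t

def ate_rmse(est, ref):
    """RMSE of position error after rigid alignment of est to ref."""
    R, t = align_rigid(est, ref)
    aligned = est @ R.T + t                      # apply R and t to every row
    err = np.linalg.norm(aligned - ref, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = np.cumsum(rng.normal(size=(200, 3)), axis=0)          # synthetic reference path
    est = ref + rng.normal(scale=0.05, size=ref.shape)          # synthetic noisy estimate
    print(f"ATE RMSE: {ate_rmse(est, ref):.3f} m")
```

Aligning before computing the error removes the arbitrary choice of world frame, so the number reflects the shape of the trajectory rather than where it was initialized.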
If this is right
- Applications that prioritize low long-term drift without external corrections can favor IEKF or the invariant smoother over MUSE.
- When only short-horizon accuracy matters, any of the three methods may suffice since their relative pose errors are similar.
- The measured runtimes allow direct selection of an estimator whose speed fits a robot's real-time control loop constraints.
- Releasing the full evaluation code makes it possible for others to rerun the comparison on new hardware or additional sequences.
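To make the runtime point concrete, the sketch below times each update of an estimator and checks whether a high quantile of the update time fits inside one control period. It is an illustration under stated assumptions: `update_fn`, `time_updates`, and `fits_budget` are hypothetical names, the 99% quantile is an arbitrary choice, and the paper's own timing procedure may differ.

```python
import time

def time_updates(update_fn, measurements):
    """Return the wall-clock duration of each update call, in milliseconds."""
    durations_ms = []
    for z in measurements:
        start = time.perf_counter()
        update_fn(z)
        durations_ms.append((time.perf_counter() - start) * 1e3)
    return durations_ms

def fits_budget(durations_ms, loop_rate_hz, quantile=0.99):
    """True if the chosen quantile of update time fits within one control period."""
    period_ms = 1e3 / loop_rate_hz
    ordered = sorted(durations_ms)
    idx = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[idx] <= period_ms

if __name__ == "__main__":
    # Stand-in for an estimator update; a real one would consume IMU and leg data.
    def dummy_update(z):
        sum(i * i for i in range(1000))

    durations = time_updates(dummy_update, range(500))
    print("fits a 400 Hz loop:", fits_budget(durations, loop_rate_hz=400))
```

Checking a high quantile rather than the mean is a deliberate choice here, since a control loop is constrained by its slowest updates.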
Where Pith is reading between the lines
- The pattern of similar short-term but differing long-term errors suggests that drift accumulation is the main differentiator, which could be tested by adding controlled disturbances to the dataset.
- These results could inform whether hybrid filters that borrow drift-correction ideas from the smoother would narrow the absolute error gap without a large increase in runtime.
- Extending the benchmark to include sequences with faster gaits or uneven terrain would show whether the observed trade-offs hold under more dynamic conditions.
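The drift-accumulation reading can be illustrated with a small synthetic experiment: an estimate with a constant per-step bias has a small RPE over a short window, yet its absolute error at the end of the run is large. The sketch below uses 4×4 homogeneous poses; `make_pose` and `rpe_rmse` are hypothetical helpers, the numbers are made up for illustration, and none of this reproduces the paper's data.

```python
import numpy as np

def make_pose(position):
    """Homogeneous 4x4 pose with identity rotation at the given position."""
    T = np.eye(4)
    T[:3, 3] = position
    return T

def rpe_rmse(est, ref, delta):
    """Translational (m) and rotational (rad) RPE RMSE over windows of `delta` poses."""
    trans_err, rot_err = [], []
    for i in range(len(ref) - delta):
        rel_ref = np.linalg.inv(ref[i]) @ ref[i + delta]
        rel_est = np.linalg.inv(est[i]) @ est[i + delta]
        E = np.linalg.inv(rel_ref) @ rel_est
        trans_err.append(np.linalg.norm(E[:3, 3]))
        cos_angle = np.clip((np.trace(E[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
        rot_err.append(np.arccos(cos_angle))
    rmse = lambda v: float(np.sqrt(np.mean(np.square(v))))
    return rmse(trans_err), rmse(rot_err)

if __name__ == "__main__":
    n = 1001
    # Reference: straight line along x. Estimate: same, plus 1 mm of y-drift per step.
    ref = [make_pose([0.01 * k, 0.0, 0.0]) for k in range(n)]
    est = [make_pose([0.01 * k, 0.001 * k, 0.0]) for k in range(n)]
    rpe_t, rpe_r = rpe_rmse(est, ref, delta=10)
    end_err = np.linalg.norm(est[-1][:3, 3] - ref[-1][:3, 3])
    print(f"RPE over 10 steps: {rpe_t:.3f} m, final absolute error: {end_err:.3f} m")
```

In this toy case the short-window error stays at the centimeter level while the endpoint error reaches a meter, which is the qualitative pattern the bullets above describe.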
Load-bearing premise
The single chosen sequence and the evaluation protocol produce a fair comparison of the three estimators without biases from data selection or metric implementation.
What would settle it
Rerunning the identical code on the identical CYN-1 sequence and hardware and finding that the absolute trajectory error of MUSE is no higher than that of IEKF and IS.
Original abstract
We compare three state-of-the-art proprioceptive state estimators for quadruped robots: MUSE [1], the Invariant Extended Kalman Filter (IEKF) [2], and the Invariant Smoother (IS) [3], on the CYN-1 sequence of the GrandTour Dataset [4]. Our goal is to give practitioners clear guidance on accuracy and computation time: we report long-term accuracy (Absolute Trajectory Error, ATE), short-term accuracy (translational and rotational Relative Pose Error, RPE), and per-update computation time on a fixed hardware/software stack. On this dataset, RPEs are broadly similar across methods, while IEKF and IS achieve a lower ATE than MUSE. Runtime results highlight the accuracy-latency trade-offs across the three approaches. In the discussion, we outline the evaluation choices used to ensure a fair comparison and analyze factors that influence short-horizon metrics. Overall, this study provides a concise snapshot of accuracy and cost, helping readers choose an estimator that fits their application constraints, with all evaluation code and documentation released open-source at https://github.com/iit-DLSLab/state_estimation_benchmark for full reproducibility.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper compares three proprioceptive state estimators for quadruped robots—MUSE, the Invariant Extended Kalman Filter (IEKF), and the Invariant Smoother (IS)—on the CYN-1 sequence of the GrandTour Dataset. It reports Absolute Trajectory Error (ATE) for long-term accuracy, translational and rotational Relative Pose Error (RPE) for short-term accuracy, and per-update runtime on fixed hardware. The central claims are that RPE values are broadly similar across the three methods while IEKF and IS achieve lower ATE than MUSE, with runtime results illustrating accuracy-latency trade-offs; all evaluation code is released open-source.
Significance. If the results hold, the work supplies practitioners with actionable guidance on selecting among existing proprioceptive estimators under accuracy and latency constraints. The use of standard metrics (ATE, RPE), explicit discussion of fairness measures (consistent preprocessing, identical initial conditions, shared covariance tuning), and full release of scripts and documentation constitute a reproducible empirical contribution that is valuable in a field where such benchmarks are often incomplete or non-reproducible.
minor comments (2)
- [Discussion] Discussion section: the paragraph addressing evaluation choices for fairness is helpful, but a short table summarizing the exact preprocessing steps, initial covariance values, and tuning parameters applied identically to MUSE, IEKF, and IS would make the fairness claim immediately verifiable without inspecting the repository.
- [Results] Results: the reported ATE and RPE numbers would benefit from explicit statement of the number of runs or seeds used (if any) and whether the single CYN-1 sequence was the only one processed; this does not affect the central comparison but improves clarity for readers wishing to replicate on additional sequences.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our benchmark study, the accurate summary of our claims, and the recommendation for minor revision. The significance section correctly identifies the value of our reproducible comparison using standard metrics and open-source code. Because the report raises no major comments, there are no points that require an individual response.
Circularity Check
No significant circularity: empirical benchmark only
full rationale
The manuscript is a direct empirical comparison of three pre-existing estimators (MUSE, IEKF, IS) on public CYN-1 data. It computes standard ATE/RPE/runtime metrics under shared preprocessing and tuning, with released code. No derivations, fitted parameters renamed as predictions, self-definitional equations, or load-bearing self-citations appear. All claims rest on observable outputs from external dataset sequences rather than internal reduction to inputs.
Reference graph
Works this paper leans on
- [1] Y. Nisticò, J. C. V. Soares, L. Amatucci, G. Fink, and C. Semini, “MUSE: A real-time multi-sensor state estimator for quadruped robots,” IEEE Robotics and Automation Letters, vol. 10, no. 5, pp. 4620–4627, 2025, DOI: 10.1109/LRA.2025.3553047
- [2] R. Hartley, M. Ghaffari, R. M. Eustice, and J. W. Grizzle, “Contact-aided invariant extended Kalman filtering for robot state estimation,” The International Journal of Robotics Research, vol. 39, no. 4, pp. 402–430, 2020, DOI: 10.1177/0278364919894385
- [3] Z. Yoon, J.-H. Kim, and H.-W. Park, “Invariant smoother for legged robot state estimation with dynamic contact event information,” IEEE Transactions on Robotics, vol. 40, pp. 193–212, 2024, DOI: 10.1109/TRO.2023.3328202
- [4] J. Frey, T. Tuna, F. Fu, K. Patterson, T. Xu, M. Fallon, C. Cadena, and M. Hutter, “Grandtour: A legged robotics dataset in the wild for multi-modal perception and state estimation,” arXiv preprint arXiv:2602.18164, 2026
- [5] M. Bloesch, M. Hutter, M. A. Hoepflinger, S. Leutenegger, C. Gehring, C. D. Remy, and R. Siegwart, “State estimation for legged robots: consistent fusion of leg kinematics and IMU,” Robotics, vol. 17, pp. 17–24, 2013, DOI: 10.15607/RSS.2012.VIII.003
- [6] M. Bloesch, M. Burri, H. Sommer, R. Siegwart, and M. Hutter, “The two-state implicit filter recursive estimation for mobile robots,” IEEE Robot. Autom. Lett., vol. 3, no. 1, pp. 573–580, 2018, DOI: 10.1109/LRA.2017.2776340
- [7] G. Fink and C. Semini, “Proprioceptive sensor fusion for quadruped robot state estimation,” in 2020 IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), 2020, pp. 10914–10920, DOI: 10.1109/IROS45743.2020.9341521
- [8] V. Agrawal, S. Bertrand, R. Griffin, and F. Dellaert, “Proprioceptive state estimation of legged robots with kinematic chain modeling,” in 2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids), 2022, pp. 178–185, DOI: 10.1109/Humanoids53995.2022.10000099
- [9] H. M. S. Santana, J. C. V. Soares, Y. Nisticò, M. A. Meggiolaro, and C. Semini, “Proprioceptive state estimation for quadruped robots using invariant Kalman filtering and scale-variant robust cost functions,” in 2024 IEEE-RAS Int. Conf. Humanoid Robots, 2024, pp. 213–220, DOI: 10.1109/Humanoids58906.2024.10769911
- [10] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the evaluation of RGB-D SLAM systems,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012, pp. 573–580, DOI: 10.1109/IROS.2012.6385773
- [11] H. F. Grip, T. I. Fossen, T. A. Johansen, and A. Saberi, “Globally exponentially stable attitude and gyro bias estimation with application to GNSS/INS integration,” Automatica, vol. 51, pp. 158–166, 2015, DOI: 10.1016/j.automatica.2014.10.076
- [12] T. A. Johansen and T. I. Fossen, “The eXogenous Kalman filter (XKF),” International Journal of Control, vol. 90, no. 2, pp. 161–167, 2017, DOI: 10.1080/00207179.2016.1172390
- [13] M. Grupp, “evo: Python package for the evaluation of odometry and SLAM,” https://github.com/MichaelGrupp/evo, 2017