arxiv: 2605.27644 · v1 · pith:G3WDBEMBnew · submitted 2026-05-26 · 💻 cs.RO · cs.AI · cs.LG

Trinity: Unifying Class-Agnostic Terrain and Semantic Segmentation for Unstructured Outdoor Environments by Leveraging Synthetic Data

Pith reviewed 2026-06-29 16:48 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.LG
keywords class-agnostic terrain segmentation · semantic segmentation · transformer network · synthetic data · unstructured outdoor environments · mobile robotics · traversability estimation

The pith

A single transformer learns both semantic classes and unlabeled terrain regions from visual appearance alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Trinity, a transformer-based network that performs semantic segmentation and class-agnostic terrain segmentation in one model. Terrain is defined purely by visual appearance, without any robot-specific traversability scores or fixed class labels. This produces robot-agnostic terrain priors that downstream tasks can later combine with platform-specific data. The authors support training with a new synthetic dataset generated in an extended simulator and a real-world dataset with dual annotations. Experiments show the joint approach is feasible for complex outdoor scenes.

Core claim

Trinity is a unified transformer architecture that jointly outputs class-specific semantic segmentation and class-agnostic terrain segmentation. Terrain segmentation relies solely on visual appearance, without predefined semantic labels or robot-dependent traversability scores. This formulation produces robot-agnostic visual terrain priors usable for tasks such as traversability estimation, visual odometry, and mission planning. Training is enabled by the RUGDSynth synthetic dataset and the EXTerra real-world dataset that supplies both label types.

What carries the argument

The Trinity transformer network that shares a backbone to produce both semantic class maps and class-agnostic terrain maps in a single forward pass.
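
To make the two output types concrete, the following minimal Python sketch contrasts a semantic map, whose integer ids index a fixed label set, with a class-agnostic terrain map, whose ids only say which pixels belong together. The label set, the `NON_TERRAIN` marker, and both helper functions are hypothetical illustrations and are not taken from Trinity or its datasets.

```python
from typing import Dict, List

LabelMap = List[List[int]]

# Hypothetical semantic label set (assumption): ids have a fixed meaning.
SEMANTIC_CLASSES = {0: "sky", 1: "tree", 2: "terrain"}

# Hypothetical marker for pixels that belong to no terrain region.
NON_TERRAIN = -1


def canonicalize_regions(region_map: LabelMap) -> LabelMap:
    """Relabel region ids in order of first appearance (row-major).

    Class-agnostic ids carry no meaning of their own, so two maps that
    group pixels identically should compare equal after this step.
    """
    mapping: Dict[int, int] = {}
    out: LabelMap = []
    for row in region_map:
        new_row = []
        for rid in row:
            if rid == NON_TERRAIN:
                new_row.append(NON_TERRAIN)
                continue
            if rid not in mapping:
                mapping[rid] = len(mapping)
            new_row.append(mapping[rid])
        out.append(new_row)
    return out


def same_partition(a: LabelMap, b: LabelMap) -> bool:
    """True if both maps split the terrain pixels into the same regions."""
    return canonicalize_regions(a) == canonicalize_regions(b)


# Semantic output: every id is a key of SEMANTIC_CLASSES.
semantic_map = [[0, 0, 0],
                [1, 2, 2],
                [2, 2, 2]]

# Two class-agnostic outputs with different ids but identical grouping.
terrain_a = [[-1, -1, -1],
             [-1, 5, 7],
             [5, 5, 7]]
terrain_b = [[-1, -1, -1],
             [-1, 2, 9],
             [2, 2, 9]]

# A genuinely different grouping of the same terrain pixels.
terrain_c = [[-1, -1, -1],
             [-1, 5, 5],
             [5, 5, 7]]
```

Because only the grouping matters, `terrain_a` and `terrain_b` describe the same segmentation even though their ids differ, which mirrors the randomly chosen cyan shades used for terrain regions in the paper's figures.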

If this is right

  • Terrain segmentation becomes independent of any particular robot's capabilities or annotation scheme.
  • The same visual terrain priors can be reused across different platforms without new labeling.
  • Synthetic data generation at scale becomes practical for covering diverse terrain appearances.
  • Joint training of the two tasks improves feature sharing inside the network for outdoor scenes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The separation of terrain appearance from robot-specific scoring could simplify transfer when a robot's wheels, mass, or sensors change.
  • Class-agnostic terrain maps might serve as an additional input channel for other perception modules such as visual odometry.
  • The dual-annotation real dataset could become a benchmark for testing how well appearance-based terrain generalizes beyond the training environments.

Load-bearing premise

Visual appearance by itself is enough to define terrain regions that remain useful when later paired with any robot's own experience.

What would settle it

A controlled test in which terrain maps produced by the class-agnostic head, when fused with robot-specific data, yield no improvement or a measurable drop in traversability estimation accuracy relative to semantic segmentation alone.

Figures

Figures reproduced from arXiv: 2605.27644 by Abel Gawel, Marcus G. Müller, Maximilian Durner, Riccardo Giubilato, Roland Siegwart, Rudolph Triebel, Wolfgang Stürzl, Wout Boerdijk.

Figure 1: A rover and a drone observe a planetary exploration outdoor …
Figure 2: The figure provides an overview of Trinity-Net. The model consists of an image encoder and three main components: the Split Transformer, the …
Figure 3: Sample images from RUGDSynth overlaid with the corresponding annotations. Terrain regions are randomly colored in shades of cyan.
Figure 4: Illustration of the residual prediction calculation. All region proposal …
Figure 5: Image of the planetary exploration outdoor laboratory and example samples from the EXTerra dataset with partially overlaid annotations and the …
Figure 6: Selected qualitative results for the RUGD dataset. Class-agnostic regions are colored in shades of cyan. Note that the cyan colors used for the …
Figure 7: Selected qualitative results for the EXTerra dataset. The coloring scheme follows that of Fig. …

Original abstract

Terrain understanding is fundamental for mobile robots operating in unstructured outdoor environments. Existing vision-based traversability estimation methods rely on robot-specific annotations or semantic class mappings, limiting transferability across platforms and requiring costly re-annotation when robot capabilities change, while standard semantic segmentation methods only focus on specific predefined classes, which do not capture the variety of terrains. In this work, we propose a transformer-based architecture that jointly performs class-specific semantic segmentation and class-agnostic terrain segmentation within a unified network, called Trinity. Terrain regions are segmented based solely on visual appearance, without predefined semantic labels or robot-dependent traversability scores. This formulation enables the learning of robot-agnostic visual terrain priors that can be combined with robot-specific experience for downstream tasks such as traversability estimation, visual odometry, and mission planning. To enable large-scale training with diverse terrain appearances, we extend the OAISYS simulator and introduce RUGDSynth, a synthetic dataset inspired by RUGD with class-agnostic terrain samples. Furthermore, we present the EXTerra Dataset, providing real-world images annotated with both class-specific and class-agnostic terrain labels. Experiments demonstrate the feasibility of the proposed task and the effectiveness of our joint segmentation approach in complex outdoor environments. Code and datasets will be released with this publication (after review).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper claims to introduce a transformer-based architecture called Trinity that unifies class-specific semantic segmentation and class-agnostic terrain segmentation for outdoor robot environments. By using synthetic data from an extended OAISYS simulator (RUGDSynth) and a new real dataset (EXTerra), the model learns terrain segmentation based on visual appearance without semantic labels or robot-specific scores. This is intended to provide robot-agnostic priors for downstream tasks. Experiments are presented to show the feasibility of the joint task and the effectiveness of the approach.

Significance. If the results hold, this work could have significant impact by enabling more generalizable terrain understanding for mobile robots in unstructured environments, addressing the limitations of robot-specific annotations. The creation and release of the RUGDSynth and EXTerra datasets, along with the code, would be a substantial contribution to the robotics community. The class-agnostic approach is a promising way to handle terrain variety beyond predefined classes.

major comments (2)
  1. [Experiments] The claim of effectiveness of the joint segmentation approach requires supporting quantitative evidence. Please provide specific metrics such as mean Intersection over Union (mIoU) for both semantic and terrain segmentation tasks, along with comparisons to baseline models trained separately on each task and ablations on the use of synthetic data.
  2. [Method] §3: The description of the Trinity architecture should detail the loss functions used for the joint training and how the class-agnostic terrain labels are incorporated, as this is central to validating the unified network's performance.
minor comments (3)
  1. [Abstract] The abstract mentions 'complex outdoor environments' but could specify the types of terrains or conditions tested for better context.
  2. [Introduction] Ensure that the motivation for robot-agnostic priors is clearly distinguished from the demonstrated results, as the combination with robot-specific experience is presented as a future possibility.
  3. [Datasets] Provide more details on the annotation process for class-agnostic terrain labels in EXTerra to allow readers to understand the visual appearance criteria used.
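
The mIoU metric requested in major comment 1 can be sketched as follows. This is a minimal pure-Python version for fixed class ids, assuming label maps are nested lists of integers and that classes absent from both prediction and ground truth are skipped when averaging. The paper's own evaluation protocol is not described here, and `mean_iou` is a hypothetical helper name.

```python
from typing import List

LabelMap = List[List[int]]


def mean_iou(pred: LabelMap, target: LabelMap, num_classes: int) -> float:
    """Mean Intersection over Union across classes with fixed ids."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p == t:
                # Pixel lies in both the predicted and true mask of class p.
                inter[p] += 1
                union[p] += 1
            else:
                # Pixel lies in the union of class p and of class t,
                # but in neither intersection.
                union[p] += 1
                union[t] += 1
    # Assumption: classes that never occur are left out of the mean.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Tiny 2x3 example with two classes.
target = [[0, 0, 1],
          [0, 1, 1]]
pred = [[0, 0, 0],
        [0, 1, 1]]

score = mean_iou(pred, target, num_classes=2)
```

In this toy case class 0 has IoU 3/4 and class 1 has IoU 2/3, so the mean is their average. A comparison against separately trained baselines, as the referee asks for, would report this number per task and per model.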

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We will revise the paper to address the requests for additional quantitative evidence and expanded methodological details.

Point-by-point responses
  1. Referee: [Experiments] The claim of effectiveness of the joint segmentation approach requires supporting quantitative evidence. Please provide specific metrics such as mean Intersection over Union (mIoU) for both semantic and terrain segmentation tasks, along with comparisons to baseline models trained separately on each task and ablations on the use of synthetic data.

    Authors: We agree that the current presentation of results would be strengthened by explicit quantitative metrics. In the revised manuscript we will report mIoU for both the semantic segmentation and class-agnostic terrain segmentation tasks on EXTerra, include direct comparisons against models trained separately on each task, and add ablation studies that isolate the contribution of the RUGDSynth synthetic data.
    revision: yes

  2. Referee: [Method] §3: The description of the Trinity architecture should detail the loss functions used for the joint training and how the class-agnostic terrain labels are incorporated, as this is central to validating the unified network's performance.

    Authors: We will expand Section 3 to specify the loss functions employed during joint training and to clarify how the class-agnostic terrain labels are generated and combined with the semantic labels within the shared network.
    revision: yes
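
Response 1 promises mIoU for the class-agnostic terrain task as well. Since region ids in a class-agnostic map have no fixed meaning, predicted regions must first be matched to ground-truth regions before an IoU can be averaged. The sketch below shows one possible protocol using greedy one-to-one matching by descending IoU; this is an assumption for illustration, not the protocol used in the paper, and the helper names and the `ignore` marker are hypothetical. Optimal (Hungarian) matching would be a common alternative.

```python
from typing import Dict, List, Set, Tuple

LabelMap = List[List[int]]
Mask = Set[Tuple[int, int]]


def region_masks(label_map: LabelMap, ignore: int = -1) -> Dict[int, Mask]:
    """Collect the pixel coordinates of every region id except `ignore`."""
    masks: Dict[int, Mask] = {}
    for r, row in enumerate(label_map):
        for c, rid in enumerate(row):
            if rid != ignore:
                masks.setdefault(rid, set()).add((r, c))
    return masks


def matched_mean_iou(pred: LabelMap, target: LabelMap, ignore: int = -1) -> float:
    """Mean IoU over ground-truth regions after greedy one-to-one matching.

    Ground-truth regions left without a partner contribute an IoU of 0.
    """
    pred_masks = region_masks(pred, ignore)
    gt_masks = region_masks(target, ignore)
    if not gt_masks:
        return 0.0
    # All candidate pairs, best overlap first.
    pairs = sorted(
        ((len(p & g) / len(p | g), pid, gid)
         for pid, p in pred_masks.items()
         for gid, g in gt_masks.items()),
        reverse=True,
    )
    used_pred, used_gt = set(), set()
    total = 0.0
    for iou, pid, gid in pairs:
        if iou == 0.0:
            break  # remaining pairs do not overlap at all
        if pid in used_pred or gid in used_gt:
            continue
        used_pred.add(pid)
        used_gt.add(gid)
        total += iou
    return total / len(gt_masks)


# Ground truth uses ids 1 and 2; the prediction uses unrelated ids 7 and 4.
target = [[1, 1, 2],
          [1, 2, 2]]
pred = [[7, 7, 7],
        [7, 4, 4]]

score = matched_mean_iou(pred, target)
```

Greedy matching keeps the example short and dependency-free; it can be suboptimal when regions overlap ambiguously, which is why an optimal assignment is often preferred in practice.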

Circularity Check

0 steps flagged

No significant circularity

Full rationale

The paper proposes a new transformer architecture (Trinity) for joint semantic and class-agnostic terrain segmentation, along with new synthetic (RUGDSynth) and real (EXTerra) datasets. No equations, fitted parameters, predictions, or derivations are present in the provided text. The formulation of terrain segmentation via visual appearance is presented as a definitional choice enabling robot-agnostic priors, not as a result derived from prior fitted quantities or self-citations. The work is self-contained as an architectural and data contribution with no load-bearing steps that reduce to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities beyond the model name and new dataset names; no mathematical derivations or fitting procedures are described.
