Trinity: Unifying Class-Agnostic Terrain and Semantic Segmentation for Unstructured Outdoor Environments by Leveraging Synthetic Data

Abel Gawel; Marcus G. Müller; Maximilian Durner; Riccardo Giubilato; Roland Siegwart; Rudolph Triebel; Wolfgang Stürzl; Wout Boerdijk

arXiv: 2605.27644 · v1 · pith:G3WDBEMBnew · submitted 2026-05-26 · cs.RO · cs.AI · cs.LG


Marcus G. Müller, Wout Boerdijk, Maximilian Durner, Riccardo Giubilato, Abel Gawel, Wolfgang Stürzl, Roland Siegwart, Rudolph Triebel

Pith reviewed 2026-06-29 16:48 UTC · model grok-4.3

classification: cs.RO, cs.AI, cs.LG

keywords: class-agnostic terrain segmentation, semantic segmentation, transformer network, synthetic data, unstructured outdoor environments, mobile robotics, traversability estimation


The pith

A single transformer learns both semantic classes and unlabeled terrain regions from visual appearance alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Trinity, a transformer-based network that performs semantic segmentation and class-agnostic terrain segmentation in one model. Terrain is defined purely by visual appearance, without any robot-specific traversability scores or fixed class labels. This produces robot-agnostic terrain priors that downstream tasks can later combine with platform-specific data. The authors support training with a new synthetic dataset generated in an extended simulator and a real-world dataset with dual annotations. Experiments show the joint approach is feasible for complex outdoor scenes.

Core claim

Trinity is a unified transformer architecture that jointly outputs class-specific semantic segmentation and class-agnostic terrain segmentation. Terrain segmentation relies solely on visual appearance, without predefined semantic labels or robot-dependent traversability scores. This formulation produces robot-agnostic visual terrain priors usable for tasks such as traversability estimation, visual odometry, and mission planning. Training is enabled by the RUGDSynth synthetic dataset and the EXTerra real-world dataset that supplies both label types.

What carries the argument

The Trinity transformer network that shares a backbone to produce both semantic class maps and class-agnostic terrain maps in a single forward pass.
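
As a rough illustration of the shared-backbone idea, the Python sketch below computes per-pixel features once and feeds them to two heads. It is a toy under stated assumptions, not the authors' Trinity-Net: the class name `TwoHeadSegmenter`, the layer sizes, the random weights, and the use of per-pixel linear layers in place of a transformer are all hypothetical.

```python
import random


def linear(weights, x):
    """Apply a dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def make_weights(rows, cols, rng):
    """Random weight matrix; stands in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class TwoHeadSegmenter:
    """Hypothetical toy model: one shared encoder feeding two output heads."""

    def __init__(self, in_dim=3, feat_dim=8, num_classes=4, num_regions=5, seed=0):
        rng = random.Random(seed)
        self.encoder = make_weights(feat_dim, in_dim, rng)           # shared backbone
        self.semantic_head = make_weights(num_classes, feat_dim, rng)  # fixed classes
        self.terrain_head = make_weights(num_regions, feat_dim, rng)   # region slots

    def forward(self, image):
        """image: H x W nested list of RGB triples. Returns two label maps."""
        semantic_map, terrain_map = [], []
        for row in image:
            sem_row, ter_row = [], []
            for pixel in row:
                # Features are computed once and reused by both heads.
                feat = [max(0.0, f) for f in linear(self.encoder, pixel)]
                sem_row.append(argmax(linear(self.semantic_head, feat)))
                ter_row.append(argmax(linear(self.terrain_head, feat)))
            semantic_map.append(sem_row)
            terrain_map.append(ter_row)
        return semantic_map, terrain_map


image = [[(0.2, 0.5, 0.1), (0.8, 0.7, 0.6)],
         [(0.3, 0.3, 0.3), (0.1, 0.9, 0.2)]]
model = TwoHeadSegmenter()
semantic_map, terrain_map = model.forward(image)
```

In this sketch the terrain head's outputs are only slot indices. They carry no class meaning, which is what separates the second map from the first.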

If this is right

Terrain segmentation becomes independent of any particular robot's capabilities or annotation scheme.
The same visual terrain priors can be reused across different platforms without new labeling.
Synthetic data generation at scale becomes practical for covering diverse terrain appearances.
Joint training of the two tasks improves feature sharing inside the network for outdoor scenes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The separation of terrain appearance from robot-specific scoring could simplify transfer when a robot's wheels, mass, or sensors change.
Class-agnostic terrain maps might serve as an additional input channel for other perception modules such as visual odometry.
The dual-annotation real dataset could become a benchmark for testing how well appearance-based terrain generalizes beyond the training environments.

Load-bearing premise

Visual appearance by itself is enough to define terrain regions that remain useful when later paired with any robot's own experience.
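
One way to picture that pairing, sketched in Python under assumptions the paper does not state: a robot logs a cost (for example, measured slip) at pixels it has driven over, and each appearance-based region inherits the mean cost of the samples that fall inside it. The function names, the cost values, and the averaging rule are illustrative only.

```python
from collections import defaultdict


def region_costs(terrain_map, experience):
    """Average the robot's measured cost over each appearance-based region.

    terrain_map: H x W nested list of region ids (ids carry no semantic meaning).
    experience:  list of (row, col, cost) samples from places the robot drove.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row, col, cost in experience:
        region = terrain_map[row][col]
        totals[region] += cost
        counts[region] += 1
    return {region: totals[region] / counts[region] for region in totals}


def traversability_map(terrain_map, costs, unknown=None):
    """Spread each region's mean cost to all of its pixels.

    Regions the robot never visited keep the `unknown` marker.
    """
    return [[costs.get(region, unknown) for region in row] for row in terrain_map]


terrain_map = [[0, 0, 1],
               [0, 2, 1],
               [2, 2, 1]]
experience = [(0, 0, 0.1), (1, 0, 0.3), (2, 2, 0.8)]

costs = region_costs(terrain_map, experience)
cost_map = traversability_map(terrain_map, costs)
```

Swapping in a different robot would change only `experience`; the region map stays the same. That reuse is what the premise depends on, and it fails if pixels that look alike do not behave alike under the robot.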

What would settle it

A controlled test in which terrain maps produced by the class-agnostic head, when fused with robot-specific data, yield no improvement or a measurable drop in traversability estimation accuracy relative to semantic segmentation alone.

Figures

Figures reproduced from arXiv: 2605.27644 by Abel Gawel, Marcus G. Müller, Maximilian Durner, Riccardo Giubilato, Roland Siegwart, Rudolph Triebel, Wolfgang Stürzl, Wout Boerdijk.

**Figure 2.** The figure provides an overview of Trinity-Net. The model consists of an image encoder and three main components: the Split Transformer, the …

**Figure 3.** Sample images from RUGDSynth overlaid with the corresponding annotations. Terrain regions are randomly colored in shades of cyan.

**Figure 4.** Illustration of the residual prediction calculation. All region proposal …

**Figure 5.** Image of the planetary exploration outdoor laboratory and example samples from the EXTerra dataset with partially overlaid annotations and the …

**Figure 6.** Selected qualitative results for the RUGD dataset. Class-agnostic regions are colored in shades of cyan. Note that the cyan colors used for the …

**Figure 7.** Selected qualitative results for the EXTerra dataset. The coloring scheme follows that of Fig. …

Original abstract

Terrain understanding is fundamental for mobile robots operating in unstructured outdoor environments. Existing vision-based traversability estimation methods rely on robot-specific annotations or semantic class mappings, limiting transferability across platforms and requiring costly re-annotation when robot capabilities change, while standard semantic segmentation methods only focus on specific predefined classes, which do not capture the variety of terrains. In this work, we propose a transformer-based architecture that jointly performs class-specific semantic segmentation and class-agnostic terrain segmentation within a unified network, called Trinity. Terrain regions are segmented based solely on visual appearance, without predefined semantic labels or robot-dependent traversability scores. This formulation enables the learning of robot-agnostic visual terrain priors that can be combined with robot-specific experience for downstream tasks such as traversability estimation, visual odometry, and mission planning. To enable large-scale training with diverse terrain appearances, we extend the OAISYS simulator and introduce RUGDSynth, a synthetic dataset inspired by RUGD with class-agnostic terrain samples. Furthermore, we present the EXTerra Dataset, providing real-world images annotated with both class-specific and class-agnostic terrain labels. Experiments demonstrate the feasibility of the proposed task and the effectiveness of our joint segmentation approach in complex outdoor environments. Code and datasets will be released with this publication (after review).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Trinity proposes a single transformer for joint semantic segmentation and class-agnostic terrain segmentation from visual appearance alone, backed by new synthetic and real datasets, but the abstract gives no numbers to judge whether the joint setup actually improves anything.


The core idea is a transformer network called Trinity that outputs both standard semantic labels and a separate terrain map based only on how things look, without tying the terrain part to any robot's speed, size, or sensors. They generate RUGDSynth from an extended OAISYS simulator and release the EXTerra real-world set with both label types to train it.

This addresses a genuine pain point: most outdoor traversability work either uses fixed classes or robot-specific scores, so moving to a new platform requires fresh annotations. Separating terrain into pure appearance priors is a reasonable way to make the output more reusable across machines.

The datasets are the clearest contribution. Extending an existing simulator for class-agnostic terrain samples and collecting matching real images with dual annotations is concrete work that others can build on. The architecture itself is a standard transformer with two heads, which keeps the claim modest.

The main gap is evidence. The abstract states that experiments show the joint approach works in complex environments, yet supplies no accuracy figures, no comparison to separate networks, and no test of whether the terrain output actually helps a different robot. Without those details it is hard to know if the unification buys anything beyond what two independent models would achieve.

The motivating claim that these visual priors will combine cleanly with later robot-specific data is left as a possibility rather than a result. That is fine for a methods paper, but it means the transfer benefit remains unshown.

This is aimed at robotics researchers who handle unstructured outdoor scenes and need datasets or a starting architecture for multi-task segmentation. A reader already working on terrain understanding would find the data release useful even if the joint-training gains need more proof.

I would send it for peer review. The problem is real, the data effort is tangible, and the formulation is coherent on its own terms; the missing quantitative checks are exactly what referees can ask for.

Referee Report

2 major / 3 minor

Summary. The paper claims to introduce a transformer-based architecture called Trinity that unifies class-specific semantic segmentation and class-agnostic terrain segmentation for outdoor robot environments. By using synthetic data from an extended OAISYS simulator (RUGDSynth) and a new real dataset (EXTerra), the model learns terrain segmentation based on visual appearance without semantic labels or robot-specific scores. This is intended to provide robot-agnostic priors for downstream tasks. Experiments are presented to show the feasibility of the joint task and the effectiveness of the approach.

Significance. If the results hold, this work could have significant impact by enabling more generalizable terrain understanding for mobile robots in unstructured environments, addressing the limitations of robot-specific annotations. The creation and release of the RUGDSynth and EXTerra datasets, along with the code, would be a substantial contribution to the robotics community. The class-agnostic approach is a promising way to handle terrain variety beyond predefined classes.

major comments (2)

[Experiments] The claim of effectiveness of the joint segmentation approach requires supporting quantitative evidence. Please provide specific metrics such as mean Intersection over Union (mIoU) for both semantic and terrain segmentation tasks, along with comparisons to baseline models trained separately on each task and ablations on the use of synthetic data.
[Method] §3: The description of the Trinity architecture should detail the loss functions used for the joint training and how the class-agnostic terrain labels are incorporated, as this is central to validating the unified network's performance.
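
For readers unfamiliar with the metric requested above, a minimal pure-Python sketch of mIoU follows. The class-agnostic variant is an assumption of this sketch: because region ids are arbitrary, it scores each ground-truth region by its best-overlapping predicted region. That is a simple stand-in and not necessarily the protocol the authors use. All function names are hypothetical.

```python
def iou_per_class(pred, target, num_classes):
    """Per-class IoU over flat label lists; None for classes absent from both."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        ious.append(inter / union if union else None)
    return ious


def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in prediction or ground truth."""
    valid = [v for v in iou_per_class(pred, target, num_classes) if v is not None]
    return sum(valid) / len(valid)


def agnostic_mean_iou(pred, target):
    """Mean IoU for region maps whose ids have no fixed meaning.

    Each ground-truth region takes the IoU of its best-overlapping predicted
    region, so relabeling the predicted ids does not change the score.
    The matching is not forced to be one-to-one.
    """
    scores = []
    for t_id in set(target):
        best = 0.0
        for p_id in set(pred):
            inter = sum(1 for p, t in zip(pred, target) if p == p_id and t == t_id)
            union = sum(1 for p, t in zip(pred, target) if p == p_id or t == t_id)
            best = max(best, inter / union)
        scores.append(best)
    return sum(scores) / len(scores)


target = [0, 0, 1, 1, 2, 2]      # flattened ground-truth labels
pred = [0, 0, 1, 2, 2, 2]        # flattened prediction, one pixel wrong
relabeled = [5, 5, 7, 9, 9, 9]   # same regions as pred, different ids

semantic_score = mean_iou(pred, target, num_classes=3)
agnostic_score = agnostic_mean_iou(relabeled, target)
```

The two scores coincide here because `relabeled` describes the same regions as `pred` under different ids. A stricter evaluation would enforce one-to-one matching between predicted and ground-truth regions.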

minor comments (3)

[Abstract] The abstract mentions 'complex outdoor environments' but could specify the types of terrains or conditions tested for better context.
[Introduction] Ensure that the motivation for robot-agnostic priors is clearly distinguished from the demonstrated results, as the combination with robot-specific experience is presented as a future possibility.
[Datasets] Provide more details on the annotation process for class-agnostic terrain labels in EXTerra to allow readers to understand the visual appearance criteria used.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We will revise the paper to address the requests for additional quantitative evidence and expanded methodological details.


Referee: [Experiments] The claim of effectiveness of the joint segmentation approach requires supporting quantitative evidence. Please provide specific metrics such as mean Intersection over Union (mIoU) for both semantic and terrain segmentation tasks, along with comparisons to baseline models trained separately on each task and ablations on the use of synthetic data.

Authors: We agree that the current presentation of results would be strengthened by explicit quantitative metrics. In the revised manuscript we will report mIoU for both the semantic segmentation and class-agnostic terrain segmentation tasks on EXTerra, include direct comparisons against models trained separately on each task, and add ablation studies that isolate the contribution of the RUGDSynth synthetic data. revision: yes

Referee: [Method] §3: The description of the Trinity architecture should detail the loss functions used for the joint training and how the class-agnostic terrain labels are incorporated, as this is central to validating the unified network's performance.

Authors: We will expand Section 3 to specify the loss functions employed during joint training and to clarify how the class-agnostic terrain labels are generated and combined with the semantic labels within the shared network. revision: yes

Circularity Check

0 steps flagged

No significant circularity


The paper proposes a new transformer architecture (Trinity) for joint semantic and class-agnostic terrain segmentation, along with new synthetic (RUGDSynth) and real (EXTerra) datasets. No equations, fitted parameters, predictions, or derivations are present in the provided text. The formulation of terrain segmentation via visual appearance is presented as a definitional choice enabling robot-agnostic priors, not as a result derived from prior fitted quantities or self-citations. The work is self-contained as an architectural and data contribution with no load-bearing steps that reduce to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities beyond the model name and new dataset names; no mathematical derivations or fitting procedures are described.




[1] [1]

Au- tonomous vehicle perception: The technology of today and tomorrow,

J. Van Brummelen, M. O’Brien, D. Gruyer, and H. Najjaran, “Au- tonomous vehicle perception: The technology of today and tomorrow,” Transportation Research Part C: Emerging Technologies, 2018

2018

[2] [2]

AI4MARS: A Dataset for Terrain-Aware Autonomous Driving on Mars,

R. M. Swan, D. Atha, H. A. Leopold, M. Gildner, S. Oij, C. Chiu, and M. Ono, “AI4MARS: A Dataset for Terrain-Aware Autonomous Driving on Mars,” inIEEE/CVF Conf. on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021

2021

[3] [3]

Fast traversability estimation for wild visual navigation,

J. Frey, M. Mattamala, N. Chebrolu, C. Cadena, M. Fallon, and M. Hutter, “Fast traversability estimation for wild visual navigation,” Proc. of Robotics: Science and Systems (RSS), 2023

2023

[4] [4]

V-STRONG: Visual Self-Supervised Traversability Learning for Off-road Naviga- tion,

S. Jung, J. Lee, X. Meng, B. Boots, and A. Lambert, “V-STRONG: Visual Self-Supervised Traversability Learning for Off-road Naviga- tion,” inProc. of the IEEE Int. Conf. on Robotics and Automation (ICRA), 2024

2024

[5] [5]

Segment anything,

A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y . Loet al., “Segment anything,” inProc. of the IEEE/CVF Int. Conf. on Computer Vision (ICCV), 2023

2023

[6] [6]

A Photorealistic Terrain Simulation Pipeline for Unstruc- tured Outdoor Environments,

M. G. M ¨uller, M. Durner, A. Gawel, W. St ¨urzl, R. Triebel, and R. Siegwart, “A Photorealistic Terrain Simulation Pipeline for Unstruc- tured Outdoor Environments,” inProc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Sep. 2021

2021

[7] [7]

A RUGD Dataset for Autonomous Navigation and Visual Perception in Unstructured Outdoor Environments,

M. Wigness, S. Eum, J. G. Rogers, D. Han, and H. Kwon, “A RUGD Dataset for Autonomous Navigation and Visual Perception in Unstructured Outdoor Environments,” inProc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2019

2019

[8] [8]

Semantic Image Segmenta- tion: Two Decades of Research,

G. Csurka, R. V olpi, and B. Chidlovskii, “Semantic Image Segmenta- tion: Two Decades of Research,”Foundations and Trends in Computer Graphics and Vision, 2022

2022

[9] [9]

Semantic segmentation using Vision Transformers: A survey,

H. Thisanke, C. Deshan, K. Chamith, S. Seneviratne, R. Vi- danaarachchi, and D. Herath, “Semantic segmentation using Vision Transformers: A survey,”Engineering Applications of Artificial Intel- ligence, Nov. 2023

2023

[10] [10]

Autonomous Rock Instance Segmenta- tion for Extra-Terrestrial Robotic Missions,

M. Durner, W. Boerdijk, Y . Fanger, R. Sakagami, D. L. Risch, R. Triebel, and A. Wedler, “Autonomous Rock Instance Segmenta- tion for Extra-Terrestrial Robotic Missions,” inProc. of the IEEE Aerospace Conf., 2023

2023

[11] [11]

RockFormer: A U-Shaped Transformer Network for Martian Rock Segmentation,

H. Liu, M. Yao, X. Xiao, and Y . Xiong, “RockFormer: A U-Shaped Transformer Network for Martian Rock Segmentation,”IEEE Trans. on Geoscience and Remote Sensing, 2023

2023

[12] [12]

DeepTerramechanics: Terrain Classification and Slip Estimation for Ground Robots via Deep Learning

R. Gonzalez and K. Iagnemma, “Deepterramechanics: Terrain classifi- cation and slip estimation for ground robots via deep learning,”arXiv preprint arXiv:1806.07379, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[13] [13]

OmniUnet: A Multimodal Network for Unstructured Terrain Segmentation on Planetary Rovers Using RGB, Depth, and Thermal Imagery,

R. Castilla-Arquillo, C. Perez-del Pulgar, L. Gerdes, A. Garcia-Cerezo, and M. A. Olivares-Mendez, “OmniUnet: A Multimodal Network for Unstructured Terrain Segmentation on Planetary Rovers Using RGB, Depth, and Thermal Imagery,”arXiv preprint arXiv:2508.00580, 2025. TABLE IV EVALUATION ONEXTERRADATASET Model cs ca mIoU mIoU mPre mRec SAM[5]35.0610.0651.73...

work page arXiv 2025

[14] [14]

Mars Terrain Segmentation with Less Labels,

E. Goh, J. Chen, and B. Wilson, “Mars Terrain Segmentation with Less Labels,” inProc. of the IEEE Aerospace Conf., 2022

2022

[15] [15]

S5Mars: Semi- Supervised Learning for Mars Semantic Segmentation,

J. Zhang, L. Lin, Z. Fan, W. Wang, and J. Liu, “S5Mars: Semi- Supervised Learning for Mars Semantic Segmentation,”IEEE Trans. on Geoscience and Remote Sensing, 2024

2024

[16] [16]

MTSNet: Joint Feature Adaptation and Enhancement for Text-Guided Multi-view Martian Terrain Segmentation,

Y . Fang, X. Rao, X. Gao, W. Li, and Z. Min, “MTSNet: Joint Feature Adaptation and Enhancement for Text-Guided Multi-view Martian Terrain Segmentation,” inProc. of the ACM Int. Conf. on Multimedia, Melbourne VIC Australia, 2024

2024

[17] [17]

CRLNet: Cascaded Resolution Learning Network for Natural Scenes Segmen- tation,

W. Li, S. Tian, G. Hua, M. Liao, Y . Zhang, and W. Zou, “CRLNet: Cascaded Resolution Learning Network for Natural Scenes Segmen- tation,”IEEE Intelligent Systems, 2026

2026

[18] [18]

RELLIS-3D Dataset: Data, Benchmarks and Analysis,

P. Jiang, P. Osteen, M. Wigness, and S. Saripalli, “RELLIS-3D Dataset: Data, Benchmarks and Analysis,” inProc. of the IEEE Int. Conf. on Robotics and Automation (ICRA), 2021

2021

[19] [19]

GA-Nav: Efficient Terrain Segmentation for Robot Navigation in Unstructured Outdoor Environments,

T. Guan, D. Kothandaraman, R. Chandra, A. J. Sathyamoorthy, K. Weerakoon, and D. Manocha, “GA-Nav: Efficient Terrain Segmentation for Robot Navigation in Unstructured Outdoor Environments,” IEEE Robotics and Automation Letters, 2022

2022

[20] [20]

Contextual-aware terrain segmentation network for navigable areas with triple aggregation,

W. Li, M. Liao, and W. Zou, “Contextual-aware terrain segmentation network for navigable areas with triple aggregation,” Expert Systems with Applications, 2025

2025

[21] [21]

Temporally Consistent Unsupervised Segmentation for Mobile Robot Perception,

C. Ellis, M. Wigness, C. Lennon, and L. Fiondella, “Temporally Consistent Unsupervised Segmentation for Mobile Robot Perception,” arXiv preprint arXiv:2507.22194, 2025

2025

[22] [22]

Geometric and visual terrain classification for autonomous mobile navigation,

F. Schilling, X. Chen, J. Folkesson, and P. Jensfelt, “Geometric and visual terrain classification for autonomous mobile navigation,” in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2017

2017

[23] [23]

Learning Ground Traversability From Simulations,

R. O. Chavez-Garcia, J. Guzzi, L. M. Gambardella, and A. Giusti, “Learning Ground Traversability From Simulations,” IEEE Robotics and Automation Letters, 2018

2018

[24] [24]

Real-time Optimal Navigation Planning Using Learned Motion Costs,

B. Yang, L. Wellhausen, T. Miki, M. Liu, and M. Hutter, “Real-time Optimal Navigation Planning Using Learned Motion Costs,” in Proc. of the IEEE Int. Conf. on Robotics and Automation (ICRA), 2021

2021

[25] [25]

Semantic Terrain Classification for Off-Road Autonomous Driving,

A. Shaban, X. Meng, J. Lee, B. Boots, and D. Fox, “Semantic Terrain Classification for Off-Road Autonomous Driving,” in Proc. of the Conf. on Robot Learning (CoRL), 2022

2022

[26] [26]

BADGR: An Autonomous Self-Supervised Learning-Based Navigation System,

G. Kahn, P. Abbeel, and S. Levine, “BADGR: An Autonomous Self-Supervised Learning-Based Navigation System,” IEEE Robotics and Automation Letters, 2021

2021

[27] [27]

Where Should I Walk? Predicting Terrain Properties From Images Via Self-Supervised Learning,

L. Wellhausen, A. Dosovitskiy, R. Ranftl, K. Walas, C. Cadena, and M. Hutter, “Where Should I Walk? Predicting Terrain Properties From Images Via Self-Supervised Learning,” IEEE Robotics and Automation Letters, 2019

2019

[28] [28]

WayFAST: Navigation With Predictive Traversability in the Field,

M. V. Gasparino, A. N. Sivakumar, Y. Liu, A. E. B. Velasquez, V. A. H. Higuti, J. Rogers, H. Tran, and G. Chowdhary, “WayFAST: Navigation With Predictive Traversability in the Field,” IEEE Robotics and Automation Letters, 2022

2022

[29] [29]

TerraPN: Unstructured Terrain Navigation using Online Self-Supervised Learning,

A. J. Sathyamoorthy, K. Weerakoon, T. Guan, J. Liang, and D. Manocha, “TerraPN: Unstructured Terrain Navigation using Online Self-Supervised Learning,” in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2022

2022

[30] [30]

Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection,

H. Sapkota and Q. Yu, “Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection,” in Conf. on Computer Vision and Pattern Recognition, 2022

2022

[31] [31]

Uncertainty Estimation for Planetary Robotic Terrain Segmentation,

M. G. Müller, M. Durner, W. Boerdijk, H. Blum, A. Gawel, W. Stürzl, R. Siegwart, and R. Triebel, “Uncertainty Estimation for Planetary Robotic Terrain Segmentation,” in Proc. of the IEEE Aerospace Conf., 2023

2023

[32] [32]

Entropy Maximization and Meta Classification for Out-of-Distribution Detection in Semantic Segmentation,

R. Chan, M. Rottmann, and H. Gottschalk, “Entropy Maximization and Meta Classification for Out-of-Distribution Detection in Semantic Segmentation,” in Proc. of the IEEE/CVF Int. Conf. on Computer Vision (ICCV), 2021

2021

[33] [33]

Simultaneous Semantic Segmentation and Outlier Detection in Presence of Domain Shift,

P. Bevandić, I. Krešo, M. Oršić, and S. Šegvić, “Simultaneous Semantic Segmentation and Outlier Detection in Presence of Domain Shift,” in Pattern Recognition, 2019

2019

[34] [34]

Detecting Road Obstacles by Erasing Them,

K. Lis, S. Honari, P. Fua, and M. Salzmann, “Detecting Road Obstacles by Erasing Them,” in IEEE Trans. on Pattern Analysis and Machine Intelligence, 2024

2024

[35] [35]

OmniAL: A Unified CNN Framework for Unsupervised Anomaly Localization,

Y. Zhao, “OmniAL: A Unified CNN Framework for Unsupervised Anomaly Localization,” in Conf. on Computer Vision and Pattern Recognition, 2023

2023

[36] [36]

Uninformed Students: Student-Teacher Anomaly Detection With Discriminative Latent Embeddings,

P. Bergmann, M. Fauser, D. Sattlegger, and C. Steger, “Uninformed Students: Student-Teacher Anomaly Detection With Discriminative Latent Embeddings,” in Conf. on Computer Vision and Pattern Recognition, 2020

2020

[37] [37]

DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection,

X. Zhang, S. Li, X. Li, P. Huang, J. Shan, and T. Chen, “DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection,” in Conf. on Computer Vision and Pattern Recognition, 2023

2023

[38] [38]

Multi-Scale Patch-Based Representation Learning for Image Anomaly Detection and Segmentation,

C.-C. Tsai, T.-H. Wu, and S.-H. Lai, “Multi-Scale Patch-Based Representation Learning for Image Anomaly Detection and Segmentation,” in IEEE/CVF Winter Conf. on Applications of Computer Vision (WACV), 2022

2022

[39] [39]

Open-World Semantic Segmentation Including Class Similarity,

M. Sodano, F. Magistri, L. Nunes, J. Behley, and C. Stachniss, “Open-World Semantic Segmentation Including Class Similarity,” in Conf. on Computer Vision and Pattern Recognition, 2024

2024

[40] [40]

SCIM: Simultaneous Clustering, Inference, and Mapping for Open-World Semantic Scene Understanding,

H. Blum, M. G. Müller, A. Gawel, R. Siegwart, and C. Cadena, “SCIM: Simultaneous Clustering, Inference, and Mapping for Open-World Semantic Scene Understanding,” in Robotics Research, 2023

2023

[41] [41]

DINOv2: Learning Robust Visual Features without Supervision

M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby et al., “DINOv2: Learning Robust Visual Features without Supervision,” arXiv preprint arXiv:2304.07193, 2023

2023