Towards All-Day Perception for Off-Road Driving: A Large-Scale Multispectral Dataset and Comprehensive Benchmark
Pith reviewed 2026-05-07 08:47 UTC · model grok-4.3
The pith
A memory-attention network trained on a new large-scale infrared off-road dataset improves freespace detection IoU by more than one percentage point while running in real time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
On the IRON dataset of 24,314 densely annotated infrared images with synchronized RGB, the IRONet model using memory attention and a mask decoder reaches 82.93% IoU and 90.66% F1 score for freespace detection, outperforming previous methods by 1.19% IoU and 0.71% F1 at real-time inference speeds. IRONet further shows strong generalization when applied to RGB images on the ORFD and Rellis datasets.
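For reference, the two headline metrics reduce to simple confusion-matrix arithmetic on binary masks. A minimal sketch (hypothetical NumPy code, not taken from the paper):

```python
import numpy as np

def freespace_metrics(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """IoU and F1 for binary freespace masks (True = drivable)."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    iou = tp / max(tp + fp + fn, 1)          # intersection over union
    f1 = 2 * tp / max(2 * tp + fp + fn, 1)   # harmonic mean of precision and recall
    return float(iou), float(f1)

# Toy example on random 480x640 masks
rng = np.random.default_rng(0)
pred = rng.random((480, 640)) > 0.5
gt = rng.random((480, 640)) > 0.5
print(freespace_metrics(pred, gt))  # roughly (0.33, 0.50) for random masks
```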
What carries the argument
The memory-attention mechanism in IRONet that aggregates historical context from previous frames to enforce temporal consistency in freespace segmentation without optical flow.
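The review does not reproduce IRONet's architecture, so the following is only a minimal sketch of the general pattern the abstract describes: current-frame features cross-attend to a FIFO bank of features from previous frames, replacing optical-flow warping. The module name, dimensions, and residual fusion are illustrative assumptions, not IRONet's actual design.

```python
import torch
import torch.nn as nn

class MemoryAttention(nn.Module):
    """Sketch of flow-free temporal aggregation: the current frame's
    features query a small FIFO bank of past-frame features."""

    def __init__(self, dim: int = 256, heads: int = 8, memory_size: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.memory_size = memory_size
        self.memory: list[torch.Tensor] = []  # detached past-frame features

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, C) flattened spatial tokens for the current frame
        if self.memory:
            bank = torch.cat(self.memory, dim=1)   # (B, T*N, C) historical context
            ctx, _ = self.attn(feats, bank, bank)  # attend to past frames
            feats = feats + ctx                    # residual fusion
        self.memory.append(feats.detach())             # enqueue current frame
        self.memory = self.memory[-self.memory_size:]  # evict oldest
        return feats

# Usage: feed per-frame backbone features in temporal order.
mem_attn = MemoryAttention()
for t in range(3):
    fused = mem_attn(torch.randn(1, 100, 256))  # one frame's tokens
```

Under this pattern the per-frame cost is a single cross-attention over a bounded bank, so latency stays flat with sequence length, which is consistent with (though not proof of) the real-time claim.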
If this is right
- Temporal consistency can be added to single-frame perception models without the cost of optical-flow computation (a rough consistency probe is sketched after this list).
- Infrared perception becomes practical for nighttime off-road autonomous driving.
- The IRON dataset enables further development of multispectral methods for unstructured environments.
- The memory-attention approach transfers across modalities without retraining from scratch.
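To make the first bullet concrete, inter-frame consistency can be measured directly from predictions, with no optical flow: compare consecutive output masks. A rough probe that assumes slow camera motion (hypothetical code, not a metric from the paper):

```python
import numpy as np

def temporal_consistency(masks: list[np.ndarray]) -> float:
    """Mean IoU between consecutive binary masks; higher means steadier
    predictions. Only meaningful when the scene changes slowly per frame."""
    ious = []
    for prev, curr in zip(masks, masks[1:]):
        inter = np.logical_and(prev, curr).sum()
        union = np.logical_or(prev, curr).sum()
        ious.append(inter / union if union else 1.0)
    return float(np.mean(ious))

# Identical masks score 1.0; flickering predictions score lower.
m = np.ones((8, 8), dtype=bool)
print(temporal_consistency([m, m, ~m]))  # 0.5: one perfect pair, one disjoint pair
```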
Where Pith is reading between the lines
- The same temporal aggregation could be tested on longer sequences or fused with additional sensors to handle rapid terrain changes.
- Methods tuned on this off-road infrared data may reveal weaknesses in models originally designed for structured on-road scenes.
- Extending the framework to infrared object detection or depth estimation could produce similar consistency gains.
Load-bearing premise
That the reported accuracy gains arise chiefly from the memory-attention design rather than from dataset annotation quality, training choices, or the particular scenes used in the test split.
What would settle it
An independently collected infrared off-road dataset with different terrain and lighting where IRONet shows no improvement over single-frame baselines on the same metrics.
Original abstract
Off-road nighttime autonomous driving suffers from unreliable visible-light perception, making infrared modality crucial for accurate freespace detection. However, progress remains limited due to the scarcity of annotated infrared off-road datasets and the inter-frame inconsistencies inherent to current single-frame methods. To address these gaps, we present the IRON dataset, which, to our knowledge, is the first large-scale infrared dataset for off-road temporal freespace detection under all-day conditions, with strong support for nighttime perception. The dataset comprises 24,314 densely annotated infrared images with synchronized RGB images in diverse scenes and different light conditions. Building upon this dataset, we propose IRONet, a novel flow-free framework for temporal freespace detection that addresses inter-frame inconsistencies by aggregating historical context via a memory-attention mechanism and a carefully designed mask decoder. On our IRON dataset, IRONet achieves state-of-the-art performance, reaching 82.93%(+1.19%) IoU and 90.66%(+0.71%) F1 score at real-time inference. Remarkably, IRONet also exhibits robust generalization to RGB modalities on ORFD and Rellis datasets. Overall, our work establishes a foundation for reliable all-day off-road autonomous driving and future research in infrared temporal perception. The code and IRON dataset are available at https://github.com/wsnbws/IRON.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the IRON dataset (24,314 densely annotated infrared images paired with RGB, covering diverse off-road scenes and lighting conditions) and proposes IRONet, a flow-free temporal freespace detection network that aggregates historical context via a memory-attention mechanism and a mask decoder. It reports state-of-the-art results on IRON (82.93% IoU and 90.66% F1 at real-time inference) with claimed generalization to RGB on the ORFD and Rellis datasets, positioning the work as a foundation for all-day off-road perception.
Significance. If the reported gains hold after proper controls, the work would provide a valuable large-scale infrared benchmark for off-road freespace detection and demonstrate a practical temporal architecture that improves frame-to-frame consistency without optical flow, advancing multispectral perception for autonomous driving in low-light conditions.
major comments (3)
- [Experiments] Experiments section: The headline claims of +1.19% IoU and +0.71% F1 over prior methods are presented without exhaustive ablations (e.g., memory-attention removed, a single-frame baseline with identical backbone and training schedule, or a simple frame-stacking comparator), error bars across runs, or explicit validation-protocol details; this leaves open whether the gains derive from the architecture or from dataset-specific factors (an error-bar sketch follows this list).
- [Dataset] Dataset section: The IRON train/test split description does not report temporal non-overlap criteria, scene diversity statistics, or cross-validation to rule out memorization of off-road textures or lighting patterns, which is load-bearing for the generalization claims.
- [Experiments] Generalization experiments: The transfer results on ORFD and Rellis lack specification of the protocol (zero-shot inference vs. fine-tuning) and do not include a matched single-frame baseline, weakening the assertion that the memory-attention mechanism drives robust cross-modal performance.
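The error-bar request in the first comment is cheap to satisfy: repeat training under a few random seeds and report spread alongside the point estimate. A minimal sketch with invented numbers (the paper reports only the single 82.93 IoU figure):

```python
import statistics

def report_runs(ious: list[float]) -> str:
    """Mean ± sample standard deviation over independent training seeds."""
    mean = statistics.mean(ious)
    std = statistics.stdev(ious) if len(ious) > 1 else 0.0
    return f"{mean:.2f} ± {std:.2f} IoU over {len(ious)} runs"

# Illustrative seed results only, not values from the paper.
print(report_runs([82.93, 82.71, 83.05]))
```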
minor comments (2)
- [Abstract] Abstract and method overview: The phrase 'flow-free framework' is used without a brief contrast to flow-based alternatives, which could be clarified for readers unfamiliar with the subfield.
- [Implementation] The GitHub link is provided, but the manuscript does not include a reproducibility checklist or hyperparameter table, which would aid verification of the real-time claims.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which have helped us improve the clarity and rigor of the manuscript. We address each major comment point by point below and have made corresponding revisions to the paper.
Point-by-point responses
- Referee: Experiments section: The headline claims of +1.19% IoU and +0.71% F1 over prior methods are presented without exhaustive ablations (e.g., memory-attention removed, a single-frame baseline with identical backbone and training schedule, or a simple frame-stacking comparator), error bars across runs, or explicit validation-protocol details; this leaves open whether the gains derive from the architecture or from dataset-specific factors.
Authors: We agree that the original manuscript would be strengthened by these additional controls. In the revised version, we have added an ablation study that removes the memory-attention module, a single-frame baseline using the identical backbone and training schedule, and a simple frame-stacking comparator. We now report error bars computed over three independent runs with different random seeds and provide explicit details on the train/validation/test protocol and hyperparameter settings in the Experiments section. These new results confirm that the reported gains are attributable to the proposed architecture. revision: yes
- Referee: Dataset section: The IRON train/test split description does not report temporal non-overlap criteria, scene diversity statistics, or cross-validation to rule out memorization of off-road textures or lighting patterns, which is load-bearing for the generalization claims.
Authors: We acknowledge the need for greater transparency on the split. The revised Dataset section now explicitly describes the temporal non-overlap criteria (no shared frames or consecutive sequences between train and test; a minimal illustration of such a sequence-level split appears after these responses), provides scene diversity statistics (number of distinct locations, distribution across daytime/nighttime and weather conditions), and includes a cross-validation experiment that demonstrates consistent performance across different scene partitions, thereby addressing concerns about memorization. revision: yes
- Referee: Generalization experiments: The transfer results on ORFD and Rellis lack specification of the protocol (zero-shot inference vs. fine-tuning) and do not include a matched single-frame baseline, weakening the assertion that the memory-attention mechanism drives robust cross-modal performance.
Authors: We thank the referee for highlighting this omission. The revised manuscript now clearly states that the ORFD and Rellis results were obtained via zero-shot inference with no fine-tuning on the target datasets. We have also added a matched single-frame baseline (same backbone, trained only on IRON) for direct comparison on both datasets, which isolates the contribution of the memory-attention mechanism to the observed cross-modal generalization. revision: yes
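The temporal non-overlap criterion from the second response has a standard construction worth spelling out: assign whole recording sequences, never individual frames, to one split. A minimal sketch under that assumption (sequence identifiers are hypothetical):

```python
import random

def sequence_level_split(seq_ids: list[str], test_frac: float = 0.2, seed: int = 0):
    """Assign whole recording sequences to train or test so the splits share
    no frames and no temporally adjacent context."""
    ids = sorted(set(seq_ids))
    random.Random(seed).shuffle(ids)
    n_test = max(1, int(len(ids) * test_frac))
    test_ids = set(ids[:n_test])
    train = sorted(s for s in ids if s not in test_ids)
    return train, sorted(test_ids)

train, test = sequence_level_split([f"seq_{i:03d}" for i in range(25)])
assert not set(train) & set(test)  # disjoint by construction
```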
Circularity Check
No significant circularity in empirical benchmark claims
Full rationale
This is an empirical ML paper introducing the IRON dataset and evaluating IRONet on held-out test splits plus transfer to ORFD and Rellis. Reported IoU/F1 metrics are direct measurements from training and inference on those splits, not quantities that reduce by construction to fitted parameters, self-definitions, or self-citation chains. No equations, uniqueness theorems, or ansatzes are presented that loop the performance claims back to the inputs; the central results remain independent empirical observations.
Axiom & Free-Parameter Ledger
free parameters (1)
- model hyperparameters and training schedule
axioms (2)
- domain assumption: Dense manual annotations for freespace are accurate and consistent across frames and lighting conditions
- domain assumption: Aggregating historical context via memory attention reduces inter-frame inconsistencies better than single-frame or flow-based alternatives
invented entities (2)
- IRONet architecture (no independent evidence)
- memory-attention mechanism (no independent evidence)
Reference graph
Works this paper leans on
- [1] U. Shin, J. Park, and I. S. Kweon, "Deep depth estimation from thermal image," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1043–1053.
- [2] T. Kim, S. Shin, Y. Yu, H. G. Kim, and Y. M. Ro, "Causal mode multiplexer: A novel framework for unbiased multispectral pedestrian detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 26784–26793.
- [3] J. Jang, C. Park, H. Kim, J. Lee, and J. Paik, "Multispectral object detection enhanced by cross-modal information complementary and cosine similarity channel resampling modules," in 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2025, pp. 9437–9446.
- [4] Y. Huang, T. Miyazaki, X. Liu, and S. Omachi, "Infrared image super-resolution: A systematic review and future trends," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025.
- [5] M. Yuan, B. Cui, T. Zhao, J. Wang, S. Fu, X. Yang, and X. Wei, "UniRGB-IR: A unified framework for visible-infrared semantic tasks via adapter tuning," in Proceedings of the 33rd ACM International Conference on Multimedia, 2025, pp. 2409–2418.
- [6] Q. Li, K. Tan, D. Yuan, and Q. Liu, "Progressive domain adaptation for thermal infrared tracking," Electronics, vol. 14, no. 1, p. 162, 2025.
- [7] P. Jiang, P. Osteen, M. Wigness, and S. Saripalli, "RELLIS-3D dataset: Data, benchmarks and analysis," in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 1110–1116.
- [8] C. Min, W. Jiang, D. Zhao, J. Xu, L. Xiao, Y. Nie, and B. Dai, "ORFD: A dataset and benchmark for off-road freespace detection," in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 2532–2538.
- [9] P. Mortimer, R. Hagmanns, M. Granero, T. Luettel, J. Petereit, and H.-J. Wuensche, "The GOOSE dataset for perception in unstructured environments," in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 14838–14844.
- [10] S. Sharma, L. Dabbiru, T. Hannis, G. Mason, D. W. Carruth, M. Doude, C. Goodin, C. Hudson, S. Ozier, J. E. Ball et al., "CaT: CAVS traversability dataset for off-road autonomous driving," IEEE Access, vol. 10, pp. 24759–24768, 2022.
- [11] A. Datar, A. Pokhrel, M. Nazeri, M. B. Rao, C. Pan, Y. Zhang, A. Harrison, M. Wigness, P. R. Osteen, J. Ye et al., "M2P2: A multi-modal passive perception dataset for off-road mobility in extreme low-light conditions," arXiv preprint arXiv:2410.01105, 2024.
- [12] J. Zhuang, Z. Wang, and J. Li, "Video semantic segmentation with inter-frame feature fusion and inner-frame feature refinement," arXiv preprint arXiv:2301.03832, 2023.
- [13] S. A. S. Hesham, Y. Liu, G. Sun, H. Ding, J. Yang, E. Konukoglu, X. Geng, and X. Jiang, "Exploiting temporal state space sharing for video semantic segmentation," in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 24211–24221.
- [14] H. Wang, R. Fan, P. Cai, and M. Liu, "SNE-RoadSeg+: Rethinking depth-normal translation and deep supervision for freespace detection," in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021, pp. 1140–1145.
- [15] H. Li, Y. Chen, Q. Zhang, and D. Zhao, "BiFNet: Bidirectional fusion network for road segmentation," IEEE Transactions on Cybernetics, vol. 52, no. 9, pp. 8617–8628, 2021.
- [16] J. Li, Y. Zhang, P. Yun, G. Zhou, Q. Chen, and R. Fan, "RoadFormer: Duplex transformer for RGB-normal semantic road scene parsing," IEEE Transactions on Intelligent Vehicles, vol. 9, no. 7, pp. 5163–5172, 2024.
- [17] T. Sun, H. Ye, J. Mei, L. Chen, F. Zhao, L. Zong, and Y. Hu, "ROD: RGB-only fast and efficient off-road freespace detection," in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 9787–9793.
- [18] Y. Weng, M. Han, H. He, M. Li, L. Yao, X. Chang, and B. Zhuang, "Mask propagation for efficient video semantic segmentation," Advances in Neural Information Processing Systems, vol. 36, pp. 7170–7183, 2023.
- [19] V. Fedynyak, Y. Romanus, O. Dobosevych, I. Babin, and R. Riazantsev, "Global motion understanding in large-scale video object segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 3153–3162.
- [20] Y. Liu, J. Yan, F. Jia, S. Li, A. Gao, T. Wang, and X. Zhang, "PETRv2: A unified framework for 3D perception from multi-camera images," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3262–3272.
- [21] B. Xu, R. Hou, T. Ren, and G. Wu, "RGB-D video object segmentation via enhanced multi-store feature memory," in Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024, pp. 1016–1024.
- [22] J.-H. Baek, J. Oh, and Y. J. Koh, "Evolve: Event-guided deformable feature transfer and dual-memory refinement for low-light video object segmentation," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 11273–11282.
- [23] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes dataset for semantic urban scene understanding," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3213–3223.
- [24] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, "Vision meets robotics: The KITTI dataset," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231–1237, 2013.
- [25] M. Wigness, S. Eum, J. G. Rogers, D. Han, and H. Kwon, "A RUGD dataset for autonomous navigation and visual perception in unstructured outdoor environments," in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019, pp. 5000–5007.
- [26] M. Sivaprakasam, P. Maheshwari, M. G. Castro, S. Triest, M. Nye, S. Willits, A. Saba, W. Wang, and S. Scherer, "TartanDrive 2.0: More modalities and better infrastructure to further self-supervised learning research in off-road driving tasks," in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 12606–12606.
- [27] C. Min, J. Mei, H. Zhai, S. Wang, T. Sun, F. Kong, H. Li, F. Mao, F. Liu, S. Wang et al., "Advancing off-road autonomous driving: The large-scale ORAD-3D dataset and comprehensive benchmarks," arXiv preprint arXiv:2510.16500, 2025.
- [28] K. Małek, J. Dybała, A. Kordecki, P. Hondra, and K. Kijania, "OffRoadSynth open dataset for semantic segmentation using synthetic-data-based weight initialization for autonomous UGV in off-road environments," Journal of Intelligent & Robotic Systems, vol. 110, no. 2, p. 76, 2024.
- [29] Y. Choi, N. Kim, S. Hwang, K. Park, J. S. Yoon, K. An, and I. S. Kweon, "KAIST multi-spectral day/night data set for autonomous and assisted driving," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 3, pp. 934–948, 2018.
- [30] X. Jia, C. Zhu, M. Li, W. Tang, and W. Zhou, "LLVIP: A visible-infrared paired dataset for low-light vision," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 3496–3504.
- [31] Q. Ha, K. Watanabe, T. Karasawa, Y. Ushiku, and T. Harada, "MFNet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes," in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 5108–5115.
- [32] FLIR Systems, Inc., "FLIR ADAS dataset," dataset, accessed 2025-10-27. [Online]. Available: https://oem.flir.com/en-in/solutions/automotive/adas-dataset-form/
- [33] J. Liu, X. Fan, Z. Huang, G. Wu, R. Liu, W. Zhong, and Z. Luo, "Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 5802–5811.
- [34] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
- [35] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 801–818.
- [36] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, "SegFormer: Simple and efficient design for semantic segmentation with transformers," Advances in Neural Information Processing Systems, vol. 34, pp. 12077–12090, 2021.
- [37] B. Cheng, I. Misra, A. G. Schwing, A. Kirillov, and R. Girdhar, "Masked-attention mask transformer for universal image segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1290–1299.
- [38] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., "Segment anything," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4015–4026.
- [39] R. Fan, H. Wang, P. Cai, and M. Liu, "SNE-RoadSeg: Incorporating surface normal information into semantic segmentation for accurate freespace detection," in European Conference on Computer Vision. Springer, 2020, pp. 340–356.
- [40] H. Ye, J. Mei, and Y. Hu, "M2F2-Net: Multi-modal feature fusion for unstructured off-road freespace detection," in 2023 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2023, pp. 1–7.
- [41] Z. Teed and J. Deng, "RAFT: Recurrent all-pairs field transforms for optical flow," in European Conference on Computer Vision. Springer, 2020, pp. 402–419.
- [42] Z. Huang, X. Shi, C. Zhang, Q. Wang, K. C. Cheung, H. Qin, J. Dai, and H. Li, "FlowFormer: A transformer architecture for optical flow," in European Conference on Computer Vision. Springer, 2022, pp. 668–685.
- [43] N. Ravi, V. Gabeur, Y.-T. Hu, R. Hu, C. Ryali, T. Ma, H. Khedr, R. Rädle, C. Rolland, L. Gustafson et al., "SAM 2: Segment anything in images and videos," arXiv preprint arXiv:2408.00714, 2024.
- [44] S. Ding, R. Qian, X. Dong, P. Zhang, Y. Zang, Y. Cao, Y. Guo, D. Lin, and J. Wang, "SAM2Long: Enhancing SAM 2 for long video segmentation with a training-free memory tree," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 13614–13624.
- [45] C. Cuttano, G. Trivigno, G. Rosi, C. Masone, and G. Averta, "SAMWISE: Infusing wisdom in SAM2 for text-driven video segmentation," in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 3395–3405.
- [46] X. Zhang, K. Fu, and Q. Zhao, "CamoSAM2: Motion-appearance induced auto-refining prompts for video camouflaged object detection," arXiv preprint arXiv:2504.00375, 2025.
- [47] Z. Xu, J. Zhuang, Q. Liu, J. Zhou, and S. Peng, "Benchmarking a large-scale FIR dataset for on-road pedestrian detection," Infrared Physics & Technology, vol. 96, pp. 199–208, 2019.
- [48] G. Franchi, M. Hariat, X. Yu, N. Belkhir, A. Manzanera, and D. Filliat, "InfraParis: A multi-modal and multi-task autonomous driving dataset," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 2973–2983.
- [49] Ray Vision Technologies, "Automotive night vision thermal camera – IR-Pilot series," webpage, accessed 2025-11-17. [Online]. Available: https://rayvisionpk.com/automotive-night-vision-thermal-camera/
- [50] Seeed Technology Co., Ltd., "Seeed Studio," https://www.seeedstudio.com/, 2025, accessed 2025-11-28.
- [51] W. Wang, "Advanced auto labeling solution with added features," https://github.com/CVHub520/X-AnyLabeling, CVHub, 2023.
- [52] Y. Sun, W. Zuo, and M. Liu, "RTFNet: RGB-thermal fusion network for semantic segmentation of urban scenes," IEEE Robotics and Automation Letters, vol. 4, no. 3, pp. 2576–2583, 2019.
- [53] L. Wellhausen, R. Ranftl, and M. Hutter, "Safe robot navigation via multi-modal anomaly detection," IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 1326–1333, 2020.
- [54] R. Schmid, D. Atha, F. Schöller, S. Dey, S. Fakoorian, K. Otsu, B. Ridge, M. Bjelonic, L. Wellhausen, M. Hutter et al., "Self-supervised traversability prediction by learning to reconstruct safe terrain," in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022, pp. 12419–12425.
- [55] J. Seo, S. Sim, and I. Shim, "Learning off-road terrain traversability with self-supervisions only," IEEE Robotics and Automation Letters, vol. 8, no. 8, pp. 4617–4624, 2023.
- [56] P. Gao, T. Ma, H. Li, Z. Lin, J. Dai, and Y. Qiao, "ConvMAE: Masked convolution meets masked autoencoders," arXiv preprint arXiv:2205.03892, 2022.
- [57] O. Siméoni, H. V. Vo, M. Seitzer, F. Baldassarre, M. Oquab, C. Jose, V. Khalidov, M. Szafraniec, S. Yi, M. Ramamonjisoa et al., "DINOv3," arXiv preprint arXiv:2508.10104, 2025.
- [58] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 248–255.