pith. sign in

arxiv: 1906.12151 · v1 · pith:NMYE6P6Nnew · submitted 2019-06-28 · 💻 cs.CV

Place recognition in gardens by learning visual representations: data set and benchmark analysis

Pith reviewed 2026-05-25 13:47 UTC · model grok-4.3

classification 💻 cs.CV
keywords visual place recognitiongarden environmentsconvolutional neural networksdatasetbenchmarkcontrastive lossTB-Places
0
0 comments X

The pith

Learning garden-tailored representations improves place recognition performance within garden environments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper extends the TB-Places dataset with images from real gardens across seasons and varying light, supplying complete pairwise ground truth labels for whether image pairs show the same place. It benchmarks convolutional networks including ResNet, DenseNet and VGG NetVLAD as backbones in a two-way architecture trained with contrastive loss. Results show that representations learned on this garden data outperform general-purpose networks on garden test images. The work targets the repetitive green patterns and textures that defeat standard visual localization methods developed for indoor or urban scenes.

Core claim

Training existing convolutional networks on the extended TB-Places garden dataset with pairwise place labels produces holistic image descriptors that raise place recognition accuracy in garden scenes relative to off-the-shelf networks, although the resulting representations exhibit limited generalization beyond the training conditions.

What carries the argument

The extended TB-Places dataset supplying ground truth for all image pairs, used to train a two-way CNN architecture with contrastive loss for garden-specific holistic descriptors.

If this is right

  • Garden-specific training raises matching accuracy for places within garden scenes compared with standard pre-trained networks.
  • Including seasonal and lighting variations in the training data supports the observed performance gains.
  • The same two-way contrastive architecture works with multiple backbones such as ResNet, DenseNet and VGG NetVLAD.
  • Generalization across different garden collections remains limited.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same dataset-extension and fine-tuning strategy could be applied to other repetitive natural scenes such as orchards or fields.
  • Hybrid systems that combine garden-specific visual features with additional sensors might reduce the observed generalization limits.
  • Controlled experiments that isolate layout versus lighting variations could clarify the source of limited cross-garden transfer.

Load-bearing premise

The collected garden images and their pairwise labels capture the visual challenges of real gardens without overfitting to the particular sites and collection times.

What would settle it

Evaluation of the trained models on an independent collection of garden images from new locations and seasons shows no accuracy gain over general-purpose networks.

Figures

Figures reproduced from arXiv: 1906.12151 by Maria Leyva-Vallina, Nicolai Petkov, Nicola Strisciuglio.

Figure 1
Figure 1. Figure 1: (a) Robot platform in the Wageningen garden of TrimBot2020 project. [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Top-view of the trajectories followed by the robot during the recording [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Examples of TB-Places W18 data set. The left column shows reference [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Sketch of the employed architecture for learning garden-specific descrip [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Precision-recall curves achieved on train (W17) and test (W18) by the [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Train and test performance vs training epoch. The bullet indicates the [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗
read the original abstract

Visual place recognition is an important component of systems for camera localization and loop closure detection. It concerns the recognition of a previously visited place based on visual cues only. Although it is a widely studied problem for indoor and urban environments, the recent use of robots for automation of agricultural and gardening tasks has created new problems, due to the challenging appearance of garden-like environments. Garden scenes predominantly contain green colors, as well as repetitive patterns and textures. The lack of available data recorded in gardens and natural environments makes the improvement of visual localization algorithms difficult. In this paper we propose an extended version of the TB-Places data set, which is designed for testing algorithms for visual place recognition. It contains images with ground truth camera pose recorded in real gardens in different seasons, with varying light conditions. We constructed and released a ground truth for all possible pairs of images, indicating whether they depict the same place or not. We present the results of a benchmark analysis of methods based on convolutional neural networks for holistic image description and place recognition. We train existing networks (i.e. ResNet, DenseNet and VGG NetVLAD) as backbone of a two-way architecture with a contrastive loss function. The results that we obtained demonstrate that learning garden-tailored representations contribute to an improvement of performance, although the generalization capabilities are limited.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript extends the TB-Places dataset with images recorded in real gardens across seasons and lighting conditions, releases exhaustive ground-truth labels for all image pairs indicating whether they depict the same place, and benchmarks CNN backbones (ResNet, DenseNet, VGG NetVLAD) inside a two-way contrastive architecture. It claims that training garden-tailored representations yields performance gains for visual place recognition, while noting that generalization capabilities remain limited.

Significance. Release of a garden-specific dataset with complete pair-wise ground truth fills a documented gap for agricultural robotics applications. If the reported gains can be shown to arise from transferable domain features rather than site-specific memorization, the benchmark would supply useful empirical guidance on the value of tailored holistic descriptors in repetitive green textures.

major comments (2)
  1. Abstract: the central claim that garden-tailored training improves performance is stated without any quantitative metrics, baseline comparisons, or error analysis, leaving the magnitude and reliability of the improvement unverifiable from the provided text.
  2. Benchmark analysis / Experiments: the evaluation protocol does not describe a cross-garden or leave-one-garden-out train/test split; without such isolation, observed gains cannot be distinguished from overfitting to the finite set of recorded sites, cameras, paths, or collection conditions, which directly undermines the claim that the improvement reflects domain tailoring rather than memorization.
minor comments (2)
  1. Abstract and methods: contrastive-loss hyperparameters, learning-rate schedules, data-augmentation choices, and exact training protocol are not reported, hindering reproducibility.
  2. Dataset description: while pair-wise ground truth is released, the manuscript does not state how seasonal or viewpoint variation is balanced across any proposed splits.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight opportunities to strengthen the presentation of quantitative results and to better isolate domain-specific effects in the evaluation. We address each major comment below and will incorporate revisions accordingly.

read point-by-point responses
  1. Referee: Abstract: the central claim that garden-tailored training improves performance is stated without any quantitative metrics, baseline comparisons, or error analysis, leaving the magnitude and reliability of the improvement unverifiable from the provided text.

    Authors: We agree that the abstract would be strengthened by including concrete metrics. In the revised manuscript we will update the abstract to report specific quantitative improvements (e.g., recall@1 or mAP gains of the garden-trained models relative to the pre-trained baselines) together with a brief reference to the observed error patterns. revision: yes

  2. Referee: Benchmark analysis / Experiments: the evaluation protocol does not describe a cross-garden or leave-one-garden-out train/test split; without such isolation, observed gains cannot be distinguished from overfitting to the finite set of recorded sites, cameras, paths, or collection conditions, which directly undermines the claim that the improvement reflects domain tailoring rather than memorization.

    Authors: The current experiments employ a standard random train/test split across the collected garden imagery. We acknowledge that this does not explicitly isolate performance across distinct gardens and therefore cannot fully rule out site-specific effects. To address the concern we will add a leave-one-garden-out evaluation protocol and report the corresponding results in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical benchmark on new dataset with no self-referential derivations

full rationale

The paper extends the TB-Places dataset with garden images across seasons, constructs ground-truth image pairs, and benchmarks CNN backbones (ResNet, DenseNet, VGG NetVLAD) trained via contrastive loss for place recognition. The central claim is an empirical observation that garden-tailored training improves performance with limited generalization. No equations, predictions, or uniqueness theorems are present. No self-citation chains, fitted parameters renamed as predictions, or ansatzes smuggled via prior work. All results are direct performance metrics on the released pairs; the derivation chain is self-contained data collection plus standard supervised training. This is the expected non-finding for a benchmark paper.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard assumptions in deep learning that contrastive training on image pairs produces useful place descriptors, plus the representativeness of the collected garden data; no new entities or ad-hoc parameters are introduced beyond typical network training.

free parameters (1)
  • contrastive loss hyperparameters
    Typical margin and weighting parameters in contrastive loss are fitted during training but not specified in the abstract.
axioms (1)
  • domain assumption Convolutional neural networks can extract useful visual representations from image data when trained with contrastive loss
    Invoked when stating that garden-tailored training improves performance.

pith-pipeline@v0.9.0 · 5773 in / 1232 out tokens · 56526 ms · 2026-05-25T13:47:35.360571+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 2 internal anchors

  1. [1]

    Netvlad: Cnn architecture for weakly supervised place recognition

    Relja Arandjelovic, Petr Gronat, Akihiko Torii, Tomas Pajdla, and Josef Sivic. Netvlad: Cnn architecture for weakly supervised place recognition. InIEEE CVPR, pages 5297–5307, 2016

  2. [2]

    Harvesting robots for high-value crops: State-of-the-art review and challenges ahead

    C Wouter Bac, Eldert J van Henten, Jochen Hemming, and Yael Edan. Harvesting robots for high-value crops: State-of-the-art review and challenges ahead. Journal of Field Robotics , 31(6):888–911, 2014

  3. [3]

    Visual topometric localization

    Hern´ an Badino, D Huber, and Takeo Kanade. Visual topometric localization. In Intelligent Vehicles Symposium (IV), 2011 IEEE , pages 794–799, 2011

  4. [4]

    Fab-map: Probabilistic localization and map- ping in the space of appearance

    Mark Cummins and Paul Newman. Fab-map: Probabilistic localization and map- ping in the space of appearance. Int. J. Rob. Res. , 27(6):647–665, 2008

  5. [5]

    Highly scalable appearance-only slam-fab-map 2.0

    Mark Cummins and Paul Newman. Highly scalable appearance-only slam-fab-map 2.0. In Robotics: Science and Systems , volume 5, page 17, 2009

  6. [6]

    Vision meets robotics: The kitti dataset

    Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision meets robotics: The kitti dataset. Int. J. Rob. Res. , 32(11):1231–1237, 2013

  7. [7]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE CVPR , pages 770–778, 2016. 12 M. Leyva-Vallina, N. Strisciuglio, N. Petkov

  8. [8]

    Densely connected convolutional networks

    Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In IEEE CVPR , pages 4700–4708, 2017

  9. [9]

    Imagenet classification with deep convolutional neural networks

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, pages 1097–1105, 2012

  10. [10]

    Tb-places: a data set for visual place recognition in garden environments

    Mar´ ıa Leyva-Vallina, Nicola Strisciuglio, Manuel L´ opez-Antequera, Radim Tyle- cek, Michael Blaich, and Nicolai Petkov. Tb-places: a data set for visual place recognition in garden environments. IEEE Access, 2019

  11. [11]

    Appearance-invariant place recognition by discriminatively training a convolutional neural network

    Manuel Lopez-Antequera, Ruben Gomez-Ojeda, Nicolai Petkov, and Javier Gonzalez-Jimenez. Appearance-invariant place recognition by discriminatively training a convolutional neural network. Pattern Recognit. Lett., 92:89–95, 2017

  12. [12]

    Visual place recognition: A survey

    Stephanie Lowry, Niko S¨ underhauf, Paul Newman, John J Leonard, David Cox, Peter Corke, and Michael J Milford. Visual place recognition: A survey. IEEE Transactions on Robotics, 32(1):1–19, 2016

  13. [13]

    Stewart, and Paul Newman

    Colin McManus, Winston Churchill, Will Maddern, Alexander D. Stewart, and Paul Newman. Shady dealings: Robust, long-term visual localisation using illumi- nation invariance. In IEEE ICRA , pages 901–906, 2014

  14. [14]

    Seqslam: Visual route-based navigation for sunny summer days and stormy winter nights

    Michael J Milford and Gordon F Wyeth. Seqslam: Visual route-based navigation for sunny summer days and stormy winter nights. InIEEE ICRA, pages 1643–1649, 2012

  15. [15]

    Design of an autonomous precision pollination robot

    Nicholas Ohi, Kyle Lassak, Ryan Watson, Jared Strader, Yixin Du, Chizhao Yang, Gabrielle Hedrick, Jennifer Nguyen, Scott Harper, Dylan Reynolds, et al. Design of an autonomous precision pollination robot. In IEEE IROS , 2018

  16. [16]

    Benchmarking 6dof outdoor visual localization in changing conditions

    Torsten Sattler, Will Maddern, Carl Toft, Akihiko Torii, Lars Hammarstrand, Erik Stenborg, Daniel Safari, Masatoshi Okutomi, Marc Pollefeys, Josef Sivic, et al. Benchmarking 6dof outdoor visual localization in changing conditions. In IEEE CVPR, volume 1, 2018

  17. [17]

    Image retrieval for image-based localization revisited

    Torsten Sattler, Tobias Weyand, Bastian Leibe, and Leif Kobbelt. Image retrieval for image-based localization revisited. In BMVC, volume 1, page 4, 2012

  18. [18]

    Scene coordinate regression forests for camera relocaliza- tion in rgb-d images

    Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, and Andrew Fitzgibbon. Scene coordinate regression forests for camera relocaliza- tion in rgb-d images. In IEEE CVPR , pages 2930–2937, 2013

  19. [19]

    Very Deep Convolutional Networks for Large-Scale Image Recognition

    Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 , 2014

  20. [20]

    A Push-Pull Layer Improves Robustness of Convolutional Neural Networks

    Nicola Strisciuglio, Manuel Lopez-Antequera, and Nicolai Petkov. A push-pull layer improves robustness of convolutional neural networks. arXiv preprint arXiv:1901.10208, 2019

  21. [21]

    Nicola Strisciuglio, Radim Tylecek, Michael Blaich, Nicolai Petkov, Peter Bieber, Jochen Hemming, Eldert van Henten, Torsten Sattler, Marc Pollefeys, Theo Gev- ers, Thomas Brox, and Robert B. Fisher. Trimbot2020: an outdoor robot for automatic gardening. ISR, 2018

  22. [22]

    Are we there yet? challenging seqslam on a 3000 km journey across all four seasons

    Niko S¨ underhauf, Peer Neubert, and Peter Protzel. Are we there yet? challenging seqslam on a 3000 km journey across all four seasons. In IEEE ICRA , page 2013, 2013

  23. [23]

    Brief-gist-closing the loop by simple means

    Niko S¨ underhauf and Peter Protzel. Brief-gist-closing the loop by simple means. In IEEE IROS , pages 1234–1241. IEEE, 2011

  24. [24]

    Visual place recognition with repetitive structures

    Akihiko Torii, Josef Sivic, Masatoshi Okutomi, and Tomas Pajdla. Visual place recognition with repetitive structures. IEEE Trans. Pattern Anal. Mach. Intell. , pages 2346–2359, 2015. Place recognition in gardens by learning visual representations 13

  25. [25]

    Flourish-a robotic approach for automation in crop management

    Achim Walter, Raghav Khanna, Philipp Lottes, Cyrill Stachniss, Roland Siegwart, Juan Nieto, and Frank Liebisch. Flourish-a robotic approach for automation in crop management. In ICPA, 2018

  26. [26]

    Improving the robustness of deep neural networks via stability training

    Stephan Zheng, Yang Song, Thomas Leung, and Ian Goodfellow. Improving the robustness of deep neural networks via stability training. In IEEE CVPR , pages 4480–4488, 2016