3D-LENS: A 3D Lifting-based Elevated Novel-view Synthesis method for Single-View Aerial-Ground Re-Identification
Pith reviewed 2026-05-07 11:45 UTC · model grok-4.3
The pith
3D mesh reconstruction from single views enables re-identification across unseen aerial and ground perspectives.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose 3D-LENS, a unified framework that combines geometrically consistent novel-view synthesis obtained by lifting single real images to large-scale 3D meshes with a representation-learning stage that mitigates synthetic-to-real bias, thereby enabling models trained on one viewpoint to achieve state-of-the-art retrieval performance on unseen viewpoints without any paired cross-view annotations or class-specific templates.
What carries the argument
3D Lifting-based Elevated Novel-view Synthesis that reconstructs a 3D mesh from a single real image and renders consistent elevated views without relying on predefined class templates.
Load-bearing premise
Large-scale 3D mesh reconstruction from single real images can produce geometrically consistent novel views across many object categories, and any synthetic-to-real appearance gap can be reduced enough for the downstream re-identification task to succeed.
What would settle it
On a new single-view aerial-ground re-identification test set containing objects with complex carried items, measure whether 3D-LENS retrieval accuracy falls below that of a strong 2D generative baseline; if it does, the central claim is falsified.
Figures
read the original abstract
Aerial-Ground Re-Identification (AG-ReID) is constrained by the viewpoint-domain gap, as drastic viewpoint disparities occlude or distort discriminative features, making cross-viewpoint image retrieval challenging. While existing methods rely on paired cross-view annotations, real-world deployments, such as wilderness search-and-rescue (SAR), often lack target-domain data, requiring retrieval from ground-level references alone. To our knowledge, we are the first to address this challenge by formalizing the Single-View AG-ReID (SV AG-ReID) setting, where models trained on a single real viewpoint must generalize to an unseen viewpoint. We propose 3D Lifting-based Elevated Novel-view Synthesis (3D-LENS), a unified framework combining geometrically-consistent novel view synthesis that leverages large-scale 3D mesh reconstruction, with a robust representation learning scheme to mitigate synthetic-to-real bias. Unlike 2D generative baselines that suffer from geometric inconsistencies or prior 3D methods that are restricted to class-specific templates, our approach ensures view-consistent synthesis across diverse categories without predefined templates that fail to capture fine-grained details, such as carried objects. Extensive experiments demonstrate that our method achieves state-of-the-art performance on SV AG-ReID scenarios. Code and data will be released at https://github.com/TurtleSmoke/3D-LENS.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper formalizes the Single-View Aerial-Ground Re-Identification (SV AG-ReID) problem, where models trained on a single real viewpoint must generalize to unseen viewpoints without paired cross-view data. It proposes 3D-LENS, a unified framework that performs large-scale 3D mesh reconstruction for geometrically consistent elevated novel-view synthesis, combined with representation learning to mitigate synthetic-to-real bias, and claims state-of-the-art performance on SV AG-ReID scenarios across diverse categories without class-specific templates.
Significance. If the central results hold, the work would meaningfully advance AG-ReID for practical settings such as search-and-rescue by removing the need for paired annotations and class-specific 3D templates. The template-free 3D lifting approach and planned code/data release are positive contributions to reproducibility and generalization across object categories.
major comments (2)
- [Abstract] Abstract: the central claim that 3D-LENS achieves SOTA performance on SV AG-ReID rests on the effectiveness of single-view 3D mesh reconstruction for producing usable elevated novel views, yet the abstract provides no quantitative reconstruction metrics, qualitative examples of geometric consistency, or ablation on artifact impact; this is load-bearing because depth ambiguity and self-occlusion in single-view reconstruction commonly produce holes or distortions that would prevent the reported gains over 2D baselines.
- [Abstract] Abstract and method overview: the assertion that synthetic-to-real bias is sufficiently mitigated for effective generalization is not accompanied by any description of the bias-mitigation scheme, dataset statistics, or cross-domain evaluation protocol; without these, it is impossible to determine whether the ReID improvements are attributable to the 3D synthesis or to other factors.
minor comments (1)
- [Abstract] Abstract: the phrase 'large-scale 3D mesh reconstruction' is used without clarifying the reconstruction backbone or scale of the meshes, which would aid reader understanding.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential impact of formalizing the SV AG-ReID setting. We address each major comment below with specific revisions to the abstract and supporting sections.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that 3D-LENS achieves SOTA performance on SV AG-ReID rests on the effectiveness of single-view 3D mesh reconstruction for producing usable elevated novel views, yet the abstract provides no quantitative reconstruction metrics, qualitative examples of geometric consistency, or ablation on artifact impact; this is load-bearing because depth ambiguity and self-occlusion in single-view reconstruction commonly produce holes or distortions that would prevent the reported gains over 2D baselines.
Authors: We agree that the abstract should more explicitly reference the reconstruction quality evidence. The full manuscript reports quantitative metrics (PSNR/SSIM and Chamfer distance) for single-view 3D mesh reconstruction in Section 4.2, provides qualitative examples of geometric consistency and artifact handling in Figure 3, and includes an ablation on artifact impact in Table 4 (showing ReID performance drop when artifacts are not mitigated). We will revise the abstract to include one key reconstruction metric and a short clause on geometric consistency to better support the central claim without exceeding length limits. revision: yes
-
Referee: [Abstract] Abstract and method overview: the assertion that synthetic-to-real bias is sufficiently mitigated for effective generalization is not accompanied by any description of the bias-mitigation scheme, dataset statistics, or cross-domain evaluation protocol; without these, it is impossible to determine whether the ReID improvements are attributable to the 3D synthesis or to other factors.
Authors: We acknowledge the abstract is too terse on this point. Section 3.4 details the bias-mitigation scheme (adversarial domain alignment plus synthetic-to-real feature regularization), Section 4.1 provides dataset statistics (including synthetic vs. real sample counts and category diversity), and Section 4.3 describes the cross-domain protocol with results in Table 2 isolating the contribution of 3D synthesis. We will add a concise description of the bias-mitigation approach and evaluation protocol to the abstract to clarify attribution of gains. revision: yes
Circularity Check
No circularity: method applies external 3D reconstruction techniques to a new task without self-referential reduction
full rationale
The paper formalizes SV AG-ReID as a new setting and proposes 3D-LENS by combining large-scale 3D mesh reconstruction (leveraging prior techniques) with representation learning to mitigate domain bias. No equations or claims reduce by construction to fitted inputs, self-citations, or renamed known results; the SOTA performance assertion rests on experimental validation rather than tautological definitions or load-bearing self-references. The derivation chain remains independent of its own outputs.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Geometrically consistent novel views can be synthesized from single-view 3D mesh reconstructions across diverse object categories without class-specific templates
- domain assumption Synthetic-to-real domain gap can be mitigated via robust representation learning to enable generalization in the single-view setting
Reference graph
Works this paper leans on
-
[1]
Algasov, A., Nepovinnykh, E., Zolotarev, F., Eerola, T., Kälviäinen, H., Zemčík, P., Stewart, C.V.: Unsupervised pelage pattern unwrapping for animal re- identification (2025) 5
work page 2025
- [2]
- [3]
-
[4]
Chen, S., Ye, M., Du, B.: Rotation invariant transformer for recognizing object in uavs. In: ACM MM. pp. 2565–2574 (2022) 10, 11, 12, 13
work page 2022
-
[5]
Chen, S., Ye, M., Huang, Y., Du, B.: Towards effective rotation generalization in uav object re-identification. IEEE TIFS (2025) 2, 4
work page 2025
-
[6]
In: ICLR (2021) 9, 10, 11, 12, 13
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. In: ICLR (2021) 9, 10, 11, 12, 13
work page 2021
-
[7]
Grolleau, W., Chaouch, A., Sabourin, A., Lapouge, G., Achard, C.: Moo: A multi- view oriented observations dataset for viewpoint analysis in cattle re-identification (2026) 9
work page 2026
-
[8]
He, L., Liao, X., Liu, W., Liu, X., Cheng, P., Mei, T.: Fastreid: A pytorch toolbox for general instance re-identification. In: ACM MM. pp. 9664–9667 (2023) 10, 11, 12, 13
work page 2023
- [9]
- [10]
-
[11]
Khanam, R., Hussain, M.: Yolov11: An overview of the key architectural enhance- ments (2024) 7, 10
work page 2024
-
[12]
Kim, I.H., Lee, J., Jin, W., Son, S., Cho, K., Seo, J., Kwak, M.S., Cho, S., Baek, J., Lee, B., Kim, S.: Pose-dive: Pose-diversified augmentation with diffusion model for person re-identification (2024) 2, 4
work page 2024
-
[13]
Le, M.H., Carlsson, N.: Styleid: Identity disentanglement for anonymizing faces (2022) 8, 10
work page 2022
-
[14]
Lee, H., Park, J., Oh, J., Eom, C.: Domain generalization for person re- identification: A survey towards domain-agnostic person matching. Neurocomput- ing p. 130763 (2025) 2
work page 2025
-
[15]
Li, B., Liu, P., Fu, L., Li, J., Fang, J., Xu, Z., Yu, H.: Vehiclegan: Pair-flexible pose guided image synthesis for vehicle re-identification. In: IV. pp. 447–453. IEEE (2024) 2, 4
work page 2024
-
[16]
Sensors25(2), 552 (2025) 4, 10, 11, 12, 13
Li, J., Gong, X.: Unleashing the potential of pre-trained diffusion models for gen- eralizable person re-identification. Sensors25(2), 552 (2025) 4, 10, 11, 12, 13
work page 2025
-
[17]
Li, Q., Li, J., Zhang, Y., Tan, L., Chen, J., Ji, J.: Gsalign: Geometric and semantic alignment network for aerial-ground person re-identification (2026) 2, 4
work page 2026
- [18]
- [19]
- [20]
-
[21]
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: Smpl: A skinned multi-person linear model. ACM TOG34(6) (2015) 2, 5
work page 2015
- [22]
-
[23]
Mei, L., Cheng, Y., Chen, H., Jia, L., Yu, Y.: Unsupervised aerial-ground re- identification from pedestrian to group for uav-based surveillance. Drones9(4), 244 (2025) 2
work page 2025
-
[24]
IEEE TMM25, 2954–2965 (2022) 2, 5
Meng, D., Li, L., Liu, X., Gao, L., Huang, Q.: Viewpoint alignment and discrimi- native parts enhancement in 3d space for vehicle reid. IEEE TMM25, 2954–2965 (2022) 2, 5
work page 2022
-
[25]
Nguyen, H., Nguyen, K., Pemasiri, A., Sridharan, S., Fookes, C.: Beyond geometry: The power of texture in interpretable 3d person reid. CVIU p. 104517 (2025) 5
work page 2025
- [26]
-
[27]
v2: Bridging aerial and ground views for person re-identification
Nguyen, H., Nguyen, K., Sridharan, S., Fookes, C.: Ag-reid. v2: Bridging aerial and ground views for person re-identification. IEEE TIFS19, 2896–2908 (2024) 1, 2, 3, 9
work page 2024
-
[28]
Niu, K., Yu, H., Qian, X., Fu, T., Li, B., Xue, X.: Synthesizing efficient data with diffusion models for person re-identification pre-training. Mach. Learn.114(3), 1–25 (2025) 2, 4
work page 2025
-
[29]
Ridnik, T., Ben-Baruch, E., Noy, A., Zelnik-Manor, L.: Imagenet-21k pretraining for the masses (2021) 11
work page 2021
- [30]
-
[31]
Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G.: The graph neural network model. IEEE TNN20(1), 61–80 (2008) 5
work page 2008
- [32]
- [33]
- [34]
-
[35]
IEEE TCSVT34(6), 4698–4712 (2023) 5
Wang, C., Ning, X., Li, W., Bai, X., Gao, X.: 3d person re-identification based on global semantic guidance and local feature aggregation. IEEE TCSVT34(6), 4698–4712 (2023) 5
work page 2023
- [36]
-
[37]
Wang, X., Liang, Y., Liao, S.: Cloning outfits from real-world images to 3d char- acters for generalizable person re-identification. In: CVPR (2022) 5
work page 2022
-
[38]
Wang, Y., Hu, X., Wang, L., Zhang, P., Lu, H.: Sd-reid: View-aware stable diffusion for aerial-ground person re-identification (2025) 4 3D-LENS 17
work page 2025
-
[39]
Xun, Y., Liu, J., Islam, S.M., Chen, Y.: Multi-view vehicle image generation net- work for vehicle re-identification. In: ICC Workshops. pp. 517–522. IEEE (2024) 4
work page 2024
-
[40]
Ye, H., Zhang, J., Liu, S., Han, X., Yang, W.: Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models (2023) 4
work page 2023
-
[41]
IEEE TPAMI44(6), 2872–2893 (2021) 10, 11, 12, 13
Ye, M., Shen, J., Lin, G., Xiang, T., Shao, L., Hoi, S.C.: Deep learning for person re-identification: A survey and outlook. IEEE TPAMI44(6), 2872–2893 (2021) 10, 11, 12, 13
work page 2021
-
[42]
IEEE TCSVT34(7), 5589–5602 (2024) 5
Yu, Z., Li, L., Xie, J., Wang, C., Li, W., Ning, X.: Pedestrian 3d shape under- standing for person re-identification via multi-view learning. IEEE TCSVT34(7), 5589–5602 (2024) 5
work page 2024
-
[43]
Zhang, F., Firkat, E., Ma, H., Zhu, J., Zhu, B., Hamdulla, A.: Dari: Transformer- based data augmentation and rotation invariance for uav person re-identification. IEEE TMM (2025) 2, 4
work page 2025
- [44]
- [45]
-
[46]
Zhang, S., Zhang, Q., Yang, Y., Wei, X., Wang, P., Jiao, B., Zhang, Y.: Person re-identification in aerial imagery. IEEE TMM23, 281–291 (2020) 1
work page 2020
- [47]
-
[48]
Zhao, Z., Lai, Z., Lin, Q., Zhao, Y., Liu, H., Yang, S., Feng, Y., Yang, M., Zhang, S., Yang, X., Shi, H., Liu, S., Wu, J., Lian, Y., Yang, F., Tang, R., He, Z., Wang, X., Liu, J., Zuo, X., Chen, Z., Lei, B., Weng, H., Xu, J., Zhu, Y., Liu, X., Xu, L., Hu, C., Yang, S., Zhang, S., Liu, Y., Huang, T., Wang, L., Zhang, J., Chen, M., Dong, L., Jia, Y., Cai, ...
work page 2025
- [49]
-
[50]
IEEE TNNLS35(6), 7534–7547 (2022) 5
Zheng, Z., Wang, X., Zheng, N., Yang, Y.: Parameter-efficient person re- identification in the 3d space. IEEE TNNLS35(6), 7534–7547 (2022) 5
work page 2022
- [51]
- [52]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.