Recognition: no theorem link
MC-RFM: Geometry-Aware Few-Shot Adaptation via Mixed-Curvature Riemannian Flow Matching
Pith reviewed 2026-05-12 01:28 UTC · model grok-4.3
The pith
Adapting frozen vision models for few-shot tasks works better by continuously transporting features on a mixed hyperbolic-Euclidean manifold.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that few-shot adaptation of frozen visual backbones can be achieved by formulating the process as task-conditioned continuous transport on a mixed-curvature product manifold, where a hyperbolic component captures hierarchy and a Euclidean one preserves local variation, trained with Riemannian flow matching to reach support-set prototypes and paired with a hybrid classifier.
What carries the argument
Mixed-curvature Riemannian flow matching that performs continuous feature transport on a hyperbolic-Euclidean product manifold from frozen representations to task prototypes.
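The transport mechanism can be made concrete. Below is a minimal numpy sketch assuming a curvature −1 Poincaré ball for the hyperbolic factor (the paper's actual curvature values, factor dimensions, and parameterization are not given in this page); `product_interpolant` builds the geodesic interpolant that a flow-matching objective would regress against.

```python
import numpy as np

def mobius_add(x, y):
    """Mobius addition on the Poincare ball (curvature -1)."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    return ((1 + 2 * xy + y2) * x + (1 - x2) * y) / (1 + 2 * xy + x2 * y2)

def mobius_scale(t, v):
    """Scale a ball point v by factor t along the geodesic through the origin."""
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros_like(v)
    return np.tanh(t * np.arctanh(n)) * v / n

def geodesic(x, y, t):
    """Point at fraction t of the geodesic from x to y inside the ball."""
    return mobius_add(x, mobius_scale(t, mobius_add(-x, y)))

def product_interpolant(h0, h1, e0, e1, t):
    """Interpolant on the product H x E: the hyperbolic factor follows the
    ball geodesic while the Euclidean factor moves linearly."""
    return geodesic(h0, h1, t), (1.0 - t) * e0 + t * e1
```

In this sketch `h0, e0` stand for the two factors of a frozen feature and `h1, e1` for its support-set prototype; a flow-matching loss would regress a task-conditioned vector field onto the time derivative of this interpolant.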
Where Pith is reading between the lines
- Similar geometry-aware transport could apply to adapting models in other domains like natural language processing where hierarchical structures appear.
- Future work might explore fully learned manifold curvatures instead of fixed hyperbolic and Euclidean factors.
- Testing on even lower shot regimes or out-of-distribution tasks would further validate the geometry choice.
Load-bearing premise
That the displacement between frozen features and the target task distribution can be effectively modeled as continuous transport on a product space of hyperbolic and Euclidean geometries.
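Stated in symbols, this premise amounts to fitting a Riemannian flow-matching objective of the standard form below (our notation, not necessarily the paper's: $x_0$ a frozen feature, $x_1$ its support-set prototype, $c$ the task conditioning, $\gamma$ the geodesic on the product manifold, $g$ its metric):

```latex
\mathcal{L}(\theta)
  = \mathbb{E}_{t,\,x_0,\,x_1}
    \left\lVert v_\theta(x_t, t, c) - \dot{\gamma}_{x_0 \to x_1}(t)
    \right\rVert^2_{g_{x_t}},
\qquad x_t = \gamma_{x_0 \to x_1}(t).
```

The premise holds exactly when such geodesic interpolants are a good model of the actual frozen-to-task feature displacement.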
What would settle it
The claim would be refuted if MC-RFM failed to rank as the top method in a majority of the tested settings across the seven benchmarks when compared against linear probes, prompt tuning, and low-rank adaptation methods.
read the original abstract
Parameter-efficient adaptation of pretrained vision models is commonly performed through linear probes, prompts, low-rank updates, or lightweight residual modules. While effective, these methods usually treat adaptation as a discrete Euclidean perturbation of frozen representations, without explicitly modeling the geometry of the task-induced feature displacement. We propose \textsc{MC-RFM}, a mixed-curvature Riemannian flow-matching framework for few-shot adaptation of frozen visual backbones. The key idea is to represent adapted features on a product manifold combining a hyperbolic factor, which captures hierarchy-sensitive semantic structure, and a Euclidean factor, which preserves locally discriminative visual variation. Adaptation is formulated as a task-conditioned continuous transport from frozen features to support-set prototypes, trained with a flow-matching objective and coupled to a hybrid prototype-linear classifier. The method is lightweight, backbone-agnostic, and operates entirely on cached frozen features. Across seven visual recognition benchmarks, five frozen backbones, and 1/4/16-shot regimes, \textsc{MC-RFM} is the best-performing method in a majority of evaluated settings, with the strongest gains on Transformer backbones and fine-grained datasets. Ablations show that the mixed-curvature head, task conditioning, adaptive branch gating, prototype shrinkage, and discriminative supervision each contribute to performance. These results suggest that few-shot adaptation benefits not only from deciding which parameters to update, but also from modeling how representations should move through a geometry matched to the structure of the downstream task.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MC-RFM, a mixed-curvature Riemannian flow-matching framework for parameter-efficient few-shot adaptation of frozen visual backbones. Adaptation is modeled as task-conditioned continuous transport on a product manifold (hyperbolic factor for hierarchy-sensitive semantics plus Euclidean factor for local visual variation), trained via a flow-matching objective and paired with a hybrid prototype-linear classifier. The method operates solely on cached frozen features and is claimed to be backbone-agnostic. Across seven visual recognition benchmarks, five backbones, and 1/4/16-shot regimes, MC-RFM achieves the best performance in a majority of settings (strongest gains on Transformers and fine-grained data), with ablations attributing gains to the mixed-curvature head, task conditioning, adaptive gating, prototype shrinkage, and discriminative supervision.
Significance. If the empirical results hold under rigorous controls, the work offers a geometrically principled alternative to discrete Euclidean perturbations in few-shot adaptation. By explicitly matching the manifold geometry to task-induced feature displacements and providing controlled ablations that isolate each component, it supplies evidence that continuous transport on mixed-curvature spaces can improve adaptation without backbone updates. The broad evaluation across backbones and shot regimes strengthens the case for geometry-aware methods in parameter-efficient fine-tuning.
major comments (2)
- [Abstract and Experiments section] The central empirical claim (majority-best performance across seven benchmarks, five backbones, and three shot regimes) is load-bearing, yet the provided text supplies no error bars, standard deviations, or details on the number of random seeds/runs. This prevents verification that reported gains are statistically reliable rather than within noise.
- [Method (flow-matching objective and manifold construction)] The modeling assumption that task-induced displacements are well captured by continuous transport on the H × E product manifold is tested via ablations, but the manuscript does not report the specific curvature values, manifold dimensions, or the precise definition of the product metric used in the flow-matching ODE; without these, it is difficult to assess whether the gains truly arise from the mixed-curvature geometry or from the added capacity of the hybrid classifier.
minor comments (3)
- [Method] Notation for the hybrid classifier (prototype vs. linear branch) and the gating mechanism should be introduced with explicit equations rather than descriptive text only.
- [Ablations] Ablation tables would benefit from consistent reporting of all compared variants (including the Euclidean-only and hyperbolic-only ablations) with the same metrics and shot settings used in the main tables.
- [Abstract] The abstract states 'majority of evaluated settings' without quantifying the exact fraction or listing the settings where it underperforms; adding this would improve clarity.
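The notation comment can be made concrete. The sketch below shows one plausible form of a gated prototype-linear hybrid; the names `gate`, `W`, `b` and the convex-combination form are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def hybrid_logits(feats, prototypes, W, b, gate):
    """Blend a prototype branch with a linear branch.

    `gate` in [0, 1] is a hypothetical stand-in for the paper's adaptive
    branch gating, whose exact form is not specified in this page."""
    # Prototype branch: negative squared distance to each class prototype.
    d2 = ((feats[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    proto_logits = -d2
    # Linear branch: an ordinary affine classifier on the same features.
    linear_logits = feats @ W.T + b
    return gate * proto_logits + (1.0 - gate) * linear_logits
```

With `gate = 1` this reduces to a nearest-prototype classifier; with `gate = 0`, to a linear probe.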
Simulated Author's Rebuttal
We thank the referee for the positive assessment and constructive comments on our manuscript. We appreciate the recognition of the geometric principles and the breadth of the evaluation. We address each major comment below and will incorporate the necessary revisions.
read point-by-point responses
-
Referee: [Abstract and Experiments section] The central empirical claim (majority-best performance across seven benchmarks, five backbones, and three shot regimes) is load-bearing, yet the provided text supplies no error bars, standard deviations, or details on the number of random seeds/runs. This prevents verification that reported gains are statistically reliable rather than within noise.
Authors: We agree that the absence of error bars and run details limits the ability to assess statistical reliability. In the revised manuscript we will add standard deviations computed over five independent random seeds for all main results, specify the seed count in the experimental protocol, and include these statistics in the tables and text. revision: yes
-
Referee: [Method (flow-matching objective and manifold construction)] The modeling assumption that task-induced displacements are well captured by continuous transport on the H × E product manifold is tested via ablations, but the manuscript does not report the specific curvature values, manifold dimensions, or the precise definition of the product metric used in the flow-matching ODE; without these, it is difficult to assess whether the gains truly arise from the mixed-curvature geometry or from the added capacity of the hybrid classifier.
Authors: We acknowledge that the manuscript does not provide the exact curvature values, manifold dimensions, or the formal definition of the product metric. In the revision we will insert a dedicated paragraph and table in the Method section that states the curvature parameter for the hyperbolic factor, the dimensions of each manifold component, and the explicit product metric used inside the Riemannian flow-matching ODE, thereby clarifying the geometric contribution. revision: yes
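For concreteness, the quantities the promised revision would report likely take the standard mixed-curvature form below (the curvature $c > 0$ and the factor dimensions are exactly the free parameters not stated in this page; $\oplus_c$ denotes Mobius addition on the curvature $-c$ ball):

```latex
d_{\mathbb{H}_c}(h_1, h_2)
  = \tfrac{2}{\sqrt{c}}\,
    \operatorname{artanh}\!\big(\sqrt{c}\,\lVert (-h_1) \oplus_c h_2 \rVert\big),
\qquad
d_{\mathcal{M}}^2\big((h_1, e_1), (h_2, e_2)\big)
  = d_{\mathbb{H}_c}^2(h_1, h_2) + \lVert e_1 - e_2 \rVert^2 .
```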
Circularity Check
No significant circularity in the derivation chain
full rationale
The paper introduces MC-RFM as a modeling framework that formulates few-shot adaptation as continuous transport on a product manifold (hyperbolic × Euclidean) via a flow-matching objective applied to cached frozen features and support-set prototypes. This objective and the hybrid classifier are derived from standard flow-matching and manifold principles, then directly tested via controlled ablations that isolate the mixed-curvature component, task conditioning, and gating. Performance claims rest on external benchmark comparisons across seven datasets, five backbones, and multiple shot regimes rather than any internal reduction to fitted parameters or self-citation chains. No quoted step equates a claimed prediction or uniqueness result to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Feature displacements induced by downstream visual tasks exhibit both hierarchy-sensitive semantic structure and locally discriminative variation that are respectively captured by hyperbolic and Euclidean geometries.
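The hyperbolic half of this assumption rests on a well-known property: on the Poincaré ball, equal Euclidean gaps correspond to much larger geodesic distances near the boundary, which gives tree-like (hierarchical) structure exponentially more room to embed. A small illustration (curvature −1 assumed; the numbers are illustrative, not from the paper):

```python
import numpy as np

def poincare_dist(x, y):
    """Geodesic distance on the Poincare ball (curvature -1)."""
    diff = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return float(np.arccosh(1.0 + 2.0 * diff / denom))

# Two pairs with the SAME Euclidean gap: one near the origin,
# one near the boundary of the ball.
a, b = np.array([0.01, 0.0]), np.array([0.0, 0.01])
p, q = np.array([0.99, 0.0]), np.array([0.98, 0.01])
# Near the boundary the hyperbolic distance is many times larger.
```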
Reference graph
Works this paper leans on
- [1] M. G. Atigh, J. Schoep, E. Acar, N. van Noord, and P. Mettes. Hyperbolic image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4453–4462, 2022.
- [2] L. Bossard, M. Guillaumin, and L. Van Gool. Food-101 – Mining Discriminative Components with Random Forests. In D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, editors, Computer Vision – ECCV 2014, pages 446–461, Cham, 2014. Springer International Publishing.
- [3]
- [4] R. T. Q. Chen and Y. Lipman. Flow matching on general geometries. In International Conference on Learning Representations (ICLR), 2024.
- [5] S. Chen, C. Ge, Z. Tong, J. Wang, Y. Song, J. Wang, and P. Luo. AdaptFormer: Adapting vision transformers for scalable visual recognition. In Advances in Neural Information Processing Systems, volume 35, 2022.
- [6] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 1597–1607. PMLR, 2020.
- [7]
- [8] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
- [9] A. Ermolov, L. Mirvakhabova, V. Khrulkov, N. Sebe, and I. Oseledets. Hyperbolic vision transformers: Combining improvements in metric learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7409–7419, 2022.
- [10] C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1126–1135. PMLR, 2017.
- [11]
- [12] A. Gu, F. Sala, B. Gunel, and C. Ré. Learning mixed-curvature representations in product spaces. In International Conference on Learning Representations, 2019.
- [13]
- [14] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738, 2020.
- [15] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
- [16]
- [17] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly. Parameter-efficient transfer learning for NLP. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 2790–2799. PMLR, 2019.
- [18] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022.
- [19]
- [20]
- [21]
- [22]
- [23] V. Khrulkov, L. Mirvakhabova, E. Ustinova, I. Oseledets, and V. Lempitsky. Hyperbolic image embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6418–6428, 2020.
- [24] A. Krizhevsky, G. Hinton, and others. Learning multiple layers of features from tiny images. 2009.
- [25]
- [26] Q. Liu, M. Nickel, and D. Kiela. Hyperbolic graph neural networks. In Advances in Neural Information Processing Systems, volume 32, pages 8228–8239, 2019.
- [27] S. Liu, J. Chen, L. Pan, C.-W. Ngo, T.-S. Chua, and Y.-G. Jiang. Hyperbolic visual embedding learning for zero-shot recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9273–9281, 2020.
- [28] X. Liu, C. Gong, and Q. Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. In International Conference on Learning Representations (ICLR), 2023.
- [29] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9992–10002, 2021.
- [30] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11966–11976, 2022.
- [31] S. Maji, J. Kannala, E. Rahtu, M. Blaschko, and A. Vedaldi. Fine-Grained Visual Classification of Aircraft. Technical report, arXiv:1306.5151, 2013.
- [32] M. M. Naseer, K. Ranasinghe, S. H. Khan, M. Hayat, F. Shahbaz Khan, and M.-H. Yang. Intriguing properties of vision transformers. Advances in Neural Information Processing Systems, 34:23296–23308, 2021.
- [33] M. Nickel and D. Kiela. Poincaré embeddings for learning hierarchical representations. In Advances in Neural Information Processing Systems, volume 30, 2017.
- [34] M. Nickel and D. Kiela. Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 3779–3788. PMLR, 2018.
- [35] M.-E. Nilsback and A. Zisserman. Automated flower classification over a large number of classes. In 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, pages 722–729. IEEE, 2008.
- [36] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 8748–8763. PMLR, 2021.
- [37] H. Sáez de Ocáriz Borde, A. Arroyo, I. Morales, I. Posner, and X. Dong. Neural latent geometry search: Product manifold inference via Gromov-Hausdorff-informed Bayesian optimization. In Advances in Neural Information Processing Systems, volume 36, 2023.
- [38] O. Skopek, O.-E. Ganea, and G. Bécigneul. Mixed-curvature variational autoencoders. In International Conference on Learning Representations, 2020.
- [39]
- [40] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations (ICLR), 2021.
- [41] A. Tong, K. Fatras, N. Malkin, G. Huguet, Y. Zhang, J. Rector-Brooks, G. Wolf, and Y. Bengio. Improving and generalizing flow-based generative models with minibatch optimal transport. Transactions on Machine Learning Research (TMLR), 2024.
- [42] H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou. Training data-efficient image transformers and distillation through attention. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 10347–10357. PMLR, 2021.
- [43] O. Vinyals, C. Blundell, T. Lillicrap, K. Kavukcuoglu, and D. Wierstra. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, volume 29, pages 3630–3638, 2016.
discussion (0)