Ranking vs. Assignment: The Metric Mismatch in Multi-View Object Association

Aleksandr Chukhrov; Karina Kvanchiani; Matvei Shelukhan; Timur Mamedov

arxiv: 2606.02022 · v1 · pith:BR3PYYTJnew · submitted 2026-06-01 · cs.CV · cs.AI · cs.LG

Pith reviewed 2026-06-28 15:40 UTC · model grok-4.3

classification cs.CV · cs.AI · cs.LG

keywords multi-view object association · ranking metrics · assignment problem · average precision · FPR-95 · Sinkhorn normalization · evaluation metrics

The pith

Pairwise ranking metrics like AP and FPR-95 do not match the assignment objective in multi-view object association.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that evaluation metrics based on pairwise ranking, such as average precision and false positive rate at 95 percent recall, can give imperfect scores even when the one-to-one assignment of objects across views is already correct. It also shows that the reverse is possible: an optimal ranking can produce an incorrect assignment. Using Sinkhorn normalization on similarity scores as a post-processing step improves the ranking metrics without altering the assignment accuracy. This highlights that optimizing for ranking metrics may not optimize the actual task performance.
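As a rough illustration of the second metric named above, the sketch below computes a false positive rate at 95 percent recall from a flat list of pair scores. The function name `fpr_at_recall`, the threshold convention (accept every pair scoring at or above the lowest-scoring positive needed to reach the target recall), and the four toy scores are assumptions made for this example. The paper's exact FPR-95 convention is not stated on this page and may differ.

```python
import math

def fpr_at_recall(scores, labels, recall=0.95):
    """False positive rate at the loosest threshold that reaches `recall`.

    Assumed convention: a pair is accepted if its score is >= the threshold.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    negatives = [s for s, y in zip(scores, labels) if not y]
    # Number of positive pairs that must be accepted to reach the target recall.
    needed = math.ceil(recall * len(positives))
    threshold = positives[needed - 1]
    false_positives = sum(1 for s in negatives if s >= threshold)
    return false_positives / len(negatives)

# Hypothetical pair scores: two true matches (0.9, 0.5) and two non-matches.
scores = [0.9, 0.1, 0.6, 0.5]
labels = [True, False, False, True]

fpr95 = fpr_at_recall(scores, labels)
print(fpr95)  # 0.5: the non-match scored 0.6 outranks the weaker true match
```

With only two positives, reaching 95 percent recall requires accepting both, so the threshold drops to 0.5 and one of the two non-matching pairs is accepted along with them.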

Core claim

AP and FPR-95 can be imperfect even when the assignment is already correct, and Sinkhorn-based normalization can make them perfect, while optimal pairwise ranking can still lead to incorrect assignments.
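A small hand-built case can make the first half of this claim concrete. The sketch below is not taken from the paper: the 2x2 similarity matrix, the helper names `average_precision` and `best_assignment`, and the use of brute-force search in place of a Hungarian solver are all assumptions chosen to keep the example dependency-free.

```python
from itertools import permutations

def average_precision(scores, labels):
    """AP over all pairs: mean precision measured at the rank of each true match."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, is_match) in enumerate(ranked, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)

def best_assignment(sim):
    """One-to-one assignment with maximal total similarity (brute force, small n only)."""
    n = len(sim)
    return max(permutations(range(n)),
               key=lambda perm: sum(sim[i][perm[i]] for i in range(n)))

# Hypothetical similarity matrix; the true matches sit on the diagonal.
sim = [[0.9, 0.1],
       [0.6, 0.5]]
scores = [sim[i][j] for i in range(2) for j in range(2)]
labels = [i == j for i in range(2) for j in range(2)]

ap = average_precision(scores, labels)
assignment = best_assignment(sim)
print(round(ap, 4), assignment)  # 0.8333 (0, 1)
```

The non-matching pair scored 0.6 ranks above the true match scored 0.5, which pulls AP below 1. The one-to-one assignment is still correct, because the diagonal total (1.4) exceeds the off-diagonal total (0.7).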

What carries the argument

Sinkhorn-based normalization applied to similarity matrices as a controlled post-processing step to isolate the effect on ranking metrics versus assignment metrics.
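The normalization step can be sketched as follows, assuming the common entropic form: exponentiate the scores with a temperature, then alternate row and column normalization. The function name `sinkhorn`, the temperature `tau=0.1`, the iteration count, and the square toy matrix are assumptions for illustration. The paper's exact variant and parameter values are not given on this page.

```python
import math

def sinkhorn(sim, tau=0.1, n_iters=200):
    """Alternate row and column normalization of exp(sim / tau) (square matrix assumed)."""
    peak = max(max(row) for row in sim)  # subtract the max for numerical stability
    p = [[math.exp((s - peak) / tau) for s in row] for row in sim]
    n_rows, n_cols = len(p), len(p[0])
    for _ in range(n_iters):
        # Make every row sum to 1.
        p = [[v / sum(row) for v in row] for row in p]
        # Make every column sum to 1.
        col_sums = [sum(p[i][j] for i in range(n_rows)) for j in range(n_cols)]
        p = [[p[i][j] / col_sums[j] for j in range(n_cols)] for i in range(n_rows)]
    return p

# Hypothetical scores; true matches on the diagonal, and 0.6 outranks the match 0.5.
sim = [[0.9, 0.1],
       [0.6, 0.5]]
p = sinkhorn(sim)
for row in p:
    print([round(v, 4) for v in row])
# Approximately:
# [0.9707, 0.0293]
# [0.0293, 0.9707]
```

After normalization both diagonal entries exceed both off-diagonal entries, so the pairwise ranking becomes perfect. One way to see why the assignment need not move: the output has the form P_ij = u_i · exp(S_ij / tau) · v_j, so log P_ij equals S_ij / tau plus a row term and a column term. Every permutation picks up the same row and column terms, so the permutation that maximizes the summed log-scores is the same before and after.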

If this is right

Models trained to maximize AP or FPR-95 may not achieve the best possible assignments.
Assignment metrics such as ACC and IPAA provide a more direct measure of task success.
Simple post-processing can boost reported ranking scores independently of model improvements.
Evaluation protocols should include both ranking and assignment metrics to avoid misleading conclusions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar metric mismatches could exist in other bipartite matching problems in computer vision.
Future work might explore training methods that directly optimize assignment objectives instead of ranking proxies.
Practitioners should verify if their models suffer from this mismatch by testing Sinkhorn post-processing on their outputs.

Load-bearing premise

The mismatch between ranking and assignment that appears in theory and controlled experiments will also occur with the similarity matrices generated by actual trained models on real data.

What would settle it

A trained model for which Sinkhorn normalization improves AP and FPR-95 while leaving ACC and IPAA unchanged would support the claim. Finding no such improvement on real model outputs would challenge it. A case where ranking is perfect but the assignment is wrong would not challenge the claim, since that is the converse mismatch the paper itself describes.
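A check of this kind can be sketched end to end on a toy matrix. Everything below is an assumption for illustration: the 3x3 scores, the helper names, the temperature, and the brute-force assignment standing in for a Hungarian solver. IPAA is omitted because its definition is not given on this page, so assignment agreement is the only assignment-level signal here.

```python
import math
from itertools import permutations

def pair_ap(sim):
    """AP over all cross-view pairs; true matches are assumed to lie on the diagonal."""
    n = len(sim)
    ranked = sorted(((sim[i][j], i == j) for i in range(n) for j in range(n)),
                    key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, is_match) in enumerate(ranked, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)

def sinkhorn(sim, tau=0.1, n_iters=200):
    """Row/column normalization of exp(sim / tau) for a square matrix."""
    n = len(sim)
    peak = max(max(row) for row in sim)
    p = [[math.exp((s - peak) / tau) for s in row] for row in sim]
    for _ in range(n_iters):
        p = [[v / sum(row) for v in row] for row in p]
        col_sums = [sum(p[i][j] for i in range(n)) for j in range(n)]
        p = [[p[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return p

def best_assignment(sim):
    """Max-total-score one-to-one assignment by brute force (small n only)."""
    n = len(sim)
    return max(permutations(range(n)),
               key=lambda perm: sum(sim[i][perm[i]] for i in range(n)))

# Hypothetical raw similarities from a fixed model; matches on the diagonal.
sim = [[0.9, 0.2, 0.1],
       [0.7, 0.6, 0.2],
       [0.1, 0.3, 0.4]]
post = sinkhorn(sim)

ap_before, ap_after = pair_ap(sim), pair_ap(post)
assign_before, assign_after = best_assignment(sim), best_assignment(post)
print(f"AP {ap_before:.4f} -> {ap_after:.4f}")
print("assignment unchanged:", assign_before == assign_after)
```

On this toy input the ranking score rises while the assignment stays the same, which is the pattern the stress test looks for. Running the same comparison on similarity matrices exported from a real model is what would make the evidence count.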

Figures

Figures reproduced from arXiv: 2606.02022 by Aleksandr Chukhrov, Karina Kvanchiani, Matvei Shelukhan, Timur Mamedov.

**Figure 2.** Overview of our Sinkhorn-based post-processing stress test. The raw affinity ma…

**Figure 3.** Mean metric change across all evaluated methods on WILDTRACK under the…

**Figure 4.** Distribution of pairwise scores before and after the Sinkhorn-based post…

**Figure 5.** Temperature sensitivity for Self-MVA on WILDTRACK under the test-to-test…

Original abstract

Multi-view object association is an important computer vision problem that underlies many multi-camera perception tasks. While this task is naturally formulated as a constrained one-to-one matching problem, recent works heavily rely on pairwise ranking metrics like AP and FPR-95 for model evaluation. We highlight a fundamental mismatch between these metrics and the actual assignment objective. Theoretically, we show that AP and FPR-95 can be imperfect even when the assignment is already correct, and that Sinkhorn-based normalization can make them perfect. Conversely, optimal pairwise ranking can still lead to incorrect assignments. We validate this mismatch in practice by using our Sinkhorn-based normalization as a controlled post-processing stress test. We show that optimizing just a few post-processing parameters significantly boosts AP and FPR-95 without corresponding improvements in assignment-level metrics such as ACC and IPAA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows a theoretical mismatch between ranking metrics like AP and the actual assignment task in multi-view association, backed by a clean post-processing test, but provides no evidence the mismatch occurs in real embedding matrices.

The main takeaway is that pairwise ranking metrics can be imperfect even when the assignment is correct, and Sinkhorn normalization can improve them without fixing the assignment. The authors also show the reverse case where optimal ranking still produces wrong assignments. They back this with theory on constructed matrices and then run a controlled post-processing experiment that boosts AP and FPR-95 while leaving assignment metrics like ACC and IPAA unchanged.

What works is the direct separation of the two objectives. The theoretical examples are straightforward, and using Sinkhorn as a post-processing stress test keeps the experiment simple and avoids retraining models. This makes the mismatch easy to see without extra variables.

The soft spot is the missing link to practice. All the counterexamples are built or adjusted by hand, and the paper gives no measurement or argument that similarity matrices from actual multi-view embedding models show the same gap. If real outputs already align ranking and assignment, the metric critique does not change which models get picked. The abstract calls the post-processing a validation in practice, but without checking representativeness the claim stays limited to the constructed cases.

This is for researchers in multi-camera tracking who care about how evaluation choices affect reported results. A reader focused on metric design or assignment pipelines would find the distinction useful. The work shows clear thinking on the ranking-versus-assignment difference, so it deserves peer review even if the next step is testing on real detector outputs.

Referee Report

2 major / 1 minor

Summary. The paper claims a fundamental mismatch between pairwise ranking metrics (AP, FPR-95) commonly used to evaluate multi-view object association models and the underlying constrained one-to-one assignment objective. It provides theoretical constructions demonstrating that AP/FPR-95 can remain imperfect for already-correct assignments (with Sinkhorn normalization able to perfect the metrics without changing the assignment) and that optimal pairwise ranking can still produce incorrect assignments. Empirically, a Sinkhorn-based post-processing stress test is used to show that optimizing a few parameters can significantly improve ranking metrics without corresponding gains in assignment-level metrics such as ACC and IPAA.

Significance. If the mismatch generalizes beyond constructed matrices to the structured similarity matrices produced by real multi-view embedding models, the result would indicate that current evaluation practices are misaligned with the task objective and could be driving suboptimal model development in multi-camera perception. The controlled post-processing experiment is a methodological strength for isolating metric effects.

major comments (2)

[Empirical validation / stress test] The theoretical counter-examples rely on constructed similarity matrices; no measurement or argument is provided showing that the reported mismatch structures (or the effect of Sinkhorn on them) arise in the non-uniform similarity matrices generated by actual feature embeddings from multi-view detectors (see the empirical validation section and the stress-test description).
[Empirical validation] The Sinkhorn post-processing is presented as isolating metric mismatch, but the experiment does not include controls or analysis confirming that the observed AP/FPR-95 gains occur without confounding changes to the underlying pairwise similarities or model outputs (see the description of the post-processing parameters and results on ACC/IPAA).

minor comments (1)

Notation for the assignment problem and the ranking metrics could be introduced more explicitly with a small running example to aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We respond to each major comment below, clarifying the role of the empirical stress test on real data while acknowledging where additional details could strengthen the presentation.

Referee: [Empirical validation / stress test] The theoretical counter-examples rely on constructed similarity matrices; no measurement or argument is provided showing that the reported mismatch structures (or the effect of Sinkhorn on them) arise in the non-uniform similarity matrices generated by actual feature embeddings from multi-view detectors (see the empirical validation section and the stress-test description).

Authors: The theoretical constructions use synthetic matrices solely to prove the mathematical possibility of the mismatch. The empirical validation section applies the identical Sinkhorn post-processing directly to the non-uniform similarity matrices produced by trained multi-view embedding models on standard real-world datasets. The observed outcome, in which a few optimized parameters produce large gains in AP and FPR-95 while ACC and IPAA remain unchanged, constitutes direct evidence that the mismatch structures are exploitable in the matrices arising from actual detectors. We are prepared to add a short quantitative comparison of matrix statistics (e.g., row-sum variation, off-diagonal concentration) before and after normalization in a revised version.
Revision: partial
Referee: [Empirical validation] The Sinkhorn post-processing is presented as isolating metric mismatch, but the experiment does not include controls or analysis confirming that the observed AP/FPR-95 gains occur without confounding changes to the underlying pairwise similarities or model outputs (see the description of the post-processing parameters and results on ACC/IPAA).

Authors: The procedure modifies only the output similarity matrix through Sinkhorn normalization; the underlying feature embeddings and model weights are untouched. Isolation is achieved by holding the model fixed and reporting that assignment-level metrics (ACC, IPAA) are invariant while ranking metrics improve. Because the theoretical section proves that Sinkhorn can perfect ranking scores without altering the optimal assignment, the empirical stability of ACC/IPAA confirms that no confounding change to the assignment occurs. The optimized parameters are strictly those of the normalization routine. If desired, we can include an auxiliary experiment comparing Sinkhorn to random score perturbations of matched magnitude.
Revision: partial

Circularity Check

0 steps flagged

No circularity; theoretical constructions and post-processing experiment are independent

full rationale

The paper's core argument rests on explicit construction of similarity matrices that separate ranking metrics (AP/FPR-95) from assignment correctness, plus a controlled Sinkhorn post-processing experiment that improves the former without the latter. No step reduces to a self-citation chain, a fitted parameter renamed as a prediction, or a quantity defined in terms of the target result. The derivation is self-contained against the stated assumptions and does not invoke uniqueness theorems or ansatzes from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.1-grok · 5685 in / 1002 out tokens · 26705 ms · 2026-06-28T15:40:19.552202+00:00 · methodology

Reference graph

Works this paper leans on

25 extracted references · 5 canonical work pages · 1 internal anchor

[1] Fahmid Al Farid, Ahsanul Bari, Abu Saleh Musa Miah, Sarina Mansor, Jia Uddin, and S Prabha Kumaresan. A structured and methodological review on multi-view human activity recognition for ambient assisted living. Journal of Imaging, 11(6):182, 2025.

[2] Chaitanya Bandi and Ulrike Thomas. Action recognition via multi-view perception feature tracking for human–robot interaction. Robotics, 14(4):53, 2025.

[3] Zhongang Cai, Junzhe Zhang, Daxuan Ren, Cunjun Yu, Haiyu Zhao, Shuai Yi, Chai Kiat Yeo, and Chen Change Loy. Messytable: Instance association in multiple camera views. In European Conference on Computer Vision, pages 1–16. Springer, 2020.

[4] Tatjana Chavdarova, Pierre Baqué, Stéphane Bouquet, Andrii Maksai, Cijo Jose, Timur Bagautdinov, Louis Lettry, Pascal Fua, Luc Van Gool, and François Fleuret. Wildtrack: A multi-camera hd dataset for dense unscripted pedestrian detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5030–5039, 2018.

[5] Keqi Chen, Vinkle Srivastav, Didier Mutter, and Nicolas Padoy. Learning from synchronization: Self-supervised uncalibrated multi-view person association in challenging scenes. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 24419–24428, 2025.

[6] Cheng-Che Cheng, Min-Xuan Qiu, Chen-Kuo Chiang, and Shang-Hong Lai. Rest: A reconfigurable spatial-temporal graph model for multi-camera multi-object tracking. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10051–10060, 2023.

[7] Leonardo Citraro. Soldiers tracking. https://www.epfl.ch/labs/cvlab/data/soldiers-tracking/

[8] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. Advances in neural information processing systems, 26, 2013.

[9] Yiyang Gan, Ruize Han, Liqiang Yin, Wei Feng, and Song Wang. Self-supervised multi-view multi-human association and tracking. In Proceedings of the 29th ACM international conference on multimedia, pages 282–290, 2021.

[10] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[11] Qiaokang Liang, Wanneng Wu, Yukun Yang, Ruiheng Zhang, Yu Peng, and Min Xu. Multi-player tracking for multi-view sports videos with improved k-shortest path algorithm. Applied Sciences, 10(3):864, 2020.

[12] Elena Luna, Juan C SanMiguel, José M Martínez, and Pablo Carballeira. Graph neural networks for cross-camera data association. IEEE Transactions on Circuits and Systems for Video Technology, 33(2):589–601, 2022.

[13] Timur Mamedov, Anton Konushin, and Vadim Konushin. Dynamix: Generalizable person re-identification via dynamic relabeling and mixed data sampling. Neurocomputing, page 132446, 2025.

[14] Timur Mamedov, Karina Kvanchiani, Anton Konushin, and Vadim Konushin. Retext: Text boosts generalization in image-based person re-identification. arXiv preprint arXiv:2602.05785, 2026.

[15] James Munkres. Algorithms for the assignment and transportation problems. Society for Industrial and Applied Mathematics, 15:196–210, 1962.

[16] Duy MH Nguyen, Roberto Henschel, Bodo Rosenhahn, Daniel Sonntag, and Paul Swoboda. Lmgp: Lifted multicut meets geometry projections for multi-camera multi-object tracking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8866–8875, 2022.

[17] Zhengxin Pan, Haishuai Wang, Fangyu Wu, Peng Zhang, and Jiajun Bu. Hubness reduction with dual bank sinkhorn normalization for cross-modal retrieval. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 6153–6162, 2025.

[18] Sola Park, Seungjin Yang, and Hyuk-Jae Lee. Mvdet: multi-view multi-class object detection without ground plane assumption. Pattern Analysis and Applications, 26(3):1059–1070, 2023.

[19] Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superglue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4938–4947, 2020.

[20] Minseok Seo, Hyuk-Jae Lee, and Xuan Truong Nguyen. Vit-p3de*: Vision transformer based multi-camera instance association with pseudo 3d position embeddings. In IJCAI, pages 1340–1350, 2023.

[21] Daniel Shalam and Simon Korman. The self-optimal-transport feature transform. arXiv preprint arXiv:2204.03065, 3, 2022.

[22] Vinkle Srivastav, Thibaut Issenhuth, Abdolrahim Kadkhodamohammadi, Michel de Mathelin, Afshin Gangi, and Nicolas Padoy. Mvor: A multi-view rgb-d operating room dataset for 2d and 3d human pose estimation. arXiv preprint arXiv:1808.08180, 2018.

[23] Jiangming Wang, Zhizhong Zhang, Mingang Chen, Yi Zhang, Cong Wang, Bin Sheng, Yanyun Qu, and Yuan Xie. Optimal transport for label-efficient visible-infrared person re-identification. In European Conference on Computer Vision, pages 93–109. Springer, 2022.

[24] Zhizhong Zhang, Jiangming Wang, Xin Tan, Yanyun Qu, Junping Wang, Yong Xie, and Yuan Xie. Mutual information guided optimal transport for unsupervised visible-infrared person re-identification. arXiv preprint arXiv:2407.12758, 2024.

[25] Kaiyang Zhou, Yongxin Yang, Andrea Cavallaro, and Tao Xiang. Learning generalisable omni-scale representations for person re-identification. IEEE transactions on pattern analysis and machine intelligence, 44(9):5056–5069, 2021.
