pith. machine review for the scientific record.

arxiv: 2605.01402 · v2 · submitted 2026-05-02 · 💻 cs.CL · cs.CV · cs.LG

Recognition: unknown

Injecting Distributional Awareness into MLLMs via Reinforcement Learning for Deep Imbalanced Regression

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:46 UTC · model grok-4.3

classification 💻 cs.CL · cs.CV · cs.LG
keywords multimodal large language models · imbalanced regression · long-tailed distributions · reinforcement learning · concordance correlation coefficient · group relative policy optimization · distribution alignment

The pith

Multimodal LLMs improve regression on rare values by adding batch-level distribution matching rewards.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that standard fine-tuning of multimodal language models for numerical prediction fails on long-tailed data because it treats each example in isolation. It identifies the absence of cross-sample comparisons as the core issue and introduces a reinforcement learning setup that scores entire batches by how closely their overall spread matches the true distribution. This matters for applications like medical scoring or demand forecasting where errors on uncommon cases carry high costs. The method requires no model changes and delivers gains especially when data for certain values is scarce.

Core claim

The authors claim that a Group Relative Policy Optimization framework equipped with a Concordance Correlation Coefficient reward supplies the missing relational supervision, aligning model outputs with ground-truth distributions across correlation, scale, and mean; this yields consistent gains over supervised fine-tuning and prior regression methods on long-tailed benchmarks, with the largest lifts in medium- and few-shot regimes.
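As background, the three alignment axes named in the claim correspond term-by-term to the standard Concordance Correlation Coefficient (Lin, 1989):

```latex
% CCC between predictions \hat{y} and ground truth y (Lin, 1989)
\mathrm{CCC}(\hat{y}, y)
  = \frac{2\,\rho\,\sigma_{\hat{y}}\,\sigma_{y}}
         {\sigma_{\hat{y}}^{2} + \sigma_{y}^{2} + (\mu_{\hat{y}} - \mu_{y})^{2}}
```

Here ρ carries the correlation term, the two variances penalize scale mismatch (variance collapse), and the squared mean difference penalizes location shift; a reward of this form is maximal only when all three align.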

What carries the argument

Group Relative Policy Optimization using a batch-level Concordance Correlation Coefficient reward that compares predicted and ground-truth distributions for correlation, scale, and location.
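A minimal sketch of what such a batch-level reward computes, using hypothetical numbers rather than anything from the paper:

```python
# Sketch of a batch-level Concordance Correlation Coefficient (CCC) reward.
# Illustrative only: the target values and perturbations below are invented.
import numpy as np

def ccc_reward(preds: np.ndarray, targets: np.ndarray) -> float:
    """CCC in [-1, 1]: 1 means agreement in correlation, scale, and mean."""
    mu_p, mu_t = preds.mean(), targets.mean()
    var_p, var_t = preds.var(), targets.var()          # population variance
    cov = ((preds - mu_p) * (targets - mu_t)).mean()   # population covariance
    return 2.0 * cov / (var_p + var_t + (mu_p - mu_t) ** 2)

targets = np.array([3.0, 20.0, 45.0, 70.0, 92.0])
collapsed = np.full(5, targets.mean())   # regression-to-the-mean behavior
spread = targets + np.array([2.0, -1.0, 3.0, -2.0, 1.0])

print(ccc_reward(collapsed, targets))  # 0.0: variance collapse zeroes the reward
print(ccc_reward(spread, targets))     # ≈ 0.998
```

Variance collapse drives CCC to zero no matter how close each individual prediction is to the batch mean, which is exactly the cross-sample signal that per-example rewards cannot express.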

If this is right

  • Models exhibit better tail accuracy without regressing to the mean on imbalanced numerical targets.
  • Performance lifts appear most strongly in medium- and few-shot regimes across unified benchmarks.
  • The approach integrates without any architectural modifications to existing MLLMs.
  • Distribution alignment in correlation, scale, and mean follows directly from the batch comparison reward.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same batch-comparison idea could transfer to other prediction settings where labels are naturally uneven, such as time-series forecasting.
  • It might reduce reliance on synthetic data generation for rare cases by instead leveraging relational signals within real batches.
  • Extending the reward to handle multimodal outputs or uncertainty estimates could address related calibration problems.

Load-bearing premise

That the main limitation in current MLLM regression is missing cross-sample relational supervision, and that a batch-wise CCC reward supplies it effectively without introducing new biases.

What would settle it

Running the same long-tailed regression benchmarks with the CCC batch reward replaced by standard point-wise rewards and observing no improvement or degradation in tail performance would falsify the central claim.
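Why this ablation is discriminating can be sketched under assumed reward definitions (negative absolute error stands in for a "standard point-wise reward"; neither the numbers nor the exact reward come from the paper):

```python
# Score the same long-tailed batch two ways: a point-wise reward per sample
# versus the batch-level CCC reward. All values are illustrative assumptions.
import numpy as np

def pointwise_reward(preds, targets):
    return -np.abs(preds - targets)  # one scalar per sample, no cross-sample term

def ccc_reward(preds, targets):
    mu_p, mu_t = preds.mean(), targets.mean()
    cov = ((preds - mu_p) * (targets - mu_t)).mean()
    return 2.0 * cov / (preds.var() + targets.var() + (mu_p - mu_t) ** 2)

# Long-tailed batch: most targets near the head (~30), one tail value (90).
targets = np.array([28.0, 30.0, 31.0, 29.0, 90.0])
collapsed = np.full(5, 35.0)  # head-biased, mean-regressed predictions

print(pointwise_reward(collapsed, targets).mean())  # -15.4: mediocre but flat
print(ccc_reward(collapsed, targets))               # 0.0: collapse is exposed
```

If swapping the CCC term for the point-wise term left tail performance unchanged, the relational-supervision mechanism would be doing no work.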

Figures

Figures reproduced from arXiv: 2605.01402 by Shanshan Li, Xiaomeng Li, Yao Du.

Figure 1
Figure 1: SFT exhibits a pronounced regression-to-the-mean effect, with predictions collapsing toward the many-shot region. Our method produces a substantially more balanced prediction distribution and maintains reliable predictions in tail regions. view at source ↗
Figure 2
Figure 2: Comparison of training paradigms for numerical prediction in MLLMs. Left: SFT treats regression as token-level classification. Middle: standard GRPO applies point-wise scalar rewards to each generation. Right: CCC-GRPO introduces batch-level, distribution-aware relational supervision. view at source ↗
Figure 3
Figure 3: Overview of the proposed CCC-GRPO framework for deep imbalanced regression in MLLMs. view at source ↗
Figure 4
Figure 4: Overview of the constructed DIR benchmark for MLLMs. view at source ↗
Figure 5
Figure 5: MAE gain of Ours over SFT on IMDB-Movie-DIR under Qwen2.5-VL-3B. view at source ↗
Figure 6
Figure 6: Sorted error distribution curves for CCC-GRPO and SFT on the BoneAge-DIR dataset under Qwen2.5-VL-3B. view at source ↗
Figure 7
Figure 7: Sorted error distribution curves for CCC-GRPO and SFT on the AgeDB-DIR dataset (panels: many-shot, median-shot, few-shot; SFT vs. Ours). view at source ↗
Figure 8
Figure 8: Sorted error distribution curves for CCC-GRPO and SFT on the IMDB-Movie-DIR dataset. view at source ↗
Figure 9
Figure 9: Sorted error distribution curves for CCC-GRPO and SFT on the IMDB-WIKI-DIR dataset (panels: many-shot, median-shot, few-shot; SFT vs. Ours). view at source ↗
Figure 10
Figure 10: Sorted error distribution curves for CCC-GRPO and SFT on the BoneAge-DIR dataset. view at source ↗
Figure 11
Figure 11: MAE gain across AgeDB-DIR and IMDB-Movie-DIR datasets under imbalanced training distributions. view at source ↗
Figure 12
Figure 12: MAE gain across IMDB-WIKI-DIR and BoneAge-DIR datasets under imbalanced training distributions. view at source ↗
Figure 13
Figure 13: Imbalanced Training Dataset Overview. view at source ↗
Figure 14
Figure 14: Balanced Testing Dataset Overview. view at source ↗
read the original abstract

Multimodal large language models (MLLMs) struggle with numerical regression under long-tailed target distributions. Token-level supervised fine-tuning (SFT) and point-wise regression rewards bias learning toward high-density regions, leading to regression-to-the-mean behavior and poor tail performance. We identify the lack of cross-sample relational supervision as a key limitation of existing MLLM training paradigms. To address it, we propose a distribution-aware reinforcement learning framework based on Group Relative Policy Optimization, which introduces batch-level comparison-based supervision via the Concordance Correlation Coefficient-based reward to align predicted and ground-truth distributions in terms of correlation, scale, and mean. The framework is plug-and-play, requiring no architectural modification. Experiments on a unified suite of long-tailed regression benchmarks show consistent improvements over SFT and existing MLLM regression methods, with particularly strong gains in medium- and few-shot regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that MLLMs exhibit regression-to-the-mean on long-tailed numerical regression tasks because token-level SFT and point-wise rewards lack cross-sample relational supervision. It proposes a plug-and-play distribution-aware RL framework based on Group Relative Policy Optimization (GRPO) that uses a batch-level Concordance Correlation Coefficient (CCC) reward to align predicted and ground-truth distributions along correlation, scale, and mean. Experiments on a unified suite of long-tailed regression benchmarks are said to show consistent gains over SFT and prior MLLM regression methods, especially in medium- and few-shot regimes.

Significance. If the results hold, the contribution is significant as a practical, architecture-agnostic method for improving distributional fidelity in MLLM regression on imbalanced data. The work builds on the established CCC metric and uses GRPO as an off-the-shelf RL variant; the clean framing of missing relational supervision and the absence of new hyperparameters or model changes count in its favor.

major comments (1)
  1. [Method section (reward formulation)] The central claim that the batch-level CCC reward supplies effective cross-sample relational supervision to correct tail regression-to-the-mean is load-bearing, yet the manuscript does not address how random batching interacts with long-tailed targets. Tail samples typically constitute <5-10% of a random batch, so the CCC gradient is dominated by head samples; this leaves open whether any reported tail gains arise from the claimed mechanism or from generic RL effects, and whether batch size or sampling must be tuned (contradicting the plug-and-play assertion).
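The dilution arithmetic behind this comment is easy to make concrete; the tail fraction and batch size below are assumed for illustration, not taken from the manuscript:

```python
# How thin does the tail signal get in a randomly drawn batch?
# Assumed values: 7.5% tail samples, batch size 16 (both hypothetical).
p_tail, batch = 0.075, 16
expected_tail = p_tail * batch        # expected tail samples per batch
p_no_tail = (1 - p_tail) ** batch     # chance a batch contains no tail at all
print(round(expected_tail, 2), round(p_no_tail, 3))  # → 1.2 0.287
```

Roughly one tail sample per batch, and nearly a third of batches with none, is the regime in which the batch-level statistics are head-dominated.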
minor comments (1)
  1. [Abstract] The claim of 'consistent improvements' and 'particularly strong gains' is stated without any numerical values, baseline names, or error bars; a one-sentence summary of key metrics would aid quick assessment.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment on the reward formulation below and will incorporate clarifications and additional analysis in the revision.

read point-by-point responses
  1. Referee: The central claim that the batch-level CCC reward supplies effective cross-sample relational supervision to correct tail regression-to-the-mean is load-bearing, yet the manuscript does not address how random batching interacts with long-tailed targets. Tail samples typically constitute <5-10% of a random batch, so the CCC gradient is dominated by head samples; this leaves open whether any reported tail gains arise from the claimed mechanism or from generic RL effects, and whether batch size or sampling must be tuned (contradicting the plug-and-play assertion).

    Authors: We acknowledge that the manuscript does not explicitly analyze the interaction between random batching and long-tailed targets, which is a fair observation. The CCC reward is computed over the full batch and penalizes mismatches in correlation, scale, and location; this global signal can still constrain mean-regression bias even when tails are sparse, because deviations in batch-level statistics affect the reward for all samples. Our point-wise reward baselines already isolate generic RL effects, and the reported gains (particularly in few-shot regimes) are larger than those baselines. Nevertheless, we agree the mechanism would be stronger with direct evidence on batch composition. In the revised manuscript we will add a dedicated paragraph in the method section explaining the batch-level nature of CCC and include an ablation on batch size (standard values 8-32) plus a comparison of random vs. stratified batching to quantify tail-sample influence. These additions require no new hyperparameters or model changes, preserving the plug-and-play claim; batch size is a conventional training choice shared with any RL fine-tuning procedure. revision: partial
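The random-vs-stratified batching comparison the rebuttal proposes could be sketched as follows; the region labels, minimum-tail quota, and sizes are all assumptions for illustration, not the authors' protocol:

```python
# Sketch of a stratified batch sampler that guarantees tail representation,
# for comparison against plain random batching. Hypothetical setup.
import random

def stratified_batch(samples, regions, batch_size, min_tail=2, seed=0):
    """Draw a batch with at least `min_tail` few-shot samples."""
    rng = random.Random(seed)
    tail = [s for s, r in zip(samples, regions) if r == "few"]
    head = [s for s, r in zip(samples, regions) if r != "few"]
    batch = rng.sample(tail, min(min_tail, len(tail)))
    batch += rng.sample(head, batch_size - len(batch))
    rng.shuffle(batch)
    return batch

samples = list(range(100))
regions = ["few" if i >= 92 else "many" for i in samples]  # 8% tail, assumed
b = stratified_batch(samples, regions, batch_size=16)
print(sum(1 for s in b if s >= 92))  # at least 2 tail samples by construction
```

If tail gains survive under random batching but grow under stratification, the batch-level CCC mechanism (rather than generic RL effects) is the more plausible driver.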

Circularity Check

0 steps flagged

No circularity; derivation relies on standard metrics and empirical validation

full rationale

The paper identifies a limitation in existing MLLM regression (lack of cross-sample relational supervision in SFT and pointwise rewards) and proposes a plug-and-play RL framework that applies the established Concordance Correlation Coefficient as a batch-level reward inside Group Relative Policy Optimization. CCC directly quantifies the desired alignment in correlation, scale, and mean, but this is an explicit design choice rather than a self-referential definition or fitted input renamed as a prediction. No equations or self-citations reduce the central claim to its own inputs by construction; gains are asserted via benchmark experiments, not tautological derivation. The chain is self-contained against external benchmarks and standard RL techniques.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review prevents full audit; the method rests on the domain assumption that CCC supplies useful relational supervision and that GRPO can be applied plug-and-play without further justification.

axioms (1)
  • domain assumption: Batch-level CCC reward supplies the missing cross-sample relational supervision needed for tail performance
    Invoked as the core mechanism to overcome regression-to-the-mean without additional evidence in the abstract.

pith-pipeline@v0.9.0 · 5456 in / 1221 out tokens · 51975 ms · 2026-05-12T04:46:51.964194+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

158 extracted references · 158 canonical work pages · 12 internal anchors

  1. [1] AgeDB: the first manually collected, in-the-wild age database. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.
  2. [2] Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  3. [3] Variational imbalanced regression: fair uncertainty quantification via probabilistic smoothing. Advances in Neural Information Processing Systems.
  4. [4] ConR: Contrastive Regularizer for Deep Imbalanced Regression.
  5. [5] Leveraging group classification with descending soft labeling for deep imbalanced regression. Proceedings of the AAAI Conference on Artificial Intelligence.
  6. [6] Deep imbalanced regression via hierarchical classification adjustment. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  7. [7] Dist Loss: Enhancing Regression in Few-Shot Region through Distribution Distance Constraint.
  8. [8] Enhancing Numerical Prediction of MLLMs with Soft Labeling. Proceedings of the IEEE/CVF International Conference on Computer Vision.
  9. [9] Wu, Tianhe; Zou, Jian; Liang, Jie; Zhang, Lei; Ma, Kede.
  10. [10] Visual instruction tuning. Advances in Neural Information Processing Systems.
  11. [11] Unhackable Temporal Rewarding for Scalable Video MLLMs. arXiv preprint arXiv:2502.12081, 2025.
  12. [12] ChatSpot: bootstrapping multimodal LLMs via precise referring instruction tuning. arXiv preprint arXiv:2307.09474.
  13. [13] Merlin: empowering multimodal LLMs with foresight minds. arXiv preprint arXiv:2312.00589.
  14. [14] RT-2: vision-language-action models transfer web knowledge to robotic control. arXiv preprint arXiv:2307.15818.
  15. [15] DeepSeekMath: pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300.
  16. [16] Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
  17. [17] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. 2025.
  18. [18] A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks, 2018.
  19. [19] Numeracy for language models: evaluating and improving their ability to predict numbers. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
  20. [20] Q-Insight: understanding image quality via visual reinforcement learning. arXiv preprint arXiv:2503.22679.
  21. [21] A concordance correlation coefficient to evaluate reproducibility. Biometrics, 1989.
  22. [22] DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data. arXiv preprint arXiv:2505.15074.
  23. [23] Understanding R1-Zero-like training: a critical perspective. arXiv preprint arXiv:2503.20783.
  24. [24] DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO. arXiv preprint arXiv:2506.07464.
  25. [25] DRO-REBEL: Distributionally Robust Relative-Reward Regression for Fast and Efficient LLM Alignment. arXiv preprint arXiv:2509.19104.
  26. [26] Reinforcement Learning for Large Language Models via Group Preference Reward Shaping. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing.
  27. [27] Teaching large language models to regress accurate image quality scores using score distribution. Proceedings of the Computer Vision and Pattern Recognition Conference.
  28. [28] The RSNA pediatric bone age machine learning challenge. Radiology, 2019.
  29. [29] Semi-supervised contrastive learning for deep regression with ordinal rankings from spectral seriation. Advances in Neural Information Processing Systems.
  30. [30] Teach CLIP to develop a number sense for ordinal regression. European Conference on Computer Vision, 2024.
  31. [31] Bai, Shuai; Chen, Keqin; Liu, Xuejing; Wang, Jialin; Ge, Wenbin; Song, Sibo; Dang, Kai; Wang, Peng; Wang, Shijie; Tang, Jun; et al.
  32. [32] Decoupled Weight Decay Regularization. International Conference on Learning Representations.
  33. [33] Perception-R1: pioneering perception policy with reinforcement learning. arXiv preprint arXiv:2504.07954.
  34. [34] Large-scale long-tailed recognition in an open world. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  35. [35] VisNumBench: Evaluating Number Sense of Multimodal Large Language Models. arXiv preprint arXiv:2503.14939, 2025.
  36. [36] Beyond Flatlands: Unlocking Spatial Intelligence by Decoupling 3D Reasoning from Numerical Regression. arXiv preprint arXiv:2511.11239.
  37. [37] Detect anything via next point prediction. arXiv preprint arXiv:2510.12798.
  38. [38] Think or not think: a study of explicit thinking in rule-based visual reinforcement fine-tuning. arXiv preprint arXiv:2503.16188, 2025.
  39. [39] OrderChain: A General Prompting Paradigm to Improve Ordinal Understanding Ability of MLLM. arXiv preprint arXiv:2504.04801.
  40. [40] CogAgent: a visual language model for GUI agents. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  41. [41] Adaptive contrast for image regression in computer-aided disease assessment. IEEE Transactions on Medical Imaging, 2021.
  42. [42] Semi-supervised deep regression with uncertainty consistency and variational model ensembling via Bayesian neural networks. Proceedings of the AAAI Conference on Artificial Intelligence.
  43. [43] Ordinal regression by extended binary classification. Advances in Neural Information Processing Systems.
  44. [44] CLIP-Count: towards text-guided zero-shot object counting. Proceedings of the 31st ACM International Conference on Multimedia.
  45. [45] Learning Multi-Modal Class-Specific Tokens for Weakly Supervised Dense Object Localization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  46. [46] BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation. International Conference on Machine Learning, 2022.
  47. [47] ImageBind: one embedding space to bind them all. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  48. [48] Rank-N-Contrast: learning continuous representations for regression. Advances in Neural Information Processing Systems.
  49. [49] Probing conceptual understanding of large visual-language models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  50. [50] Challenges of Zero-Shot Recognition with Vision-Language Models: Granularity and Correctness. arXiv preprint arXiv:2306.16048.
  51. [51] No token left behind: explainability-aided image classification and generation. European Conference on Computer Vision, 2022.
  52. [52] Deep ordinal regression network for monocular depth estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  53. [53] Ordinal regression with multiple output CNN for age estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  54. [54] Efficient Estimation of Word Representations in Vector Space. 2013.
  55. [55] DeepGUM: learning deep robust regression with a Gaussian-uniform mixture model. Proceedings of the European Conference on Computer Vision (ECCV).
  56. [56] Radford, Alec; Narasimhan, Karthik; Salimans, Tim; Sutskever, Ilya. Improving language understanding with unsupervised learning. 2018.
  57. [57] Adaptive variance based label distribution learning for facial age estimation. Computer Vision - ECCV 2020 (Proceedings, Part XXIII), 2020.
  58. [58] Dating color images with ordinal classification. Proceedings of International Conference on Multimedia Retrieval.
  59. [59] Photo aesthetics ranking network with attributes and content adaptation. Computer Vision - ECCV 2016 (Proceedings, Part I), 2016.
  60. [60] Soft labels for ordinal regression. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  61. [61] BridgeNet: a continuity-aware probabilistic network for age estimation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  62. [62] Fine-grained head pose estimation without keypoints. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.
  63. [63] Label distribution learning forests. Advances in Neural Information Processing Systems.
  64. [64] Deep expectation of real and apparent age from a single image without facial landmarks. International Journal of Computer Vision, 2018.
  65. [65] Morph: a longitudinal image database of normal adult age-progression. 7th International Conference on Automatic Face and Gesture Recognition (FGR06), 2006.
  66. [66] Learning transferable visual models from natural language supervision. International Conference on Machine Learning, 2021.
  67. [67] Learning to prompt for vision-language models. International Journal of Computer Vision, 2022.
  68. [68] Facial age estimation by learning from label distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
  69. [69] Using Ranking-CNN for age estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  70. [70] Mean-variance loss for deep age estimation from a face. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  71. [71] Energy-based models for deep probabilistic regression. Computer Vision - ECCV 2020 (Proceedings, Part XX), 2020.
  72. [72] Learning probabilistic ordinal embeddings for uncertainty-aware regression. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  73. [73] Self-paced deep regression forests with consideration on underrepresented examples. Computer Vision - ECCV 2020 (Proceedings, Part XXX), 2020.
  74. [74] Semi-supervised sequence learning. Advances in Neural Information Processing Systems.
  75. [75] BERT: pre-training of deep bidirectional transformers for language understanding. Proceedings of NAACL-HLT.
  76. [76] StyleCLIP: text-driven manipulation of StyleGAN imagery. Proceedings of the IEEE/CVF International Conference on Computer Vision.
  77. [77] Wang, Mengmeng; Xing, Jiazheng; Liu, Yong. ActionCLIP: A New Paradigm for Video Action Recognition. arXiv preprint, 2021.
  78. [78] CLIP-Adapter: better vision-language models with feature adapters. International Journal of Computer Vision, 2023.
  79. [79] Zhang, Renrui; Fang, Rongyao; Zhang, Wei; Gao, Peng; Li, Kunchang; Dai, Jifeng; Qiao, Yu; Li, Hongsheng. Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling. arXiv preprint, 2021.
  80. [80] Wav2CLIP: learning robust audio representations from CLIP. ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing, 2022.
Showing first 80 references.