pith. machine review for the scientific record.

arxiv: 2604.09710 · v1 · submitted 2026-04-08 · 💻 cs.CV · cs.LG

Recognition: 2 theorem links

· Lean Theorem

Robust Fair Disease Diagnosis in CT Images

Authors on Pith no claims yet

Pith reviewed 2026-05-10 18:25 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords fair disease diagnosis · CT images · class imbalance · demographic fairness · CVaR aggregation · logit-adjusted loss · 3D ResNet · medical imaging

The pith

A two-level loss combining logit-adjusted cross-entropy and CVaR improves both accuracy and fairness in CT-based disease diagnosis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that deep learning models for automated chest CT diagnosis fail more severely when class imbalance and demographic underrepresentation coincide, creating compound bias that standard techniques cannot resolve. It introduces a training objective that pairs logit-adjusted cross-entropy, which adjusts sample-level decision margins according to class frequency, with CVaR aggregation, which focuses optimization on whichever demographic group currently has the highest loss. Evaluation on the Fair Disease Diagnosis benchmark, which includes extreme cases like five female samples for squamous cell carcinoma, shows the combined method reaches a gender-averaged macro F1 of 0.8403 and a fairness gap of 0.0239. This matters for clinical use because reliable diagnosis across patient groups is essential to avoid worsening health disparities.

Core claim

The paper claims that integrating logit-adjusted cross-entropy loss at the sample level with Conditional Value at Risk aggregation at the group level produces a robust and fair classifier for CT image diagnosis. On the benchmark dataset with four disease classes and sex annotations, this yields a 13.3% higher macro F1 score and a 78% smaller fairness gap compared to the baseline, with ablations confirming both components are necessary.
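Taking the reported percentages as relative changes over the baseline (an assumed reading; the paper may report them differently), the baseline figures they imply can be back-computed:

```python
# Back-of-envelope check of the claimed improvements, assuming the
# percentages are relative changes over the baseline (hypothetical reading).
combined_f1, combined_gap = 0.8403, 0.0239

# "13.3% higher macro F1" implies baseline_f1 * 1.133 ≈ 0.8403
baseline_f1 = combined_f1 / 1.133
# "78% smaller fairness gap" implies baseline_gap * (1 - 0.78) ≈ 0.0239
baseline_gap = combined_gap / (1 - 0.78)

print(f"implied baseline macro F1    ≈ {baseline_f1:.4f}")
print(f"implied baseline fairness gap ≈ {baseline_gap:.4f}")
```

Under that reading, the baseline would sit near 0.74 macro F1 with a gap around 0.11, which is consistent with the severity of the compound imbalance the paper describes.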

What carries the argument

The two-level objective: logit-adjusted cross-entropy loss that shifts decision margins proportionally to class frequency, paired with CVaR aggregation that directs optimization toward the worst-performing demographic group.
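A minimal NumPy sketch of that two-level objective, under stated assumptions: function names, the CVaR quantile, and the exact aggregation are illustrative, not the paper's implementation.

```python
import numpy as np

def logit_adjusted_ce(logits, labels, class_priors, tau=1.0):
    """Sample-level term: add tau * log(prior) to each logit before the
    softmax, which enlarges the decision margin for rare classes."""
    adjusted = logits + tau * np.log(class_priors)             # (N, C)
    adjusted = adjusted - adjusted.max(axis=1, keepdims=True)  # stability
    logp = adjusted - np.log(np.exp(adjusted).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels]               # (N,)

def cvar_over_groups(sample_losses, group_ids, alpha=0.5):
    """Group-level term: average the worst alpha-fraction of per-group
    mean losses, directing pressure toward the highest-loss group."""
    means = np.array([sample_losses[group_ids == g].mean()
                      for g in np.unique(group_ids)])
    k = max(1, int(np.ceil(alpha * len(means))))
    return float(np.sort(means)[-k:].mean())

# Toy batch: 4 samples, 2 classes, class 1 is rare (10% prior).
logits = np.zeros((4, 2))
labels = np.array([0, 1, 0, 1])
groups = np.array([0, 0, 1, 1])  # e.g. patient sex
losses = logit_adjusted_ce(logits, labels, np.array([0.9, 0.1]))
objective = cvar_over_groups(losses, groups, alpha=0.5)
```

Even with uninformative (all-zero) logits, the adjusted loss is larger for the rare class, and the CVaR wrapper reduces the batch objective to the mean of whichever groups currently fare worst.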

Load-bearing premise

The compound imbalance patterns observed in the Fair Disease Diagnosis benchmark, including extreme underrepresentation in specific disease and sex combinations, are representative of those encountered in broader clinical deployments.

What would settle it

Re-evaluating the method on a new CT dataset with different disease-class frequencies and demographic distributions would settle it: if the fairness gap fails to shrink, or disparity increases for some groups, the approach does not reliably address compound bias.

Figures

Figures reproduced from arXiv: 2604.09710 by Aryana Hou, Asmita Yuki Pritha, Daniel Ding, Justin Li, Shu Hu, Xin Wang.

Figure 1
Figure 1. Training set distribution by disease category and patient sex. The dataset exhibits a compound imbalance: squamous cell carcinoma is already the rarest class overall (84 samples), and its female subset contains only 5 training examples. This intersection of class rarity and demographic scarcity motivates the two-level objective that simultaneously addresses both axes of imbalance. view at source ↗
Figure 2
Figure 2. Overview of the proposed framework. Row 1: representative CT slices from the four diagnostic categories. Row 2: preprocessing removes non-pulmonary regions, samples to a fixed depth of 64, and normalizes to produce the input volume. Row 3: a Kinetics-400-pretrained R3D-18 extracts features, classified through a fully connected head. The logit-adjusted loss (§ 3.4) corrects class imbalance, while CVaR aggregation… view at source ↗
Figure 3
Figure 3. Performance vs. fairness trade-off across all configurations. view at source ↗
read the original abstract

Automated diagnosis from chest CT has improved considerably with deep learning, but models trained on skewed datasets tend to perform unevenly across patient demographics. However, the situation is worse than simple demographic bias. In clinical data, class imbalance and group underrepresentation often coincide, creating compound failure modes that neither standard rebalancing nor fairness corrections can fix alone. We introduce a two-level objective that targets both axes of this problem. Logit-adjusted cross-entropy loss operates at the sample level, shifting decision margins by class frequency with provable consistency guarantees. Conditional Value at Risk aggregation operates at the group level, directing optimization pressure toward whichever demographic group currently has the higher loss. We evaluate on the Fair Disease Diagnosis benchmark using a 3D ResNet-18 pretrained on Kinetics-400, classifying CT volumes into Adenocarcinoma, Squamous Cell Carcinoma, COVID-19, and Normal groups with patient sex annotations. The training set illustrates the compound problem concretely: squamous cell carcinoma has 84 samples total, 5 of them female. The combined loss reaches a gender-averaged macro F1 of 0.8403 with a fairness gap of 0.0239, a 13.3% improvement in score and 78% reduction in demographic disparity over the baseline. Ablations show that each component alone falls short. The code is publicly available at https://github.com/Purdue-M2/Fair-Disease-Diagnosis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript proposes a two-level training objective for fair and robust disease diagnosis from chest CT scans. At the sample level, it employs logit-adjusted cross-entropy loss to handle class imbalance with claimed consistency guarantees. At the group level, it uses Conditional Value at Risk (CVaR) to focus optimization on the worst-performing demographic group. Evaluated on a benchmark with severe compound imbalance (e.g., only 5 female samples for squamous cell carcinoma), the combined method achieves a gender-averaged macro F1 score of 0.8403 and a fairness gap of 0.0239, improving 13.3% over baseline F1 and reducing disparity by 78%. Ablation studies indicate both components are necessary, and the code is released publicly.

Significance. If the quantitative improvements are confirmed to be statistically significant and generalizable, the work would meaningfully advance methods for mitigating compound biases (class and demographic) in medical imaging, a critical issue for equitable AI in healthcare. The public availability of the code facilitates reproducibility and further testing, which is a positive aspect of the submission.

major comments (2)
  1. Abstract: The reported performance (gender-averaged macro F1 of 0.8403 with fairness gap 0.0239) and improvements (13.3% and 78%) are given as single point estimates without accompanying variance, error bars, or multi-run statistics. Given the extreme subgroup size of only 5 female squamous cell carcinoma samples in training, these metrics are prone to high variance from sampling or initialization; this directly affects the reliability of the central claim that the two-level objective outperforms the baseline.
  2. Method description (logit-adjusted loss): The abstract states that the logit-adjusted cross-entropy 'operates at the sample level, shifting decision margins by class frequency with provable consistency guarantees,' but the manuscript provides no derivation, proof outline, or citation specifying the conditions for these guarantees. This is important because the overall contribution relies on the combination with CVaR, and it is unclear if the guarantees hold in the presence of group-level risk aggregation.
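The consistency claim at issue in major comment 2 is standard in the long-tail literature (Menon et al., logit adjustment); stated in the usual notation, with class priors \(\pi_c\) and temperature \(\tau\) (symbols here follow that literature, not necessarily the paper's own):

```latex
% Logit-adjusted cross-entropy for a sample (x, y):
\ell(f(x), y) = -\log
  \frac{e^{\,f_y(x) + \tau \log \pi_y}}
       {\sum_{c} e^{\,f_c(x) + \tau \log \pi_c}}
% For \tau = 1, minimizing the expected loss yields a scorer whose
% argmax matches \operatorname{argmax}_y \Pr(y \mid x) / \pi_y,
% i.e. the Bayes-optimal rule for balanced (per-class) error.
```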
minor comments (3)
  1. Abstract: The exact definition of the 'fairness gap' (e.g., whether it is the difference in per-group F1 scores or another metric) should be clarified explicitly, even if standard.
  2. Dataset description: A table summarizing the per-class, per-group sample counts in train/val/test splits would help readers assess the compound imbalance without referring to external code or the abstract alone.
  3. Training details: Hyperparameters such as the CVaR quantile level, learning rate, and number of epochs for the 3D ResNet-18 are not detailed in the text; while the code is available, a brief description in the paper would improve clarity.
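On the first minor point, one common construction of such a gap (an assumed reading, not necessarily the paper's exact metric) is the spread of per-group macro F1 scores:

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(f1s))

def fairness_gap(y_true, y_pred, sex, n_classes):
    """Hypothetical gap: difference between the best and worst
    subgroup macro F1 (here, the two sex subgroups)."""
    scores = [macro_f1(y_true[sex == s], y_pred[sex == s], n_classes)
              for s in np.unique(sex)]
    return max(scores) - min(scores)
```

A gap of 0 means both subgroups score identically; under this reading, the reported 0.0239 would be the absolute macro-F1 difference between female and male test subsets.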

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps improve the clarity and rigor of our work. We address each major comment below.

read point-by-point responses
  1. Referee: Abstract: The reported performance (gender-averaged macro F1 of 0.8403 with fairness gap 0.0239) and improvements (13.3% and 78%) are given as single point estimates without accompanying variance, error bars, or multi-run statistics. Given the extreme subgroup size of only 5 female squamous cell carcinoma samples in training, these metrics are prone to high variance from sampling or initialization; this directly affects the reliability of the central claim that the two-level objective outperforms the baseline.

    Authors: We agree that single-point estimates are insufficient given the small subgroup sizes and potential variance from initialization or sampling. In the revised manuscript, we will rerun the experiments with at least five different random seeds, reporting mean and standard deviation for the key metrics (gender-averaged macro F1 and fairness gap). We will also add a statistical significance analysis (e.g., paired t-test against baseline) to support the reported improvements. revision: yes

  2. Referee: Method description (logit-adjusted loss): The abstract states that the logit-adjusted cross-entropy 'operates at the sample level, shifting decision margins by class frequency with provable consistency guarantees,' but the manuscript provides no derivation, proof outline, or citation specifying the conditions for these guarantees. This is important because the overall contribution relies on the combination with CVaR, and it is unclear if the guarantees hold in the presence of group-level risk aggregation.

    Authors: The logit-adjusted loss is adopted from prior work on imbalanced classification, where consistency guarantees (convergence to the class-frequency-adjusted Bayes classifier) have been established under standard assumptions as sample size grows. We will add the appropriate citation and a concise outline of the relevant theoretical result to the method section. The CVaR component aggregates risks across groups but does not modify the per-sample loss; the guarantees therefore continue to apply to the sample-level term, while the overall method is validated empirically. We will explicitly clarify this separation in the revision. revision: yes
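The multi-seed protocol the authors commit to in response 1 is straightforward to sketch; the per-seed numbers below are purely illustrative, not results from the paper.

```python
import numpy as np

# Hypothetical per-seed macro F1 for the proposed method vs. the
# baseline, five seeds as the rebuttal proposes (illustrative values).
ours     = np.array([0.841, 0.838, 0.843, 0.839, 0.842])
baseline = np.array([0.742, 0.744, 0.739, 0.741, 0.745])

diff = ours - baseline
std = diff.std(ddof=1)
# Paired t statistic: mean per-seed difference over its standard error.
t_stat = diff.mean() / (std / np.sqrt(len(diff)))

print(f"ours     : {ours.mean():.4f} ± {ours.std(ddof=1):.4f}")
print(f"baseline : {baseline.mean():.4f} ± {baseline.std(ddof=1):.4f}")
print(f"paired t = {t_stat:.1f} (df = {len(diff) - 1})")
```

With seeds paired (same split, different initialization), the paired test controls for shared variance between the two runs, which matters when a 5-sample subgroup dominates the gap metric.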

Circularity Check

0 steps flagged

No circularity in empirical method and benchmark results

full rationale

The paper presents a two-level loss (logit-adjusted CE at sample level plus CVaR at group level) and directly measures its effect via macro F1 and fairness gap on the Fair Disease Diagnosis benchmark. These quantities are computed from model outputs on held-out test data; they are not defined in terms of the training parameters or inputs, nor do any equations reduce the reported 0.8403 / 0.0239 figures to the training distribution by construction. No self-definitional, fitted-input-renamed-as-prediction, or self-citation-load-bearing steps appear in the derivation chain. The evaluation is therefore self-contained against an external benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard supervised learning assumptions plus the empirical claim that the chosen benchmark exhibits the targeted compound skew. No new entities are postulated.

axioms (2)
  • domain assumption Logit-adjusted cross-entropy provides consistent classification under class imbalance
    Invoked to justify the sample-level term; consistency guarantees are asserted but not derived in the abstract.
  • domain assumption CVaR aggregation directs optimization toward the worst-performing demographic group
    Core justification for the group-level term; relies on the risk-measurement literature without re-derivation.
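For completeness, the variational form behind the second assumption (Rockafellar and Uryasev), with \(\alpha\) the tail fraction; whether the paper uses exactly this estimator is an assumption here:

```latex
% CVaR at tail fraction \alpha of a loss L:
\mathrm{CVaR}_{\alpha}(L) = \min_{\lambda \in \mathbb{R}}
  \left\{ \lambda + \tfrac{1}{\alpha}\,\mathbb{E}\big[(L - \lambda)_{+}\big] \right\}
% Applied to per-group losses, a small \alpha concentrates the
% objective on the worst-performing demographic group.
```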

pith-pipeline@v0.9.0 · 5561 in / 1331 out tokens · 30680 ms · 2026-05-10T18:25:34.728646+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

72 extracted references · 4 canonical work pages · 1 internal anchor
