Patient-Level Elbow Abnormality Detection: Leakage-Aware Evaluation of Learned Preprocessing, Calibration, and Triage-Oriented Operating Points

Ahmed Sallam; Ahmet Kaplan

arxiv: 2606.31348 · v1 · pith:FPIXIHPGnew · submitted 2026-06-30 · 💻 cs.CV


Pith reviewed 2026-07-01 05:53 UTC · model grok-4.3

classification 💻 cs.CV

keywords: elbow radiographs · MURA · preprocessing · DenseNet · patient-level evaluation · data leakage · calibration · abnormality detection


The pith

No preprocessing strategy shows a consistent advantage over a raw-input DenseNet121 in patient-level elbow abnormality detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests learned and standard preprocessing pipelines for detecting musculoskeletal abnormalities in elbow X-rays from the MURA dataset. It applies a strict patient-level split to avoid data leakage between training and test sets. Across discrimination and calibration metrics, no preprocessing approach delivers a reliable improvement over simply feeding raw images into a DenseNet121 model. The findings indicate that preprocessing benefits are small and vary with the exact configuration chosen.

Core claim

In a leakage-aware evaluation on elbow radiographs, preprocessing pipelines with and without a DnCNN module were compared to a raw-input DenseNet121 baseline for patient-level abnormality detection. Differences in performance were modest and configuration-dependent, and no strategy achieved consistent gains across AUROC, PR-AUC, ECE, and Brier score. The raw-input baseline stayed competitive. Adding the DnCNN front-end to raw and diverse inputs lowered ECE and Brier score, whereas adding it to CLAHE inputs did not improve calibration.
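The two calibration measures named above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes binary labels, equal-width bins over the predicted probability of the abnormal class, and 10 bins. The function names and the bin count are choices made here; the paper's exact ECE variant is not stated on this page.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width bins over the predicted abnormal-class probability.

    Assumption: this is the positive-class variant; other ECE definitions bin
    on the confidence of the predicted class instead.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        frac_abnormal = sum(y for _, y in members) / len(members)
        ece += (len(members) / len(probs)) * abs(mean_prob - frac_abnormal)
    return ece


# Toy patient-level predictions (hypothetical values).
probs = [0.1, 0.9, 0.8, 0.3]
labels = [0, 1, 1, 0]
print(brier_score(probs, labels), expected_calibration_error(probs, labels))
```

Lower is better for both. With only about 200 test patients, sparsely populated bins make ECE noisy, which is one reason to read it alongside the Brier score.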

What carries the argument

A leakage-aware patient-level protocol: all images from a given patient are kept in a single data split, so no patient contributes to both training and evaluation.
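A patient-level split of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the 70/15/15 fractions, the `(patient_id, image_path)` record format, and both function names are hypothetical, and no label stratification is attempted.

```python
import random
from collections import defaultdict


def patient_level_split(records, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split (patient_id, image_path) records so each patient is in one partition."""
    by_patient = defaultdict(list)
    for patient_id, image_path in records:
        by_patient[patient_id].append(image_path)

    patients = sorted(by_patient)  # fixed order before shuffling, for reproducibility
    random.Random(seed).shuffle(patients)

    n_train = int(fractions[0] * len(patients))
    n_val = int(fractions[1] * len(patients))
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    return {
        name: [(pid, img) for pid in ids for img in by_patient[pid]]
        for name, ids in groups.items()
    }


def assert_no_leakage(split):
    """Raise if any patient ID appears in more than one partition."""
    owner = {}
    for name, rows in split.items():
        for patient_id, _ in rows:
            first = owner.setdefault(patient_id, name)
            if first != name:
                raise ValueError(f"patient {patient_id} is in both {first} and {name}")


# Toy data: 20 hypothetical patients with 1 to 3 images each.
records = [(f"patient{p:05d}", f"image{i}.png") for p in range(20) for i in range(1 + p % 3)]
split = patient_level_split(records)
assert_no_leakage(split)
print({name: len(rows) for name, rows in split.items()})
```

The key design point is that shuffling and slicing happen over patient IDs, never over images, and that the leakage check is run as a separate step rather than assumed.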

If this is right

Preprocessing effects depend on the specific combination of methods and metrics used.
Raw and diverse inputs with a DnCNN front-end can reduce expected calibration error and Brier score.
CLAHE preprocessing combined with DnCNN fails to improve calibration.
Validation-selected operating points allow targeting high specificity for triage.
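The idea of a validation-selected operating point can be illustrated with a short sketch. It assumes a patient is flagged abnormal when the predicted probability is at or above the threshold, and it picks the smallest threshold that meets the target specificity on validation data, then applies that fixed threshold to the test set. The function names and the toy numbers are hypothetical. The Figure 4 caption reports a target specificity of 0.95; the toy example uses 0.8 only because the sample is tiny.

```python
def sensitivity_specificity(probs, labels, threshold):
    """Sensitivity and specificity when predicting abnormal for prob >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= threshold)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


def threshold_for_specificity(val_probs, val_labels, target_spec=0.95):
    """Smallest observed threshold whose validation specificity meets the target.

    Specificity never decreases as the threshold rises, so the first match
    is also the one that keeps validation sensitivity as high as possible.
    """
    for t in sorted(set(val_probs)):
        _, spec = sensitivity_specificity(val_probs, val_labels, t)
        if spec >= target_spec:
            return t
    return float("inf")  # no observed threshold is strict enough: flag nobody


# Toy validation and test predictions (hypothetical values).
val_probs = [0.05, 0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
val_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]
test_probs = [0.2, 0.5, 0.3, 0.9]
test_labels = [0, 0, 1, 1]

threshold = threshold_for_specificity(val_probs, val_labels, target_spec=0.8)
test_sens, test_spec = sensitivity_specificity(test_probs, test_labels, threshold)
print(threshold, test_sens, test_spec)
```

Because the threshold is frozen before the test set is touched, the achieved test specificity can differ from the target. That gap is itself informative for triage use.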

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar modest preprocessing effects may appear in other radiograph-based detection tasks if patient-level splits are enforced.
Efforts to develop new preprocessing might be better directed toward improving model architectures or data collection instead.
Repeating the experiments on different anatomical regions could test the generality of the baseline's competitiveness.

Load-bearing premise

That the patient-level split fully eliminates leakage while preserving enough data for reliable training and testing.

What would settle it

A preprocessing pipeline that outperforms the raw DenseNet121 baseline on all metrics (AUROC, PR-AUC, ECE, Brier) consistently across repeated patient-level splits would falsify the claim of no consistent advantage.

Figures

Figures reproduced from arXiv: 2606.31348 by Ahmed Sallam, Ahmet Kaplan.

**Figure 1.** Overview of the proposed orthopedic triage pipeline. Patient-level studies from the MURA elbow dataset are processed using different preprocessing pipelines (raw, CLAHE, and diverse representations). A DenseNet121 backbone is employed, optionally preceded by a lightweight DnCNN module for learned denoising. Image-level predictions are aggregated at the patient level, and performance is evaluated using di…

**Figure 2.** Patient-level AUROC on the test set. Error bars indicate standard deviation across seeds.

**Figure 3.** Reliability diagram for the diverse + DnCNN configuration on the test set at the patient level (n = 201 patients). The dashed line indicates perfect calibration. Each point represents the mean predicted probability and empirical accuracy within a confidence bin; bin populations are smallest in the mid-probability range (0.1–0.5), which accounts for the larger deviations observed there.

**Figure 4.** Patient-level sensitivity at a target specificity of 0.95 for each of the six configurations on the test set. Error bars indicate standard deviation across seeds (n = 5). Configurations with the DnCNN front-end achieve higher sensitivity at this operating point on raw and diverse inputs, while CLAHE+DnCNN does not show a consistent gain.
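The Figure 1 caption mentions that image-level predictions are aggregated at the patient level. The aggregation rule is not given on this page, so the sketch below simply assumes a mean over each patient's image probabilities, with a max rule as an alternative. The function name and the `reduce` parameter are hypothetical.

```python
from collections import defaultdict


def aggregate_by_patient(image_preds, reduce="mean"):
    """Collapse (patient_id, probability) pairs into one score per patient.

    Assumption: mean aggregation by default; the paper's actual rule may differ.
    """
    grouped = defaultdict(list)
    for patient_id, prob in image_preds:
        grouped[patient_id].append(prob)

    if reduce == "mean":
        return {pid: sum(ps) / len(ps) for pid, ps in grouped.items()}
    if reduce == "max":
        return {pid: max(ps) for pid, ps in grouped.items()}
    raise ValueError(f"unknown reduce rule: {reduce}")


# Toy image-level outputs for two hypothetical patients.
image_preds = [("p1", 0.2), ("p1", 0.8), ("p2", 0.9)]
print(aggregate_by_patient(image_preds))
print(aggregate_by_patient(image_preds, reduce="max"))
```

A mean rule smooths over a single noisy view, while a max rule is more sensitive to one strongly abnormal image. The choice shifts both calibration and the operating point, so it should be fixed before thresholds are selected.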

Original abstract

In this study, we examine learned preprocessing pipelines in the context of a triage-oriented orthopedic abnormality detection task using elbow radiographs from the MURA dataset. The evaluation focuses on patient-level detection of musculoskeletal abnormalities under a leakage-aware protocol. We compare multiple preprocessing pipelines, with and without a lightweight DnCNN module as a learned preprocessing component, to assess their impact on discrimination and calibration. Performance is assessed using discrimination metrics (AUROC, PR-AUC), calibration measures (ECE, Brier score), and validation-selected operating point analysis targeting high specificity. Results show that differences across preprocessing strategies are modest and configuration-dependent, with no consistent discrimination advantage over the raw-input DenseNet121 baseline. The raw and diverse inputs combined with the DnCNN front-end showed reduced ECE and Brier score, while CLAHE combined with DnCNN did not improve calibration. Overall, the results suggest that under patient-level evaluation, preprocessing gains are modest and configuration-dependent; the raw-input DenseNet121 baseline remains competitive throughout, and no tested preprocessing strategy produced a consistent discrimination advantage across all metrics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

No preprocessing pipeline consistently beats the raw DenseNet121 baseline on patient-level MURA elbow detection once leakage is blocked.


The key point is that under a patient-level leakage-aware split on MURA elbow radiographs, none of the tested preprocessing pipelines—including variants with a DnCNN learned front-end—produced a consistent improvement in AUROC or PR-AUC over the plain DenseNet121 on raw inputs. Differences were modest and flipped depending on the metric and operating point chosen for high-specificity triage.

The paper does a clean job of running the same backbone across multiple preprocessing choices while tracking both discrimination and calibration (ECE, Brier). It also reports results at validation-selected operating points rather than just reporting peak AUROC. The leakage-aware protocol is the right default for this kind of work, and the authors actually applied it. That combination of patient-level splits plus calibration checks is useful to see written out on a public dataset.

The main limitation is scope: everything is elbow radiographs from one dataset. The abstract gives no error bars or paired statistical tests on the metric differences, so it is hard to judge whether the “modest and configuration-dependent” conclusion is noise or signal. No new method is introduced, so the contribution is the controlled negative result rather than a technical advance.

This is worth a referee for groups that build triage systems or worry about preprocessing overhead in radiology pipelines. A reader who already runs patient-level splits and calibration checks will find the numbers confirmatory rather than surprising. I would send it to review because the evaluation protocol is sound and the negative finding on preprocessing is the kind of result that should be on record, even if the paper stays narrow.

Referee Report

2 major / 2 minor

Summary. The manuscript reports an empirical comparison of preprocessing pipelines (including DnCNN-based learned preprocessing) versus a raw-input DenseNet121 baseline for patient-level elbow abnormality detection on the MURA dataset. Under a leakage-aware patient-level split, it finds that differences in AUROC and PR-AUC are modest and configuration-dependent, with no preprocessing strategy showing consistent discrimination gains; some raw+DnCNN and diverse-input combinations improve calibration (ECE, Brier), while CLAHE+DnCNN does not. The raw baseline remains competitive across metrics and high-specificity operating points.

Significance. If the leakage-aware protocol and modest differences hold under scrutiny, the result is useful for triage-oriented deployment: it indicates that added preprocessing complexity does not reliably improve discrimination on this task and dataset, supporting simpler baselines. The multi-metric evaluation (discrimination + calibration + operating-point analysis) and explicit patient-level focus are strengths that align with clinical requirements.

major comments (2)

§3 (Methods, leakage-aware protocol): The patient-level split is described as preventing same-patient images across train/val/test, but exact grouping rules, patient counts per split, and verification that no intra-patient leakage occurred are not quantified; this is load-bearing for the central claim that all comparisons are leakage-free and that the baseline remains competitive.
§4 (Results, discrimination tables): The claim of 'no consistent discrimination advantage' rests on modest AUROC/PR-AUC differences, yet no statistical tests (e.g., DeLong or bootstrap CIs) or effect-size measures are reported for pairwise comparisons against the raw baseline; without these, it is unclear whether observed differences are robust or merely sampling variation.

minor comments (2)

Table 2 and Figure 3: axis labels and legend entries for the DnCNN variants are abbreviated without a clear key, making it difficult to map configurations to the text descriptions.
§5 (Discussion): The statement that 'preprocessing gains are modest' would benefit from a short quantitative summary (e.g., maximum observed AUROC delta) rather than qualitative description only.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation for minor revision. The comments correctly identify areas where additional detail and statistical support would strengthen the presentation of the leakage-aware protocol and the discrimination results. We respond to each major comment below.


Referee: §3 (Methods, leakage-aware protocol): The patient-level split is described as preventing same-patient images across train/val/test, but exact grouping rules, patient counts per split, and verification that no intra-patient leakage occurred are not quantified; this is load-bearing for the central claim that all comparisons are leakage-free and that the baseline remains competitive.

Authors: We agree that the description would benefit from explicit quantification. The current manuscript states that the split is performed at the patient level using unique patient identifiers, but does not report the resulting patient counts or the precise verification steps. In the revision we will add the number of patients (and images) in each split together with a concise statement of the grouping procedure and the check performed to confirm no patient ID appears in more than one partition. Revision: yes.

Referee: §4 (Results, discrimination tables): The claim of 'no consistent discrimination advantage' rests on modest AUROC/PR-AUC differences, yet no statistical tests (e.g., DeLong or bootstrap CIs) or effect-size measures are reported for pairwise comparisons against the raw baseline; without these, it is unclear whether observed differences are robust or merely sampling variation.

Authors: We accept the point. While the observed AUROC and PR-AUC differences are small and the raw baseline remains competitive across all tested configurations and secondary metrics, the absence of formal statistical comparison leaves the robustness of those differences unquantified. In the revised manuscript we will report bootstrap confidence intervals for the AUROC differences (or DeLong tests where appropriate) between each preprocessing variant and the raw DenseNet121 baseline. Revision: yes.
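A paired bootstrap of the kind discussed in this exchange can be sketched as follows. It is a minimal illustration, not the authors' planned analysis: patients are resampled with replacement, both models are scored on the same resample, and a percentile interval is taken over the AUROC differences. The function names, the toy scores, and the number of resamples are all hypothetical.

```python
import random


def auroc(scores, labels):
    """AUROC as the fraction of (abnormal, normal) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_delta(scores_a, scores_b, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for AUROC(a) - AUROC(b), resampling patients in pairs."""
    rng = random.Random(seed)
    n = len(labels)
    deltas = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # AUROC is undefined without both classes
            continue
        a = auroc([scores_a[i] for i in idx], ys)
        b = auroc([scores_b[i] for i in idx], ys)
        deltas.append(a - b)
    deltas.sort()
    lo = deltas[int((alpha / 2) * n_boot)]
    hi = deltas[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Toy patient-level scores for a baseline and a variant (hypothetical values).
rng = random.Random(1)
labels = [i % 2 for i in range(60)]
scores_raw = [0.5 * y + 0.7 * rng.random() for y in labels]
scores_alt = [s + rng.gauss(0, 0.05) for s in scores_raw]

lo, hi = bootstrap_auroc_delta(scores_alt, scores_raw, labels, n_boot=500)
print(f"95% CI for AUROC(alt) - AUROC(raw): [{lo:.3f}, {hi:.3f}]")
```

An interval that contains zero is consistent with the "no consistent advantage" reading. Note that this captures sampling variation over test patients only; variation across training seeds, which the figure error bars report, is a separate source of uncertainty.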

Circularity Check

0 steps flagged

Empirical evaluation with no derivation chain or self-referential claims


The paper is a purely empirical comparison of preprocessing pipelines (including DnCNN variants) on the MURA elbow radiographs under a patient-level leakage-aware split. No equations, mathematical derivations, uniqueness theorems, or parameter fits are presented that could reduce to their own inputs. The central claim—that no tested strategy yields a consistent discrimination advantage over the raw DenseNet121 baseline—is a direct reporting of observed AUROC, PR-AUC, ECE, and Brier scores across configurations. This is self-contained against external benchmarks and contains no load-bearing self-citations or ansatzes. Honest non-finding applies.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work relies on standard assumptions in supervised deep learning for image classification and the properties of the public MURA dataset. No new free parameters or entities are introduced in the reported claim.

axioms (1)

Domain assumption: the MURA dataset provides reliable labels for elbow abnormalities, and the patient-level splits can be made without leakage.
This is central to the leakage-aware evaluation protocol described in the abstract.



