From Boundaries to Semantics: Prompt-Guided Multi-Task Learning for Petrographic Thin-section Segmentation
Pith reviewed 2026-05-10 10:55 UTC · model grok-4.3
The pith
Petro-SAM merges seven polarized views in a modified SAM to jointly segment grain edges and lithology semantics in thin-section images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Petro-SAM is a two-stage, multi-task framework built on SAM that jointly performs high-quality grain-edge segmentation and lithology semantic segmentation on petrographic images. A Merge Block integrates seven polarized views to address the extinction issue, while multi-scale feature fusion and color-entropy priors refine detection.
What carries the argument
The Merge Block integrates seven polarized views into the SAM backbone to resolve extinction-dependent color variations while multi-scale feature fusion and color-entropy priors refine boundaries and semantic labels.
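The paper does not spell out the Merge Block's internals. A minimal sketch of one plausible design (hypothetical names; channel-wise concatenation of the seven views followed by a learned per-pixel 1x1 projection) illustrates the idea:

```python
import numpy as np

def merge_block(views, weights, bias):
    """Fuse N polarized views of shape (H, W, 3) into one (H, W, C_out) map.

    Concatenates the views along the channel axis (N*3 channels) and applies
    a learned 1x1 projection -- a per-pixel linear layer -- so that
    extinction-dependent color variation across views can be pooled by the
    learned weights. This is an illustrative sketch, not the paper's module.
    """
    stacked = np.concatenate(views, axis=-1)  # (H, W, N*3)
    return stacked @ weights + bias           # (H, W, C_out)

# Seven polarized views of a 4x4 patch, projected back to 3 channels.
rng = np.random.default_rng(0)
views = [rng.random((4, 4, 3)) for _ in range(7)]
W = rng.standard_normal((21, 3)) * 0.1        # 7 views * 3 channels -> 3
b = np.zeros(3)
fused = merge_block(views, W, b)
print(fused.shape)                            # (4, 4, 3)
```

In a real implementation the projection would be a trainable layer inside the SAM image encoder rather than a fixed matrix, but the shape bookkeeping is the same.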
If this is right
- Joint GES and LSS become feasible from the same petrographic image stack, enabling direct quantification of both fabric and composition.
- The framework reduces the need for task-specific expert annotations by leveraging prompt-guided adaptation of a foundation model.
- Multi-view polarized input can be processed in a single forward pass rather than running separate models for each view.
- Detection of ultra-fine grains improves when color-entropy priors are combined with the merged polarized features.
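The abstract leaves the color-entropy prior undefined. One plausible reading, sketched below with NumPy (the quantization scheme and patch size are assumptions, not the paper's), scores a patch by the Shannon entropy of its color distribution, so that high-entropy regions flag fine-grained or boundary-dense areas:

```python
import numpy as np

def color_entropy(patch, bins=16):
    """Shannon entropy (base 2) of a patch's quantized color distribution.

    Each channel is quantized to `bins` levels, the three quantized channels
    are combined into a joint color code, and the entropy of the code
    histogram is returned. A uniform patch scores 0; a patch mixing many
    colors (e.g. many ultra-fine grains) scores high.
    """
    q = np.clip((patch * bins).astype(int), 0, bins - 1)
    codes = q[..., 0] * bins * bins + q[..., 1] * bins + q[..., 2]
    counts = np.bincount(codes.ravel())
    p = counts[counts > 0] / codes.size
    return float(-(p * np.log2(p)).sum())

# A uniform patch has zero entropy; a noisy multi-color patch scores high.
flat = np.full((8, 8, 3), 0.5)
rng = np.random.default_rng(1)
noisy = rng.random((8, 8, 3))
print(color_entropy(flat), color_entropy(noisy))
```

Such a map could serve as a spatial prior, weighting the loss or prompting toward high-entropy regions where ultra-fine grains are likely.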
Where Pith is reading between the lines
- The same merge-and-prior strategy could extend to other multi-angle imaging problems where extinction or contrast varies across views.
- If the priors generalize, they might reduce annotation costs in related domains such as mineral identification from microscopy stacks.
- Real-time field deployment would require checking whether the two-stage pipeline runs fast enough on portable hardware.
Load-bearing premise
The Merge Block together with the added fusion and priors can overcome the domain shift from polarized-light color changes and fine grain boundaries without needing extensive new expert annotations or full retraining.
What would settle it
On a held-out set of thin-section images with pronounced extinction effects and sub-micron boundaries, compare Petro-SAM's joint segmentation against separately trained standard SAM and U-Net baselines; the claim would be undermined if Petro-SAM trails by more than a few percentage points in boundary F1 or semantic IoU.
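The two metrics named above can be made concrete with standard implementations; the tolerance-based boundary F1 below follows a common edge-benchmark convention and is not necessarily the exact protocol the paper would use:

```python
import numpy as np

def iou(pred, gt):
    """Intersection-over-union for binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def boundary_f1(pred_edges, gt_edges, tol=1):
    """Boundary F1 with a pixel tolerance: a predicted edge pixel counts as
    a hit if any ground-truth edge pixel lies within `tol` pixels
    (Chebyshev distance), and symmetrically for recall."""
    def matches(a, b):
        hits, pts_b = 0, np.argwhere(b)
        for y, x in np.argwhere(a):
            if len(pts_b) and (np.abs(pts_b - [y, x]).max(axis=1) <= tol).any():
                hits += 1
        return hits
    p_total, g_total = pred_edges.sum(), gt_edges.sum()
    precision = matches(pred_edges, gt_edges) / p_total if p_total else 0.0
    recall = matches(gt_edges, pred_edges) / g_total if g_total else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# A predicted edge one pixel off the ground truth is perfect at tol=1.
gt = np.zeros((6, 6), dtype=bool); gt[:, 3] = True
pred = np.zeros((6, 6), dtype=bool); pred[:, 4] = True
print(iou(gt, gt), boundary_f1(pred, gt, tol=1))  # 1.0 1.0
```

Production benchmarks typically use a distance transform instead of the brute-force match loop, but the scoring logic is the same.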
Original abstract
Grain-edge segmentation (GES) and lithology semantic segmentation (LSS) are two pivotal tasks for quantifying rock fabric and composition. However, these two tasks are often treated separately, and the segmentation quality is implausible albeit expensive, time-consuming, and expert-annotated datasets have been used. Recently, foundation models, especially the Segment Anything Model (SAM), have demonstrated impressive robustness for boundary alignment. However, directly adapting SAM to joint GES and LSS is nontrivial due to 1) severe domain gap induced by extinction-dependent color variations and ultra-fine grain boundaries, and 2) lacking novel modules for joint learning on multi-angle petrographic image stacks. In this paper, we propose Petro-SAM, a novel two-stage, multi-task framework that can achieve high-quality joint GES and LSS on petrographic images. Specifically, based on SAM, we introduce a Merge Block to integrate seven polarized views, effectively solving the extinction issue. Moreover, we introduce multi-scale feature fusion and color-entropy priors to refine the detection.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Petro-SAM, a novel two-stage multi-task framework extending the Segment Anything Model (SAM) for joint grain-edge segmentation (GES) and lithology semantic segmentation (LSS) on petrographic thin-section images. It introduces a Merge Block to integrate seven polarized views to address extinction issues, along with multi-scale feature fusion and color-entropy priors to handle domain gaps from extinction-dependent color variations and ultra-fine grain boundaries, claiming high-quality results without requiring large new expert-annotated datasets.
Significance. If the proposed Merge Block, fusion mechanisms, and priors are shown to effectively close the domain gap, the work could meaningfully advance automated petrographic analysis by adapting foundation models to specialized scientific imaging, reducing annotation burdens and improving reproducibility in rock fabric and composition quantification.
major comments (2)
- [Abstract] Abstract: the claim that the framework 'can achieve high-quality joint GES and LSS' is unsupported by any quantitative metrics, error bars, baseline comparisons (e.g., against plain SAM), or ablation studies on the Merge Block, multi-scale fusion, or color-entropy priors.
- [Methods] Framework description: no architectural equations, fusion mechanism details, loss formulations, or diagrams are provided for the Merge Block integration of seven polarized views or the color-entropy priors, preventing verification that these additions overcome the stated domain gap without new annotations.
minor comments (1)
- [Abstract] Abstract: the phrasing 'segmentation quality is implausible albeit expensive, time-consuming, and expert-annotated datasets have been used' is grammatically unclear and should be revised.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address the major concerns point-by-point below and will revise the paper to strengthen the abstract and methods sections.
Point-by-point responses
-
Referee: [Abstract] Abstract: the claim that the framework 'can achieve high-quality joint GES and LSS' is unsupported by any quantitative metrics, error bars, baseline comparisons (e.g., against plain SAM), or ablation studies on the Merge Block, multi-scale fusion, or color-entropy priors.
Authors: We agree that the abstract claim would benefit from explicit quantitative support. In the revised version we will update the abstract to concisely report key metrics (e.g., IoU and Dice scores for GES and LSS), include baseline comparisons against plain SAM, and summarize the ablation results on the Merge Block, multi-scale fusion, and color-entropy priors. Error bars from multiple runs will also be noted. These results already appear in the experimental section and will be distilled into the abstract without altering the overall narrative.
revision: yes
-
Referee: [Methods] Framework description: no architectural equations, fusion mechanism details, loss formulations, or diagrams are provided for the Merge Block integration of seven polarized views or the color-entropy priors, preventing verification that these additions overcome the stated domain gap without new annotations.
Authors: We acknowledge that the current methods description is insufficiently detailed for independent verification. We will expand the methods section to include (1) the mathematical formulation and equations of the Merge Block for integrating the seven polarized views, (2) explicit descriptions and equations for the multi-scale feature fusion and color-entropy prior modules, (3) the complete loss function used for joint training, and (4) additional schematic diagrams. These additions will demonstrate how the components mitigate extinction-induced domain gaps while relying on the pre-trained SAM weights and limited petrographic data, without requiring large new expert annotations.
revision: yes
Circularity Check
No significant circularity; novel modules proposed atop external SAM
Full rationale
The paper proposes Petro-SAM as a two-stage multi-task framework that adds a Merge Block for integrating seven polarized views, plus multi-scale feature fusion and color-entropy priors, to adapt the external Segment Anything Model (SAM) for joint GES and LSS on petrographic images. No equations, loss formulations, or derivations are shown that reduce any claimed prediction or result to fitted inputs or self-definitions by construction. The approach is presented as building on an independent foundation model with new components whose performance requires separate validation, not as a renaming or self-referential fit. No self-citation chains, uniqueness theorems, or ansatzes smuggled via prior work are evident. This is a standard architectural proposal whose central claims remain open to external falsification.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: SAM can be effectively adapted to petrographic images, despite the domain gap, via the added modules
invented entities (2)
- Merge Block: no independent evidence
- color-entropy priors: no independent evidence
Reference graph
Works this paper leans on
-
[1]
The edge segmentation of grains in thin-section petrographic images utilising extinction consistency perception network,
P. Zhang, J. Zhou, W. Zhao, X. Li, and L. Pu, “The edge segmentation of grains in thin-section petrographic images utilising extinction consistency perception network,” Complex & Intelligent Systems, vol. 10, no. 1, pp. 1231–1245, 2024
2024
-
[2]
Holistically-nested edge detection,
S. Xie and Z. Tu, “Holistically-nested edge detection,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1395–1403
2015
-
[3]
Bi-directional cascade network for perceptual edge detection,
J. He, S. Zhang, M. Yang, Y. Shan, and T. Huang, “Bi-directional cascade network for perceptual edge detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 3828–3837
2019
-
[4]
U-Net: Convolutional Networks for Biomedical Image Segmentation
O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” 2015. [Online]. Available: https://arxiv.org/abs/1505.04597
2015
-
[5]
Deep high-resolution representation learning for visual recognition,
J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. Mu, M. Tan, X. Wang, W. Liu, and B. Xiao, “Deep high-resolution representation learning for visual recognition,” 2020. [Online]. Available: https://arxiv.org/abs/1908.07919
2020
-
[6]
Segment anything,
A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., “Segment anything,” arXiv preprint arXiv:2304.02643, 2023
2023
-
[7]
Automatic grain boundary detection and grain size analysis using polarization micrographs or orientation images,
R. Heilbronner, “Automatic grain boundary detection and grain size analysis using polarization micrographs or orientation images,” Journal of Structural Geology, vol. 22, no. 7, pp. 969–981, 2000. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0191814100000146
2000
-
[8]
Segmentation of thin section images for grain size analysis using region competition and edge-weighted region merging,
M. Jungmann, H. Pape, P. Wißkirchen, C. Clauser, and T. Berlage, “Segmentation of thin section images for grain size analysis using region competition and edge-weighted region merging,” Computers & Geosciences, vol. 72, pp. 33–48, 2014. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0098300414001599
2014
-
[9]
A method for automatic grain segmentation of multi-angle cross-polarized microscopic images of sandstone,
F. Jiang, Q. Gu, H. Hao, N. Li, B. Wang, and X. Hu, “A method for automatic grain segmentation of multi-angle cross-polarized microscopic images of sandstone,” Computers & Geosciences, vol. 115, pp. 143–153. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0098300417308178
-
[11]
An artificial neural net assisted approach to editing edges in petrographic images collected with the rotating polarizer stage,
F. Fueten and J. Mason, “An artificial neural net assisted approach to editing edges in petrographic images collected with the rotating polarizer stage,” Computers & Geosciences, vol. 33, no. 9, pp. 1176–1188. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0098300407000738
-
[13]
A new intelligent method for minerals segmentation in thin sections based on a novel incremental color clustering,
H. Izadi, J. Sadri, and N.-A. Mehran, “A new intelligent method for minerals segmentation in thin sections based on a novel incremental color clustering,” Computers & Geosciences, vol. 81, pp. 38–52, 2015. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0098300415000916
2015
-
[14]
Automatic grain segmentation in cross-polarized photomicrographs of sedimentary rocks using psychophysics inspired models,
R. Das, B. U. Shankar, T. Chakraborty et al., “Automatic grain segmentation in cross-polarized photomicrographs of sedimentary rocks using psychophysics inspired models,” Innovations in Systems and Software Engineering, vol. 17, pp. 167–183, 2021
2021
-
[15]
Very Deep Convolutional Networks for Large-Scale Image Recognition
K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2015. [Online]. Available: https://arxiv.org/abs/1409.1556
2015
-
[16]
Deep residual learning for image recognition,
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778
2016
-
[17]
Richer convolutional features for edge detection,
Y. Liu, M.-M. Cheng, X. Hu, K. Wang, and X. Bai, “Richer convolutional features for edge detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 3000–3009
2017
-
[18]
Pixel difference networks for efficient edge detection,
Z. Su, W. Liu, Z. Yu, D. Hu, Q. Liao, Q. Tian, M. Pietikäinen, and L. Liu, “Pixel difference networks for efficient edge detection,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 5117–5127
2021
-
[19]
Sauge: Taming sam for uncertainty-aligned multi-granularity edge detection,
X. Liufu, C. Tan, X. Lin, Y. Qi, J. Li, and J.-F. Hu, “Sauge: Taming sam for uncertainty-aligned multi-granularity edge detection,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 6, 2025, pp. 5766–5774
2025
-
[20]
Fully convolutional networks for semantic segmentation,
J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431–3440
2015
-
[21]
Rethinking Atrous Convolution for Semantic Image Segmentation
L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,” 2017. [Online]. Available: https://arxiv.org/abs/1706.05587
2017
-
[22]
Segformer: Simple and efficient design for semantic segmentation with transformers,
E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, “Segformer: Simple and efficient design for semantic segmentation with transformers,” 2021. [Online]. Available: https://arxiv.org/abs/2105.15203
2021
-
[23]
Intelligent evaluation of sandstone rock structure based on a visual large model,
Y. Ren, C. Zeng, X. Li, X. Liu, Y. Hu, Q. Su, X. Wang, Z. Lin, Y. Zhou, Z. Zheng, H. Hu, Y. Yang, and F. Hui, “Intelligent evaluation of sandstone rock structure based on a visual large model,” Petroleum Exploration and Development, vol. 52, no. 2, pp. 548–558, 2025
2025
-
[24]
Ccnet: Criss-cross attention for semantic segmentation,
Z. Huang, X. Wang, L. Huang, C. Huang, Y. Wei, and W. Liu, “Ccnet: Criss-cross attention for semantic segmentation,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 603–612
2019
-
[25]
Swin-unet: Unet-like pure transformer for medical image segmentation,
H. Cao, Y. Wang, J. Chen, D. Jiang, X. Zhang, Q. Tian, and M. Wang, “Swin-unet: Unet-like pure transformer for medical image segmentation,” in European conference on computer vision. Springer, 2022, pp. 205–218
2022
-
[26]
Masked-attention mask transformer for universal image segmentation,
B. Cheng, I. Misra, A. G. Schwing, A. Kirillov, and R. Girdhar, “Masked-attention mask transformer for universal image segmentation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 1290–1299
2022
-
[27]
Dlow: Domain flow for adaptation and generalization,
R. Gong, W. Li, Y. Chen, and L. V. Gool, “Dlow: Domain flow for adaptation and generalization,” 2019. [Online]. Available: https://arxiv.org/abs/1812.05418
2019
-
[28]
Domain adaptation for structured output via discriminative patch representations,
Y.-H. Tsai, K. Sohn, S. Schulter, and M. Chandraker, “Domain adaptation for structured output via discriminative patch representations,” 2019. [Online]. Available: https://arxiv.org/abs/1901.05427
2019
-
[29]
Prototypical pseudo label denoising and target structure learning for domain adaptive semantic segmentation,
P. Zhang, B. Zhang, T. Zhang, D. Chen, Y. Wang, and F. Wen, “Prototypical pseudo label denoising and target structure learning for domain adaptive semantic segmentation,” 2021. [Online]. Available: https://arxiv.org/abs/2101.10979
2021
-
[30]
Semi-supervised semantic segmentation with prototype-based consistency regularization,
H.-M. Xu, L. Liu, Q. Bian, and Z. Yang, “Semi-supervised semantic segmentation with prototype-based consistency regularization,” 2022. [Online]. Available: https://arxiv.org/abs/2210.04388
-
[31]
Efficient inference in fully connected crfs with gaussian edge potentials,
P. Krähenbühl and V. Koltun, “Efficient inference in fully connected crfs with gaussian edge potentials,” 2012. [Online]. Available: https://arxiv.org/abs/1210.5644
2012
-
[32]
Image analysis and mathematical morphology,
J. Serra, Image analysis and mathematical morphology. Academic Press, Inc., 1983
1983
-
[33]
Boundary loss for highly unbalanced segmentation,
H. Kervadec, J. Bouchtiba, C. Desrosiers, E. Granger, J. Dolz, and I. B. Ayed, “Boundary loss for highly unbalanced segmentation,” in International conference on medical imaging with deep learning. PMLR, 2019, pp. 285–296
2019
-
[34]
Sam-adapter: Adapting segment anything in underperformed scenes,
T. Chen, L. Zhu, C. Deng, R. Cao, Y. Wang, S. Zhang, Z. Li, L. Sun, Y. Zang, and P. Mao, “Sam-adapter: Adapting segment anything in underperformed scenes,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3367–3375
2023
-
[35]
Lisa: Reasoning segmentation via large language model,
X. Lai, Z. Tian, Y. Chen, Y. Li, Y. Yuan, S. Liu, and J. Jia, “Lisa: Reasoning segmentation via large language model,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 9579–9589
2024
-
[36]
Prompt-tuning sam: From generalist to specialist with only 2048 parameters and 16 training images,
T. Piater, B. Barz, and A. Freytag, “Prompt-tuning sam: From generalist to specialist with only 2048 parameters and 16 training images,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 4688–4698
2025
-
[37]
Sa-med2d-20m dataset: Segment anything in 2d medical imaging with 20 million masks,
J. Ye, J. Cheng, J. Chen, Z. Deng, T. Li, H. Wang, Y. Su, Z. Huang, J. Chen, L. Jiang et al., “Sa-med2d-20m dataset: Segment anything in 2d medical imaging with 20 million masks,” arXiv preprint arXiv:2311.11969, 2023
2023
-
[38]
Edgesam: Prompt-in-the-loop distillation for sam,
C. Zhou, X. Li, C. C. Loy, and B. Dai, “Edgesam: Prompt-in-the-loop distillation for sam,” International Journal of Computer Vision, pp. 1–17, 2025
2025
-
[39]
Identification of rock fragments after blasting by using deep learning-based segment anything model,
J. Zhao, D. Li, and Y. Yu, “Identification of rock fragments after blasting by using deep learning-based segment anything model,” Minerals, vol. 14, no. 7, p. 654, 2024
2024
-
[40]
Few-shot intelligent identification of rock thin sections based on sam,
Z. Zhang, Q. Li, Z. Wei, Q. Du, X. Li, and Y. Zhou, “Few-shot intelligent identification of rock thin sections based on sam,” Available at SSRN 5158450
-
[41]
Enhancing sam-based digital rock image segmentation via edge-semantics fusion,
Z. Wang, Z. Hou, and D. Cao, “Enhancing sam-based digital rock image segmentation via edge-semantics fusion,”Applied Computing and Geosciences, vol. 28, p. 100292, 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2590197425000746
2025
-
[42]
An Overview of Multi-Task Learning in Deep Neural Networks
S. Ruder, “An overview of multi-task learning in deep neural networks,” arXiv preprint arXiv:1706.05098, 2017
2017
-
[43]
Sluice networks: Learning what to share between loosely related tasks
S. Ruder, J. Bingel, I. Augenstein, and A. Søgaard, “Sluice networks: Learning what to share between loosely related tasks,” arXiv preprint arXiv:1705.08142, vol. 2, no. 1, 2017
2017
-
[44]
End-to-end multi-task learning with attention,
S. Liu, E. Johns, and A. J. Davison, “End-to-end multi-task learning with attention,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 1871–1880
2019
-
[45]
Mtlora: Low-rank adaptation approach for efficient multi-task learning,
A. Agiza, M. Neseem, and S. Reda, “Mtlora: Low-rank adaptation approach for efficient multi-task learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 16196–16205
2024
-
[46]
Towards consistent multi-task learning: Unlocking the potential of task-specific parameters,
X. Qin, X. Wang, and J. Yan, “Towards consistent multi-task learning: Unlocking the potential of task-specific parameters,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 10067–10076
2025
-
[47]
Learning conflict-noticed architecture for multi-task learning,
Z. Yue, Y. Zhang, and J. Liang, “Learning conflict-noticed architecture for multi-task learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 9, 2023, pp. 11078–11086
2023
-
[48]
Boundary-aware multitask learning for remote sensing imagery,
Y. Wang, W. Ding, R. Zhang, and H. Li, “Boundary-aware multitask learning for remote sensing imagery,” IEEE Journal of selected topics in applied earth observations and remote sensing, vol. 14, pp. 951–963, 2020
2020
-
[49]
Distance map loss penalty term for semantic segmentation,
F. Caliva, C. Iriondo, A. M. Martinez, S. Majumdar, and V. Pedoia, “Distance map loss penalty term for semantic segmentation,” arXiv preprint arXiv:1908.03679, 2019
-
[50]
Zeiss Axio Scan.Z1: a reference list for automated slide scanning,
Z. A. Scan.Z, T. Heupel, and C. Zeiss, “Zeiss Axio Scan.Z1: a reference list for automated slide scanning,” 2018. [Online]. Available: https://api.semanticscholar.org/CorpusID:49565229
2018
-
[51]
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020
2020
-
[52]
Dense extreme inception network: Towards a robust cnn model for edge detection,
X. Soria, E. Riba, and A. Sappa, “Dense extreme inception network: Towards a robust cnn model for edge detection,” in 2020 IEEE Winter Conference on Applications of Computer Vision (WACV). Los Alamitos, CA, USA: IEEE Computer Society, Mar. 2020, pp. 1912–1921. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/WACV45572.2020.9093290
2020
-
[53]
Deep structural contour detection,
R. Deng and S. Liu, “Deep structural contour detection,” Proceedings of the 28th ACM International Conference on Multimedia, 2020. [Online]. Available: https://api.semanticscholar.org/CorpusID:222278473
2020
-
[54]
Efficientnetv2: Smaller models and faster training,
M. Tan and Q. V. Le, “Efficientnetv2: Smaller models and faster training,” in International Conference on Machine Learning, 2021. [Online]. Available: https://api.semanticscholar.org/CorpusID:232478903
2021
-
[55]
Xdog: advanced image stylization with extended difference-of-gaussians,
H. Winnemöller, “Xdog: advanced image stylization with extended difference-of-gaussians,” in Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Non-Photorealistic Animation and Rendering, ser. NPAR ’11. New York, NY, USA: Association for Computing Machinery, 2011, pp. 147–156. [Online]. Available: https://doi.org/10.1145/2024676.2024700
2011
-
[56]
Fast edge detection using structured forests,
P. Dollár and C. L. Zitnick, “Fast edge detection using structured forests,” IEEE transactions on pattern analysis and machine intelligence, vol. 37, no. 8, pp. 1558–1570, 2014
2014
-
[57]
Pyramid scene parsing network,
H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2881–2890
2017