A self-supervised learning approach to deep filter banks for texture recognition

Antonio E. Fabris; Joao B. Florindo; Lucas O.Lyra

arxiv: 2605.27843 · v1 · pith:I2V3ZR4Ynew · submitted 2026-05-27 · 💻 cs.CV

A self-supervised learning approach to deep filter banks for texture recognition

Joao B. Florindo , Lucas O.Lyra , Antonio E. Fabris This is my paper

Pith reviewed 2026-06-29 13:56 UTC · model grok-4.3

classification 💻 cs.CV

keywords self-supervised learningtexture recognitionconvolutional autoencoderdeep filter banksFisher vector poolingmasked autoencoderimage classification

0 comments

The pith

A convolutional autoencoder for self-supervised pretraining combined with deep filters and Fisher vector pooling improves texture recognition accuracy and cuts computational cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tackles limited training data in texture recognition by replacing vision transformer masked autoencoders with a convolutional autoencoder for self-supervised pretraining. It rests on the premise that texture patterns carry most information locally, so long-range attention is unnecessary. Deep filters are then applied to the learned representations and pooled via Fisher vectors to produce the final descriptors. Experiments across multiple texture databases show the method matches or exceeds state-of-the-art accuracy while keeping complexity lower.

Core claim

Pretraining a convolutional autoencoder self-supervisedly learns local texture representations that, when fed into deep filter banks and Fisher vector pooling, yield higher classification accuracy than prior methods on standard texture databases and do so with substantially lower computational demands than transformer-based alternatives.

What carries the argument

Convolutional autoencoder pretrained via masked reconstruction, followed by deep filter banks and Fisher vector pooling.

If this is right

Classification accuracy rises on standard texture benchmarks without added compute.
The pipeline remains practical for settings with scarce labeled texture data.
Avoiding attention mechanisms keeps inference and training costs low relative to vision transformers.
Fisher vector pooling of deep filters converts the pretrained features into compact, discriminative descriptors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same local-pretraining strategy could transfer to other pattern-recognition tasks dominated by short-range statistics.
Designers of lightweight models for mobile or embedded vision might adopt the convolutional autoencoder backbone as a default starting point.
Further gains could be tested by swapping Fisher vectors for alternative pooling layers while keeping the convolutional pretraining fixed.
The local-information premise invites direct measurement of how far spatial correlations actually extend in common texture collections.

Load-bearing premise

Most relevant information in a texture image is contained within a small local neighborhood around each pixel.

What would settle it

A controlled test on a texture dataset engineered with strong long-range dependencies where a transformer-based model achieves markedly higher accuracy than the convolutional autoencoder version.

Figures

Figures reproduced from arXiv: 2605.27843 by Antonio E. Fabris, Joao B. Florindo, Lucas O.Lyra.

**Figure 1.** Figure 1: illustrates the overall architecture of the proposed framework. Encoder Decoder CNN Feature Extraction GMM Tranining Fisher Vectors Concatenation Predictor Latent Representation Image [PITH_FULL_IMAGE:figures/full_fig_p010_1.png] view at source ↗

read the original abstract

An important challenge in texture recognition is the limited amount of data for training frequently found in real-world applications. In computer vision in general, a successful strategy to mitigate this issue is the use of a pretraining stage where the neural network learns to identify relations between parts of the data in a self-supervised manner. A well-established framework in this direction is masked autoencoder. Nevertheless, these models usually rely on computationally intensive architectures, such as vision transformers. In the particular case of texture images, most of the relevant information is compacted within a delimited area around each pixel, which suggests that capturing long-range dependence via the attention mechanism may be unnecessary. Based on that assumption, here we propose a framework where the pretraining model is a convolutional autoencoder. To leverage the rich information conveyed by texture patterns, we employ deep filters coupled with Fisher vector pooling. In this way, we improve the performance of texture recognition without adding significant computational burden. Our approach is compared with several state-of-the-art methods in different texture databases, confirming its potential both in terms of classification accuracy and computational complexity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This swaps a conv autoencoder into masked autoencoder pretraining for textures on the untested claim that local neighborhoods make attention unnecessary, but the abstract supplies no numbers to back the accuracy or complexity claims.

read the letter

The paper's main move is to drop the vision transformer from masked autoencoders and use a convolutional autoencoder instead, justified by the idea that texture information lives in small neighborhoods around each pixel. They keep Fisher vector pooling on deep filters to handle the patterns.

What is actually new is the specific combination for this narrow task: conv-based self-supervised pretraining plus the existing pooling step, aimed at data-scarce texture datasets. It does a reasonable job of identifying a potential efficiency lever in an established pipeline.

The soft spots are straightforward. The abstract states improved accuracy and lower complexity but gives no metrics, baselines, or error bars, so those claims cannot be checked. The locality assumption that justifies skipping attention is presented without any test against textures that might have periodic or long-range structure, which directly affects whether the complexity savings are real or just an artifact of the chosen databases. The full text might contain the missing runs, but nothing in the supplied material shows them.

This is for readers who work on practical texture classification and want lighter pretraining options. It will not interest people looking for new frameworks or first-principles results.

I would send it for peer review. The idea is clear enough that referees can evaluate the experiments in one pass, and the work is grounded enough to deserve that check even if revisions are needed on the assumption and the numbers.

Referee Report

2 major / 0 minor

Summary. The paper proposes replacing masked autoencoder pretraining on vision transformers with a convolutional autoencoder, justified by the locality of texture information, and combines this with deep filter banks and Fisher vector pooling to improve texture classification accuracy while reducing computational complexity relative to state-of-the-art methods across multiple texture databases.

Significance. If the empirical comparisons hold, the work could demonstrate a lighter-weight self-supervised pipeline tailored to texture tasks where global attention is unnecessary, offering practical gains in efficiency for data-limited applications.

major comments (2)

[Abstract] Abstract: the central claim that the approach 'confirm[s] its potential both in terms of classification accuracy and computational complexity' is presented without any metrics, baselines, error bars, or dataset-specific results, so the performance assertions cannot be evaluated from the provided text.
[Abstract] Abstract: the design decision to forgo attention mechanisms rests entirely on the untested premise that 'most of the relevant information is compacted within a delimited area around each pixel'; no ablation, comparison against a ViT-based counterpart, or analysis of long-range correlations in the evaluated databases is referenced, rendering the complexity advantage dependent on this locality hypothesis rather than demonstrated superiority.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We will revise the abstract to incorporate quantitative results and strengthen the motivation for the architectural choices.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that the approach 'confirm[s] its potential both in terms of classification accuracy and computational complexity' is presented without any metrics, baselines, error bars, or dataset-specific results, so the performance assertions cannot be evaluated from the provided text.

Authors: We agree that the abstract would be strengthened by including concrete metrics. In the revised manuscript we will add key results such as accuracy improvements and complexity reductions relative to the compared baselines on the evaluated texture databases. revision: yes
Referee: [Abstract] Abstract: the design decision to forgo attention mechanisms rests entirely on the untested premise that 'most of the relevant information is compacted within a delimited area around each pixel'; no ablation, comparison against a ViT-based counterpart, or analysis of long-range correlations in the evaluated databases is referenced, rendering the complexity advantage dependent on this locality hypothesis rather than demonstrated superiority.

Authors: The locality premise is presented as a domain-motivated hypothesis for texture data. The manuscript already includes direct empirical comparisons of the convolutional pipeline against transformer-based masked autoencoder methods, demonstrating both higher accuracy and lower complexity on the standard texture datasets. These results provide supporting evidence for the design choice. We will revise the abstract and introduction to more explicitly reference these comparisons. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper states an assumption about locality of texture information to motivate replacing vision transformers with a convolutional autoencoder, then describes using deep filters with Fisher vector pooling and reports empirical comparisons on texture databases. No equations, fitted parameters, or self-citations are presented that reduce any claimed prediction or result to the inputs by construction. The derivation chain consists of a design choice justified by an external premise followed by standard empirical validation, which is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no equations, parameters, or explicit assumptions beyond the single sentence about local pixel neighborhoods; the ledger is therefore empty.

pith-pipeline@v0.9.1-grok · 5723 in / 1117 out tokens · 29548 ms · 2026-06-29T13:56:26.830176+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 11 canonical work pages · 2 internal anchors

[1]

Young, N

D. Young, N. Khan, S. R. Hobson, D. Sussman, Diagnosis of placenta acc- reta spectrum using ultrasound texture feature fusion and machine learn- ing, Computers in Biology and Medicine 178 (2024) 108757

2024
[2]

Barburiceanu, S

S. Barburiceanu, S. Meza, B. Orza, R. Malutan, R. Terebes, Convolutional neural networks for texture feature extraction. applications to leaf disease classification in precision agriculture, IEEE Access 9 (2021) 160085–160103. 15

2021
[3]

J. Si, S. Kim, V-daft: Visual technique for texture image defect recognition with denoising autoencoder and fourier transform, Signal, Image and Video Processing 18 (10) (2024) 7405–7418

2024
[4]

H. Han, Z. Feng, W. Du, S. Guo, P. Wang, T. Xu, Remote sensing im- age classification based on multi-spectral cross-sensor super-resolution com- bined with texture features: A case study in the liaohe planting area, IEEE Access 12 (2024) 16830–16843

2024
[5]

Akiva, M

P. Akiva, M. Purri, M. Leotta, Self-supervised material and texture rep- resentation learning for remote sensing tasks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 8203–8215

2022
[6]

K. He, X. Chen, S. Xie, Y. Li, P. Dollár, R. Girshick, Masked autoencoders are scalable vision learners, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 16000–16009

2022
[7]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Un- terthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., An image is worth 16x16 words: Transformers for image recognition at scale, arXiv preprint arXiv:2010.11929 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2010
[8]

L. O. Lyra, A. E. Fabris, J. B. Florindo, A multilevel pooling scheme in convolutional neural networks for texture image recognition, Applied Soft Computing (2024) 111282doi:https://doi.org/10.1016/j.asoc.2024.111282

work page doi:10.1016/j.asoc.2024.111282 2024
[9]

Gogna, A

A. Gogna, A. Majumdar, Discriminative autoencoder for feature extrac- tion: Application to character recognition, Neural Processing Letters 49 (2019) 1723–1735

2019
[10]

Z. Yang, X. Wu, P. Huang, F. Zhang, M. Wan, Z. Lai, Orthogonal autoen- coder regression for image classification, Information Sciences 618 (2022) 400–416. 16

2022
[11]

Q. Kang, J. Gao, K. Li, Q. Lao, Deblurring masked autoencoder is better recipe for ultrasound image recognition, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2023, pp. 352–362

2023
[12]

Cimpoi, S

M. Cimpoi, S. Maji, A. Vedaldi, Deep filter banks for texture recognition and segmentation, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3828–3836

2015
[13]

Very Deep Convolutional Networks for Large-Scale Image Recognition

K. Simonyan, A. Zisserman, Very deep convolutional networks for large- scale image recognition, arXiv preprint arXiv:1409.1556 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014
[14]

Z. Chen, F. Li, Y. Quan, Y. Xu, H. Ji, Deep texture recognition via exploit- ing cross-layer statistical self-similarity, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 5231– 5240

2021
[15]

Scabini, K

L. Scabini, K. M. Zielinski, L. C. Ribas, W. N. Gonçalves, B. De Baets, O. M. Bruno, Radam: Texture recognition through randomized aggregated encoding of deep activation maps, Pattern Recognition 143 (2023) 109802. doi:https://doi.org/10.1016/j.patcog.2023.109802. URLhttps://www.sciencedirect.com/science/article/pii/ S0031320323005009

work page doi:10.1016/j.patcog.2023.109802 2023
[16]

Z. Yang, S. Lai, X. Hong, Y. Shi, Y. Cheng, C. Qing, Dfaen: Double-order knowledge fusion and attentional encoding network for texture recognition, Expert Systems with Applications 209 (2022) 118223

2022
[17]

Y. Xu, F. Li, Z. Chen, J. Liang, Y. Quan, Encoding spatial distribution of convolutional features for texture representation, Advances in Neural Information Processing Systems 34 (2021)

2021
[18]

J. B. Florindo, E. E. Laureano, Boff: A bag of fuzzy deep features for texture recognition, Expert Systems with Applications 219 (2023) 119627. 17

2023
[19]

Scabini, A

L. Scabini, A. Sacilotti, K. M. Zielinski, L. C. Ribas, B. De Baets, O. M. Bruno, A comparative survey of vision transformers for feature extraction in texture analysis, arXiv preprint arXiv:2406.06136 (2024)

work page arXiv 2024
[20]

L. Zhu, T. Chen, J. Yin, S. See, J. Liu, Learning gabor texture features for fine-grained recognition, in: Proceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 1621–1631

2023
[21]

A. Bera, D. Bhattacharjee, M. Nasipuri, Deep neural networks fused with textures for image classification, in: International conference on frontiers in computing and systems, Springer, 2022, pp. 103–111

2022
[22]

Goyal, S

V. Goyal, S. Sharma, Texture classification for visual data using transfer learning, Multimedia Tools and Applications 82 (16) (2023) 24841–24864

2023
[23]

Jaakkola, D

T. Jaakkola, D. Haussler, Exploiting generative models in discriminative classifiers, Advances in neural information processing systems 11 (1998)

1998
[24]

Sánchez, F

J. Sánchez, F. Perronnin, T. Mensink, J. Verbeek, Image classification with the fisher vector: Theory and practice, International journal of computer vision 105 (3) (2013) 222–245

2013
[25]

Perronnin, C

F. Perronnin, C. Dance, Fisher kernels on visual vocabularies for image categorization, in: 2007 IEEE conference on computer vision and pattern recognition, IEEE, 2007, pp. 1–8

2007
[26]

Perronnin, J

F. Perronnin, J. Sánchez, T. Mensink, Improving the fisher kernel for large- scale image classification, in: European conference on computer vision, Springer, 2010, pp. 143–156

2010
[27]

M. Tan, Q. Le, Efficientnet: Rethinking model scaling for convolu- tional neural networks, in: International Conference on Machine Learning, PMLR, 2019, pp. 6105–6114

2019
[28]

Caputo, E

B. Caputo, E. Hayman, P. Mallikarjuna, Class-specific material cat- egorisation, in: Tenth IEEE International Conference on Computer 18 Vision (ICCV’05) Volume 1, Vol. 2, 2005, pp. 1597–1604 Vol. 2. doi:10.1109/ICCV.2005.54

work page doi:10.1109/iccv.2005.54 2005
[29]

Sharan, R

L. Sharan, R. Rosenholtz, E. H. Adelson, Accuracy and speed of material categorization in real-world images, Journal of Vision 14 (9) (2014) 12–12. doi:10.1167/14.9.12

work page doi:10.1167/14.9.12 2014
[30]

Cimpoi, S

M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, A. Vedaldi, Describing textures in the wild, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 3606–3613

2014
[31]

Y. Xu, H. Ji, C. Fermüller, Viewpoint invariant texture description using fractal analysis, International Journal of Computer Vision 83 (1) (2009) 85–100. doi:10.1007/s11263-009-0220-6

work page doi:10.1007/s11263-009-0220-6 2009
[32]

Lazebnik, C

S. Lazebnik, C. Schmid, J. Ponce, A sparse texture representation using local affine regions, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (8) (2005) 1265–1278. doi:10.1109/TPAMI.2005.151

work page doi:10.1109/tpami.2005.151 2005
[33]

Casanova, J

D. Casanova, J. J. de Mesquita Sá Junior, O. M. Bruno, Plant leaf iden- tification using gabor wavelets, International Journal of Imaging Systems and Technology 19 (3) (2009) 236–243. doi:10.1002/ima.20201

work page doi:10.1002/ima.20201 2009
[34]

Cimpoi, S

M. Cimpoi, S. Maji, I. Kokkinos, A. Vedaldi, Deep filter banks for tex- ture recognition, description, and segmentation, International Journal of Computer Vision 118 (1) (2016) 65–94

2016
[35]

Y. Song, F. Zhang, Q. Li, H. Huang, L. J. O’Donnell, W. Cai, Locally- transferred fisher vectors for texture classification, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4912–4920

2017
[36]

Zhang, J

H. Zhang, J. Xue, K. Dana, Deep ten: Texture encoding network, in: Pro- ceedings of the IEEE conference on computer vision and pattern recogni- tion, 2017, pp. 708–717. 19

2017
[37]

Jbene, A

M. Jbene, A. D. El Maliani, M. El Hassouni, Fusion of convolutional neu- ral network and statistical features for texture classification, in: 2019 In- ternational Conference on Wireless Networks and Mobile Communications (WINCOM), IEEE, 2019, pp. 1–4

2019
[38]

11010–11019

W.Zhai, Y.Cao, Z.-J.Zha, H.Xie, F.Wu, Deepstructure-revealednetwork for texture recognition, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11010–11019

2020
[39]

J. B. Florindo, Y.-S. Lee, K. Jun, G. Jeon, M. K. Albertini, Visgraphnet: A complex network interpretation of convolutional neural features, Infor- mation Sciences 543 (2021) 296–308

2021
[40]

Florindo, K

J. Florindo, K. Metze, Using non-additive entropy to enhance convolu- tional neural features for texture recognition, Entropy 23 (2021) 1259. doi:10.3390/e23101259

work page doi:10.3390/e23101259 2021
[41]

S. Mao, D. Rajan, L. T. Chia, Deep residual pooling network for texture recognition, Pattern Recognition 112 (2021) 107817

2021
[42]

Mamidibathula, S

B. Mamidibathula, S. Amirneni, S. S. Sistla, N. Patnam, Texture classifica- tion using capsule networks, in: Pattern Recognition and Image Analysis: 9th Iberian Conference, IbPRIA 2019, Madrid, Spain, July 1–4, 2019, Pro- ceedings, Part I 9, Springer, 2019, pp. 589–599. 20

2019

[1] [1]

Young, N

D. Young, N. Khan, S. R. Hobson, D. Sussman, Diagnosis of placenta acc- reta spectrum using ultrasound texture feature fusion and machine learn- ing, Computers in Biology and Medicine 178 (2024) 108757

2024

[2] [2]

Barburiceanu, S

S. Barburiceanu, S. Meza, B. Orza, R. Malutan, R. Terebes, Convolutional neural networks for texture feature extraction. applications to leaf disease classification in precision agriculture, IEEE Access 9 (2021) 160085–160103. 15

2021

[3] [3]

J. Si, S. Kim, V-daft: Visual technique for texture image defect recognition with denoising autoencoder and fourier transform, Signal, Image and Video Processing 18 (10) (2024) 7405–7418

2024

[4] [4]

H. Han, Z. Feng, W. Du, S. Guo, P. Wang, T. Xu, Remote sensing im- age classification based on multi-spectral cross-sensor super-resolution com- bined with texture features: A case study in the liaohe planting area, IEEE Access 12 (2024) 16830–16843

2024

[5] [5]

Akiva, M

P. Akiva, M. Purri, M. Leotta, Self-supervised material and texture rep- resentation learning for remote sensing tasks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 8203–8215

2022

[6] [6]

K. He, X. Chen, S. Xie, Y. Li, P. Dollár, R. Girshick, Masked autoencoders are scalable vision learners, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 16000–16009

2022

[7] [7]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Un- terthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., An image is worth 16x16 words: Transformers for image recognition at scale, arXiv preprint arXiv:2010.11929 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2010

[8] [8]

L. O. Lyra, A. E. Fabris, J. B. Florindo, A multilevel pooling scheme in convolutional neural networks for texture image recognition, Applied Soft Computing (2024) 111282doi:https://doi.org/10.1016/j.asoc.2024.111282

work page doi:10.1016/j.asoc.2024.111282 2024

[9] [9]

Gogna, A

A. Gogna, A. Majumdar, Discriminative autoencoder for feature extrac- tion: Application to character recognition, Neural Processing Letters 49 (2019) 1723–1735

2019

[10] [10]

Z. Yang, X. Wu, P. Huang, F. Zhang, M. Wan, Z. Lai, Orthogonal autoen- coder regression for image classification, Information Sciences 618 (2022) 400–416. 16

2022

[11] [11]

Q. Kang, J. Gao, K. Li, Q. Lao, Deblurring masked autoencoder is better recipe for ultrasound image recognition, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2023, pp. 352–362

2023

[12] [12]

Cimpoi, S

M. Cimpoi, S. Maji, A. Vedaldi, Deep filter banks for texture recognition and segmentation, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3828–3836

2015

[13] [13]

Very Deep Convolutional Networks for Large-Scale Image Recognition

K. Simonyan, A. Zisserman, Very deep convolutional networks for large- scale image recognition, arXiv preprint arXiv:1409.1556 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014

[14] [14]

Z. Chen, F. Li, Y. Quan, Y. Xu, H. Ji, Deep texture recognition via exploit- ing cross-layer statistical self-similarity, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 5231– 5240

2021

[15] [15]

Scabini, K

L. Scabini, K. M. Zielinski, L. C. Ribas, W. N. Gonçalves, B. De Baets, O. M. Bruno, Radam: Texture recognition through randomized aggregated encoding of deep activation maps, Pattern Recognition 143 (2023) 109802. doi:https://doi.org/10.1016/j.patcog.2023.109802. URLhttps://www.sciencedirect.com/science/article/pii/ S0031320323005009

work page doi:10.1016/j.patcog.2023.109802 2023

[16] [16]

Z. Yang, S. Lai, X. Hong, Y. Shi, Y. Cheng, C. Qing, Dfaen: Double-order knowledge fusion and attentional encoding network for texture recognition, Expert Systems with Applications 209 (2022) 118223

2022

[17] [17]

Y. Xu, F. Li, Z. Chen, J. Liang, Y. Quan, Encoding spatial distribution of convolutional features for texture representation, Advances in Neural Information Processing Systems 34 (2021)

2021

[18] [18]

J. B. Florindo, E. E. Laureano, Boff: A bag of fuzzy deep features for texture recognition, Expert Systems with Applications 219 (2023) 119627. 17

2023

[19] [19]

Scabini, A

L. Scabini, A. Sacilotti, K. M. Zielinski, L. C. Ribas, B. De Baets, O. M. Bruno, A comparative survey of vision transformers for feature extraction in texture analysis, arXiv preprint arXiv:2406.06136 (2024)

work page arXiv 2024

[20] [20]

L. Zhu, T. Chen, J. Yin, S. See, J. Liu, Learning gabor texture features for fine-grained recognition, in: Proceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 1621–1631

2023

[21] [21]

A. Bera, D. Bhattacharjee, M. Nasipuri, Deep neural networks fused with textures for image classification, in: International conference on frontiers in computing and systems, Springer, 2022, pp. 103–111

2022

[22] [22]

Goyal, S

V. Goyal, S. Sharma, Texture classification for visual data using transfer learning, Multimedia Tools and Applications 82 (16) (2023) 24841–24864

2023

[23] [23]

Jaakkola, D

T. Jaakkola, D. Haussler, Exploiting generative models in discriminative classifiers, Advances in neural information processing systems 11 (1998)

1998

[24] [24]

Sánchez, F

J. Sánchez, F. Perronnin, T. Mensink, J. Verbeek, Image classification with the fisher vector: Theory and practice, International journal of computer vision 105 (3) (2013) 222–245

2013

[25] [25]

Perronnin, C

F. Perronnin, C. Dance, Fisher kernels on visual vocabularies for image categorization, in: 2007 IEEE conference on computer vision and pattern recognition, IEEE, 2007, pp. 1–8

2007

[26] [26]

Perronnin, J

F. Perronnin, J. Sánchez, T. Mensink, Improving the fisher kernel for large- scale image classification, in: European conference on computer vision, Springer, 2010, pp. 143–156

2010

[27] [27]

M. Tan, Q. Le, Efficientnet: Rethinking model scaling for convolu- tional neural networks, in: International Conference on Machine Learning, PMLR, 2019, pp. 6105–6114

2019

[28] [28]

Caputo, E

B. Caputo, E. Hayman, P. Mallikarjuna, Class-specific material cat- egorisation, in: Tenth IEEE International Conference on Computer 18 Vision (ICCV’05) Volume 1, Vol. 2, 2005, pp. 1597–1604 Vol. 2. doi:10.1109/ICCV.2005.54

work page doi:10.1109/iccv.2005.54 2005

[29] [29]

Sharan, R

L. Sharan, R. Rosenholtz, E. H. Adelson, Accuracy and speed of material categorization in real-world images, Journal of Vision 14 (9) (2014) 12–12. doi:10.1167/14.9.12

work page doi:10.1167/14.9.12 2014

[30] [30]

Cimpoi, S

M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, A. Vedaldi, Describing textures in the wild, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 3606–3613

2014

[31] [31]

Y. Xu, H. Ji, C. Fermüller, Viewpoint invariant texture description using fractal analysis, International Journal of Computer Vision 83 (1) (2009) 85–100. doi:10.1007/s11263-009-0220-6

work page doi:10.1007/s11263-009-0220-6 2009

[32] [32]

Lazebnik, C

S. Lazebnik, C. Schmid, J. Ponce, A sparse texture representation using local affine regions, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (8) (2005) 1265–1278. doi:10.1109/TPAMI.2005.151

work page doi:10.1109/tpami.2005.151 2005

[33] [33]

Casanova, J

D. Casanova, J. J. de Mesquita Sá Junior, O. M. Bruno, Plant leaf iden- tification using gabor wavelets, International Journal of Imaging Systems and Technology 19 (3) (2009) 236–243. doi:10.1002/ima.20201

work page doi:10.1002/ima.20201 2009

[34] [34]

Cimpoi, S

M. Cimpoi, S. Maji, I. Kokkinos, A. Vedaldi, Deep filter banks for tex- ture recognition, description, and segmentation, International Journal of Computer Vision 118 (1) (2016) 65–94

2016

[35] [35]

Y. Song, F. Zhang, Q. Li, H. Huang, L. J. O’Donnell, W. Cai, Locally- transferred fisher vectors for texture classification, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4912–4920

2017

[36] [36]

Zhang, J

H. Zhang, J. Xue, K. Dana, Deep ten: Texture encoding network, in: Pro- ceedings of the IEEE conference on computer vision and pattern recogni- tion, 2017, pp. 708–717. 19

2017

[37] [37]

Jbene, A

M. Jbene, A. D. El Maliani, M. El Hassouni, Fusion of convolutional neu- ral network and statistical features for texture classification, in: 2019 In- ternational Conference on Wireless Networks and Mobile Communications (WINCOM), IEEE, 2019, pp. 1–4

2019

[38] [38]

11010–11019

W.Zhai, Y.Cao, Z.-J.Zha, H.Xie, F.Wu, Deepstructure-revealednetwork for texture recognition, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11010–11019

2020

[39] [39]

J. B. Florindo, Y.-S. Lee, K. Jun, G. Jeon, M. K. Albertini, Visgraphnet: A complex network interpretation of convolutional neural features, Infor- mation Sciences 543 (2021) 296–308

2021

[40] [40]

Florindo, K

J. Florindo, K. Metze, Using non-additive entropy to enhance convolu- tional neural features for texture recognition, Entropy 23 (2021) 1259. doi:10.3390/e23101259

work page doi:10.3390/e23101259 2021

[41] [41]

S. Mao, D. Rajan, L. T. Chia, Deep residual pooling network for texture recognition, Pattern Recognition 112 (2021) 107817

2021

[42] [42]

Mamidibathula, S

B. Mamidibathula, S. Amirneni, S. S. Sistla, N. Patnam, Texture classifica- tion using capsule networks, in: Pattern Recognition and Image Analysis: 9th Iberian Conference, IbPRIA 2019, Madrid, Spain, July 1–4, 2019, Pro- ceedings, Part I 9, Springer, 2019, pp. 589–599. 20

2019