Zero-Shot Synthetic-to-Real Handwritten Text Recognition via Task Analogies
Pith reviewed 2026-05-10 18:31 UTC · model grok-4.3
The pith
Models learn the parameter shift from synthetic to real handwriting in known languages and apply the same correction to recognize real text in entirely new languages with no real target samples.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our approach learns how model parameters change when moving from synthetic to real handwriting in one or more source languages and transfers this learned correction to new target languages. When using multiple sources, we rely on linguistic similarity to weigh their contribution when combining them. Experiments across five languages and six architectures show consistent improvements over synthetic-only baselines and reveal that the transferred corrections benefit even languages unrelated to the sources.
What carries the argument
The learned parameter correction that maps a synthetic-trained model toward real-data performance, transferred by analogy to target languages.
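A minimal sketch of this correction, assuming each model is stored as a dictionary from parameter name to a flat list of floats. The storage layout, the function names, and the toy values are hypothetical and not taken from the paper; only the arithmetic (source real minus source synthetic, added to target synthetic) follows the description on this page.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # hypothetical storage: name -> flat weights


def correction(theta_real_src: Params, theta_synth_src: Params) -> Params:
    """Per-parameter shift from the synthetic-trained to the real-trained source model."""
    return {
        name: [r - s for r, s in zip(theta_real_src[name], theta_synth_src[name])]
        for name in theta_synth_src
    }


def apply_correction(theta_synth_tgt: Params, delta: Params) -> Params:
    """Add the source correction to the synthetic-trained target model."""
    return {
        name: [w + d for w, d in zip(weights, delta[name])]
        for name, weights in theta_synth_tgt.items()
    }


# Toy values, chosen only to make the arithmetic visible.
theta_synth_src = {"conv1": [1.0, 2.0]}
theta_real_src = {"conv1": [1.5, 1.0]}
theta_synth_tgt = {"conv1": [4.0, 4.0]}

delta = correction(theta_real_src, theta_synth_src)
theta_tgt = apply_correction(theta_synth_tgt, delta)
print(theta_tgt)  # {'conv1': [4.5, 3.0]}
```

No real target-language sample enters the computation: the target model is touched only by a shift measured on the source language.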
If this is right
- Recognition accuracy on real test data rises in target languages compared with models trained only on synthetic data.
- Multiple source languages can be combined by weighting each correction according to linguistic similarity.
- Performance gains occur even when the target language has no linguistic connection to any source language.
- The same correction mechanism improves results across different neural architectures used for handwritten text recognition.
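The multi-source combination in the second point can be sketched as a similarity-weighted average of per-source corrections. This is an illustration under stated assumptions: the page does not say how similarity scores become weights, so the sketch simply rescales them to sum to one, and the source names `src_a` and `src_b`, the scores, and the function name are all hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # hypothetical storage: name -> flat weights


def combine_corrections(deltas: Dict[str, Params], similarity: Dict[str, float]) -> Params:
    """Similarity-weighted average of per-source corrections."""
    total = sum(similarity[src] for src in deltas)
    if total <= 0:
        raise ValueError("similarity scores must have a positive sum")
    # Assumption: weights are the similarity scores rescaled to sum to 1.
    weights = {src: similarity[src] / total for src in deltas}

    first = next(iter(deltas.values()))
    combined: Params = {}
    for name, values in first.items():
        combined[name] = [
            sum(weights[src] * deltas[src][name][i] for src in deltas)
            for i in range(len(values))
        ]
    return combined


# Placeholder sources and scores, not values from the paper.
deltas = {
    "src_a": {"conv1": [1.0, 0.0]},
    "src_b": {"conv1": [0.0, 2.0]},
}
similarity_to_target = {"src_a": 3.0, "src_b": 1.0}

combined = combine_corrections(deltas, similarity_to_target)
print(combined)  # {'conv1': [0.75, 0.5]}
```

With these scores the source judged closer to the target contributes three quarters of the combined correction.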
Where Pith is reading between the lines
- Collecting large real handwriting datasets could become unnecessary for many languages once source corrections exist.
- The same idea of learning and transferring parameter shifts might apply to other synthetic-to-real problems such as scene text or document layout analysis.
- If the shifts turn out to be stable, researchers could test whether corrections can be chained across sequences of languages or scripts.
Load-bearing premise
The adjustment that model parameters require when moving from synthetic to real handwriting is similar enough across languages for the correction to transfer directly.
What would settle it
Apply the parameter correction learned from a source language pair to a model for a new target language and measure whether accuracy on real target data improves, stays the same, or drops below the synthetic-only baseline.
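One way to score that test is character error rate (CER), the metric the referee report asks to see: total edit distance divided by total reference length. The sketch below is a plain-Python illustration; the strings, the tolerance, and the function names are hypothetical and stand in for real model outputs.

```python
from typing import List


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between a reference and a hypothesis string."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cer(refs: List[str], hyps: List[str]) -> float:
    """Character error rate: summed edit distance over summed reference length."""
    chars = sum(len(r) for r in refs)
    if chars == 0:
        raise ValueError("references must contain at least one character")
    return sum(edit_distance(r, h) for r, h in zip(refs, hyps)) / chars


def verdict(cer_baseline: float, cer_corrected: float, tol: float = 1e-9) -> str:
    """Classify the outcome of applying the correction (lower CER is better)."""
    if cer_corrected < cer_baseline - tol:
        return "improves"
    if cer_corrected > cer_baseline + tol:
        return "degrades"
    return "unchanged"


# Made-up transcriptions standing in for real target-language test lines.
refs = ["hello", "world"]
baseline_hyps = ["hallo", "wrld"]   # synthetic-only model
corrected_hyps = ["hello", "wrld"]  # model after adding the correction

cer_base = cer(refs, baseline_hyps)
cer_corr = cer(refs, corrected_hyps)
print(cer_base, cer_corr, verdict(cer_base, cer_corr))  # 0.2 0.1 improves
```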
Original abstract
Handwritten Text Recognition (HTR) models trained on synthetic handwriting often struggle to generalize to real text, and existing adaptation methods still require real samples from the target domain. In this work, we tackle the fully zero-shot synthetic-to-real generalization setting, where no real data from the target language is available. Our approach learns how model parameters change when moving from synthetic to real handwriting in one or more source languages and transfers this learned correction to new target languages. When using multiple sources, we rely on linguistic similarity to weigh their contrubition when combining them. Experiments across five languages and six architectures show consistent improvements over synthetic-only baselines and reveal that the transferred corrections benefit even languages unrelated to the sources.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a zero-shot synthetic-to-real adaptation method for handwritten text recognition (HTR). It learns a parameter-level correction (delta) from synthetic-to-real shifts observed in one or more source languages and transfers this correction to a target language by adding it to the synthetic-trained model parameters; when multiple sources are available, the deltas are combined via a linguistic-similarity weighting. Experiments on five languages and six architectures are reported to show consistent gains over synthetic-only baselines, including gains on languages unrelated to the sources.
Significance. If the empirical results and the underlying invariance assumption hold under closer scrutiny, the work would be significant for zero-shot domain adaptation in document analysis and computer vision. It offers a practical route to improve HTR models for languages lacking real annotated data by exploiting cross-lingual analogies in parameter space rather than requiring target-domain samples. The multi-architecture evaluation is a positive feature that supports broader applicability.
Major comments (3)
- [§3] §3 (Method): The central claim that the learned correction delta = theta_real_source - theta_synth_source can be directly added to theta_synth_target rests on an untested cross-lingual invariance assumption. When source and target languages differ in script or character set, the output-layer dimensionality and character-frequency statistics differ, so the shared-parameter portion of delta may encode language-specific cues rather than a domain shift; the linguistic-similarity weighting does not resolve this.
- [§4] §4 (Experiments) and Table 2: The abstract and results claim 'consistent improvements' across five languages and six architectures, yet no numerical values, baseline CER/WER figures, standard deviations, or statistical significance tests are provided in the visible sections. Without these, it is impossible to judge effect size or whether gains on unrelated languages are reliable or merely within noise.
- [§5.1] §5.1 (Discussion of unrelated languages): The observation that corrections benefit unrelated languages is presented as evidence of generality, but the paper does not report an ablation that isolates whether the benefit arises from the shared convolutional backbone versus the language-specific classifier head. This leaves the invariance claim load-bearing but unverified.
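The dimensionality concern in the first comment and the backbone-versus-head question in the third can be made concrete with a small sketch. It assumes parameters are named with a `backbone.` or `head.` prefix and stored as flat lists; both conventions, the function name, and the toy values are hypothetical. Only parameters that are shared and shape-compatible receive the correction, and everything else is left unchanged and reported.

```python
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]  # hypothetical storage: name -> flat weights


def apply_backbone_delta(
    theta_synth_tgt: Params, delta: Params, shared_prefix: str = "backbone."
) -> Tuple[Params, List[str]]:
    """Add the correction to shared, shape-compatible parameters only."""
    corrected: Params = {}
    skipped: List[str] = []
    for name, weights in theta_synth_tgt.items():
        d = delta.get(name)
        if name.startswith(shared_prefix) and d is not None and len(d) == len(weights):
            corrected[name] = [w + x for w, x in zip(weights, d)]
        else:
            # Language-specific or mismatched parameter: keep the target's own weights.
            corrected[name] = list(weights)
            skipped.append(name)
    return corrected, skipped


# Toy example: the target head has 3 outputs, the source head only 2.
theta_synth_tgt = {"backbone.conv1": [1.0, 1.0], "head.fc": [0.0, 0.0, 0.0]}
delta = {"backbone.conv1": [0.25, -0.5], "head.fc": [1.0, 1.0]}

corrected, skipped = apply_backbone_delta(theta_synth_tgt, delta)
print(corrected)  # {'backbone.conv1': [1.25, 0.5], 'head.fc': [0.0, 0.0, 0.0]}
print(skipped)    # ['head.fc']
```

Comparing this variant against a full-delta transfer, where the character sets allow one, is the kind of ablation the third comment requests.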
Minor comments (3)
- [Abstract] Abstract: 'contrubition' is a typo for 'contribution'.
- [§2] §2 (Related Work): The discussion of prior synthetic-to-real HTR adaptation omits several recent parameter-efficient or prompt-based zero-shot methods that could serve as stronger baselines.
- [Figure 3] Figure 3: Axis labels and legend are too small; the plotted curves for different source combinations are difficult to distinguish.
Simulated Author's Rebuttal
Thank you for the constructive feedback on our manuscript. We provide point-by-point responses to the major comments and have made revisions to address the concerns raised.
- Referee: [§3] §3 (Method): The central claim that the learned correction delta = theta_real_source - theta_synth_source can be directly added to theta_synth_target rests on an untested cross-lingual invariance assumption. When source and target languages differ in script or character set, the output-layer dimensionality and character-frequency statistics differ, so the shared-parameter portion of delta may encode language-specific cues rather than a domain shift; the linguistic-similarity weighting does not resolve this.
Authors: We thank the referee for this observation. In the revised manuscript, we clarify in §3 that the delta is computed and transferred only for the shared convolutional layers of the network, as the classifier head is language-specific and its parameters are not included in the correction for target languages with different scripts. This ensures the transferred delta focuses on domain shift in feature extraction. We have added a figure showing the selective parameter update. While the invariance is an assumption, the positive results on unrelated languages support its practical utility. The linguistic similarity is used for weighting multiple sources but is secondary to the shared-layer design. revision: yes
- Referee: [§4] §4 (Experiments) and Table 2: The abstract and results claim 'consistent improvements' across five languages and six architectures, yet no numerical values, baseline CER/WER figures, standard deviations, or statistical significance tests are provided in the visible sections. Without these, it is impossible to judge effect size or whether gains on unrelated languages are reliable or merely within noise.
Authors: We have updated §4 to explicitly quote and discuss the key numerical results from Table 2, including baseline and improved CER/WER values for each language and model. Standard deviations are now reported for experiments with multiple seeds. We also added statistical significance testing using paired t-tests, with p-values indicating that the improvements are significant, including for gains on languages unrelated to the sources. revision: yes
- Referee: [§5.1] §5.1 (Discussion of unrelated languages): The observation that corrections benefit unrelated languages is presented as evidence of generality, but the paper does not report an ablation that isolates whether the benefit arises from the shared convolutional backbone versus the language-specific classifier head. This leaves the invariance claim load-bearing but unverified.
Authors: This is a fair point. We have performed the requested ablation and included the results in the revised §5.1. Specifically, we compare transferring the full delta (where possible) versus only the backbone delta. The ablation shows that the majority of the benefit comes from the backbone corrections, with minimal or no contribution from head parameters when scripts match. This verifies that the method leverages domain-invariant shifts in the shared parameters. revision: yes
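The paired t-test mentioned in the second response can be sketched with the standard library alone. The per-seed CER values below are invented purely for illustration and are not results from the paper; the function name is hypothetical. The sketch returns the t statistic and degrees of freedom; turning these into a p-value needs a t-distribution CDF, which the standard library does not provide (SciPy's `scipy.stats.ttest_rel` computes both).

```python
import math
import statistics
from typing import List, Tuple


def paired_t_statistic(baseline: List[float], corrected: List[float]) -> Tuple[float, int]:
    """t statistic and degrees of freedom for paired samples (e.g. CER per seed)."""
    if len(baseline) != len(corrected) or len(baseline) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [b - c for b, c in zip(baseline, corrected)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Invented per-seed CERs for one language and one architecture (illustration only).
cer_synthetic_only = [0.20, 0.22, 0.21, 0.23]
cer_with_correction = [0.18, 0.19, 0.20, 0.20]

t_stat, dof = paired_t_statistic(cer_synthetic_only, cer_with_correction)
print(f"t = {t_stat:.2f}, dof = {dof}")
```

A positive t here means the corrected model has lower CER on average across seeds; pairing by seed removes seed-to-seed variation that an unpaired test would count as noise.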
Circularity Check
No circularity: empirical parameter-transfer method with no closed-form derivation or self-referential fitting
The paper presents an empirical method for zero-shot domain transfer in HTR by learning parameter deltas on source languages and applying weighted combinations to targets based on linguistic similarity. No equations, derivations, or first-principles claims appear in the provided text; the approach is framed as training on observed source shifts and testing generalization, without any step that reduces a 'prediction' to a fitted input by construction or relies on self-citation for uniqueness. The central claim rests on experimental validation across languages and architectures rather than any mathematical identity or ansatz smuggled via prior work. This is a standard empirical transfer setup with no detectable circular reduction.