pith. sign in

arxiv: 2606.09479 · v1 · pith:W57JK2JDnew · submitted 2026-06-08 · 💻 cs.CV · cs.DL

Optical Music Recognition for Real-World Manuscripts with Synthetic Data

Pith reviewed 2026-06-27 17:06 UTC · model grok-4.3

classification 💻 cs.CV cs.DL
keywords optical music recognitiondomain adaptationsynthetic datamusic manuscriptsMuNG annotationspiano notationheritage preservation
0
0 comments X

The pith

Domain adaptation on synthetic manuscript images improves optical music recognition on real-world data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that existing OMR systems trained on born-digital scores fail on the diverse visual domains of real historical manuscripts. In resource-constrained settings where large in-domain labeled sets cannot be created, the authors establish a baseline for complex piano notation. They demonstrate that domain adaptation using synthetic images generated by the Smashcima tool produces significant gains, and that the symbols for synthesis can come from outside the target domain. This reduces the need for expensive fine-grained MuNG annotations while still requiring some real in-domain transcriptions. The result moves OMR toward practical application in libraries and heritage collections.

Core claim

While some direct transcriptions of in-domain data remain essential, domain adaptation using synthetic musical manuscript images brings significant improvement. Furthermore, the symbols used do not need to be in-domain, so the expensive fine-grained annotation can be avoided.

What carries the argument

Domain adaptation performed on synthetic musical manuscript images generated by the Smashcima synthesis tool together with MuNG graph annotations.

If this is right

  • Significant improvement occurs in transcription accuracy on real manuscripts.
  • Out-of-domain symbols can be used for synthesis, avoiding costly fine-grained annotation.
  • A usable baseline is now available for real-world manuscripts containing complex piano notation.
  • OMR moves closer to practical use for preserving musical cultural heritage.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same synthesis-plus-adaptation pattern may apply to other visually diverse historical document types.
  • Combining outputs from multiple synthesis pipelines could further reduce reliance on real annotations.
  • Scaling the method to larger manuscript collections would test whether the gains hold under greater domain variety.
  • The reduced annotation requirement could lower the barrier for smaller institutions to adopt OMR.

Load-bearing premise

The visual statistics of images produced by the Smashcima synthesis tool are close enough to those of real-world manuscripts that adaptation on the synthetic images transfers to the target domain.

What would settle it

Retraining the model on Smashcima synthetics and then measuring no accuracy gain over the non-adapted baseline when both are tested on the same real manuscript collection would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.09479 by Filip B\'im, Jan Haji\v{c} jr, Ji\v{r}\'i Mayer, Mark\'eta Herz\'anov\'a Vlkov\'a, Martina Dvo\v{r}\'akov\'a, Pavel Pecina, Petr \v{Z}abi\v{c}ka, Samuel \v{S}omorjai, Vojt\v{e}ch Dvo\v{r}\'ak.

Figure 1
Figure 1. Figure 1: Examples of Common Western Music Notation in a library collection. endeavour [51], it is unlikely that building sufficiently large in-domain datasets will be within the means of user institutions any time soon. An obvious next-best solution in the absence of “real” data is synthetic data. A system for OMR data synthesis needs to output two components necessary for supervised learning: (1) the musical conte… view at source ↗
Figure 2
Figure 2. Figure 2: In this paper we experiment with newly available synthetic manuscript images of sheet music to see whether they help with domain adaptation to authentic (real) manuscripts. tructure is now in place to leverage manuscript synthesis for domain adaptation to musical manuscripts, even for diverse collections of resource-constrained memory institutions. However, experimental work to show whether this pathway is… view at source ↗
Figure 3
Figure 3. Figure 3: Example annotation in the MuNG format. On the left, just the symbols are shown; note the accuracy of the symbol masks (e.g., the G-clef or the 8th flag in the first measure of the bottom staff). On the right, the annotation is shown with edges. 3.1 Perception study In a small user study, we have found that images rendered by Smashcima are barely recognisable for humans from authentic scores. We ran a surve… view at source ↗
Figure 4
Figure 4. Figure 4: Different renderings of the same MusicXML file using Smashcima [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Examples of the variety of out-of-domain datasets [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Examples of images from the Authentic dataset. The Authentic-MuNG dataset serves as the source of input symbols for creating synthetic data. For simplicity, we exploit the same 59 samples from the training (or fine-tuning) portion of the Authentic dataset, but annotated in the MuNG format. The total number of annotated symbols across all splits is 39,376 but only 15,011 are in the training split, which is … view at source ↗
read the original abstract

Optical Music Recognition (OMR) has seen major progress in model design, with end-to-end methods now capable of recognising notation at all levels of complexity. However, the impact of this progress has been limited by the visual domains of available training datasets, which are largely born-digital. Existing large collections of sheet music in libraries and other heritage institutions contain predominantly manuscripts, whose visual domains are highly diverse and different, so existing OMR systems fail when applied in the real world. These institutions are often resource-constrained, so large in-domain datasets cannot be expected. We provide a first baseline on real-world manuscripts with complex piano notation in the resource-constrained scenario. Using fine-grained music notation graph (MuNG) annotations and the Smashcima synthesis tool, we then show that while some direct transcriptions of in-domain data remain essential, domain adaptation using synthetic musical manuscript images brings significant improvement. Furthermore, the symbols used do not need to be in-domain, so the expensive fine-grained annotation can be avoided. We thus bring OMR closer to one of its stated goals: preserving and promoting musical cultural heritage.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript claims to provide the first baseline for OMR on real-world manuscripts with complex piano notation in a resource-constrained setting. It uses MuNG annotations and the Smashcima synthesis tool to show that domain adaptation with synthetic musical manuscript images yields significant improvement, and that the symbols in the synthetic data do not need to be in-domain, thereby avoiding expensive fine-grained annotation.

Significance. If the central claims hold, this work is significant because it addresses the gap between born-digital training data and diverse real manuscript domains in OMR, which is critical for applying the technology to cultural heritage preservation. The approach of using synthetic data for adaptation without requiring in-domain symbols is a practical strength that could reduce annotation costs. The provision of a baseline in the resource-constrained scenario is valuable for the field.

major comments (1)
  1. [Results section] Results section: No distribution-distance metric (e.g., FID, MMD, or feature-space divergence) is reported between the Smashcima synthetic training images and the real-world manuscript test set. Without this, it is unclear whether the observed performance gains are due to successful domain adaptation or confounds such as differences in data volume or annotation quality, undermining the attribution of improvement to the synthesis-based adaptation.
minor comments (1)
  1. [Abstract] The abstract states that domain adaptation 'brings significant improvement' but does not include any quantitative results, baselines, or error bars, making it difficult to assess the magnitude of the claimed gains from the summary alone.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive comment on the Results section. We address it point by point below.

read point-by-point responses
  1. Referee: [Results section] Results section: No distribution-distance metric (e.g., FID, MMD, or feature-space divergence) is reported between the Smashcima synthetic training images and the real-world manuscript test set. Without this, it is unclear whether the observed performance gains are due to successful domain adaptation or confounds such as differences in data volume or annotation quality, undermining the attribution of improvement to the synthesis-based adaptation.

    Authors: We acknowledge that no distribution-distance metric is reported. Our experimental design controls for data volume by using identical numbers of training images in each compared condition (born-digital only, synthetic manuscript images only, and mixtures). All conditions are evaluated on the identical fixed real-world test set, so the sole systematic difference is the visual domain of the training data. Annotation quality is likewise controlled in the sense that synthetic data supplies exact ground-truth labels generated by the Smashcima pipeline while real data uses the same MuNG annotation protocol; any performance lift when synthetic manuscript images are added therefore cannot be explained by annotation differences alone. We will revise the manuscript to make these controls explicit and, where feasible, add a feature-space divergence analysis between the synthetic and real image sets. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical gains from external synthesis tool and standard adaptation

full rationale

The paper reports empirical improvements from training OMR models on synthetic manuscript images produced by the Smashcima tool and applying domain adaptation to real-world targets. No equations, fitted parameters, or predictions are described that reduce to the inputs by construction. Claims rest on external synthesis software and conventional adaptation pipelines rather than self-definitional steps or load-bearing self-citations that would force the reported outcome. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain-assumption that Smashcima-generated images capture sufficient visual variation to enable transfer; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption Synthetic images from Smashcima are distributionally close enough to real manuscripts for domain adaptation to succeed
    Invoked when claiming that adaptation on synthetic data produces significant improvement on real data.

pith-pipeline@v0.9.1-grok · 5793 in / 1224 out tokens · 29762 ms · 2026-06-27T17:06:41.266399+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

52 extracted references · 31 canonical work pages · 1 internal anchor

  1. [1]

    International Journal of Multimedia Information Retrieval 12(1), 12 (2023)

    Alfaro-Contreras, M., Iñesta, J.M., Calvo-Zaragoza, J.: Optical music recognition for homophonic scores with neural networks and synthetic music generation. International Journal of Multimedia Information Retrieval 12(1), 12 (2023). https://doi.org/10.1007/s13735-023-00278-5

  2. [2]

    Pattern Recognition Letters123, 1–8 (2019)

    Baró, A., Riba, P., Calvo-Zaragoza, J., Fornés, A.: From optical music recognition to handwritten music recognition: A baseline. Pattern Recognition Letters123, 1–8 (2019). https://doi.org/10.1016/j.patrec. 2019.02.029

  3. [3]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Benigmim, Y., Roy, S., Essid, S., Kalogeiton, V., Lathuilière, S.: One-shot unsupervised domain adaptation with personalized diffusion models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 698–708 (2023)

  4. [4]

    Journal of New Music Research44(3), 169–195 (2015)

    Byrd, D., Simonsen, J.G.: Towards a standard testbed for optical music recognition: Definitions, metrics, and page images. Journal of New Music Research44(3), 169–195 (2015). https://doi.org/10.1080/09298215. 2015.1045424

  5. [5]

    Calvo-Zaragoza, J., Fuentes-Martınez, E., Luna-Barahona, N., Rıos-Vila, A.: Can multimodal large language models read music score images? In: 6th International Workshop on Reading Music Systems. pp. 4–6 (2024)

  6. [6]

    ACM Computing Surveys53(4), 77 (2020)

    Calvo-Zaragoza, J., Hajič jr., J., Pacha, A.: Understanding Optical Music Recognition. ACM Computing Surveys53(4), 77 (2020). https://doi.org/10.1145/3397499

  7. [7]

    In: Coustaty, M., Fornés, A

    Calvo-Zaragoza, J., Martinez-Sevilla, J.C., Penarrubia, C., Rios-Vila, A.: Optical music recognition: Recent advances, current challenges, and future directions. In: Coustaty, M., Fornés, A. (eds.) Document Analysis and Recognition – ICDAR 2023 Workshops. pp. 94–104. Springer Nature Switzerland, Cham (2023)

  8. [8]

    In: 19th International Society for Music Information Retrieval Conference (ISMIR)

    Calvo-Zaragoza, J., Rizo, D.: Camera-primus: Neural end-to-end optical music recognition on realistic mono- phonic scores. In: 19th International Society for Music Information Retrieval Conference (ISMIR). pp. 248–

  9. [9]

    Paris, France (2018), http://ismir2018.ircam.fr/doc/pdfs/33_Paper.pdf

  10. [10]

    Applied Sciences8(4), 606 (2018)

    Calvo-Zaragoza, J., Rizo, D.: End-to-End Neural Optical Music Recognition of Monophonic Scores. Applied Sciences8(4), 606 (2018). https://doi.org/10.3390/app8040606

  11. [11]

    https://doi.org/10.36227/techrxiv.174077177.78767136/v1

    Castellanos, F.J., Gallego, A.J., Fujinaga, I.: Deep learning for optical music recognition: A review (Feb 2025). https://doi.org/10.36227/techrxiv.174077177.78767136/v1

  12. [12]

    In: Proceedings of the 10th International Conference on Digital Libraries for Musicology

    Crawford, T., Lewis, D., Porter, A.: Exploring early vocal music and its lute arrangements: Using f-tempo as a musicological tool. In: Proceedings of the 10th International Conference on Digital Libraries for Musicology. pp. 77–81 (2023)

  13. [13]

    Bibliothek Forschung und Praxis 42(2), 319–323 (Jun 2018)

    Diet, J.: Optical music recognition in der Bayerischen Staatsbibliothek. Bibliothek Forschung und Praxis 42(2), 319–323 (Jun 2018). https://doi.org/10.1515/bfp-2018-0030 12 https://lindat.cz

  14. [14]

    Design Initiative for a 10 TeV pCM Wakefield Collider,

    Dvořák, V., Hajič jr., J., Mayer, J.: Staff layout analysis using the YOLO platform. In: 6th International Workshop on Reading Music Systems (WoRMS). pp. 18–22. Online (2024). https://doi.org/10.48550/arXiv. 2411.15741

  15. [15]

    International Journal on Document Analysis and Recog- nition (IJDAR)15, 243–251 (2011)

    Fornés, A., Dutta, A., Gordo, A., Lladós, J.: CVC-MUSCIMA: A ground truth of handwritten music score images for writer identification and staff removal. International Journal on Document Analysis and Recog- nition (IJDAR)15, 243–251 (2011). https://doi.org/10.1007/s10032-011-0168-2

  16. [16]

    Ahmadi, R

    Fuentes-Martínez, E., Ríos-Vila, A., Martinez-Sevilla, J.C., Rizo, D., Calvo-Zaragoza, J.: Aligned music notation and lyrics transcription. Pattern Recognition170, 112094 (Feb 2026). https://doi.org/10.1016/j. patcog.2025.112094

  17. [17]

    In: 5th International Conference on Digital Libraries for Musicology (DLfM)

    Gotham,M.,Jonas,P.,Bower,B.,Bosworth,W.,Rootham,D.,VanHandel,L.:Scoresofscores:anopenscore project to encode and share sheet music. In: 5th International Conference on Digital Libraries for Musicology (DLfM). p. 87–95. Paris, France (2018). https://doi.org/10.1145/3273024.3273026

  18. [18]

    In: Music Encoding Conference

    Gotham, M.R.H., Jonas, P.: The OpenScore Lieder Corpus. In: Music Encoding Conference. pp. 131–136. Alicante, Spain (2022). https://doi.org/10.17613/1my2-dm23

  19. [19]

    In: 19th International Society for Music Information Retrieval Conference (ISMIR)

    Hajič jr., J., Dorfer, M., Widmer, G., Pecina, P.: Towards full-pipeline handwritten OMR with musical sym- bol detection by u-nets. In: 19th International Society for Music Information Retrieval Conference (ISMIR). pp. 225–232. Paris, France (2018), http://ismir2018.ircam.fr/doc/pdfs/175_Paper.pdf

  20. [20]

    In: Proceedings of the 5th International Conference on Digital Libraries for Musicology

    Hajič jr, J., Kolárová, M., Pacha, A., Calvo-Zaragoza, J.: How current optical music recognition systems are becoming useful for digital libraries. In: Proceedings of the 5th International Conference on Digital Libraries for Musicology. pp. 57–61 (2018)

  21. [21]

    In: 17th International Society for Music Information Retrieval Conference (ISMIR)

    Hajič jr., J., Novotný, J., Pecina, P., Pokorný, J.: Further steps towards a standard testbed for optical music recognition. In: 17th International Society for Music Information Retrieval Conference (ISMIR). pp. 157–163. New York, USA (2016), https://wp.nyu.edu/ismir2016/event/proceedings/

  22. [22]

    In: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)

    Hajič, jr., J., Pecina, P.: The MUSCIMA++ dataset for handwritten optical music recognition. In: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). pp. 39–46. Kyoto, Japan (2017). https://doi.org/10.1109/ICDAR.2017.16

  23. [24]

    In: 5th International Workshop on Reading Music Systems (WoRMS)

    Havelka, J., Mayer, J., Pecina, P.: Symbol generation via autoencoders for handwritten music synthesis. In: 5th International Workshop on Reading Music Systems (WoRMS). pp. 20–24. Milan, Italy (2023). https: //doi.org/10.48550/arXiv.2311.04091

  24. [26]

    IEEE Transactions on Audio, Speech and Language Processing pp

    Jung, J., Kim, D., Lee, S., Cho, S., So, H., Bukey, I., Donahue, C., Jeong, D.: U-must: A unified framework for cross-modal translation of score images, symbolic music, and performance audio. IEEE Transactions on Audio, Speech and Language Processing pp. 1–16 (2025). https://doi.org/10.1109/TASLPRO.2025.3648794

  25. [27]

    In: 16th European Conference on Computer Vision (ECCV)

    Kang, L., Riba, P., Wang, Y., Rusiñol, M., Fornés, A., Villegas, M.: Ganwriting: Content-conditioned gener- ation of styled handwritten word images. In: 16th European Conference on Computer Vision (ECCV). pp. 273–289. Glasgow, UK (2020). https://doi.org/10.1007/978-3-030-58592-1_17

  26. [28]

    In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision

    Kang, L., Rusinol, M., Fornés, A., Riba, P., Villegas, M.: Unsupervised writer adaptation for synthetic-to- real handwritten word recognition. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 3502–3511 (2020)

  27. [29]

    Quantum Reinforcement Learning for Coordinated Satellite Systems,

    Long, P., Novack, Z., Berg-Kirkpatrick, T., McAuley, J.: PDMX: A large-scale public domain MusicXML dataset for symbolic music processing. In: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). p. 1–5. IEEE (Apr 2025). https://doi.org/10.1109/icassp49660. 2025.10890217, http://dx.doi.org/10.1109/ICASSP496...

  28. [30]

    In: 25th Interna- tional Society for Music Information Retrieval (ISMIR) (2025)

    Martinez-Sevilla, J.C., Cerveto-Serrano, J., Luna-Barahona, N., Chapman, G., Sapp, C., Rizo, D., Calvo- Zaragoza, J.: Sheet music benchmark: Standardized optical music recognition evaluation. In: 25th Interna- tional Society for Music Information Retrieval (ISMIR) (2025)

  29. [31]

    In: Proceedings of the 25th International Society for Music Information Retrieval Conference

    Martinez-Sevilla, J.C., Rizo, D., Calvo-Zaragoza, J.: Towards universal optical music recognition: A case study on notation types. In: Proceedings of the 25th International Society for Music Information Retrieval Conference. pp. 914–921. ISMIR (Nov 2024). https://doi.org/10.5281/zenodo.14877479, https://doi.org/10. 5281/zenodo.14877479

  30. [32]

    In: 24th International Society for Music Information Retrieval (ISMIR)

    Matrinez-Sevilla, J., Roselló, A., Rizo, D., Calvo-Zaragoza, J.: On the performance of optical music recogni- tion in the absence of specific training data. In: 24th International Society for Music Information Retrieval (ISMIR). pp. 319–326. Milan, Italy (2023). https://doi.org/10.5281/ZENODO.10265289

  31. [33]

    In: de Luca, E

    Mayer, J., Jebavý, F., Vlková, M., Dvořáková, M., Pecina, P., Hajič jr., J.: MuNG studio: Annotation tool for music notation graph. In: de Luca, E. (ed.) Proceedings of the 12th International Conference on Digital Libraries for Musicology. pp. 114–118. Association for Computing Machinery, Association for Computing Machinery, New York, NY, United States (2025)

  32. [34]

    In: de Luca, E

    Mayer, J., Pecina, P., Hajič jr., J.: Smashcima: Full-page handwritten music document synthesizer. In: de Luca, E. (ed.) Proceedings of the 12th International Conference on Digital Libraries for Musicology. pp. 119–123. Association for Computing Machinery, Association for Computing Machinery, New York, NY, United States (2025)

  33. [35]

    In: 16th International Conference on Document Analysis and Recognition (ICDAR)

    Mayer, J., Pecina, P.: Synthesizing training data for handwritten music recognition. In: 16th International Conference on Document Analysis and Recognition (ICDAR). pp. 626–641. Lausanne, Switzerland (2021). https://doi.org/10.1007/978-3-030-86334-0_41

  34. [36]

    In: 4th International Workshop on Reading Music Systems (WoRMS)

    Mayer, J., Pecina, P.: Obstacles with synthesizing training data for OMR. In: 4th International Workshop on Reading Music Systems (WoRMS). pp. 15–19. Online (2022). https://doi.org/10.48550/arXiv.2211.13285

  35. [37]

    In: 18th International Conference on Document Analysis and Recognition (ICDAR)

    Mayer, J., Straka, M., Hajič jr., J., Pecina, P.: Practical end-to-end optical music recognition for pianoform music. In: 18th International Conference on Document Analysis and Recognition (ICDAR). pp. 55–73. Athens, Greece (2024). https://doi.org/10.1007/978-3-031-70552-6_4

  36. [38]

    In: 20th International Society for Music Information Retrieval (ISMIR)

    Pacha, A., Calvo-Zaragoza, J., Hajič jr., J.: Learning notation graph construction for full-pipeline optical music recognition. In: 20th International Society for Music Information Retrieval (ISMIR). pp. 75–82. Delft, Netherlands (2019). https://doi.org/10.5281/zenodo.3527744

  37. [39]

    IEEE Transactions on Image Processing33, 4245–4260 (2024)

    Peng, D., Ke, Q., Ambikapathi, A., Yazici, Y., Lei, Y., Liu, J.: Unsupervised domain adaptation via domain- adaptive diffusion. IEEE Transactions on Image Processing33, 4245–4260 (2024). https://doi.org/10.1109/ tip.2024.3424985

  38. [40]

    In: 15th International Society for Music Information Retrieval Conference (ISMIR)

    Pugin, L., Zitellini, R., Roland, P.: Verovio: A library for engraving MEI music notation into SVG. In: 15th International Society for Music Information Retrieval Conference (ISMIR). pp. 107–112. Taipei, Taiwan (2014), https://archives.ismir.net/ismir2014/paper/000221.pdf

  39. [41]

    International Journal of Multimedia Information Retrieval14(4) (Oct 2025)

    Rios-Vila, A., Fuentes-Martinez, E., Castellanos, F.J.: An implicit layout-aware transformer for full-page end-to-end optical music recognition. International Journal of Multimedia Information Retrieval14(4) (Oct 2025). https://doi.org/10.1007/s13735-025-00385-5, http://dx.doi.org/10.1007/s13735-025-00385-5

  40. [42]

    In: 4th International Workshop on Reading Music Systems (WoRMS)

    Ríos-Vila,A., Iñesta,J.M., Calvo-Zaragoza,J.: End-to-endfull-pageopticalmusicrecognition ofmonophonic documents via score unfolding. In: 4th International Workshop on Reading Music Systems (WoRMS). pp. 20–24. Online (2022), https://sites.google.com/view/worms2022/proceedings

  41. [43]

    International Journal on Document Analysis and Recognition (IJDAR)26(3), 347–362 (2023)

    Ríos-Vila, A., Rizo, D., Iñesta, J.M., Calvo-Zaragoza, J.: End-to-end optical music recognition for pianoform sheet music. International Journal on Document Analysis and Recognition (IJDAR)26(3), 347–362 (2023). https://doi.org/10.1007/s10032-023-00432-z

  42. [44]

    Roselló, A., Fuentes-Martínez, E., Alfaro-Contreras, M., Rizo, D., Calvo-Zaragoza, J.: Source-Free Domain Adaptation for Optical Music Recognition, p. 3–19. Springer Nature Switzerland (2024). https://doi.org/ 10.1007/978-3-031-70552-6_1, http://dx.doi.org/10.1007/978-3-031-70552-6_1

  43. [45]

    Ríos-Vila, A., Calvo-Zaragoza, J., Paquet, T.: Sheet music transformer: End-to-end optical music recognition beyond monophonic transcription (2024), https://arxiv.org/abs/2402.07596

  44. [46]

    https://doi.org/10.48550/ARXIV.2405.12105, https://arxiv.org/abs/2405

    Ríos-Vila, A., Calvo-Zaragoza, J., Rizo, D., Paquet, T.: End-to-end full-page optical music recognition for pianoform sheet music (2024). https://doi.org/10.48550/ARXIV.2405.12105, https://arxiv.org/abs/2405. 12105

  45. [47]

    2024, 10.1109/BigData62323.2024.10825388

    Shatri, E., Palavala, K.R., Fazekas, G.: Synthesising handwritten music with gans: A comprehensive eval- uation of cyclewgan, progan, and DCGAN. In: Ding, W., Lu, C., Wang, F., Di, L., Wu, K., Huan, J., Nambiar, R., Li, J., Ilievski, F., Baeza-Yates, R., Hu, X. (eds.) IEEE International Conference on Big Data, BigData 2024, Washington, DC, USA, December 1...

  46. [48]

    SN Computer Science5(2) (Feb 2024)

    de Sousa Neto, A.F., Bezerra, B.L.D., de Moura, G.C.D., Toselli, A.H.: Data augmentation for offline handwritten text recognition: A systematic literature review. SN Computer Science5(2) (Feb 2024). https: //doi.org/10.1007/s42979-023-02583-6

  47. [49]

    Steiner, A., Pinto, A.S., Tschannen, M., Keysers, D., Wang, X., Bitton, Y., Gritsenko, A., Minderer, M., Sherbondy, A., Long, S., Qin, S., Ingle, R., Bugliarello, E., Kazemzadeh, S., Mesnard, T., Alabdulmohsin, I., Beyer, L., Zhai, X.: PaliGemma 2: A family of versatile VLMs for transfer (2024), https://arxiv.org/abs/ 2412.03555

  48. [50]

    In: 6th International Workshop on Reading Music Systems (WoRMS)

    Tirupati, N., Shatri, E., Fazekas, G.: Crafting handwritten notations: Towards sheet music generation. In: 6th International Workshop on Reading Music Systems (WoRMS). pp. 50–56. Online (2024). https: //doi.org/10.48550/arXiv.2411.15741

  49. [51]

    International Journal on Document Analysis and Recognition (IJDAR)27, 379–393 (2024)

    Torras, P., Biswas, S., Fornés, A.: A unified representation framework for the evaluation of optical music recognition systems. International Journal on Document Analysis and Recognition (IJDAR)27, 379–393 (2024). https://doi.org/10.1007/s10032-024-00485-8

  50. [52]

    Torras, P., Dvořáková, M., Badal, C., Vlková, M., Asbert, G., Mayer, J., Fornés, A., Hajič, jr., J.: Two journeys: Insights on the annotation of large-scale optical music recognition datasets (2025)

  51. [53]

    In: 6th International Workshop on Reading Music Systems

    Umbreit, J., Schumann, S.: OMR on early music sources at the Bavarian State Library with MuRET– prototyping, automating, scaling. In: 6th International Workshop on Reading Music Systems. p. 43 (2024)

  52. [54]

    In: Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence

    Wang, Y., Wu, S., Hu, J., Du, X., Peng, Y., Huang, Y., Fan, S., Li, X., Yu, F., Sun, M.: NotaGen: advancing musicality in symbolic music generation with large language model training paradigms. In: Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence. pp. 10207–10215 (2025)