Optical Music Recognition for Real-World Manuscripts with Synthetic Data

Filip Bím; Jan Hajič jr.; Jiří Mayer; Markéta Herzánová Vlková; Martina Dvořáková; Pavel Pecina; Petr Žabička; Samuel Šomorjai; Vojtěch Dvořák

arxiv: 2606.09479 · v1 · pith:W57JK2JDnew · submitted 2026-06-08 · 💻 cs.CV · cs.DL

Pith reviewed 2026-06-27 17:06 UTC · model grok-4.3

classification 💻 cs.CV cs.DL

keywords optical music recognition · domain adaptation · synthetic data · music manuscripts · MuNG annotations · piano notation · heritage preservation

The pith

Domain adaptation on synthetic manuscript images improves optical music recognition on real-world data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that existing OMR systems trained on born-digital scores fail on the diverse visual domains of real historical manuscripts. In resource-constrained settings where large in-domain labeled sets cannot be created, the authors establish a baseline for complex piano notation. They demonstrate that domain adaptation using synthetic images generated by the Smashcima tool produces significant gains, and that the symbols for synthesis can come from outside the target domain. This reduces the need for expensive fine-grained MuNG annotations while still requiring some real in-domain transcriptions. The result moves OMR toward practical application in libraries and heritage collections.

Core claim

While some direct transcriptions of in-domain data remain essential, domain adaptation using synthetic musical manuscript images brings significant improvement. Furthermore, the symbols used do not need to be in-domain, so the expensive fine-grained annotation can be avoided.

What carries the argument

Domain adaptation performed on synthetic musical manuscript images generated by the Smashcima synthesis tool together with MuNG graph annotations.

If this is right

Significant improvement occurs in transcription accuracy on real manuscripts.
Out-of-domain symbols can be used for synthesis, avoiding costly fine-grained annotation.
A usable baseline is now available for real-world manuscripts containing complex piano notation.
OMR moves closer to practical use for preserving musical cultural heritage.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same synthesis-plus-adaptation pattern may apply to other visually diverse historical document types.
Combining outputs from multiple synthesis pipelines could further reduce reliance on real annotations.
Scaling the method to larger manuscript collections would test whether the gains hold under greater domain variety.
The reduced annotation requirement could lower the barrier for smaller institutions to adopt OMR.

Load-bearing premise

The visual statistics of images produced by the Smashcima synthesis tool are close enough to those of real-world manuscripts that adaptation on the synthetic images transfers to the target domain.

What would settle it

Retraining the model on Smashcima synthetics and then measuring no accuracy gain over the non-adapted baseline when both are tested on the same real manuscript collection would falsify the central claim.
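The falsification test above can be sketched as a paired comparison on the same test pages. This is a minimal illustration, not the paper's evaluation protocol: the function name `paired_bootstrap_gain`, the choice of a percentile bootstrap, and the per-page error values are all assumptions made for the example.

```python
import random
from statistics import mean


def paired_bootstrap_gain(baseline_err, adapted_err, n_resamples=2000, seed=0):
    """Mean per-page error reduction (baseline - adapted) with a 95% percentile interval.

    Both lists hold one error score per test page, in the same page order.
    """
    if len(baseline_err) != len(adapted_err) or not baseline_err:
        raise ValueError("need two non-empty lists of equal length")
    # Paired differences: positive values mean the adapted model did better.
    diffs = [b - a for b, a in zip(baseline_err, adapted_err)]
    rng = random.Random(seed)
    n = len(diffs)
    # Resample pages with replacement and record the mean difference each time.
    means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    low = means[int(0.025 * n_resamples)]
    high = means[int(0.975 * n_resamples) - 1]
    return mean(diffs), (low, high)


# Hypothetical per-page error rates on one fixed real-manuscript test set.
baseline = [0.62, 0.55, 0.71, 0.48, 0.66, 0.59]
adapted = [0.41, 0.39, 0.52, 0.35, 0.47, 0.44]
gain, (low, high) = paired_bootstrap_gain(baseline, adapted)
print(f"mean gain {gain:.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

An interval that includes zero would be consistent with "no accuracy gain"; an interval clearly above zero would not.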

Figures

Figures reproduced from arXiv: 2606.09479 by Filip Bím, Jan Hajič jr., Jiří Mayer, Markéta Herzánová Vlková, Martina Dvořáková, Pavel Pecina, Petr Žabička, Samuel Šomorjai, Vojtěch Dvořák.

**Figure 1.** Examples of Common Western Music Notation in a library collection. … endeavour [51], it is unlikely that building sufficiently large in-domain datasets will be within the means of user institutions any time soon. An obvious next-best solution in the absence of “real” data is synthetic data. A system for OMR data synthesis needs to output two components necessary for supervised learning: (1) the musical conte…

**Figure 2.** In this paper we experiment with newly available synthetic manuscript images of sheet music to see whether they help with domain adaptation to authentic (real) manuscripts. …tructure is now in place to leverage manuscript synthesis for domain adaptation to musical manuscripts, even for diverse collections of resource-constrained memory institutions. However, experimental work to show whether this pathway is…

**Figure 3.** Example annotation in the MuNG format. On the left, just the symbols are shown; note the accuracy of the symbol masks (e.g., the G-clef or the 8th flag in the first measure of the bottom staff). On the right, the annotation is shown with edges. 3.1 Perception study: In a small user study, we have found that images rendered by Smashcima are barely recognisable for humans from authentic scores. We ran a surve…

**Figure 4.** Different renderings of the same MusicXML file using Smashcima.

**Figure 5.** Examples of the variety of out-of-domain datasets.

**Figure 6.** Examples of images from the Authentic dataset. The Authentic-MuNG dataset serves as the source of input symbols for creating synthetic data. For simplicity, we exploit the same 59 samples from the training (or fine-tuning) portion of the Authentic dataset, but annotated in the MuNG format. The total number of annotated symbols across all splits is 39,376 but only 15,011 are in the training split, which is …

Original abstract

Optical Music Recognition (OMR) has seen major progress in model design, with end-to-end methods now capable of recognising notation at all levels of complexity. However, the impact of this progress has been limited by the visual domains of available training datasets, which are largely born-digital. Existing large collections of sheet music in libraries and other heritage institutions contain predominantly manuscripts, whose visual domains are highly diverse and different, so existing OMR systems fail when applied in the real world. These institutions are often resource-constrained, so large in-domain datasets cannot be expected. We provide a first baseline on real-world manuscripts with complex piano notation in the resource-constrained scenario. Using fine-grained music notation graph (MuNG) annotations and the Smashcima synthesis tool, we then show that while some direct transcriptions of in-domain data remain essential, domain adaptation using synthetic musical manuscript images brings significant improvement. Furthermore, the symbols used do not need to be in-domain, so the expensive fine-grained annotation can be avoided. We thus bring OMR closer to one of its stated goals: preserving and promoting musical cultural heritage.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives the first baseline for complex piano OMR on real manuscripts and shows that synthetic data still helps even when the symbols are out-of-domain.

The paper's main finding is that domain adaptation using synthetic musical manuscript images leads to significant gains in optical music recognition on real-world documents, and that those synthetics can use symbols that aren't from the target domain.

This supplies the first baseline for complex piano notation on actual manuscripts in a resource-limited setting. The result that out-of-domain symbols still help is a useful practical insight because it means you can avoid the expensive step of annotating the exact symbols from your collection. The authors build this on top of the Smashcima synthesis tool and MuNG graph annotations, which lets them generate training data without needing massive manual effort on the real images.

What stands out is the focus on the constraints that libraries and heritage groups actually face. They test the idea that you don't need perfect domain match in the symbols, which could save time and money.

On the downside, the abstract states the improvement without showing any quantitative results, baselines, or error bars. Without those, it's difficult to see how large the effect is or to rule out confounds like simply training on more data overall. The lack of any reported measure of how similar the synthetic images are to the real ones in visual features also leaves the transfer story a bit open.

This paper is for people in the OMR community who want to move beyond born-digital datasets toward real historical sources. It engages honestly with the problem of limited resources and uses standard domain adaptation ideas in a new setting. The work has enough substance to warrant peer review so the full methods and numbers can be examined.

Referee Report

1 major / 1 minor

Summary. The manuscript claims to provide the first baseline for OMR on real-world manuscripts with complex piano notation in a resource-constrained setting. It uses MuNG annotations and the Smashcima synthesis tool to show that domain adaptation with synthetic musical manuscript images yields significant improvement, and that the symbols in the synthetic data do not need to be in-domain, thereby avoiding expensive fine-grained annotation.

Significance. If the central claims hold, this work is significant because it addresses the gap between born-digital training data and diverse real manuscript domains in OMR, which is critical for applying the technology to cultural heritage preservation. The approach of using synthetic data for adaptation without requiring in-domain symbols is a practical strength that could reduce annotation costs. The provision of a baseline in the resource-constrained scenario is valuable for the field.

major comments (1)

[Results section] No distribution-distance metric (e.g., FID, MMD, or feature-space divergence) is reported between the Smashcima synthetic training images and the real-world manuscript test set. Without this, it is unclear whether the observed performance gains are due to successful domain adaptation or to confounds such as differences in data volume or annotation quality, which undermines the attribution of improvement to the synthesis-based adaptation.
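One of the metrics the referee names, MMD, can be sketched in a few lines. The block below is an illustrative assumption rather than anything reported in the paper: it computes the biased squared-MMD estimate with an RBF kernel over small feature vectors, and the vectors, the `gamma` value, and all names are hypothetical. In practice the vectors would come from some image encoder applied to synthetic and real pages.

```python
import math


def rbf(x, y, gamma):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(xs, ys, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""

    def mean_kernel(a, b):
        return sum(rbf(u, v, gamma) for u in a for v in b) / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)


# Hypothetical 2-D feature vectors standing in for encoder outputs.
real_feats = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3]]
synth_near = [[0.1, 0.1], [0.2, 0.2], [0.0, 0.2]]
synth_far = [[2.0, 2.1], [2.2, 1.9], [1.8, 2.0]]

print("near:", mmd2(real_feats, synth_near))
print("far: ", mmd2(real_feats, synth_far))
```

A smaller value indicates that the two samples are closer under the chosen kernel; the absolute scale depends on `gamma` and on the feature extractor, so values are only comparable within one setup.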

minor comments (1)

[Abstract] The abstract states that domain adaptation 'brings significant improvement' but does not include any quantitative results, baselines, or error bars, making it difficult to assess the magnitude of the claimed gains from the summary alone.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on the Results section. We address it point by point below.

Referee: [Results section] No distribution-distance metric (e.g., FID, MMD, or feature-space divergence) is reported between the Smashcima synthetic training images and the real-world manuscript test set. Without this, it is unclear whether the observed performance gains are due to successful domain adaptation or to confounds such as differences in data volume or annotation quality, which undermines the attribution of improvement to the synthesis-based adaptation.

Authors: We acknowledge that no distribution-distance metric is reported. Our experimental design controls for data volume by using identical numbers of training images in each compared condition (born-digital only, synthetic manuscript images only, and mixtures). All conditions are evaluated on the identical fixed real-world test set, so the sole systematic difference is the visual domain of the training data. Annotation quality is likewise controlled in the sense that synthetic data supplies exact ground-truth labels generated by the Smashcima pipeline while real data uses the same MuNG annotation protocol; any performance lift when synthetic manuscript images are added therefore cannot be explained by annotation differences alone. We will revise the manuscript to make these controls explicit and, where feasible, add a feature-space divergence analysis between the synthetic and real image sets.

revision: partial
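The volume control described in the simulated rebuttal can be sketched as follows. This is a hedged illustration only: the condition names follow the rebuttal's wording, while the function `build_conditions`, the 50/50 mixture ratio, and the file identifiers are assumptions introduced for the example.

```python
import random


def build_conditions(born_digital, synthetic, n_train, mix_ratio=0.5, seed=0):
    """Return three training sets that all contain exactly n_train images.

    mix_ratio is the (assumed) share of synthetic images in the mixture.
    """
    if n_train > min(len(born_digital), len(synthetic)):
        raise ValueError("n_train exceeds the smaller image pool")
    rng = random.Random(seed)
    n_synth = round(n_train * mix_ratio)
    return {
        "born_digital_only": rng.sample(born_digital, n_train),
        "synthetic_only": rng.sample(synthetic, n_train),
        "mixture": rng.sample(born_digital, n_train - n_synth)
        + rng.sample(synthetic, n_synth),
    }


# Hypothetical image identifiers for the two training pools.
born_digital_ids = [f"bd_{i:03d}" for i in range(40)]
synthetic_ids = [f"syn_{i:03d}" for i in range(40)]
conditions = build_conditions(born_digital_ids, synthetic_ids, n_train=10)
for name, images in conditions.items():
    print(name, len(images))
```

Because every condition has the same size and would be scored on the same test set, any difference in accuracy cannot be attributed to training-set volume.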

Circularity Check

0 steps flagged

No circularity: empirical gains from external synthesis tool and standard adaptation

The paper reports empirical improvements from training OMR models on synthetic manuscript images produced by the Smashcima tool and applying domain adaptation to real-world targets. No equations, fitted parameters, or predictions are described that reduce to the inputs by construction. Claims rest on external synthesis software and conventional adaptation pipelines rather than self-definitional steps or load-bearing self-citations that would force the reported outcome. The reported gains are therefore measured against external benchmarks rather than derived from the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that Smashcima-generated images capture sufficient visual variation to enable transfer; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption: Synthetic images from Smashcima are distributionally close enough to real manuscripts for domain adaptation to succeed.
Invoked when claiming that adaptation on synthetic data produces significant improvement on real data.

pith-pipeline@v0.9.1-grok · 5793 in / 1224 out tokens · 29762 ms · 2026-06-27T17:06:41.266399+00:00 · methodology

Reference graph

Works this paper leans on

52 extracted references · 31 canonical work pages · 1 internal anchor

[1] Alfaro-Contreras, M., Iñesta, J.M., Calvo-Zaragoza, J.: Optical music recognition for homophonic scores with neural networks and synthetic music generation. International Journal of Multimedia Information Retrieval 12(1), 12 (2023). https://doi.org/10.1007/s13735-023-00278-5
[2] Baró, A., Riba, P., Calvo-Zaragoza, J., Fornés, A.: From optical music recognition to handwritten music recognition: A baseline. Pattern Recognition Letters 123, 1–8 (2019). https://doi.org/10.1016/j.patrec.2019.02.029
[3] Benigmim, Y., Roy, S., Essid, S., Kalogeiton, V., Lathuilière, S.: One-shot unsupervised domain adaptation with personalized diffusion models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 698–708 (2023)
[4] Byrd, D., Simonsen, J.G.: Towards a standard testbed for optical music recognition: Definitions, metrics, and page images. Journal of New Music Research 44(3), 169–195 (2015). https://doi.org/10.1080/09298215.2015.1045424
[5] Calvo-Zaragoza, J., Fuentes-Martínez, E., Luna-Barahona, N., Ríos-Vila, A.: Can multimodal large language models read music score images? In: 6th International Workshop on Reading Music Systems. pp. 4–6 (2024)
[6] Calvo-Zaragoza, J., Hajič jr., J., Pacha, A.: Understanding Optical Music Recognition. ACM Computing Surveys 53(4), 77 (2020). https://doi.org/10.1145/3397499
[7] Calvo-Zaragoza, J., Martinez-Sevilla, J.C., Penarrubia, C., Rios-Vila, A.: Optical music recognition: Recent advances, current challenges, and future directions. In: Coustaty, M., Fornés, A. (eds.) Document Analysis and Recognition – ICDAR 2023 Workshops. pp. 94–104. Springer Nature Switzerland, Cham (2023)
[8] Calvo-Zaragoza, J., Rizo, D.: Camera-primus: Neural end-to-end optical music recognition on realistic monophonic scores. In: 19th International Society for Music Information Retrieval Conference (ISMIR). pp. 248– Paris, France (2018), http://ismir2018.ircam.fr/doc/pdfs/33_Paper.pdf
[10] Calvo-Zaragoza, J., Rizo, D.: End-to-End Neural Optical Music Recognition of Monophonic Scores. Applied Sciences 8(4), 606 (2018). https://doi.org/10.3390/app8040606
[11] Castellanos, F.J., Gallego, A.J., Fujinaga, I.: Deep learning for optical music recognition: A review (Feb 2025). https://doi.org/10.36227/techrxiv.174077177.78767136/v1
[12] Crawford, T., Lewis, D., Porter, A.: Exploring early vocal music and its lute arrangements: Using f-tempo as a musicological tool. In: Proceedings of the 10th International Conference on Digital Libraries for Musicology. pp. 77–81 (2023)
[13] Diet, J.: Optical music recognition in der Bayerischen Staatsbibliothek. Bibliothek Forschung und Praxis 42(2), 319–323 (Jun 2018). https://doi.org/10.1515/bfp-2018-0030
[14] Dvořák, V., Hajič jr., J., Mayer, J.: Staff layout analysis using the YOLO platform. In: 6th International Workshop on Reading Music Systems (WoRMS). pp. 18–22. Online (2024). https://doi.org/10.48550/arXiv.2411.15741
[15] Fornés, A., Dutta, A., Gordo, A., Lladós, J.: CVC-MUSCIMA: A ground truth of handwritten music score images for writer identification and staff removal. International Journal on Document Analysis and Recognition (IJDAR) 15, 243–251 (2011). https://doi.org/10.1007/s10032-011-0168-2
[16] Fuentes-Martínez, E., Ríos-Vila, A., Martinez-Sevilla, J.C., Rizo, D., Calvo-Zaragoza, J.: Aligned music notation and lyrics transcription. Pattern Recognition 170, 112094 (Feb 2026). https://doi.org/10.1016/j.patcog.2025.112094
[17] Gotham, M., Jonas, P., Bower, B., Bosworth, W., Rootham, D., VanHandel, L.: Scores of scores: an openscore project to encode and share sheet music. In: 5th International Conference on Digital Libraries for Musicology (DLfM). pp. 87–95. Paris, France (2018). https://doi.org/10.1145/3273024.3273026
[18] Gotham, M.R.H., Jonas, P.: The OpenScore Lieder Corpus. In: Music Encoding Conference. pp. 131–136. Alicante, Spain (2022). https://doi.org/10.17613/1my2-dm23
[19] Hajič jr., J., Dorfer, M., Widmer, G., Pecina, P.: Towards full-pipeline handwritten OMR with musical symbol detection by u-nets. In: 19th International Society for Music Information Retrieval Conference (ISMIR). pp. 225–232. Paris, France (2018), http://ismir2018.ircam.fr/doc/pdfs/175_Paper.pdf
[20] Hajič jr, J., Kolárová, M., Pacha, A., Calvo-Zaragoza, J.: How current optical music recognition systems are becoming useful for digital libraries. In: Proceedings of the 5th International Conference on Digital Libraries for Musicology. pp. 57–61 (2018)
[21] Hajič jr., J., Novotný, J., Pecina, P., Pokorný, J.: Further steps towards a standard testbed for optical music recognition. In: 17th International Society for Music Information Retrieval Conference (ISMIR). pp. 157–163. New York, USA (2016), https://wp.nyu.edu/ismir2016/event/proceedings/
[22] Hajič, jr., J., Pecina, P.: The MUSCIMA++ dataset for handwritten optical music recognition. In: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). pp. 39–46. Kyoto, Japan (2017). https://doi.org/10.1109/ICDAR.2017.16
[24] Havelka, J., Mayer, J., Pecina, P.: Symbol generation via autoencoders for handwritten music synthesis. In: 5th International Workshop on Reading Music Systems (WoRMS). pp. 20–24. Milan, Italy (2023). https://doi.org/10.48550/arXiv.2311.04091
[26] Jung, J., Kim, D., Lee, S., Cho, S., So, H., Bukey, I., Donahue, C., Jeong, D.: U-must: A unified framework for cross-modal translation of score images, symbolic music, and performance audio. IEEE Transactions on Audio, Speech and Language Processing, pp. 1–16 (2025). https://doi.org/10.1109/TASLPRO.2025.3648794
[27] Kang, L., Riba, P., Wang, Y., Rusiñol, M., Fornés, A., Villegas, M.: Ganwriting: Content-conditioned generation of styled handwritten word images. In: 16th European Conference on Computer Vision (ECCV). pp. 273–289. Glasgow, UK (2020). https://doi.org/10.1007/978-3-030-58592-1_17
[28] Kang, L., Rusinol, M., Fornés, A., Riba, P., Villegas, M.: Unsupervised writer adaptation for synthetic-to-real handwritten word recognition. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 3502–3511 (2020)
[29] Long, P., Novack, Z., Berg-Kirkpatrick, T., McAuley, J.: PDMX: A large-scale public domain MusicXML dataset for symbolic music processing. In: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 1–5. IEEE (Apr 2025). https://doi.org/10.1109/icassp49660.2025.10890217, http://dx.doi.org/10.1109/ICASSP496...
[30] Martinez-Sevilla, J.C., Cerveto-Serrano, J., Luna-Barahona, N., Chapman, G., Sapp, C., Rizo, D., Calvo-Zaragoza, J.: Sheet music benchmark: Standardized optical music recognition evaluation. In: 25th International Society for Music Information Retrieval (ISMIR) (2025)
[31] Martinez-Sevilla, J.C., Rizo, D., Calvo-Zaragoza, J.: Towards universal optical music recognition: A case study on notation types. In: Proceedings of the 25th International Society for Music Information Retrieval Conference. pp. 914–921. ISMIR (Nov 2024). https://doi.org/10.5281/zenodo.14877479
[32] Martinez-Sevilla, J., Roselló, A., Rizo, D., Calvo-Zaragoza, J.: On the performance of optical music recognition in the absence of specific training data. In: 24th International Society for Music Information Retrieval (ISMIR). pp. 319–326. Milan, Italy (2023). https://doi.org/10.5281/ZENODO.10265289
[33] Mayer, J., Jebavý, F., Vlková, M., Dvořáková, M., Pecina, P., Hajič jr., J.: MuNG studio: Annotation tool for music notation graph. In: de Luca, E. (ed.) Proceedings of the 12th International Conference on Digital Libraries for Musicology. pp. 114–118. Association for Computing Machinery, New York, NY, United States (2025)
[34] Mayer, J., Pecina, P., Hajič jr., J.: Smashcima: Full-page handwritten music document synthesizer. In: de Luca, E. (ed.) Proceedings of the 12th International Conference on Digital Libraries for Musicology. pp. 119–123. Association for Computing Machinery, New York, NY, United States (2025)
[35] Mayer, J., Pecina, P.: Synthesizing training data for handwritten music recognition. In: 16th International Conference on Document Analysis and Recognition (ICDAR). pp. 626–641. Lausanne, Switzerland (2021). https://doi.org/10.1007/978-3-030-86334-0_41
[36] Mayer, J., Pecina, P.: Obstacles with synthesizing training data for OMR. In: 4th International Workshop on Reading Music Systems (WoRMS). pp. 15–19. Online (2022). https://doi.org/10.48550/arXiv.2211.13285
[37] Mayer, J., Straka, M., Hajič jr., J., Pecina, P.: Practical end-to-end optical music recognition for pianoform music. In: 18th International Conference on Document Analysis and Recognition (ICDAR). pp. 55–73. Athens, Greece (2024). https://doi.org/10.1007/978-3-031-70552-6_4
[38] Pacha, A., Calvo-Zaragoza, J., Hajič jr., J.: Learning notation graph construction for full-pipeline optical music recognition. In: 20th International Society for Music Information Retrieval (ISMIR). pp. 75–82. Delft, Netherlands (2019). https://doi.org/10.5281/zenodo.3527744
[39] Peng, D., Ke, Q., Ambikapathi, A., Yazici, Y., Lei, Y., Liu, J.: Unsupervised domain adaptation via domain-adaptive diffusion. IEEE Transactions on Image Processing 33, 4245–4260 (2024). https://doi.org/10.1109/tip.2024.3424985
[40] Pugin, L., Zitellini, R., Roland, P.: Verovio: A library for engraving MEI music notation into SVG. In: 15th International Society for Music Information Retrieval Conference (ISMIR). pp. 107–112. Taipei, Taiwan (2014), https://archives.ismir.net/ismir2014/paper/000221.pdf
[41] Rios-Vila, A., Fuentes-Martinez, E., Castellanos, F.J.: An implicit layout-aware transformer for full-page end-to-end optical music recognition. International Journal of Multimedia Information Retrieval 14(4) (Oct 2025). https://doi.org/10.1007/s13735-025-00385-5
[42]

In: 4th International Workshop on Reading Music Systems (WoRMS)

Ríos-Vila,A., Iñesta,J.M., Calvo-Zaragoza,J.: End-to-endfull-pageopticalmusicrecognition ofmonophonic documents via score unfolding. In: 4th International Workshop on Reading Music Systems (WoRMS). pp. 20–24. Online (2022), https://sites.google.com/view/worms2022/proceedings

2022
[43]

International Journal on Document Analysis and Recognition (IJDAR)26(3), 347–362 (2023)

Ríos-Vila, A., Rizo, D., Iñesta, J.M., Calvo-Zaragoza, J.: End-to-end optical music recognition for pianoform sheet music. International Journal on Document Analysis and Recognition (IJDAR)26(3), 347–362 (2023). https://doi.org/10.1007/s10032-023-00432-z

work page doi:10.1007/s10032-023-00432-z 2023
[44]

Roselló, A., Fuentes-Martínez, E., Alfaro-Contreras, M., Rizo, D., Calvo-Zaragoza, J.: Source-Free Domain Adaptation for Optical Music Recognition, p. 3–19. Springer Nature Switzerland (2024). https://doi.org/ 10.1007/978-3-031-70552-6_1, http://dx.doi.org/10.1007/978-3-031-70552-6_1

work page doi:10.1007/978-3-031-70552-6_1 2024
[45]

Ríos-Vila, A., Calvo-Zaragoza, J., Paquet, T.: Sheet music transformer: End-to-end optical music recognition beyond monophonic transcription (2024), https://arxiv.org/abs/2402.07596

arXiv 2024
[46]

https://doi.org/10.48550/ARXIV.2405.12105, https://arxiv.org/abs/2405

Ríos-Vila, A., Calvo-Zaragoza, J., Rizo, D., Paquet, T.: End-to-end full-page optical music recognition for pianoform sheet music (2024). https://doi.org/10.48550/ARXIV.2405.12105, https://arxiv.org/abs/2405. 12105

work page doi:10.48550/arxiv.2405.12105 2024
[47]

2024, 10.1109/BigData62323.2024.10825388

Shatri, E., Palavala, K.R., Fazekas, G.: Synthesising handwritten music with gans: A comprehensive eval- uation of cyclewgan, progan, and DCGAN. In: Ding, W., Lu, C., Wang, F., Di, L., Wu, K., Huan, J., Nambiar, R., Li, J., Ilievski, F., Baeza-Yates, R., Hu, X. (eds.) IEEE International Conference on Big Data, BigData 2024, Washington, DC, USA, December 1...

work page doi:10.1109/bigdata62323.2024.10825834 2024
[48]

SN Computer Science5(2) (Feb 2024)

de Sousa Neto, A.F., Bezerra, B.L.D., de Moura, G.C.D., Toselli, A.H.: Data augmentation for offline handwritten text recognition: A systematic literature review. SN Computer Science5(2) (Feb 2024). https: //doi.org/10.1007/s42979-023-02583-6

work page doi:10.1007/s42979-023-02583-6 2024
[49]

Steiner, A., Pinto, A.S., Tschannen, M., Keysers, D., Wang, X., Bitton, Y., Gritsenko, A., Minderer, M., Sherbondy, A., Long, S., Qin, S., Ingle, R., Bugliarello, E., Kazemzadeh, S., Mesnard, T., Alabdulmohsin, I., Beyer, L., Zhai, X.: PaliGemma 2: A family of versatile VLMs for transfer (2024), https://arxiv.org/abs/ 2412.03555

Pith/arXiv arXiv 2024
[50]

In: 6th International Workshop on Reading Music Systems (WoRMS)

Tirupati, N., Shatri, E., Fazekas, G.: Crafting handwritten notations: Towards sheet music generation. In: 6th International Workshop on Reading Music Systems (WoRMS). pp. 50–56. Online (2024). https: //doi.org/10.48550/arXiv.2411.15741

work page doi:10.48550/arxiv.2411.15741 2024
[51]

International Journal on Document Analysis and Recognition (IJDAR)27, 379–393 (2024)

Torras, P., Biswas, S., Fornés, A.: A unified representation framework for the evaluation of optical music recognition systems. International Journal on Document Analysis and Recognition (IJDAR)27, 379–393 (2024). https://doi.org/10.1007/s10032-024-00485-8

work page doi:10.1007/s10032-024-00485-8 2024
[52]

Torras, P., Dvořáková, M., Badal, C., Vlková, M., Asbert, G., Mayer, J., Fornés, A., Hajič, jr., J.: Two journeys: Insights on the annotation of large-scale optical music recognition datasets (2025)

2025
[53]

In: 6th International Workshop on Reading Music Systems

Umbreit, J., Schumann, S.: OMR on early music sources at the Bavarian State Library with MuRET– prototyping, automating, scaling. In: 6th International Workshop on Reading Music Systems. p. 43 (2024)

2024
[54]

Wang, Y., Wu, S., Hu, J., Du, X., Peng, Y., Huang, Y., Fan, S., Li, X., Yu, F., Sun, M.: NotaGen: advancing musicality in symbolic music generation with large language model training paradigms. In: Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence. pp. 10207–10215 (2025)

