Pith

arXiv: 2605.18436 · v1 · pith:UE6SQ7IJ · submitted 2026-05-18 · cs.CV

A Dataset for the Recognition of Historical and Handwritten Music Scores in Western Notation

Pith reviewed 2026-05-20 10:26 UTC · model grok-4.3

Classification: cs.CV
Keywords: optical music recognition, handwritten music, historical scores, dataset, MusicXML, OMR evaluation, music heritage

The pith

MusiCorpus supplies 1,309 annotated pages of historical, mostly handwritten music for training recognition systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MusiCorpus, a collection of 1,309 pages of historical sheet music, mostly handwritten, equipped with symbol annotations and MusicXML transcriptions. It targets the shortage of realistic training data that has limited progress in optical music recognition for actual library and archive materials. A sympathetic reader would care because the dataset allows deep learning models to learn from representative examples rather than synthetic or narrow samples. This supports both complete end-to-end recognition pipelines and methods that first locate individual symbols before interpreting them.

Core claim

The authors compiled MusiCorpus with 1,309 pages of primarily handwritten historical music scores drawn from memory institutions, paired with manual symbol annotations and full MusicXML transcriptions. This resource is positioned as the largest handwritten music dataset available and the first to reflect the variety found in real institutional collections, enabling training and evaluation of OMR systems under practical conditions.

What carries the argument

The MusiCorpus dataset of annotated historical music pages, which supplies training and test material for optical music recognition.

If this is right

  • OMR systems gain the ability to train on realistic variations in handwriting, layout, and degradation found in actual collections.
  • Performance of end-to-end versus object-detection OMR approaches can be compared directly on the same data.
  • Digitized musical heritage becomes more likely to be converted into machine-readable and editable formats.
  • Development of new recognition techniques can use this resource as a common benchmark.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Libraries could eventually use models trained on this data to create searchable indexes of their music holdings.
  • The annotations might support downstream tasks such as automatic alignment of scores with audio recordings.
  • Extensions could add labels for elements like lyrics or performance markings to broaden utility.

Load-bearing premise

The selected pages and their manual annotations are accurate, free of systematic bias, and representative of the broader range of historical documents held by libraries and archives.

What would settle it

A test set of historical music pages from additional institutions on which models trained only on MusiCorpus show markedly lower accuracy would indicate that the dataset does not capture sufficient variety.
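One way to run that check is to score transcriptions with a symbol error rate (SER) twice: once on MusiCorpus's own test split and once on pages from a held-out institution. The two scores are then compared. The sketch below assumes each page has already been reduced to a flat list of symbol tokens. The token strings, the helper names and the choice of SER as the metric are illustrative assumptions, not the paper's evaluation protocol.

```python
from typing import Iterable, Sequence, Tuple

Pair = Tuple[Sequence[str], Sequence[str]]  # (reference tokens, predicted tokens)


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences, unit costs."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # delete r
                            curr[j - 1] + 1,           # insert h
                            prev[j - 1] + (r != h)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def symbol_error_rate(pairs: Iterable[Pair]) -> float:
    """Corpus-level SER: total edits divided by total reference tokens."""
    edits = total = 0
    for ref, hyp in pairs:
        edits += edit_distance(ref, hyp)
        total += len(ref)
    return edits / total if total else 0.0


def generalisation_gap(in_domain: Iterable[Pair], held_out: Iterable[Pair]) -> float:
    """How many SER points worse the model does on pages from unseen institutions."""
    return symbol_error_rate(held_out) - symbol_error_rate(in_domain)
```

A gap near zero would support the representativeness claim. A large positive gap is the "markedly lower accuracy" outcome. This sketch leaves open how large a gap counts as "markedly".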

Figures

Figures reproduced from arXiv: 2605.18436 by Alicia Fornés, Carles Badal, Gerard Asbert, Jan Hajič jr., Jiří Mayer, Markéta Herzanová Vlková, Martina Dvořáková, Pau Torras, Samuel Šomorjai, Vojtěch Dvořák.

Figure 1: Summary of the MusiCorpus dataset. It is designed to provide end-to-end transcriptions, layout segmentation, metadata, and object-level segmentation records for more than 1.3k pages of music at various levels. Optical Music Recognition (OMR) is the field devoted to the conversion of images of musical documents into computer-processable files [1]. As a field, it shares strong ties to other similar recognitio…
Figure 2: Examples of the diverse typologies of pages present in the dataset. There are orchestral works, particellas, and pianoform scores, among others. Some examples are typeset and others handwritten, with varying degrees of paper conservation quality. …is no method that can reliably recognise these highly complex scores; the only attempts that exist still show very large error rates even for relatively simple sco…
Figure 3: Illustrations of the types of music notation images contained in major OMR datasets. (a) Born-digital (synthetic) CWMN, (b) scanned image of CWMN, (c) handwritten and binarised CWMN collected for OMR experiments rather than taken from real collections, (d) and (f) mensural handwritten notation, (e) printed mensural notation, (g) born-digital (synthetic) chant notation, (h) real medieval manuscript of cha…
Figure 4: Overview of the Dolores annotation pipeline. The decisions taken in developing this portion of the dataset try to balance the desire to annotate as many scores as possible with the need for primitive-level annotations and transcriptions, trading off primitive-level annotation accuracy for scale. A summary of the overall process can be found in Figure 4.
Figure 9: All resulting MuseScore files are batch-converted to MusicXML, modi…
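The caption fragment mentions batch conversion of MuseScore files to MusicXML. Here is a minimal sketch of that step. It assumes the MuseScore command-line interface's `-o` (export-to) option, which picks the output format from the target file's extension. The executable name differs across versions and platforms (for example `mscore`, `mscore3` or `MuseScore4`), so the sketch takes it as a parameter. The folder names are hypothetical. The sketch does not cover the post-conversion modifications that the truncated sentence goes on to describe.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def conversion_commands(scores: Iterable[Path], out_dir: Path,
                        exe: str = "mscore") -> List[List[str]]:
    """Build one `<exe> -o <name>.musicxml <name>.mscz` call per MuseScore file."""
    commands = []
    for score in scores:
        target = out_dir / (score.stem + ".musicxml")
        commands.append([exe, "-o", str(target), str(score)])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Execute the calls one by one; check=True stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Typical use would be `run_all(conversion_commands(sorted(Path("scores").glob("*.mscz")), Path("musicxml")))`. Building the command list separately from running it makes it easy to inspect or log the calls before MuseScore is invoked.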
Figure 5: Showcase of the various tools used by the Dolores site. On the left, a screenshot of the Android app used by the musicologists to annotate the symbols on the transcribed score over the original page. On the right, a screenshot of the validation tool used to quickly verify that all annotations are present and follow the desired format, as well as an example of a fully annotated and validated page using the…
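The validation tool in Figure 5 is described only by its purpose: checking that all annotations are present and follow the desired format. Below is a minimal sketch of such checks, under several assumptions:

- boxes are stored as (x, y, width, height) in pixels;
- the caller supplies the allowed class list and the expected per-class counts (for example, one notehead per note in the transcription);
- the class names shown are illustrative.

None of these names come from the Dolores tooling.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Annotation:
    cls: str      # symbol class name (illustrative, e.g. "noteheadFull")
    x: float      # left edge in pixels
    y: float      # top edge in pixels
    w: float      # width in pixels
    h: float      # height in pixels


def validate(annotations: List[Annotation], image_w: int, image_h: int,
             allowed: Set[str], expected: Dict[str, int]) -> List[str]:
    """Return a list of problems; an empty list means the page passes."""
    problems = []
    for i, a in enumerate(annotations):
        if a.cls not in allowed:
            problems.append(f"#{i}: unknown class {a.cls!r}")
        if a.w <= 0 or a.h <= 0:
            problems.append(f"#{i}: empty box")
        if a.x < 0 or a.y < 0 or a.x + a.w > image_w or a.y + a.h > image_h:
            problems.append(f"#{i}: box outside the image")
    # "All annotations present": compare per-class counts with what is expected.
    counts = Counter(a.cls for a in annotations)
    for cls, n in expected.items():
        if counts[cls] != n:
            problems.append(f"{cls}: expected {n}, found {counts[cls]}")
    return problems
```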
Figure 6: OmniOMR site annotation schema: separate tooling for MusicXML transcription and highly accurate symbol annotation, including the full MuNG standard, using MuNG Studio. Both outputs were revised (not shown) and cross-checked against each other. Annotation. A summary of the OmniOMR annotation process is shown in Figure 6.
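MuNG (Music Notation Graph) stores symbols as graph nodes with class names, bounding boxes and links between them. Here is a minimal sketch of reading the nodes and their outgoing links with Python's standard XML parser. It assumes the node layout of the MUSCIMA++ dataset: `Node` elements with `Id`, `ClassName` and a space-separated `Outlinks` list. Verify those element names against the released files before relying on them. The sketch ignores bounding boxes and masks, and it does not attempt to read precedence edges.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def load_mung(xml_text: str) -> Tuple[Dict[int, str], Dict[int, List[int]]]:
    """Return ({node id: class name}, {node id: ids it links to})."""
    root = ET.fromstring(xml_text)
    classes: Dict[int, str] = {}
    outlinks: Dict[int, List[int]] = {}
    for node in root.iter("Node"):
        node_id = int(node.findtext("Id"))
        classes[node_id] = node.findtext("ClassName")
        links = node.findtext("Outlinks") or ""   # missing or empty -> no links
        outlinks[node_id] = [int(t) for t in links.split()]
    return classes, outlinks


def linked_pairs(classes: Dict[int, str], outlinks: Dict[int, List[int]],
                 src_cls: str, dst_cls: str) -> List[Tuple[int, int]]:
    """All (source, target) edges between two classes, e.g. noteheads -> stems."""
    return [(s, t) for s, targets in outlinks.items() for t in targets
            if classes.get(s) == src_cls and classes.get(t) == dst_cls]
```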
Figure 7: Showcase of the annotations by the OmniOMR site. Top left: high-accuracy symbol annotations; a semi-transparent overlay indicates each symbol class by color: noteheads purple, stems yellow, beams orange, etc. Bottom left: syntax and precedence edges of the MuNG format. Right: the MuNG Studio annotation web application, with inspection and validation tools. …training data are never seen in dev and test, t…
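The caption fragment states that training data are never seen in dev and test. The source text is cut off before it says how this is enforced. One common approach, sketched here as an assumption, is to assign whole source documents (rather than individual pages) to a split by hashing a document identifier. Pages from one manuscript then cannot leak across splits, and the assignment is reproducible. The 10% dev and test fractions and the identifiers are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Tuple


def split_of(document_id: str, dev: float = 0.1, test: float = 0.1) -> str:
    """Map a source document to 'train', 'dev' or 'test' deterministically."""
    digest = hashlib.sha256(document_id.encode("utf-8")).digest()
    u = int.from_bytes(digest[:6], "big") / 2**48   # pseudo-uniform value in [0, 1)
    if u < test:
        return "test"
    if u < test + dev:
        return "dev"
    return "train"


def assign_pages(pages: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """pages: (page id, document id) pairs; every page inherits its document's split."""
    return {page_id: split_of(doc_id) for page_id, doc_id in pages}
```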
Figure 8: Directory structure of MusiCorpus: site-level (subset-level) directories, page-level directories with primary data, and subdivision-level directories with corresponding subsets of data for each page. 3.2 Primary data. For each page, the score image is available as image.jpg, alongside the two core ground-truth files for the dataset: a MusicXML file named transcription.musicxml for page-level end-to-end recogn…
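The directory structure in Figure 8 has site-level directories, then page-level directories that hold the primary data. Here is a minimal sketch that walks a local copy and reports pages missing the two files the caption names: image.jpg and transcription.musicxml. It assumes exactly two directory levels above each page. It does not look for the object-level annotation file, whose name is cut off in the caption.

```python
from pathlib import Path
from typing import Dict, List, Tuple

PRIMARY_FILES = ("image.jpg", "transcription.musicxml")


def missing_primary_files(root: Path) -> Dict[Tuple[str, str], List[str]]:
    """Map (site, page) -> primary files absent from that page directory."""
    report = {}
    for site in sorted(p for p in root.iterdir() if p.is_dir()):
        for page in sorted(p for p in site.iterdir() if p.is_dir()):
            missing = [name for name in PRIMARY_FILES if not (page / name).is_file()]
            report[(site.name, page.name)] = missing
    return report
```

Pages that map to an empty list are complete. Something like `{k: v for k, v in missing_primary_files(Path("MusiCorpus")).items() if v}` lists only the incomplete ones (the root folder name here is hypothetical). Subdivision folders inside a page directory are not traversed.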
Figure 9: Definition of subdivisions within the dataset, using a heterogeneous page as an example. In green, the definition of Systems 1 and 2, which correspond to sets of music that sound together. In purple, the definition of Staves 1 and 4, which correspond to lines of music written for instruments that require a single staff. In orange, the definition of Grandstaves 2-3 and 5-6, which are blocks of multi-staff mus…
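The subdivisions are identified by staff numbers (Staves 1 and 4, Grandstaves 2-3 and 5-6). Below is a minimal sketch that turns such labels into staff lists and checks that each grandstaff lies inside some system. It assumes folder names of the form `4` or `2-3`, and that a range covers every staff between its endpoints. Both conventions are assumptions made for illustration; the dataset's exact naming scheme is not reproduced here.

```python
from typing import Iterable, List


def staves_in(label: str) -> List[int]:
    """'4' -> [4]; '2-3' -> [2, 3]; '1-4' -> [1, 2, 3, 4] (inclusive range assumed)."""
    first, _, last = label.partition("-")
    start = int(first)
    end = int(last) if last else start
    if end < start:
        raise ValueError(f"descending staff range: {label!r}")
    return list(range(start, end + 1))


def inside_some_system(grandstaff: str, systems: Iterable[str]) -> bool:
    """True if every staff of the grandstaff belongs to a single system."""
    members = set(staves_in(grandstaff))
    return any(members <= set(staves_in(s)) for s in systems)
```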
Figure 10: An example manuscript page with interesting staff, grandstaff, and system composition. It can be viewed online at http://digitalniknihovna.cz/mzk/uuid/uuid:d1769738-290b-4810-90b7-19fd8708d0c7
Figure 11: The staff 2 image crop from the page in Figure 10.
Figure 12: The grandstaff 6-7 image crop from the page in Figure 10.
Figure 13: An example of a grandstaff that cannot be represented as two separate staves due to the pianoform music notation present on it. Finally, the Systems subdivision is again analogous to Grandstaves, except it captures individual systems (staves of all instruments that play together). The system folders should again be composed of staff numbers from above: 8136b106-6283-42c6-99...b7-19fd8708d0c7/ `-- Systems/ …
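A grandstaff like the one in Figure 13 cannot be cut into separate staff images, so grandstaff and system crops must cover all of their member staves at once. Here is a minimal sketch that derives such a crop from per-staff boxes. It assumes boxes given as (left, top, right, bottom) pixel coordinates. The padding value and the box layout are hypothetical and are not the format of the dataset's layout.json.

```python
from typing import Iterable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def union_box(boxes: Iterable[Box], pad: int = 0,
              image_size: Optional[Tuple[int, int]] = None) -> Box:
    """Smallest box covering all input boxes, grown by `pad` and clipped to the image."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("need at least one staff box")
    left = min(b[0] for b in boxes) - pad
    top = min(b[1] for b in boxes) - pad
    right = max(b[2] for b in boxes) + pad
    bottom = max(b[3] for b in boxes) + pad
    if image_size is not None:
        width, height = image_size
        left, top = max(0, left), max(0, top)
        right, bottom = min(width, right), min(height, bottom)
    return left, top, right, bottom
```

The resulting 4-tuple uses the same (left, upper, right, lower) order that Pillow's `Image.crop` expects, so it can be passed to that method directly.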
Figure 14: Staff-level annotations stored in the layout.json file
Figure 15: Measure-level annotations stored in the layout.json file
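One way to sanity-check measure-level layout annotations is to compare them with the transcription. Here is a minimal sketch that counts the `<measure>` elements of each `<part>` in a partwise MusicXML file, using only the standard library. MuseScore exports the `score-partwise` form. The caption does not show how measure entries in layout.json are keyed, so the sketch leaves the comparison against them open.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union


def measures_per_part(musicxml: Union[str, bytes]) -> Dict[str, int]:
    """Count <measure> children of every <part> in a score-partwise document."""
    root = ET.fromstring(musicxml)
    if root.tag != "score-partwise":
        raise ValueError(f"expected score-partwise, got {root.tag}")
    return {part.get("id"): len(part.findall("measure")) for part in root.findall("part")}
```

For a file on disk, passing `Path(...).read_bytes()` lets the parser honour the file's own encoding declaration.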
Original abstract

A large amount of musical heritage has been digitised by memory institutions: libraries, museums, and archives. Nevertheless, the field of Optical Music Recognition (OMR) has struggled with making this music machine-readable, despite advances in deep learning, mostly because no datasets for training systems in realistic conditions were available. The MusiCorpus dataset aims to remedy this situation by providing 1,309 pages of historical sheet music, primarily handwritten, with MusicXML transcriptions and symbol annotations. It is the largest dataset of handwritten music to date and the first dataset containing a realistic and representative sample of musical document collections from memory institutions, suitable for training and evaluating both end-to-end and object detection-based OMR systems and comparing their performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces MusiCorpus, a dataset of 1,309 pages of primarily handwritten historical music scores in Western notation, accompanied by MusicXML transcriptions and symbol annotations. It claims to be the largest such dataset to date and the first to supply a realistic and representative sample of musical document collections from memory institutions, enabling training and evaluation of both end-to-end and object-detection OMR systems.

Significance. A well-validated dataset of this scale and diversity would address a documented gap in OMR research by supplying realistic training material for historical documents, potentially improving generalization of deep-learning models beyond synthetic or modern printed scores.

major comments (2)
  1. [Abstract] The central claim that the collection constitutes 'a realistic and representative sample of musical document collections from memory institutions' is unsupported by any selection protocol, coverage statistics, or quantitative comparison of metadata distributions (era, degradation, handwriting style, notation complexity) against the broader holdings of the contributing institutions.
  2. [Dataset construction] No inter-annotator agreement figures, annotation accuracy metrics, or error analysis on the MusicXML transcriptions and symbol bounding boxes are reported, leaving the reliability of the ground truth for supervised training and benchmarking unquantified.
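The first comment asks for a quantitative comparison of metadata distributions. For a single categorical field (era, notation type, degradation grade), one simple statistic is the total variation distance between the dataset's distribution and that of the institution's holdings. A value of 0 means identical proportions; 1 means no overlap. Below is a minimal sketch. The field values are hypothetical, and the referee does not prescribe this particular measure.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def proportions(values: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Relative frequency of each category (input must be non-empty)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()}


def total_variation(sample: Iterable[Hashable], holdings: Iterable[Hashable]) -> float:
    """Half the L1 distance between two categorical distributions, in [0, 1]."""
    p, q = proportions(sample), proportions(holdings)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in p.keys() | q.keys())
```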
minor comments (2)
  1. [Dataset statistics] Provide a table summarizing page counts by institution, century, and primary notation type to allow readers to assess diversity at a glance.
  2. [Introduction] Clarify whether the 1,309 pages include any printed scores and, if so, their proportion relative to handwritten material.
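The table requested in the first minor comment is a straightforward group-by over per-page metadata. Here is a minimal sketch. It assumes each page's metadata is available as a dictionary with the hypothetical keys `institution`, `century` and `notation`, and it reports missing values as `unknown`.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def summary_table(pages: Iterable[Dict[str, object]],
                  keys: Sequence[str] = ("institution", "century", "notation")) -> str:
    """Plain-text table of page counts per combination of metadata values."""
    counts = Counter(tuple(str(p.get(k, "unknown")) for k in keys) for p in pages)
    rows = [list(keys) + ["pages"]]
    rows += [list(combo) + [str(n)] for combo, n in sorted(counts.items())]
    widths = [max(len(row[c]) for row in rows) for c in range(len(rows[0]))]
    return "\n".join("  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
                     for row in rows)
```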

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments and the opportunity to improve the manuscript. We address each major comment below, indicating the revisions we will make.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the collection constitutes 'a realistic and representative sample of musical document collections from memory institutions' is unsupported by any selection protocol, coverage statistics, or quantitative comparison of metadata distributions (era, degradation, handwriting style, notation complexity) against the broader holdings of the contributing institutions.

    Authors: We acknowledge that the manuscript does not present a formal selection protocol or quantitative comparisons against institutional holdings. The 1,309 pages were drawn from digitized collections supplied by partner memory institutions, chosen to reflect a range of historical periods, handwriting styles, and physical conditions typical of such archives. In the revised manuscript we will add an explicit description of the selection process, summary statistics on available metadata (era, style, degradation), and a discussion of how the sample relates to the source collections, thereby providing better support for the representativeness claim. revision: yes

  2. Referee: [Dataset construction] No inter-annotator agreement figures, annotation accuracy metrics, or error analysis on the MusicXML transcriptions and symbol bounding boxes are reported, leaving the reliability of the ground truth for supervised training and benchmarking unquantified.

    Authors: We agree that explicit reliability metrics would strengthen the paper. Transcriptions were performed by expert musicologists following a written protocol, and bounding-box annotations combined automated pre-detection with manual correction. A comprehensive inter-annotator agreement study was not feasible given the dataset scale and project resources. In revision we will expand the Dataset Construction section with a detailed account of the annotation workflow, quality-control procedures employed, and an error analysis based on the spot-checks that were conducted. revision: partial
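For the spot-check error analysis promised in this response, a common summary for bounding-box annotations is an agreement F1. It is computed in two steps:

1. Match boxes from two annotators (or from an annotator and a reviewer) one-to-one when they share a class and their IoU meets a threshold.
2. Report 2 * matches / (boxes of A + boxes of B).

Below is a minimal sketch. The 0.5 threshold, the greedy matching and the (left, top, right, bottom) box format are assumptions, not the authors' procedure.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]          # (left, top, right, bottom)
Labeled = Tuple[str, Box]                        # (class name, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def agreement_f1(first: Sequence[Labeled], second: Sequence[Labeled],
                 threshold: float = 0.5) -> float:
    """Greedy one-to-one matching of same-class boxes, highest IoU first."""
    candidates: List[Tuple[float, int, int]] = sorted(
        ((iou(a[1], b[1]), i, j)
         for i, a in enumerate(first) for j, b in enumerate(second)
         if a[0] == b[0]),
        reverse=True)
    used_first, used_second, matches = set(), set(), 0
    for score, i, j in candidates:
        if score < threshold:
            break                                # remaining pairs overlap even less
        if i not in used_first and j not in used_second:
            used_first.add(i)
            used_second.add(j)
            matches += 1
    total = len(first) + len(second)
    return 2 * matches / total if total else 1.0
```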

Circularity Check

0 steps flagged

No circularity: dataset presentation is self-contained

Full rationale

The paper introduces MusiCorpus as a new resource of 1,309 annotated pages drawn from memory institutions. Its core claims rest on direct description of the collection process, size, and intended use for OMR training/evaluation rather than any derivation, fitted parameter, or self-citation chain. No equations, predictions, or uniqueness theorems appear; representativeness is asserted via selection criteria, not reduced to prior outputs by construction. This is a standard dataset paper whose validity can be checked externally against the released data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset-introduction paper; the contribution consists of data collection and annotation rather than new mathematical axioms, free parameters, or postulated entities.

pith-pipeline@v0.9.0 · 5719 in / 1034 out tokens · 37797 ms · 2026-05-20T10:26:56.424989+00:00 · methodology



Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 2 internal anchors

[1] J. Calvo-Zaragoza, J. Hajič Jr., and A. Pacha, “Understanding Optical Music Recognition,” ACM Comput. Surv., vol. 53, pp. 1–35, July 2021.

[2] P. Torras, S. Biswas, and A. Fornés, “A Unified Representation Framework for the Evaluation of Optical Music Recognition Systems,” IJDAR, vol. 27, pp. 379–393, July 2024.

[3] A. Baró, C. Badal, and A. Fornés, “Handwritten Historical Music Recognition by Sequence-to-Sequence with Attention Mechanism,” in 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), (Dortmund, Germany), pp. 205–210, IEEE Computer Society, Sept. 2020.

[4] A. Ríos-Vila, D. Rizo, J. M. Iñesta, and J. Calvo-Zaragoza, “End-to-end optical music recognition for pianoform sheet music,” International Journal on Document Analysis and Recognition (IJDAR), vol. 26, pp. 347–362, Sept. 2023.

[5] J. Mayer, M. Straka, J. Hajič jr., and P. Pecina, “Practical end-to-end optical music recognition for pianoform music,” in 18th International Conference on Document Analysis and Recognition (ICDAR), (Athens, Greece), pp. 55–73, 2024.

[6] A. Ríos-Vila, J. Calvo-Zaragoza, D. Rizo, and T. Paquet, “End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music,” Int J Comput Vis, vol. 134, p. 49, Jan. 2026.

[7] A. Lavin, C. M. Gilligan-Lee, A. Visnjic, S. Ganju, D. Newman, S. Ganguly, D. Lange, A. G. Baydin, A. Sharma, A. Gibson, S. Zheng, E. P. Xing, C. Mattmann, J. Parr, and Y. Gal, “Technology readiness levels for machine learning systems,” Nature Communications, vol. 13, Oct. 2022.

[8] J. Tiedemann, “Parallel data, tools and interfaces in OPUS,” in Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), (Istanbul, Turkey), European Language Resources Association (ELRA), 2012.

[9] J. Calvo-Zaragoza and D. Rizo, “End-to-End Neural Optical Music Recognition of Monophonic Scores,” Applied Sciences, vol. 8, p. 606, Apr. 2018.

[10] J. Calvo-Zaragoza and D. Rizo, “Camera-PrIMuS: Neural End-to-End Optical Music Recognition on Realistic Monophonic Scores,” in 18th International Society for Music Information Retrieval Conference, (Paris, France), pp. 248–255, International Society for Music Information Retrieval, 2018.

[11] L. Pugin, “Verovio.”

[12] M. Dorfer, J. Hajič jr., A. Arzt, H. Frostel, and G. Widmer, “Learning audio–sheet music correspondences for cross-modal retrieval and piece identification,” Transactions of the International Society for Music Information Retrieval, vol. 1, p. 22, Sept. 2018.

[13] L. Tuggener, I. Elezi, J. Schmidhuber, M. Pelillo, and T. Stadelmann, “DeepScores - A Dataset for Segmentation, Detection and Classification of Tiny Objects,” in 24th International Conference on Pattern Recognition, (Beijing, China), pp. 3704–3709, IEEE Computer Society, 2018.

[14] L. Tuggener, Y. P. Satyawan, A. Pacha, J. Schmidhuber, and T. Stadelmann, “The DeepScoresV2 Dataset and Benchmark for Music Object Detection,” in Proceedings of the 25th International Conference on Pattern Recognition, (Milan, Italy), pp. 9188–9195, IEEE Computer Society, 2020.

[15] M. Gotham, P. Jonas, B. Bower, W. Bosworth, D. Rootham, and L. VanHandel, “Scores of scores: an openscore project to encode and share sheet music,” in 5th International Conference on Digital Libraries for Musicology (DLfM), (Paris, France), pp. 87–95, 2018.

[16] M. R. H. Gotham and P. Jonas, “The OpenScore Lieder Corpus,” in Music Encoding Conference, (Alicante, Spain), pp. 131–136, 2022.

[17] E. Shatri and G. Fazekas, “DoReMi: First glance at a universal OMR dataset,” in Proceedings of the 3rd International Workshop on Reading Music Systems (J. Calvo-Zaragoza and A. Pacha, eds.), (Alicante, Spain), pp. 43–49, International Society for Music Information Retrieval, 2021.

[18] J. Hajič and P. Pecina, “The MUSCIMA++ Dataset for Handwritten Optical Music Recognition,” in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 01, pp. 39–46, Nov. 2017.

[19] J. Mayer, P. Pecina, and J. Hajič, “Smashcima: Full-Page Handwritten Music Document Synthesizer,” in Proceedings of the 12th International Conference on Digital Libraries for Musicology, DLfM ’25, (New York, NY, USA), pp. 119–123, Association for Computing Machinery, Sept. 2025.

[20] P. Rigaux, B. Coüasnon, C. Guillotel-Nothmann, F. Guilloux, and A. Lemaitre, “The CollabScore dataset - towards robust and generalized OMR evaluation.” HAL: hal-05515751, version v1, Jan. 2026.

[21] A. Fornés, A. Dutta, A. Gordo, and J. Lladós, “CVC-MUSCIMA: A ground truth of handwritten music score images for writer identification and staff removal,” IJDAR, vol. 15, pp. 243–251, Sept. 2012.

[22] A.-J. Gallego and J. Calvo-Zaragoza, “Staff-line removal with selectional auto-encoders,” Expert Systems with Applications, vol. 89, pp. 138–148, 2017.

[23] V. Dvořák, J. Hajič jr., and J. Mayer, “Staff layout analysis using the YOLO platform,” in 6th International Workshop on Reading Music Systems (WoRMS), (Online), pp. 18–22, 2024.

[24] A. Baró, P. Riba, J. Calvo-Zaragoza, and A. Fornés, “From Optical Music Recognition to Handwritten Music Recognition: A baseline,” Pattern Recognition Letters, vol. 123, pp. 1–8, May 2019.

[25] J. C. Martinez-Sevilla, D. Rizo, and J. Calvo-Zaragoza, “Towards universal optical music recognition: A case study on notation types,” in Proceedings of the 25th International Society for Music Information Retrieval Conference, pp. 914–921, ISMIR, Nov. 2024.

[26] J. Calvo-Zaragoza and J. Oncina, “Recognition of Pen-Based Music Notation: The HOMUS Dataset,” in 22nd International Conference on Pattern Recognition, (Stockholm, Sweden), pp. 3038–3043, IEEE Computer Society, 2014.

[27] E. Parada-Cabaleiro, A. Batliner, A. Baird, and B. Schuller, “The SEILS Dataset: Symbolically Encoded Scores in Modern-Early Notation for Computational Musicology,” in 18th International Society for Music Information Retrieval Conference, (Suzhou, China), 2017.

[28] J. C. Martinez-Sevilla, A. Rosello, D. Rizo, and J. Calvo-Zaragoza, “On The Performance of Optical Music Recognition in the Absence of Specific Training Data,” in Proceedings of the 24th International Society for Music Information Retrieval Conference, pp. 319–326, ISMIR, 2023.

[29] J. Calvo-Zaragoza, D. Rizo, and J. M. Iñesta, “Two (note) heads are better than one: pen-based multimodal interaction with music scores,” in 17th International Society for Music Information Retrieval Conference (J. Devaney et al., eds.), (New York City), pp. 509–514, International Society for Music Information Retrieval, 2016.

[30] M. E. Thomae, J. E. Cumming, and I. Fujinaga, “Digitization of choirbooks in Guatemala,” in Proceedings of the 9th International Conference on Digital Libraries for Musicology, DLfM ’22, (Prague, Czech Republic), pp. 19–26, New York, NY, USA: Association for Computing Machinery, 2022.

[31] E. Fuentes-Martínez, A. Ríos-Vila, J. C. Martinez-Sevilla, D. Rizo, and J. Calvo-Zaragoza, “Aligned music notation and lyrics transcription,” Pattern Recognition, vol. 170, p. 112094, Feb. 2026.

[32] J. C. Martinez-Sevilla, F. Foscarin, P. Garcia-Iasci, D. Rizo, J. Calvo-Zaragoza, and G. Widmer, “Optical Music Recognition of Jazz Lead Sheets,” Aug. 2025. arXiv:2509.05329 [cs].

[33] T. Repolusk and E. Veas, “The KuiSCIMA Dataset for Optical Music Recognition of Ancient Chinese Suzipu Notation,” in Document Analysis and Recognition - ICDAR 2024 (E. H. Barney Smith, M. Liwicki, and L. Peng, eds.), (Cham), pp. 38–54, Springer Nature Switzerland, 2024.

[34] D. Kim, D. Han, D. Jeong, and J. J. Valero-Mas, “On the automatic recognition of jeongganbo music notation: Dataset and approach,” J. Comput. Cult. Herit., vol. 18, Sept. 2025.

[35] P. Torras, A. Baró, L. Kang, and A. Fornés, “On the Integration of Language Models into Sequence to Sequence Architectures for Handwritten Music Recognition,” in Proceedings of the 22nd International Society for Music Information Retrieval Conference, (Online), pp. 690–696, ISMIR, Nov. 2021.

[36] E. Fuentes-Martínez, A. Ríos-Vila, J. C. Martinez-Sevilla, D. Rizo, and J. Calvo-Zaragoza, “Aligned Music Notation and Lyrics Transcription,” Dec. 2024. arXiv:2412.04217 [cs].

[37] C. Garrido-Munoz, A. Rios-Vila, and J. Calvo-Zaragoza, “A holistic approach for image-to-graph: application to optical music recognition,” International Journal on Document Analysis and Recognition (IJDAR), Sept. 2022.

[38] A. Ríos-Vila, J. M. Iñesta, and J. Calvo-Zaragoza, “End-To-End Full-Page Optical Music Recognition of Monophonic Documents via Score Unfolding,” in Proceedings of the 4th International Workshop on Reading Music Systems (J. Calvo-Zaragoza, A. Pacha, and E. Shatri, eds.), (Online), pp. 20–24, 2022.

[39] A. Lemaitre, B. Coüasnon, and P. Rigaux, “CollabScore OMR: A Hybrid System for Music Score Recognition.” Submitted to ICDAR 2026, Feb. 2026.

[40] J. Hajič jr., M. Dorfer, G. Widmer, and P. Pecina, “Towards full-pipeline handwritten OMR with musical symbol detection by U-Nets,” in 19th International Society for Music Information Retrieval Conference (ISMIR), (Paris, France), pp. 225–232, 2018.

[41] A. Pacha, J. Calvo-Zaragoza, and J. Hajič jr., “Learning notation graph construction for full-pipeline optical music recognition,” in Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019, Delft, The Netherlands, November 4-8, 2019 (A. Flexer, G. Peeters, J. Urbano, and A. Volk, eds.), pp. 75–82, 2019.

[42] A. Rebelo, I. Fujinaga, F. Paszkiewicz, A. R. S. Marcal, C. Guedes, and J. S. Cardoso, “Optical Music Recognition: state-of-the-art and open issues,” International Journal of Multimedia Information Retrieval, vol. 1, pp. 173–190, Mar. 2012.

[43] B. Coüasnon, “DMOS, a generic document recognition method: application to table structure analysis in a general and in a specific way,” International Journal of Document Analysis and Recognition (IJDAR), vol. 8, pp. 111–122, June 2006.

[44] I. Knopke and D. Byrd, “Towards Musicdiff: A Foundation for Improved Optical Music Recognition Using Multiple Recognizers,” in 8th International Conference on Music Information Retrieval, (Vienna, Austria), pp. 123–126, 2007.

[45] D. Byrd and J. G. Simonsen, “Towards a Standard Testbed for Optical Music Recognition: Definitions, Metrics, and Page Images,” Journal of New Music Research, vol. 44, pp. 169–195, July 2015.

[46] J. Hajič jr., J. Novotný, P. Pecina, and J. Pokorný, “Further steps towards a standard testbed for optical music recognition,” in 17th International Society for Music Information Retrieval Conference (ISMIR), (New York, USA), pp. 157–163, 2016.

[47] J. C. Martinez-Sevilla, J. Cerveto-Serrano, N. Luna, G. Chapman, C. Sapp, D. Rizo, and J. Calvo-Zaragoza, “Sheet Music Benchmark: Standardized Optical Music Recognition Evaluation,” June 2025.

[48] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common Objects in Context,” in Computer Vision – ECCV 2014 (D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, eds.), (Cham), pp. 740–755, Springer International Publishing, 2014.

[49] J. Mayer, F. Jebavý, M. Herzánová Vlková, M. Dvořáková, P. Pecina, and J. Hajič, “MuNG Studio: Annotation Tool for Music Notation Graph,” in Proceedings of the 12th International Conference on Digital Libraries for Musicology, DLfM ’25, (New York, NY, USA), pp. 114–118, Association for Computing Machinery, Sept. 2025.

[50] I. Fujinaga and G. Vigliensoni, “The art of teaching computers: the SIMSSA optical music recognition workflow system,” in 2019 27th European Signal Processing Conference (EUSIPCO), pp. 1–5, IEEE, 2019.

[51] A. Hartelt, T. Eipert, and F. Puppe, “Optical Medieval Music Recognition — A Complete Pipeline for Historic Chants,” Applied Sciences (2076-3417), vol. 14, no. 16, 2024.

[52] A. Alaei, R. Raveaux, D. Conte, and B. Stantic, “‘Quality’ vs. ‘Readability’ in Document Images: Statistical Analysis of Human Perception,” in 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), (Vienna), pp. 363–368, IEEE, Apr. 2018.

[53] Y. Tian, Q. Ye, and D. Doermann, “YOLOv12: Attention-Centric Real-Time Object Detectors,” Feb. 2025. arXiv:2502.12524 [cs].

[54] A. Pacha, J. Hajič jr., and J. Calvo-Zaragoza, “A baseline for general music object detection with deep learning,” Applied Sciences, vol. 8, no. 9, pp. 1488–1508, 2018.

[55] Y. Li, D. Chen, T. Tang, and X. Shen, “HTR-VT: Handwritten text recognition with vision transformer,” Pattern Recognition, vol. 158, p. 110967, Feb. 2025.

[56] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention Is All You Need,” Dec. 2017. arXiv:1706.03762 [cs], version 5.

Appendix A. MusiCorpus data record: detailed documentation. MusiCorpus defines a set of guidelines on how to structure an Optic...