Sheet music transformer++: End-to-end full-page optical music recognition for pianoform sheet music
3 Pith papers cite this work.
citing papers explorer
- Unified Cross-modal Translation of Score Images, Symbolic Music, and Performance Audio
  A unified Transformer model with modality-specific tokenization, trained on a new 1300-hour multimodal music dataset, outperforms single-task baselines on optical music recognition and other translations while achieving the first score-image-conditioned audio generation.
- A High-Accuracy Optical Music Recognition Method Based on Bottleneck Residual Convolutions
  A CNN using ResNet-v2-style residual bottleneck blocks and multi-scale dilated convolutions, followed by a BiGRU and trained with CTC loss, achieves a sequence error rate (SeER) of 7.52% and a symbol error rate (SyER) of 0.45% on the Camera-PrIMuS dataset for optical music recognition.
- Direct content-based retrieval from music scores images