SiDiaC is a new historical corpus of Sinhala literary works spanning the 5th to 20th centuries, constructed via OCR digitization, orthography modernization, and genre-based annotation.
Survey on Publicly Available Sinhala Natural Language Processing Tools and Research
2 Pith papers cite this work.
fields: cs.CL (2)
representative citing papers
Empirical analysis of Sri Lankan tourism reviews finds 18.6% incongruence between star ratings and text sentiment, varying by venue type and linked to reviewer expertise, length, and temporal factors.
citing papers explorer
- SiDiaC: Sinhala Diachronic Corpus