pith. sign in

arxiv: 0708.2971 · v1 · submitted 2007-08-22 · ⚛️ physics.soc-ph · physics.data-an· q-bio.PE

Indo-European languages tree by Levenshtein distance

classification ⚛️ physics.soc-ph physics.data-anq-bio.PE
keywords distancelanguageswordscitebeencloselycognatesconsidering
0
0 comments X
read the original abstract

The evolution of languages closely resembles the evolution of haploid organisms. This similarity has been recently exploited \cite{GA,GJ} to construct language trees. The key point is the definition of a distance among all pairs of languages which is the analogous of a genetic distance. Many methods have been proposed to define these distances, one of this, used by glottochronology, compute distance from the percentage of shared ``cognates''. Cognates are words inferred to have a common historical origin, and subjective judgment plays a relevant role in the identification process. Here we push closer the analogy with evolutionary biology and we introduce a genetic distance among language pairs by considering a renormalized Levenshtein distance among words with same meaning and averaging on all the words contained in a Swadesh list \cite{Sw}. The subjectivity of process is consistently reduced and the reproducibility is highly facilitated. We test our method against the Indo-European group considering fifty different languages and the two hundred words of the Swadesh list for any of them. We find out a tree which closely resembles the one published in \cite{GA} with some significant differences.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.