kMeans

Genetic marker data clustering made easy. K-means clustering divides a set of objects into a number of groups (k) fixed in advance, in such a way that the among-groups Sum of Squares is maximised. This program can perform the clustering on genetic marker data either based on allele frequencies or using an Analysis of Molecular Variance (AMOVA).
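For readers new to k-means, the sketch below shows the standard Lloyd iteration on allele-frequency vectors under plain Euclidean distance. It is a stand-in for illustration only: the search procedure this program actually uses is not described here, and the function names `lloyd_kmeans` and `sq_dist` and the random initialisation are hypothetical. Because the Total Sum of Squares is fixed for a given data set, minimising the within-group Sum of Squares (what Lloyd's algorithm does) is the same as maximising the among-groups Sum of Squares.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, n_iter=100, seed=0):
    """Hypothetical sketch: assign each point (e.g. an allele-frequency
    vector) to one of k groups using Lloyd's algorithm.

    Returns a list of group labels in 0..k-1, one per point.
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct observations chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins the group with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda g: sq_dist(p, centroids[g])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for g in range(k):
            members = [p for p, lab in zip(points, labels) if lab == g]
            if members:  # keep the previous centroid if a group is empty
                centroids[g] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical allele frequencies (two alleles) for four individuals.
freqs = [[0.10, 0.90], [0.15, 0.85], [0.80, 0.20], [0.85, 0.15]]
print(lloyd_kmeans(freqs, k=2))
```

With these made-up frequencies the first two individuals end up in one group and the last two in the other; the numeric label assigned to each group depends on the initialisation.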
The method uses a pairwise matrix of distances between all observations. Given a clustering into k groups, the within-group Sum of Squares of each group is calculated as the sum of the squared distances between all pairs of its members, divided by the number of members in the group. When the distances are Euclidean, this equals the sum of the squared distances from the points to the group's centroid. Summing the within-group Sums of Squares over all groups gives the Error Sum of Squares. The Total Sum of Squares is computed in the same way, treating all observations as one group. The proportion of variance explained by the grouping is then one minus the ratio of the Error Sum of Squares to the Total Sum of Squares, which is the same as the among-groups Sum of Squares divided by the Total Sum of Squares.
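The calculation described above can be sketched directly on a distance matrix. This is a minimal illustration, not the program's own code: the function names `ss_from_distances` and `variance_explained` are hypothetical, and the matrix is assumed to be a symmetric list of lists indexed by observation.

```python
def ss_from_distances(dist, members):
    """Sum of Squares of a group computed from pairwise distances.

    Sums d_ij^2 over all pairs i < j of group members and divides by the
    group size n. For Euclidean distances this equals the sum of squared
    distances from the members to the group centroid.
    """
    n = len(members)
    if n == 0:
        return 0.0
    total = 0.0
    for a in range(n):
        for b in range(a + 1, n):
            total += dist[members[a]][members[b]] ** 2
    return total / n


def variance_explained(dist, labels):
    """Return (Error SS, Total SS, proportion of variance explained)."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    # Error SS: sum of the within-group Sums of Squares over all groups.
    sse = sum(ss_from_distances(dist, m) for m in groups.values())
    # Total SS: the same formula with every observation in a single group.
    sst = ss_from_distances(dist, list(range(len(labels))))
    # Among-groups SS = Total SS - Error SS; its share of the total is explained.
    return sse, sst, (sst - sse) / sst


# Hypothetical one-dimensional data, so the distance is |x_i - x_j|.
xs = [0.0, 2.0, 10.0, 12.0]
dist = [[abs(a - b) for b in xs] for a in xs]
print(variance_explained(dist, [0, 0, 1, 1]))
```

For this toy example each group has a within-group Sum of Squares of 2 (matching the squared distances 1 + 1 to the centroids at 1 and 11), so the Error Sum of Squares is 4, the Total Sum of Squares is 104, and the proportion explained is 100/104.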
