Parallel Design of Hash and K-means Algorithm in the Context of Big Data
Xing Lei, Zhang Xiang, Guo Zhengkun, Guo Fuwang
To further improve the efficiency of the K-means algorithm when clustering large-scale data, this paper analyzes the optimization of K-means clustering and proposes a scheme for selecting the initial cluster centers based on a Hash algorithm. Massive high-dimensional data are hashed into a compressed space to uncover clustering relations, so that the selected initial cluster centers lie as close as possible to the converged state. This greatly reduces the number of clustering iterations and improves clustering accuracy.
K-means, Hash, mass data
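
As an illustration of the idea summarized in the abstract, the following is a minimal sketch of hash-based selection of initial cluster centers. The abstract does not specify the hash function, so the sketch assumes a sign-of-random-projection hash (a common locality-sensitive scheme). The function name `hash_init_centers` and the parameters `n_bits` and `seed` are hypothetical and are not taken from the paper. Points that share a hash code fall into the same bucket, and the means of the k most populated buckets serve as initial centers.

```python
import random
from collections import defaultdict


def hash_init_centers(points, k, n_bits=4, seed=0):
    """Return k initial centers derived from the most populated hash buckets.

    Assumption: the hash is the sign pattern of n_bits random projections.
    The paper's actual hash function may differ.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Center the data so the random hyperplanes pass through its mean.
    mean = [sum(p[d] for p in points) / n for d in range(dim)]

    # One random hyperplane per hash bit.
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    # Hash each point to a short binary code (the compressed space).
    buckets = defaultdict(list)
    for p in points:
        code = tuple(
            sum(w * (x - m) for w, x, m in zip(plane, p, mean)) >= 0.0
            for plane in planes
        )
        buckets[code].append(p)

    # Use the means of the k largest buckets as initial centers.
    largest = sorted(buckets.values(), key=len, reverse=True)[:k]
    centers = [
        [sum(p[d] for p in b) / len(b) for d in range(dim)] for b in largest
    ]

    # If fewer than k buckets are occupied, pad with randomly chosen points.
    while len(centers) < k:
        centers.append(list(rng.choice(points)))
    return centers
```

The returned centers would then be passed to a standard K-means iteration in place of randomly chosen ones. Because the hashing step makes a single independent pass over the points, it could in principle be distributed across workers, although this sketch runs serially.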