Parallel Design of Hash and K-means Algorithm in the Context of Big Data

Xing Lei, Zhang Xiang, Guo Zhengkun, Guo Fuwang

DOI: 10.25236/mmmce.2019.129

Corresponding Author

Xing Lei

Abstract

To further improve the efficiency of the K-means algorithm on large-scale data clustering, this paper analyzes the optimization of K-means clustering and proposes a scheme for selecting the initial cluster centers based on a Hash algorithm. Massive high-dimensional data are hashed into a compressed space to uncover the clustering relations among them, so that the selected initial cluster centers lie as close as possible to the converged state. This greatly reduces the number of clustering iterations and improves clustering accuracy.
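The idea of hashing data into a compressed space and using the resulting groups to seed K-means can be illustrated with a minimal sketch. The abstract does not specify the hash function, so everything concrete here is an assumption made for illustration: the sign-of-random-projection hash, the `n_bits` parameter, the rule of taking the means of the most populated buckets, and the function names `hash_seed_centers` and `kmeans`. It is not the authors' method, and it runs serially rather than in parallel.

```python
import random
from collections import defaultdict


def hash_seed_centers(points, k, n_bits=4, seed=0):
    """Pick k initial centers by bucketing points with a random-projection hash.

    Assumption: the hash family, n_bits, and the "largest buckets" rule are
    illustrative choices, not taken from the paper.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Center the data so the random hyperplanes pass through the data mean.
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    # Each point is compressed to an n_bits key: one sign bit per hyperplane.
    buckets = defaultdict(list)
    for p in points:
        key = tuple(
            sum(w * (x - m) for w, x, m in zip(plane, p, mean)) >= 0.0
            for plane in planes
        )
        buckets[key].append(p)

    # Use the means of the k most populated buckets as initial centers.
    largest = sorted(buckets.values(), key=len, reverse=True)[:k]
    centers = [[sum(p[d] for p in b) / len(b) for d in range(dim)] for b in largest]
    # Fall back to random data points if fewer than k buckets are occupied.
    while len(centers) < k:
        centers.append(list(rng.choice(points)))
    return centers


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations; returns (centers, number of iterations used)."""
    for it in range(1, max_iter + 1):
        # Assignment step: attach every point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            [sum(p[d] for p in g) / len(g) for d in range(len(c))] if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            return new_centers, it
        centers = new_centers
    return centers, max_iter
```

Calling `kmeans(points, hash_seed_centers(points, k))` returns the final centers together with the iteration count, which can be compared against a run seeded with randomly chosen points. Whether the hash-based seeds actually converge faster depends on the data and on the hash, so this sketch makes no claim about the size of the improvement.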

Keywords

K-means, Hash, mass data