Construction of Implicit Semantic Multi-label Text Fast Clustering Model based on Big Data

Dawei Zhao, Gang Chen

DOI: 10.25236/dpaic.2018.033

Author(s)

Dawei Zhao, Gang Chen

Corresponding Author

Gang Chen

Abstract

To address the conceptual ambiguity and underlying semantic structure of multi-label text classification, an ensemble classification method is proposed that combines the random forest (RF) algorithm with latent semantic indexing (LSI). The vocabulary is randomly partitioned to increase the diversity of the ensemble, yielding different orthogonal projections onto low-dimensional latent semantic spaces, and LSI is performed on each of these projections. Random forests handle binary classification effectively, while LSI reveals the underlying semantic structure of the text; combining the two balances ensemble diversity with the accuracy of the individual classifiers. Experimental results on the Yahoo dataset verify the effectiveness of the proposed method, which outperforms the compared methods in terms of Hamming loss, coverage, one-error and average precision.
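The four evaluation measures listed at the end of the abstract can be illustrated with a minimal pure-Python sketch, assuming they correspond to the definitions commonly used in the multi-label literature (Hamming loss, coverage, one-error and average precision). This is not the authors' evaluation code: the function names and the toy data are hypothetical, labels are assumed to be 0/1 indicator lists, and every instance is assumed to have at least one relevant label.

```python
def ranks(scores):
    """Rank labels by descending score; the highest-scored label gets rank 1."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    rank = [0] * len(scores)
    for position, label in enumerate(order, start=1):
        rank[label] = position
    return rank


def hamming_loss(y_true, y_pred):
    """Fraction of (instance, label) slots that are predicted incorrectly."""
    n, q = len(y_true), len(y_true[0])
    wrong = sum(t != p for yt, yp in zip(y_true, y_pred) for t, p in zip(yt, yp))
    return wrong / (n * q)


def one_error(y_true, scores):
    """Fraction of instances whose top-ranked label is not a relevant label."""
    misses = 0
    for yt, s in zip(y_true, scores):
        top = max(range(len(s)), key=lambda j: s[j])
        misses += int(yt[top] == 0)
    return misses / len(y_true)


def coverage(y_true, scores):
    """Average number of steps down the ranking needed to cover all relevant labels."""
    total = 0
    for yt, s in zip(y_true, scores):
        r = ranks(s)
        total += max(r[j] for j, t in enumerate(yt) if t) - 1
    return total / len(y_true)


def average_precision(y_true, scores):
    """Average, over relevant labels, of the share of relevant labels ranked at or above each."""
    total = 0.0
    for yt, s in zip(y_true, scores):
        r = ranks(s)
        relevant = [j for j, t in enumerate(yt) if t]
        total += sum(
            sum(1 for k in relevant if r[k] <= r[j]) / r[j] for j in relevant
        ) / len(relevant)
    return total / len(y_true)


# Hypothetical toy data: two documents, three labels.
y_true = [[1, 0, 1], [0, 1, 0]]
scores = [[0.9, 0.8, 0.1], [0.2, 0.3, 0.7]]
# Threshold the scores at 0.5 (an assumed cut-off) to get hard predictions.
y_pred = [[int(v >= 0.5) for v in row] for row in scores]

print(hamming_loss(y_true, y_pred))
print(one_error(y_true, scores))
print(coverage(y_true, scores))
print(average_precision(y_true, scores))
```

Under these definitions, lower values are better for Hamming loss, coverage and one-error, while higher values are better for average precision.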

Keywords

Big Data, Semantic Multi-label, Construction Method