Construction of Implicit Semantic Multi-label Text Fast Clustering Model based on Big Data
DOI: 10.25236/dpaic.2018.033
Author(s)
Dawei Zhao, Gang Chen
Corresponding Author
Gang Chen
Abstract
To address the conceptual ambiguity and underlying semantic structure in multi-label text classification, an ensemble classification method is proposed that combines the random forest (RF) algorithm with latent semantic indexing (LSI). Randomly partitioning the vocabulary increases the diversity of the ensemble and yields different orthogonal projections onto low-dimensional latent semantic spaces, and LSI is performed on the basis of these orthogonal projections. Random forest handles the binary classification problems effectively, while latent semantics reveals the underlying semantic structure of the text. Combining the two accounts for both the diversity of the ensemble and the accuracy of its individual members. Experimental results on the Yahoo dataset verify the effectiveness of the proposed method, which outperforms other methods in terms of Hamming loss, coverage, one-error, and average precision.
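The four evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so the definitions below are an assumption: they follow the conventions commonly used in the multi-label literature (Hamming loss, coverage, one-error, average precision). The function names and the toy data are hypothetical and are not taken from the paper.

```python
# Toy multi-label metrics, assuming the common literature definitions.
# y_true / y_pred: 0/1 label indicators per document; scores: per-label confidences.
# Every document is assumed to have at least one true label.

def ranked_labels(s):
    """Label indices sorted from highest to lowest score."""
    return sorted(range(len(s)), key=lambda j: -s[j])

def hamming_loss(y_true, y_pred):
    """Fraction of (document, label) pairs that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    return wrong / total

def one_error(y_true, scores):
    """Fraction of documents whose top-ranked label is not a true label."""
    misses = sum(truth[ranked_labels(s)[0]] == 0 for truth, s in zip(y_true, scores))
    return misses / len(y_true)

def coverage(y_true, scores):
    """Mean 0-based rank of the lowest-ranked true label (rank minus one)."""
    depth = 0
    for truth, s in zip(y_true, scores):
        order = ranked_labels(s)
        depth += max(pos for pos, label in enumerate(order) if truth[label] == 1)
    return depth / len(y_true)

def average_precision(y_true, scores):
    """Mean over documents of the precision measured at each true label's rank."""
    total = 0.0
    for truth, s in zip(y_true, scores):
        hits, acc = 0, 0.0
        for pos, label in enumerate(ranked_labels(s), start=1):
            if truth[label] == 1:
                hits += 1
                acc += hits / pos
        total += acc / hits
    return total / len(y_true)

# Hypothetical data: two documents, three labels, predictions thresholded at 0.5.
y_true = [[1, 0, 1], [0, 1, 0]]
scores = [[0.9, 0.2, 0.4], [0.6, 0.3, 0.1]]
y_pred = [[int(v >= 0.5) for v in row] for row in scores]

print(hamming_loss(y_true, y_pred))       # 0.5
print(one_error(y_true, scores))          # 0.5
print(coverage(y_true, scores))           # 1.0
print(average_precision(y_true, scores))  # 0.75
```

Lower values are better for Hamming loss, coverage, and one-error, while higher values are better for average precision. Some libraries report coverage without subtracting one, so values should be compared only under a single convention.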
Keywords
Big Data, Semantic Multi-label, Construction Method