Research on Network Public Opinion Text Representation Strategy for Subject Classification: Taking Sina Weibo as an Example
Longjia Jia and Kun Hou
In this paper, we propose a text representation strategy that addresses two problems in Sina Weibo topic classification research: the term weights are poorly suited to the data, and the resulting models are hard to interpret. In the proposed strategy, the term weighting vector is constructed through pre-selection prediction. On the training set, the effectiveness of each candidate term weighting vector is evaluated by cross-validation, and the vector with the best evaluation result is selected as the term weighting vector for the test set. Compared with the traditional W-Max, D-Max, and D-TMax methods, the proposed method improves MicroF1 by 4.25%, 5.03%, and 7.10%, respectively. In public opinion topic classification, the proposed method constructs a more explicit term weighting vector for the data set, which enhances the interpretability of the model and improves classification performance.
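The selection step described above (score each candidate term weighting vector by cross-validation on the training set, keep the best, then apply it to unseen documents) can be illustrated with a minimal sketch. It is not the authors' implementation: the pre-selection prediction step that generates candidates is not reproduced, so the candidate vectors here are hand-written toy dictionaries. The nearest-centroid cosine classifier, the interleaved 3-fold split, the toy Weibo-style documents, and all function names (`weighted_vector`, `train_centroids`, `cross_val_score`, `select_weighting`) are hypothetical choices made for illustration. For single-label classification, each error counts as one false positive and one false negative, so MicroF1 equals accuracy.

```python
from collections import Counter, defaultdict
import math

def weighted_vector(tokens, weights):
    """Term frequency scaled by a per-term weight; zero-weight or unknown terms are dropped."""
    tf = Counter(tokens)
    return {t: c * weights.get(t, 0.0) for t, c in tf.items() if weights.get(t, 0.0) != 0.0}

def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(docs, labels, weights):
    # Summed class vectors: cosine similarity ignores scale, so sum and mean rank the same.
    sums = defaultdict(Counter)
    for tokens, y in zip(docs, labels):
        sums[y].update(weighted_vector(tokens, weights))
    return {y: dict(c) for y, c in sums.items()}

def predict(centroids, tokens, weights):
    v = weighted_vector(tokens, weights)
    # Iterate classes in sorted order so ties resolve deterministically.
    return max(sorted(centroids), key=lambda y: cosine(v, centroids[y]))

def micro_f1(y_true, y_pred):
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    fp = fn = len(y_true) - tp  # single-label: each error is one FP and one FN
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def cross_val_score(docs, labels, weights, k=3):
    """Mean MicroF1 over k interleaved folds of the training set."""
    n, scores = len(docs), []
    for fold in range(k):
        held_out = list(range(fold, n, k))
        train = [i for i in range(n) if i % k != fold]
        cents = train_centroids([docs[i] for i in train], [labels[i] for i in train], weights)
        preds = [predict(cents, docs[i], weights) for i in held_out]
        scores.append(micro_f1([labels[i] for i in held_out], preds))
    return sum(scores) / k

def select_weighting(docs, labels, candidates, k=3):
    """Return (name, score) of the candidate weighting vector with the best CV score."""
    best_name, best_score = None, -1.0
    for name, weights in candidates.items():
        score = cross_val_score(docs, labels, weights, k)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

# Toy training data (hypothetical, not from the paper).
docs = [["match", "goal", "team"], ["team", "win", "goal"], ["goal", "match", "fans"],
        ["stock", "market", "fund"], ["market", "price", "stock"], ["fund", "price", "bank"]]
labels = ["sports"] * 3 + ["finance"] * 3
candidates = {
    "uniform": {t: 1.0 for d in docs for t in d},
    "finance_only": {t: 1.0 for t in ["stock", "market", "fund", "price", "bank"]},
}

best, score = select_weighting(docs, labels, candidates)
print(best, score)  # uniform 1.0
# Apply the selected vector: retrain on the full training set, then classify a new post.
final = train_centroids(docs, labels, candidates[best])
print(predict(final, ["goal", "fans", "win"], candidates[best]))  # sports
```

The `finance_only` candidate shows why the selection matters: it zeroes out every sports term, so sports posts become empty vectors and are misclassified, which halves its cross-validated MicroF1. Any real classifier and any candidate-generation scheme could be substituted without changing the selection loop.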
Internet public opinion security, Theme classification, Text representation strategy, Machine learning