Web of Proceedings - Francis Academic Press

Research on Term Weighting Method in the Classification of Internet Public Opinion Security Topics in Colleges and Universities

DOI: 10.25236/meici.2019.009

Author(s)

Longjia Jia and Kun Hou

Corresponding Author

Longjia Jia

Abstract

With the rapid development of Sina Weibo, there is an urgent need to classify internet public opinion security topics in colleges and universities. Term weighting is a strategy that assigns weights to terms in order to improve the performance of text categorization. In this paper, we propose a new category-based term weighting scheme, the probability of relevance frequency (prf), which uses the available labeling information to assign appropriate weights to terms. The main idea of prf is that the more concentrated a high-frequency term is in the positive category relative to the negative category, the more it contributes to separating positive samples from negative samples. By replacing word features with category-based features, the dimensionality of the document feature space can be reduced from tens of thousands of terms to a small number of categories. In the experiments, we investigate the effect of prf on a dataset of Sina Weibo posts published by college users. The results show that the prf scheme outperforms other term weighting schemes, such as term frequency (tf), term frequency and inverse document frequency (tf*idf), and term frequency and relevance frequency (tf*rf).
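
For readers unfamiliar with the baseline schemes named above, the following is a minimal Python sketch of tf, tf*idf, and tf*rf on a toy corpus. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the prf formula, so prf is deliberately not shown. The rf definition used here, log2(2 + a / max(1, c)), is the commonly cited form, where a and c are the numbers of positive and negative documents containing the term. The function names, the toy documents, and the 1/0 labels are all hypothetical.

```python
import math
from collections import Counter


def term_stats(docs, labels, term):
    """Return (a, c): positive and negative documents that contain `term`."""
    a = sum(1 for d, y in zip(docs, labels) if y == 1 and term in d)
    c = sum(1 for d, y in zip(docs, labels) if y == 0 and term in d)
    return a, c


def idf(docs, term):
    """Inverse document frequency with a natural log (one common variant)."""
    n_t = sum(1 for d in docs if term in d)
    return math.log(len(docs) / n_t) if n_t else 0.0


def rf(a, c):
    """Relevance frequency: grows as the term concentrates in the positive class."""
    return math.log2(2 + a / max(1, c))


def weight_document(doc, docs, labels, scheme="tf"):
    """Map each term of `doc` to its weight under the chosen baseline scheme."""
    weights = {}
    for term, freq in Counter(doc).items():
        if scheme == "tf":
            w = freq
        elif scheme == "tfidf":
            w = freq * idf(docs, term)
        elif scheme == "tfrf":
            a, c = term_stats(docs, labels, term)
            w = freq * rf(a, c)
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        weights[term] = float(w)
    return weights


# Hypothetical toy corpus: tokenized posts, label 1 = positive category.
docs = [
    ["exam", "cheating", "exam"],
    ["exam", "schedule"],
    ["canteen", "menu"],
    ["canteen", "schedule"],
]
labels = [1, 1, 0, 0]

for scheme in ("tf", "tfidf", "tfrf"):
    print(scheme, weight_document(docs[0], docs, labels, scheme))
```

In this toy corpus, "exam" appears only in positive documents, so tf*rf raises its weight relative to "schedule", which appears once in each category. tf*idf, by contrast, ignores the labels and weights terms only by how rare they are across all documents.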

Keywords

Internet public opinion security, Theme classification, Term weighting, Machine learning