
Web of Proceedings - Francis Academic Press

Optimal Document Representation Strategy for Supervised Term Weighting Schemes in Automatic Text Categorization


DOI: 10.25236/iciss.2019.026

Author(s)

Longjia Jia and Bangzuo Zhang

Corresponding Author

Longjia Jia

Abstract

Term weighting assigns weights to terms in order to improve the performance of text categorization. In this paper, we propose a document representation strategy for supervised text classification, the optimal document representation strategy for supervised term weighting schemes (ODRS), which selects the optimal term weighting vector from among many candidate vectors. The main idea of ODRS is to define an optimization function that incorporates the importance of categories and terms in the training set, use it to find the optimal parameters, and then apply the resulting model to the test set. In the experiments, we investigate the effect of ODRS on the 20 Newsgroups and Reuters-21578 datasets using an SVM classifier. The results show that ODRS outperforms other document representation strategies, such as Document Max, Document Two Max, and the global policy.
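To make the idea of several candidate term weighting vectors per document concrete, the sketch below computes one supervised weighting vector per category for a single document, using the widely used tf·rf scheme with relevance frequency rf = log2(2 + a / max(1, c)), where a and c count the training documents inside and outside the category that contain the term. The toy corpus, the helper names (`relevance_frequency`, `category_vectors`, `max_policy`) and the element-wise maximum used to collapse the candidates into one vector are all illustrative assumptions. This is not the paper's ODRS, nor its definition of Document Max, Document Two Max, or the global policy.

```python
import math
from collections import Counter


def relevance_frequency(term, category, docs, labels):
    """rf of `term` for `category`, treating the task as one-vs-rest."""
    # a: training docs in `category` that contain the term
    a = sum(1 for d, y in zip(docs, labels) if y == category and term in d)
    # c: training docs outside `category` that contain the term
    c = sum(1 for d, y in zip(docs, labels) if y != category and term in d)
    return math.log2(2 + a / max(1, c))


def category_vectors(doc, docs, labels, vocab):
    """Hypothetical helper: one tf.rf vector per category for a tokenized doc."""
    tf = Counter(doc)  # raw term frequencies in the document
    return {
        cat: [tf[t] * relevance_frequency(t, cat, docs, labels) for t in vocab]
        for cat in sorted(set(labels))
    }


def max_policy(vectors):
    """Assumed collapsing rule: element-wise maximum over the category vectors."""
    return [max(weights) for weights in zip(*vectors.values())]


if __name__ == "__main__":
    # Toy training set, invented purely for illustration
    docs = [["ball", "goal"], ["ball", "team"], ["vote", "party"], ["vote", "ball"]]
    labels = ["sport", "sport", "politics", "politics"]
    vocab = ["ball", "goal", "vote"]

    vecs = category_vectors(["ball", "ball", "vote"], docs, labels, vocab)
    print(vecs["sport"])     # [4.0, 0.0, 1.0]
    print(max_policy(vecs))  # [4.0, 0.0, 2.0]
```

The fixed maximum here is only one simple, untuned way of choosing among the per-category candidates; ODRS, as described above, instead learns its choice by optimizing parameters on the training set before applying the model to the test set.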

Keywords

Optimal document representation, Term weighting, Text categorization, Machine learning