Full-text Retrieval Model based on Term Frequency and Position Weighting

Zhang Rui, Xie Puzhao, Sun Rui, Yang Luchang, Jiang Feiyue

DOI: 10.25236/icsemc.2017.06

Corresponding Author

Zhang Rui

Abstract

Mainstream literature retrieval systems today are term-based: they perform retrieval by matching search terms against the title, keywords, and summary extracted from each document. In this article, a Lucene-based full-text retrieval model for computer science literature is proposed. A word-frequency weighting algorithm is adopted to set the weighting coefficients of the document fields. Attributes of computer science literature are introduced into the evaluation model as important indicators of a document's value. The multi-factor influence model employs a simulated annealing algorithm to fit the best weight coefficient for each factor, compensating for the limitation that Lucene's default retrieval method can only retrieve by word frequency. The experimental data, whose elements are drawn from CNKI, were divided into a training set and a test set. The weight of each field is trained by carrying out feature extraction. The model is then validated on the test set, which consists of a fixed number of high-quality documents and inferior ones. The experimental results show that the trained model has higher precision in selecting high-quality documents.
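The two ideas named in the abstract, per-field weighting of term frequency and fitting those weights by simulated annealing, can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use Lucene. The field names, the precision-at-k objective, the perturbation step, and the cooling schedule are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
import random


def field_tf_score(doc, query_terms, weights):
    """Weighted sum of per-field counts of the query terms (hypothetical scoring)."""
    score = 0.0
    for field, weight in weights.items():
        tokens = doc.get(field, "").lower().split()
        tf = sum(tokens.count(term) for term in query_terms)
        score += weight * tf
    return score


def precision_at_k(docs, labels, query_terms, weights, k):
    """Fraction of the k highest-scored documents labelled high quality (label 1)."""
    ranked = sorted(
        range(len(docs)),
        key=lambda i: field_tf_score(docs[i], query_terms, weights),
        reverse=True,
    )
    return sum(labels[i] for i in ranked[:k]) / k


def anneal_weights(docs, labels, query_terms, fields, k,
                   steps=500, t0=1.0, cooling=0.99, seed=0):
    """Fit field weights by simulated annealing, maximising precision at k.

    The objective and the schedule are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    current = {f: 1.0 for f in fields}
    current_score = precision_at_k(docs, labels, query_terms, current, k)
    best, best_score = dict(current), current_score
    temp = t0
    for _ in range(steps):
        # Perturb one randomly chosen field weight, keeping it non-negative.
        candidate = dict(current)
        f = rng.choice(fields)
        candidate[f] = max(0.0, candidate[f] + rng.uniform(-0.5, 0.5))
        cand_score = precision_at_k(docs, labels, query_terms, candidate, k)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = dict(current), current_score
        temp *= cooling  # geometric cooling
    return best, best_score
```

In this sketch, `anneal_weights` would be run on the training set, and the returned weights would then be passed to `precision_at_k` on the held-out test set, mirroring the train/test split described above.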

Keywords

Lucene, Full-text search, Word frequency position weighting, Computer science, Multi-factor influence model, Simulated annealing.