Tibetan Predicate Recognition Based on A Semi-supervised Model

Lin Li, Weina Zhao, Zewangkuanzhuo

DOI: 10.25236/iwmecs.2019.021

Author(s)

Lin Li, Weina Zhao, Zewangkuanzhuo

Corresponding Author

Lin Li

Abstract

The syntactic structure of a Tibetan sentence is largely determined by its predicate, so predicate recognition provides the basis for complete syntactic parsing and other NLP tasks. Tibetan predicates fall into two categories, verbal predicates and adjective predicates, both of which consist of a headword and other varied components. We therefore focus on recognizing Tibetan predicates in this work. Previous approaches to this task often adopt rule-based or supervised machine learning methods. In this paper, we present a semi-supervised Tibetan predicate recognition model that uses Tibetan word embeddings. The model feeds word embeddings, induced without supervision, into a supervised predicate recognition model as extra features. Its strength is that it relies on pre-trained word embeddings and thus minimizes the need for prior knowledge. We start from a near state-of-the-art baseline system based on Conditional Random Fields (CRFs). On top of this baseline, we build a supervised system that adds semantic features. Then, using a large-scale Tibetan corpus, we induce several Tibetan word embeddings and evaluate them on the predicate recognition task. The results show varying degrees of improvement when word embeddings are used as features. The F score of the semi-supervised model reaches 88.58%, an improvement of 7.92% and 3.10% over the baseline system and the supervised system, respectively.
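
The idea of adding word embeddings as extra features to a CRF tagger can be illustrated with a minimal sketch. It is not the authors' implementation: the feature templates, the names `token_features` and `sentence_features`, the placeholder tokens, and the toy embedding table are all assumptions made for illustration. Each token receives ordinary sparse features (its word form and its neighbours) plus one real-valued feature per embedding dimension.

```python
from typing import Dict, List, Sequence

# Hypothetical embedding table: word -> dense vector. In practice the vectors
# would be induced from a large unlabeled corpus; these are toy values.
Embeddings = Dict[str, Sequence[float]]


def token_features(tokens: List[str], i: int,
                   embeddings: Embeddings, dim: int) -> Dict[str, float]:
    """Build a feature dictionary for tokens[i] (illustrative templates only)."""
    word = tokens[i]
    prev_word = tokens[i - 1] if i > 0 else "<BOS>"
    next_word = tokens[i + 1] if i < len(tokens) - 1 else "<EOS>"

    # Sparse features of the kind a supervised CRF baseline might use.
    feats: Dict[str, float] = {
        "bias": 1.0,
        "word=" + word: 1.0,
        "prev=" + prev_word: 1.0,
        "next=" + next_word: 1.0,
    }

    # Extra features: one real-valued feature per embedding dimension.
    # Words missing from the table fall back to a zero vector.
    vector = embeddings.get(word, [0.0] * dim)
    for k, value in enumerate(vector):
        feats[f"emb_{k}"] = float(value)
    return feats


def sentence_features(tokens: List[str], embeddings: Embeddings,
                      dim: int) -> List[Dict[str, float]]:
    """Feature dictionaries for every token in a sentence."""
    return [token_features(tokens, i, embeddings, dim)
            for i in range(len(tokens))]


# Usage with placeholder tokens and a 2-dimensional toy table.
toy_embeddings = {"w2": [0.5, -0.25]}
features = sentence_features(["w1", "w2", "w3"], toy_embeddings, dim=2)
```

Per-token dictionaries of this shape are the kind of input that common CRF toolkits accept, so a sequence of them could be paired with predicate labels to train a tagger. Only the feature construction is shown here.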

Keywords

Tibetan Predicate Recognition, Semi-supervised model, CRFs, Word Embedding