The Relationship between Syntactic Complexity and Quality of NMT Outputs: An Exploratory Study
DOI: 10.25236/icbsis.2020.027
Author(s)
Huang Yueyue, Li Keru
Corresponding Author
Huang Yueyue, Li Keru
Abstract
As automated translation technology advances, translators urgently need more precise guidelines on the ramifications of post-editing tasks. Working with the English-Chinese language pair, this paper explores the relationship between the syntactic complexity of the source text (ST) and the quality of Neural Machine Translation (NMT) output. Forty sentences were extracted from two legal documents. Three groups were formed based on sentence length: the first contains 20 sentences ranging from 7 to 36 words, the second contains another 20 longer sentences ranging from 31 to 68 words, and the third combines the two previous groups to test the overall correlation. The Syntactic Complexity Analyzer developed by Lu (2010) was used to measure the 40 sentences, which were then processed by two free online NMT systems: Google Neural Machine Translation (GNMT) and the Systran online translation tool. MT quality was evaluated manually by counting errors at the lexical and syntactic levels. The overall results suggest that ST syntactic complexity has a small-to-medium effect on NMT quality regardless of the NMT system, and that T-unit-related complexity measures, mean length of T-unit (MLT) in particular, account for most of this correlation. In addition, whereas GNMT output scores significantly higher than Systran output at the lexical level, the error scores of the two systems do not differ significantly at the syntactic level.
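The kind of analysis described in the abstract can be illustrated with a minimal Python sketch. It computes MLT as the number of words divided by the number of T-units and then correlates it with per-sentence error counts. The abstract does not state which correlation statistic was used, so Pearson's r is an assumption here; the function names and the toy data are hypothetical and are not taken from the study.

```python
from math import sqrt


def mean_length_of_t_unit(word_count, t_unit_count):
    """MLT: number of words divided by number of T-units."""
    if t_unit_count <= 0:
        raise ValueError("t_unit_count must be positive")
    return word_count / t_unit_count


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data, NOT from the study:
# (word count, T-unit count, manually counted MT errors) per source sentence.
sentences = [(12, 1, 0), (30, 2, 1), (45, 1, 3), (60, 2, 2)]

mlt_values = [mean_length_of_t_unit(w, t) for w, t, _ in sentences]
error_counts = [e for _, _, e in sentences]

r = pearson_r(mlt_values, error_counts)
print(f"MLT per sentence: {mlt_values}")
print(f"Pearson r between MLT and error count: {r:.3f}")
```

In a setup like the one the abstract describes, the same calculation would presumably be repeated for each complexity measure, each sentence group, and each NMT system, so that the measures can be compared by the strength of their correlation with error counts.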
Keywords
Syntactic complexity, neural machine translation, quality evaluation