Journal of Applied Science and Engineering

Published by Tamkang University Press


Fan Sun1, Zijiao Chen1, and Jingrui Pei2,3

1School of Chinese Studies, Dalian University of Foreign Languages, Dalian 116044, China
2Chinese College of Minority Languages and Literature, Minzu University of China, Beijing 100000, China
3Software College, Shenyang Normal University, Shenyang 110034, China


Received: August 1, 2020
Accepted: November 16, 2020
Publication Date: April 1, 2020

Chinese automatic word segmentation is a prerequisite for Chinese information processing and is widely used in Chinese full-text retrieval, automatic Chinese full-text translation, Chinese text-to-speech (TTS) conversion, and other fields. A dictionary plays an important role in Chinese word segmentation, and the strengths and weaknesses of the segmentation mechanism directly affect the speed and efficiency of segmentation. We therefore propose a deep learning method for Chinese word segmentation. First, we construct a separable-convolution bidirectional long short-term memory conditional random field (BiLSTM-CRF) word segmentation model whose feature points include dictionary features, and obtain its parameters by training on an existing word segmentation corpus. Then, using text from the software engineering domain as a small-scale training corpus, we fine-tune this general-corpus model. The experimental results show that transfer learning reduces the number of training iterations needed for the domain segmentation model. Meanwhile, compared with other Chinese word segmentation models, the proposed model reduces the time spent labeling the training corpus and enables cross-domain transfer learning of the word segmentation model.

Keywords: Chinese word segmentation; deep learning; dictionary feature; bidirectional long short-term memory; conditional random field; separable convolution; feature point
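The abstract does not state how the training labels or the dictionary features are encoded. The sketch below is a minimal illustration under assumed conventions only, not the authors' implementation. It uses the common BMES character-tagging scheme (Begin, Middle, End, Single) for segmentation labels and, for each character, binary flags recording whether a lexicon word of length n (2 to `max_n`) starts or ends at that character. The function names, the toy lexicon, and `max_n` are hypothetical.

```python
from typing import List, Set


def bmes_tags(words: List[str]) -> List[str]:
    """Convert a segmented sentence (list of words) into per-character BMES tags."""
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")                 # single-character word
        else:
            tags.append("B")                 # first character of a word
            tags.extend(["M"] * (len(w) - 2))  # inner characters
            tags.append("E")                 # last character of a word
    return tags


def dictionary_features(sentence: str, lexicon: Set[str], max_n: int = 4) -> List[List[int]]:
    """Hypothetical dictionary-feature scheme: for each character and each
    n in 2..max_n, emit [starts_word_of_len_n, ends_word_of_len_n]."""
    feats: List[List[int]] = []
    for i in range(len(sentence)):
        row: List[int] = []
        for n in range(2, max_n + 1):
            # n-gram beginning at position i (may be truncated at sentence end)
            start = sentence[i:i + n]
            row.append(int(len(start) == n and start in lexicon))
            # n-gram ending at position i (empty if it would run past the start)
            end = sentence[i - n + 1:i + 1] if i - n + 1 >= 0 else ""
            row.append(int(len(end) == n and end in lexicon))
        feats.append(row)
    return feats


if __name__ == "__main__":
    lexicon = {"软件", "工程", "软件工程", "测试"}   # toy domain lexicon
    sentence = "软件工程测试"
    print(bmes_tags(["软件工程", "测试"]))         # ['B', 'M', 'M', 'E', 'B', 'E']
    for ch, f in zip(sentence, dictionary_features(sentence, lexicon)):
        print(ch, f)
```

In a model of the kind described above, each feature row would plausibly be concatenated with the character embedding before the convolution and BiLSTM layers, with the CRF layer predicting the BMES sequence; that wiring is likewise an assumption. A small domain lexicon (for example, software engineering terms) can be swapped in during fine-tuning without changing the feature dimension.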


