Article ID: | iaor20173497 |
Volume: | 33 |
Issue: | 3 |
Start Page Number: | 428 |
End Page Number: | 447 |
Publication Date: | Aug 2017 |
Journal: | Computational Intelligence |
Authors: | Zhang Min, Zhang Yue, Chen Wenliang, Zhu Muhua, Zhu Jingbo |
Keywords: | information, datamining |
Shift‐reduce parsing is efficient because it can use fast search strategies such as greedy (deterministic) search and beam search. It is also much simpler and easier to implement than other parsing algorithms. In this article, we exploit constituent boundary information to improve the accuracy of shift‐reduce phrase‐structure parsing. Constituent boundary information has previously been used successfully to speed up chart parsers, but whether it can improve parsing accuracy has not been investigated. We propose two models to capture constituent boundary information and, based on them, design two sets of novel features for a shift‐reduce parser. The first is a boundary prediction model that uses a classifier, trained on automatically parsed data, to predict the boundaries of constituents. The second is a Tree Likelihood Model that measures the validity of a constituent by its likelihood, estimated from automatically parsed data. Experimental results show that our method outperforms a strong baseline by 0.8% and 1.6% in F‐score on English and Chinese data, respectively, achieving competitive parsing accuracies on English (90.8%) and Chinese (84.8%). To our knowledge, this is the first work in which shift‐reduce phrase‐structure parsing advances the state of the art by using constituent boundary information.
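The abstract does not give the exact formulation of either model. As a rough illustration only, the Python sketch below shows one way training signals for both could be derived from automatically parsed trees: per-word "begins/ends a multi-word constituent" labels that a boundary classifier could be trained on, and an add-k smoothed relative frequency, keyed on a span's first and last words, as a simple stand-in for a tree likelihood score. The names `parse_bracketed`, `boundary_labels`, and `SpanLikelihood`, the first/last-word conditioning, and the smoothing scheme are all hypothetical assumptions, not the authors' method.

```python
from collections import Counter


def parse_bracketed(tree):
    """Parse a Penn-style bracketed tree like "(S (NP (DT the) (NN cat)) ...)".

    Returns (words, spans); spans holds one (label, start, end) tuple per
    phrasal node, end exclusive. Preterminals (TAG word) only contribute words.
    Assumes no empty outer "( ... )" wrapper around the root.
    """
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    words, spans = [], []
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        if tokens[pos] != "(":        # preterminal: (TAG word)
            words.append(tokens[pos])
            pos += 2                  # consume word and ")"
            return
        start = len(words)
        while tokens[pos] == "(":     # recurse into each child
            node()
        pos += 1                      # consume ")"
        spans.append((label, start, len(words)))

    node()
    return words, spans


def boundary_labels(words, spans):
    """Hypothetical classifier targets: for each word, whether a multi-word
    constituent begins at it and whether one ends at it."""
    begins = {s for _, s, e in spans if e - s > 1}
    ends = {e - 1 for _, s, e in spans if e - s > 1}
    return [(i in begins, i in ends) for i in range(len(words))]


class SpanLikelihood:
    """Hypothetical tree-likelihood stand-in: add-k smoothed relative frequency
    with which a span bounded by (first, last) words is a constituent."""

    def __init__(self, k=1.0):
        self.k = k
        self.hits = Counter()   # span with these boundary words was a constituent
        self.seen = Counter()   # span with these boundary words occurred at all

    def train(self, trees):
        for tree in trees:
            words, spans = parse_bracketed(tree)
            gold = {(s, e) for _, s, e in spans}
            for i in range(len(words)):
                for j in range(i + 1, len(words) + 1):
                    key = (words[i], words[j - 1])
                    self.seen[key] += 1
                    if (i, j) in gold:
                        self.hits[key] += 1

    def likelihood(self, first, last):
        # Smoothing pulls unseen pairs to 0.5 instead of an undefined 0/0.
        key = (first, last)
        return (self.hits[key] + self.k) / (self.seen[key] + 2 * self.k)


if __name__ == "__main__":
    trees = [
        "(S (NP (DT the) (NN cat)) (VP (VBD sat)))",
        "(S (NP (DT the) (NN dog)) (VP (VBD ran) (ADVP (RB fast))))",
    ]
    words, spans = parse_bracketed(trees[0])
    print(boundary_labels(words, spans))  # [(True, False), (False, True), (False, True)]
    model = SpanLikelihood()
    model.train(trees)
    print(round(model.likelihood("the", "cat"), 3))  # 0.667
    print(round(model.likelihood("cat", "sat"), 3))  # 0.333
```

In a shift‐reduce parser, a score like this could be looked up whenever a reduce action would build a constituent over a span and then discretized into a feature. The boundary classifier itself (for example, a linear model over context words) is deliberately left out of this sketch.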