Building a Language-Independent Discourse Parser using Universal Networking Language

Article ID: iaor201528955
Volume: 31
Issue: 4
Start Page Number: 593
End Page Number: 618
Publication Date: Nov 2015
Journal: Computational Intelligence
Authors: ,
Keywords: information theory, networks
Abstract:

Discourse parsing has become an essential task in natural language processing. Parsing complex discourse structures beyond the sentence level is a significant challenge. This article proposes a discourse parser that constructs rhetorical structure (RS) trees to identify such complex discourse structures. Unlike previous parsers that construct RS trees using lexical features, syntactic features, and cue phrases, the proposed discourse parser constructs RS trees using high‐level semantic features inherited from the Universal Networking Language (UNL). Because the UNL represents texts in a language‐independent manner, it also makes the parser language independent. The parser uses a naive Bayes probabilistic classifier to label discourse relations. It has been tested using 500 Tamil‐language documents and the Rhetorical Structure Theory Discourse Treebank, which comprises 21 English‐language documents. The performance of the naive Bayes classifier has been compared with that of the support vector machine (SVM) classifier, which earlier approaches have used to build discourse parsers. The results indicate that the naive Bayes classifier is better suited to discourse relation labeling than the SVM classifier in terms of training time, testing time, and accuracy.
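
As a rough illustration of the relation-labeling step the abstract describes, the sketch below trains a small categorical naive Bayes classifier with Laplace smoothing on toy data. It is not the authors' implementation: the class name `NaiveBayesRelationLabeler`, the feature names `unl_relation` and `left_is_event`, the feature values, the relation labels, and the training examples are all hypothetical, chosen only to show how semantic features of a pair of text spans could be mapped to a discourse relation.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesRelationLabeler:
    """Categorical naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # smoothing constant
        self.label_counts = Counter()               # label -> number of examples
        self.value_counts = defaultdict(Counter)    # (label, feature) -> value counts
        self.feature_values = defaultdict(set)      # feature -> values seen in training

    def fit(self, examples):
        """examples: iterable of (feature_dict, relation_label) pairs."""
        for features, label in examples:
            self.label_counts[label] += 1
            for name, value in features.items():
                self.value_counts[(label, name)][value] += 1
                self.feature_values[name].add(value)
        return self

    def log_scores(self, features):
        """Return log P(label) + sum of log P(feature value | label) per label."""
        total = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / total)
            for name, value in features.items():
                counts = self.value_counts[(label, name)]
                # Include the queried value so unseen values get non-zero mass.
                n_values = len(self.feature_values[name] | {value})
                numerator = counts[value] + self.alpha
                denominator = sum(counts.values()) + self.alpha * n_values
                score += math.log(numerator / denominator)
            scores[label] = score
        return scores

    def predict(self, features):
        scores = self.log_scores(features)
        return max(scores, key=scores.get)


# Hypothetical toy data: each example describes a pair of adjacent text spans.
TOY_EXAMPLES = [
    ({"unl_relation": "rsn", "left_is_event": "yes"}, "Cause"),
    ({"unl_relation": "rsn", "left_is_event": "yes"}, "Cause"),
    ({"unl_relation": "con", "left_is_event": "yes"}, "Condition"),
    ({"unl_relation": "con", "left_is_event": "no"}, "Condition"),
    ({"unl_relation": "and", "left_is_event": "no"}, "Elaboration"),
    ({"unl_relation": "and", "left_is_event": "no"}, "Elaboration"),
]

labeler = NaiveBayesRelationLabeler().fit(TOY_EXAMPLES)
predicted = labeler.predict({"unl_relation": "rsn", "left_is_event": "yes"})
print(predicted)
```

Training here is a single counting pass over the examples and prediction is a sum of log-probabilities per label, which is one plausible reason a naive Bayes labeler can be cheap to train and test; the abstract itself reports only the outcome of the comparison with the SVM, not the cause.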
