Article ID: | iaor2016614 |
Volume: | 33 |
Issue: | 1 |
Start Page Number: | 92 |
End Page Number: | 106 |
Publication Date: | Feb 2016 |
Journal: | Expert Systems |
Authors: | Jadidinejad Amir H, Mahmoudi Fariborz, Meybodi M R |
Keywords: | information, networks |
A proper semantic representation of textual information underlies many natural language processing tasks. In this paper, a novel semantic annotator is presented that generates conceptual features for text documents. A comprehensive conceptual network is constructed automatically from Wikipedia and represented as a Markov chain. Given a fragment of natural language text, the semantic annotator initiates a random walk over this network to generate conceptual features that represent the topical semantics of the input text. The generated features are applicable to many natural language processing tasks in which the input is textual information and the output is a decision based on its context. Accordingly, the effectiveness of the generated features is evaluated on document clustering and classification. Empirical results demonstrate that representing text with conceptual features, and taking the relations between concepts into account, can significantly improve upon not only the bag‐of‐words representation but also other state‐of‐the‐art approaches.
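The abstract does not say which random-walk variant is used, how the input text is mapped onto starting concepts, or how transition weights are derived from Wikipedia. The following is therefore only a minimal sketch under stated assumptions: a random walk with restart (personalized PageRank) over a tiny hand-made concept graph, seeded by naive substring matching of concept names. The names `CONCEPTS`, `ADJACENCY`, `transition_matrix`, `seed_from_text` and `random_walk_with_restart`, the restart probability `alpha=0.15`, and the graph itself are all hypothetical illustrations, not the authors' method.

```python
import numpy as np

# Hypothetical toy concept network standing in for the Wikipedia-derived one.
CONCEPTS = ["machine learning", "statistics", "neural network", "football"]
ADJACENCY = np.array([
    [0, 1, 1, 0],  # machine learning -- statistics, neural network
    [1, 0, 0, 0],  # statistics -- machine learning
    [1, 0, 0, 0],  # neural network -- machine learning
    [0, 0, 0, 1],  # football: isolated (self-loop keeps its row stochastic)
], dtype=float)


def transition_matrix(adjacency):
    """Row-normalize link weights into a Markov chain transition matrix."""
    return adjacency / adjacency.sum(axis=1, keepdims=True)


def seed_from_text(text, concepts):
    """Assumed seeding step: distribution over concepts whose names occur in the text."""
    lowered = text.lower()
    counts = np.array([lowered.count(c.lower()) for c in concepts], dtype=float)
    if counts.sum() == 0:
        raise ValueError("no known concept is mentioned in the text")
    return counts / counts.sum()


def random_walk_with_restart(P, seed, alpha=0.15, tol=1e-10, max_iter=1000):
    """Stationary distribution of a walk that jumps back to `seed` with
    probability `alpha` at every step (personalized PageRank).

    P    : row-stochastic n x n transition matrix
    seed : restart distribution over the n concepts
    """
    r = seed.copy()
    for _ in range(max_iter):
        r_next = alpha * seed + (1 - alpha) * (r @ P)
        if np.abs(r_next - r).sum() < tol:  # L1 convergence check
            return r_next
        r = r_next
    return r


# Conceptual feature vector for a short text: weights over all concepts.
features = random_walk_with_restart(
    transition_matrix(ADJACENCY),
    seed_from_text("A neural network is trained on labelled images.", CONCEPTS),
)
```

In this toy run the text mentions only "neural network", yet the walk also assigns weight to the linked concepts "machine learning" and "statistics", while the unconnected "football" receives none. This spreading to related concepts, rather than counting only the words that appear, is the kind of behaviour the abstract attributes to its conceptual features.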