Article ID: | iaor2006494 |
Country: | United States |
Volume: | 23 |
Issue: | 2 |
Start Page Number: | 179 |
End Page Number: | 197 |
Publication Date: | Sep 2004 |
Journal: | Journal of Intelligent Information Systems |
Authors: | Petridis V., Fragkou P., Kehagias A. |
Keywords: | programming: dynamic |
In this paper we introduce a dynamic programming algorithm that performs linear text segmentation by globally minimizing a segmentation cost function. The cost function incorporates two factors: (a) within-segment word similarity and (b) prior information about segment length. We evaluate the segmentation accuracy of the algorithm using precision, recall, and Beeferman's segmentation metric. On a segmentation task based on Choi's text collection, the algorithm achieves the best segmentation accuracy so far reported in the literature. The algorithm also achieves high accuracy on a second task involving previously unused texts.
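The general idea of segmenting by dynamic programming can be illustrated with a minimal sketch. It is not the authors' implementation, and the cost function below is an assumption chosen only to combine the two factors named in the abstract: a squared deviation of segment length from an expected length, and the mean pairwise word overlap inside a segment. The function names and the parameters `mu`, `sigma`, `gamma`, and `r` are hypothetical.

```python
from math import inf
from typing import List, Sequence, Set


def similarity_matrix(sentences: Sequence[Set[str]]) -> List[List[int]]:
    """sim[i][j] is 1 when sentences i and j share at least one word, else 0."""
    n = len(sentences)
    return [[1 if sentences[i] & sentences[j] else 0 for j in range(n)]
            for i in range(n)]


def segment(sentences: Sequence[Set[str]], mu: float = 3.0, sigma: float = 1.0,
            gamma: float = 0.5, r: float = 1.0) -> List[int]:
    """Return the exclusive end index of each segment in a minimum-cost segmentation.

    Assumed (illustrative) cost of a segment covering sentences s..t-1:
        gamma * (len - mu)^2 / (2 * sigma^2) - (1 - gamma) * overlap / len^(2r)
    where overlap is the sum of sim[i][j] over all pairs inside the segment.
    """
    n = len(sentences)
    sim = similarity_matrix(sentences)

    # 2D prefix sums so that any within-segment overlap is an O(1) lookup.
    pre = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            pre[i + 1][j + 1] = sim[i][j] + pre[i][j + 1] + pre[i + 1][j] - pre[i][j]

    def cost(s: int, t: int) -> float:
        length = t - s
        overlap = pre[t][t] - pre[s][t] - pre[t][s] + pre[s][s]
        length_penalty = (length - mu) ** 2 / (2 * sigma ** 2)
        cohesion = overlap / length ** (2 * r)
        return gamma * length_penalty - (1 - gamma) * cohesion

    # best[t] is the minimum total cost of segmenting the first t sentences;
    # back[t] is the start index of the last segment in that optimum.
    best = [0.0] + [inf] * n
    back = [0] * (n + 1)
    for t in range(1, n + 1):
        for s in range(t):
            candidate = best[s] + cost(s, t)
            if candidate < best[t]:
                best[t] = candidate
                back[t] = s

    # Follow the back-pointers from the end to recover the segment boundaries.
    bounds = []
    t = n
    while t > 0:
        bounds.append(t)
        t = back[t]
    return bounds[::-1]
```

For six sentences given as word sets, where the first three share vocabulary with each other and the last three share a different vocabulary, `segment` with the default parameters returns `[3, 6]`, that is, one boundary after the third sentence. Because every possible last segment is considered for every prefix, the result is a global minimum of the assumed cost rather than a greedy approximation; the double loop makes the running time quadratic in the number of sentences.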