| Article ID: | iaor2006494 |
| Country: | United States |
| Volume: | 23 |
| Issue: | 2 |
| Start Page Number: | 179 |
| End Page Number: | 197 |
| Publication Date: | Sep 2004 |
| Journal: | Journal of Intelligent Information Systems |
| Authors: | Petridis V., Fragkou P., Kehagias A. |
| Keywords: | programming: dynamic |
In this paper we introduce a dynamic programming algorithm that performs linear text segmentation by globally minimizing a segmentation cost function. The cost function incorporates two factors: (a) within-segment word similarity and (b) prior information about segment length. We evaluate the segmentation accuracy of the algorithm using precision, recall and Beeferman's segmentation metric. On a segmentation task involving Choi's text collection, the algorithm achieves the best segmentation accuracy reported in the literature so far. The algorithm also achieves high accuracy on a second task involving previously unused texts.
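
The abstract does not state the cost function itself, so the following is only a minimal sketch of the general idea: a dynamic program that picks segment boundaries by globally minimizing a sum of per-segment costs. Everything specific here is an assumption made for illustration, not the authors' formulation: the function name `segment`, the binary "shares a word" similarity, the Gaussian-style length penalty with hypothetical parameters `mu` and `sigma`, and the weight `gamma` that balances the two factors.

```python
from typing import List, Sequence, Set


def segment(sentences: Sequence[Set[str]], mu: float, sigma: float,
            gamma: float = 0.5) -> List[int]:
    """Return segment end indices (exclusive) that minimise the total cost.

    Illustrative per-segment cost (an assumption, not the paper's formula):
      gamma * (length - mu)^2 / (2 * sigma^2)        length prior
      - (1 - gamma) * similar_pairs / length^2       within-segment similarity
    """
    n = len(sentences)

    # sim[i][j] = 1 if sentences i and j share at least one word, else 0.
    sim = [[1 if sentences[i] & sentences[j] else 0 for j in range(n)]
           for i in range(n)]

    # 2-D prefix sums, so the sum over any square block of sim costs O(1).
    pre = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            pre[i + 1][j + 1] = (sim[i][j] + pre[i][j + 1]
                                 + pre[i + 1][j] - pre[i][j])

    def cost(s: int, t: int) -> float:
        """Cost of one segment covering sentences s .. t-1."""
        length = t - s
        block = pre[t][t] - pre[s][t] - pre[t][s] + pre[s][s]
        prior = (length - mu) ** 2 / (2 * sigma ** 2)
        return gamma * prior - (1 - gamma) * block / length ** 2

    # best[t] = minimal total cost of segmenting the first t sentences.
    best = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)
    for t in range(1, n + 1):
        for s in range(t):
            c = best[s] + cost(s, t)
            if c < best[t]:
                best[t], back[t] = c, s

    # Follow the back-pointers from the end to recover the boundaries.
    bounds = []
    t = n
    while t > 0:
        bounds.append(t)
        t = back[t]
    return bounds[::-1]


# Toy input: three sentences about one topic followed by three about another.
docs = [{"cat", "sat"}, {"cat", "purred"}, {"cat", "slept"},
        {"stock", "rose"}, {"stock", "fell"}, {"stock", "split"}]
print(segment(docs, mu=3, sigma=1))  # prints [3, 6]
```

Because every candidate last segment is tried for every prefix, the minimum found is global for the chosen cost, at O(n^2) cost evaluations after an O(n^2) similarity precomputation. In this sketch, raising `gamma` makes the length prior dominate, while lowering it favors lexically cohesive segments regardless of length.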