Article ID: | iaor19952343 |
Country: | Switzerland |
Volume: | 55 |
Issue: | 1 |
Start Page Number: | 323 |
End Page Number: | 344 |
Publication Date: | May 1995 |
Journal: | Annals of Operations Research |
Authors: | Gale William A., Church Kenneth W., Yarowsky David |
Discrimination decisions arise in many natural language processing tasks. Three classical tasks are discriminating texts by their authors (author identification), discriminating documents by their relevance to some query (information retrieval), and discriminating multi-meaning words by their meanings (sense discrimination). Many other discrimination tasks arise regularly, such as determining whether a particular proper noun represents a person or a place, or whether a given word from some teletype text would be capitalized if both cases had been used. Areas for research based on observed shortcomings of the method are also discussed. In particular, an example in the author identification task shows the need for a robust version of the method. Also, the method makes an assumption of independence that is demonstrably false, yet there has been no careful study of the consequences of this assumption.