Word Sense Induction with Closed Frequent Termsets

Article ID:	iaor20173504
Volume:	33
Issue:	3
Start Page Number:	335
End Page Number:	367
Publication Date:	Aug 2017
Journal:	Computational Intelligence
Authors:	Kozlowski Marek, Rybinski Henryk
Keywords:	data mining, information

Abstract:

The article is devoted to the problem of word sense induction. We propose a method for inducing senses from a raw text corpus. The proposed sense induction algorithm (called SenseSearcher, or SnS) is based on closed frequent sets, and as a result, it provides a multilevel sense representation. To a large extent, it is a knowledge‐poor approach, as it does not need any kind of structured knowledge base about senses and there is no deep language knowledge embedded. By discovering a hierarchy of senses, the algorithm enables identifying subsenses (fine‐grained senses). SnS discovers not only frequent (dominating) senses but also infrequent ones (dominated). The method was evaluated in two main areas: lexicography and information retrieval. With the use of the SnS algorithm, we provide a tool able to induce from a textual corpus a structure of senses, with a varying number of granularity levels. In the area of information retrieval, SnS can be used for clustering search results according to the discovered senses. The experiments have shown that SnS performs better than the methods participating in the SemEval2013 WSI Task 11 competition, and most of the known search result clustering methods.
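
The abstract names closed frequent sets as the basis of SnS without giving details, so the following is only a minimal sketch of that underlying notion, not the authors' algorithm. It assumes each context of an ambiguous word is reduced to a set of co-occurring terms, and it calls a termset closed when no proper superset has the same support. The function name `closed_frequent_termsets`, the toy contexts for the word "apple", and the threshold `min_support=2` are all hypothetical, and the brute-force enumeration is suitable only for tiny vocabularies.

```python
from itertools import combinations


def closed_frequent_termsets(contexts, min_support):
    """Return {termset: support} for all closed frequent termsets (brute force)."""
    transactions = [frozenset(c) for c in contexts]
    vocabulary = sorted(set().union(*transactions))

    # Step 1: enumerate frequent termsets level by level.
    frequent = {}
    for size in range(1, len(vocabulary) + 1):
        found_any = False
        for combo in combinations(vocabulary, size):
            termset = frozenset(combo)
            support = sum(1 for t in transactions if termset <= t)
            if support >= min_support:
                frequent[termset] = support
                found_any = True
        if not found_any:
            break  # no frequent set of this size, so none of any larger size

    # Step 2: keep a termset only if no proper superset has the same support.
    closed = {}
    for termset, support in frequent.items():
        absorbed = any(
            termset < other and frequent[other] == support for other in frequent
        )
        if not absorbed:
            closed[termset] = support
    return closed


# Hypothetical contexts of the ambiguous word "apple" (illustrative data only).
contexts = [
    {"fruit", "tree", "juice"},
    {"fruit", "juice"},
    {"fruit", "tree"},
    {"iphone", "company", "mac"},
    {"iphone", "company"},
    {"company", "mac"},
]
result = closed_frequent_termsets(contexts, min_support=2)
for termset, support in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(termset), support)
```

On this toy data the closed sets fall into two groups, one around `fruit` and one around `company`, with the two-term sets nested under the single-term ones. That nesting is the kind of structure a multilevel sense representation could be built on; how SnS actually derives senses and subsenses from such sets is not specified in the abstract.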