Unsupervised clustering for nontextual web document classification

Article ID:	iaor2005531
Country:	Netherlands
Volume:	37
Issue:	3
Start Page Number:	377
End Page Number:	396
Publication Date:	Jun 2004
Journal:	Decision Support Systems
Authors:	Chan Samuel W.K., Chong Mickey W.C.
Keywords:	neural networks, internet

Abstract:

While the breadth of vocabulary used in long documents may mislead traditional keyword-based retrieval systems, the demand for techniques for nontextual Web classification and retrieval from large document collections is mounting. Only a few prototype systems have attempted to classify hypertext on the basis of nontextual elements in order to locate unfamiliar documents. As a result, a large portion of Web documents that are pictorial in nature is far beyond the reach of most current search engines. In this research, we devise a novel quantitative model of nontextual World Wide Web classification based on image information. An intelligent content-sensitive, attribute-rich image classifier is presented. An image similarity measure is used to deduce the likelihood among images. Different image feature vectors have been constructed and evaluated. Evaluation shows that images judged to be similar by humans form interesting clusters in our unsupervised learning. Comparison with other clustering techniques, such as Hierarchical Agglomerative Clustering, demonstrates that our approach is useful in content-based image information retrieval.
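
Illustrative sketch (not from the article): this record does not say which image features, similarity measure, or clustering model the authors used, so the following is only a minimal example of the general pipeline the abstract describes, namely building a feature vector per image, scoring pairwise similarity, and grouping images without labels. The coarse RGB histogram, the histogram-intersection similarity, the average-linkage merging rule, the 0.5 threshold, and all function names are assumptions chosen for illustration. The clustering shown is Hierarchical Agglomerative Clustering, the baseline the abstract names, not the authors' own approach.

```python
from itertools import combinations

def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram; pixels is a list of (r, g, b) in 0..255.
    Assumed feature vector, chosen only for illustration."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def similarity(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def agglomerative(features, threshold):
    """Average-linkage agglomerative clustering.
    Repeatedly merges the most similar pair of clusters and stops when the
    best pair's average similarity falls below the threshold."""
    clusters = [[i] for i in range(len(features))]

    def avg_sim(c1, c2):
        total = sum(similarity(features[i], features[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        if avg_sim(clusters[a], clusters[b]) < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations guarantees a < b
    return [sorted(c) for c in clusters]

# Four toy "images" given as pixel lists: two mostly red, two mostly blue.
images = [
    [(250, 10, 10)] * 8 + [(200, 20, 20)] * 2,
    [(240, 30, 30)] * 9 + [(10, 10, 250)] * 1,
    [(10, 10, 250)] * 10,
    [(20, 40, 220)] * 7 + [(250, 10, 10)] * 3,
]
features = [color_histogram(img) for img in images]
clusters = agglomerative(features, threshold=0.5)
print(clusters)  # [[0, 1], [2, 3]]
```

With these toy inputs the two reddish images and the two bluish images end up in separate clusters. A real system would need richer features (texture, shape, layout) than a color histogram.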
