Article ID: iaor20172809
Volume: 68
Issue: 4
Start Page Number: 777
End Page Number: 798
Publication Date: Aug 2017
Journal: Journal of Global Optimization
Authors: Kuang Da, Du Rundong, Drake Barry
Keywords: datamining, matrices, heuristics
With ever‐increasing volumes of text data available from numerous sources, the importance of unsupervised clustering and topic modeling is well recognized. Nonnegative matrix factorization (NMF) has proven to be a successful method for cluster and topic discovery in unlabeled data sets. In this paper, we propose a fast algorithm for computing NMF using a divide‐and‐conquer strategy, called DC‐NMF. Given an input matrix whose columns represent data items, we build a binary tree structure of the data items using a recently proposed efficient algorithm for computing rank‐2 NMF, and then gather information from the tree to initialize the rank‐
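
The tree-building step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `rank2_nmf` and `build_tree` are hypothetical, the rank-2 factorization uses plain Lee-Seung multiplicative updates in place of the efficient rank-2 algorithm the abstract refers to, and the rule "always split the largest leaf" is an assumption made only for illustration. The sketch stops at the tree and does not attempt the later initialization step.

```python
import numpy as np


def rank2_nmf(X, n_iter=200, seed=0, eps=1e-10):
    """Approximate a nonnegative m x n matrix X as W @ H with inner rank 2.

    Assumption: standard multiplicative updates stand in for the
    specialised rank-2 solver mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, 2)) + eps
    H = rng.random((2, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H elementwise nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def build_tree(X, n_leaves, min_size=2):
    """Split the columns of X into at most n_leaves groups by repeated rank-2 NMF.

    Returns a list of index arrays, one per leaf of the binary tree.
    Assumption: the largest leaf is split next (illustrative rule only).
    """
    leaves = [np.arange(X.shape[1])]
    while len(leaves) < n_leaves:
        leaves.sort(key=len)
        idx = leaves.pop()  # largest remaining leaf
        if len(idx) < min_size:
            leaves.append(idx)
            break
        _, H = rank2_nmf(X[:, idx])
        # Assign each column to the component with the larger coefficient.
        go_left = H[0] >= H[1]
        left, right = idx[go_left], idx[~go_left]
        if len(left) == 0 or len(right) == 0:
            leaves.append(idx)  # degenerate split: stop refining
            break
        leaves.extend([left, right])
    return leaves


# Usage on a small synthetic matrix with two column groups.
rng = np.random.default_rng(1)
X_demo = np.zeros((6, 8))
X_demo[:3, :4] = rng.random((3, 4)) + 1.0
X_demo[3:, 4:] = rng.random((3, 4)) + 1.0
demo_leaves = build_tree(X_demo, n_leaves=2)
```

Each leaf holds the column indices of one group of data items. Assigning columns by the larger of the two coefficients is one simple way to turn a rank-2 factorization into a binary split.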