Article ID: | iaor20116083 |
Volume: | 12 |
Issue: | 2 |
Start Page Number: | 67 |
End Page Number: | 79 |
Publication Date: | Jun 2011 |
Journal: | Information Technology and Management |
Authors: | Liu Bin, Cao Gui, He Wu |
Keywords: | data mining |
In the internet‐based e‐business environment, most business data are distributed, heterogeneous and private. To achieve true business intelligence, mining large amounts of distributed data is necessary. Through a thorough literature review, this paper identifies four main issues in distributed data mining (DDM) systems for e‐business and classifies modern DDM systems into three classes with representative samples. To address these identified issues, this paper proposes a novel DDM model named DRHPDM (Data source Relevance‐based Hierarchical Parallel Distributed data mining Model). In addition, to improve the quality of the final result, the data sources are divided into a centralized mining layer and a distributed mining layer, according to their relevance. To improve the openness, cross‐platform ability, and intelligence of the DDM system, web service and multi‐agent technologies are adopted. The feasibility of DRHPDM was verified by building a prototype system and applying it to a web usage mining scenario.
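As a rough illustration of the layering idea described in the abstract, the sketch below splits data sources into a centralized mining layer and a distributed mining layer according to a relevance score. The abstract does not say how relevance is measured or which sources go to which layer, so the `DataSource` type, the `relevance` values, the `threshold` parameter, the example source names, and the rule that higher-relevance sources are mined centrally are all assumptions made for illustration, not the DRHPDM method itself.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DataSource:
    """Hypothetical description of one e-business data source."""
    name: str
    relevance: float  # assumed score in [0, 1]; the paper's measure is not given here


def split_into_layers(
    sources: List[DataSource], threshold: float = 0.5
) -> Tuple[List[DataSource], List[DataSource]]:
    """Partition sources into (centralized, distributed) layers.

    Assumption: sources with relevance >= threshold are mined together
    in the centralized layer; the rest are mined locally in the
    distributed layer.
    """
    centralized = [s for s in sources if s.relevance >= threshold]
    distributed = [s for s in sources if s.relevance < threshold]
    return centralized, distributed


# Made-up sources and scores, used only to exercise the function.
sources = [
    DataSource("web_logs_site_a", 0.9),
    DataSource("web_logs_site_b", 0.7),
    DataSource("partner_catalog", 0.2),
]
centralized, distributed = split_into_layers(sources, threshold=0.5)
print("centralized:", [s.name for s in centralized])
print("distributed:", [s.name for s in distributed])
```

A fixed threshold is the simplest possible rule; a real system might instead cluster sources by pairwise relevance, which this sketch does not attempt.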