On the existence and significance of data preprocessing biases in web-usage mining

Article ID: iaor200444
Country: United States
Volume: 15
Issue: 2
Start Page Number: 148
End Page Number: 170
Publication Date: Apr 2003
Journal: INFORMS Journal On Computing
Authors: , ,
Keywords: artificial intelligence: decision support, statistics: empirical, data mining
Abstract:

The literature on web-usage mining is replete with data preprocessing techniques, which correspond to many closely related problem formulations. We survey data preprocessing techniques for session-level pattern discovery and compare three of these techniques in the context of understanding session-level purchase behavior on the web. Using real data on the browsing behavior of 20,000 users, collected over a period of six months, we build four different models (linear regressions, logistic regressions, neural networks, and classification trees) on data preprocessed with each of the three techniques. The results demonstrate that the three approaches lead to radically different conclusions and provide initial evidence that a data preprocessing bias exists, the effect of which can be significant.
