Article ID: | iaor200444 |
Country: | United States |
Volume: | 15 |
Issue: | 2 |
Start Page Number: | 148 |
End Page Number: | 170 |
Publication Date: | Apr 2003 |
Journal: | INFORMS Journal on Computing |
Authors: | Kimbrough Steven O., Zheng Zhiqiang, Padmanabhan Balaji |
Keywords: | artificial intelligence: decision support, statistics: empirical, datamining |
The literature on web-usage mining is replete with data preprocessing techniques, which correspond to many closely related problem formulations. We survey data preprocessing techniques for session-level pattern discovery and compare three of these techniques in the context of understanding session-level purchase behavior on the web. Using real data on the browsing behavior of 20,000 users, collected over a period of six months, we build four different models (linear regressions, logistic regressions, neural networks, and classification trees) on data preprocessed with each of the three techniques. The results demonstrate that the three approaches lead to radically different conclusions and provide initial evidence that a data preprocessing bias exists, the effect of which can be significant.