Article ID: | iaor20116023 |
Volume: | 30 |
Issue: | 3 |
Start Page Number: | 532 |
End Page Number: | 549 |
Publication Date: | May 2011 |
Journal: | Marketing Science |
Authors: | Singh Surendra N, Hillmer Steve, Wang Ze |
Keywords: | marketing, internet |
The World Wide Web contains a vast corpus of consumer-generated content that holds invaluable insights for improving the product and service offerings of firms. Yet text mining, the typical method for extracting diagnostic information from online content, has limitations. As a starting point, we propose analyzing a sample of comments before initiating text mining. Using a combination of real data and simulations, we demonstrate that a sampling procedure that selects respondents whose comments contain a large amount of information is superior to the two most popular sampling methods, simple random sampling and stratified random sampling, in gaining insights from the data. In addition, we derive a method that determines the probability of observing diagnostic information repeated a specific number of times in the population, which will enable managers to base sample size decisions on the trade-off between obtaining additional diagnostic information and the added expense of a larger sample. We provide an illustration of one of the methods using a real data set from a website containing qualitative comments about staying at a hotel and demonstrate how sampling qualitative comments can be a useful first step in text mining.
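The sample-size trade-off mentioned in the abstract can be illustrated with a small calculation. The sketch below is not the authors' derivation, which the abstract does not spell out. It assumes simple random sampling without replacement and uses the standard hypergeometric distribution to ask how likely a piece of diagnostic information, mentioned in K of N comments, is to appear at least r times in a sample of n comments. The function names `prob_at_least` and `smallest_sample_size` and all the numbers in the usage note are hypothetical.

```python
from math import comb


def prob_at_least(N: int, K: int, n: int, r: int = 1) -> float:
    """Probability that a topic mentioned in K of N comments shows up in
    at least r of n comments drawn by simple random sampling without
    replacement (hypergeometric model; an illustrative assumption)."""
    if not (0 <= K <= N and 0 <= n <= N and r >= 0):
        raise ValueError("require 0 <= K <= N, 0 <= n <= N, r >= 0")
    total = comb(N, n)
    # Sum the probabilities of seeing fewer than r mentions, then take
    # the complement. math.comb returns 0 when the lower index exceeds
    # the upper one, so impossible cases drop out automatically.
    fewer = sum(comb(K, j) * comb(N - K, n - j) for j in range(min(r, n + 1)))
    return 1 - fewer / total


def smallest_sample_size(N: int, K: int, target: float, r: int = 1):
    """Smallest n for which prob_at_least(N, K, n, r) reaches the target
    probability, or None if no sample size up to N reaches it."""
    for n in range(N + 1):
        if prob_at_least(N, K, n, r) >= target:
            return n
    return None
```

For instance, `smallest_sample_size(1000, 20, 0.95)` returns the smallest simple random sample, under these assumptions, that would surface a topic raised by 20 of 1,000 commenters at least once with 95% probability. A manager could compare that number against the cost of reading each additional comment.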