Article ID: | iaor20082871 |
Country: | United States |
Volume: | 55 |
Issue: | 5 |
Start Page Number: | 890 |
End Page Number: | 908 |
Publication Date: | Sep 2007 |
Journal: | Operations Research |
Authors: | Gopal Ram D., Garfinkel Robert S., Nunez Manuel A. |
Keywords: | datamining |
Data perturbation and query restriction are two methods developed to protect confidential data in statistical databases. In the former, the data are systematically changed to yield answers to queries that are statistically similar to those that would have resulted from the original data. The latter provides exact answers to queries as long as the risk of exact disclosure of confidential data does not become too great. We present a new methodology to combine these techniques so that the advantages of both are captured. The model is appropriate and computationally viable for large databases whether the queries are linear or nonlinear. The query restriction phase consists of finding an optimal subset of queries to answer exactly without compromising the database. This is an