Stochastic protection of confidential information in databases: a hybrid of data perturbation and query restriction

Article ID: iaor20082871
Country: United States
Volume: 55
Issue: 5
Start Page Number: 890
End Page Number: 908
Publication Date: Sep 2007
Journal: Operations Research
Authors: , ,
Keywords: data mining
Abstract:

Data perturbation and query restriction are two methods developed to protect confidential data in statistical databases. In the former, the data are systematically changed to yield answers to queries that are statistically similar to those that would have resulted from the original data. The latter provides exact answers to queries as long as the risk of exact disclosure of confidential data does not become too great. We present a new methodology to combine these techniques so that the advantages of both are captured. The model is appropriate and computationally viable for large databases whether the queries are linear or nonlinear. The query restriction phase consists of finding an optimal subset of queries to answer exactly without compromising the database. This is an NP-hard problem with a matroid intersection structure that lends itself to an efficient greedy heuristic. Then, given the queries that are answered exactly, we implement a data perturbation phase that provides stochastic protection and consistency. We present computational results on a large database with both linear and nonlinear queries. The results indicate that many queries can be answered exactly and the proposed perturbation approach provides more accurate answers than the standard perturbation method.
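
The two phases described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes linear sum queries only, treats the database as "compromised" when any single record value is exactly determined by the answered queries, considers queries greedily in the order given, and perturbs the data by adding Gaussian noise in the null space of the exactly answered queries so that those answers stay consistent. The function names (`compromises`, `greedy_exact_queries`, `perturb_consistently`) and the tolerance are hypothetical choices made for illustration.

```python
import numpy as np

TOL = 1e-9  # assumed numerical tolerance, not from the paper


def null_space_basis(Q):
    """Rows of the result span the null space of Q (computed via SVD)."""
    _, s, vt = np.linalg.svd(Q, full_matrices=True)
    rank = int(np.sum(s > TOL))
    return vt[rank:]


def compromises(Q):
    """True if some single record is exactly determined by the queries in Q.

    Record i is disclosed when the unit vector e_i lies in the row space
    of Q, i.e. when every null-space vector of Q is zero in position i.
    """
    basis = null_space_basis(Q)
    if basis.shape[0] == 0:
        return True  # full column rank: every record is determined
    return bool(np.any(np.all(np.abs(basis) < TOL, axis=0)))


def greedy_exact_queries(queries):
    """Greedily pick queries (in the given order) that can be answered
    exactly without disclosing any single record."""
    chosen = []
    for idx, q in enumerate(queries):
        trial = np.array([queries[j] for j in chosen] + [q], dtype=float)
        if not compromises(trial):
            chosen.append(idx)
    return chosen


def perturb_consistently(x, Q, scale, rng):
    """Add noise that lies in the null space of Q, so Q @ x is unchanged."""
    basis = null_space_basis(Q)
    z = rng.normal(0.0, scale, size=basis.shape[0])
    return x + z @ basis


# Hypothetical confidential values and sum queries over four records.
x = np.array([50.0, 60.0, 70.0, 80.0])
queries = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],  # together with the first query this would reveal record 3
    [0, 0, 1, 1],
    [1, 1, 1, 1],
]

chosen = greedy_exact_queries(queries)
Q_exact = np.array([queries[j] for j in chosen], dtype=float)
x_pert = perturb_consistently(x, Q_exact, scale=5.0, rng=np.random.default_rng(0))

print("answered exactly:", chosen)
print("exact answers preserved:", np.allclose(Q_exact @ x_pert, Q_exact @ x))
```

In this toy setting the chosen queries return their true answers from the perturbed data, while any other query is answered from the perturbed values and is therefore only approximately correct. Nonlinear queries, query priorities, and the matroid intersection structure mentioned in the abstract are outside the scope of this sketch.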
