Protecting Privacy Against Record Linkage Disclosure: A Bounded Swapping Approach for Numeric Data

Article ID: iaor201113556
Volume: 22
Issue: 4
Start Page Number: 774
End Page Number: 789
Publication Date: Dec 2011
Journal: Information Systems Research
Authors: ,
Keywords: experiment
Abstract:

Record linkage techniques have been widely used in areas such as antiterrorism, crime analysis, epidemiologic research, and database marketing. However, these techniques are also increasingly used for identity matching that leads to the disclosure of private information: they can effectively reidentify records even in deidentified data, which can severely erode individual privacy. Our study addresses this issue and offers a way to resolve the conflict between privacy protection and data utility. We propose a data-masking method that protects private information against record linkage disclosure while preserving the statistical properties of the data for legitimate analysis. Our method recursively partitions a data set into smaller subsets so that the records within each subset become more homogeneous after each partition. Each partition is made orthogonal to the maximum-variance dimension, represented by the first principal component of the set being partitioned. The attribute values of a record in a subset are then masked using a double-bounded swapping method. The proposed method, which we call multivariate swapping trees, is nonparametric and requires no assumptions about the statistical distribution of the original data. Experiments on real-world data sets show that the proposed approach significantly outperforms existing methods both in preventing identity disclosure and in preserving data quality.
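To make the two stages described in the abstract more concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. It assumes each set is split at the median of the records' projections onto the first principal component (found by power iteration on the covariance matrix), and that recursion stops at a hypothetical `leaf_size`. The abstract does not specify the cut point or stopping rule. The masking step is a plain stand-in, not the paper's double-bounded swapping. Values move only within a leaf (first bound) and only between rank-adjacent records whose values differ by at most a hypothetical `max_diff` (second bound). All function and parameter names are illustrative.

```python
import random

def first_principal_component(rows, iters=200):
    """Mean vector and unit first-PC direction, via power iteration on the covariance."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mu[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / max(n - 1, 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 + 0.1 * j for j in range(d)]  # arbitrary non-zero starting vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # no variance left: every direction is equally good
            break
        v = [x / norm for x in w]
    return mu, v

def partition(indices, rows, leaf_size):
    """Recursively split record indices at the median projection on the first PC.

    Assumption: median cut and a fixed leaf_size stopping rule (hypothetical)."""
    if len(indices) <= leaf_size:
        return [indices]
    mu, v = first_principal_component([rows[i] for i in indices])
    score = lambda i: sum((rows[i][j] - mu[j]) * v[j] for j in range(len(v)))
    ordered = sorted(indices, key=score)
    half = len(ordered) // 2  # the cut is a hyperplane orthogonal to v
    return (partition(ordered[:half], rows, leaf_size)
            + partition(ordered[half:], rows, leaf_size))

def bounded_swap(rows, leaves, max_diff, seed=0):
    """Hypothetical stand-in for double-bounded swapping (not the paper's rule).

    Bound 1: values are exchanged only among records in the same leaf.
    Bound 2: only rank-adjacent pairs whose values differ by <= max_diff swap."""
    rng = random.Random(seed)
    masked = [list(r) for r in rows]
    for leaf in leaves:
        for j in range(len(rows[0])):
            by_value = sorted(leaf, key=lambda i: rows[i][j])
            # disjoint rank-adjacent pairs: (0, 1), (2, 3), ...
            for a, b in zip(by_value[0::2], by_value[1::2]):
                if abs(rows[a][j] - rows[b][j]) <= max_diff and rng.random() < 0.5:
                    masked[a][j], masked[b][j] = rows[b][j], rows[a][j]
    return masked

# Usage on a small made-up data set (two numeric attributes per record).
data = [[1, 10], [2, 12], [3, 11], [10, 40], [11, 42], [12, 41], [20, 5], [21, 6]]
leaves = partition(list(range(len(data))), data, leaf_size=3)
masked = bounded_swap(data, leaves, max_diff=2)
```

Because swaps only permute values within a leaf, each attribute's values in every leaf are preserved as a multiset, so per-leaf means and variances stay exactly the same. Whether this simple stand-in matches the disclosure-risk and utility results reported for multivariate swapping trees is not something the sketch claims.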
