Article ID: | iaor201113556 |
Volume: | 22 |
Issue: | 4 |
Start Page Number: | 774 |
End Page Number: | 789 |
Publication Date: | Dec 2011 |
Journal: | Information Systems Research |
Authors: | Sarkar Sumit, Li Xiao-Bai |
Keywords: | experiment |
Record linkage techniques are widely used in areas such as antiterrorism, crime analysis, epidemiologic research, and database marketing. However, they are also increasingly used for identity matching that leads to the disclosure of private information: they can effectively reidentify records even in deidentified data and can therefore severely erode individual privacy. Our study addresses this problem and offers a way to resolve the conflict between privacy protection and data utility. We propose a data‐masking method that protects private information against record linkage disclosure while preserving the statistical properties of the data for legitimate analysis. The method recursively partitions a data set into smaller subsets so that the records within each subset become more homogeneous after each partition. Each partition is made orthogonal to the maximum-variance dimension, represented by the first principal component of the set being partitioned. The attribute values of a record in a subset are then masked using a double‐bounded swapping method. The proposed method, which we call
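The two steps the abstract describes, recursive partitioning orthogonal to the first principal component followed by swapping within each subset, can be sketched in standard-library Python. This is an illustrative sketch, not the authors' algorithm. Several details are assumptions not taken from the paper: the split point (the median of the projections onto the first principal component), the stopping rule (`min_size`), power iteration as the way to find that component, and the swap rule. The swap rule reads "double‐bounded" as requiring a swapped value to differ from the original by at least `lower` and at most `upper`. All function and parameter names are hypothetical.

```python
import random
from typing import List, Sequence

Record = List[float]


def first_principal_component(rows: Sequence[Record], iters: int = 100) -> List[float]:
    """Approximate the first principal component by power iteration on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    rng = random.Random(0)  # random start avoids beginning orthogonal to PC1
    v = [rng.random() + 0.5 for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # zero variance: every direction is equally good
            return v
        v = [x / norm for x in w]
    return v


def partition(rows: List[Record], min_size: int = 4) -> List[List[Record]]:
    """Hypothetical recursive split: cut each set at the median projection onto PC1."""
    if len(rows) < 2 * min_size:  # assumed stopping rule
        return [rows]
    pc = first_principal_component(rows)
    # Ordering records by their projection onto PC1 and cutting at the median
    # places the splitting hyperplane orthogonal to the maximum-variance direction.
    ordered = sorted(rows, key=lambda r: sum(a * b for a, b in zip(r, pc)))
    mid = len(ordered) // 2
    return partition(ordered[:mid], min_size) + partition(ordered[mid:], min_size)


def bounded_swap(subset: List[Record], lower: float, upper: float, seed: int = 0) -> List[Record]:
    """Assumed swap rule: exchange attribute values only when lower <= |difference| <= upper."""
    rng = random.Random(seed)
    masked = [list(r) for r in subset]
    for j in range(len(masked[0])):
        order = list(range(len(masked)))
        rng.shuffle(order)
        used = set()
        for i in order:
            if i in used:
                continue
            for k in order:
                if k != i and k not in used and lower <= abs(masked[i][j] - masked[k][j]) <= upper:
                    masked[i][j], masked[k][j] = masked[k][j], masked[i][j]
                    used.update((i, k))
                    break
    return masked


# Usage on synthetic data: 40 two-attribute records with different spreads.
data_rng = random.Random(42)
data = [[data_rng.gauss(0, 1), data_rng.gauss(0, 3)] for _ in range(40)]
groups = partition(data, min_size=5)
masked_groups = [bounded_swap(g, lower=0.1, upper=1.0) for g in groups]
masked = [rec for g in masked_groups for rec in g]
```

With 40 records and `min_size=5`, the recursion halves the data three times and yields eight subsets of five records each. Values are exchanged only within a subset, so each attribute's multiset of values in every subset is unchanged. This is one simple way a swapping scheme can keep subset-level summary statistics intact. The published method may impose further conditions.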