Article ID: | iaor20082719 |
Country: | Netherlands |
Volume: | 43 |
Issue: | 1 |
Start Page Number: | 181 |
End Page Number: | 191 |
Publication Date: | Feb 2007 |
Journal: | Decision Support Systems |
Authors: | Amiri Ali |
Data sanitization is a process used to promote the sharing of transactional databases among organizations while alleviating the concerns of individual organizations by preserving the confidentiality of their sensitive knowledge, which takes the form of sensitive association rules. It hides the frequent itemsets corresponding to the sensitive association rules by modifying the sensitive transactions that contain those itemsets. The process is guided by the need to minimize the impact on the data utility of the sanitized database, so that as much as possible of the non-sensitive knowledge, in the form of non-sensitive association rules, can still be mined from it. We propose three heuristic approaches for the sanitization problem. Results from extensive tests conducted on publicly available real datasets indicate that the approaches are effective and outperform a previous approach in terms of data utility, at the expense of computational speed. The proposed approaches also sanitize the databases with high data accuracy, resulting in little distortion of the released databases. We recommend that the database owner sanitize the database using the third proposed approach, which is a hybrid.
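As an illustration of the general idea only, the following minimal sketch hides a sensitive itemset by removing one item from supporting transactions until the itemset's support falls below a mining threshold. It is not one of the three heuristics proposed in the paper, which the abstract does not describe; the function names `support` and `sanitize`, the `min_support` parameter, and the rule for choosing which item to remove are all assumptions made for this example.

```python
from typing import FrozenSet, Iterable, List, Set


def support(db: Iterable[Set[str]], itemset: FrozenSet[str]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def sanitize(
    db: List[Set[str]],
    sensitive: List[FrozenSet[str]],
    min_support: int,
) -> List[Set[str]]:
    """Return a copy of `db` in which each sensitive itemset is infrequent.

    Naive greedy strategy (an assumption, not the paper's method): walk the
    transactions in order and drop one item of the sensitive itemset from
    each supporting transaction until support is below `min_support`.
    """
    out = [set(t) for t in db]  # copy so the original database is untouched
    for itemset in sensitive:
        for t in out:
            if support(out, itemset) < min_support:
                break  # itemset is already hidden
            if itemset <= t:
                # Arbitrary but deterministic choice of the item to remove.
                t.discard(min(itemset))
    return out


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    secret = frozenset({"a", "b"})
    released = sanitize(db, [secret], min_support=2)
    print(support(db, secret), support(released, secret))
```

A naive strategy like this can also lower the support of non-sensitive itemsets that share the removed item, which is the loss of data utility the abstract says a sanitization approach should keep small.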