
PRIVACY PRESERVING DECISION TREE LEARNING USING UNREALIZED DATA SETS

Beula Amalorpavam A, N. Mookhambika

Abstract


Privacy preservation is important for machine learning and data mining, but measures designed to protect private information often come at a cost: reduced utility of the training samples. This paper introduces a privacy-preserving approach for decision tree learning that does not sacrifice accuracy. It addresses the privacy of collected data samples in cases where part of the sample database has been lost. The approach converts the original sample data sets into a group of unreal data sets, from which the original samples cannot be reconstructed without the entire group. At the same time, an accurate decision tree can be built directly from these unreal data sets. The approach can be applied to data storage as soon as the first sample is collected, and it is compatible with other privacy-preserving techniques, such as cryptography, for additional protection.
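The abstract does not spell out how the unreal data sets are produced. As a rough illustration only, the Python sketch below assumes a dataset-complementation style scheme: take q copies of the universal set U (every possible tuple over small discrete attribute domains), subtract the samples S as multisets, and split the remainder into an "unrealized" set and a "perturbing" set. Any pattern count in S is then q·|U(pattern)| − |unrealized(pattern)| − |perturbing(pattern)|, which is enough to compute ID3-style information gain without keeping S. The schema, the greedy split, and the names `DOMAINS`, `unrealize`, `count`, and `info_gain` are hypothetical and are not the authors' algorithm.

```python
from collections import Counter
from itertools import product
from math import log2

# Hypothetical toy schema (assumption): every attribute has a small discrete domain,
# so the universal set of all possible tuples can be enumerated.
DOMAINS = {
    "outlook": ("sunny", "rain"),
    "windy": ("yes", "no"),
    "play": ("yes", "no"),  # class label
}
ATTRS = tuple(DOMAINS)


def universal_set():
    """Each possible tuple over the attribute domains, exactly once."""
    return Counter(product(*(DOMAINS[a] for a in ATTRS)))


def unrealize(samples):
    """Replace the samples with (q, unrealized, perturbing), where
    unrealized + perturbing == q * universal_set() - samples (as multisets)."""
    s = Counter(samples)
    q = max(s.values())  # enough copies of the universal set to cover every sample
    complement = +Counter({t: q - s[t] for t in universal_set()})  # drop zero counts
    # Greedy split (assumption): the unrealized set gets len(samples) tuples,
    # most frequent first; everything left over becomes the perturbing set.
    unrealized, remaining = Counter(), len(samples)
    for t, c in complement.most_common():
        if remaining == 0:
            break
        take = min(c, remaining)
        unrealized[t] = take
        remaining -= take
    perturbing = complement - unrealized
    return q, unrealized, perturbing


def count(q, unrealized, perturbing, pattern):
    """Samples matching `pattern` (attribute -> value), recovered as
    q*|U(pattern)| - |unrealized(pattern)| - |perturbing(pattern)|."""
    def matches(t):
        return all(t[ATTRS.index(a)] == v for a, v in pattern.items())
    total = q * sum(1 for t in universal_set() if matches(t))
    total -= sum(c for t, c in unrealized.items() if matches(t))
    total -= sum(c for t, c in perturbing.items() if matches(t))
    return total


def entropy(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)


def info_gain(q, unrealized, perturbing, attr, label="play"):
    """ID3-style information gain of `attr`, computed only from the released sets."""
    def class_counts(fixed):
        return [count(q, unrealized, perturbing, {**fixed, label: y})
                for y in DOMAINS[label]]
    base = class_counts({})
    n = sum(base)
    gain = entropy(base)
    for value in DOMAINS[attr]:
        branch = class_counts({attr: value})
        if sum(branch):
            gain -= sum(branch) / n * entropy(branch)
    return gain


samples = [
    ("sunny", "no", "no"),
    ("sunny", "yes", "no"),
    ("rain", "no", "yes"),
    ("rain", "no", "yes"),
    ("rain", "yes", "no"),
]
# In a real deployment only q, unrealized and perturbing would be stored.
q, unrealized, perturbing = unrealize(samples)
print(q, sum(unrealized.values()), sum(perturbing.values()))  # 2 5 6
print(round(info_gain(q, unrealized, perturbing, "outlook"), 3))  # 0.42
```

In this toy version neither released set alone reproduces the sample counts; the subtraction needs q, both sets, and the attribute domains. Enumerating the universal set is only practical for small discrete domains, and the sketch makes no formal privacy claim beyond that reconstruction property.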







This work is licensed under a Creative Commons Attribution 3.0 License.

International Journal of Information Technology & Computer Sciences Perspectives © Pezzottaite Journals.