Python Machine Learning Projects

Abstract:

In today's data-driven world, many untrusted controllers collect personal data. Giving the individual agency over her data protects her privacy and informational self-determination. Local anonymization lets each person anonymize her data before handing them to the data controller, which gives her maximum agency. Randomized response (RR) is a local anonymization method that yields multi-dimensional anonymized microdata suitable for exploratory analysis and machine learning. Pooling the randomized data of all individuals yields an unbiased estimate of the distribution of their true data. RR offers strict privacy guarantees. However, when RR is applied to multiple attributes, the curse of dimensionality reduces the accuracy of the estimated true data distribution. We propose complementary methods to mitigate this dimensionality problem. First, we present two basic protocols (separate RR on each attribute and joint RR on all attributes) and discuss their limitations. We then introduce an algorithm that clusters attributes so that different clusters can be treated as independent and joint RR can be performed within each cluster. After that, we introduce an adjustment algorithm for the randomized data set that repairs some of the accuracy lost by assuming independence, either between attributes when RR is applied separately to each attribute or between clusters in cluster-wise RR. We demonstrate the methods on empirical data.
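
To make the RR mechanism concrete, the following is a minimal sketch of RR on a single categorical attribute, together with the estimate of the true distribution from the pooled randomized reports. It is an illustration under stated assumptions, not the project's implementation: the function names (rr_matrix, randomize, estimate_distribution), the keep probability p, and the choice of a uniform off-diagonal randomization matrix are all hypothetical and do not come from the abstract.

```python
import numpy as np


def rr_matrix(r, p):
    """Assumed r x r randomization matrix: report the true category with
    probability p, otherwise one of the other r - 1 categories uniformly."""
    if r < 2:
        raise ValueError("need at least two categories")
    P = np.full((r, r), (1.0 - p) / (r - 1))
    np.fill_diagonal(P, p)
    return P


def randomize(values, P, rng):
    """Each individual replaces her true category u by a category drawn
    from row u of P (done locally, before release)."""
    values = np.asarray(values)
    r = P.shape[0]
    cum = np.cumsum(P, axis=1)          # row-wise cumulative probabilities
    u = rng.random(len(values))         # one uniform draw per individual
    # Inverse-CDF sampling: count how many cumulative bounds the draw exceeds.
    reported = (u[:, None] > cum[values]).sum(axis=1)
    return np.minimum(reported, r - 1)  # guard against rounding in cumsum


def estimate_distribution(reported, P):
    """Estimate the true distribution pi from the reports.
    The reported distribution satisfies lambda = P^T pi, so we solve for pi.
    Entries may fall slightly outside [0, 1] for small samples."""
    r = P.shape[0]
    lam = np.bincount(reported, minlength=r) / len(reported)
    return np.linalg.solve(P.T, lam)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_pi = np.array([0.6, 0.3, 0.1])          # illustrative values only
    true_values = rng.choice(3, size=100_000, p=true_pi)
    P = rr_matrix(3, p=0.7)
    reported = randomize(true_values, P, rng)
    print("estimated distribution:", estimate_distribution(reported, P))
```

The sketch also hints at why dimensionality matters: applying joint RR to several attributes means treating every combination of their categories as one category, so the size of the matrix grows with the product of the attributes' category counts, and the estimate has to be spread over that many cells.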

Note: Please discuss with our team before submitting this abstract to the college. The abstract or synopsis may vary depending on the student's project requirements.
