
A Differential Privacy Example for Beginners | by Lia Zheng | Dec, 2022



Image Credit: Dima Andrei

Differential privacy (DP) is a way to protect the privacy of individuals in a dataset while preserving its overall usefulness. Ideally, no one should be able to tell the difference between a dataset and a parallel one with a single record removed. To achieve this, randomized algorithms are used to add noise to the data.

As a simple example, imagine this: you are in a school with a total student body of 300 people. Each of you is asked, “Have you ever cheated on a test?”

See how this can be a sensitive question? Those who have cheated may be reluctant to respond yes, out of fear of potential repercussions. So… how do we resolve this issue? This is where DP comes in handy.

Each student flips a coin. If it lands on heads, they tell the truth. If it lands on tails, they flip another coin: if that one lands on heads, they respond no; if tails, yes.

a diagram to illustrate the coin flip

This way, even if your survey response gets released publicly, there is plausible deniability about the accuracy of your response. At the same time, with a large enough number of respondents, the school can still make use of the data; it doesn’t become completely useless.

We’ll look at this coin flip example implemented in Python code.

Matplotlib will be used to graph the data for easy visualization, and the random module implements the random coin flips.
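The original notebook cells are not reproduced on this page, so the snippets below are minimal sketches rather than the author’s exact code. The setup amounts to two imports:

```python
import random                    # simulates the coin flips
import matplotlib.pyplot as plt  # draws the bar charts
```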

Part 1: Without DP

First, we will see what the mock data looks like without differential privacy. Our “raw data” is represented with 0s and 1s, where 0 means “did not cheat” and 1 means “cheated.” Each binary value corresponds to a different student, and their true “cheating status” is the only metric recorded.
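A sketch of how the mock data and its bar chart could look (raw_data and plot_counts are illustrative names, not necessarily the author’s):

```python
# Mock raw data: 100 cheaters (1) and 200 non-cheaters (0), one entry per student.
raw_data = [1] * 100 + [0] * 200

def plot_counts(data, title):
    # Count each answer and draw a two-bar chart.
    cheated = sum(data)
    did_not_cheat = len(data) - cheated
    plt.bar(["did not cheat", "cheated"], [did_not_cheat, cheated])
    plt.title(title)
    plt.show()

plot_counts(raw_data, "Raw responses, no DP (300 students)")
```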

example output from cell above

As you can see, 100 students report cheating, and 200 report not cheating.

Let’s add a new student to the data. Assume that they actually did cheat. Now, run the code again to see how their data point affects the graph.

Because it’ll be hard to see such a small difference, use the matplotlib annotation tool to add numeric values to the bars.
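A sketch of this step, assuming matplotlib 3.4+ for plt.bar_label (older versions can use plt.annotate instead):

```python
# Student #301 actually cheated, so append a 1.
raw_data.append(1)

cheated = sum(raw_data)
did_not_cheat = len(raw_data) - cheated
bars = plt.bar(["did not cheat", "cheated"], [did_not_cheat, cheated])
plt.bar_label(bars)  # write the exact count above each bar
plt.title("Raw responses after adding student #301")
plt.show()
```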

example output from cell above

In this traditional survey, when we add a student, it’s very easy to tell the difference between the two results. If the “cheated” count goes up, the student cheated; if the “did not cheat” count goes up, they didn’t. Because the “cheated” column is now 101 instead of 100, we know that student #301 cheated.

Part 2: With DP

To implement DP, each student flips a coin. If it lands on heads, they answer truthfully. If it lands on tails, they flip another coin: if that one lands on heads, they respond that they haven’t cheated, and if it lands on tails, they respond that they have.
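A sketch of that randomized-response mechanism, reusing the illustrative raw_data and plot_counts from above:

```python
def randomized_response(true_answer):
    # First flip: heads means answer truthfully.
    if random.random() < 0.5:
        return true_answer
    # Tails: flip again. Heads -> report "did not cheat" (0), tails -> "cheated" (1).
    return 0 if random.random() < 0.5 else 1

dp_data = [randomized_response(answer) for answer in raw_data]
plot_counts(dp_data, "Responses with DP")
```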

If you run the code multiple times, you’ll notice that the graph changes every time. That’s because DP algorithms inject randomness, as we did here with the coin flips.

example output from cell above

Now we can illustrate one of the main purposes of differential privacy: it should not be possible to tell the results of one dataset apart from those of a parallel dataset that differs by a single record.

Let’s add another new student to the data. Assume that they actually did cheat.

Now, run the code to see how their data point affects the graph. If you run this multiple times, the graph will change each time.
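One way this step might look:

```python
# Student #302 also actually cheated.
raw_data.append(1)

# Each re-run of the mechanism reports slightly different counts.
dp_data = [randomized_response(answer) for answer in raw_data]
plot_counts(dp_data, "DP responses after adding student #302")
```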

example output from cell above

As you see, we can’t really tell if Student #302 cheated or not because the output results contain a degree of randomness.

Because of the algorithm’s randomness, this is where a bigger dataset comes in handy. Now assume we have 30,000 students instead of 300. Run the code below a few times: the DP graph no longer changes significantly, because the counts settle close to their expected probabilities.
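A sketch, assuming the true proportions stay the same (one third cheaters):

```python
# Scale the same true proportions up to 30,000 students.
big_data = [1] * 10_000 + [0] * 20_000
dp_big = [randomized_response(answer) for answer in big_data]
plot_counts(dp_big, "DP responses, 30,000 students")
```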

example output from cell above

Thus, by using DP, we preserve privacy: no one can reliably tell whether a given individual was in the dataset or not.

How good are these results?

As discussed above, the larger the dataset, the more accurate the results, because the randomness averages out to known probabilities. With the coin flip, the amount of noise is quite high: in expectation, about 25% of the reported answers are false. Knowing this, however, we can still work out the actual distribution of cheaters vs. non-cheaters.

Say c is the true proportion of students who cheated; then, 1-c is the true proportion of students who didn’t cheat. Say p_y is the proportion of students who reported that they cheated.

p_y is obtained by adding 3/4 of the proportion of cheaters to 1/4 of the proportion of non-cheaters, since we expect 1/4 of each group’s responses to be false:

p_y = (3/4)c + (1/4)(1 − c)

That simplifies down to:

p_y = c/2 + 1/4, or equivalently c = 2·p_y − 1/2

So c is estimated as twice the proportion that responded yes, minus 1/2, showing how we can recover the true proportion of cheaters from the DP results. Notice how the randomness of the coin flips reduces to fixed probabilities in aggregate, which is why a large dataset is needed to preserve accuracy.
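As a quick check, the estimator is a one-liner (estimate_true_proportion is an illustrative name):

```python
def estimate_true_proportion(p_y):
    # Invert the randomized response: c = 2 * p_y - 1/2.
    return 2 * p_y - 0.5

print(estimate_true_proportion(125 / 300))  # 0.333..., i.e. 1/3
```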

Our results showed that about 125 of 300 students (so p_y = 5/12) reported cheating. Plugging this in gives c = 2 × 5/12 − 1/2 = 5/6 − 1/2 = 1/3, which matches the real proportion.

Conclusion

This coin flip example is a very simplified version of how DP can be used to preserve privacy. The coin flip anonymizes the responses, giving each one “plausible deniability,” while still allowing about 75% of the responses to be true. We can then calculate “backwards” to find a good estimate of the true proportion of each response, an estimate that becomes very accurate with large enough datasets.

The great advantage highlighted by this example is individual privacy: with DP, the results of two datasets that differ by one user are essentially indistinguishable, which protects privacy at the individual level.


