Machine Learning in Web 3.0. Understanding how Web3.0 enables… | by Divyam Shah | Aug, 2022

By Jessie Hobb On Aug 8, 2022

Understanding how Web3.0 enables machine learning to work while preserving privacy and maintaining trust.

Blockchain for privacy-preserving ML (source: Unsplash)

Machine Learning is all about discovering latent relationships within data. These relationships can then be used to gain actionable insights for any business.

However, recent years have seen growing awareness of data ownership among governments (e.g., the GDPR in Europe) and users in general. Today, people are not okay with their private data being used for analytics at the whim of a company.

This leads to the following dilemma:

Companies need to perform analytics on user data to better understand and improve their services. The more data they can get, the better their insights. This creates a strong incentive to acquire data from other users or third-party companies. There is also a strong monetary incentive to sell data as an asset to other companies.
Users realize that their personal data can be misused or leaked to other companies. This discourages them from sharing data with companies.

How can we solve this dilemma?

Web3.0 has popularized Zero Knowledge Proofs (ZKPs), a cryptographic idea dating back to the 1980s, as a way to solve such problems.

Zero Knowledge Proofs are useful in applications where data privacy is paramount. ZKPs enable an application to prove that an insight from data is correct without revealing the underlying data.

A ZKP can be understood as an interaction between two players, a Prover and a Verifier.

Prover: A prover executes a computation and wants to prove to any third party that the computation was valid.
Verifier: A verifier’s role is to check that a computation done by someone else was valid. In the setup described here, this is done by accepting a witness from the user: a sample test data point containing inputs and the corresponding expected outputs. Users rely on this check to validate the authenticity of a prover.

Both Prover and Verifier are deployed onto a Blockchain Network to maintain transparency.
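To make the two roles concrete, here is a sketch of a textbook interactive ZKP, Schnorr's identification protocol. The Prover convinces the Verifier that it knows a secret exponent x with y = g^x mod p, without revealing x. Note that in this textbook form the Verifier checks a short transcript (commitment, challenge, response) rather than test data. The tiny group parameters and all class names are illustrative assumptions and provide no real security.

```python
import secrets

# Toy parameters (illustrative only): p = 2q + 1 with q prime, and G = 4 = 2^2 mod p
# is a quadratic residue, so it generates the subgroup of prime order q.
P, Q, G = 2039, 1019, 4


class Prover:
    def __init__(self, secret_x):
        self.x = secret_x % Q          # the secret that is never sent
        self.y = pow(G, self.x, P)     # public value everyone can see
        self._r = None

    def commit(self):
        # Step 1: pick a fresh random nonce r and send t = g^r mod p.
        self._r = secrets.randbelow(Q)
        return pow(G, self._r, P)

    def respond(self, c):
        # Step 3: answer the challenge with s = r + c*x mod q (r masks x).
        return (self._r + c * self.x) % Q


class Verifier:
    def __init__(self, y):
        self.y = y
        self.c = None

    def challenge(self):
        # Step 2: send a random challenge c.
        self.c = secrets.randbelow(Q)
        return self.c

    def check(self, t, s):
        # Step 4: accept iff g^s == t * y^c (mod p), which holds when s was built from x.
        return pow(G, s, P) == (t * pow(self.y, self.c, P)) % P


def run_protocol(prover, verifier, rounds=20):
    """A prover without x passes one round only if c happens to be 0 (prob. 1/q)."""
    for _ in range(rounds):
        t = prover.commit()
        c = verifier.challenge()
        s = prover.respond(c)
        if not verifier.check(t, s):
            return False
    return True
```

For example, `run_protocol(Prover(123), Verifier(Prover(123).y))` accepts, while a prover holding a different exponent is rejected. The Verifier learns nothing about x: anyone can produce an equally valid-looking transcript by choosing c and s first and setting t = g^s · y^(-c) mod p.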

Let’s take an example of a Sudoku Puzzle to demonstrate the working of a ZKP. Here Bob is trying to solve a Sudoku Puzzle and Alice says that she knows the solution. Bob is now unsure whether Alice actually knows the solution. To give confidence to Bob, Alice would have to reveal the solution. But this would spoil the fun for Bob! Is there any other way to give Bob confidence that Alice knows the solution without actually revealing it? That’s exactly what a ZKP enables.

A practical example to demonstrate Prover-Verifier interactions. Image by Author

Alice can deploy a ZKP (a Prover-Verifier pair) that takes a section of the puzzle as input and returns the solution to that section. Bob can then send a section he has already solved, alongside the expected outcome, as a ‘Witness’ to the ZKP’s Verifier. The Verifier checks whether Alice’s Prover produces the same solution as Bob’s expected one. This gives Bob confidence that Alice actually knows the solution. He can then interact directly with the Prover for hints on specific sections of the puzzle.
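The Prover-Verifier interaction above is described informally. For comparison, the classic interactive zero-knowledge protocol for Sudoku works without Alice solving any section for Bob: in each round she relabels the digits of her solution with a secret random permutation, commits to every cell with a salted hash, and opens only the one row, column, box, or set of given cells that Bob picks. The sketch below assumes a 4x4 grid, SHA-256 commitments, and hypothetical names throughout; Prover and Verifier steps share one function for brevity.

```python
import hashlib
import random
import secrets

N, B = 4, 2  # a 4x4 Sudoku with 2x2 boxes keeps the example small
DIGITS = list(range(1, N + 1))

PUZZLE = [[1, 0, 0, 0],     # 0 marks an empty cell
          [0, 0, 1, 0],
          [0, 1, 0, 0],
          [0, 0, 0, 1]]
SOLUTION = [[1, 2, 3, 4],   # Alice's secret
            [3, 4, 1, 2],
            [2, 1, 4, 3],
            [4, 3, 2, 1]]

# Bob may ask for any row, column or box, or for the given cells.
CHALLENGES = [(k, i) for k in ("row", "col", "box") for i in range(N)] + [("givens", None)]


def commit(value):
    """Salted SHA-256 commitment to one digit: returns (digest, salt)."""
    salt = secrets.token_bytes(16)
    return hashlib.sha256(salt + bytes([value])).hexdigest(), salt


def opens_to(digest, value, salt):
    return hashlib.sha256(salt + bytes([value])).hexdigest() == digest


def cells_for(kind, i):
    """Coordinates of row i, column i, or box i (boxes numbered left-right, top-bottom)."""
    if kind == "row":
        return [(i, c) for c in range(N)]
    if kind == "col":
        return [(r, i) for r in range(N)]
    r0, c0 = (i // B) * B, (i % B) * B
    return [(r0 + dr, c0 + dc) for dr in range(B) for dc in range(B)]


def run_round(puzzle, solution, challenge):
    # Prover: relabel digits with a fresh random permutation, then commit to every cell.
    shuffled = DIGITS[:]
    random.SystemRandom().shuffle(shuffled)
    perm = dict(zip(DIGITS, shuffled))
    grid = [[perm[v] for v in row] for row in solution]
    commitments = [[commit(v) for v in row] for row in grid]
    digests = [[d for d, _ in row] for row in commitments]  # published before the challenge

    # Prover: open only the cells the challenge asks for.
    kind, i = challenge
    if kind == "givens":
        cells = [(r, c) for r in range(N) for c in range(N) if puzzle[r][c]]
    else:
        cells = cells_for(kind, i)
    opened = {(r, c): (grid[r][c], commitments[r][c][1]) for r, c in cells}

    # Verifier: every opening must match its previously published digest.
    if not all(opens_to(digests[r][c], v, s) for (r, c), (v, s) in opened.items()):
        return False
    if kind == "givens":
        # The permutation is revealed only here; the givens must map through it.
        return sorted(perm.values()) == DIGITS and all(
            perm[puzzle[r][c]] == v for (r, c), (v, _) in opened.items())
    # A row, column or box must contain every digit exactly once.
    return sorted(v for v, _ in opened.values()) == DIGITS


def prove(puzzle, solution, rounds=100):
    """Each round catches an invalid grid with probability at least 1/len(CHALLENGES)."""
    rng = random.SystemRandom()
    return all(run_round(puzzle, solution, rng.choice(CHALLENGES)) for _ in range(rounds))
```

Because every opened row, column or box is just a random relabelling of 1..4, Bob learns nothing about the actual solution, yet repeating the round many times makes it very unlikely that a grid Alice does not really have survives.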

So how exactly can ZKPs enable privacy-preserving machine learning?

There are two distinct scenarios where ZKPs can enable privacy-preserving machine learning:

1. Analytics on Private Data.

2. Data and ML Algorithm Marketplaces.

Analytics on Private Data

In this scenario the data is private, but the model has to be publicly accessible to give users confidence in the nature of the insights extracted and to reach a larger audience.

We will take the example of a credit scoring application. Here one might need a user's private details such as age, gender, salary, past debt repayment records, and monthly expenses. These are then used to compute a credit score for the user. Today there is no way to know for sure whether companies that compute credit scores respect user privacy. They could use this data for other applications or sell it to other parties without user consent. Putting the whole application on a public blockchain such as Ethereum makes its behavior transparent, so users can see that their data is used only as they consented.

There is one major problem with putting the whole credit scoring application on Ethereum. The ML model computing the credit score might be a company's proprietary technology, so the company has no incentive to host it on a public blockchain like Ethereum, where anyone could inspect it. This is where ZKPs come into the picture. Rather than encrypting the model, a ZKP lets the core ML logic run inside a Prover that proves the computation was carried out correctly without revealing private inputs such as the model's weights. One can think of a Prover as an entity that executes computations in a black box without revealing the contents of the box to anyone. zk-SNARKs and zk-STARKs are popular proof systems for constructing such black boxes (Provers).
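A full zk-SNARK is far too involved to sketch here, but a common ingredient in such designs is a commitment: Alice publishes a short fingerprint of her model that hides the weights yet binds her to them, so she cannot quietly swap models later. The sketch below assumes a salted SHA-256 hash as the commitment scheme and uses hypothetical weight names; it illustrates only the hiding-and-binding idea and proves nothing about the computation itself.

```python
import hashlib
import json
import secrets


def _digest(weights, salt):
    # Canonical JSON (sorted keys) so the same weights always hash the same way.
    payload = json.dumps({"salt": salt, "weights": weights}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def commit_to_model(weights):
    """Return (commitment, salt). Publish the commitment; keep the salt and weights private."""
    salt = secrets.token_hex(16)  # random salt stops anyone from guessing weights by brute force
    return _digest(weights, salt), salt


def verify_opening(commitment, weights, salt):
    """True only if these exact weights and salt produced the published commitment."""
    return _digest(weights, salt) == commitment


# Hypothetical credit-scoring weights, for illustration only.
weights = {"age": 0.8, "income": 1.5, "expenses": -1.2, "bias": 300.0}
commitment, salt = commit_to_model(weights)
```

Alice would post `commitment` publicly; in a full zero-knowledge ML system, each proof would additionally show that the credit score was computed with the committed weights.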

We’ll continue with the credit scoring example to see how something like this can be implemented. Suppose Alice runs a company that computes credit scores from users' private data. The app is deployed within the ZKP framework to protect users' data privacy. Any user, such as Bob, can validate the correctness of this ZKP app by passing the Verifier a witness comprising test values (age, income, expenses) along with the expected result (a credit score range). The Verifier uses these values to check whether the Prover's output matches the expected output in the witness.

User (Bob) verifies the correctness of credit computation using a witness. Image by Author
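The witness check described above can be simulated in plain Python to show the message flow. Everything here is hypothetical: `alice_prover` stands in for Alice's proprietary model with made-up coefficients, and no cryptography is involved, so this illustrates only who sends what, not the zero-knowledge property (a real Verifier would check a proof rather than call the model directly).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Applicant:
    age: int
    income: float    # annual income
    expenses: float  # annual expenses


def alice_prover(a: Applicant) -> int:
    """Hypothetical stand-in for Alice's model: a toy score clamped to 300-850."""
    savings_rate = (a.income - a.expenses) / a.income if a.income > 0 else 0.0
    raw = 300 + 4 * min(a.age, 60) + 600 * max(savings_rate, 0.0)
    return int(min(max(raw, 300), 850))


@dataclass(frozen=True)
class Witness:
    inputs: Applicant                 # Bob's test values
    expected_range: tuple[int, int]   # the credit score range Bob expects


def verifier(prover, witness: Witness) -> bool:
    """Accept the prover only if its output falls inside the witness's expected range."""
    low, high = witness.expected_range
    return low <= prover(witness.inputs) <= high


w = Witness(Applicant(age=35, income=60_000, expenses=30_000), (700, 780))
print(verifier(alice_prover, w))  # the toy model scores this applicant 740
```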

Once Bob is confident that the credit score computation is correct, he can interact directly with the Prover, which keeps certain inputs private (age, income) and others public (expenses). The Prover runs the computation and returns the credit score as output. This way, Alice can perform computations (analytics) on Bob’s data without compromising his privacy.

User (Bob) interacts with the Prover directly on a day-to-day basis to get credit scores. Image by Author

Data Marketplaces

ML models get better with data. This creates a strong incentive for companies to sell and purchase data from one another. Today there is no way to validate the quality of a dataset without actually sharing it. This is where ZKPs come to the rescue again.

ZKPs can be used to construct a Prover that runs checks on a dataset to confirm that it satisfies agreed constraints or properties (for example, a minimum number of rows or no missing values) as a sanity check. The Prover keeps the data private, and the fact that it runs on a public blockchain guards against malpractice in the verification process. This gives the purchaser confidence in the data quality before buying the dataset.

If Alice wants to sell a private dataset to Bob, she must show that the data meets the constraints Bob specifies. A ZKP app is constructed according to Bob’s constraints and deployed onto a public blockchain. Alice can then pass her private dataset as input to the Prover and demonstrate that her data meets all the necessary constraints.

Bob validates data before purchasing from Alice. Image by Author
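The constraint checks Bob specifies could look like the sketch below. The constraint names, columns, and thresholds are hypothetical; in the setup described above these checks would run inside the Prover so that only the pass/fail report, never the rows, is revealed, whereas here they are plain Python with no proof attached.

```python
def min_rows(n):
    return lambda rows: len(rows) >= n


def no_missing(columns):
    return lambda rows: all(row.get(col) is not None for row in rows for col in columns)


def in_range(column, low, high):
    # Missing values are left to no_missing(); here they simply fail the check.
    return lambda rows: all(
        row.get(column) is not None and low <= row[column] <= high for row in rows)


def check_dataset(rows, constraints):
    """Run every named constraint; only this report would leave the Prover."""
    return {name: check(rows) for name, check in constraints.items()}


# Hypothetical constraints Bob might agree on with Alice.
BOB_CONSTRAINTS = {
    "at least 3 rows": min_rows(3),
    "no missing values": no_missing(["age", "income"]),
    "age in 18-100": in_range("age", 18, 100),
}

alice_rows = [{"age": 30, "income": 50_000},
              {"age": 45, "income": 72_000},
              {"age": 29, "income": None}]
report = check_dataset(alice_rows, BOB_CONSTRAINTS)
bob_buys = all(report.values())
```

Keeping each constraint as a small named predicate makes the agreed rules easy to publish alongside the ZKP app, while the dataset itself stays with Alice.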

ML Algorithm Marketplaces

Sometimes the problem at hand is selling an ML algorithm and its associated pre-trained model. Here too, the purchaser needs to validate what the seller is offering before buying it. ZKPs can be used here in much the same way as in the Data Marketplaces problem discussed above.

We can deploy a pre-trained model into a Prover via ZKPs. The purchaser can then send a test dataset to verify if the model performs satisfactorily on the test set. This ensures that the purchaser has confidence in the ML Algorithm before purchasing it.

The illustration below shows how Alice trains an ML model and deploys it into the ZKP framework. This lets Bob test the model's performance on a custom test dataset before actually purchasing the ML algorithm from Alice.

Bob validates Alice’s ML model before purchase. Image by Author
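Bob's evaluation step can be sketched as follows. The model, test set, and accuracy threshold are all hypothetical; in the ZKP setting the evaluation would run inside the Prover and reveal only the resulting accuracy, whereas this plain-Python version shows just the decision logic.

```python
def alice_model(x):
    """Hypothetical pre-trained model: predicts low risk (1) if yearly savings exceed 10,000."""
    return 1 if x["income"] - x["expenses"] > 10_000 else 0


def evaluate(model, test_set):
    """Fraction of (features, label) pairs the model gets right."""
    correct = sum(model(x) == y for x, y in test_set)
    return correct / len(test_set)


def purchase_decision(model, test_set, min_accuracy):
    """Bob buys only if the model reaches his accuracy threshold on his own test set."""
    acc = evaluate(model, test_set)
    return acc >= min_accuracy, acc


# Bob's hypothetical labelled test set: (features, true label).
BOB_TEST_SET = [
    ({"income": 60_000, "expenses": 30_000}, 1),
    ({"income": 40_000, "expenses": 38_000}, 0),
    ({"income": 90_000, "expenses": 50_000}, 1),
    ({"income": 50_000, "expenses": 45_000}, 1),
]
```

Here the model gets three of the four examples right, so `purchase_decision(alice_model, BOB_TEST_SET, min_accuracy=0.7)` returns `(True, 0.75)`, while a stricter threshold of 0.8 leads Bob to decline.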