Naive Bayes Classifier from Scratch, with Python | by Piero Paialunga | Jan, 2023


Photo by Joel Abraham on Unsplash

Math and Physics are full of theorems, equations, principles, axioms, and corollaries. When I started studying Physics, I remember getting to the point where all the courses I took had the same structure:

A. Defining the fundamental assumptions
B. Using math to build the next “brick of the wall”
C. Stacking one brick on top of the other until all the pieces come together into an elegant, beautiful model of a portion of the world

Let’s take the first course I ever took in Physics: calculus.

1. You start with the fundamental assumptions of sets and numbers. You start defining natural, integer, real, and complex numbers.
2. From there you start defining functions that are nothing but a map from space A (let’s say N-dimensional real space) to space B (let’s say a 1D real space).
3. Then you start with the study of functions. So you start analyzing their minima, maxima, and saddle points. You accidentally (oops!) get to know the concept of “derivative”.
4. Then you see how you can integrate a function, which is, in a sense, the inverse of the derivative.
5. Then you combine these things with differential equations.

Don’t get me wrong: that process is amazing. I loved seeing how far human logic can take you. I loved seeing that very complex natural events can be derived by starting from very simple concepts that evolve into much deeper implications.

Another valuable science lesson is the fact that a theorem that is originally applied to “A” can also be applied to “B”, “C”, “D” and “E”.
The fascinating aspect is that the field “A” and the other fields (B, C, D, and E) don’t have to be related.

Possibly the greatest example of that is Bayes’ Theorem.
Bayes’ theorem is, in a certain way, an elementary and obvious concept. The incredibly cool thing is that we can build some very interesting algorithms by stretching the idea behind Bayes’ Theorem and applying it to Machine Learning. This tool is called the Naive Bayes Classifier.

In this article:

  1. We will give a brief introduction to Bayes’ theorem. We will explain what it is, why it is important, and how it can be applied to Machine Learning.
  2. We will see an application of the Bayes theorem in a made-up classification task.
  3. We will see a leveled-up version of the Bayes theorem using the so-called Gaussian Naive Bayes classifier.

I’m so excited. Let’s start this!

There is one definition of Bayes’ Theorem that has really stuck in my brain:

“The Bayes Theorem is that theorem that proves that just because your car is blue, it doesn’t mean that all the cars of the world are blue; but if all the cars of the world are blue, then your car must be blue.”

If you read this article about Bayesian Neural Networks, you probably recognize that it is the same definition, but I swear I didn’t use the same definition twice because I’m lazy! I mean… I am lazy, but that is not the reason.

I use that definition because it helps us understand that the probability of event A given event B is not the same as the probability of event B given event A.

Let me give another example to overcome my laziness.

Let’s say we have two bowls.

Now, let’s say that these two bowls are full of basketballs and footballs (it’s called football, not soccer).

Image by author

Now, let’s say that I ask you this question:

“KNOWING that I picked a ball from the blue bowl, what is the probability that I picked a football?”

Well, the answer is very simple. The probability is 1, because in the blue bowl I only have footballs. Now let’s say that I pick a bowl at random with equal probability (0.5 probability of picking the white bowl and 0.5 probability of picking the blue bowl). Let me ask you this question:

“KNOWING that I picked a football, what is the probability that I picked that ball from the blue bowl?”

As you can see, the question is kind of the same, but event A (I picked a football) and event B (I picked the blue bowl) appear in the opposite order.
Now, I think you have guessed that the answer is not the same, because I could have picked the football from the white bowl rather than the blue one. I also think you have guessed that this probability is still high, though, because there is only one football in the white bowl.

Let’s do the experiment in Python.
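Here is a minimal sketch of such an experiment. The white bowl’s exact contents (one football and four basketballs) and the specific values of N are my assumptions, chosen to be consistent with the numbers derived later in the article, not the original code:

```python
import random

# Bowl contents, reconstructed from the text: the blue bowl holds only
# footballs (the exact count doesn't matter), while the white bowl is
# assumed to hold one football and four basketballs, consistent with the
# 5/6 result derived later in the article.
blue_bowl = ["football"] * 5
white_bowl = ["football"] + ["basketball"] * 4

def estimate_probability(n_iterations, seed=0):
    """Frequentist estimate of P(picked from blue bowl | picked a football)."""
    rng = random.Random(seed)
    footballs, footballs_from_blue = 0, 0
    for _ in range(n_iterations):
        # Pick a bowl with probability 0.5, then a ball uniformly at random.
        bowl_name, bowl = rng.choice([("blue", blue_bowl), ("white", white_bowl)])
        ball = rng.choice(bowl)
        if ball == "football":
            footballs += 1
            if bowl_name == "blue":
                footballs_from_blue += 1
    return footballs_from_blue / footballs if footballs else float("nan")

# Increasing values of N (these specific values are illustrative).
for n in [20, 50, 70, 1_000, 10_000, 100_000, 1_000_000]:
    print(n, estimate_probability(n))
```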

I could go into line-by-line detail, but I’d find that pretty boring. What the code does is nothing but set up the bowls-and-balls situation we described before and then run the probabilistic experiment N times. At the end of these N iterations, we have N results, and we compute our probability as a frequentist (relative-frequency) estimate. In this case, we will compute:

P(blue bowl | football) ≈ (number of times a football was picked from the blue bowl) / (number of times a football was picked)

Now, we know that if we could run the experiment an infinite number of times, the estimate would converge to the analytical result, so we will iteratively increase N and see what value it converges to.

As you can see, we run it for 20, 50, 70, …, up to 1M iterations.
What is the result of this?

Image by author

Hmm, so it seems like the estimate does converge to something. Is there an analytical result behind it? Let’s find out.

So, Bayes’ theorem tells us that:

P(blue bowl | football) = P(football | blue bowl) · P(blue bowl) / P(football)

So the probability that an extracted football comes from the blue bowl is equal to the probability of extracting a football from the blue bowl (notice the difference! this is the probability of extracting a football GIVEN THE FACT THAT YOU PICKED THE BLUE BOWL), multiplied by the probability of picking the blue bowl, divided by the probability of extracting a football.

We know that:

P(football | blue bowl) = 1

This is because there are only footballs in the blue bowl. We also assume that we pick each of the two bowls with equal probability.
So we have:

P(blue bowl) = P(white bowl) = 1/2

Now, the probability of picking a football is the sum of the probability of picking a football from the white bowl and the probability of picking a football from the blue bowl (the law of total probability):

P(football) = P(football | white bowl) · P(white bowl) + P(football | blue bowl) · P(blue bowl)

So we have:

P(football) = 1/5 · 1/2 + 1 · 1/2 = 3/5

Because:

P(football | white bowl) = 1/5 (only one of the five balls in the white bowl is a football)

So we have:

P(blue bowl | football) = (1 · 1/2) / (3/5) = 5/6 ≈ 0.83

And if we draw the 5/6 line on the plot we made before, we get:

Image by author

We did the math right 😄

Ok so at this point I am positive you are all thinking:

“What does it have to do with Machine Learning at all?”

Let’s find out 😏

The Naive Bayes Classifier is the naive application of Bayes’ theorem to a Machine Learning classification problem: as simple as that.

Let’s say we have a certain binary classification problem (class 1 and class 2). You have N instances and each instance has its label Y.

The so-called prior probability is defined as the following:

P(class 1) = N_1 / N and P(class 2) = N_2 / N, where N_1 and N_2 are the numbers of instances labeled class 1 and class 2.

Now, what we really want to know is: given a certain instance, what is the probability that that instance belongs to class 1? And what is the probability that it belongs to class 2?

So what we are interested in is, given an instance x:

P(class 1 | x) and P(class 2 | x)

In a binary problem, these two quantities sum to 1, so we actually only need one of them. Now, this quantity might look like a mystery, but the other one:

P(x | class 1)

can be computed very easily (remember the bowls-and-balls example!).
It is also very easy to compute P(x), in the same way we computed P(class 1) and P(class 2).

So we can actually compute:

P(class 1 | x) = P(x | class 1) · P(class 1) / P(x)

All of this looks good. Now I think you might have an understanding of why we call it naive. It is naive because it treats the features as independent given the class, so the whole method boils down to counting occurrences and using Bayesian logic to infer the prediction.
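To make the counting idea concrete, here is a rough from-scratch sketch for integer-coded categorical features. The function names and the Laplace-smoothing detail are my own additions, not code from the original article:

```python
import numpy as np

def fit_counting_nb(X, y, alpha=1.0):
    """Estimate P(class) and P(x_j = v | class) by counting occurrences.
    X must contain non-negative integer-coded categorical features; alpha is
    Laplace smoothing so unseen values don't get probability zero."""
    classes = np.unique(y)
    n_values = X.max(axis=0) + 1  # number of distinct values per feature
    priors = {c: np.mean(y == c) for c in classes}
    likelihoods = {}
    for c in classes:
        Xc = X[y == c]
        likelihoods[c] = [
            (np.bincount(Xc[:, j], minlength=n_values[j]) + alpha)
            / (len(Xc) + alpha * n_values[j])
            for j in range(X.shape[1])
        ]
    return classes, priors, likelihoods

def predict_counting_nb(x, classes, priors, likelihoods):
    """Pick the class maximizing P(class) * prod_j P(x_j | class).
    Assumes x only contains values within the range seen during fitting."""
    scores = {
        c: priors[c] * np.prod([likelihoods[c][j][x[j]] for j in range(len(x))])
        for c in classes
    }
    return max(scores, key=scores.get)
```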

Let’s apply it to a toy dataset. We have two classes and two features, generated using the sklearn.datasets library.
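The article doesn’t show the generating code, so here is a plausible sketch; make_blobs and the specific parameters (1,000 samples, random_state=42) are my assumptions:

```python
from sklearn.datasets import make_blobs

# Hypothetical stand-in for the article's toy dataset: 2 classes, 2 features.
X, y = make_blobs(n_samples=1000, centers=2, n_features=2,
                  cluster_std=1.0, random_state=42)
```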

Fantastic.
So:

P(x | class) = P(x_1 | class) · P(x_2 | class)

Because we have two dimensions.
Given class 0, we can easily compute the probability of a given region (a small area around a 2D point).

So:

  1. We know how to compute the posterior probability, that is P(Y|X)
  2. We have our set of Y and X
  3. We can apply the rule to the training set, and we can predict our test set results

Let’s do it!

  1. Train-Test split:
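A sketch of the split (the 80/20 ratio is my choice, not necessarily the article’s):

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```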

2. Importing and fitting the Naive Bayes Classifier:
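A sketch of this step with sklearn’s CategoricalNB; the rounding and clipping details are my reconstruction of the “bias” trick described in the note below:

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# CategoricalNB expects non-negative integer category indices, so round the
# features and shift them by the training minimum (the "bias").
bias = np.floor(X_train.min())
X_train_cat = np.round(X_train - bias).astype(int)
X_test_cat = np.round(X_test - bias).astype(int)

# Keep the test set within the category range seen during training.
X_test_cat = np.clip(X_test_cat, 0, X_train_cat.max(axis=0))

clf = CategoricalNB()
clf.fit(X_train_cat, y_train)
```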

Note that we are adding a bias to our data (x). This is because the classifier we are using, CategoricalNB, treats each feature as a categorical one encoded by non-negative integers, while our features are numerical (and can be negative). A more rigorous approach would be to use the LabelEncoder() class, but in this specific case, shifting the data is exactly equivalent.

3. Testing the performance:
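And the evaluation, again just a sketch using standard sklearn metrics:

```python
from sklearn.metrics import accuracy_score, classification_report

# Predict on the shifted test features and report the scores.
y_pred = clf.predict(X_test_cat)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```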

The performance, in this very simple toy example, is obviously (almost) perfect.

You probably have had enough of me talking, I am sorry for that.
I just want to close this article by talking about the Gaussian Naive Bayes.
In the previous example, we considered the features to be categorical. But what happens when they are not?

Well, we have to assume a certain distribution for the likelihood P(x | y).

P(x_i | y) = 1 / sqrt(2π σ_y²) · exp( −(x_i − μ_y)² / (2 σ_y²) )

So it is called Gaussian Naive Bayes because the likelihood is assumed to be Gaussian, as you can see above. The mean (μ_y) and variance (σ_y²) are estimated by computing the mean and variance of each feature within each class.
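In other words, for each class you just take the per-feature sample mean and variance of that class’s training points. A quick sketch of that estimation (variable names are mine):

```python
import numpy as np

# Per-class, per-feature Gaussian parameters, estimated from the training set.
means = {c: X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)}
variances = {c: X_train[y_train == c].var(axis=0) for c in np.unique(y_train)}
```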

Notice that now we no longer need to add the bias to our data: the Gaussian likelihood works directly on continuous features.

The results, once again, take just a couple of lines of scikit-learn to obtain.
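A minimal sketch of this last step, assuming the same train/test split as before (GaussianNB is sklearn’s implementation; the exact call pattern is my reconstruction):

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# No bias or rounding needed: GaussianNB works on the raw continuous features.
gnb = GaussianNB()
gnb.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, gnb.predict(X_test)))
```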

In this article we:

  1. Acknowledged the existence of Bayes’ Theorem. We gave an intuitive introduction to it and worked through a very simple controlled case to see how it works.
  2. Saw how Bayes’ Theorem can be applied to Machine Learning: what Y is, what X is, and how we can put them into Bayes’ formula to get predictions in a classification task.
  3. Made up our own dataset and used the Categorical Naive Bayes class of sklearn to do a toy classification task. It worked almost perfectly, but it had the limitation of working only with categorical features.
  4. Saw how to extend Categorical Naive Bayes into Gaussian Naive Bayes. Gaussian Naive Bayes, as we saw above, uses a Gaussian likelihood, which works with non-categorical features as well.

Bayes’ Theorem is applied in Machine Learning in a lot of other ways, like Bayesian Neural Networks or Bayesian Ridge Regression. I feel that this is a very cool introductory example of what applying Bayes’ Theorem can do in a classification problem. I had a lot of fun writing it and I really hope you loved it! 🥰

If you liked the article and you want to know more about machine learning, or you just want to ask me something, you can:

A. Follow me on LinkedIn, where I publish all my stories
B. Subscribe to my newsletter. It will keep you updated about new stories and give you the chance to message me with any corrections or questions you may have.
C. Become a referred member, so you won’t have any “maximum number of stories for the month” and you can read whatever I (and thousands of other Machine Learning and Data Science top writers) write about the newest technology available.

