
Please: No More Flipping Coins in Data Science | by Federico Trotta | Apr, 2023



Image by Keith Johnston on Pixabay

Some days ago I read the topic of a thread on a Data Science channel I follow; it was something like: “I flip 8 different coins. What’s the probability of tails for the 8th coin?”

And you know what? This reminded me why I hated statistics at school, even though my statistics classes were “light”.

Luckily for me, I started studying Data Science when I was already working. At that time, I had the chance to study statistics again, this time focusing on a specific domain: this is what helped me fall in love with “applied statistics”. Or, as I call it: “statistics for Data Science”.

So, in this article I want to tell you why I believe we need to engineer statistics when we talk about Data Science, leaving coins to bar talk with friends over a beer.

While there is nothing bad about relying on “more academic examples”, the truth is that some people just can’t stand them, preferring “domain examples”, as this helps them better understand the topics. And I’m one of these people.

I know: this may be a controversial article, but I hope you’ll like it anyway, and…please: let’s talk about that in the comments!

I studied at a “Liceo Scientifico” (something like a “science high school”) and then Mechanical Engineering, and I had a few “light” classes in statistics. But you know what? The approaches were similar.

Here in Italy, we are famous (apart from the pizza!) for having very theoretical studies. And, believe me: it’s true!
I don’t know about you, but for us, the following are typical problems in a statistics exercise:

  • “Flip through a deck of cards. What is the probability you draw the ace of hearts?”
  • “You have 100 balls in a bag. 30 are black and 70 are white. You take one out and it’s black. What is the probability that the next one is white?”

And the list can be extended… The fact is that every time I heard these kinds of questions my (mental) answer was: “WHO CARES?”.
Yes, let’s say it out loud: who cares? Who wants to be a magician that can predict the color of the next ball out of the bag? Well, not me…

The truth is that my brain shut down when it heard such questions and it didn’t even want to hear the whole phrase because it knew it didn’t care. It simply didn’t like these kinds of challenges.

And here’s the point: the challenges were not challenging. Well, at least not to me.

But there’s more. It seems that the only interesting part of statistics is probability. So, if I could talk again to one of my statistics professors I’d say: “Hey! We all know gambling is iniquitous, because the probability of winning is way low. But statistics is not just probability and probability can be way more than that!”.

As an engineer, I love to be very practical and love concrete examples.
When I started studying Data Science I was working as a Process Engineer in a firm in the industrial field, and when I discovered I had to know statistics I told myself: “Well, let’s see if this is the right time”.

One of the first questions I asked myself was: “Given a particular product manufactured on a particular assembly line, can Data Science help me find a way to understand who’s the best operator working in a particular manufacturing phase?”. Well, this is a very challenging question. And, in fact, I ended up developing my bachelor’s thesis on it, creating an anomaly detection algorithm for industrial processes.

They’ve always told me: “Mathematics is an exact science”. This means that mathematics can be expressed with methodological rigor and its phenomena are measurable, reproducible, and objectively expressible in an analytical way, thus managing to predict the results of phenomena falling within its scope through a mathematical expression.

This means a simple thing to me: mathematics is in the sky, meaning it has little contact with reality.

Please, don’t get me wrong here. Our world works thanks to math, but to applied math!

What I mean is that I consider math as the law and Science and Engineering as the Judge who has to apply the law. This goes for any Science, and this is why I say that math for Data Science should be engineered.

Haven’t you heard what’s the most important thing to know in Data Science? Well, it’s domain knowledge! Not math, not statistics, not programming: it’s domain knowledge! This supports what we said before: Data Science is a practical Science where we need to apply math and statistics to actual domain cases.

This is why I don’t care about the probability when flipping coins.
I love to hear: “When’s the right time to do maintenance on my machine if it works in those conditions for the next 12 months?”

That’s the power of Machine Learning, which is nothing more than mathematics and statistics applied to the real world.

Please, don’t get me wrong: I’ve nothing against calculating the probability of flipping coins and other amenities.

But maybe we should simply understand that a lot of people just don’t care, because they need (and love) to solve more challenging problems.

I remember spending evenings with my University mates discussing some deep, and indeed theoretical, topics in mathematics. It seemed our brains were expanding while we discussed and learned together the most complex topics in analysis and algebra.

So, there is beauty even in math just for math. But at a certain point, I just needed to put my feet back on the ground and try to apply my learnings using practical cases, for a simple reason: it just helped me better understand the theory and the formulas.

So, good for you if flipping coins helps you better understand probability and statistics, but this is not the same for every one of us.

There are a lot of studies suggesting that a practical approach helps better understand mathematics. Here’s an interesting book that may help you with that.

The question is: when is an approach “practical enough”? Isn’t calculating the probability of flipping coins “practical enough”?

Well, problems to solve should be both practical and interesting…

So, if you’re approaching Data Science and need to learn statistics, but you don’t like coin-flipping problems or similar, you should make an effort to challenge yourself with problems of your own.

If you have a job, try to apply Data Science to it; this is what I did, and it worked. For example, say you work in the industrial field and can analyze production data. Target a product and start asking yourself:

  • What’s the average time needed to manufacture it?
  • Is there a time when operators produce the best (e.g., early in the morning)?
  • What is the probability that there are production wastes during the working day?

And so on. Try to be very specific and you’ll see that you can’t stop asking yourself such questions and finding the solutions.
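To make the three questions above concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the production log, its field names (`start_hour`, `minutes`, `waste`), and the values. Note that the last number is just the observed share of wasted pieces, which is only a rough estimate of the real probability of waste.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical production log: one record per manufactured piece.
# Field names and values are invented for illustration only.
records = [
    {"start_hour": 6, "minutes": 11.5, "waste": False},
    {"start_hour": 6, "minutes": 12.0, "waste": False},
    {"start_hour": 10, "minutes": 13.5, "waste": True},
    {"start_hour": 10, "minutes": 14.0, "waste": False},
    {"start_hour": 15, "minutes": 15.0, "waste": True},
    {"start_hour": 15, "minutes": 16.0, "waste": False},
]


def average_time(rows):
    """Average manufacturing time, in minutes, over all pieces."""
    return mean(r["minutes"] for r in rows)


def average_time_by_hour(rows):
    """Average manufacturing time grouped by the hour the piece was started."""
    by_hour = defaultdict(list)
    for r in rows:
        by_hour[r["start_hour"]].append(r["minutes"])
    return {hour: mean(times) for hour, times in by_hour.items()}


def waste_rate(rows):
    """Observed share of wasted pieces (an estimate of the waste probability)."""
    return sum(r["waste"] for r in rows) / len(rows)


avg = average_time(records)
per_hour = average_time_by_hour(records)
best_hour = min(per_hour, key=per_hour.get)  # hour with the lowest average time
p_waste = waste_rate(records)

print(f"Average time per piece: {avg:.2f} min")
print(f"Fastest starting hour: {best_hour}:00")
print(f"Observed waste rate: {p_waste:.1%}")
```

With real data you would replace the hand-written list with your own production records; the questions stay the same.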

So, try to stick to your industry (or to an industry you love, if you don’t have a job) and ask the right questions that, eventually, will make you love (practical) statistics.

As I said in the beginning, this may be a controversial article but I hope you get the point.

The idea is to encourage people to use real examples when debating Data Science. I know: there is something beautiful even in calculating the probability of tails for the 8th coin; in fact, someone said: “Is this a regular coin, or is it loaded?”.

Fair point! But again…are you interested?

Well, sorry, but I’m not. If you’re not interested in solving these kinds of problems in statistics, but indeed want to develop a career in Data Science: please, don’t give up. Make your own examples to challenge yourself: you’ll see the results in a matter of days.

And…let me know in the comments what your thoughts are!

