
Correlation vs covariance: it’s much simpler than it seems | by Giuseppe Mastrandrea | Jun, 2022



What is correlation? How can we compute the correlation between two continuous variables? And how does it differ from covariance?

Clearly, two people who understand what correlation is. Photo by Jill Wellington: https://www.pexels.com/it-it/foto/due-persone-in-piedi-nella-fotografia-di-sagoma-40815/

Machine learning is a wonderful field of study. Studying machine learning means taking the most interesting concepts from the most disparate fields (math, finance, biology, computer science, etc.) with the aim of producing accurate and reliable predictive models. In my experience as a Machine Learning and Data Science teacher at Datamasters, students have more than once gotten confused about basic concepts and indexes related to the world of data science. In my first article (available in Italian here) I wrote about some of these indexes: variance, standard deviation and covariance.

In this article we’re going to study another index that can sound confusing, but let me tell you this: it’s definitely not rocket science. We’re talking about the correlation coefficient. Correlation looks a lot like covariance, and its purpose is very specific: to tell us whether a relation between two random variables exists and, if so, of what kind. The unusual thing is that under the term “correlation” you can find many formulas and coefficients, very different from each other. The choice of one coefficient over another depends on the type of variables for which we want to calculate the correlation.

Seems like a big deal, uh? Well, maybe. Under certain circumstances, which turn out not to be restrictive at all, reality is much simpler than you might expect. Let’s start with two random variables, the good old “weight” and “height” of six people:

Weight (kg): 100, 80, 75, 56, 66, 79
Height (cm): 194, 182, 184, 162, 171, 189

Let’s visualize these points:

Visualization of our dataset made with pyplot. Image by the author.

Before we start, let’s make a statement. These variables are numerical, i.e. variables that can assume any value in a numeric set or in an interval of that set. They are not categorical variables (variables whose possible values belong to a predefined set, e.g. “hair color”, which can only take values in the set [“brown”, “blonde”, “black”, etc.]). For numerical variables like the ones we introduced earlier, the most common choice of correlation coefficient is the Pearson coefficient. The formula is:

ρ(X, Y) = cov(X, Y) / (σ_X · σ_Y)

As we can see, the Pearson correlation is nothing more than a fraction: the covariance in the numerator, and the product of the two variables’ standard deviations in the denominator.

The Pearson coefficient is used to detect a linear relation between two continuous random variables. If you want to capture a monotonic but non-linear relation between two random variables, you have to use other coefficients (e.g. the Spearman coefficient). The formula for two samples is slightly more complex, but at its core we always use a covariance measurement “normalized” by the standard deviations of the variables. In the end, to compute the correlation between two random variables (X, Y) we have to do the following (see the Python sketch after the list):

  • Compute each variable’s mean
  • Compute each variable’s standard deviation:
    – compute the squared difference between every sample and the variable’s mean
    – sum all these squares
    – divide by the number of samples
    – take the square root of this fraction
  • Compute the covariance between X and Y:
    – for each entry in the dataset, multiply the difference between the X-component and the mean of X by the difference between the Y-component and the mean of Y
    – sum these products
    – divide by the number of samples
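
If it helps to see the recipe as code before we do it by hand, here is a minimal from-scratch sketch in plain Python (the helper names are our own, and it uses the divide-by-n population formulas from the list above):

from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def std_dev(xs):
    # square root of the average squared deviation from the mean
    m = mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def covariance(xs, ys):
    # average product of the paired deviations from the two means
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def pearson(xs, ys):
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

weight = [100, 80, 75, 56, 66, 79]
height = [194, 182, 184, 162, 171, 189]
print(pearson(weight, height))  # ≈ 0.9293, matching the NumPy result later on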

Easier done than said. Let’s move on:

Compute the means for “Weight” and “Height”:

μ_w = (100 + 80 + 75 + 56 + 66 + 79) / 6 = 456 / 6 = 76 kg
μ_h = (194 + 182 + 184 + 162 + 171 + 189) / 6 = 1082 / 6 ≈ 180.33 cm

Compute the std. deviation for the Weight: square the deviations from the mean (24² + 4² + (−1)² + (−20)² + (−10)² + 3² = 1102), divide by 6 and take the square root:

σ_w = √(1102 / 6) ≈ 13.5523 kg

Same thing for the height std. deviation:

σ_h = √(701.33 / 6) ≈ 10.8115 cm

To compute the covariance we consider each point (i.e. the single rows/couples of the initial table: [100, 194], [80, 182], [75, 184], …), compute the difference between each component and its mean, multiply the two differences, and sum all these products. Lastly, we divide by 6:

cov(w, h) = [(24)(13.67) + (4)(1.67) + (−1)(3.67) + (−20)(−18.33) + (−10)(−9.33) + (3)(8.67)] / 6 ≈ 817 / 6

What we get is the covariance between Weight and Height:

cov(weight, height) ≈ 136.17 kg·cm

Note that we divided by n = 6, the population formula, to stay consistent with the standard deviations above. Dividing by n − 1 = 5 (the sample formula) would give ≈ 163.4 kg·cm instead; the Pearson coefficient comes out the same either way, because the same factor appears in the standard deviations and cancels out.

Now we can compute the Pearson correlation between Weight and Height:

ρ(weight, height) = 136.17 / (13.5523 × 10.8115) ≈ 0.9293

Scared by all these numbers?

Well, if you’re into Python, you can use NumPy to get the same result with just a few lines of code:

import numpy

weight = [100, 80, 75, 56, 66, 79]
height = [194, 182, 184, 162, 171, 189]

# corrcoef returns the whole 2x2 correlation matrix; [0, 1] selects the weight-height entry
pearson_corr = numpy.corrcoef(weight, height)[0, 1]
print(pearson_corr)  # ≈ 0.92932799

Now, take a break and notice two things. First of all, correlation is a unitless number: the kg·cm in the numerator cancels out with the measurement units of the standard deviations (kg and cm) in the denominator. This feature alone makes correlation very interesting and flexible to use; we’ll verify it with a quick numeric check right after the following list. But the real game-changer is that correlation has a well-defined range: it is always a number between -1 and 1. Its sign is interpreted just like the sign of the covariance:

  • when the correlation between X and Y is between -1 and 0, X and Y are inversely related: when X increases, Y decreases
  • when the correlation between X and Y is 0, X and Y have no linear relation
  • when the correlation between X and Y is between 0 and 1, X and Y are directly related: when X increases, Y increases too.
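
Here is that quick check of the “unitless” claim: a minimal sketch (the unit conversions are our own choice, purely for illustration). Rescaling the variables changes the covariance, but leaves the correlation untouched:

import numpy

weight_kg = [100, 80, 75, 56, 66, 79]
height_cm = [194, 182, 184, 162, 171, 189]

# Rescaling the variables (kg -> g, cm -> m) changes the covariance...
weight_g = [w * 1000 for w in weight_kg]
height_m = [h / 100 for h in height_cm]

# ...but the correlation is identical, because the units cancel out
print(numpy.corrcoef(weight_kg, height_cm)[0, 1])  # ≈ 0.9293
print(numpy.corrcoef(weight_g, height_m)[0, 1])    # ≈ 0.9293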

Moreover, the closer the correlation is to -1, the more evident the inverse relation will be; likewise, the closer it is to 1, the more evident the direct relation. In our case, 0.929 is very close to 1, which indicates a very strong direct correlation between Weight and Height: we are basically saying that the taller a person is, the heavier they are. It makes sense, after all. We could have spotted the same relation between Weight and Height at a glance from the chart:

Quite a positive correlation, uh? Image by the author.

Let’s draw the other two cases of correlation between variables. Here’s a correlation very close to -1:

Correlation close to -1. Image by the author.

Here’s a correlation very close to 0:

Correlation close to 0. Image by the author.
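
The charts above come from the author’s own data; if you want to reproduce similar cases yourself, here is one possible sketch with synthetic data (the slope, noise level, and seed are arbitrary choices):

import numpy

rng = numpy.random.default_rng(42)
x = rng.uniform(0, 10, 100)

y_inverse = -2 * x + rng.normal(0, 1, 100)  # strong inverse relation plus a bit of noise
y_unrelated = rng.normal(0, 1, 100)         # no relation to x at all

print(numpy.corrcoef(x, y_inverse)[0, 1])   # close to -1
print(numpy.corrcoef(x, y_unrelated)[0, 1]) # close to 0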

Above each chart we also printed the correlation matrix with Python, which is similar to the covariance matrix: a square table that holds the correlation between each pair of variables. On the main diagonal we find the correlation of a variable with itself, which is of course the maximum possible value: 1. In the other cells we have the correlation between the row variable and the column variable. But more on this topic in another article.
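
In the meantime, here is a minimal sketch of how such a matrix can be printed with NumPy for our two variables:

import numpy

weight = [100, 80, 75, 56, 66, 79]
height = [194, 182, 184, 162, 171, 189]

# 2x2 correlation matrix: 1s on the diagonal, the Pearson coefficient elsewhere
print(numpy.corrcoef(weight, height))
# [[1.         0.92932799]
#  [0.92932799 1.        ]]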

Before the end of the article, let’s make a quick recap of the analogies and differences between covariance and correlation:

  • both measure the direction of the linear relation between two variables, and both are 0 when there is no linear relation
  • covariance carries the measurement units of the two variables (kg·cm in our example) and has no fixed bounds
  • correlation is unitless and always lies between -1 and 1, so it also tells us how strong the relation is and makes different pairs of variables comparable

I hope this article was useful for you readers out there. You’re welcome!

