Techno Blender
Digitally Yours.

The Difference between Correlation and Causation | by Niklas Lang

0 80


Everything you need to know about Casual Inference

Photo by Clay Banks on Unsplash

Correlation refers to the relationship between two statistical variables. The two variables are then dependent on each other and change together. A positive correlation of two variables, therefore, means that an increase in A also leads to an increase in B. The association is undirected. It is therefore also true in the reverse case and an increase in variable B also changes the slope of A to the same extent.

Causation, on the other hand, describes a cause-effect relationship between two variables. Causation between A and B, therefore, means that the increase in A is also the cause of the increase in B. The difference quickly becomes clear with a simple example:

A study could very likely find a positive correlation between a person’s risk of skin cancer and the number of times they visit the outdoor pool. So if a person visits the outdoor pool frequently, then their risk of developing skin cancer also increases. A clear positive association. But is there also causation between outdoor swimming pool visits and skin cancer? Probably not, because that would mean that only outdoor swimming pool visits are the cause of the increased risk of skin cancer.

It is much more likely that people who spend more time in outdoor swimming pools are also exposed to significantly more sunlight. If they do not take sufficient precautions with sunscreen or similar, more sunburns can occur, which increases the risk of skin cancer. It is clear that the correlation between outdoor swimming pool visits and skin cancer risk is not causal.

Is there a Causation between Outdoor Swimming Pool Visits and Skin Cancer? | Source: Author

A variety of curious correlations that very likely do not show causation can be found at tylervigen.com.

For example, there is a very high association between the divorce rate in the American state of Maine and the per capita consumption of margarine. Whether this is also causation can be doubted.

In general, there are two types of contexts that can be distinguished:

  1. Linear or Non-Linear: The dependencies are linear if the changes in variable A always trigger a change with a constant factor in variable B. If this is not the case, the dependency is said to be non-linear.
  2. Positive or Negative: If the increase in variable A leads to an increase in variable B, then there is a positive correlation. If, on the other hand, the increase in A leads to a decrease in B, then the dependency is negative.
Types of Correlation | Source: Author

The Correlation Coefficient indicates how strong the association between the two variables is. In the example of tylervigen.com, this correlation is very strong at 99.26% and means that the two variables move almost 1 to 1, i.e. an increase in the consumption of Margarine by 10% also leads to an increase in the divorce rate by 10%. The correlation coefficient can also assume negative values.

A correlation coefficient smaller than 0 describes the Anti-Correlation and states that the two variables behave in opposite ways. For example, a negative association exists between current age and remaining life expectancy. The older a person gets, the shorter his or her remaining life expectancy.

In order to reliably prove causation, scientific experiments are conducted. In these experiments, people or test objects are divided into groups (you can read more about how this happens in our article about Sampling), so that in the best case all characteristics of the participants are similar or identical except for the characteristic that is assumed to be the cause.

For the “skin cancer outdoor swimming pool case”, this means that we try to form two groups in which both groups of participants have similar or preferably even the same characteristics, such as age, gender, physical health, and exposure to sunlight per week. Now it is examined whether the outdoor swimming pool visits of one group (note: the exposed sun exposure must remain constant), changes the skin cancer risk compared to the group that did not go to the outdoor swimming pool. If this change exceeds a certain level, one can speak of causation.

  • Only in very few cases does a correlation also imply causation.
  • Correlation means that two variables always change together. Causation, on the other hand, means that the change in one variable is the cause of the change in the other.
  • The correlation coefficient indicates the strength of the association. It can be either positive or negative. If the coefficient is negative, it is called anticorrelation.
  • To prove causation one needs complex experiments.


Everything you need to know about Casual Inference

Photo by Clay Banks on Unsplash

Correlation refers to the relationship between two statistical variables. The two variables are then dependent on each other and change together. A positive correlation of two variables, therefore, means that an increase in A also leads to an increase in B. The association is undirected. It is therefore also true in the reverse case and an increase in variable B also changes the slope of A to the same extent.

Causation, on the other hand, describes a cause-effect relationship between two variables. Causation between A and B, therefore, means that the increase in A is also the cause of the increase in B. The difference quickly becomes clear with a simple example:

A study could very likely find a positive correlation between a person’s risk of skin cancer and the number of times they visit the outdoor pool. So if a person visits the outdoor pool frequently, then their risk of developing skin cancer also increases. A clear positive association. But is there also causation between outdoor swimming pool visits and skin cancer? Probably not, because that would mean that only outdoor swimming pool visits are the cause of the increased risk of skin cancer.

It is much more likely that people who spend more time in outdoor swimming pools are also exposed to significantly more sunlight. If they do not take sufficient precautions with sunscreen or similar, more sunburns can occur, which increases the risk of skin cancer. It is clear that the correlation between outdoor swimming pool visits and skin cancer risk is not causal.

Is there a Causation between Outdoor Swimming Pool Visits and Skin Cancer? | Source: Author

A variety of curious correlations that very likely do not show causation can be found at tylervigen.com.

For example, there is a very high association between the divorce rate in the American state of Maine and the per capita consumption of margarine. Whether this is also causation can be doubted.

In general, there are two types of contexts that can be distinguished:

  1. Linear or Non-Linear: The dependencies are linear if the changes in variable A always trigger a change with a constant factor in variable B. If this is not the case, the dependency is said to be non-linear.
  2. Positive or Negative: If the increase in variable A leads to an increase in variable B, then there is a positive correlation. If, on the other hand, the increase in A leads to a decrease in B, then the dependency is negative.
Types of Correlation | Source: Author

The Correlation Coefficient indicates how strong the association between the two variables is. In the example of tylervigen.com, this correlation is very strong at 99.26% and means that the two variables move almost 1 to 1, i.e. an increase in the consumption of Margarine by 10% also leads to an increase in the divorce rate by 10%. The correlation coefficient can also assume negative values.

A correlation coefficient smaller than 0 describes the Anti-Correlation and states that the two variables behave in opposite ways. For example, a negative association exists between current age and remaining life expectancy. The older a person gets, the shorter his or her remaining life expectancy.

In order to reliably prove causation, scientific experiments are conducted. In these experiments, people or test objects are divided into groups (you can read more about how this happens in our article about Sampling), so that in the best case all characteristics of the participants are similar or identical except for the characteristic that is assumed to be the cause.

For the “skin cancer outdoor swimming pool case”, this means that we try to form two groups in which both groups of participants have similar or preferably even the same characteristics, such as age, gender, physical health, and exposure to sunlight per week. Now it is examined whether the outdoor swimming pool visits of one group (note: the exposed sun exposure must remain constant), changes the skin cancer risk compared to the group that did not go to the outdoor swimming pool. If this change exceeds a certain level, one can speak of causation.

  • Only in very few cases does a correlation also imply causation.
  • Correlation means that two variables always change together. Causation, on the other hand, means that the change in one variable is the cause of the change in the other.
  • The correlation coefficient indicates the strength of the association. It can be either positive or negative. If the coefficient is negative, it is called anticorrelation.
  • To prove causation one needs complex experiments.

FOLLOW US ON GOOGLE NEWS

Read original article here

Denial of responsibility! Techno Blender is an automatic aggregator of the all world’s media. In each content, the hyperlink to the primary source is specified. All trademarks belong to their rightful owners, all materials to their authors. If you are the owner of the content and do not want us to publish your materials, please contact us by email – [email protected]. The content will be deleted within 24 hours.

Leave a comment