
The Science and Art of Causality (part 2) | by Quentin Gallea, PhD | Jan, 2023



As we saw in the first part of this two-part article, measuring a causal effect is critical to drawing the right conclusions, because nearly every choice or decision you make rests on an expected causal relationship.

For example:

Individual choices:

  • If I go vegan, I’ll reduce my ecological footprint.
  • If I drink this tequila shot, I’ll dance better.

Companies:

  • Home-office reduces productivity.
  • Spamming users with YouTube Premium Ads will increase the number of subscribers.

Policymakers:

  • Replacing nuclear power plants with renewables will help to reach the Paris Agreement.
  • Lockdowns will reduce the spread of COVID-19.

The point is that there is no statistical test to prove that your effect is causal. To challenge causality, as explained in the first part of this article, you can ask two main questions: is there anything else that could explain the relationship between the cause and the effect, and could it be the other way around (i.e., the effect causes the cause)?


Those questions allow us to challenge causal claims. But how can we find evidence of a causal effect when no statistical test can directly establish causality? In this article, I will show you how researchers proceed, using a fascinating scientific paper: ‘LONDON FOG: A CENTURY OF POLLUTION AND MORTALITY, 1866–1965’ (Hanlon (2018)).

To do this, we will put ourselves in the shoes of police detectives. Police detectives constantly try to answer causal questions: who caused the death of this person? Was it Colonel Mustard with the candlestick in the conservatory? Are you sure that it was not with a wrench, or that the crime was not committed by someone else? In our case, we have a suspect, or more precisely a hypothesis that we would like to test (for example, pollution increases mortality). We then ask ourselves: was it really pollution, or was it the evolution of health services? Or was it actually a consequence of the weather?

If you want to find out who committed a crime, you very rarely have video footage of it. And even if you do, maybe the image is blurry, or maybe it is a fake. Therefore, you might never be one hundred percent sure of the identity of the perpetrator. To overcome this limitation, you accumulate evidence and rule out the suspect's possible alibis until the main alternative stories have been discarded (Was it somebody else? Was this person doing something else at the time?). It is very similar in research when we want to find causal evidence.

Let me illustrate those concepts with the following paper: ‘LONDON FOG: A CENTURY OF POLLUTION AND MORTALITY, 1866–1965’ (Hanlon (2018)). London was already a densely inhabited and heavily polluted area during the 19th century. The author of this research paper answers a very important question: what is the effect of pollution exposure on mortality?

The interesting part of this paper comes from the fact that air pollution data have only been available since the 1950s, whereas accurate meteorological data have been available since the 1850s. So the idea of the paper is to use fog as an indicator of pollution exposure: when the weather is foggy, pollution stays trapped close to the ground, and citizens’ exposure to it increases (see figure below).

You can find a complete Python notebook with my code to replicate the paper and produce the graphs used in this article here: Deepnote notebook.

Causal Diagram. Image by author.

A first look at the effect

The paper investigates the effect of heavy fog on mortality rates. Therefore, let us first look at how the mortality rate (all causes together) changed from five weeks before a week of heavy fog to five weeks after. It appears that the mortality rate increases at impact (week 0) as well as in subsequent weeks. However, many factors could explain this effect (e.g., seasonality).
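The before/after comparison described above can be sketched in a few lines. This is a minimal illustration on made-up numbers, not the notebook's code: the function name `event_window_means` and the toy series are assumptions introduced here.

```python
from statistics import mean

def event_window_means(mortality, heavy_fog, window=5):
    """Average mortality at each offset (in weeks) from a heavy-fog week."""
    by_offset = {s: [] for s in range(-window, window + 1)}
    for t, fog in enumerate(heavy_fog):
        if not fog:
            continue
        for s in by_offset:
            # Keep only offsets that fall inside the observed sample
            if 0 <= t + s < len(mortality):
                by_offset[s].append(mortality[t + s])
    return {s: mean(v) for s, v in by_offset.items() if v}

# Toy weekly series (hypothetical values, not the paper's data)
mortality = [20, 21, 20, 19, 20, 26, 24, 21, 20, 20, 19, 20]
heavy_fog = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

profile = event_window_means(mortality, heavy_fog)
for s in sorted(profile):
    print(f"week {s:+d}: {profile[s]}")
```

Such raw averages only describe the pattern; they do not yet separate the fog effect from anything else that moves with fog.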

The dataset contains weekly data on weather and deaths in London from 1850 to 1940 and excludes the World War I years.

Image by author

Can we discard the story of seasonality and time trend?

First, let us look at how the number of fog events is distributed over the year. There is a very strong seasonality (the probability of encountering heavy fog is higher in winter). It is therefore important to capture the effect of seasonality in our model because cold weather is associated with more fog and also potentially with more deaths (people get sick when it is cold).

Seasonality of fog. Image by author.

Second, let us look at the frequency of weeks with heavy fog from 1850 to 1940. Again, we can see a clear pattern: on average, there are fewer weeks with fog after 1900 than before 1900. The model has to take this evolution into account to avoid confounding it with our effect of interest. The quality of the medical system improves over time, reducing the mortality rate, while the number of weeks with fog also decreases over time. Therefore, if we do not capture the time trend, it could inflate the coefficient (overestimate the effect of fog on mortality).

Yearly number of fog events. The horizontal red dashed lines show the average before/after 1900. Image by author
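To see why an uncaptured time trend can inflate the estimate, here is a small simulation. Everything in it is an assumption made for illustration (the fog probabilities, the linear decline in mortality, a true fog effect of 1.0), and it uses a simple linear trend rather than the year fixed effects used later in the article.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, weeks = 90, 52
year = np.repeat(np.arange(n_years), weeks)  # year index for each week

# Assumption: fog becomes rarer over time while baseline mortality declines
p_fog = 0.20 - 0.15 * year / n_years
fog = (rng.random(year.size) < p_fog).astype(float)
true_effect = 1.0
mortality = 25.0 - 0.1 * year + true_effect * fog + rng.normal(0, 1, year.size)

def ols(y, *cols):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones_like(y), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols(mortality, fog)[1]             # ignores the time trend
with_trend = ols(mortality, fog, year)[1]  # controls for the time trend
print(f"naive: {naive:.2f}, with trend: {with_trend:.2f}, truth: {true_effect}")
```

In this setup the naive coefficient absorbs part of the long-run decline in mortality, because fog weeks are concentrated in the early, high-mortality years.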

Note that even if you are not familiar with the model presented below, you should be able to follow and understand the idea. The estimated model is a simple linear regression:
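Reading the definitions that follow, the specification can plausibly be written as below. This is a reconstruction from the surrounding text rather than the paper's exact notation: the coefficient symbols \beta_s and \gamma, and the window s = -5, ..., 5 (matching the five weeks before and after shown in the figures), are assumptions.

```latex
\text{Mortality}_t = \sum_{s=-5}^{5} \beta_s \, \text{Fog}^{s}_{t} + X_t \gamma + \text{Year}_t + \text{Week}_t + e_t
```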

with t for the week. Fog^s is a dummy variable taking the value one when there was heavy fog in week s+t. X is a vector of meteorological controls including rainfall, temperature, pressure, and humidity. Year and Week are sets of fixed effects respectively capturing the effect of the year and the effect of the calendar week (seasonality). e is an error term.

Therefore, this model allows us to measure the effect on mortality before, during and after weeks with heavy fog, while taking into account meteorological conditions, seasonality and yearly fixed effects (evolution over time).
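A regression of this kind can be sketched with NumPy alone. The example below runs on simulated data, so all numbers are assumptions: a true effect of +2 in the fog week, +1 the week after, and none elsewhere. It omits the weather controls, uses a window of two weeks for brevity, and indexes each dummy by the number of weeks since the fog week (positive means after the fog).

```python
import numpy as np

rng = np.random.default_rng(1)
n_years, weeks, window = 40, 52, 2
n = n_years * weeks
year = np.repeat(np.arange(n_years), weeks)  # year index of each observation
week = np.tile(np.arange(weeks), n_years)    # calendar week of each observation
fog = (rng.random(n) < 0.08).astype(float)   # heavy-fog dummy

def shift(x, k):
    """Return out with out[t] = x[t - k], padding with zeros at the edges."""
    out = np.zeros_like(x)
    if k > 0:
        out[k:] = x[:-k]
    elif k < 0:
        out[:k] = x[-k:]
    else:
        out[:] = x
    return out

# Simulated outcome: seasonality, a downward trend, and the assumed fog effects
mortality = (20 + 3 * np.cos(2 * np.pi * week / weeks) - 0.05 * year
             + 2.0 * fog + 1.0 * shift(fog, 1) + rng.normal(0, 1, n))

offsets = list(range(-window, window + 1))
event = np.column_stack([shift(fog, k) for k in offsets])       # lead/lag dummies
year_fe = (year[:, None] == np.arange(n_years)).astype(float)   # year fixed effects
week_fe = (week[:, None] == np.arange(1, weeks)).astype(float)  # week FE, week 0 dropped
X = np.column_stack([event, year_fe, week_fe])

coef = np.linalg.lstsq(X, mortality, rcond=None)[0]
beta = dict(zip(offsets, coef[:len(offsets)]))
for k in offsets:
    print(f"week {k:+d}: {beta[k]:+.2f}")
```

One calendar-week dummy is dropped because the full set of year dummies already plays the role of the intercept.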

Forest plot representing the coefficients of the linear regressions. The vertical axis represents the death rate while the horizontal axis represents the distance in weeks to a week with heavy fog. The bars represent the 95% confidence intervals. Image by author

The figure above compares a model without taking seasonality into account (pink squares) and a model taking seasonality into account (orange circles). We can see that the effect of seasonality does indeed inflate the coefficients (the increase in mortality rate is greater in the pink model than in the orange model). In addition, the mortality rate in the model with seasonality returns to the pre-fog week level after two weeks.

Let us now include the year fixed effects. This set of control variables captures the evolution of pollution over time, but also, for example, changes in the quality of the health sector. The interpretation of the coefficients is therefore slightly different: we now explore the deviations of the mortality rate from the average mortality rate in year t.

The figure below reveals that the mortality rate increases during the week of heavy fog and the following week. In addition, we can see that the weather controls do not affect the estimates much.

Forest plot representing the coefficients of the linear regressions. The vertical axis represents the death rate while the horizontal axis represents the distance in weeks to a week with heavy fog. Here both models include week and year fixed effects. The bars represent the 95% confidence intervals. Image by author

Let us now question the causal nature of this effect. The rationale is that fog traps pollution close to the ground and therefore increases mortality. Let us use the tool I presented in the first part of this article: “What if it was something else that explained this effect?”.

Could it be a story of accident and crime?

In heavy fog, visibility is poor, so there could be more accidents or crimes. To rule out this alternative story, the author compares death rates by recorded cause of death (e.g., accident/crime vs. pneumonia).

The figure below reveals that there is no effect on the death rate caused by accident/crime around a week of heavy fog, while we observe a statistically significant effect on mortality caused by pneumonia.

We are getting closer to catching our suspect: pollution. There is one last alternative story I want to explore with you.

Forest plot representing the coefficients of the linear regressions. The vertical axis represents the death rate (from different causes: accident/crime vs. pneumonia) while the horizontal axis represents the distance in weeks to a week with heavy fog. The bars represent the 95% confidence intervals. Image by author

Could it be a story of weather and epidemiology?

When the weather is bad (foggy), people stay home. If people stay at home, there is a higher risk that they will infect one another, so the increase in deaths (e.g., from pneumonia) could simply be a consequence of contagion and not of pollution. It seems very difficult to falsify this story, right?

The author has done something very elegant to dismiss this alternative story: comparing two different weather shocks, heavy fog and heavy rain. Heavy rain should have a similar effect on behavior, since people might stay home longer. However, the key point is that fog traps pollution close to the ground while rain cleans the air. Therefore, if this is a pollution story, we should find the opposite effect for rain (fewer deaths) compared to fog.
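The logic of this comparison can be illustrated with two simulated worlds. All effect sizes below are hypothetical and chosen only to show how the sign of the rain coefficient separates the two stories. In the "stay-home" world both shocks raise deaths, while in the "pollution" world fog raises deaths and rain lowers them.

```python
import numpy as np

def estimate(fog_effect, rain_effect, n=5000, seed=0):
    """Simulate weekly deaths and return the estimated (fog, rain) coefficients."""
    rng = np.random.default_rng(seed)
    fog = (rng.random(n) < 0.1).astype(float)
    rain = (rng.random(n) < 0.1).astype(float)
    deaths = 20 + fog_effect * fog + rain_effect * rain + rng.normal(0, 1, n)
    X = np.column_stack([np.ones(n), fog, rain])
    _, b_fog, b_rain = np.linalg.lstsq(X, deaths, rcond=None)[0]
    return b_fog, b_rain

# Hypothetical world A: staying indoors spreads disease, whatever the weather shock
stay_home = estimate(fog_effect=1.0, rain_effect=1.0)
# Hypothetical world B: fog traps pollution (+), rain cleans the air (-)
pollution = estimate(fog_effect=1.0, rain_effect=-0.5)

print("stay-home world: fog %+.2f, rain %+.2f" % stay_home)
print("pollution world: fog %+.2f, rain %+.2f" % pollution)
```

The fog coefficient alone cannot tell the two worlds apart, since it is positive in both. The rain coefficient can.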

The figure below reveals exactly this effect: fog kills, rain saves lives.

Forest plot representing the coefficients of the linear regressions. The vertical axis represents the overall death rate while the horizontal axis represents the distance in weeks to a week with heavy fog. The bars represent the 95% confidence intervals. Image by author

Conclusion

The paper presents a strong argument for a causal effect of fog, through pollution exposure, on mortality in London over nearly a century, using evidence to discount various alternative explanations. To evaluate causality, it can be helpful to ask two questions: ‘Is there something else that could be causing the effect?’ and ‘Could it be the reverse?’. In addition, the next time you have to challenge a causal claim, put yourself in the shoes of a detective, considering and gathering evidence for and against different explanations. Together, using those techniques, let’s make more informed decisions and fight misinformation.


