Causal Inference with Continuous Treatments | by Ehud Karavani | Nov, 2022

By Jessie Hobb On Nov 2, 2022

Generalizing inverse probability weights for non-categorical treatments

Moving from categorical to continuous treatment variables (by the author).

Causal inference, the science of estimating causal effects from non-randomized observational data, is usually presented using binary treatment; we either treat or don’t treat; we give drug A or drug B. There’s a good reason for that, as causality is already complex as it is. However, not all interventions are binary (or discrete).

Sometimes, the interventions we care about are continuous. “Taking a drug”, for example, is fairly vague — drugs have active ingredients and those can come in different dosages. Too little and the drug might not seem effective, too much and the drug might be harmful. Therefore, we might be interested in the effect of different dosages of the same drug. This is often called dose-response modelling.

Continuous exposures are all around us. From drug dosages to number of daily cigarettes smoked or air pollution levels, from how much time you watched an ad before skipping it to how much red a the “unsubscribe” button is on a newsletter, from the interest rate increased by the central bank to the amount of money in a lottery winning. We can’t limit ourselves to studying binary exposures just because the introduction book didn’t cover the other ones.

In this post I will introduce a generalized version of inverse probability weighting for continuous treatment. I’ll show different estimation methods and discuss the required assumptions and its limitations. I will assume you are familiar with causal inference and IPW for binary treatment, but if you are not — I got you covered in this IPW explainer.

Recall that in the binary treatment setting, a common way to estimate causal effects is by using inverse probability weighting (sometimes called inverse propensity weighting, but I’ll just use IPW). Given individual l with treatment assignment aₗ and characteristics xₗ, its inverse propensity weight is defined as: wₗ = 1/Pr[A=aₗ|X=xₗ]. Namely, the inverse probability of l to be assigned to their treatment, given their characteristics.

However, when treatment (or any random variable for that matter) is continuous, the notion probability mass fails and we need to speak in terms of probability density. This is because the probability of a single point, say aₗ, is basically 0, while it may still have density associated with it since density is defined as the derivative of the cumulative probability function. This is a fundamental theoretical difference, but we can capture it in a small notation change, instead of Pr[A=aₗ|X=xₗ] we will use ƒ(aₗ|xₗ).

Gradually approximating a discrete binomial distribution with a continuous Gaussian one (image by author)

Recall that estimating the treatment effect with IPW is comprised of two main steps. First, model the treatment and obtain IP-weights. Second, model the outcome using those weights. In the binary case, once we have the weights, the simplest way to estimate the potential outcomes is to simply take the weighted average in the treated and untreated (often called the Horvitz-Thompson estimator). However, an equivalent way is to use a simple univariable regression: regress the outcome against the treatment (and an intercept) weighted by the IP-weights. Then, the average treatment effect is simply defined by the coefficient corresponding to the treatment variable. This is often called a marginal structural model in the epidemiology literature.

Note that in the continuous treatment case, the first option is not applicable. Often, there will be many unique treatment values and it will be rare to have enough samples with the exact same continuous treatment value, for all treatment values. Binning them will solve it, but we’re here for continuous treatment modelling. Therefore, we will need to use the latter option and create an additional (parametric) model between the outcome and the treatment. This will be our dose-response function.

Let’s examine those two steps in more details.

Step 1: modelling the treatment

With categorical treatments, we needed to model the probability of getting treated. We could have done that by regressing the treatment assignment against the covariates, basically using any “classifier” that outputs predictions in the 0–1 interval which we can then interpret as probabilities. Logistic regression, for example is a generalized linear model that is defined by the binomial distribution — a discrete probability function. With continuous treatment, however, we will need a regression model instead. For example, in generalized linear models, a linear regression model is defined by the gaussian distribution. And as the animation shows above — the more categories a binomial distribution has the better it is approximated by a normal distribution.

Once we fitted a model, we can obtain the conditional expectation E[A|X]. But unlike the binomial case, in the continuous case, this is not sufficient to generate densities. For simplicity, let’s assume the common Gaussian distribution, which is parameterized by a mean and variance. The conditional mean of that distribution will be the estimated conditional expectations (the predictions); the variance will be constant and will be set to be the variance of the residuals between the treatment and the predictions. Once we defined the distribution, we take the density of the observed treatment values with respect to this distribution. The generalized IP-weights are the inverse of these densities.

To summarize step 1:

Fit a function g(x), regressing the treatment A on covariates X.
Define the conditional distribution Dₗ=Normal(g(xₗ), Var(aₗ-g(xₗ)))
a. The conditional mean of each sample is its prediction.
b. The variance is fixed and is the variance of the prediction residuals.
Define the density dₗ as the value of aₗ from Dₗ.
Define the weight wₗ to be the inverse of the density: 1/dₗ.

Statistical summary of the first step, modelling the treatment

Step 2: modelling the outcome

Once we obtained the balancing weights w, we can model the counterfactual outcomes using the observed outcomes and treatments. To do that, we regress the outcome against the treatment, weighted by the IP-weights obtained from step 1. However, unlike the binary treatment case, the functional form of the continuous treatment should be flexible enough to avoid bias due to misspecification. For example, we will add a quadratic term of the treatment or model it using a spline, etc.

When we have non-linear transformations of the main treatment variable, we can no longer interpret the treatment effect as the coefficient of the treatment covariate. Instead, to make counterfactual outcome predictions, we will set some treatment value and run it through our model to get the predicted outcome, and average it out across the units to obtain the average outcome had everyone been assigned that specific treatment value.

We can repeat that for two different treatment values. Then the causal effect will be the difference (or ratio) between these two potential outcome predictions. Alternatively, we can repeat that for every treatment value in a range we care about and obtain a dose-response curve — see how the counterfactual outcome prediction changes as a function of assigning different dosages.

Marginal Structural Model — regress the outcome on the treatment weighted by the generalized IP-weights. As proposed by Robins, Hernan, and Brumback.

Code

Below is a Python code demonstrating the estimation process described above.

Generalizing inverse probability weights for non-categorical treatments

Let’s examine those two steps in more details.

Step 1: modelling the treatment

To summarize step 1:

Fit a function g(x), regressing the treatment A on covariates X.
Define the conditional distribution Dₗ=Normal(g(xₗ), Var(aₗ-g(xₗ)))
a. The conditional mean of each sample is its prediction.
b. The variance is fixed and is the variance of the prediction residuals.
Define the density dₗ as the value of aₗ from Dₗ.
Define the weight wₗ to be the inverse of the density: 1/dₗ.

Step 2: modelling the outcome

Marginal Structural Model — regress the outcome on the treatment weighted by the generalized IP-weights. As proposed by Robins, Hernan, and Brumback.

Code

Below is a Python code demonstrating the estimation process described above.

Read original article here

Denial of responsibility! Techno Blender is an automatic aggregator of the all world’s media. In each content, the hyperlink to the primary source is specified. All trademarks belong to their rightful owners, all materials to their authors. If you are the owner of the content and do not want us to publish your materials, please contact us by email – [email protected]. The content will be deleted within 24 hours.

Causal Inference with Continuous Treatments | by Ehud Karavani | Nov, 2022

Generalizing inverse probability weights for non-categorical treatments

Step 1: modelling the treatment

Step 2: modelling the outcome

Code

Extensions

Stabilized weights

Replacing weighted regression with a clever covariate

Heteroskedastic density and other distributions

Generalizing inverse probability weights for non-categorical treatments

Step 1: modelling the treatment

Step 2: modelling the outcome

Code

Extensions

Stabilized weights

Replacing weighted regression with a clever covariate

Heteroskedastic density and other distributions