
Explanatory v. Predictive: The Forgotten Project Distinction | by Chris Walsh | Oct, 2022



Clarifying goals for informed decisions

One of the biggest sources of confusion in Data Science, and specifically between Data Science teams and business leaders, is the distinction between explanatory and predictive models. There is often a lack of definition around whether the goal of a project is to accurately forecast something, as in a predictive model, or to show why something has happened, as in an explanatory model.

Photo by Pablo García Saldaña on Unsplash

The lack of definition arises from misunderstanding on both sides. Business leaders often equate predictive and explanatory models — “if you can predict something, then you should be able to explain why it is happening” and, vice versa, “if you can explain something, then you should be able to predict what will happen in the future.” It’s easy to understand why. The scientific laws that we all look up to as gold standards are both predictive and explanatory — e.g. Newton’s Law of Gravitation. Business leaders want to understand their decisions the same way that we understand gravity.

On the other hand, Data Scientists tend to rely too heavily on approaches that optimize predictive ability while sacrificing explainability. Many of the most “fun” Data Science techniques are more in the predictive realm than they are explanatory.

The distinction feels intuitive at first, but the more you get to know it, the more nuanced it becomes.

Predictive models are those that have the express purpose of forecasting. They are often about things such as “if I implement this strategy, how much additional revenue will I get?” Every business leader wants to answer that question before they implement a decision.

Explanatory models are those that deconstruct why something has or will happen. They are often about things such as “why didn’t the new strategy I implemented drive as much business improvement as I thought it would?”

That is to say, predictive models are all about whether or not something will happen and to what degree. They do not care precisely why that thing will happen. A predictive model says “I’m highly confident this will happen, but I can’t tell you precisely why it will happen.” In technical terms, predictive models emphasize forecast accuracy over causal inference.

On the other hand, explanatory models are all about the why. They do not care whether something will happen, only that they can explain with some degree of confidence, or power, why something happens. In other words, they care deeply about causality. An explanatory model says “I can tell you the biggest reasons behind this happening, but I’m not very certain about whether it will happen again.”

In a technical sense, predictive models aim to minimize total prediction error, the combination of bias and estimation error. Explanatory models, on the other hand, aim to minimize bias so that the results support a causal explanation; they tend to accept higher estimation error to do so.
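This contrast can be made concrete with a minimal, pure-Python sketch on synthetic data (the data and both models are illustrative, not from any particular project). A 1-nearest-neighbour predictor can forecast well while explaining nothing; an ordinary-least-squares fit produces an interpretable coefficient that estimates the effect of the input on the outcome.

```python
# Contrasting a "predictive" memorizer with an "explanatory" linear fit
# on synthetic data. The true relationship is y = 3x + noise, so the
# coefficient 3 is the "why" an explanatory model should recover.
import random

random.seed(0)

xs = [random.uniform(0, 10) for _ in range(200)]
ys = [3 * x + random.gauss(0, 1) for x in xs]

# Predictive approach: 1-nearest-neighbour. Often accurate on data like
# its training set, but it offers no account of why y takes its value.
def predict_1nn(x_new):
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x_new))
    return ys[i]

# Explanatory approach: ordinary least squares computed by hand. The
# slope is an interpretable estimate of the effect of x on y.
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

print(f"1-NN prediction at x=5: {predict_1nn(5.0):.2f}")
print(f"OLS fit: y = {slope:.2f} * x + {intercept:.2f}")  # slope recovers ~3
```

The memorizer answers “what will y be?” at one point; the linear fit answers “how does x drive y?” everywhere, at the cost of assuming a simple functional form.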

When we conflate predictive accuracy with explanatory power we often end up with models that are neither very explanatory nor very predictive.

Suppose a business leader tells you they’ve designed a new ship. They provide you with all of the specifications for this ship. Now, they want to know: “will it sink?” It’s clearly a predictive question. They just asked, “will this happen?” But when you tell the business leader that the ship is going to sink, their first question is going to be “why?” And if you’ve gone down the truly predictive modeling path, you won’t have an answer. At least not a very confident one.

Photo by insung yoon on Unsplash

On the other hand, if you’ve gone down the explanatory path your answer will sound something like “you’ve made a few design decisions that are pretty risky, but I’m not entirely confident whether those are enough to sink the ship.”

Either of those answers sounds like a cop-out. “I’m pretty sure it will sink, but I don’t know why” and “there are definitely some risky things here, but I’m not sure if they’re enough to sink the ship.”

The business is looking for a model that is both explanatory and predictive; they are looking for Newton’s Law of Gravitation. But rarely do we have the time, resources, knowledge, or even ability to satisfy both sides of the question. The issue is that when you get into the technical design of a model, the search for causation introduces a lot of constraints on the input data. We can be relatively loose with our data requirements when being predictive, but when we are explanatory we need to be extremely careful.

Take, for example, humanity’s numerous attempts at explaining the sunrise and sunset. It’s easy to build a good predictive model stating that, with an exceptionally high degree of likelihood, the sun will in fact set sometime tonight and rise again tomorrow. And you don’t even need to know anything about the solar system to do that! But it takes a lot of ingenuity to figure out why that happens; we spent thousands of years doing it. There is an inherent trade-off between the two types of models.
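The sunrise example has a classic predictive formalization: Laplace’s rule of succession, which says that after observing n consecutive successes, the probability of one more is (n + 1) / (n + 2). A short sketch (the observation count below is made up purely for illustration):

```python
# Laplace's rule of succession: a purely predictive model of the sunrise.
# It knows nothing about orbits or planetary rotation, only the record of
# past observations, yet it forecasts tomorrow with near certainty.
def rule_of_succession(successes: int, trials: int) -> float:
    """Laplace-smoothed estimate of P(success on the next trial)."""
    return (successes + 1) / (trials + 2)

# Hypothetical record: roughly 10,000 years of uninterrupted sunrises.
n = 10_000 * 365
p = rule_of_succession(n, n)
print(f"P(sun rises tomorrow) = {p:.8f}")
```

The forecast is excellent, and it carries exactly zero explanatory content: the same formula would apply to any streak of successes, from sunrises to coin flips.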

Photo by Jordan Wozniak on Unsplash

As a result, we need to decide whether our projects are predictive or explanatory. To make that call, we need to understand the trade-off between forecast accuracy and explainability in our particular situation. Which of the two is the end-user more willing to sacrifice? Is it ok that we can forecast without an explicit answer for why? Or is it more important that we can say why things are happening?

On top of that, Data Scientists need to be leery of questions phrased as predictive. Often business leaders will not be happy with a forecast that cannot be causally explained. They need a story that justifies the forecast: an explanatory model.

However, once we’ve made a clear decision, we have a value statement we can aim toward as we make decisions throughout a project. Either “optimize forecast accuracy” or “explain as much of the why as possible.” That will make a number of decisions easier.

For the Data Scientist, the key is that we don’t assume all problems to be predictive. Many of the questions we face are, in fact, more explanatory in nature, so we need to:

  1. Keep an open mind as we choose the types of models we will pursue
  2. Understand the needs of our business partners to decide the appropriate direction
  3. Clearly communicate the inherent trade-offs at the beginning of the project

Once we have decided which camp the project falls into then we can move forward with confidence, knowing where to turn when we start to make trade-offs between forecast accuracy and an understanding of causality. This will simplify the decision-making process at various points of the project, enabling us to better choose our data inputs, design experiments if needed, and pick amongst model types.

Feel free to contact me on LinkedIn for more perspective on the Data Science field.
