
Thinking Fast and Slow for Data Science | by Ani Madurkar | Oct, 2022



The importance of knowing when to use experimentation and when to use MLOps

Sunset Above Clouds. Image by author

These days, almost every piece of content related to Data Science and Machine Learning Engineering seems to be about MLOps. Ever since Hidden Technical Debt in Machine Learning Systems (Sculley et al.) was published in late 2015, there has been a dramatic rise in learning and teaching MLOps concepts. The famous image from that paper is shared in nearly every MLOps article and book (including some of my own walkthroughs) as premise #1 for learning MLOps.

Hidden Technical Debt in Machine Learning Systems, Sculley et al

Usually, premise #2 is some statistic about how few machine learning models end up making it to production, used as a heuristic that most machine learning models yield minimal value.

Google Search Trend for MLOps. Image by author

MLOps is the opposite of Experimentation and the success of a Data Scientist relies on being able to leverage both mental faculties with high acuity.

MLOps has taken (and continues to take) the Data Science world by storm, to the extent that it is seen as the natural progression for Data Science professionals. Even as someone who has a deep fascination with this space and has worked on multiple MLOps-related projects, I want to clarify that this dominance is damaging to the longevity of the field as a whole. MLOps is the opposite of Experimentation, and the success of a Data Scientist relies on being able to leverage both mental faculties with high acuity. Over-focusing on one paradigm diminishes the importance of the other, but both are crucially needed to maximize the value generated.

Thinking Fast and Slow by Daniel Kahneman is one of my favorite books. In it, the Nobel prize winner highlights the two dominant systems in your brain — System 1 and System 2. System 1 is the automatic and impulsive faculty, while System 2 is the conscious and methodical one. Although I won’t do a deep dive into the book here, I draw analogies to the concepts discussed there since I think they apply to the Data world incredibly well. Regardless, I do encourage reading the book — it’s fascinating and backed by science.

In our case, Data Scientists need to be highly careful and deliberate when building systems with MLOps concepts and principles. They need to exercise the System 2 mental models that allow them to slowly design something that will scale far past them. Due to the technical rigor of the craft, this type of thinking gets touted as the most important. But for Data Scientists to perform with high success, they also need to lean into the moments when they’re utilizing their intuitive and curiosity-driven faculties to find creative solutions. This System 1 set of mental models allows them to ask a large variety of intelligent questions and experiment enough to identify the right questions to move forward with.

The difference between System 1 and System 2 thinking for Data Scientists is less about whether you’re doing machine learning and more about your objective in guiding effective decisions. Data Scientists have to balance speed and scale delicately in every project. From running expert analysis to engineering statistical methods, the two Systems are constantly balanced against each other. Just as Systems 1 and 2 work together inside our brains psychologically, they work congruently in our day-to-day work, and it’s in our favor to see the value and pitfalls of each. Here’s a clear distinction between the two:

System 1: intuitive and fast expertise led by curiosity to ask better questions.

System 2: methodical and slow expertise led by rigor to provide better answers.

Cassie Kozyrkov, Chief Decision Scientist at Google, has pioneered Decision Science as a field. She wrote Analytical Excellence is All about Speed to show where Data Analysts, Machine Learning/AI Engineers, and Data Scientists each shine. Although I agree that they should all lean into their own expertise, I often see Senior+ Analysts, MLEs, and Data Scientists performing a little bit of all of the above. They have to be masters of some and also [at least] jacks of all. The industry’s need for these kinds of multifaceted data professionals has caused each to develop these two main Systems of thought.

System 1 involves rapid experimentation with the data to get a strong grasp of what it contains, where its limits are, what kinds of selection bias may be present, what kinds of answers it can reasonably yield, and more.
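A System 1 pass can be as lightweight as a few throwaway checks before any modeling happens. Here is a minimal sketch in Python; the records, column names, and values are invented for illustration, not taken from any real dataset:

```python
import statistics

# Hypothetical toy records standing in for a real dataset;
# every column name and value here is made up for illustration.
rows = [
    {"age": 34, "income": 72000, "churned": 0},
    {"age": None, "income": 58000, "churned": 1},
    {"age": 29, "income": None, "churned": 0},
    {"age": 51, "income": 91000, "churned": 1},
    {"age": 42, "income": 64000, "churned": 0},
]

def missing_rate(rows, col):
    """Fraction of records where a column is null."""
    return sum(1 for r in rows if r[col] is None) / len(rows)

def summarize(rows, col):
    """Mean and spread of the non-null values in a column."""
    values = [r[col] for r in rows if r[col] is not None]
    return {"mean": statistics.mean(values), "stdev": statistics.stdev(values)}

for col in ("age", "income"):
    print(col, missing_rate(rows, col), summarize(rows, col))
```

Throwaway checks like these are exactly the kind of work that rarely survives into the final system, yet they are where you first notice missingness patterns, implausible distributions, and hints of selection bias.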

System 2 involves the careful building of systems that use the data to scale effectively, with models that are able to generalize to new data with reproducibility, transparency, and antifragility.

Lastly, for the sake of brevity, I’ll say “Data Scientists” as a short form for anyone doing enterprise machine learning work. This can be just as relevant for Machine Learning Engineers or other Data practitioners as well.

Anyone involved in the Twitter Machine Learning community sees a new MLOps diagram nearly every other week. Each claims some new “needed” architectural components that make machine learning projects more systematic.

The simplest diagrams add a development/production dichotomy, and the most complex look like an entire IT organization.

Twitter Public Domain, tweet by @suzatweet. Image by author

The variance in building machine learning systems at different organizations has also spawned a plethora of tools and technologies attempting to solve some or all of the MLOps grievances.

A curated list of production machine learning tools, maintained by the Institute for Ethical AI. Image Source

This has consequently made Data Scientists scramble to become masters of a variety of tools in an effort to build these “robust and scalable” machine learning systems. The driving reason for this, and for the executive direction, seems to spawn from the idea that few machine learning models end up making it to production, and so minimal value is retrieved from these large-scale projects.

Although the jury is out on the accuracy of these metrics on how many machine learning models make it into production, I mainly want to question the necessity of the premise in the first place. Why does every machine learning model need to make it to production? Since when did the industry blindly co-opt the idea that value is minimally attained, if at all, unless a model makes it to production (and thereby needs all this surrounding infrastructure)?

Lak Lakshmanan wrote a great article, “No, you don’t need MLOps”, where he called out the overcomplication of building machine learning systems at a large scale. Simpler solutions exist to “do MLOps” and follow the appropriate guidelines without overengineering an architecture that causes even greater technical debt than was argued in 2015.

Before going further, I want to make it abundantly clear that I do believe in the value of building efficient, robust, and scalable machine learning systems, with simple solutions or not. I believe in Data Scientists learning how to think and build beyond Jupyter notebooks, but that does not mean their role should be reduced solely to that function.

When aiming to build beyond Jupyter notebooks, it’s great to start thinking in terms of version control, feature stores, model registries, and the hundreds of tools and technologies that support MLOps. This insanely deep rabbit hole quickly deludes you into thinking it’s the “primary” skill to optimize for. People hire Data Scientists for the value of experimentation and System 1 thinking as much as for their ability to build data systems at scale, but that value gets dwarfed in the noise of Kubernetes, Docker, Terraform, etc. Although MLOps can serve certain projects incredibly well, it is not the best answer for every data project.
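To ground one of those terms: a model registry, at its simplest, is just content-addressed artifacts plus metadata. Here is a toy sketch using only the Python standard library; the file layout and field names are my own invention for illustration, not the API of any particular tool:

```python
import hashlib
import json
import pathlib
import time

# Hypothetical local registry directory; real registries are usually
# backed by object storage and a database, not a folder.
REGISTRY = pathlib.Path("registry")

def register_model(name: str, artifact: bytes, params: dict) -> dict:
    """Record a model artifact with a content hash, its params, and a timestamp."""
    REGISTRY.mkdir(exist_ok=True)
    digest = hashlib.sha256(artifact).hexdigest()
    entry = {
        "name": name,
        "sha256": digest,          # content hash ties metadata to the exact bytes
        "params": params,
        "registered_at": time.time(),
    }
    (REGISTRY / f"{name}.bin").write_bytes(artifact)
    (REGISTRY / f"{name}.json").write_text(json.dumps(entry, indent=2))
    return entry

entry = register_model(
    "churn-model",
    b"fake-serialized-weights",   # stand-in for a pickled/serialized model
    {"learning_rate": 0.01, "seed": 42},
)
print(entry["sha256"][:12])
```

The point is not this particular sketch but that the core ideas behind the tooling are small; the rabbit hole is in the hundreds of production-hardened variations of them.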

By “Experimentation” I don’t mean A/B testing and Bayesian inference methods; I mean experimentation in the broadest sense of the term. Oftentimes it’s considered “R&D work”, but fundamentally it is the skilled craft of going from 0 to 1 in your data projects.

When you’re designing systems at a large scale, you want repeatable steps so you can take a stochastic system and make it as understandable and “controlled” as possible. But unless you know the right questions to ask and where to search for answers, you’re likely to build a system no one will use or find value in. We can measure the value of predictions, measure what happens when they fail, and set OKRs to make sure we “succeed” quantitatively. This is drastically different from measuring the value of experimentation, where the goal is to go from zero to one. Experimentation is what helps you identify the right problems to solve, and it requires you to be data-driven, product-oriented, and deeply familiar with the domain. It may involve ML or not; it’s more about developing a strong model of the world your dataset is meant to represent.
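As a small example of what “repeatable steps” in a stochastic system can mean in practice, here is a deterministic data-split helper; the function is a hypothetical sketch, not taken from any library:

```python
import random

def train_test_split(items, test_fraction=0.2, seed=42):
    """Deterministic shuffle-and-split: the same seed always yields the same partition."""
    rng = random.Random(seed)   # isolated RNG, so global random state is untouched
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))
train_a, test_a = train_test_split(data)
train_b, test_b = train_test_split(data)
assert train_a == train_b and test_a == test_b  # fully reproducible
```

Pinning every source of randomness like this is one of the cheapest ways to make a stochastic pipeline “controlled” enough to debug and audit.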

You can’t develop a good sense of how to build a system that generalizes to new data deployed in the real world without being able to simulate a model of your dataset in your head that is beyond the dataset you have.

In practice, experimentation is usually expected of Analysts and/or Product Managers, but I often see Data Scientists/MLEs have to do this work themselves as they become more senior. The farther you progress, the more you are expected to think in both Systems and own more of the entire data product you build. You also start to develop domain-specific intuitions about the kinds of pitfalls you may run into, which dramatically improves your ability to catch errors that would otherwise surface later. This intuition really only comes with strong experimentation. You can’t develop a good sense of how to build a system that generalizes to new data deployed in the real world without being able to simulate a model of your dataset in your head that goes beyond the dataset you have.

The issue comes in attributing a numerical value to this work. It’s hard, nebulous, and unpredictable. Understanding a domain, and the people who comprise it, well enough to have a specific idea of why certain nulls are occurring, which confounders are missing (and why) when inferring causality, which distributions the population is meant to take, and more, is a stochastic process of learning and exploring that requires System 1 thinking more than System 2. A lot of this work is nearly impossible to repeat effectively. It deals with the contextual nature of the data, which is largely shaped by your perspective on, and time spent in, the domain.

Although this work can be slow and thoughtful, you’re often on tight deadlines in large-scale data science projects and rarely have “enough” time to do it all. This means the experimentation work almost always gets fast-tracked, which is why industries want specialists who deeply understand a domain and can build strong data systems — their intuitions run deep and are intelligently led by careful assumptions. This is seen as valuable, but that value is often not reflected in the time allotted to the work.

The biggest shame of a hyper-focus on MLOps is the impression it gives that MLOps sits at the top of the Data Science mountain, with everything else a stepping stone along the way. The reality is that MLOps is but one of two very large mountains, the other being Experimentation. Both mountains are worthy and challenging to climb, and both have the potential to yield incredible value. A seasoned Data professional is hired to be able to traverse either mountain, which implies knowing which mountain to traverse when, the common pitfalls in climbing each and how to navigate them, the most effective approach to get to the top, and how to blaze a path that others may follow.

Almost always, climbing the Experimentation mountain makes climbing the MLOps mountain even better. This is usually not for some formulaic reason you can read about in any vanilla blog post about “How to Do Machine Learning”. It’s due to precise domain knowledge that gives you insight into the data and helps you create a model that generalizes to a changing, real-world dataset.

The effective way to climb the Experimentation mountain is by being curious and exploratory. Asynchronous reading of relevant domain material, qualitative inquiries for contextual research on the dataset, analyzing the data with ideas driven by metrics and intuition, and more are what drive this System 1 mindset.

The effective way to climb the MLOps mountain is by being strategic and methodical. Planning the systems and tools needed, deciding who oversees which aspects, strategizing which triggers and heuristics balance the system, and more are what drive this System 2 mindset.

The experts balance both under tight deadlines without burning out. That’s what we should put at the peak of the Data Science mountain. This is what makes the role different from a pure software engineering role, and I think we do the field a great service by recognizing and teaching the other 50% of it as well.

