Thoughts on Stateful ML, Online Learning, and Intelligent ML Model Retraining
by Kyle Gallatin | April 2023


Ever since I read Chip Huyen’s Real-time machine learning: challenges and solutions, I’ve been thinking about the future of machine learning in production. Short feedback loops, real-time features, and stateful ML model deployments capable of learning online merit a very different sort of systems architecture than the stateless ML model deployments I work with today.

Me thinking ‘bout stateful ML in Cozumel, MX — Image by Author

For the past few months, I’ve been conducting informal user research, whiteboarding, and doing ad-hoc development to get to the core of what a real stateful ML system might look like. For the most part, this post outlines the story of my thought process as I continue to dive into this space and uncover interesting and unique architectural challenges.

Stateful (or continuous) learning involves updating model parameters instead of retraining from scratch in order to:

  • Decrease training time
  • Save cost
  • Update models more frequently
Stateless versus stateful retraining — from Chip Huyen
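To make the distinction concrete, here’s a minimal sketch of my own (not from Chip’s post) using scikit-learn’s SGDClassifier, whose partial_fit method updates an existing model’s parameters on new data instead of refitting from scratch:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Stateless retraining: every cycle fits a brand-new model on all the data
def retrain_from_scratch(X_all, y_all):
    model = SGDClassifier(loss="log_loss")  # logistic regression via SGD
    model.fit(X_all, y_all)
    return model

# Stateful retraining: keep the existing model and update its parameters
# using only the examples that arrived since the last update
def update_in_place(model, X_new, y_new):
    model.partial_fit(X_new, y_new)
    return model

# The first call to partial_fit must declare all classes up front
model = SGDClassifier(loss="log_loss")
X0, y0 = np.random.rand(100, 5), np.random.randint(0, 2, 100)
model.partial_fit(X0, y0, classes=np.array([0, 1]))

# Later: cheap incremental updates on fresh data only
X1, y1 = np.random.rand(10, 5), np.random.randint(0, 2, 10)
model = update_in_place(model, X1, y1)
```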

Online learning involves learning from ground truth examples in real-time in order to:

  • Increase model performance and reactivity
  • Mitigate performance issues that would result from drift/staleness

Right now, most learning in the industry is done offline in batch.
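For contrast, here’s what an online version of the learning loop looks like with the River library’s public API, using one of its bundled datasets. Each example is predicted on first and then learned from, one at a time (a prequential loop):

```python
from river import datasets, linear_model, metrics

model = linear_model.LogisticRegression()
metric = metrics.Accuracy()

# Prequential evaluation: test on each example before learning from it
for x, y in datasets.Phishing():
    y_pred = model.predict_one(x)  # predict with the model's current state
    metric.update(y, y_pred)       # score against the ground truth label
    model.learn_one(x, y)          # then update the model on that example

print(metric)  # rolling accuracy over the whole stream
```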

Intelligent model retraining typically refers to automatically retraining models using some performance metric as opposed to on a set schedule in order to:

  • Reduce cost without sacrificing performance

Right now, most models across industries are retrained on a schedule using DAGs.

Intelligent retraining architecture from A Guide To Automated Model Retraining — by Arize AI
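A hedged sketch of the trigger logic such a system might implement. The monitoring query and retraining hook below are hypothetical stand-ins, not a real Arize or orchestrator API:

```python
import random
import time

ACCURACY_FLOOR = 0.90          # retrain only when performance degrades past this
CHECK_INTERVAL_SECONDS = 3600  # poll the observability system hourly

def get_recent_accuracy() -> float:
    # Hypothetical stand-in: a real system would query an ML
    # observability platform's metrics API for recent performance
    return random.uniform(0.8, 1.0)

def trigger_retraining_job() -> None:
    # Hypothetical stand-in: a real system would submit a retraining
    # pipeline run to its orchestrator
    print("performance degraded, kicking off retraining")

while True:
    if get_recent_accuracy() < ACCURACY_FLOOR:
        trigger_retraining_job()  # retrain on degradation, not on a calendar
    time.sleep(CHECK_INTERVAL_SECONDS)
```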

In a previous article, I tried to use foundational engineering principles to create a dead simple online learning architecture. My first thought was to model stateful, online learning architecture after stateful web applications: by treating the “model” as the DB (where predictions are reads and incremental training sessions are writes), I thought I might simplify the design process.

Image by Author

To a degree, I actually did! By using the online learning library River, I built a small, stateful online learning application that allowed me to update a model and serve predictions in real-time.

Flask app that shares a model in memory across multiple workers — Image by Author
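A stripped-down, single-process version of that app might look like the sketch below. This is my reconstruction of the idea, not the exact code from the original post: two endpoints sharing one River model in memory, with predictions as reads and incremental training as writes:

```python
from flask import Flask, jsonify, request
from river import linear_model

app = Flask(__name__)
model = linear_model.LogisticRegression()  # lives in this process's memory

@app.route("/predict", methods=["POST"])
def predict():
    # "Read" path: score a feature dict against the current model state
    features = request.get_json()
    return jsonify({"prediction": model.predict_proba_one(features)})

@app.route("/learn", methods=["POST"])
def learn():
    # "Write" path: incrementally update the model on one labeled example
    payload = request.get_json()
    model.learn_one(payload["features"], payload["label"])
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    # Caveat: with multiple worker processes, each worker gets its own
    # copy of `model`, which is exactly the scaling problem discussed below
    app.run(port=5000)
```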

This approach was cool and fun to code — but has some fundamental issues at scale:

  1. Doesn’t scale horizontally: We can easily share a model in the memory of a single application — but this approach doesn’t scale across multiple pods in orchestration engines like Kubernetes
  2. Mixes application responsibilities: I don’t know (and don’t want to be the one to find out) about the caveats of trying to support a deployment that mixes training and serving
  3. Preemptively introduces complexity: Online learning is the most proactive type of machine learning possible, but we haven’t even validated we need it in the first place. There has to be a better place to start…

Let’s start from an existing standard — distributed model training. It’s fairly common practice to use something like a parameter server as a centralized store while multiple workers each compute a partial gradient on their shard of the data and reconcile the parameters after the fact.
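As a toy illustration of that pattern (heavily simplified; real parameter servers shard the parameters and handle asynchrony and staleness), workers compute partial gradients locally and a central store reconciles them:

```python
import numpy as np

class ParameterServer:
    # Toy centralized parameter store: holds the weights and applies
    # averaged worker gradients
    def __init__(self, dim: int, lr: float = 0.1):
        self.weights = np.zeros(dim)
        self.lr = lr

    def apply_gradients(self, grads: list) -> None:
        self.weights -= self.lr * np.mean(grads, axis=0)

def worker_gradient(weights, X, y):
    # Partial gradient of squared error on this worker's shard of the data
    return 2 * X.T @ (X @ weights - y) / len(y)

rng = np.random.default_rng(0)
ps = ParameterServer(dim=3)
shards = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]

for step in range(100):
    grads = [worker_gradient(ps.weights, X, y) for X, y in shards]
    ps.apply_gradients(grads)  # reconcile the partial gradients centrally
```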

So — I thought I’d try to think about this in the context of real-time model serving deployments, and came up with the dumbest architecture possible.

An architecture that makes no sense — Image by Author

Distributed model training is meant to speed up the training process. However, in this instance there’s no real need to be both training and serving in a distributed fashion — keeping the training decentralized introduces complexity and serves no purpose in an online training system. It makes way more sense to separate training entirely.

An architecture that makes slightly more sense — Image by Author

Great! Sort of. At this point I had to take a step back, as I was making quite a few assumptions and probably getting a bit ahead of myself:

  1. We may not be able to get ground truth in near-real time
  2. Continuous online training may not provide a net benefit over continuous offline training, and may be a premature optimization
  3. Offline/online learning may also not be binary — and there are scenarios where we’d want/need both!

Let’s start from a simpler offline scenario — I want to use some sort of ML observability system to automatically retrain a model based on performance metric degradation. In a scenario where I’m doing continuous training (and model weights don’t take long to update) this is feasible to do without significant business impact.
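One way to wire up that degradation signal, sketched with River’s ADWIN drift detector (API as of recent River versions) watching a stream of per-prediction correctness. The retraining hook is again a hypothetical placeholder:

```python
from river import drift

detector = drift.ADWIN()

def trigger_retraining_job() -> None:
    # Hypothetical hook: submit a retraining run to your pipeline
    print("correctness distribution shifted, retraining")

def on_prediction_scored(was_correct: bool) -> None:
    # Feed a 1/0 correctness stream into the detector as labels arrive
    detector.update(int(was_correct))
    if detector.drift_detected:
        trigger_retraining_job()
```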

Intelligent retraining and continuous online learning — Image by Author

Amazing — the first reasonable thing I’ve drawn all day! This system likely has a lower cost overhead than a stateless training architecture, and is reactive to changes in the model/data. We save lots of $ by only retraining as needed, and overall it’s pretty simple!

This architecture has a big problem though… it’s not nearly as fun! What might a system look like that has all the reactivity of online learning with the cost savings of continuous learning and the resilience of offline training?! Hopefully, something like this…

Continuous, online learning — Image by Author

Though there are details I still haven’t fleshed out, there are a lot of benefits to this architecture. It allows for mixed online and offline learning (just as feature stores allow access to both streaming features and features computed offline), is highly robust to changes in data distribution or even individual user preferences for personalized systems (recsys), and still allows us to integrate ML observability (O11y) tooling to constantly measure data distributions and performance.

However, though this might be the most sensible diagram I’ve created yet, it still leaves a lot of open questions:

  • How/when do we evaluate the model and with what data in an online system? If the data distribution is subject to large shifts, we’ll need to create new data-driven methodologies and best practices for designing a held-out evaluation set that includes both old data and the most recent data.
  • How do we reconcile an ML model that splits training processes into batch/offline and online? We’ll need to experiment with new techniques and system architectures to allow for complex, computational operations that involve large ML models in a system like this.
  • How do we pull/push the model weights? On a cadence? During some event or subject to some change in metric? Each of these architectural decisions could have a significant impact on the performance of our system — and without online A/B testing or other research, it’ll be difficult to validate these choices. (A sketch of the simplest option follows this list.)
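For that last question, the simplest policy is a cadence-based pull. A minimal sketch, where load_latest_weights is a hypothetical call into whatever registry or parameter store holds the online learner’s state:

```python
import threading
import time

SYNC_INTERVAL_SECONDS = 60  # pull fresh weights every minute

current_model = None  # the object the serving path reads from

def load_latest_weights():
    # Hypothetical: fetch the newest model/weights from a registry,
    # parameter store, or object storage bucket
    ...

def sync_loop():
    global current_model
    while True:
        current_model = load_latest_weights()  # swap the reference atomically
        time.sleep(SYNC_INTERVAL_SECONDS)

# Run the sync in the background so serving never blocks on a pull
threading.Thread(target=sync_loop, daemon=True).start()
```

An event-driven variant would replace the timer with a subscription to a model-updated event, trading simplicity for lower staleness.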

Of course, one of my next steps is simply to start building some of this stuff and see what happens. However, I would appreciate insight, ideas and engagement from any and all folks in the industry to think about what some paths forward might be!

Please reach out on Twitter or LinkedIn, or sign up for the next sessions of my course on Designing Production ML Systems this May!



