The Overpromise of End-to-End ML Platforms: Why One-Size-Fits-All Solutions Don’t Always Work | by Christian Freischlag | Dec, 2022


The Importance of Customization and Flexibility in Machine Learning Infrastructure

DALL·E – An illustration of vendor lock-in costing a company too much money

Over the past year, my inbox has been flooded by MLOps companies trying to sell their one-size-fits-all infrastructure. And not only there: compared to pre-COVID years, exhibitors at conferences have also been dominated by ML(Ops) platforms. You can find plenty of spam under every related article here on Medium, and even a paid workshop turned out to be simply a sales demo from such a company. While I’m generally open to new products, far too much is being promised, and this article is about the importance of staying independent.

MLOps, or machine learning operations, combines software engineering and machine learning to enable organizations to deploy and manage machine learning models in a scalable and reliable manner using DevOps principles. One of the critical challenges with MLOps is that the training, deployment, integration, and monitoring of models come with a unique set of needs and requirements. A platform that works well for one company may not be the best fit for another, even in the same industry. Each organization has its own data sources, business processes, and technology stack.

Most MLOps products offer a simplified version of AWS, GCP, or Azure. While it’s a pain to learn AWS (or GCP/Azure) configuration, such as IAM roles, networking, EC2, and Lambda, plus the tooling to manage it all (e.g., Terraform), the possibilities are great, and all that struggle gives you the chance to build a highly customized system. Plus, you skill up your engineers and make them cloud-ready. This article is therefore not directed against the cloud, but against fully integrated solutions from a single provider.
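To make the learning curve concrete: the kind of configuration meant here is a few dozen lines of Terraform. Below is a minimal, hypothetical sketch of an IAM role plus a Lambda function for model inference; all resource names and the deployment package are made-up placeholders, not a working deployment.

```hcl
# Hypothetical sketch: an IAM role that a Lambda inference function can assume.
# Resource names and the zip file are placeholders for illustration only.
resource "aws_iam_role" "lambda_inference" {
  name = "model-inference-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

resource "aws_lambda_function" "inference" {
  function_name = "model-inference"
  role          = aws_iam_role.lambda_inference.arn
  runtime       = "python3.11"
  handler       = "app.handler"
  filename      = "inference.zip" # hypothetical deployment package
}
```

It takes effort to learn, but everything is explicit and under your control, which is exactly the trade-off this article is about.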

These ML/MLOps platforms try to make cloud usage easier by abstracting the details into everyday use cases and connecting all the parts. It’s essentially outsourcing your ML infrastructure. As long as you stay within these simple use cases, you’ll be fine, and in that case, you should take a look at these platforms.

Unfortunately, machine learning in production rarely fits into these “toy” use cases. In my experience, it’s much like data science in academia versus data science in industry. It starts with the simple fact that some ML platforms are Python-only, and that limitation alone should be alarming. Furthermore, some don’t let you differentiate between batch and API workloads. Some don’t support GPUs; some even limit your choice of algorithms to a pre-implemented list or don’t let you store your data in line with GDPR requirements. The list is long, and I’d refer you to this repository for a comparison.
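The batch-versus-API distinction matters because the two workloads optimize for different things. A minimal sketch, using a plain function as a stand-in model (nothing here is a specific framework’s API):

```python
# Sketch: the same model served two ways. The "model" is a hypothetical
# stand-in function, not any particular framework's interface.
from typing import Callable, Iterable

Model = Callable[[list[float]], float]


def score_batch(model: Model, rows: Iterable[list[float]]) -> list[float]:
    # Batch: throughput matters; score a whole dataset offline in one pass.
    return [model(r) for r in rows]


def handle_request(model: Model, row: list[float]) -> dict:
    # API: latency matters; score one record per call, return a response payload.
    return {"score": model(row)}


toy_model: Model = lambda r: sum(r) / len(r)  # hypothetical stand-in model

print(score_batch(toy_model, [[1.0, 2.0], [3.0, 5.0]]))  # [1.5, 4.0]
print(handle_request(toy_model, [2.0, 4.0]))             # {'score': 3.0}
```

A platform that only offers one of these paths forces the other into an awkward workaround, such as calling an API endpoint millions of times to score a table.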

But I’m not really focused on features in general. You might find a solution that currently fits your needs perfectly, but what if, at some point, your requirements change and it no longer does? Would you build a second, customized system in parallel? Would you move to another platform? And even if you pick the most feature-rich and most expensive provider, do you think the long-term lock-in is worth it?

A paid, expensive machine learning platform may not be the most cost-effective solution, especially for a younger company. These platforms can be costly, and like most VC-backed products, they work through cheap initial pricing, burning investors’ money until they own their customers (vendor lock-in). At some point, they have to turn a profit, and guess who pays for that? Outsourcing MLOps to these SaaS providers could become very expensive.

With a custom open-source approach, organizations can integrate their machine-learning infrastructure with their existing systems, allowing them to deploy and manage models quickly without disrupting existing operations. That comes at a cost: you need people to do it.

Of course, there are plenty of tools you should not build on your own. For standard use cases, some solutions are worth buying: data ingestion, experiment tracking, and monitoring, just to name a few. Many of them are even open-source and available as managed services, usually with a consulting business behind them that you can pay to help build up and scale your system. But these tools are typically interchangeable, so you can keep the risk of lock-in small, and you still decide on your ML architecture. You are not buying the whole stack, just parts of it.
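One way to keep such tools interchangeable is to put them behind a thin interface of your own, so training code never imports a vendor SDK directly. A minimal sketch, where every class and method name is hypothetical rather than any product’s real API:

```python
# Sketch: experiment tracking behind a thin, swappable interface.
# All names here (ExperimentTracker, InMemoryTracker, ...) are hypothetical.
from typing import Protocol


class ExperimentTracker(Protocol):
    def log_metric(self, name: str, value: float) -> None: ...
    def metrics(self) -> dict[str, float]: ...


class InMemoryTracker:
    """Trivial local backend; a managed service could implement the same interface."""

    def __init__(self) -> None:
        self._metrics: dict[str, float] = {}

    def log_metric(self, name: str, value: float) -> None:
        self._metrics[name] = value

    def metrics(self) -> dict[str, float]:
        return dict(self._metrics)


def train_step(tracker: ExperimentTracker) -> None:
    # Training code depends only on the interface, not on a vendor SDK.
    tracker.log_metric("val_auc", 0.91)


tracker = InMemoryTracker()
train_step(tracker)
print(tracker.metrics())  # {'val_auc': 0.91}
```

Swapping backends then means writing one adapter class, not rewriting every pipeline, which is exactly what keeps the lock-in risk small.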

Also, keep in mind that there is often an obvious conflict of interest. SaaS providers want you to spend more money each year, often through a consumption-based model (pay per compute hour, per amount of data, and so on), so it is not very attractive for them to lower your bills by processing data as efficiently as possible.
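A back-of-the-envelope calculation shows why this matters: under consumption-based pricing, inefficiency scales your bill linearly. The prices below are invented for illustration, not any vendor’s real rates.

```python
# Toy consumption-based cost model; all prices are made-up illustration values.
def monthly_bill(compute_hours: float, gb_processed: float,
                 price_per_hour: float = 0.50, price_per_gb: float = 0.10) -> float:
    return compute_hours * price_per_hour + gb_processed * price_per_gb


# A pipeline that needlessly reprocesses the same data twice doubles both terms:
efficient = monthly_bill(compute_hours=200, gb_processed=1000)    # 100 + 100 = 200.0
inefficient = monthly_bill(compute_hours=400, gb_processed=2000)  # 200 + 200 = 400.0
print(efficient, inefficient)  # 200.0 400.0
```

The provider collects the difference either way, so the incentive to help you optimize is weak.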

Overall, it is clear that MLOps is not a one-size-fits-all discipline, and a paid, expensive machine learning platform is rarely the best or the cheapest solution. Custom open-source approaches offer a level of flexibility and adaptability that is often better suited to a company’s needs and can help organizations save money while buying only the tools they really need.

