A Priority Based Scheduler for Amazon SageMaker Training Jobs
Optimizing the use of limited AI training accelerators, Part 2

Photo by Adrien Aletti on Unsplash

This post was created in collaboration with Max Rabin.

This is the second part of a series of posts on maximizing the utility of scarce AI resources. In the first post we noted the increasing limitations on the ability to scale up AI resources at will and, as a consequence, the growing trend among AI development teams to guarantee AI compute capacity by means such as building up an in-house AI server farm and/or…