Browsing Tag

Pinhasi

Building Spark Data Pipelines in the Cloud —What You Need to Get Started | by Assaf Pinhasi | Nov, 2022

Jessie Hobb, Nov 16, 2022

Common engineering challenges and recipes for solutions

Image: generated with Stable Diffusion by the author

Over the last ten years or so, authoring and executing Spark jobs has become considerably simpler, mainly thanks to:

High-level APIs, which make it easier to express logic.

Managed cloud-based platforms: highly scalable object storage and one-click ephemeral clusters based on spot instances make it infinitely simpler to run jobs (and delay the need to optimize them).

While authoring logic in Spark and executing jobs has…

From Raw Videos to GAN Training. Introduction | by Assaf Pinhasi | Oct, 2022

Jessie Hobb, Oct 20, 2022

Implementing a data pipeline and a lightweight Deep Learning data lake using ClearML on AWS

Hour One is an AI-centric start-up, and its main product transforms text into videos of virtual human presenters.

Generating realistic, smooth, and compelling videos of human presenters speaking and gesturing in multiple languages based on text alone is a challenging task that requires training complex Deep Learning models and lots of training data.

This post describes the design and implementation of a data pipeline and data…

Deep Lake — an architectural blueprint for managing Deep Learning data at scale — part I | by Assaf Pinhasi | Jun, 2022

Jessie Hobb, Jun 14, 2022

Image by author using vqgan + clip (“underwater roboic world | trending on artstation”, 1000 iter.)

In the past few years, machine learning data management practices have evolved dramatically, with the introduction of new design patterns and tools such as feature stores, data and model monitoring practices, and feature generation frameworks.

Most advances in data management for machine learning are focused on classical (feature-based) data, and cannot be applied as-is to unstructured data, leaving deep learning data…