Techno Blender
Digitally Yours.
Browsing Tag: distributed

Are the OLS Estimators Normally Distributed in a Linear Regression Model? | by Aaron Zhu | Nov, 2022

Justification for the Normality Assumption. We all know that the normality assumption is not required to compute unbiased estimates in a linear regression model. In this post, we will discuss whether the OLS estimators in a linear regression model are normally distributed and what assumptions are needed to draw that conclusion. What are the OLS estimators in a linear regression model? The OLS estimators (β^) are computed from a sample to estimate the population parameters (β) in a linear…
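
For context, a minimal worked statement of the result this excerpt builds toward, written here under the classical assumptions of fixed regressors and normally distributed errors (assumptions spelled out for the sketch, not quoted from the truncated excerpt):

```latex
% Classical linear model: y = X\beta + \varepsilon with fixed X of full column rank.
% The OLS estimator is a linear function of the errors, so normal errors imply a normal estimator.
\hat{\beta} = (X^\top X)^{-1} X^\top y
            = \beta + (X^\top X)^{-1} X^\top \varepsilon,
\qquad
\varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)
\;\Longrightarrow\;
\hat{\beta} \sim \mathcal{N}\!\left(\beta,\; \sigma^2 (X^\top X)^{-1}\right)
```

Without normal errors this exact distribution no longer holds; under standard conditions the estimator is only approximately normal in large samples.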

Distributed Learning: A Primer. Behind the algorithms that make Machine… | by Samuel Flender | Nov, 2022

Behind the algorithms that make Machine Learning models bigger, better, and faster. Distributed learning is one of the most critical components in the ML stack of modern tech companies: by parallelizing over a large number of machines, one can train bigger models on more data faster, unlocking higher-quality production models with more rapid iteration cycles. But don't just take my word for it, take Twitter's: Using customized distributed training allows us to iterate faster and train…
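
The parallelization described in this excerpt is most commonly realized as synchronous data parallelism. Below is a generic, minimal sketch using PyTorch DistributedDataParallel; it is not code from the article, and the model, data, and hyperparameters are placeholders.

```python
# Minimal synchronous data-parallel training loop with PyTorch DDP.
# Launch with: torchrun --nproc_per_node=<num_gpus> train_ddp.py
import os
import torch
import torch.nn.functional as F
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")          # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    device = f"cuda:{local_rank}"
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(512, 10).to(device)
    model = DDP(model, device_ids=[local_rank])      # replicas kept in sync via all-reduce
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    for _ in range(100):                             # stand-in for a real sharded data loader
        x = torch.randn(64, 512, device=device)
        y = torch.randint(0, 10, (64,), device=device)
        loss = F.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()                              # gradients are averaged across workers here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```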

Are the Error Terms Normally Distributed in a Linear Regression Model? | by Aaron Zhu | Nov, 2022

Justification for the Normality Assumption. In a linear regression model, the normality assumption (i.e., the error term is normally distributed) is NOT required for calculating unbiased estimates. In this post, we'll discuss in what situations we would need this normality assumption, why it is reasonable to make such an assumption, and how to check whether the errors are normally distributed. What are the error terms in a linear regression model? The following is what a typical linear regression…
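
As a small companion to the excerpt, here is an illustrative sketch (not taken from the article) of one common normality check: fit OLS on synthetic data and run a Shapiro-Wilk test on the residuals.

```python
# Check whether regression residuals look normally distributed (illustrative sketch).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])       # intercept + one regressor
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=200)  # normal errors by construction

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)                # OLS fit
residuals = y - X @ beta_hat

stat, p_value = stats.shapiro(residuals)                        # H0: residuals are normal
print(f"Shapiro-Wilk p-value: {p_value:.3f}")                   # large p => no evidence against normality
```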

How is your data distributed? A practical introduction to the Kolmogorov-Smirnov test | by Gianluca Malato | Nov, 2022

An introduction to the KS test for beginners. Data Scientists often need to assess which distribution their data follows. We have already seen the Shapiro-Wilk test for normality, but what about non-normal distributions? There's another test that can help us: the Kolmogorov-Smirnov test. Data Scientists usually face the problem of checking which distribution their data comes from. They work with samples and need to check whether they come from a normal distribution, a lognormal distribution,…
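
For readers who want to try the test right away, here is a minimal, self-contained sketch of the one-sample Kolmogorov-Smirnov test with SciPy; the sample and reference distributions are illustrative choices, not the article's data.

```python
# One-sample Kolmogorov-Smirnov test with SciPy (illustrative sketch).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=500)            # clearly non-normal data

# Compare the sample against fully specified reference distributions.
_, p_norm = stats.kstest(sample, "norm", args=(0.0, 1.0))
_, p_lognorm = stats.kstest(sample, "lognorm", args=(1.0, 0.0, 1.0))

print(f"vs N(0, 1):         p = {p_norm:.4f}")      # tiny p => reject normality
print(f"vs LogNormal(0, 1): p = {p_lognorm:.4f}")   # large p => consistent with lognormal
```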

Smart Distributed Training on Amazon SageMaker with SMD: Part 3 | by Chaim Rand | Sep, 2022

How to Optimize Model Distribution with SageMaker Distributed Model Parallel. This is the final part of a three-part post on optimizing distributed training. In part one, we provided a brief survey of distributed training algorithms. We noted that common to all algorithms is their reliance on high-speed communication between multiple GPUs. We surmised that a distributed algorithm that accounted for the underlying instance topology, particularly the differences in the…
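
As a rough orientation only (not code or settings from the article), model parallelism on SageMaker is typically switched on through the `distribution` argument of a SageMaker PyTorch estimator; the entry point, role, instance settings, and SMP parameters below are assumptions made for the sketch.

```python
# Hedged sketch: enabling SageMaker Distributed Model Parallel on a PyTorch estimator.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                  # hypothetical training script
    role="<your-sagemaker-execution-role>",  # placeholder
    instance_type="ml.p4d.24xlarge",         # 8 GPUs per instance
    instance_count=2,
    framework_version="1.12",
    py_version="py38",
    distribution={
        "smdistributed": {
            "modelparallel": {
                "enabled": True,
                "parameters": {"partitions": 2, "microbatches": 4},  # illustrative values
            }
        },
        "mpi": {"enabled": True, "processes_per_host": 8},
    },
)
estimator.fit({"training": "s3://<bucket>/<prefix>"})  # placeholder data channel
```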

Smart Distributed Training on Amazon SageMaker with SMD: Part 2 | by Chaim Rand | Sep, 2022

How to Optimize Data Distribution with SageMaker Distributed Data Parallel. This is the second part of a three-part post on optimizing distributed training. In part one, we provided a brief survey of distributed training algorithms. We noted that common to all algorithms is their reliance on high-speed communication between multiple GPUs. We surmised that a distributed algorithm that accounted for the underlying instance topology, particularly the differences in the communication…
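
For orientation (again, not the article's configuration), SageMaker Distributed Data Parallel is typically enabled the same way, via the estimator's `distribution` argument; the script name, role, and instance settings below are placeholder assumptions.

```python
# Hedged sketch: enabling SageMaker Distributed Data Parallel on a PyTorch estimator.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                  # hypothetical training script
    role="<your-sagemaker-execution-role>",  # placeholder
    instance_type="ml.p4d.24xlarge",         # SMDDP targets multi-GPU instances such as p4d
    instance_count=2,
    framework_version="1.12",
    py_version="py38",
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
estimator.fit({"training": "s3://<bucket>/<prefix>"})  # placeholder data channel
```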

Normalize any Continuously Distributed Data with a Couple of Lines of Code | by Danil Vityazev | Sep, 2022

How to use inverse transform sampling to improve your model. Normalizing data is a common task in data science. Sometimes it allows us to speed up gradient descent or improve model accuracy, and in some cases it is absolutely crucial. For example, the model I described in my last article cannot handle targets that are not normally distributed. Some normalization techniques, such as taking a logarithm, may work most of the time, but in this case, I decided to try something that would work for any data, no matter how it was…
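
One simple way to implement the idea in this excerpt is a rank-based inverse transform: push each value through the sample's empirical CDF and then through the inverse CDF of the standard normal. The sketch below is illustrative and not taken from the article.

```python
# Map any continuous sample to an approximately normal one (illustrative sketch).
import numpy as np
from scipy import stats

def to_normal(x: np.ndarray) -> np.ndarray:
    """Rank-based inverse normal transform of a 1-D continuous sample."""
    ranks = stats.rankdata(x)              # ranks 1 .. n
    u = ranks / (len(x) + 1)               # empirical CDF values strictly inside (0, 1)
    return stats.norm.ppf(u)               # inverse normal CDF

skewed = np.random.default_rng(0).exponential(scale=2.0, size=1000)
normalized = to_normal(skewed)

stat, p = stats.shapiro(normalized)
print(f"Shapiro-Wilk p-value after transform: {p:.3f}")   # large p => looks normal
```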

Distributed Parallel Training: Data Parallelism and Model Parallelism | by Luhui Hu | Sep, 2022

How to scale out training of large models like GPT-3 & DALL-E 2 in PyTorch. Recent years have witnessed exponential growth in the scale of distributed parallel training and the size of deep learning models. In particular, Transformer-based language models have been stealing the show. The famous GPT-3 debuted with 175 billion parameters and 96 attention layers, with a 3.2 M batch size and 499 billion words. Exactly half a year later, Google published Switch Transformer with 1.6 trillion…
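
To make the contrast concrete, here is a toy sketch (not from the article) of naive model parallelism in PyTorch, splitting one network across two GPUs; the data-parallel counterpart, where the whole model is replicated per device, is sketched under the Distributed Learning primer above.

```python
# Naive model parallelism: different layers live on different GPUs (requires two visible GPUs).
import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Linear(1024, 4096).to("cuda:0")   # first half on GPU 0
        self.part2 = nn.Linear(4096, 10).to("cuda:1")     # second half on GPU 1

    def forward(self, x):
        x = torch.relu(self.part1(x.to("cuda:0")))
        return self.part2(x.to("cuda:1"))                 # activations hop between devices

model = TwoGPUModel()
out = model(torch.randn(32, 1024))
print(out.shape)   # torch.Size([32, 10]), resident on cuda:1
```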

Distributed Forecast of 1M Time Series in Under 15 Minutes with Spark, Nixtla, and Fugue | by Federico Garza Ramírez | Sep, 2022

Scalable Time Series Modeling with the open-source projects StatsForecast, Fugue, and Spark. By Kevin Kho, Han Wang, Max Mergenthaler, and Federico Garza Ramírez. TL;DR: We will show how you can leverage the distributed power of Spark and the highly efficient code of StatsForecast to fit millions of models in a couple of minutes. Time-series modeling, analysis, and prediction of trends and seasonalities for data collected over time is a rapidly growing category of software applications. Businesses, from electricity and economics…
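
For a flavor of the local API that the article scales out, here is a hedged single-machine sketch using StatsForecast with Nixtla's long-format convention (unique_id, ds, y). Constructor and method signatures vary somewhat across statsforecast versions, and the Spark/Fugue distribution layer from the article is omitted.

```python
# Single-machine StatsForecast sketch (the article distributes this over Spark via Fugue).
import numpy as np
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

# Toy long-format frame: one row per (series, timestamp).
df = pd.DataFrame({
    "unique_id": ["series_1"] * 60,
    "ds": pd.date_range("2022-01-01", periods=60, freq="D"),
    "y": np.arange(60, dtype=float),
})

sf = StatsForecast(models=[AutoARIMA(season_length=7)], freq="D", n_jobs=-1)
forecast = sf.forecast(df=df, h=14)   # 14-day-ahead forecast per series
print(forecast.head())
```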

Distributed Parallel Training — Model Parallel Training | by Luhui Hu | Sep, 2022

Distributed model parallel training for large models in PyTorch. Recent years have seen an exponential increase in the scale of deep learning models and the challenge of distributed parallel training. For example, the famous GPT-3 has 175 billion parameters and 96 attention layers, with a 3.2 M batch size and 499 billion words. The Amazon SageMaker training platform can achieve a throughput of 32 samples per second on 120 ml.p4d.24xlarge instances for a model with 175 billion parameters. If we increase this…