DL Notes: Advanced Gradient Descent
The main optimization algorithms used for training neural networks, explained and implemented from scratch in Python.

In my previous article about gradient descent, I explained the basic concepts behind it and summarized the main challenges of this kind of optimization. However, I only covered Stochastic Gradient Descent (SGD) and the “batch” and “mini-batch” implementations of gradient descent.

Other algorithms offer advantages in terms of convergence speed, robustness to “landscape” features…
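As a quick recap before moving to the advanced methods: vanilla gradient descent updates each parameter by stepping against its gradient, and the batch, mini-batch, and stochastic variants differ only in how many samples are used to estimate that gradient per step. Below is a minimal sketch of that baseline, not the code from the previous article; the function names, the linear-regression example, and defaults like `lr=0.01` and `batch_size=32` are illustrative assumptions.

```python
import numpy as np

def sgd_update(params, grads, lr=0.01):
    """Vanilla SGD step: theta <- theta - lr * gradient."""
    return [p - lr * g for p, g in zip(params, grads)]

def run_epoch(X, y, params, grad_fn, batch_size=32, lr=0.01, rng=None):
    """One pass over the data. The only difference between "batch",
    "mini-batch", and "stochastic" GD is the batch_size used to
    estimate the gradient (len(X), some m > 1, or 1, respectively)."""
    rng = rng or np.random.default_rng()
    idx = rng.permutation(len(X))  # shuffle so batches vary across epochs
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        grads = grad_fn(params, X[batch], y[batch])
        params = sgd_update(params, grads, lr)
    return params

# Hypothetical usage: least-squares linear regression,
# gradient of 0.5 * ||X w + b - y||^2 / n with respect to (w, b).
def linreg_grad(params, Xb, yb):
    w, b = params
    err = Xb @ w + b - yb
    return [Xb.T @ err / len(Xb), err.mean()]
```

Every algorithm discussed in the rest of the article can be read as a modification of the `sgd_update` step above, while the mini-batch loop around it stays the same.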