Visualizing Gradient Descent Parameters in Torch
Prying behind the interface to see the effects of SGD parameters on your model training

Behind the simple interfaces of modern machine learning frameworks lie large amounts of complexity. With so many dials and knobs exposed to us, we could easily fall into cargo cult programming if we don't understand what's going on underneath. Consider the many parameters of Torch's stochastic gradient descent (SGD) optimizer:

```python
def torch.optim.SGD(
    params,
    lr=0.001,
    momentum=0,
    dampening=0,
    weight_decay=0,
    nesterov=False,
    *,
    …
```
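Before digging into each parameter, it can help to see the update rule they all feed into. The sketch below is a plain-Python, scalar version of the step described in PyTorch's SGD documentation; it requires no torch install, and the function name `sgd_step` is just an illustrative stand-in, not a real Torch API.

```python
# Plain-Python sketch of the update that torch.optim.SGD's parameters
# control, following the algorithm in PyTorch's SGD documentation.
# `sgd_step` is an illustrative name, not part of torch.

def sgd_step(param, grad, buf, lr=0.001, momentum=0.0,
             dampening=0.0, weight_decay=0.0, nesterov=False):
    """Return (new_param, new_buf) after one SGD step on a scalar."""
    g = grad + weight_decay * param  # L2 penalty folded into the gradient
    if momentum != 0.0:
        if buf is None:
            buf = g  # first step: momentum buffer starts at the gradient
        else:
            buf = momentum * buf + (1.0 - dampening) * g
        g = g + momentum * buf if nesterov else buf
    return param - lr * g, buf

# One step of vanilla SGD: the parameter moves opposite the gradient,
# scaled by the learning rate: 1.0 - 0.1 * 2.0
p, b = sgd_step(param=1.0, grad=2.0, buf=None, lr=0.1)
print(p)  # 0.8
```

With `momentum`, `dampening`, `weight_decay`, and `nesterov` all left at their defaults, the whole machine collapses to `param -= lr * grad`; the rest of this article looks at what happens as each knob is turned away from zero.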