Stochastic Gradient Descent (SGD) is an iterative optimization algorithm used to find local minima of differentiable functions, and it is commonly applied in machine learning to minimize cost functions. Unlike batch gradient descent, it updates the model parameters after each individual training sample, which makes each update cheap to compute but also noisy; the loss can therefore fluctuate, and more iterations may be needed before the parameters settle near a minimum. The document also discusses the advantages and disadvantages of SGD compared with mini-batch and batch gradient descent methods.
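To make the per-sample update rule concrete, the following is a minimal sketch of SGD for linear regression with a mean-squared-error cost, assuming NumPy; the function name sgd_linear_regression, the learning rate, and the epoch count are illustrative choices, not taken from the document.

```python
# Minimal sketch of per-sample SGD for linear regression (illustrative only).
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=50, seed=0):
    """Fit weights w and bias b, updating after every single training sample."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        # Shuffle so each epoch visits the samples in a different order.
        for i in rng.permutation(n_samples):
            xi, yi = X[i], y[i]
            error = xi @ w + b - yi   # prediction error on one sample
            w -= lr * error * xi      # gradient of 0.5 * error**2 w.r.t. w
            b -= lr * error           # gradient of 0.5 * error**2 w.r.t. b
    return w, b

# Tiny usage example on synthetic data where y is roughly 3x + 1 plus noise.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, size=(200, 1))
    y = 3 * X[:, 0] + 1 + 0.05 * rng.normal(size=200)
    w, b = sgd_linear_regression(X, y, lr=0.1, epochs=100)
    print(w, b)  # expected to land close to [3.0] and 1.0
```

Because the gradient is computed from a single sample at a time, each update is fast but points only approximately toward the true downhill direction, which is the source of the noisy convergence noted above.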