We’ll break our training up into multiple steps, and use different learning rates at each step. This will allow the model to train more quickly at the beginning by taking larger steps, but we will reduce the learning rate in later steps, in order to more finely tune the model as it approaches an optimal solution. If we just used a high learning rate during the entire training process, then the network may never converge on a good solution, and if we use a low learning rate for the entire process, then the network would take far too long to train. Varying the learning rate gives us the best of both worlds (high accuracy, with a fast training time).