In multivariable calculus, assuming a function $f$ is differentiable at the point $\mathbf{a}$, the gradient of $f$ is the vector field of the partial derivatives, denoted with $\nabla f$:

$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right)$$
The directional derivative of $f$ at $\mathbf{a}$ in the direction of the unit vector $\mathbf{u}$ can be written:

$$D_{\mathbf{u}} f(\mathbf{a}) = \nabla f(\mathbf{a}) \cdot \mathbf{u} = \lVert \nabla f(\mathbf{a}) \rVert \cos\theta,$$

where $\theta$ is the angle between $\nabla f(\mathbf{a})$ and $\mathbf{u}$.
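As a minimal sketch of both definitions, the snippet below computes $\nabla f(\mathbf{a}) \cdot \mathbf{u}$ for the made-up function $f(x, y) = x^2 + 3xy$ at a made-up point, and checks it against a finite-difference approximation of the directional derivative; NumPy and the specific $f$, $\mathbf{a}$, and $\mathbf{u}$ are assumptions for illustration only.

```python
import numpy as np

def f(x):
    # Example function f(x, y) = x^2 + 3xy (an arbitrary choice for this sketch)
    return x[0] ** 2 + 3 * x[0] * x[1]

def grad_f(x):
    # Hand-derived partial derivatives: (df/dx, df/dy) = (2x + 3y, 3x)
    return np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

a = np.array([1.0, 2.0])            # the point at which we differentiate
u = np.array([3.0, 4.0])
u = u / np.linalg.norm(u)           # directional derivatives use a *unit* vector

directional = grad_f(a) @ u         # D_u f(a) = grad f(a) . u

# Sanity check against a one-sided finite-difference approximation
h = 1e-6
approx = (f(a + h * u) - f(a)) / h

print(directional, approx)          # the two values should agree closely
```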
Properties
Like regular derivatives, the gradient satisfies sum, product, and quotient rules; in particular it is linear, distributing over sums and constant multiples.
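As a rough illustration of the product rule in particular, $\nabla(fg) = f\,\nabla g + g\,\nabla f$, here is a small numerical sketch; the example functions, the point, and the central-difference helper `num_grad` are all hypothetical choices, not anything fixed by the text above.

```python
import numpy as np

def num_grad(fn, x, h=1e-5):
    # Central-difference approximation of the gradient of a scalar function fn at x
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (fn(x + e) - fn(x - e)) / (2 * h)
    return g

f = lambda x: x[0] ** 2 + x[1]           # arbitrary example functions
g = lambda x: np.sin(x[0]) * x[1]

a = np.array([0.7, -1.3])                # arbitrary point
lhs = num_grad(lambda x: f(x) * g(x), a)             # grad(f * g)
rhs = f(a) * num_grad(g, a) + g(a) * num_grad(f, a)  # f grad g + g grad f

print(np.allclose(lhs, rhs, atol=1e-6))  # True, up to finite-difference error
```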
The sum rule generalises past two terms to an average of $n$ functions: if $f(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^{n} f_i(\mathbf{x})$, then

$$\nabla f(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^{n} \nabla f_i(\mathbf{x}),$$

i.e., the gradient is the average of each per-term gradient $\nabla f_i$. One key problem with this: if we use gradient descent, the time complexity of computing the full gradient at each iteration is $O(n)$, which can be prohibitively expensive for large deep learning datasets.
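Below is a minimal sketch of that finite-sum setting, assuming a synthetic quadratic per-example loss $f_i(\mathbf{w}) = \tfrac{1}{2}(\mathbf{x}_i \cdot \mathbf{w} - y_i)^2$ and random data; the point is that each full-batch gradient-descent step loops over all $n$ per-example gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2_000, 5                     # n data points, d parameters (synthetic)
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def grad_i(w, i):
    # Gradient of the i-th term f_i(w) = 0.5 * (x_i . w - y_i)^2
    return (X[i] @ w - y[i]) * X[i]

def full_gradient(w):
    # grad f(w) = (1/n) * sum_i grad f_i(w): one pass over all n examples,
    # i.e. O(n) gradient-term evaluations per gradient-descent iteration
    return sum(grad_i(w, i) for i in range(n)) / n

w = np.zeros(d)
for _ in range(50):                 # plain full-batch gradient descent
    w -= 0.1 * full_gradient(w)
```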
Interpretations
Just looking at how the gradient interacts with the directional derivative:
- When $\theta = 0$ and the gradient and unit vector point in the same direction, the directional derivative has its maximum value, and $f$ has its greatest rate of increase.
- When $\theta = \pi$ and $\nabla f(\mathbf{a})$ and $\mathbf{u}$ point in opposite directions, $f$ has its greatest rate of decrease and the directional derivative has its minimum value.
- The directional derivative is zero in any direction orthogonal to the gradient vector.
So what this means is that the gradient points in the direction of steepest ascent at $\mathbf{a}$. The negative gradient $-\nabla f(\mathbf{a})$ points in the direction of steepest descent.
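A quick numerical sanity check of this interpretation, with an arbitrary function $f(x, y) = x^2 + 2y^2$ and point chosen just for the sketch: sweeping over unit directions, the direction of greatest increase should roughly coincide with the normalised gradient.

```python
import numpy as np

def f(x):
    # Arbitrary example: f(x, y) = x^2 + 2y^2
    return x[0] ** 2 + 2 * x[1] ** 2

def grad_f(x):
    return np.array([2 * x[0], 4 * x[1]])

a = np.array([1.0, 1.0])
g = grad_f(a)

# Sweep unit directions and record which one increases f the most over a tiny step
h = 1e-3
best_dir, best_change = None, -np.inf
for theta in np.linspace(0.0, 2 * np.pi, 360, endpoint=False):
    u = np.array([np.cos(theta), np.sin(theta)])
    change = f(a + h * u) - f(a)
    if change > best_change:
        best_dir, best_change = u, change

# The best direction should roughly match the normalised gradient (steepest ascent);
# its negative is the direction of steepest descent.
print(best_dir, g / np.linalg.norm(g))
```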
By a standard theorem, the line tangent to the level curve of $f$ at $\mathbf{a}$ is orthogonal to the gradient at that point, provided $\nabla f(\mathbf{a}) \neq \mathbf{0}$.
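For a concrete (hand-picked) case, take $f(x, y) = x^2 + y^2$, whose level curves are circles; the tangent direction $(-b, a)$ at $(a, b)$ is written down by hand here, and its dot product with the gradient comes out to zero.

```python
import numpy as np

a, b = 3.0, 4.0                     # a point on the level curve x^2 + y^2 = 25
grad = np.array([2 * a, 2 * b])     # gradient of f(x, y) = x^2 + y^2 at (a, b)
tangent = np.array([-b, a])         # direction tangent to the circle at (a, b)

print(grad @ tangent)               # 0.0: the gradient is normal to the level curve
```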