In multivariable calculus, assuming a function $f(x_1, \dots, x_n)$ is differentiable at the point $\mathbf{p}$, the gradient of $f$ is the vector field of its partial derivatives, denoted $\nabla f$:

$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right)$$
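As a quick sanity check of the definition, the sketch below approximates the gradient numerically with central differences; the test function, point, and step size $h$ are illustrative choices rather than anything from the original text.

```python
import numpy as np

def numerical_gradient(f, p, h=1e-6):
    """Approximate the gradient of f at point p with central differences."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = h
        # Central difference along the i-th coordinate axis.
        grad[i] = (f(p + step) - f(p - step)) / (2 * h)
    return grad

# Example: f(x, y) = x**2 + 3*y, whose exact gradient is (2x, 3).
f = lambda v: v[0] ** 2 + 3 * v[1]
print(numerical_gradient(f, [1.0, 2.0]))  # approximately [2., 3.]
```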
The directional derivative of $f$ at $\mathbf{p}$ in the direction of the unit vector $\mathbf{u}$ can be written:

$$D_{\mathbf{u}} f(\mathbf{p}) = \nabla f(\mathbf{p}) \cdot \mathbf{u}$$
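To make the formula concrete, here is a minimal sketch in Python; the example function $f(x, y) = x^2 + 3y$, its hand-computed gradient, and the chosen direction are assumptions made for the example, not from the original text.

```python
import numpy as np

def directional_derivative(grad, u):
    """Directional derivative D_u f = grad(f) . u for a unit direction u."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)  # normalise, since the formula assumes a unit vector
    return float(np.dot(grad, u))

# Example: f(x, y) = x**2 + 3*y has gradient (2x, 3); at p = (1, 2) that is (2, 3).
grad_at_p = np.array([2.0, 3.0])
print(directional_derivative(grad_at_p, [1.0, 1.0]))  # (2 + 3) / sqrt(2) ≈ 3.536
```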
Properties

Like ordinary derivatives, the gradient satisfies sum, product, and quotient rules, and it distributes over addition (it is a linear operator).
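Written out (with $f$ and $g$ differentiable scalar functions and $c$ a constant), the standard statements of these rules are:

$$\nabla (f + g) = \nabla f + \nabla g, \qquad \nabla (c f) = c\, \nabla f, \qquad \nabla (f g) = f\, \nabla g + g\, \nabla f, \qquad \nabla\!\left(\frac{f}{g}\right) = \frac{g\, \nabla f - f\, \nabla g}{g^{2}} \quad (g \neq 0).$$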

In deep learning, the objective is typically an average of per-example loss functions $f_i$ over a training set of $n$ examples, and the gradient generalises to:

$$\nabla f(\mathbf{x}) = \nabla \left( \frac{1}{n} \sum_{i=1}^{n} f_i(\mathbf{x}) \right) = \frac{1}{n} \sum_{i=1}^{n} \nabla f_i(\mathbf{x})$$

i.e., it is the average of the per-example gradient terms. One key problem with this: if we use gradient descent, the time complexity of computing the gradient for each independent-variable iteration is $\mathcal{O}(n)$, which can be prohibitively expensive for large deep learning datasets.
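To illustrate the $\mathcal{O}(n)$ cost, here is a minimal full-batch gradient-descent sketch for a least-squares objective; the quadratic per-example loss, learning rate, and synthetic data are assumptions made for the example only.

```python
import numpy as np

def full_batch_gradient(w, X, y):
    """Average of per-example gradients for f_i(w) = 0.5 * (x_i @ w - y_i)**2.
    Every call touches all n examples, so the cost grows linearly with n."""
    n = X.shape[0]
    residuals = X @ w - y          # one pass over all n examples
    return (X.T @ residuals) / n   # (1/n) * sum_i grad f_i(w)

def gradient_descent_step(w, X, y, lr=0.1):
    """One iteration: move against the full-dataset gradient."""
    return w - lr * full_batch_gradient(w, X, y)

rng = np.random.default_rng(0)
n, d = 10_000, 3                       # n examples, d parameters
X = rng.normal(size=(n, d))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
w = np.zeros(d)
for _ in range(100):                   # each step costs O(n * d)
    w = gradient_descent_step(w, X, y)
print(w)  # close to true_w, but every single step scanned all n examples
```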

Interpretations

Writing the directional derivative as $D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u} = \lVert \nabla f \rVert \cos\theta$, where $\theta$ is the angle between $\nabla f$ and the unit vector $\mathbf{u}$:

  • When $\theta = 0$, the gradient and unit vector point in the same direction; the directional derivative has its maximum value $\lVert \nabla f \rVert$, and $f$ has its greatest rate of increase.
  • When $\theta = \pi$, the gradient and unit vector point in opposite directions; the directional derivative has its minimum value $-\lVert \nabla f \rVert$, and $f$ has its greatest rate of decrease.
  • The directional derivative is zero in any direction orthogonal to the gradient vector.

This means the gradient points in the direction of steepest ascent at $\mathbf{p}$; the negative gradient points in the direction of steepest descent.
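As a quick worked example (the function and point are chosen purely for illustration), take $f(x, y) = x^2 + y^2$ at $\mathbf{p} = (1, 1)$:

$$\nabla f(1, 1) = (2, 2), \qquad \mathbf{u}_{\text{ascent}} = \tfrac{1}{\sqrt{2}}(1, 1), \qquad D_{\mathbf{u}_{\text{ascent}}} f(1, 1) = \lVert \nabla f(1, 1) \rVert = 2\sqrt{2},$$

while the direction of steepest descent is $-\tfrac{1}{\sqrt{2}}(1, 1)$, where the directional derivative is $-2\sqrt{2}$.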

By a standard theorem, the line tangent to the level curve of $f$ through $\mathbf{p}$ is orthogonal to the gradient at that point, provided $\nabla f(\mathbf{p}) \neq \mathbf{0}$.
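Continuing the example above (again purely illustrative), the level curve of $f(x, y) = x^2 + y^2$ through $(1, 1)$ is the circle $x^2 + y^2 = 2$; its tangent line at $(1, 1)$ has direction $(-1, 1)$, and

$$(-1, 1) \cdot \nabla f(1, 1) = (-1)(2) + (1)(2) = 0,$$

confirming that the tangent is orthogonal to the gradient there.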

See also