Sigmoids are S-shaped functions. They are sometimes used as activation functions because they are smooth, continuous, and easily differentiable, properties that the unit step and sign functions lack.
Two main types are used, both written out below:
- The hyperbolic tangent (or just tanh) ranges over $(-1, 1)$. It replaces the sign function.
- The logistic function (or just the sigmoid) ranges over $(0, 1)$. It replaces the unit step function.
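For reference, the two functions can be written as

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, \qquad \sigma(x) = \frac{1}{1 + e^{-x}} = \frac{1 + \tanh(x/2)}{2}$$

so the two differ only by scaling and shifting.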
Both of these are susceptible to the vanishing gradient problem: for inputs far from 0, the gradient approaches 0 rapidly. This is because the derivatives of both functions are bump-shaped, peaking at 0 and falling off exponentially in either direction, so inputs far from 0 receive gradients that are essentially 0.
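To make the decay concrete, here is a small sketch (assuming NumPy; the sample points are arbitrary) evaluating both derivatives at a few inputs:

```python
import numpy as np

xs = np.array([0.0, 2.0, 5.0, 10.0])
sig = 1.0 / (1.0 + np.exp(-xs))

# sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)); peaks at 0.25 when x = 0
print(sig * (1 - sig))       # -> approx [0.25, 0.105, 0.0066, 4.5e-05]

# tanh'(x) = 1 - tanh(x)^2; peaks at 1 when x = 0
print(1 - np.tanh(xs) ** 2)  # -> approx [1.0, 0.071, 1.8e-04, 8.2e-09]
```

Even at $x = 5$ the sigmoid's gradient is below 1%, which is why deep stacks of sigmoid layers train slowly.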
Logistic
The logistic function is commonly used to produce the parameter of a Bernoulli distribution, since its output lies in $(0, 1)$ and is therefore a valid probability.
The sigmoid is also commonly used as the activation function of the output layer in binary classification models, since it squashes the output into $(0, 1)$, which can be interpreted as a probability.
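As an illustrative sketch (the weights, sizes, and variable names here are stand-ins, not any particular library's API), the output unit of a binary classifier might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical final layer: a linear map followed by the logistic sigmoid.
features = rng.normal(size=4)     # output of the previous layer
w, b = rng.normal(size=4), 0.1    # weights and bias of the output unit
logit = features @ w + b
p = 1.0 / (1.0 + np.exp(-logit))  # probability that the label is 1

label = rng.random() < p          # sampling the Bernoulli(p) outcome
print(p, label)
```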
In code
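A direct translation of the formula (a minimal sketch, assuming NumPy) could be:

```python
import numpy as np

def sigmoid(x):
    # Evaluates 1 / (1 + e^(-x)) exactly as written.
    return 1.0 / (1.0 + np.exp(-x))
```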
At first glance, the logistic sigmoid is implemented as above. However, for large negative $x$ this can overflow, because $e^{-x}$ grows without bound. A better way to implement the sigmoid branches on the sign of $x$, as in the sketch below:
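```python
import numpy as np

def stable_sigmoid(x):
    # A sketch of the sign-dependent rewrite; scipy.special.expit
    # is a ready-made, numerically stable alternative.
    if x >= 0:
        return 1.0 / (1.0 + np.exp(-x))  # e^(-x) <= 1, no overflow
    else:
        z = np.exp(x)                    # e^(x) <= 1, no overflow
        return z / (1.0 + z)             # equals 1 / (1 + e^(-x))
```

Both branches compute the same value, since $1 / (1 + e^{-x}) = e^{x} / (1 + e^{x})$, but each only ever exponentiates a non-positive number.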