In neural networks, activation functions determine whether a neuron passes information on to the next layer or suppresses it. In practice, ReLU is the typical default choice.
Activation functions introduce non-linearity, increase model capacity, and regulate the output range. Most modern activation functions are non-linear and differentiable almost everywhere (ReLU, for instance, is not differentiable at 0).
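To make the role of the activation concrete, here is a minimal NumPy sketch of where it sits in a layer: an affine transform followed by an element-wise non-linearity. The names (`dense_layer`, `relu`) are illustrative, not from any particular library.

```python
import numpy as np

def relu(z):
    # ReLU: max(0, z), applied element-wise.
    return np.maximum(0.0, z)

def dense_layer(x, W, b, activation=relu):
    # Affine transform followed by the element-wise activation.
    return activation(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=3)          # input vector
W = rng.normal(size=(4, 3))     # weights of a 4-unit layer
b = np.zeros(4)                 # biases
print(dense_layer(x, W, b))     # negative pre-activations are clipped to 0
```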
Dominant types
- ReLU: $f(x) = \max(0, x)$. Outputs 0 for negative inputs and passes positive inputs through unchanged (a linear unit on that side). Computationally efficient, but can lead to dead neurons.
- Linear activation: $f(x) = x$. The neuron's output is passed on to the next layer unchanged, i.e., effectively no activation. A stack of purely linear layers collapses into a single equivalent linear transformation, so backpropagation through the extra layers adds nothing.
- Discontinuous functions:
- Binary step: $f(x) = 1$ for $x \geq 0$ and $f(x) = 0$ for $x < 0$. Works for binary classification, but struggles with multi-class problems.
- Sign function: $f(x) = \operatorname{sgn}(x)$, i.e., $+1$ for positive inputs and $-1$ for negative inputs. Both the binary step and the sign function create a decision boundary based on which side of a hyperplane a data point falls.
- Sigmoids:
- Logistic function: $\sigma(x) = \frac{1}{1 + e^{-x}}$. Commonly used to predict probabilities, since its outputs lie in $(0, 1)$. Serves as a smooth replacement for the binary step function. Prone to the vanishing gradient problem.
- Hyperbolic tangent: $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$. Outputs lie in $(-1, 1)$; serves as a smooth replacement for the sign function.
- Softmax: $\operatorname{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$. Output activation function for multi-class classifiers; outputs a discrete probability distribution over the classes. All of the functions above are sketched in code below.
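A minimal NumPy sketch of the activation functions listed above; the function names are illustrative only, and the softmax uses the standard max-subtraction trick for numerical stability (an assumption, not something stated in the list).

```python
import numpy as np

def linear(x):
    # Identity: the pre-activation passes through unchanged.
    return x

def binary_step(x):
    # 1 for x >= 0, 0 otherwise.
    return np.where(x >= 0, 1.0, 0.0)

def sign(x):
    # +1 for positive inputs, -1 for negative inputs, 0 at exactly 0.
    return np.sign(x)

def relu(x):
    # max(0, x), element-wise.
    return np.maximum(0.0, x)

def logistic(x):
    # Sigmoid: squashes inputs into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes inputs into (-1, 1).
    return np.tanh(x)

def softmax(x):
    # Subtracting the max avoids overflow; outputs are non-negative
    # and sum to 1, i.e., a discrete probability distribution.
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (linear, binary_step, sign, relu, logistic, tanh):
    print(f.__name__, f(x))
print("softmax", softmax(x))  # sums to 1 across the five entries
```

Running the loop makes the ranges easy to compare: ReLU clips negatives to 0, the logistic stays in (0, 1), tanh in (-1, 1), and softmax turns the whole vector into class probabilities.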