In neural networks, an activation function determines whether (and how strongly) a neuron fires, i.e., how much of its weighted input gets passed on to the next layer. We typically use ReLU as the default.

Activation functions introduce non-linearity, increase model capacity, and regulate the output range. Most modern activation functions are non-linear and differentiable almost everywhere (ReLU, for instance, is not differentiable at 0).
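
To see why the non-linearity matters, here is a minimal NumPy sketch (the weights and layer sizes are illustrative, not from any particular model) showing that two stacked linear layers collapse into a single linear map, while inserting a ReLU between them breaks that collapse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative weights for a tiny two-layer network: 2 inputs -> 4 hidden -> 1 output.
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)
x = rng.normal(size=2)

# Without an activation, the two layers are equivalent to one linear layer:
# W2 @ (W1 @ x + b1) + b2 == (W2 @ W1) @ x + (W2 @ b1 + b2)
linear_stack = W2 @ (W1 @ x + b1) + b2
collapsed = (W2 @ W1) @ x + (W2 @ b1 + b2)
assert np.allclose(linear_stack, collapsed)

# With a ReLU in between, the mapping is piecewise linear rather than a single linear map.
relu = lambda z: np.maximum(0.0, z)
nonlinear_stack = W2 @ relu(W1 @ x + b1) + b2
print(linear_stack, nonlinear_stack)
```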

Dominant types

  • ReLU: f(x) = max(0, x). Outputs 0 for negative inputs and acts as a regular linear unit otherwise. Computationally efficient, but can lead to dead neurons.
  • Linear activation: f(x) = x. The neuron's output is passed on to the next layer unchanged, i.e., effectively no activation. Stacking linear layers collapses them into a single linear layer, and the gradient is a constant that carries no information about the input, so gradient-based training cannot learn non-linear structure.
  • Discontinuous functions:
    • Binary step: f(x) = 1 for x ≥ 0 and f(x) = 0 for x < 0. Works for binary classification, but struggles with multi-class problems.
    • Sign function: f(x) = sign(x), i.e., +1 for positive inputs and −1 for negative inputs. Both the binary step and the sign function create a decision boundary based on which side of a hyperplane a data point falls on.
  • Sigmoids:
    • Logistic function: σ(x) = 1 / (1 + e^(−x)). Commonly used to predict probabilities since its values lie in (0, 1). A smooth replacement for the unit step function. Prone to the vanishing gradient problem.
    • Hyperbolic tangent: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)), with values in (−1, 1). Used in place of the sign function.
  • Softmax: output activation function for multiclass classifiers. Outputs a discrete probability distribution over the classes (a minimal sketch of these functions follows this list).
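
Below is a minimal NumPy sketch of the activation functions listed above; the function names are illustrative, not a particular library's API:

```python
import numpy as np

def relu(x):
    # max(0, x): zero for negative inputs, identity otherwise.
    return np.maximum(0.0, x)

def binary_step(x):
    # 1 for x >= 0, 0 otherwise.
    return np.where(x >= 0, 1.0, 0.0)

def sign(x):
    # +1 / 0 / -1 depending on the sign of the input.
    return np.sign(x)

def logistic(x):
    # Sigmoid: values strictly between 0 and 1.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Values strictly between -1 and 1.
    return np.tanh(x)

def softmax(z):
    # Subtract the max for numerical stability; the outputs sum to 1.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

x = np.linspace(-3, 3, 7)
print(relu(x))                              # [0. 0. 0. 0. 1. 2. 3.]
print(binary_step(x))                       # [0. 0. 0. 1. 1. 1. 1.]
print(logistic(x))                          # all values in (0, 1)
print(softmax(np.array([1.0, 2.0, 3.0])))   # sums to 1.0
```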

https://twitter.com/docmilanfar/status/1684428663872446465