In neural networks, an activation function determines whether (and how strongly) a neuron fires, i.e., how much of its weighted input gets passed on to the next layer. We typically use ReLU as the default.

Activation functions introduce non-linearity, increase model capacity, and regulate the output range. Most modern activation functions are non-linear and differentiable almost everywhere (ReLU, for instance, is not differentiable at 0).
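
To see why the non-linearity matters, here is a minimal NumPy sketch (the weights and layer sizes are illustrative, not from any particular model) showing that two stacked linear layers collapse into a single linear map, while inserting a ReLU between them breaks that collapse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative weights for a tiny two-layer network: 2 inputs -> 4 hidden -> 1 output.
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)
x = rng.normal(size=2)

# Without an activation, the two layers are equivalent to one linear layer:
# W2 @ (W1 @ x + b1) + b2 == (W2 @ W1) @ x + (W2 @ b1 + b2)
linear_stack = W2 @ (W1 @ x + b1) + b2
collapsed = (W2 @ W1) @ x + (W2 @ b1 + b2)
assert np.allclose(linear_stack, collapsed)

# With a ReLU in between, the mapping is piecewise linear rather than a single linear map.
relu = lambda z: np.maximum(0.0, z)
nonlinear_stack = W2 @ relu(W1 @ x + b1) + b2
print(linear_stack, nonlinear_stack)
```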

Dominant types

  • ReLU: f(x) = max(0, x). Outputs 0 for negative inputs and acts as a regular linear unit otherwise. Computationally efficient, but can lead to dead neurons.
  • Linear activation: f(x) = x. The neuron's output is passed on to the next layer unchanged, i.e., effectively no activation. Stacking linear layers collapses them into a single linear layer, and the gradient is a constant that carries no information about the input, so gradient-based training cannot learn non-linear structure.
  • Discontinuous functions:
    • Binary step: f(x) = 1 for x ≥ 0 and f(x) = 0 for x < 0. Works for binary classification, but struggles with multi-class problems.
    • Sign function: f(x) = sign(x), i.e., +1 for positive inputs and −1 for negative inputs. Both the binary step and the sign function create a decision boundary based on which side of a hyperplane a data point falls on.
  • Sigmoids:
    • Logistic function: σ(x) = 1 / (1 + e^(−x)). Commonly used to predict probabilities since its values lie in (0, 1). A smooth replacement for the unit step function. Prone to the vanishing gradient problem.
    • Hyperbolic tangent: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)), with values in (−1, 1). Used in place of the sign function.
  • Softmax: output activation function for multiclass classifiers. Outputs a discrete probability distribution over the classes (a minimal sketch of these functions follows this list).
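
Below is a minimal NumPy sketch of the activation functions listed above; the function names are illustrative, not a particular library's API:

```python
import numpy as np

def relu(x):
    # max(0, x): zero for negative inputs, identity otherwise.
    return np.maximum(0.0, x)

def binary_step(x):
    # 1 for x >= 0, 0 otherwise.
    return np.where(x >= 0, 1.0, 0.0)

def sign(x):
    # +1 / 0 / -1 depending on the sign of the input.
    return np.sign(x)

def logistic(x):
    # Sigmoid: values strictly between 0 and 1.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Values strictly between -1 and 1.
    return np.tanh(x)

def softmax(z):
    # Subtract the max for numerical stability; the outputs sum to 1.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

x = np.linspace(-3, 3, 7)
print(relu(x))                              # [0. 0. 0. 0. 1. 2. 3.]
print(binary_step(x))                       # [0. 0. 0. 1. 1. 1. 1.]
print(logistic(x))                          # all values in (0, 1)
print(softmax(np.array([1.0, 2.0, 3.0])))   # sums to 1.0
```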

https://twitter.com/docmilanfar/status/1684428663872446465