A linear activation function is described as:

$$f(x) = cx + b$$

i.e., for $c \approx 1$, the input of a neuron is nearly fully passed on to the next layer (if the bias $b$ is 0). What this means is that for an $n$-dimensional input space, our linear function produces an $n$-dimensional hyperplane.
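
As a minimal sketch (NumPy is used here only for illustration; the slope `c` and bias `b` follow the formula above), a linear activation just rescales and shifts a neuron's pre-activation:

```python
import numpy as np

def linear_activation(z, c=1.0, b=0.0):
    """Linear activation: f(z) = c * z + b."""
    return c * z + b

# Pre-activations of a single neuron over a batch of inputs.
z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

# With c = 1 and b = 0, the inputs pass through unchanged.
print(linear_activation(z))                 # [-2.  -0.5  0.   1.5  3. ]

# A different slope/bias still traces a straight line (a hyperplane in n-D).
print(linear_activation(z, c=0.5, b=1.0))   # [ 0.    0.75  1.    1.75  2.5 ]
```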

Visually, we can think of a linear function as dividing the input space into two half-spaces, e.g., acting as the decision boundary in a binary classification problem.
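
To make the "two half-spaces" picture concrete, here is a small sketch (the weights `w` and bias `b` are hypothetical values chosen for illustration) that classifies 2D points by which side of the line $w \cdot x + b = 0$ they fall on:

```python
import numpy as np

# Hypothetical weights and bias defining the line w . x + b = 0 in 2D.
w = np.array([1.0, -1.0])
b = 0.0

points = np.array([
    [2.0, 1.0],   # below the line y = x  ->  score > 0
    [1.0, 2.0],   # above the line y = x  ->  score < 0
    [0.0, 0.0],   # exactly on the boundary
])

# Each point is assigned to one of the two half-spaces by the sign of its score.
scores = points @ w + b
labels = (scores > 0).astype(int)
print(scores)   # [ 1. -1.  0.]
print(labels)   # [1 0 0]
```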

Why don’t we want linear functions?

  • Most datasets aren't linearly separable, i.e., there is no single line (or hyperplane, in higher dimensions) that can separate the classes in the input space.
  • We want multiple non-linear transformations across our layers to rectify this.
  • There's no advantage to stacking multiple linear layers.
    • All of the layers collapse into a single, equivalent linear layer (see the sketch after this list).
    • Think about it this way: every input is essentially passed straight through with few changes (aside from a small shift from the bias).
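
To see why stacked linear layers collapse, note that $W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2)$, which is itself just a single linear layer. A minimal NumPy sketch (the layer shapes and random values here are arbitrary, chosen only to demonstrate the identity):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with purely linear activations (shapes chosen arbitrarily).
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Passing x through both layers...
two_layers = W2 @ (W1 @ x + b1) + b2

# ...is identical to one collapsed layer with W = W2 W1 and b = W2 b1 + b2.
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)

print(np.allclose(two_layers, one_layer))  # True
```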