Softmax is a function that serves as the final output layer’s activation function in a multiclass classification problem.
What does softmax do? It normalises the logits (the raw outputs of the NN) into a discrete probability distribution over all possible classes.
i.e., for a logit vector with n possible classes, softmax produces n probabilities, one per class, that sum to 1.
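A minimal NumPy sketch of this (the logit values are just illustrative):

```python
import numpy as np

def softmax(logits):
    """Normalise a vector of logits into a probability distribution."""
    # Subtract the max logit for numerical stability (does not change the result).
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])   # raw network outputs for 3 classes
probs = softmax(logits)
print(probs)        # ~[0.659, 0.242, 0.099]
print(probs.sum())  # 1.0
```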
Temperature scaling
Softmax temperature scaling helps address over-confidence in neural networks by dividing the input logits by a temperature T before applying softmax (see the sketch after this list).
- A low temperature (T < 1) produces larger scaled logits and a sharper, more confident distribution. It generates higher-quality samples with less variety.
- A high temperature (T > 1) produces smaller scaled logits and a flatter, less confident distribution. It has the opposite effect: more variety, lower-quality samples.
- Conceptually, the temperature is similar to the idea of simulated annealing: a high temperature corresponds to more exploration, while a low temperature favours exploitation and stable, conservative choices.
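A minimal sketch of temperature scaling, again in NumPy (the logit values are made up for illustration):

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Divide the logits by the temperature, then apply softmax."""
    scaled = np.asarray(logits) / temperature   # T < 1 sharpens, T > 1 flattens
    shifted = scaled - np.max(scaled)           # numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.5))  # sharper, more confident: ~[0.864, 0.117, 0.019]
print(softmax_with_temperature(logits, 1.0))  # plain softmax:           ~[0.659, 0.242, 0.099]
print(softmax_with_temperature(logits, 2.0))  # flatter, less confident: ~[0.502, 0.304, 0.194]
```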
For a generative RNN, a high temperature means a larger chance of nonsensical outputs. A low temperature means relatively stable and sensible outputs.
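For example, a single sampling step for such a model might look like the following sketch (it reuses softmax_with_temperature from above; the toy vocabulary and logits are purely hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "cat", "sat", "on", "mat"]                # toy vocabulary (illustrative)
next_token_logits = np.array([2.5, 0.3, 1.2, 0.7, 0.1])   # made-up RNN outputs

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(next_token_logits, temperature=t)
    token = vocab[rng.choice(len(vocab), p=probs)]
    print(f"T={t}: sampled '{token}'  (p={np.round(probs, 3)})")
```

At T=0.2 the sample is almost always the top-scoring token; at T=2.0 the distribution is flatter, so less likely tokens are sampled more often.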