Deep learning is a statistical learning approach that aims to mimic how the brain learns from data. At the core of deep learning are artificial neural networks, which stack many simple but non-linear layers that learn task-relevant representations directly from data.
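As a minimal sketch of the "simple but non-linear layers" idea, the Python below builds a two-layer network (an affine layer, a ReLU, then another affine layer) that computes XOR, a function no single linear layer can represent. The weights are hand-picked for illustration, similar to the XOR example in Goodfellow et al.'s Deep Learning; in a real network they would be learned from data with gradient descent. The helper names `dense`, `relu`, and `network` are hypothetical.

```python
def relu(v):
    """Element-wise rectified linear unit: the non-linearity between layers."""
    return [max(0.0, vi) for vi in v]

def dense(W, b, x):
    """Affine layer computing W @ x + b, with W stored as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

# Hand-picked weights (hypothetical; normally learned by gradient descent)
W1 = [[1.0, 1.0], [1.0, 1.0]]
b1 = [0.0, -1.0]
W2 = [[1.0, -2.0]]
b2 = [0.0]

def network(x):
    """Two-layer network: affine -> ReLU -> affine, returning a scalar."""
    h = relu(dense(W1, b1, x))  # hidden representation
    return dense(W2, b2, h)[0]

for x in [[0, 0], [0, 1], [1, 0], [1, 1]]:
    print(x, network(x))
# [0, 0] 0.0
# [0, 1] 1.0
# [1, 0] 1.0
# [1, 1] 0.0
```

Removing the `relu` call collapses the two layers into a single affine map, which cannot fit XOR; the non-linearity is what lets stacked layers build richer representations.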
Sub-pages
- Neuron
- Activation function
- Linear activation function
- Unit step function (and sign function)
- Sigmoid
- ReLU
- Softmax
- Temperature scaling
- Neural network layer
- Neural network architectures
- Feed-forward network (MLP)
- Fully-connected network
- Residual network
- Convolutional neural network (CNN)
- Convolution
- Pooling
- Existing architectures
- Transposed convolution
- Autoencoder
- Recurrent neural network (RNN)
- Long short-term memory (LSTM)
- Gated recurrent unit (GRU)
- Sampling strategies
- Greedy search
- Beam search
- Temperature scaling
- Graph neural network (GNN)
- Model performance metrics
- Error function
- Mean-squared error
- Cross entropy (CE) and binary cross entropy (BCE)
- Negative log likelihood (NLL)
- Classification metrics
- Accuracy, precision, recall, F1-score, support
- Error function
- Gradient descent
- Data processing
- Batch normalisation
- Transfer learning
- Generative AI
- Variational autoencoders (VAE)
- KL divergence
- Generative adversarial network (GAN)
- CycleGAN
- Transformer
Tools
Deep learning is primarily done in Python:
Resources
- Dive into Deep Learning, by Zhang et al., for an accessible introduction
- Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, for a mathematically rigorous introduction
- Ilya Sutskever’s 30u30 reading list
Limits
One consequence of deep learning’s dominance is that, because DL models need large amounts of data, much of the time in ML work is spent on data preparation: cleaning, sorting, and labelling data so it can be used meaningfully (see data science). Another problem is that DL models can be highly accurate, yet we often cannot understand how or why they reached a given conclusion (which motivates explainable AI). Models also tend to latch onto correlations, which is problematic because correlation alone does not reveal the underlying causes; a model may even get the direction of a relationship wrong.
Models are also vulnerable to adversarial attacks, in which specifically engineered noise, imperceptible to humans, can drastically change the model’s output.
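To make this concrete, here is a minimal sketch of one well-known attack, the fast gradient sign method (FGSM), run against a hypothetical logistic-regression classifier that stands in for a trained network. FGSM nudges every input feature by a small amount `eps` in the direction that increases the model’s loss. The weights, the input, `eps = 0.02`, and the helper names are all made up for illustration.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a (hypothetical) logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Fast gradient sign method: step each feature by eps to increase the loss."""
    p = predict_proba(w, b, x)
    # Gradient of binary cross-entropy with respect to the input x is (p - y) * w
    grad = [(p - y) * wi for wi in w]
    # Move each feature by +/- eps, then clip back to the valid input range [0, 1]
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]

d = 100
w = [1.0 if i % 2 == 0 else -1.0 for i in range(d)]  # hypothetical "trained" weights
b = 0.0
x = [0.5 + 0.01 * wi for wi in w]                     # clean input, classified as 1
x_adv = fgsm(w, b, x, y=1, eps=0.02)

print(round(predict_proba(w, b, x), 3))      # 0.731
print(round(predict_proba(w, b, x_adv), 3))  # 0.269
```

Each feature moves by only 0.02, yet because all 100 small changes push the logit the same way, it drops from 1 to -1 and the predicted class flips. Attacks on real deep networks follow the same recipe but obtain the input gradient by backpropagating through the whole network.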
Because of biases in its training data, a model can itself become biased. This can cause real-world harm through discrimination, especially when DL is used in critical applications. There is a whole host of ethical problems associated with modern machine learning.