Convolutional autoencoders are mainly used with image data, because convolutions exploit the spatial structure of an image. The encoder learns a visual embedding of the input, and the decoder up-samples that embedding back to the original size of the image.

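The encoder uses ordinary convolutions (the same as in convolutional neural networks), whereas the decoder uses transposed convolutions that mirror the encoder’s layers in reverse order: the last encoder layer is undone first, and each layer’s input and output channel counts are swapped.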
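
To make the mirroring concrete, the sketch below computes how a strided convolution shrinks one spatial dimension and how the matching transposed convolution restores it. It uses the standard output-size formulas for dilation 1; the helper names `conv_out` and `conv_transpose_out` and the 28-pixel input are assumptions made for illustration only.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Output length of a transposed convolution along one axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Hypothetical 28-pixel-wide input, chosen only for illustration.
down = conv_out(28, kernel=3, stride=2, padding=1)
up = conv_transpose_out(down, kernel=3, stride=2, padding=1, output_padding=1)
print(down, up)  # prints: 14 28
```

With stride 2 the convolution rounds down, so inputs of width 27 and 28 both shrink to 14. The `output_padding=1` argument tells the transposed convolution which of the two to restore; without it the result would be 27.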

In code

Convolutional autoencoders follow the same overall structure as regular autoencoders, but build the encoder and decoder from convolutional layers.

The encoder and decoder networks can be specified with something like:

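# These assignments belong in the __init__ method of an nn.Module subclass
# (with torch.nn imported as nn). Shape comments assume a 1 x 28 x 28 input.
self.encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1),   # -> 16 x 14 x 14
    nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1),  # -> 32 x 7 x 7
    nn.ReLU(),
    nn.Conv2d(32, 64, 7)                        # -> 64 x 1 x 1 (the embedding)
)
# The decoder mirrors the encoder: same kernels, channel counts swapped.
self.decoder = nn.Sequential(
    nn.ConvTranspose2d(64, 32, 7),              # -> 32 x 7 x 7
    nn.ReLU(),
    nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),  # -> 16 x 14 x 14
    nn.ReLU(),
    nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),   # -> 1 x 28 x 28
    nn.Sigmoid()  # keeps reconstructed pixel values in [0, 1]
)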
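
As a rough illustration of how these two networks might be used together, the sketch below wraps them in a module, checks the embedding and reconstruction shapes, and runs a single training step. The class name `ConvAutoencoder`, the 28 x 28 grayscale input size, the random stand-in batch, the mean-squared-error loss, and the Adam optimizer are all assumptions, not details given above.

```python
import torch
import torch.nn as nn


class ConvAutoencoder(nn.Module):  # hypothetical wrapper name
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, 7),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 7),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Compress the image to an embedding, then reconstruct it.
        return self.decoder(self.encoder(x))


model = ConvAutoencoder()
criterion = nn.MSELoss()  # assumed reconstruction loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer

# Random stand-in for a batch of 8 grayscale 28 x 28 images in [0, 1).
images = torch.rand(8, 1, 28, 28)

embedding = model.encoder(images)
reconstruction = model(images)
print(embedding.shape)       # torch.Size([8, 64, 1, 1])
print(reconstruction.shape)  # torch.Size([8, 1, 28, 28])

# One optimization step: the input image is its own target.
loss = criterion(reconstruction, images)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The final `nn.Sigmoid()` keeps reconstructed pixels in the range 0 to 1, so the input images should be scaled to that range as well for the loss to be meaningful.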