The opposite of the convolution is the transposed convolution (which is different from an inverse convolution). It takes the same kinds of parameters, but instead maps each input pixel to a $k \times k$ patch of output pixels, and its kernels are learned just like regular convolutional kernels. The output size is

$$o = (i - 1)\,s - 2p + k + p_{\text{out}},$$

where $i$ is the input size, $k$ the kernel size, $s$ the stride, $p$ the padding, and $p_{\text{out}}$ the output padding.
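As a quick sanity check of this formula (the specific sizes below are arbitrary, chosen only for illustration), PyTorch's nn.ConvTranspose2d produces exactly this output size:

import torch
import torch.nn as nn

i, k, s, p, p_out = 6, 3, 2, 1, 1                 # arbitrary example sizes
convt = nn.ConvTranspose2d(1, 1, kernel_size=k, stride=s,
                           padding=p, output_padding=p_out)
x = torch.randn(1, 1, i, i)
print(convt(x).shape[-1])                         # 12
print((i - 1) * s - 2 * p + k + p_out)            # 12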

The computation is as follows (a minimal implementation is sketched after this list):

  • For each pixel of the input image:
    • Multiply each value of the $k \times k$ kernel by that input pixel's value to get a weighted copy of the kernel.
    • Insert this weighted kernel into the output image at the location corresponding to the input pixel.
  • Wherever the inserted copies overlap, sum them.
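Here is a minimal single-channel sketch of that procedure (the function name is made up for illustration; it ignores bias, padding, and channels) checked against torch.nn.functional.conv_transpose2d:

import torch
import torch.nn.functional as F

def transposed_conv2d_single(x, kernel, stride=1):
    # Scatter a weighted copy of the kernel for every input pixel,
    # summing wherever the copies overlap.
    h, w = x.shape
    k = kernel.shape[0]
    out = torch.zeros((h - 1) * stride + k, (w - 1) * stride + k)
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + k,
                j * stride:j * stride + k] += x[i, j] * kernel
    return out

x = torch.randn(4, 4)
kern = torch.randn(3, 3)
ours = transposed_conv2d_single(x, kern, stride=2)
ref = F.conv_transpose2d(x[None, None], kern[None, None], stride=2)[0, 0]
print(torch.allclose(ours, ref, atol=1e-5))       # True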

For padding, the effect is the opposite of a regular convolution: after the output is computed, that many rows and columns are removed from around the perimeter. The output padding, by contrast, adds rows and columns to one side; it exists because, depending on the geometric parameters, the output shape would otherwise be ambiguous.
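A small illustration of that ambiguity (the layer sizes below are arbitrary): with stride 2, a regular convolution maps two different input sizes to the same output size, and output_padding chooses which of them the transposed convolution reproduces:

import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)
print(conv(torch.randn(1, 1, 7, 7)).shape)   # torch.Size([1, 1, 4, 4])
print(conv(torch.randn(1, 1, 8, 8)).shape)   # torch.Size([1, 1, 4, 4]) -- same shape!

up0 = nn.ConvTranspose2d(1, 1, 3, stride=2, padding=1, output_padding=0)
up1 = nn.ConvTranspose2d(1, 1, 3, stride=2, padding=1, output_padding=1)
x = torch.randn(1, 1, 4, 4)
print(up0(x).shape, up1(x).shape)            # 7x7 vs. 8x8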

Increasing the stride increases the upsampling effect of the transposed convolution, i.e., it increases the output resolution: the weighted kernels are inserted stride pixels apart, so the output grows roughly by a factor of the stride.
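For example (with an arbitrary 5x5 input chosen for illustration):

import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)
for s in (1, 2, 3):
    up = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=s)
    print(s, tuple(up(x).shape))             # side length is (5 - 1) * s + 3: 7, 11, 15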

A transposed convolution layer with the same specification as a convolution layer (input/output channels, kernel size, stride, etc.) has the reverse effect on the spatial shape.

In code

In PyTorch:

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=8, out_channels=8, kernel_size=5)
convt = nn.ConvTranspose2d(in_channels=8,    # matches the conv's out_channels
                           out_channels=8, kernel_size=5)
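Continuing the snippet above, passing a tensor through both layers shows the shape round trip (the 32x32 input size is just for illustration):

x = torch.randn(1, 8, 32, 32)
y = conv(x)           # (1, 8, 28, 28): the 5x5 kernel shrinks each side by 4
z = convt(y)          # (1, 8, 32, 32): back to the original spatial size
print(y.shape, z.shape)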

For a fuller example, here is the encoder/decoder pair of a convolutional autoencoder:

# Shape comments assume 1x28x28 inputs (e.g. MNIST).
self.encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1),   # 1x28x28  -> 16x14x14
    nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1),  # 16x14x14 -> 32x7x7
    nn.ReLU(),
    nn.Conv2d(32, 64, 7)                        # 32x7x7   -> 64x1x1
)
self.decoder = nn.Sequential(
    nn.ConvTranspose2d(64, 32, 7),                                         # 64x1x1   -> 32x7x7
    nn.ReLU(),
    nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),  # 32x7x7   -> 16x14x14
    nn.ReLU(),
    nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),   # 16x14x14 -> 1x28x28
    nn.Sigmoid()                                                           # pixel values in [0, 1]
)
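To see the round trip end to end, one way (a sketch, not from the original) is to wrap these layers in an nn.Module; the Autoencoder class and the MNIST-sized input below are hypothetical:

import torch
import torch.nn as nn

class Autoencoder(nn.Module):            # hypothetical wrapper around the layers above
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 7),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 7), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.randn(64, 1, 28, 28)           # a batch of MNIST-sized images
print(model.encoder(x).shape)            # torch.Size([64, 64, 1, 1])
print(model(x).shape)                    # torch.Size([64, 1, 28, 28])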