For CNN: uses a mixture of different sized kernels instead of single kernels (size carefully tuned).

Larger filters (like 7x7) aren’t necessary to learn the most important features in an image.