In deep learning, deep sets are a type of neural network architecture that takes sets or multisets as input, i.e., order-invariant data structures. They are designed to be permutation-invariant: if the order of the input elements changes, the output does not change. They also accept variable-sized inputs.
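The invariance comes from aggregating elements with a symmetric function such as a sum. A minimal NumPy illustration (the array values are arbitrary):

```python
import numpy as np

x = np.array([[1., 2.], [3., 4.], [5., 6.]])  # a set of 3 elements
perm = x[[2, 0, 1]]                           # same set, different order

# Sum pooling is permutation-invariant: both orderings pool
# to the same vector, so any network built on it ignores order.
print(np.allclose(x.sum(axis=0), perm.sum(axis=0)))  # True
```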

They are constructed from standard neural network components in three stages:

  • A shared encoder network (e.g., an MLP) maps each element of the input set independently to a vector in an embedding space.
  • A pooling layer aggregates the per-element embeddings into a single set embedding using a permutation-invariant operation (e.g., sum, mean, or max).
  • A decoder network (e.g., another MLP) projects the set embedding into the final output vector space.
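The three stages above can be sketched in NumPy. This is a minimal illustration, not a full implementation: single linear-plus-tanh layers stand in for the encoder and decoder MLPs, and the names `W_enc`, `W_dec`, and `deep_set` are invented here for clarity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared encoder weights: map each 2-dim element to an 8-dim embedding.
W_enc = rng.normal(size=(2, 8))
# Decoder weights: map the pooled 8-dim set embedding to a 4-dim output.
W_dec = rng.normal(size=(8, 4))

def deep_set(x):
    """x: (n, 2) array holding a set of n elements."""
    h = np.tanh(x @ W_enc)          # encode each element independently
    pooled = h.sum(axis=0)          # permutation-invariant sum pooling
    return np.tanh(pooled @ W_dec)  # decode the set embedding

x = rng.normal(size=(5, 2))
# Reversing the element order leaves the output unchanged.
assert np.allclose(deep_set(x), deep_set(x[::-1]))
print(deep_set(x).shape)  # (4,)
```

Because the encoder is applied element-wise and the pooling step collapses any number of embeddings into one vector, the same weights handle sets of any size.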