In deep learning, deep sets are a neural network architecture that operates on order-invariant data structures such as sets and multisets. They are designed to be permutation-invariant: if the order of the input elements changes, the output does not. They also accept variable-sized inputs.
They are constructed from standard neural network components in three steps:
- A shared neural network (e.g., an MLP) encodes each element of the input set independently into an embedding vector.
- A pooling layer aggregates the element embeddings into a single set-level embedding using a permutation-invariant operation (e.g., sum, mean, or max).
- Another neural network decodes the set embedding, projecting it into the final output vector space.
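The steps above can be sketched in a minimal NumPy example. The dimensions, the fixed random weights, and the choice of sum pooling are illustrative assumptions, not prescribed by the architecture; in practice the encoder and decoder would be trained MLPs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumed for this sketch).
D_IN, D_EMB, D_OUT = 4, 16, 3

# Shared encoder: a single random linear layer + ReLU, applied per element.
W_enc = rng.normal(size=(D_IN, D_EMB))
# Decoder: a single random linear layer on the pooled set embedding.
W_dec = rng.normal(size=(D_EMB, D_OUT))

def deep_set(x):
    """x: (n_elements, D_IN) array representing a set of n_elements items."""
    h = np.maximum(x @ W_enc, 0.0)  # encode each element independently
    pooled = h.sum(axis=0)          # permutation-invariant pooling (sum)
    return pooled @ W_dec           # decode the set embedding

x = rng.normal(size=(5, D_IN))      # a "set" of 5 elements
out1 = deep_set(x)
out2 = deep_set(x[rng.permutation(5)])  # same set, different element order
print(np.allclose(out1, out2))      # outputs match: permutation invariance
```

Because every element passes through the same encoder and the pooling operation ignores order, reordering the rows of `x` leaves the output unchanged; the same function also accepts a set of any size (e.g., a `(3, D_IN)` input), which is the variable-sized-input property noted above.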