Sets are an abstract data type (ADT) that are unordered and contain only unique elements (i.e., no duplicate elements), as an approximation of sets in mathematics. Unlike other collection types, we test for value membership in a set, instead of retrieving a specific element in a set.
Variations
In many languages, they’re a built-in data type. Sets can be implemented in a few different data structures, usually as red-black trees. This preserves the ability to traverse the set in order of insertion. This results in an average operation time of .
It’s also possible to implement a hashing function as the access mechanism, for hash sets (or unordered sets). We use this when the insertion order doesn’t matter (most of the time, like when we don’t need to iterate), since it drops the time complexity down to .
Multisets are structures that allow for multiple same elements, i.e., they are sets that can contain duplicate elements. This might seem unintuitive, but they effectively encode order invariance (i.e., they ensure abcc == cbac
).
Language-specific
In C++
In C++, sets are provided as part of the STL as std::set<T>
. They’re implemented as a red-black tree, with average operation time of . The std::unordered_set
uses hashing with average operation complexity of .
Some useful methods to know:
insert()
adds a given key to the set.erase()
removes the specified iterator or key from the set.contains()
(as of C++20) is a Boolean function that checks if the key exists in the set.
C++ also includes std::multiset<T>
and std::unordered_multiset<T>
. The notable addition to these containers is the count(key)
method, which counts occurrences of a given key.
See also
- Union find, which extend the functionality of sets by containing multiple disjoint subsets of elements