A hash table is a data structure that stores key-value pairs. Data is essentially stored in a very large array, at least 1.3 times as large as the maximum number of elements to be stored. Each key maps to an index in the array using a hash function.

Hash tables find uses in databases as well as in caching, where quick data retrieval is important. They’re fast: basic operations like search, insert, and delete run in O(1) time on average.

Good hash functions will map keys evenly across the array. For numerical values, an example of this is the division method, h(k) = k mod m, where m is the table size.1 We also need to deal with hash collisions, where a hash function returns the same output for two different inputs. All hash functions have collisions, which is one reason hash tables are made so large. As for some disadvantages:

  • We need to know the size of the table ahead of time
  • Requires a huge amount of space
  • Table needs to be zeroed out at the beginning
  • We can’t list elements in an ordered sequence

Time complexity

Hash tables are fast: on average, basic operations (search, insert, delete) run in O(1). The runtime of hash operations depends on the number of probes required, where the number of probes is a function of the load factor (the ratio of stored elements to table slots).

Basic operations

Insert operations should follow a hashing approach outlined in hash collisions. Deletes are trickier: with open addressing, simply emptying a slot can break the probe chain for keys stored past it, so deleted slots are typically marked with a tombstone instead.

In code

The C++ Standard Template Library contains the std::unordered_map<Key, T> structure. This uses hashing with average O(1) access time. There’s also the std::map<Key, T> structure, which is based on a red-black tree and gives O(log n) operations with keys kept in sorted order.

Footnotes

  1. From Prof Stumm’s lecture notes.