Probing Hash Table Idea

Probing Hash Tables Idea (13 minutes) (Spring 2021)

Motivation

Chaining hash tables handled collisions (elements hashing to the same index) by maintaining a collection of elements at each hash index.

While chaining hash tables are easy to implement, they have the disadvantage of using linked lists. Linked lists can slow performance down, due to extra time needed for allocating new cells. You are also required to maintain a separate data structure at each hash index, instead of a flat hash table, you have a "collection of collections", or "table of lists", or "vector of sets", as the last lecture covered.

There is another way to resolve collisions, that can be done in a flat hash table. Just a vector/table/array, no "collection of collections", linked lists, or additional data structure at each hash index.

Probing

The general idea of probing is:

  1. Get the hash index of a key/element (that is either being searched for or inserted)

  2. If that hash index is occupied by some other key/element, simply find another hash index.

  3. If you reach an empty hash index, you either know that it's safe to insert, or that the element/key is NOT present. (Depending on if you were inserting or searching)

Two ways of Probing are:

Linear probing (if hash index 2 is full, go to hash index 3. Repeat, moving to 4, 5, etc.)

Quadratic probing - advance through the hash table using the square of your iterative attempt. If you are on probing attempt 1, look 1 spot ahead. Probing attempt 2, look 4 spots ahead. Etc etc, Searching 1, 4, 9, 16, 25, etc spots ahead. Always % table size, so your searches will wrap around the table, hopefully hitting every hash index.

An Issue with Linear Probing

With linear probing, you are looking at "the very next hash index" every time you need to probe further. With inserts that hash to the same hash index, this results in adjacent clusters of occupied hash indexes.

This reduces performance, as you have to linearly search through potentially large adjacent clusters of hash indexes.

Thanks

Thanks for Brian Foster for writing up these notes based on the video lectures.