Hash Tables

randerson112358
7 min readNov 7, 2017

--

The idea of hashing is to distribute the entries (key/value pairs) across an array of buckets. Given a key, the algorithm computes an index that suggests where the entry can be found.

In general, hash table components:

  1. A function to return the index: index = f(key, array_size);
  2. Hash function to return the transformation being done: hash = hashfunc(key);
  3. Creating the index: index = hash % array_size;

The end result may look something like the following:

Code Idea:

f(key, array_size){
int hash = hash(key);
int index = hash % array_size;
return index;
}
hash(key){
// Code for whatever transformation you want
// return that transformation
}
A small phone book as a hash table, notice the names/contacts are the keys then a hash function is used to transform those names into index numbers and within the “bucket/array” for the particular index contains a phone number.

C Program

/**
This programs uses the hash table
to change and store phone numbers from names into an array.
NOTE: If you try adding more than 1 contact this could cause collisions. For example if we had a contact named Nick and another named Norbert, using the below code, the key for both will be the first character 'N' which will map to the same number and thus inserting the second person into the phonebook/array will overwrite the first. Written in the C programming language
By: (randerson112358)
*/
# include <stdio.h>
int main(void)
{
int phoneNumber= 5554447777;// This is the 7 digit phone number of the contact
int index;// This is the index within the phonebook/ hashTable
char[31] name = "Nick";// This is the contacts name
int table_size = 10;
int hashTable[table_size]; // This is the phone book

index = f(name[0], table_size);
hashTable[index] = phoneNumber;
system("pause"); //Comment this out if you are not using windows operating system
}
//returns the character key as an integer
int hash( char key){
return (int)key;
}
int f(char key, int hashTable_size){
int hash = hash(key);
int index = hash %hashTable_size;
return index;
}

Hash Table Uses:

Hash tables are the fastest data storage structure. This makes them a necessity for situations in which a computer program, rather than a human, is interacting with the data. Hash tables are typically used in spelling checkers and as symbol tables in computer language compilers, where a program must check thousands of words or symbols in a fraction of a second. Hash tables are used when speedy insertion, deletion, and lookup is the priority.

  1. Associative arrays
    Hash tables are commonly used to implement many types of in-memory tables. They are used to implement associative arrays (arrays whose indices are arbitrary strings or other complicated objects), especially in interpreted programming languages like Perl, Ruby, Python, and PHP.
  2. Database indexing
    Hash tables may also be used as disk-based data structures and database indices (such as in dbm) although B-trees are more popular in these applications. In multi-node database systems, hash tables are commonly used to distribute rows amongst nodes, reducing network traffic for hash joins.
  3. Caches
    Hash tables can be used to implement caches, auxiliary data tables that are used to speed up the access to data that is primarily stored in slower media. In this application, hash collisions can be handled by discarding one of the two colliding entries usually erasing the old item that is currently stored in the table and overwriting it with the new item, so every item in the table has a unique hash value.
  4. Sets
    Besides recovering the entry that has a given key, many hash table implementations can also tell whether such an entry exists or not. Those structures can therefore be used to implement a set data structure, which merely records whether a given key belongs to a specified set of keys. In this case, the structure can be simplified by eliminating all parts that have to do with the entry values. Hashing can be used to implement both static and dynamic sets.
Hash Table C Program Video

Collision Resolution

Hash collisions are practically unavoidable when hashing a random subset of a large set of possible keys. For example, if 2,450 keys are hashed into a million buckets, even with a perfectly uniform random distribution, according to the birthday problem there is approximately a 95% chance of at least two of the keys being hashed to the same slot. Therefore, almost all hash table implementations have some collision resolution strategy to handle such events. Some common strategies are described below. All these methods require that the keys (or pointers to them) be stored in the table, together with the associated values.

  • Separate Chaining
  • Robin Hood Hashing
  • Open Addressing

Separate Chaining
In the method known as separate chaining, each bucket is independent, and has some sort of list of entries with the same index. The time for hash table operations is the time to find the bucket (which is constant) plus the time for the list operation. In a good hash table, each bucket has zero or one entries, and sometimes two or three, but rarely more than that. Therefore, structures that are efficient in time and space for these cases are preferred. Structures that are efficient for a fairly large number of entries per bucket are not needed or desirable. If these cases happen often, the hashing is not working well, and this needs to be fixed.

Robin Hood Hashing
One interesting variation on double-hashing collision resolution is Robin Hood hashing. The idea is that a new key may displace a key already inserted, if its probe count is larger than that of the key at the current position. The net effect of this is that it reduces worst case search times in the table. This is similar to ordered hash tables except that the criterion for bumping a key does not depend on a direct relationship between the keys. Since both the worst case and the variation in the number of probes is reduced dramatically, an interesting variation is to probe the table starting at the expected successful probe value and then expand from that position in both directions. External Robin Hashing is an extension of this algorithm where the table is stored in an external file and each table position corresponds to a fixed-sized

Open Addressing
In another strategy, called open addressing, all entry records are stored in the bucket array itself. When a new entry has to be inserted, the buckets are examined, starting with the hashed-to slot and proceeding in some probe sequence, until an unoccupied slot is found. When searching for an entry, the buckets are scanned in the same sequence, until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table. The name “open addressing” refers to the fact that the location (“address”) of the item is not determined by its hash value. (This method is also called closed hashing; it should not be confused with “open hashing” or “closed addressing” that usually mean separate chaining.). Quadratic probing is an open addressing scheme in computer programming for resolving collisions in hash tables when an incoming data’s hash value indicates it should be stored in an already-occupied slot or bucket. Linear probing is another example of open addressing.

Quadratic Probing

Insert the following numbers into a hash table of size 7 using the hash function H(key) = (key + j² ) mod 7. Show the result when collisions are resolved using quadratic probing.

Numbers: 76, 40, 48, 5, 20

Pre Condition
1) The size of the hash table should be at least half empty
2) The size of the hash table should be a prime number.

Strategy
1) Set counter j = 0
2) Get the hash value, h(k) = (k +j²) mod table_size
3) If hash_table[h(k)] is empty, insert the key and stop
else the space is occupied, we must find the next available space.
3.1) increment j by 1
3.2) compute a new hash value, h(k) = (k + j²) mode table_size
3.3) Repeat step 3 till j is equal to table_size
4) The hash table is full
5) Stop

Linear Probing

Linear probing insertion is a strategy for resolving collisions or keys that map to the same index in a hash table. Insert the following numbers into a hash table of size 5 using the hash function H(key) = key mod 5. Show the result when the collisions are resolved.

Numbers: 10, 11, 12, 15

Pre Condition
1) The size of the hash table should be at least half empty
2) The size of the hash table should be a prime number.

Strategy

  1. Use a hash function to find the index for a key.
  2. If that spot contains a value, use the next available spot “ a higher index”. If you reach the end of the array, go back to the front of the array.

Get the C Program here:

Thanks for reading this article I hope its helpful to you all ! Keep up the learning, and if you would like more algorithm analysis videos please visit and subscribe to my YouTube channels (randerson112358 & compsci112358 )

Check Out the following for content / videos on Algorithm Analysis and Programming:

YouTube Channel:
randerson112358: https://www.youtube.com/channel/UCaV_0qp2NZd319K4_K8Z5SQ

compsci112358:
https://www.youtube.com/channel/UCbmb5IoBtHZTpYZCDBOC1CA

Website:
http://everythingcomputerscience.com/

Video Tutorials on Recurrence Relation:
https://www.udemy.com/recurrence-relation-made-easy/

Video Tutorial on Algorithm Analysis:
https://www.udemy.com/algorithm-analysis/

Twitter:
https://twitter.com/CsEverything

Resources:
http://courses.washington.edu/css502/zander/Notes/09hash.pdf

http://www.cs.cmu.edu/~ab/15-121N11/lectures/lecture16.pdf

--

--