Hashing is the technique used for performing almost constant time search in case of insertion, deletion and find operation. The essence of hashing is to facilitate the next level searching method when compared with the linear or binary search. Hashing is the solution that can be used in almost all such situations and performs extremely well compared to above data structure like Array, Linked List, Balanced Binary search Tree.
Hash Function: A function that converts a given big phone number to a small practical integer value. The mapped integer value is used as an index in hash table. In simple terms, a hash function maps a big number or string to a small integer that can be used as index in hash table.
A good hash function should have following properties
1) Efficiently computable.
2) Should uniformly distribute the keys (Each table position equally likely for each key)
For example for phone numbers a bad hash function is to take first three digits. A better function is consider last three digits. Please note that this may not be the best hash function. There may be better ways.
Hash Table: This table is an array of constant size which depends on applications. The hash table contains key values with pointers to the corresponding records. The basic idea is to place key value into location in the hash table and this location will be calculated from the key value itself. The one to one correspondence between the key value and index in the hash table is known as hashing. The ideal hash table structure is merely an array of some fixed size, containing the items. A stored item needs to have a data member, called key, that will be used in computing the index value for the item.
Key could be an integer, a string, etc
a name or Id that is a part of a large employee structure
The size of the array is TableSize.
The items that are stored in the hash table are indexed by values from 0 to TableSize – 1.
Each key is mapped into some number in the range 0 to TableSize – 1.
The mapping is called a hash function.
Collision Handling: Since a hash function gets us a small number for a big key, there is possibility that two keys result in same value. The situation where a newly inserted key maps to an already occupied slot in hash table is called collision and must be handled using some collision handling technique. Following are the ways to handle collisions:
- Chaining: The idea is to make each cell of hash table point to a linked list of records that have same hash function value. Chaining is simple, but requires additional memory outside the table.
- Open Addressing: In open addressing, all elements are stored in the hash table itself. Each table entry contains either a record or NIL. When searching for an element, we one by one examine table slots until the desired element is found or it is clear that the element is not in the table. Some common Open Addressing methods
- Linear probing
- Quadratic probing
- Double hashing