Latest revision as of 23:01, 19 December 2014

Hashing

Hashing is designed to solve the problem of needing to efficiently find or store an item in a collection. It is a technique to narrow down the search space.
Hashing means using some function or algorithm to map object data to some representative integer value.
One basic approach. Calculate a hash for each key and store the hash value in an array index of the same value. Store the key, value pair in the same index positions in other arrays.
However, this approach suffers from wasted space in the array - for those array indexes for which corresponding hash values are never generated.
Another problem is collisions - a single hash value can represent many keys.
How to reduce the no. of collisions - increase the address space. Therefore, most hashing algorithms are a trade-off between space and time efficiency.
To solve wasted space use : hashcode modulo arraysize.
To solve collisions use linkedlists to store key->value pairs for each hash value.
Also choosing prime numbers as the hash table size gives better results.

Hash function properties

Be able to distribute the keys evenly over the range of positions in the array.
Distribute the keys over a larger range of values.
Combining the above two, Ideally a hash function should distribute hash codes evenly over the range of integers.

Collision Resolution

Now when we calculate a hash for a key & find that there is a collision, we can resize the array & store all the keys separately.
However, this is expensive & there will be lots of wasted space in the array.
One solution to collision is called linear probing. i.e. Just place the item in the next available slot.
Another solution is use buckets to store more than one item at each position.
However the buckets can have large no. of items which makes searching through them also costly.
We need to redistribute evenly between the greater no. of buckets, leading to a decrease in the size of each individual bucket.
When to resize ? Monitor the load factor. No. of values/No. of buckets (No. of array indices). If it is too high, then some of the buckets are too large.
A good load factor is 0.75 - 75% full.

@@ Line 6: / Line 6: @@
 * However, this approach suffers from wasted space in the array - for those array indexes for which corresponding hash values are never generated.
 * Another problem is collisions - a single hash value can represent many keys.
+* How to reduce the no. of collisions - increase the address space. Therefore, most hashing algorithms are a trade-off between space and time efficiency.
 * To solve wasted space use : hashcode modulo arraysize.
 * To solve collisions use linkedlists to store key->value pairs for each hash value.
+* Also choosing prime numbers as the hash table size gives better results.
 = Hash function properties =
@@ Line 13: / Line 15: @@
 * Distribute the keys over a larger range of values.
 * Combining the above two, Ideally a hash function should distribute hash codes evenly over the range of integers.
+= Collision Resolution =
+* Now when we calculate a hash for a key & find that there is a collision, we can resize the array & store all the keys separately.
+* However, this is expensive & there will be lots of wasted space in the array.
+* One solution to collision is called linear probing. i.e. Just place the item in the next available slot.
+* Another solution is use buckets to store more than one item at each position.
+* However the buckets can have large no. of items which makes searching through them also costly.
+* We need to redistribute evenly between the greater no. of buckets, leading to a decrease in the size of each individual bucket.
+* When to resize ? Monitor the load factor. No. of values/No. of buckets (No. of array indices). If it is too high, then some of the buckets are too large.
+* A good load factor is 0.75 - 75% full.

Difference between revisions of "Hashtable"

Latest revision as of 23:01, 19 December 2014

Hashing

Hash function properties

Collision Resolution

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Search

Tools