Difference between revisions of "Hashtable"
From Suhrid.net Wiki
Jump to navigationJump to search(2 intermediate revisions by the same user not shown) | |||
Line 6: | Line 6: | ||
* However, this approach suffers from wasted space in the array - for those array indexes for which corresponding hash values are never generated. | * However, this approach suffers from wasted space in the array - for those array indexes for which corresponding hash values are never generated. | ||
* Another problem is collisions - a single hash value can represent many keys. | * Another problem is collisions - a single hash value can represent many keys. | ||
+ | * How to reduce the no. of collisions - increase the address space. Therefore, most hashing algorithms are a trade-off between space and time efficiency. | ||
* To solve wasted space use : hashcode modulo arraysize. | * To solve wasted space use : hashcode modulo arraysize. | ||
* To solve collisions use linkedlists to store key->value pairs for each hash value. | * To solve collisions use linkedlists to store key->value pairs for each hash value. | ||
+ | * Also choosing prime numbers as the hash table size gives better results. | ||
= Hash function properties = | = Hash function properties = | ||
Line 13: | Line 15: | ||
* Distribute the keys over a larger range of values. | * Distribute the keys over a larger range of values. | ||
* Combining the above two, Ideally a hash function should distribute hash codes evenly over the range of integers. | * Combining the above two, Ideally a hash function should distribute hash codes evenly over the range of integers. | ||
+ | |||
+ | = Collision Resolution = | ||
+ | |||
+ | * Now when we calculate a hash for a key & find that there is a collision, we can resize the array & store all the keys separately. | ||
+ | * However, this is expensive & there will be lots of wasted space in the array. | ||
+ | * One solution to collision is called linear probing. i.e. Just place the item in the next available slot. | ||
+ | * Another solution is use buckets to store more than one item at each position. | ||
+ | * However the buckets can have large no. of items which makes searching through them also costly. | ||
+ | * We need to redistribute evenly between the greater no. of buckets, leading to a decrease in the size of each individual bucket. | ||
+ | * When to resize ? Monitor the load factor. No. of values/No. of buckets (No. of array indices). If it is too high, then some of the buckets are too large. | ||
+ | * A good load factor is 0.75 - 75% full. |
Latest revision as of 23:01, 19 December 2014
Hashing
- Hashing is designed to solve the problem of needing to efficiently find or store an item in a collection. It is a technique to narrow down the search space.
- Hashing means using some function or algorithm to map object data to some representative integer value.
- One basic approach. Calculate a hash for each key and store the hash value in an array index of the same value. Store the key, value pair in the same index positions in other arrays.
- However, this approach suffers from wasted space in the array - for those array indexes for which corresponding hash values are never generated.
- Another problem is collisions - a single hash value can represent many keys.
- How to reduce the no. of collisions - increase the address space. Therefore, most hashing algorithms are a trade-off between space and time efficiency.
- To solve wasted space use : hashcode modulo arraysize.
- To solve collisions use linkedlists to store key->value pairs for each hash value.
- Also choosing prime numbers as the hash table size gives better results.
Hash function properties
- Be able to distribute the keys evenly over the range of positions in the array.
- Distribute the keys over a larger range of values.
- Combining the above two, Ideally a hash function should distribute hash codes evenly over the range of integers.
Collision Resolution
- Now when we calculate a hash for a key & find that there is a collision, we can resize the array & store all the keys separately.
- However, this is expensive & there will be lots of wasted space in the array.
- One solution to collision is called linear probing. i.e. Just place the item in the next available slot.
- Another solution is use buckets to store more than one item at each position.
- However the buckets can have large no. of items which makes searching through them also costly.
- We need to redistribute evenly between the greater no. of buckets, leading to a decrease in the size of each individual bucket.
- When to resize ? Monitor the load factor. No. of values/No. of buckets (No. of array indices). If it is too high, then some of the buckets are too large.
- A good load factor is 0.75 - 75% full.