Chapter 12: Hash Tables
8 min readChapter 12 Overview: Hash Tables
Hash tables (or hash maps) are one of the most useful and versatile data structures in computer science. They provide a mechanism to store key-value pairs and allow for extremely fast average-case lookups, insertions, and deletions. In Python, the dict and set types are implemented using hash tables.
Key Hash Table Concepts:
- Hash Function: A function that maps a key to an integer, which is then used as an index (or “hash code”) in an underlying array (often called buckets or slots). A good hash function should distribute keys uniformly across the array to minimize collisions.
- Key Properties: Keys must be “hashable,” which generally means they must be immutable. An object is hashable if it has a hash value that never changes during its lifetime.
- Buckets/Slots: The underlying array that stores the data. The hash code determines which bucket a key-value pair belongs to.
- Collision: When two different keys map to the same bucket index. This is inevitable unless the hash table is very large and the hash function is perfect.
- Collision Resolution: The strategy for handling collisions.
- Separate Chaining (or Chaining): Each bucket is a pointer to a linked list (or another data structure) that holds all the key-value pairs that hashed to that index. This is a very common method.
- Open Addressing: If a collision occurs, the algorithm probes for the next empty slot in the array according to a fixed sequence. Examples include Linear Probing (checking the next slot, then the next, etc.), Quadratic Probing, and Double Hashing.
Performance:
- Average Case: O(1) for lookups, insertions, and deletions.
- Worst Case: O(n) for lookups, insertions, and deletions. This happens when all keys hash to the same bucket, effectively turning the hash table into a single linked list.
Section 12.1: Test for Palindromic Permutations
The Problem: Given a string, determine if any permutation of its characters can form a palindrome.
Example:
- “edified” ->
True(a permutation is “deified”) - “level” ->
True(it’s already a palindrome) - “aabbc” ->
True(a permutation is “abcba”) - “abc” ->
False
Key Insight: A string can be permuted into a palindrome if and only if at most one of its characters appears an odd number of times. All other characters must appear an even number of times.
- If the string length is even, all character counts must be even.
- If the string length is odd, exactly one character count must be odd (the one that will be the center of the palindrome).
Algorithm (Using a Hash Table/Set):
- Initialize a hash table (or hash set) to track the counts of characters that have appeared an odd number of times so far. Let’s call it
odd_chars. - Iterate through each character
cin the input string:- If
cis already inodd_chars, it means we’ve now seen this character an even number of times. Remove it fromodd_chars. - If
cis not inodd_chars, it means we’ve now seen it an odd number of times. Add it toodd_chars.
- If
- After iterating through the whole string, check the size of
odd_chars.- If
len(odd_chars) <= 1, then a palindromic permutation is possible. ReturnTrue. - Otherwise, return
False.
- If
Complexity:
- Time: O(n), where n is the number of characters in the string, because we iterate through it once.
- Space: O(c), where c is the number of unique characters in the character set (e.g., 26 for lowercase English letters, 128 for ASCII). This is effectively O(1) if the character set is fixed and finite.
Section 12.2: Is an Anonymous Letter Constructible?
The Problem: You are given two strings, the letter_text and the magazine_text. Write a function to determine if the letter can be written using characters from the magazine. Each character in the magazine can only be used once.
Algorithm (Using a Hash Table for Character Counts):
- Create a hash table (e.g., a
collections.Counterin Python or a simpledict) to store the counts of all characters available in themagazine_text. - Iterate through
magazine_textand populate the hash table. - Iterate through each character
cin theletter_text:- Check if
cis in the hash table and if its count is greater than 0. - If
cis not in the table or its count is 0, it means we don’t have this character available in the magazine. ReturnFalse. - If the character is available, decrement its count in the hash table to “use it up”.
- Check if
- If the loop completes, it means all characters required for the letter were found in the magazine. Return
True.
Complexity:
- Time: O(m + n), where m is the length of the magazine text and n is the length of the letter text.
- Space: O(c), where c is the number of unique characters in the magazine’s character set.
Section 12.4: Implement an LRU Cache
The Problem: Design and implement a Least Recently Used (LRU) cache. It should support two operations: insert and lookup. The cache has a fixed capacity. When a new item is inserted and the cache is full, the least recently used item must be evicted.
Key Requirements:
lookup(key): If the key exists, return its value and mark it as recently used.insert(key, value): Insert or update the key-value pair. Mark it as recently used. If insertion causes the cache to exceed capacity, evict the LRU item.- Both operations should be as fast as possible, ideally O(1).
Algorithm (Hash Table + Doubly Linked List): This is the classic solution that combines the strengths of two data structures to achieve O(1) performance.
- Hash Table (
dictin Python): Storeskey -> Nodemappings. This provides O(1) average time complexity for lookups. TheNodeit points to is a node in our doubly linked list. - Doubly Linked List: Stores the items in order of use. The most recently used item is at the head (or tail), and the least recently used item is at the other end.
- A doubly linked list is crucial because it allows for O(1) removal of any node if we have a direct pointer to it. The hash table gives us this pointer.
Operations:
lookup(key):- Check if
keyis in the hash table. If not, returnNone(cache miss). - If it exists, get the
Nodefrom the hash table. - Move this
Nodeto the head of the linked list to mark it as most recently used. - Return the
Node’s value.
- Check if
insert(key, value):- Check if
keyis already in the hash table.- If yes, update the value in its corresponding
Nodeand move thatNodeto the head of the list.
- If yes, update the value in its corresponding
- If
keyis not in the hash table (a new insertion):- If the cache is full (
len(hash_table) == capacity):- Get the
lru_nodefrom the tail of the linked list. - Remove
lru_node.keyfrom the hash table. - Remove the
lru_nodefrom the linked list.
- Get the
- Create a new
Nodewith the givenkeyandvalue. - Add the new
Nodeto the head of the linked list. - Add the
key -> new_Nodemapping to the hash table.
- If the cache is full (
- Check if
Section 12.8: Find the Smallest Subarray Covering a Set of Keywords
The Problem: Given a long text (an array of words) and a set of keywords, find the smallest subarray of the text that contains all the keywords.
Example:
- Text:
["apple", "banana", "apple", "apple", "dog", "cat", "apple", "dog", "banana", "apple", "cat", "dog"] - Keywords:
{"banana", "cat"} - Result:
["cat", "apple", "dog", "banana"](length 4, from index 6 to 9)
Algorithm (Sliding Window with Hash Table): This is a classic sliding window problem. We expand a window from the right and shrink it from the left.
- Create a hash table
keyword_countsto store the count of each keyword we need to find. Initialize it from the input set of keywords. - Initialize
left = 0,result = (-1, -1)(to store the start/end of the best subarray), andremaining_to_find = len(keywords). - Use a hash table
window_countsto track counts of keywords within the current window[left, right]. - Iterate with a
rightpointer from0tolen(text) - 1:- Let
word = text[right]. - If
wordis one of the keywords:- Increment its count in
window_counts. - If
window_counts[word]is now equal tokeyword_counts[word], it means we have found enough of this specific keyword. Decrementremaining_to_find.
- Increment its count in
- Shrink the window from the left: While
remaining_to_find == 0:- We have a valid window
[left, right]. Check if it’s smaller than the bestresultfound so far. If so, updateresult. - Let
left_word = text[left]. - If
left_wordis a keyword:- Decrement its count in
window_counts. - If its
window_countsis now less than itskeyword_counts, we no longer have all required keywords in the window. Incrementremaining_to_find.
- Decrement its count in
- Move the left boundary:
left += 1.
- We have a valid window
- Let
- Return the subarray corresponding to the final
resultindices.