\

Chapter 12: Hash Tables

8 min read

Chapter 12 Overview: Hash Tables

Hash tables (or hash maps) are one of the most useful and versatile data structures in computer science. They provide a mechanism to store key-value pairs and allow for extremely fast average-case lookups, insertions, and deletions. In Python, the dict and set types are implemented using hash tables.

Key Hash Table Concepts:

  • Hash Function: A function that maps a key to an integer, which is then used as an index (or “hash code”) in an underlying array (often called buckets or slots). A good hash function should distribute keys uniformly across the array to minimize collisions.
  • Key Properties: Keys must be “hashable,” which generally means they must be immutable. An object is hashable if it has a hash value that never changes during its lifetime.
  • Buckets/Slots: The underlying array that stores the data. The hash code determines which bucket a key-value pair belongs to.
  • Collision: When two different keys map to the same bucket index. This is inevitable unless the hash table is very large and the hash function is perfect.
  • Collision Resolution: The strategy for handling collisions.
    1. Separate Chaining (or Chaining): Each bucket is a pointer to a linked list (or another data structure) that holds all the key-value pairs that hashed to that index. This is a very common method.
    2. Open Addressing: If a collision occurs, the algorithm probes for the next empty slot in the array according to a fixed sequence. Examples include Linear Probing (checking the next slot, then the next, etc.), Quadratic Probing, and Double Hashing.

Performance:

  • Average Case: O(1) for lookups, insertions, and deletions.
  • Worst Case: O(n) for lookups, insertions, and deletions. This happens when all keys hash to the same bucket, effectively turning the hash table into a single linked list.

Section 12.1: Test for Palindromic Permutations

The Problem: Given a string, determine if any permutation of its characters can form a palindrome.

Example:

  • “edified” -> True (a permutation is “deified”)
  • “level” -> True (it’s already a palindrome)
  • “aabbc” -> True (a permutation is “abcba”)
  • “abc” -> False

Key Insight: A string can be permuted into a palindrome if and only if at most one of its characters appears an odd number of times. All other characters must appear an even number of times.

  • If the string length is even, all character counts must be even.
  • If the string length is odd, exactly one character count must be odd (the one that will be the center of the palindrome).

Algorithm (Using a Hash Table/Set):

  1. Initialize a hash table (or hash set) to track the counts of characters that have appeared an odd number of times so far. Let’s call it odd_chars.
  2. Iterate through each character c in the input string:
    • If c is already in odd_chars, it means we’ve now seen this character an even number of times. Remove it from odd_chars.
    • If c is not in odd_chars, it means we’ve now seen it an odd number of times. Add it to odd_chars.
  3. After iterating through the whole string, check the size of odd_chars.
    • If len(odd_chars) <= 1, then a palindromic permutation is possible. Return True.
    • Otherwise, return False.

Complexity:

  • Time: O(n), where n is the number of characters in the string, because we iterate through it once.
  • Space: O(c), where c is the number of unique characters in the character set (e.g., 26 for lowercase English letters, 128 for ASCII). This is effectively O(1) if the character set is fixed and finite.

Section 12.2: Is an Anonymous Letter Constructible?

The Problem: You are given two strings, the letter_text and the magazine_text. Write a function to determine if the letter can be written using characters from the magazine. Each character in the magazine can only be used once.

Algorithm (Using a Hash Table for Character Counts):

  1. Create a hash table (e.g., a collections.Counter in Python or a simple dict) to store the counts of all characters available in the magazine_text.
  2. Iterate through magazine_text and populate the hash table.
  3. Iterate through each character c in the letter_text:
    • Check if c is in the hash table and if its count is greater than 0.
    • If c is not in the table or its count is 0, it means we don’t have this character available in the magazine. Return False.
    • If the character is available, decrement its count in the hash table to “use it up”.
  4. If the loop completes, it means all characters required for the letter were found in the magazine. Return True.

Complexity:

  • Time: O(m + n), where m is the length of the magazine text and n is the length of the letter text.
  • Space: O(c), where c is the number of unique characters in the magazine’s character set.

Section 12.4: Implement an LRU Cache

The Problem: Design and implement a Least Recently Used (LRU) cache. It should support two operations: insert and lookup. The cache has a fixed capacity. When a new item is inserted and the cache is full, the least recently used item must be evicted.

Key Requirements:

  • lookup(key): If the key exists, return its value and mark it as recently used.
  • insert(key, value): Insert or update the key-value pair. Mark it as recently used. If insertion causes the cache to exceed capacity, evict the LRU item.
  • Both operations should be as fast as possible, ideally O(1).

Algorithm (Hash Table + Doubly Linked List): This is the classic solution that combines the strengths of two data structures to achieve O(1) performance.

  1. Hash Table (dict in Python): Stores key -> Node mappings. This provides O(1) average time complexity for lookups. The Node it points to is a node in our doubly linked list.
  2. Doubly Linked List: Stores the items in order of use. The most recently used item is at the head (or tail), and the least recently used item is at the other end.
    • A doubly linked list is crucial because it allows for O(1) removal of any node if we have a direct pointer to it. The hash table gives us this pointer.

Operations:

  • lookup(key):
    1. Check if key is in the hash table. If not, return None (cache miss).
    2. If it exists, get the Node from the hash table.
    3. Move this Node to the head of the linked list to mark it as most recently used.
    4. Return the Node’s value.
  • insert(key, value):
    1. Check if key is already in the hash table.
      • If yes, update the value in its corresponding Node and move that Node to the head of the list.
    2. If key is not in the hash table (a new insertion):
      • If the cache is full (len(hash_table) == capacity):
        • Get the lru_node from the tail of the linked list.
        • Remove lru_node.key from the hash table.
        • Remove the lru_node from the linked list.
      • Create a new Node with the given key and value.
      • Add the new Node to the head of the linked list.
      • Add the key -> new_Node mapping to the hash table.

Section 12.8: Find the Smallest Subarray Covering a Set of Keywords

The Problem: Given a long text (an array of words) and a set of keywords, find the smallest subarray of the text that contains all the keywords.

Example:

  • Text: ["apple", "banana", "apple", "apple", "dog", "cat", "apple", "dog", "banana", "apple", "cat", "dog"]
  • Keywords: {"banana", "cat"}
  • Result: ["cat", "apple", "dog", "banana"] (length 4, from index 6 to 9)

Algorithm (Sliding Window with Hash Table): This is a classic sliding window problem. We expand a window from the right and shrink it from the left.

  1. Create a hash table keyword_counts to store the count of each keyword we need to find. Initialize it from the input set of keywords.
  2. Initialize left = 0, result = (-1, -1) (to store the start/end of the best subarray), and remaining_to_find = len(keywords).
  3. Use a hash table window_counts to track counts of keywords within the current window [left, right].
  4. Iterate with a right pointer from 0 to len(text) - 1:
    • Let word = text[right].
    • If word is one of the keywords:
      • Increment its count in window_counts.
      • If window_counts[word] is now equal to keyword_counts[word], it means we have found enough of this specific keyword. Decrement remaining_to_find.
    • Shrink the window from the left: While remaining_to_find == 0:
      • We have a valid window [left, right]. Check if it’s smaller than the best result found so far. If so, update result.
      • Let left_word = text[left].
      • If left_word is a keyword:
        • Decrement its count in window_counts.
        • If its window_counts is now less than its keyword_counts, we no longer have all required keywords in the window. Increment remaining_to_find.
      • Move the left boundary: left += 1.
  5. Return the subarray corresponding to the final result indices.