\

Chapter 6: Strings

7 min read

Chapter 6 Overview: Strings

This chapter focuses on string manipulation, a fundamental skill in programming and a common topic in interviews. Problems often involve parsing, searching, and transforming string data.

Key String Concepts in Python:

  • Immutability: Python strings are immutable. This means that any method that appears to modify a string (like replace() or upper()) actually returns a new string. This has performance implications; building a string piece-by-piece in a loop using + concatenation is often inefficient (O(n^2)) because it creates a new string at each step. Using a list of characters and ''.join() at the end is the preferred, efficient (O(n)) approach.
  • Encoding: Understand that strings are sequences of Unicode characters. ord() gets the integer representation of a character, and chr() does the reverse.
  • Slicing: Python’s string slicing s[start:stop:step] is powerful and efficient for extracting substrings.

Section 6.1: Interconvert Strings and Integers

The Problem: Implement functions to convert a string to an integer (string_to_int) and an integer to a string (int_to_string), without using built-in functions like int() or str(). Handle negative numbers.

string_to_int(s)

Algorithm:

  1. Handle the sign. Check if the first character s[0] is '-'. If so, note it and slice the string to work with the remaining digits.
  2. Initialize result = 0.
  3. Iterate through the digits of the string. For each character c:
    • Convert the character to its integer value. ord(c) - ord('0') is a classic trick for this.
    • Update the result: result = result * 10 + digit.
  4. Apply the sign if the number was negative.

int_to_string(x)

Algorithm:

  1. Handle the sign. If x < 0, note it, and work with the absolute value of x. Handle the edge case x = 0.
  2. Use a list to store the characters of the digits.
  3. In a loop, as long as x > 0:
    • Get the last digit: digit = x % 10.
    • Convert it to a character: chr(ord('0') + digit).
    • Append the character to your list.
    • Update x: x //= 10.
  4. The list of characters will be in reverse order (e.g., 123 becomes ['3', '2', '1']).
  5. Reverse the list and join() it to form the string.
  6. Add the '-' prefix if the original number was negative.

Section 6.2: Base Conversion

The Problem: Given a string representing a number in base b1 and a target base b2, convert the number to a string in base b2. The bases are between 2 and 16. For digits greater than 9, use ‘A’ for 10, ‘B’ for 11, etc.

Algorithm (Two-Step Process): The simplest way to convert between two arbitrary bases is to use a common intermediate base, like base 10.

  1. Step 1: Convert from base b1 to an integer (base 10).

    • This is very similar to the string_to_int problem.
    • Iterate through the string s in b1. For each character c:
      • result = result * b1 + digit_value.
      • The digit_value for ‘A’…‘F’ needs to be handled (e.g., ord(c) - ord('A') + 10).
  2. Step 2: Convert the integer from base 10 to a string in base b2.

    • This is very similar to the int_to_string problem.
    • In a loop, as long as the number x > 0:
      • remainder = x % b2.
      • Convert remainder to its character representation (handling 10-15 -> ‘A’-‘F’).
      • Prepend this character to your result.
      • x //= b2.

Section 6.5: Test for Palindromicity

The Problem: Write a function that checks if a string is a palindrome, considering only alphanumeric characters and ignoring case.

Example:

  • “A man, a plan, a canal, Panama” -> True
  • “race a car” -> False

Algorithm (Two Pointers): This is a classic and efficient approach.

  1. Initialize two pointers: left = 0 and right = len(s) - 1.
  2. Loop while left < right:
    • Move left pointer: While left < right and the character s[left] is not alphanumeric, increment left.
    • Move right pointer: While left < right and the character s[right] is not alphanumeric, decrement right.
    • Compare: Compare s[left].lower() and s[right].lower(). If they are not equal, the string is not a palindrome. Return False.
    • Advance pointers: If they are equal, move the pointers inward: left += 1, right -= 1.
  3. If the loop completes, it means all valid character pairs matched. Return True.

Complexity:

  • Time: O(n), where n is the length of the string. Each pointer traverses the string at most once.
  • Space: O(1), as we only use a few variables for the pointers.

Section 6.8: The Look-and-Say Problem

The Problem: Generate the n-th term of the “look-and-say” sequence. This sequence starts with “1”. Each subsequent term is generated by reading aloud the digits of the previous term.

  • R(1) = “1”
  • R(2) = “11” (read R(1) as “one 1”)
  • R(3) = “21” (read R(2) as “two 1s”)
  • R(4) = “1211” (read R(3) as “one 2, one 1”)
  • R(5) = “111221” (read R(4) as “one 1, one 2, two 1s”)

Algorithm (Iterative Generation):

  1. Start with s = "1".
  2. Loop n-1 times to generate the subsequent terms. In each iteration:
    • Create a new empty string or list next_s.
    • Iterate through the current string s to identify groups of identical consecutive digits.
    • Use a pointer or index i. While i < len(s):
      • Let current_digit = s[i].
      • Count how many times it repeats consecutively. Let this be count.
      • Advance i by count.
      • Append the count and the digit to next_s. For example, if you see “222”, you append “3” and “2”.
    • Replace the old s with the newly generated next_s.
  3. Return s.

Section 6.9: Convert from Roman to Decimal

The Problem: Given a string representing a Roman numeral, convert it to an integer.

Roman Numeral Rules:

  • I=1, V=5, X=10, L=50, C=100, D=500, M=1000
  • Typically, values are added (e.g., VI = 5 + 1 = 6).
  • Subtraction Rule: A smaller value placed before a larger value is subtracted.
    • IV = 4 (5 - 1)
    • IX = 9 (10 - 1)
    • XL = 40 (50 - 10)
    • XC = 90 (100 - 10)
    • CD = 400 (500 - 100)
    • CM = 900 (1000 - 100)

Algorithm (Iterate from Right to Left): This is a clever way to handle the subtraction rule easily.

  1. Create a map T = {'I': 1, 'V': 5, ...} for Roman numeral values.
  2. Initialize the result with the value of the last character in the string: result = T[s[-1]].
  3. Iterate through the string from the second-to-last character backwards to the front (i from len(s)-2 down to 0).
    • If the value of the current character T[s[i]] is less than the value of the character to its right T[s[i+1]], it’s a subtraction case. So, result -= T[s[i]].
    • Otherwise, it’s an addition case. So, result += T[s[i]].
  4. Return result.

Example: s = "IX"

  • result starts as T['X'] = 10.
  • Loop i at s[0] (‘I’).
  • T['I'] (1) is less than T['X'] (10).
  • Subtract: result = 10 - 1 = 9.
  • Return 9.

Example: s = "LVIII"

  • result starts as T['I'] = 1.
  • i at s[2] (‘I’): T['I'](1) >= T['I'](1) is false (equal), so add. result = 1 + 1 = 2.
  • i at s[1] (‘V’): T['V'](5) >= T['I'](1). Add. result = 2 + 5 = 7.
  • i at s[0] (‘L’): T['L'](50) >= T['V'](5). Add. result = 7 + 50 = 57. Wait, this logic is slightly off. Let’s correct. result = T[s[-1]] is the initial value. s = "LVIII"
  1. result = T['I'] = 1
  2. i at s[3] (‘I’): T[s[3]] (1) is not less than T[s[4]] (1). Add T[s[3]]. result = 1 + 1 = 2.
  3. i at s[2] (‘I’): T[s[2]] (1) is not less than T[s[3]] (1). Add T[s[2]]. result = 2 + 1 = 3.
  4. i at s[1] (‘V’): T[s[1]] (5) is not less than T[s[2]] (1). Add T[s[1]]. result = 3 + 5 = 8.
  5. i at s[0] (‘L’): T[s[0]] (50) is not less than T[s[1]] (5). Add T[s[0]]. result = 8 + 50 = 58. Correct.