Chapter 6: Strings
7 min readChapter 6 Overview: Strings
This chapter focuses on string manipulation, a fundamental skill in programming and a common topic in interviews. Problems often involve parsing, searching, and transforming string data.
Key String Concepts in Python:
- Immutability: Python strings are immutable. This means that any method that appears to modify a string (like
replace()orupper()) actually returns a new string. This has performance implications; building a string piece-by-piece in a loop using+concatenation is often inefficient (O(n^2)) because it creates a new string at each step. Using a list of characters and''.join()at the end is the preferred, efficient (O(n)) approach. - Encoding: Understand that strings are sequences of Unicode characters.
ord()gets the integer representation of a character, andchr()does the reverse. - Slicing: Python’s string slicing
s[start:stop:step]is powerful and efficient for extracting substrings.
Section 6.1: Interconvert Strings and Integers
The Problem: Implement functions to convert a string to an integer (string_to_int) and an integer to a string (int_to_string), without using built-in functions like int() or str(). Handle negative numbers.
string_to_int(s)
Algorithm:
- Handle the sign. Check if the first character
s[0]is'-'. If so, note it and slice the string to work with the remaining digits. - Initialize
result = 0. - Iterate through the digits of the string. For each character
c:- Convert the character to its integer value.
ord(c) - ord('0')is a classic trick for this. - Update the result:
result = result * 10 + digit.
- Convert the character to its integer value.
- Apply the sign if the number was negative.
int_to_string(x)
Algorithm:
- Handle the sign. If
x < 0, note it, and work with the absolute value ofx. Handle the edge casex = 0. - Use a list to store the characters of the digits.
- In a loop, as long as
x > 0:- Get the last digit:
digit = x % 10. - Convert it to a character:
chr(ord('0') + digit). - Append the character to your list.
- Update
x:x //= 10.
- Get the last digit:
- The list of characters will be in reverse order (e.g., 123 becomes
['3', '2', '1']). - Reverse the list and
join()it to form the string. - Add the
'-'prefix if the original number was negative.
Section 6.2: Base Conversion
The Problem: Given a string representing a number in base b1 and a target base b2, convert the number to a string in base b2. The bases are between 2 and 16. For digits greater than 9, use ‘A’ for 10, ‘B’ for 11, etc.
Algorithm (Two-Step Process): The simplest way to convert between two arbitrary bases is to use a common intermediate base, like base 10.
Step 1: Convert from base
b1to an integer (base 10).- This is very similar to the
string_to_intproblem. - Iterate through the string
sinb1. For each characterc:result = result * b1 + digit_value.- The
digit_valuefor ‘A’…‘F’ needs to be handled (e.g.,ord(c) - ord('A') + 10).
- This is very similar to the
Step 2: Convert the integer from base 10 to a string in base
b2.- This is very similar to the
int_to_stringproblem. - In a loop, as long as the number
x > 0:remainder = x % b2.- Convert
remainderto its character representation (handling 10-15 -> ‘A’-‘F’). - Prepend this character to your result.
x //= b2.
- This is very similar to the
Section 6.5: Test for Palindromicity
The Problem: Write a function that checks if a string is a palindrome, considering only alphanumeric characters and ignoring case.
Example:
- “A man, a plan, a canal, Panama” ->
True - “race a car” ->
False
Algorithm (Two Pointers): This is a classic and efficient approach.
- Initialize two pointers:
left = 0andright = len(s) - 1. - Loop while
left < right:- Move
leftpointer: Whileleft < rightand the characters[left]is not alphanumeric, incrementleft. - Move
rightpointer: Whileleft < rightand the characters[right]is not alphanumeric, decrementright. - Compare: Compare
s[left].lower()ands[right].lower(). If they are not equal, the string is not a palindrome. ReturnFalse. - Advance pointers: If they are equal, move the pointers inward:
left += 1,right -= 1.
- Move
- If the loop completes, it means all valid character pairs matched. Return
True.
Complexity:
- Time: O(n), where n is the length of the string. Each pointer traverses the string at most once.
- Space: O(1), as we only use a few variables for the pointers.
Section 6.8: The Look-and-Say Problem
The Problem: Generate the n-th term of the “look-and-say” sequence. This sequence starts with “1”. Each subsequent term is generated by reading aloud the digits of the previous term.
R(1)= “1”R(2)= “11” (readR(1)as “one 1”)R(3)= “21” (readR(2)as “two 1s”)R(4)= “1211” (readR(3)as “one 2, one 1”)R(5)= “111221” (readR(4)as “one 1, one 2, two 1s”)
Algorithm (Iterative Generation):
- Start with
s = "1". - Loop
n-1times to generate the subsequent terms. In each iteration:- Create a new empty string or list
next_s. - Iterate through the current string
sto identify groups of identical consecutive digits. - Use a pointer or index
i. Whilei < len(s):- Let
current_digit = s[i]. - Count how many times it repeats consecutively. Let this be
count. - Advance
ibycount. - Append the count and the digit to
next_s. For example, if you see “222”, you append “3” and “2”.
- Let
- Replace the old
swith the newly generatednext_s.
- Create a new empty string or list
- Return
s.
Section 6.9: Convert from Roman to Decimal
The Problem: Given a string representing a Roman numeral, convert it to an integer.
Roman Numeral Rules:
I=1, V=5, X=10, L=50, C=100, D=500, M=1000- Typically, values are added (e.g.,
VI= 5 + 1 = 6). - Subtraction Rule: A smaller value placed before a larger value is subtracted.
IV= 4 (5 - 1)IX= 9 (10 - 1)XL= 40 (50 - 10)XC= 90 (100 - 10)CD= 400 (500 - 100)CM= 900 (1000 - 100)
Algorithm (Iterate from Right to Left): This is a clever way to handle the subtraction rule easily.
- Create a map
T = {'I': 1, 'V': 5, ...}for Roman numeral values. - Initialize the result with the value of the last character in the string:
result = T[s[-1]]. - Iterate through the string from the second-to-last character backwards to the front (
ifromlen(s)-2down to0).- If the value of the current character
T[s[i]]is less than the value of the character to its rightT[s[i+1]], it’s a subtraction case. So,result -= T[s[i]]. - Otherwise, it’s an addition case. So,
result += T[s[i]].
- If the value of the current character
- Return
result.
Example: s = "IX"
resultstarts asT['X']= 10.- Loop
iats[0](‘I’). T['I'](1) is less thanT['X'](10).- Subtract:
result = 10 - 1 = 9. - Return 9.
Example: s = "LVIII"
resultstarts asT['I']= 1.iats[2](‘I’):T['I'](1) >= T['I'](1)is false (equal), so add.result = 1 + 1 = 2.iats[1](‘V’):T['V'](5) >= T['I'](1). Add.result = 2 + 5 = 7.iats[0](‘L’):T['L'](50) >= T['V'](5). Add.result = 7 + 50 = 57. Wait, this logic is slightly off. Let’s correct.result = T[s[-1]]is the initial value.s = "LVIII"
result = T['I']= 1iats[3](‘I’):T[s[3]](1) is not less thanT[s[4]](1). AddT[s[3]]. result = 1 + 1 = 2.iats[2](‘I’):T[s[2]](1) is not less thanT[s[3]](1). AddT[s[2]]. result = 2 + 1 = 3.iats[1](‘V’):T[s[1]](5) is not less thanT[s[2]](1). AddT[s[1]]. result = 3 + 5 = 8.iats[0](‘L’):T[s[0]](50) is not less thanT[s[1]](5). AddT[s[0]]. result = 8 + 50 = 58. Correct.