Edit Distance
[[Dynamic Programming Primer]]
Problem
You are given two words, word1 and word2. You want to find the minimum number of operations required to convert word1 into word2.
You are allowed three types of operations, and each one costs 1:
- Insert a character.
- Delete a character.
- Replace a character.
Example: word1 = “horse”, word2 = “ros”
1. Horse -> Rorse (replace 'h' with 'r')
2. Rorse -> Rors (delete 'e')
3. Rors -> Ros (delete 'o')
The minimum number of operations is **3**.
Brainstorming
This is again a 2D dp problem with grids as we are dealing with 2 strings comparison again similar to [[Longest Common Subsequence]] but it has an added complexity for the operation
- we are assuming that it will always be word 1 changing to word 2 and not the other way around hence
dp[i][j] - we will always compare the last character of the prefixes to confirm that word1 and word2 till i-1 and j-1 if we are checking for
dp[i][j]is matching or not if they are we don’t add any cost we just move ahead. if they are not then we try to calculate the cost for each operations - for each operation we assume that doing that operations results in successfull string match we don’t need to check whether it actually does or not
- we take the minimum of all three while calculating the cost of the current prefix length i, j
- the base case for the first row which represents word 2 in our case via the variable j for edit distance cost with empty string is the length of that word
- the base case for the first column which represents word 1 in our case via the variable i for edit cost with empty string is the length of the word
This is what it translates too
- We initialise a
dpgrid ofm+1xn+1 - Fill the first row
(dp[j]): The cost to convert an empty string to a string of length j is j insertions. So,dp[j] = j. - Fill the first column
(dp[i]):The cost to convert a string of length i to an empty string is i deletions. So,dp[i] = i. - we start a loop for the row
for j in range(1,m)for colfor i in range(1,n)- we check for the
word[i-1] == word[j-1]if its a match then thedpstate is simpledp[i][j] = dp[i-1][j-1] - If they do not match: You must pay a cost of 1 for an operation. You calculate the cost for all three hypothetical plans and take the minimum.
- Cost of Delete Plan: `dp[i-1][j]
- Cost of Insert Plan:
dp[i][j-1] - Cost of Replace Plan:
dp[i-1][j-1] - Recurrence:
dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1])
- we check for the
- The Final Answer: The answer for the entire problem is in the bottom-right corner:
dp[m][n].
word1 = "sea"
word2 = "saw"
m = len(word1)
n = len(word2)
dp = [[0 for _ in range(n+1)] for _ in range(m+1)
# base case 1 for the first row where for empty string and string of length j is j
for j in range(1, m):
dp[0][j] = j
# base case 2 for the first column where for empty string and the string of lenght i is i
for i in range(1,n):
dp[i][0] = i
# the final loop
for j in range(1,m):
for i in range(1,n):
if word1[j-1] == word2[i-1]:
dp[i][j] = dp[i-1][j-1]
else:
"""
1. Cost of Delete Plan: `dp[i-1][j]
2. Cost of Insert Plan: `dp[i][j-1]`
3. Cost of Replace Plan: `dp[i-1][j-1]`
"""
dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1])
print(dp[m][n])