\

Longest Common Subsequence

Difficulty: N/A | Solved: December 20, 2025

[[Dynamic Programming Primer]]

Problem

  • text1 = "abcde"

  • text2 = "ace"

  • The Goal: Find the length of the longest sequence of characters that appears in both strings in the same order, but not necessarily contiguously. The answer here is "ace", length 3.

Brain Storming

okay so the key here is the dp states again . Here the interesting thing is we unlike other problems we have 2 things to look at instead of one. Earlier we had 1 variable to look back ( how many coins does it take to make amount i how many ways can we reach step i what is the longest increasing sequence in prefix i that ends with nums[i]) but here we have 2 strings hence we would need to keep track of states across both of them i j

The Golden Rule: If you need two pieces of information to uniquely define a subproblem, you need a 2D DP table.

Story of a single cell

so lets take an example eat sea the dp grid will look like

The DP grid where we track the string eat with i and string sea with j
the dp[i][j] will represent "the longest common sequence in prefix ij" 

The empty strings are added so that the dp grid has enough space to accomodate all the results. they also act as base case for this problem.

      " "  "E"  "A"  "T"
  " "  *    *.   *.   *
  
  "S"  *    *.   *.   *
  
  "E"  *.   *.   *.   *
  
  "A"  *    *.   *.   *

lets take an example of dp[2][3]

  1. the i=2 corresponds to prefix of length 2 "SE"
  2. the j=3 corresponds to prefix of length 3 "EAT"

the dp[2][3] then represents answer to the question what is the longest common subsequence in prefixes “SE” and “EAT” which is E i.e 1


this is how the result will look like in the dp grid
        1.   1.   2.   3
       " "  "E"  "A"  "T"
 0 " "  *    *.   *.   *
  
 1 "S"  *    *.   *.   *
  
 2 "E"  *.   *.   *.   *
  
 3 "A"  *    *.   1   *

Now the question is how do we fill this grid?

the first thing we need would be a base case . We should add a empty string (we already have it)

now how do we calculate it ? we use the same logic we have used before at any given state we use the previous states answer stored to help us calculate the new answer

let’s pick an arbitrary cell: dp[2][3]

  1. What does it mean?

    • i=2 corresponds to the prefix of text1 of length 2: “SE”
    • j=3 corresponds to the prefix of text2 of length 3: “EAT”
    • So, dp[2][3] will hold the final answer for LCS(“SE”, “EAT”).
  2. How do we calculate it? (The Story of a Single Cell)
    To solve for dp[2][3], we only look at the last characters: ‘E’ (from “SE”) and ‘T’ (from “EAT”).

    • They don’t match.
    • This means we are forced to throw one away. Our two choices are:
      1. Find the LCS(“S”, “EAT”). The answer to this is already stored in the cell above us: dp[1][3].
      2. Find the LCS(“SE”, “EA”). The answer to this is already stored in the cell to our left: dp[2][2].
    • We want the longest, so we take the max of those two cells.
    • The Recurrence: dp[2][3] = max(dp[1][3], dp[2][2])

now, what if we were calculating dp[3][2]?

  1. What does it mean?

    • i=3 -> “SEA”
    • j=2 -> “EA”
  2. How do we calculate it?

    • Compare the last characters: ‘A’ (from “SEA”) and ‘A’ (from “EA”).
    • They match!
    • This is great. We know our LCS will be 1 + the LCS of the strings without these matching characters.
    • We are left with “SE” and “E”. The answer for LCS(“SE”, “E”) is already stored in the cell diagonally up-left from us: dp[2][1]
    • The Recurrence: dp[3][2] = 1 + dp[2][1]

 Every cell in the grid calculates its own answer by only looking at its three neighbors: above, left, and diagonal-left. It trusts that they have already been computed correctly. By the time you reach the bottom-right corner, you have built up the solution for the entire problem.

so the algo is like this

  1. take the length of both the strings
  2. the length of the 1 string becomes columns c
  3. the length of the 2 strings becomes rows r
  4. initialise a 2D dp array with 0 as all the values rXc
  5. then we loop for each row + 1 and for each column + 1 i represents the row and j represents the column
    1. we check for text[i] == text[j] for the character if yes we update dp[i][j] = 1 + dp[i-1][j-1]
    2. if not then we take the max of previous states dp[i][j] = max(dp[i-1][j], dp[i][j-1]
  6. Base case if lets say we compare “A” with " " then the LCS of any string with an empty string is 0. This is the fundamental truth that will anchor our entire solution.

text1 = "EAT"
text2 = "SEA"

c = len(text1)
r = len(text2)

# The zeros handle our base case (LCS with an empty string is 0)
dp = [[0 for _ in range(c +1)] for _ in range(r +1)]

for i in range(r + 1):
	for j in range(c + 1):
		if text[i-1] == text[j-1]:
			dp[i][j] = 1 + dp[i-1][j-1] # you increment the value from the diagonal left
		else:
			dp[i][j] = max(dp[i-1][j], dp[i][j-1]) # you pick the max value from left and the top 

print(dp[c][r])

Question 1: Why text1[i-1] == text2[j-1] instead of text1[i] == text2[j]?

This is purely an implementation detail to make our lives easier. It’s a direct consequence of our decision to use a dp grid of size (m+1) x (n+1).

Let’s align the indices:

Our DP Grid IndexCorresponds to Prefix LengthCorresponds to String Index
i=0, j=00 (empty string)(doesn’t exist)
i=1, j=11text[0]
i=2, j=22text[1]
i, ji and jtext[i-1] and text[j-1]

You can see there’s an “off-by-one” shift. The i-th character of a string is at index i-1.

So, when we are filling cell dp[i][j], we are solving the problem for prefixes of length i and j. The last characters of those prefixes are at indices i-1 and j-1.

This is why the check is text1[i-1] == text2[j-1]. It’s the most common source of bugs when implementing this, and you’ve spotted it perfectly.