Longest Substring Without Repeating Characters
[[Arrays]]
Problem
Given a string s, find the length of the longest substring without duplicate characters.
Example 1:
Input: s = “abcabcbb”
Output: 3
Explanation: The answer is “abc”, with the length of 3. Note that "bca" and "cab" are also correct answers.
Example 2:
Input: s = “bbbbb” Output: 1 Explanation: The answer is “b”, with the length of 1.
Example 3:
Input: s = “pwwkew” Output: 3 Explanation: The answer is “wke”, with the length of 3. Notice that the answer must be a substring, “pwke” is a subsequence and not a substring.
Brainstorming
A substring is a contiguous block of characters that are part of a string a subsequence is a list of characters in order that are part of the string they may or may not be contiguous
so in the case of s = “pwwkew” is “wke” is a valid substring and “pwke” is a valid subsequence
to generate every possible substring we will use 2 things index and length so at each index starting from 0 we will try to create a substring of all possible length that is between 1 and the total length of the string
an example of this
index = 0 lengths 1 2 3 4 5 6 “p” “pw” “pww” “pwwk” “pwwke” “pwwkew”
index = 1 length 1 2 3 4 5 “w” “ww” “wwk” “wwke” “wwkew”
for each of this substring we can keep a track of the characters for the visibility. lets take “pww” we can iterate from start and add it to a set if we find a character already present in the set that means the substring does not satisfy our condition.
the time complexity is O(NXM) where N is the length of the string and M is the length of the substring
for the optimization of brute force we can se that it does a lot of re work say if “abca” is not a valid string that means we can skip the first character since we know “abc” is a valid string minus the latest character and “bca” is a valid string minus the first character we can just shift our focus to more left
i think a set would work the best since the look up time is o(1) so as we are moving our right pointer to the next character we can check if the character is present in the set or not if it is it means our substring is invalid and we need to move the left pointer by 1
but thats not efficient we can instead think of jumping all the way to the index from where the index became invalid
so we would know the index at which its invalid of eg if the s = “pwwkew” after reach w at index 2 we know that the string till 2 is invalid so we can just skip pw and start from w again.
Exactly.
That is the final piece of the puzzle. A hash map (or dictionary in Python) gives you the power to not just ask “Have I seen this character before?” but to ask “Where was the last place I saw this character?”
You have now designed the entire optimal algorithm. Let’s put your pieces together into a coherent plan.
Data Structures:
- A
leftpointer, starting at 0. - A
max_lengthvariable, starting at 0. - Your hash map,
char_to_index_map, to store characters and their most recently seen indices.
The Algorithm (The Sliding Window):
You will have one main loop that moves a right pointer from 0 to the end of the string.
Inside the loop, for each character at s[right]:
The Check: You look at the
character.- Prompt 1: How do you check your
char_to_index_mapto see if you have encountered a duplicate within your current window? (A character seen before, but at an index that is outside your currentleftboundary, is not a problem).
- Prompt 1: How do you check your
The Action (If a duplicate is found):
- Prompt 2: You’ve found a duplicate character inside your current window. You know you need to “jump” the
leftpointer. To what new position should you move it?
- Prompt 2: You’ve found a duplicate character inside your current window. You know you need to “jump” the
The Updates (Every time):
- Prompt 3: After handling any potential jumps, you need to update two things in every single iteration:
a. You must update your
char_to_index_mapwith the current character’s latest position. Why is it important to do this after the duplicate check? b. You must update yourmax_length. The length of the current valid window isright - left + 1. How do you use this to update the overallmax_length? SDBVJDS
- Prompt 3: After handling any potential jumps, you need to update two things in every single iteration:
a. You must update your
s = "abcabcbb"
left = 0
max_length = 0
char_to_index_map = {}
for right in range(len(s)):
if s[right] in char_to_index_map:
if char_to_index_map[s[right]] >= left: # the latest index of the duplicate is in the current window
left = char_to_index_map[s[right]] + 1 # jump the left pointer
char_to_index_map[s[right]] = right # update the index of the duplicate with the latest index
current_length = right - left +1 # calculate the length of the window
max_length = max(current_length, max_length)
else:
char_to_index_map[s[right]] = right # add the character and its index
current_length = right - left + 1
max_length = max(current_length, max_length)
print(max_length)
Ideal version: The only thing that sometimes happens is the left pointer jump.
# Initialization is perfect
left = 0
max_length = 0
char_to_index_map = {}
# Loop is perfect
for right in range(len(s)):
# Let's get the current character
char = s[right]
# The only conditional part of the logic: the jump.
# Check if this character is a duplicate that is INSIDE our window.
if char in char_to_index_map and char_to_index_map[char] >= left:
# If it is, jump the left pointer.
left = char_to_index_map[char] + 1
# These next three steps happen EVERY time, unconditionally.
# 1. Update the map with the latest position of the character.
# This must be done AFTER the check above.
char_to_index_map[char] = right
# 2. Calculate the length of the current valid window.
current_length = right - left + 1
# 3. Update the overall max_length.
max_length = max(max_length, current_length)
# return max_length