C# || How To Find Longest Duplicate Substring Using C#
The following is a module with functions which demonstrates how to find the longest duplicate substring using C#.
1. Longest Dup Substring – Problem Statement
Given a string s, consider all duplicated substrings: (contiguous) substrings of s that occur 2 or more times. The occurrences may overlap.
Return any duplicated substring that has the longest possible length. If s does not have a duplicated substring, the answer is “”.
Example 1:
Input: s = "banana"
Output: "ana"
Example 2:
Input: s = "abcd"
Output: ""
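A brute-force reference makes the definition concrete. The sketch below is a hypothetical `BruteForce` helper and is not part of the original post. It tries every length from longest to shortest and compares every pair of windows with `string.CompareOrdinal`. That is far too slow for long inputs, but it is handy for checking a faster solution on small strings. Note that the second occurrence may overlap the first: for "aaaa" the answer is "aaa".

```csharp
using System;

// Hypothetical brute-force reference (not from the original post).
// Only suitable for short strings: it compares every pair of windows
// for every candidate length.
public static class BruteForce
{
    public static string LongestDupSubstring(string s)
    {
        // Try lengths from longest to shortest; the first length that has
        // a repeated window is the answer.
        for (int len = s.Length - 1; len >= 1; --len)
        {
            for (int i = 0; i + len <= s.Length; ++i)
            {
                // Look for a second occurrence starting after i (overlap allowed).
                for (int j = i + 1; j + len <= s.Length; ++j)
                {
                    if (string.CompareOrdinal(s, i, s, j, len) == 0)
                    {
                        return s.Substring(i, len);
                    }
                }
            }
        }
        // No substring occurs twice.
        return "";
    }
}
```

For example, `BruteForce.LongestDupSubstring("banana")` returns "ana" and `BruteForce.LongestDupSubstring("abcd")` returns "".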
2. Longest Dup Substring – Solution
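The following is a solution which demonstrates how to find the longest duplicate substring.
This solution uses binary search on the substring length together with a rolling hash (Rabin-Karp). Binary search applies because the property is monotonic: if some substring of length L occurs twice, then its first L - 1 characters also occur twice, so every shorter length has a duplicate as well. For each candidate length, the rolling hash checks all windows of that length in roughly linear time.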
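The rolling update is the heart of Rabin-Karp. The sketch below is a hypothetical `RollingHashDemo` class, not taken from the original post. It uses the same scheme as the solution: base 31, letter values `c - 'a' + 1`, and `long` arithmetic that wraps on overflow. Each window's hash is derived in O(1) from the previous one, and the result can be checked against hashing every window from scratch. Because of the wraparound, equal hashes strongly suggest, but do not prove, equal windows.

```csharp
using System;

// Hypothetical demo of the rolling-hash update (not from the original post).
// Assumes lowercase input and 1 <= len <= s.Length.
public static class RollingHashDemo
{
    const long Base = 31;

    // Hash of s[start .. start + len) computed from scratch (Horner's rule).
    public static long DirectHash(string s, int start, int len)
    {
        long h = 0;
        for (int i = start; i < start + len; ++i)
        {
            h = unchecked(h * Base + (s[i] - 'a' + 1));
        }
        return h;
    }

    // Hash of every window of length len, each derived from the previous one.
    public static long[] RollingHashes(string s, int len)
    {
        var result = new long[s.Length - len + 1];

        // Base^(len - 1): weight of the character that leaves the window.
        long pow = 1;
        for (int i = 1; i < len; ++i) pow = unchecked(pow * Base);

        long h = DirectHash(s, 0, len);
        result[0] = h;
        for (int start = 1; start < result.Length; ++start)
        {
            int outgoing = s[start - 1] - 'a' + 1;
            int incoming = s[start + len - 1] - 'a' + 1;
            // Drop the leftmost character, shift by one position, add the new one.
            h = unchecked((h - outgoing * pow) * Base + incoming);
            result[start] = h;
        }
        return result;
    }
}
```

For "banana" with length 3, the windows starting at index 1 and index 3 are both "ana", so they receive the same hash.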
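```csharp
// ============================================================================
//    Author: Kenneth Perkins
//    Date:   Oct 30, 2021
//    Taken From: http://programmingnotes.org/
//    File:  Solution.cs
//    Description: Demonstrates how to find the longest duplicate substring
// ============================================================================
using System.Collections.Generic;

public class Solution {
    public string LongestDupSubstring(string s) {
        // Convert the string to an array of letter offsets ('a' -> 0, 'b' -> 1, ...)
        // so windows can be hashed without allocating substrings
        var nums = new int[s.Length];
        for (int index = 0; index < s.Length; ++index) {
            nums[index] = (int)s[index] - (int)'a';
        }

        // Keeps track of substring start index and length
        var startIndex = -1;
        var substrLength = -1;

        // Use binary search to find the largest possible length of a duplicate substring
        var left = 1;
        var right = s.Length;
        while (left < right) {
            var mid = left + (right - left) / 2;

            // Look for a duplicate of length mid using a rolling hash (Rabin-Karp)
            var index = RollingSearch(nums, mid);

            // Substring found: try longer lengths
            if (index != -1) {
                if (mid > substrLength) {
                    startIndex = index;
                    substrLength = mid;
                }
                left = mid + 1;

            // Substring not found: try shorter lengths
            } else {
                right = mid;
            }
        }
        return substrLength != -1 ? s.Substring(startIndex, substrLength) : "";
    }

    // Returns the start index of a window of length mid that repeats, or -1
    private int RollingSearch(int[] nums, int mid) {
        // Maps each window hash to the start indexes of the windows that produced it
        var seen = new Dictionary<long, List<int>>();

        // Base value for the rolling hash function
        var a = 31;

        // Hash of the first window, nums[0 .. mid)
        var hash = Hash(nums, mid, a);
        seen[hash] = new List<int> { 0 };

        // a^(mid - 1): weight of the element that leaves the window
        long pow = 1;
        for (var index = 1; index < mid; ++index) {
            pow = unchecked(pow * a);
        }

        for (var index = 1; index < nums.Length - mid + 1; ++index) {
            // Compute rolling hash of nums[index .. index + mid)
            hash = RollingHash(pow, hash, nums[index - 1], nums[index + mid - 1], a);

            if (seen.TryGetValue(hash, out var starts)) {
                // Hash found. The hash wraps on overflow, so two different windows
                // can collide; confirm the elements actually match before accepting
                foreach (var start in starts) {
                    if (SameWindow(nums, start, index, mid)) {
                        return index;
                    }
                }
                starts.Add(index);
            } else {
                // Hash not found
                seen[hash] = new List<int> { index };
            }
        }
        return -1;
    }

    // Polynomial hash of nums[0 .. mid); wraps on overflow by design
    private long Hash(int[] nums, int mid, int a) {
        long h = 0;
        long aL = 1;
        for (var index = mid; index >= 1; --index) {
            h = unchecked(h + (nums[index - 1] + 1) * aL);
            aL = unchecked(aL * a);
        }
        return h;
    }

    // Removes the element leaving the window and appends the one entering it
    private long RollingHash(long pow, long hash, int left, int right, int a) {
        return unchecked((hash - (left + 1) * pow) * a + (right + 1));
    }

    // Returns true if nums[i .. i + length) equals nums[j .. j + length)
    private bool SameWindow(int[] nums, int i, int j, int length) {
        for (var k = 0; k < length; ++k) {
            if (nums[i + k] != nums[j + k]) {
                return false;
            }
        }
        return true;
    }
}// http://programmingnotes.org/
```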
QUICK NOTES:
The code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.
Once compiled, you should get this as your output for the example cases:
"ana"
""