最小編輯距離
Minimum Edit Distance (MED)
or
最小編輯成本 Minimum Edit Cost (MEC)
應用
字串相似、機器翻譯、資訊擷取、語音辨識
定義
用最低成本將A字串轉乘B字串
可用操作法 :
- Deletion : 字串A刪除一個字元(Cost1 預設為1)
- Insertion : 字串A插入一個字元(Cost2 預設為1)
- Substitution : 字串A直接將字元替換成所需(Cost3 預設為2)
表格
Bottom-up
兩字串: X[m], Y[n]
最小編輯距離表: D[i][j], (0<=i<=n, 0<=j<=m), (X[1..i] Y[1..j])
最小編輯距離是: D[m][n]
Pseudo Code
初始化
D[i][0] = i cost1 // 最左那排
D[0][j] = j cost2 // 最下那排
遞迴關係
1 2 3 4 5 6 7 8 9 10 11
| for i=1 to m for j=1 to n D[i][j] = min( D[i-1][j]+cost1, // Insertion OR Deletion D[i][j-1]+cost2, // Insertion OR Deletion D[i-1][j-1]+ ( // Substitution if(X[i]!=Y[j]) cost3 if(X[i]==Y[j]) 0 ) ) return D[N][M]
|
時間複雜度: O(nm)
空間複雜度: O(nm)
例子
X: INTENTION
Y: EXECUTION
初始化後
開始算
結果
C++
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86
| #include <iostream> #include <string> #include <algorithm> #include <iomanip> using namespace std; void MEC(string s1, string s2) { int m = s1.size(); int n = s2.size(); int **D = new int*[m + 1]; for (int i = 0; i < m+1; i++) { D[i] = new int[n + 1]; } int cost1 = 2; int cost2 = 3; int cost3 = 4; for (int i = 0; i <= m; i++) { D[i][0] = i * cost1; } for (int j = 0; j <= n; j++) { D[0][j]= j * cost2; } for (int i = 1; i <= m; i++) { for (int j = 1; j <= n; j++) { if (s1[i - 1] == s2[j - 1]) { D[i][j] = D[i - 1][j - 1]; } else { D[i][j] = min(min(D[i - 1][j] + cost1, D[i][j - 1] + cost2), D[i - 1][j - 1] + cost3); } } } for (int i = m; i >= 0; i--) { for (int j = 0; j <= n; j++) { cout << setw(2) << D[i][j] << " "; } cout << endl; } cout << D[m][n] << endl; }; int main() { int num_i = 1; for (int i = 0; i < num_i; i++) { string s1 = "HABCDACE"; string s2 = "DABCDACE"; MEC(s1, s2); } system("pause"); return 0; } Sample input 2 HABCDACE DABCDACE QWERTY HWTDDGKY Sample output 4 20 */
|