0%

Minimum Edit Distance

最小編輯距離

Minimum Edit Distance (MED)
or
最小編輯成本 Minimum Edit Cost (MEC)

應用

字串相似、機器翻譯、資訊擷取、語音辨識

定義

用最低成本將A字串轉乘B字串
可用操作法 :

  • Deletion : 字串A刪除一個字元(Cost1 預設為1)
  • Insertion : 字串A插入一個字元(Cost2 預設為1)
  • Substitution : 字串A直接將字元替換成所需(Cost3 預設為2)

表格

Bottom-up
兩字串: X[m], Y[n]
最小編輯距離表: D[i][j], (0<=i<=n, 0<=j<=m), (X[1..i] Y[1..j])
最小編輯距離是: D[m][n]

Pseudo Code

  1. 初始化
    D[i][0] = i cost1 // 最左那排
    D[0][j] = j
    cost2 // 最下那排

  2. 遞迴關係

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    for i=1 to m
    for j=1 to n
    D[i][j] = min(
    D[i-1][j]+cost1, // Insertion OR Deletion
    D[i][j-1]+cost2, // Insertion OR Deletion
    D[i-1][j-1]+ ( // Substitution
    if(X[i]!=Y[j]) cost3
    if(X[i]==Y[j]) 0
    )
    )
    return D[N][M]

時間複雜度: O(nm)
空間複雜度: O(nm)

例子

X: INTENTION
Y: EXECUTION

  1. 初始化後
    image alt

  2. 開始算
    image alt

  3. 結果
    image alt

C++

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
#include <iostream>
#include <string>
#include <algorithm>
#include <iomanip>
using namespace std;
void MEC(string s1, string s2) {
int m = s1.size();
int n = s2.size();
int **D = new int*[m + 1];
for (int i = 0; i < m+1; i++)
{
D[i] = new int[n + 1];
}
// 初始化
int cost1 = 2; // Deletion
int cost2 = 3; // Insertion
int cost3 = 4; // Substitution
for (int i = 0; i <= m; i++)
{
D[i][0] = i * cost1;
}
for (int j = 0; j <= n; j++)
{
D[0][j]= j * cost2;
}
// 演算法
for (int i = 1; i <= m; i++)
{
for (int j = 1; j <= n; j++)
{
if (s1[i - 1] == s2[j - 1])
{
D[i][j] = D[i - 1][j - 1];
}
else
{
D[i][j] = min(min(D[i - 1][j] + cost1, D[i][j - 1] + cost2), D[i - 1][j - 1] + cost3);
}
}
}
// debug
for (int i = m; i >= 0; i--)
{
for (int j = 0; j <= n; j++)
{
cout << setw(2) << D[i][j] << " ";
}
cout << endl;
}
cout << D[m][n] << endl;
};
int main()
{
int num_i = 1;
//cin >> num_i;
//cin.ignore();
for (int i = 0; i < num_i; i++)
{
string s1 = "HABCDACE";
string s2 = "DABCDACE";
//getline(cin, s1);
//getline(cin, s2);
MEC(s1, s2);
}
system("pause");
return 0;
}
/*
Sample input
2
HABCDACE
DABCDACE
QWERTY
HWTDDGKY
Sample output
4
20
*/