AC自动机,用来匹配路由

2023-04-27

用AC自动机进行多模式匹配(传入一个host,查看是否匹配多个pattern中的一个)

和trie匹配的区别就在于,trie失配时会回到root重新匹配,AC失配时会用奇妙的方法找到与当前匹配字符拥有最大后缀的状态继续匹配

所以重点就在于构建每个匹配状态下的失配指针,用于指示下一步

记S2 = goto(S1, C)为字典树中S1状态接收C goto S2,S2 = failure(S1)为S1的failure失配(状态)节点为S2

  1. Success 接收一个匹配字符
  2. Output 达到某个模式串的末尾,在Trie树中标记的output状态
  3. Failure 跳转到最大后缀 记此时状态为S1状态机的上一success状态为(唯一)S2, S1与S2的goto条件为C,S2的failure状态为S3(唯一且已经存在),测试S3的goto状态,如果存在S4 = goto(S3, C)则S4为C1的failure 即如有: S1 = success(S2, C) S3 = failure(S2) S4 = success(S3, C) 则S4 = failure(S1) failure节点从上逐层计算,所以当我们求failure(S1)时,其上一个状态S2必然已经计算过failure了 当前状态(S1)的failure状态(S4=failure(S1))的意义是在自动机中找到与S1拥有最长相同后缀的状态,而其前一个状态(S2)的failure状态(S3)和S2拥有最长相同后缀(已经确定的),所以最长相同后缀可以再延长(通过S1=goto(S2,C), S4=goto(S3, C)的C),那相当于S1和S4分别继承了他们唯一前一状态的最长相同后缀 当然S4直接存在是理想情况,如果S4=goto(S3, C)走不通转移不了,那就再找S3的failure, 即找到S5=failure(S3), 寻找S6=goto(S5, C)

这个东西匹配速度是只与字符串长度相关的O(n),不受pattern数量影响, 缺点就是构建非常慢,可能需要预构建然后序列化再读取(非常不想这样)

```go package structure

func NewTrie() Trie { trie := &Trie{ success: make(map[uint8]Trie), failure: nil, emit: false, } trie.failure = trie return trie }

type Trie struct { success map[uint8]Trie failure Trie emit bool }

func (trie *Trie) Add(value string) { curr := trie for i := range value { ch := value[i] _, exist := curr.success[ch] if exist == false { curr.success[ch] = NewTrie() } curr = curr.success[ch] if i == len(value)-1 { curr.emit = true } } }

type ACAutomaton struct { root *Trie }

func (ac *ACAutomaton) MatchAny(key string) bool { curr := ac.root for i := range key { success, exist := curr.success[key[i]] for exist == false && curr != ac.root { curr = curr.failure success, exist = curr.success[key[i]] } if exist == true { curr = success } // else: curr == ac.root, curr = ac.root if curr.emit == true { return true } } return false }

func NewACAutomaton(patterns []string) *ACAutomaton { ac := &ACAutomaton{root: NewTrie()} for _, pattern := range patterns { ac.root.Add(pattern) } ac.Build() return ac }

func (ac *ACAutomaton) Build() { queue := NewQueue*Trie err := queue.Push(ac.root) if err != nil { panic(err) // should never happen } for queue.Size() != 0 { // bfs without layer s2 := queue.Pop() // s2: s1 is s2\s success state with c for c, s1 := range s2.success { // s1: state we are finding failure for err = queue.Push(s1) if err != nil { panic(err) } s3 := s2.failure // s3: s2\s failure state for { if s3 == ac.root { s1.failure = ac.root break } s4, exist := s3.success[c] // s4: s3\s success state with c if exist == true { s1.failure = s4 break } s3 = s3.failure } } } }

```