Elasticsearch Autocomplete Explained

頭號碼甲, published 2022-11-19 in Beijing

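1. References

Suggesters

Elasticsearch Suggester Explained

2. Introduction

2.1 Bing example

2.2 The suggest process

3. Suggesters in Elasticsearch

3.1 How it works

The input text is broken into tokens. For each token, similar terms are looked up in the index's term dictionary and returned.

3.2 The four suggesters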
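The lookup described above can be sketched in plain Python. This is a minimal illustration, not Elasticsearch's implementation: the `suggest_terms` helper, the in-memory `term_freqs` dictionary, and the scoring formula (1 minus the edit distance divided by the longer string length) are assumptions chosen for clarity.

```python
import re


def edit_distance(a, b):
    """Classic Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest_terms(text, term_freqs, max_edits=2):
    """For each token of `text`, list indexed terms within `max_edits` edits."""
    results = []
    for token in re.findall(r"\w+", text.lower()):
        options = []
        for term, freq in term_freqs.items():
            dist = edit_distance(token, term)
            # Skip exact matches: they need no correction.
            if 0 < dist <= max_edits:
                score = 1 - dist / max(len(token), len(term))
                options.append({"text": term, "score": round(score, 4), "freq": freq})
        # Best score first, then the more frequent term.
        options.sort(key=lambda o: (-o["score"], -o["freq"]))
        results.append({"text": token, "options": options})
    return results


# Hypothetical term dictionary: term -> number of documents containing it.
term_freqs = {"lucene": 2, "search": 2, "apache": 2, "solr": 1, "elasticsearch": 1}
print(suggest_terms("lucenl", term_freqs))
```

For the input `lucenl` the only candidate within two edits is `lucene`, at distance 1, so the sketch scores it 1 - 1/6 ≈ 0.8333. That happens to agree with the score in the term suggester response shown in section 4.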

(1) term suggester

(2) phrase suggester

(3) completion suggester

(4) context suggester

4. Term suggester

(1) Create the index and add documents

# Create the index
PUT yztest/
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text"
      }
    }
  }
}

# Add document 1
POST yztest/_doc/1
{
  "message": "The goal of Apache Lucene is to provide world class search capabilities"
}

# Add document 2
POST yztest/_doc/2
{
  "message": "Lucene is the search core of both Apache Solr and Elasticsearch."
}

(2) Inspect the analyzed tokens

# Show the analyzer output for the message field
GET yztest/_analyze
{
  "field": "message",
  "text": [
    "The goal of Apache Lucene is to provide world class search capabilities",
    "Lucene is the search core of both Apache Solr and Elasticsearch."
  ]
}
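The `message` field above declares no analyzer, so the default `standard` analyzer applies: it splits text on word boundaries and lowercases each token. The sketch below is a rough Python approximation for illustration only. The real analyzer follows Unicode text segmentation rules, and `approx_standard_tokens` is a hypothetical helper, not an Elasticsearch API.

```python
import re


def approx_standard_tokens(text):
    """Approximate the standard analyzer: split on non-word characters, lowercase."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


doc = "Lucene is the search core of both Apache Solr and Elasticsearch."
for tok in approx_standard_tokens(doc):
    print(tok)
```

These lowercased tokens are what the term and phrase suggesters compare user input against, which is why the suggestions in the following sections come back in lowercase.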

(3) Different query results

a) When the input word is misspelled, a list of correctly spelled candidates is suggested

# Query
POST yztest/_search
{
  "suggest": {
    "suggest_message": { # custom name for this suggester
      "text": "lucenl",  # the query string, i.e. what the user typed
      "term": { # use the term suggester
        "field": "message", # field to match against
        "suggest_mode": "missing" # suggestion mode; missing only suggests for input terms that are not in the index
      }
    }
  }
}

# Response
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "suggest_message" : [
      {
        "text" : "lucenl",
        "offset" : 0,
        "length" : 6,
        "options" : [ # options is an array holding the actual suggestions
          {
            "text" : "lucene",
            "score" : 0.8333333,
            "freq" : 2
          }
        ]
      }
    ]
  }
}

b) When the input is a string of several words, each token gets its own entry in the response

# Query
POST yztest/_search
{
  "suggest": {
    "suggest_message": {
      "text": "lucene search",
      "term": {
        "field": "message",
        "suggest_mode": "always"
      }
    }
  }
}

# Query result
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "suggest_message" : [
      {
        "text" : "lucene",
        "offset" : 0,
        "length" : 6,
        "options" : [ ]
      },
      {
        "text" : "search",
        "offset" : 7,
        "length" : 6,
        "options" : [ ]
      }
    ]
  }
}
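Each entry in the two responses above carries the `offset` and `length` of the token it refers to, which is enough to rewrite the user's input on the client side. The sketch below is a minimal illustration under two assumptions: `apply_term_suggestions` is a hypothetical helper, and the first option of each entry is treated as the best one (the term suggester sorts options by score by default).

```python
def apply_term_suggestions(original, entries):
    """Replace each token that has options with its top-ranked option."""
    corrected = original
    # Work right to left so earlier offsets stay valid after a replacement.
    for entry in sorted(entries, key=lambda e: e["offset"], reverse=True):
        if entry["options"]:
            best = entry["options"][0]["text"]
            start = entry["offset"]
            end = start + entry["length"]
            corrected = corrected[:start] + best + corrected[end:]
    return corrected


# Entries copied from the "suggest_message" arrays in the two responses above.
misspelled = [{"text": "lucenl", "offset": 0, "length": 6,
               "options": [{"text": "lucene", "score": 0.8333333, "freq": 2}]}]
multi_word = [{"text": "lucene", "offset": 0, "length": 6, "options": []},
              {"text": "search", "offset": 7, "length": 6, "options": []}]

print(apply_term_suggestions("lucenl", misspelled))
print(apply_term_suggestions("lucene search", multi_word))
```

Both tokens of `lucene search` come back with empty `options`, so the second call leaves the input unchanged.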

5. Phrase suggester

# Phrase query
POST yztest/_search
{
  "suggest": {
    "YOUR_SUGGESTION": {
      "text": "Solr and Elasticearc", # the string the user typed
      "phrase": { # use the phrase suggester
        "field": "message", # field to match against
        "highlight": { # optional: wrap the corrected terms in tags
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}

# Response
{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "YOUR_SUGGESTION" : [
      {
        "text" : "Solr and Elasticearc",
        "offset" : 0,
        "length" : 20,
        "options" : [
          {
            "text" : "solr and elasticsearch",
            "highlighted" : "solr and <em>elasticsearch</em>", # the corrected term is highlighted
            "score" : 0.017689342
          }
        ]
      }
    ]
  }
}
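A typical use of this response is a "did you mean" line under the search box. The sketch below picks the highest-scoring option and returns both the plain and the highlighted text. It is a minimal illustration: `did_you_mean` is a hypothetical helper, and the `response` dictionary is a trimmed copy of the JSON above.

```python
def did_you_mean(response, suggestion_name):
    """Return (text, highlighted) of the best phrase option, or None."""
    for entry in response.get("suggest", {}).get(suggestion_name, []):
        options = entry.get("options", [])
        if options:
            best = max(options, key=lambda o: o["score"])
            # Fall back to the plain text if highlighting was not requested.
            return best["text"], best.get("highlighted", best["text"])
    return None


# Trimmed copy of the response shown above.
response = {
    "suggest": {
        "YOUR_SUGGESTION": [{
            "text": "Solr and Elasticearc",
            "offset": 0,
            "length": 20,
            "options": [{
                "text": "solr and elasticsearch",
                "highlighted": "solr and <em>elasticsearch</em>",
                "score": 0.017689342,
            }],
        }]
    }
}
print(did_you_mean(response, "YOUR_SUGGESTION"))
```

Returning `None` when there are no options lets the caller skip the "did you mean" line entirely.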

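6. Completion suggester

The completion suggester provides autocomplete (search-as-you-type) functionality.

6.1 Create a mapping that declares the suggest field

# Create the index (delete the yztest index created for the term suggester first; the type of an existing field cannot be changed)
PUT yztest/
{
  "mappings": {
    "properties": {
      "message": { # the field type decides whether the completion suggester can use this field
        "type": "completion"
      }
    }
  }
}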

6.2 Queries

(1) Index documents

POST yztest/_doc/1
{
  "message": "The goal of Apache Lucene is to provide world class search capabilities"
}

POST yztest/_doc/2
{
  "message": "Lucene is the search core of both Apache Solr and Elasticsearch."
}

POST yztest/_doc/3
{
  "message": "Lucene is the search core of Elasticsearch."
}

POST yztest/_doc/4
{
  "message": "Lucene is the search core of Apache Solr."
}

(2) Prefix query

# Query
POST yztest/_search
{
  "suggest": {
    "message_suggest": { # custom name for this suggester
      "prefix": "lucene is the", # the prefix string, i.e. what the user has typed so far
      "completion": { # use the completion suggester
        "field": "message" # field to match against
      }
    }
  }
}

# Query result
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "message_suggest" : [
      {
        "text" : "lucene is the",
        "offset" : 0,
        "length" : 13,
        "options" : [
          {
            "text" : "Lucene is the search core of Apache Solr.",
            "_index" : "yztest",
            "_type" : "_doc",
            "_id" : "4",
            "_score" : 1.0,
            "_source" : {
              "message" : "Lucene is the search core of Apache Solr."
            }
          },
          {
            "text" : "Lucene is the search core of Elasticsearch.",
            "_index" : "yztest",
            "_type" : "_doc",
            "_id" : "3",
            "_score" : 1.0,
            "_source" : {
              "message" : "Lucene is the search core of Elasticsearch."
            }
          },
          {
            "text" : "Lucene is the search core of both Apache Solr and ",
            "_index" : "yztest",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 1.0,
            "_source" : {
              "message" : "Lucene is the search core of both Apache Solr and Elasticsearch."
            }
          }
        ]
      }
    ]
  }
}
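Prefix lookup of this kind can be sketched with a sorted list and binary search. This is a simplified stand-in, not how Elasticsearch stores completions internally: `PrefixIndex` is a hypothetical class, and plain lowercasing stands in for the field's analyzer. One detail is taken from Elasticsearch itself: completion inputs are cut at `max_input_length`, 50 characters by default, which is why document 2's suggestion text in the response above stops after `and `.

```python
import bisect

MAX_INPUT_LENGTH = 50  # default max_input_length of a completion field


class PrefixIndex:
    """Sorted-list stand-in for a completion lookup structure."""

    def __init__(self):
        # Sorted list of (normalized_input, doc_id, stored_text).
        self._entries = []

    def add(self, doc_id, text):
        stored = text[:MAX_INPUT_LENGTH]
        bisect.insort(self._entries, (stored.lower(), doc_id, stored))

    def complete(self, prefix, size=5):
        key = prefix.lower()
        # All inputs that start with `key` form one contiguous run.
        start = bisect.bisect_left(self._entries, (key,))
        matches = []
        for normalized, doc_id, stored in self._entries[start:]:
            if not normalized.startswith(key):
                break
            matches.append({"_id": doc_id, "text": stored})
            if len(matches) == size:
                break
        return matches


index = PrefixIndex()
index.add("1", "The goal of Apache Lucene is to provide world class search capabilities")
index.add("2", "Lucene is the search core of both Apache Solr and Elasticsearch.")
index.add("3", "Lucene is the search core of Elasticsearch.")
index.add("4", "Lucene is the search core of Apache Solr.")

for option in index.complete("lucene is the"):
    print(option)
```

The sketch returns matches in alphabetical order, whereas Elasticsearch ranks completion options by their score, so the order can differ from the response above.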

(3) skip_duplicates

Filters out duplicate suggestions

# Set skip_duplicates in the query; the default is false
POST yztest/_search
{
  "suggest": {
    "message_suggest": {
      "prefix": "lucene is the",
      "completion": {
        "field": "message",
        "skip_duplicates": true
      }
    }
  }
}
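The effect of `skip_duplicates` can be pictured as collapsing options that share the same suggestion text. The sketch below shows that idea on the client side. The `skip_duplicates` function and the sample options (documents 5 and 6) are hypothetical and not part of the index built above.

```python
def skip_duplicates(options):
    """Keep only the first option for each distinct suggestion text."""
    seen = set()
    unique = []
    for option in options:
        if option["text"] not in seen:
            seen.add(option["text"])
            unique.append(option)
    return unique


# Hypothetical options: documents 5 and 6 share the same suggestion text.
options = [
    {"text": "Lucene in Action", "_id": "5"},
    {"text": "Lucene in Action", "_id": "6"},
    {"text": "Lucene is the search core of Apache Solr.", "_id": "4"},
]
print(skip_duplicates(options))
```

The four sample documents in this section all have distinct texts, so turning the flag on should not change the result of the prefix query.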

(4) fuzzy query

# Set the fuzzy option in the query so the prefix does not have to match exactly
POST yztest/_search
{
  "suggest": {
    "message_suggest": {
      "prefix": "lucen is the",
      "completion": {
        "field": "message",
        "fuzzy": {
          "fuzziness": 2
        }
      }
    }
  }
}
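Fuzzy prefix matching can be sketched as follows: a candidate matches when some prefix of it lies within `fuzziness` edits of what the user typed. This is an illustration under stated assumptions, not the algorithm Elasticsearch runs. `fuzzy_prefix_match` is a hypothetical helper, lowercasing stands in for the analyzer, and further fuzzy options such as `prefix_length`, `min_length`, and transpositions are ignored.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_prefix_match(query, candidate, fuzziness=2):
    """True if some prefix of `candidate` is within `fuzziness` edits of `query`."""
    q = query.lower()
    c = candidate.lower()
    # A matching prefix can be at most `fuzziness` shorter or longer than the query.
    shortest = max(0, len(q) - fuzziness)
    longest = min(len(c), len(q) + fuzziness)
    return any(edit_distance(q, c[:n]) <= fuzziness
               for n in range(shortest, longest + 1))


docs = [
    "The goal of Apache Lucene is to provide world class search capabilities",
    "Lucene is the search core of both Apache Solr and Elasticsearch.",
    "Lucene is the search core of Elasticsearch.",
    "Lucene is the search core of Apache Solr.",
]
for doc in docs:
    print(fuzzy_prefix_match("lucen is the", doc), doc)
```

The typed prefix `lucen is the` is one insertion away from `lucene is the`, so the three documents that start with that phrase match, while the first document does not.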

(5) Regex query

# Regular-expression match
POST yztest/_search
{
  "suggest": {
    "message_suggest": {
      "regex": ".*solr.*", # the regular expression
      "completion": {
        "field": "message"
      }
    }
  }
}

7. Context suggester

8. How is it implemented?
