Tuesday, October 25, 2022

[FIXED] How do I do a partial match in Elasticsearch?

Issue

I have a link like http://drive.google.com and I want to match "google" out of the link.

I have:

query: {
    bool : {
        must: {
            match: { text: 'google'} 
        }
    }
}

But this only matches if the whole text is 'google' (case insensitive, so it also matches Google or GooGlE etc). How do I match for the 'google' inside of another string?


Solution

The point is that Elasticsearch regular expressions require a full-string match. As the documentation puts it:

Lucene’s patterns are always anchored. The pattern provided must match the entire string.

Thus, to allow any characters (other than a newline) before and after the substring, wrap it in the .* pattern:

match: { text: '.*google.*'}
                ^^      ^^

In ES6+, use regexp instead of match:

"query": {
   "regexp": { "text": ".*google.*"} 
}
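
The anchoring rule can be illustrated outside Elasticsearch. The following is only a sketch in Python: its re module is not the Lucene engine, and re.fullmatch is used here merely as a stand-in for "the pattern must match the entire string". The url value is the sample link from the question.

```python
import re

url = "http://drive.google.com"

# Anchored matching: the pattern has to cover the whole string.
anchored_bare = re.fullmatch(r"google", url)         # None: "google" is only a substring
anchored_wrapped = re.fullmatch(r".*google.*", url)  # match: .* absorbs the rest of the string

# Unanchored matching, for contrast: a substring hit is enough.
unanchored = re.search(r"google", url)

print("anchored, bare pattern:   ", anchored_bare is not None)
print("anchored, wrapped in .*:  ", anchored_wrapped is not None)
print("unanchored, bare pattern: ", unanchored is not None)
```

The first result is the behaviour the question ran into; the second is what wrapping the term in .* achieves.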

One more variation covers strings that can contain newlines: match: { text: '(.|\n)*google(.|\n)*'}. The awkward (.|\n)* is required in Elasticsearch because this regex flavor supports neither the [\s\S] workaround nor DOTALL/single-line flags. As the documentation says, "The Lucene regular expression engine is not Perl-compatible but supports a smaller range of operators."
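
The newline case can be reproduced with an analogy in Python, whose default dot also excludes newlines. This is a sketch of the pattern's logic only, not of Lucene itself (Python additionally offers re.DOTALL, which is deliberately not used here). The sample text is made up for illustration.

```python
import re

# Hypothetical multi-line field value containing the link from the question.
text = "see\nhttp://drive.google.com\nfor details"

# The dot stops at newlines, so the anchored pattern cannot span the whole string.
dot_only = re.fullmatch(r".*google.*", text)

# (.|\n)* explicitly allows newlines on both sides of the substring.
with_newlines = re.fullmatch(r"(.|\n)*google(.|\n)*", text)

print("dot only:     ", dot_only is not None)
print("with (.|\\n)*: ", with_newlines is not None)
```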

However, if you do not need complicated patterns or word-boundary checks, a plain substring search is better done with a wildcard query than with a regex:

{
    "query": {
        "wildcard": {
            "text": {
                "value": "*google*",
                "boost": 1.0,
                "rewrite": "constant_score"
            }
        }
    }
} 

See Wildcard search for more details.
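
If the search term comes from user input, the request body above can be built programmatically. A minimal sketch follows; contains_query is a hypothetical helper, not part of any Elasticsearch client, and the backslash escaping assumes that the wildcard query treats \ as an escape character for * and ?, which is worth verifying against your Elasticsearch version.

```python
import json


def contains_query(field: str, term: str) -> dict:
    """Build a wildcard query body that looks for `term` anywhere in `field`."""
    # Assumption: a backslash escapes the wildcard operators.
    # Escape the backslash itself first so it is not doubled twice.
    escaped = term.replace("\\", "\\\\").replace("*", "\\*").replace("?", "\\?")
    return {"query": {"wildcard": {field: {"value": f"*{escaped}*"}}}}


body = contains_query("text", "google")
print(json.dumps(body, indent=4))
```

The resulting dictionary can be serialized as the JSON body of a search request; the optional boost and rewrite parameters from the example above are omitted for brevity.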

NOTE: A wildcard pattern also has to match the whole input string, so:

  • google* finds all strings starting with google
  • *google* finds all strings containing google
  • *google finds all strings ending with google

Also, bear in mind that wildcard patterns have only two special characters:

  • ?, which matches any single character
  • *, which matches zero or more characters (including none at all)
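
The wildcard rules above can be checked with a small emulation. This is a sketch in Python, not Elasticsearch's implementation: wildcard_matches is a hypothetical helper that translates ? and * into an anchored regular expression and treats every other character literally.

```python
import re


def wildcard_matches(pattern: str, value: str) -> bool:
    """Emulate whole-string wildcard matching with ? and * as the only operators."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # zero or more characters
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    # fullmatch mirrors the rule that the pattern must cover the entire string.
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


values = ("google", "google.com", "drive.google.com", "my-google")
for pattern in ("google*", "*google*", "*google", "g??gle"):
    hits = [v for v in values if wildcard_matches(pattern, v)]
    print(f"{pattern:10} -> {hits}")
```

Only *google* finds the term in drive.google.com, which is why the "contains" form is the one to use for the link in the question.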


Answered By - Wiktor Stribiżew
Answer Checked By - Katrina (PHPFixing Volunteer)
