This story covers the different types of Elasticsearch queries: match, term, multi_match, regexp, wildcard, range, geo, and multi-index search. Finally, we will look at Spring Boot code that uses the Elasticsearch High-Level REST Client.
I have used Elasticsearch 7.3.0 for this demo.
Below is a snapshot of the different types of queries used in this tutorial.
I have set up a Spring Boot application that loads data into Elasticsearch; the data has varied types of search filters. There are around 3M documents in my local Elasticsearch cluster across both indices.
Before loading and querying data, it is advisable to create the indices from the Kibana query console.
I created a simple Spring Boot application to load data. Please visit the GitHub link to bring up Elasticsearch and the Kibana console from the Docker setup and load data into Elasticsearch.
In this hands-on exercise, I have defined two indices: user and user_address.
User Index
The user index mapping file is here.
UserAddress Index
The user_address index mapping file is here.
The primary difference between the text datatype and the keyword datatype is that text fields are analyzed at the time of indexing and keyword fields are not. This means that text fields are broken down into their individual terms at indexing to allow partial matching, while keyword fields are indexed as-is.
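A minimal sketch of that pattern (the real definitions live in the mapping files linked above; the multi-field layout of profession shown here is my assumption): a text field with a keyword sub-field supports both full-text and exact matching.

```
PUT user
{
"mappings": {
"properties": {
"profession": {
"type": "text",
"fields": {
"keyword": { "type": "keyword" }
}
}
}
}
}
```

With such a mapping, profession is analyzed for full-text search while profession.keyword holds the exact, unanalyzed value.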
The “match” query is one of the most basic and commonly used queries in Elasticsearch and functions as a full-text query. We can use this query to search for text, numbers, or boolean values.
Match queries accept text/numeric/date values, analyze them, and construct a query. The match query is of type boolean: the provided text is analyzed, and the analysis process constructs a boolean query from it. The operator flag can be set to or or and to control the boolean clauses (defaults to or). The minimum number of optional should clauses to match can be set using the minimum_should_match parameter.
GET user/_search
{
"query": {
"match": {
"phrase": {
"query": "kL5fP"
}
}
},
"highlight": {
"fields": {
"phrase": {}
}
}
}
----------------------------------------------------------------
There are 19 results, but minified to show only one.
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 19,
"relation" : "eq"
},
"max_score" : 11.733445,
"hits" : [
{
"_index" : "user",
"_type" : "_doc",
"_id" : "e9a6dbf4-c6a6-46ce-9049-729f4bb64407",
"_score" : 11.733445,
"_source" : {
"_class" : "com.elastic.demo.entity.User",
"id" : "e9a6dbf4-c6a6-46ce-9049-729f4bb64407",
"firstName" : "URuMyV",
"lastName" : "XbonCWM",
"uniqueId" : "ALHNQSOVOJWFN0I1",
"country" : "India",
"city" : "Hyderabad",
"mobileNumber" : "61623020652",
"point" : [
77.27123,
16.5898
],
"maritalStatus" : "Divorced",
"numberOfSiblings" : 2,
"siblings" : [
"itsZzh kMBKlT",
"RNrFfJ oBZGSEcd"
],
"profession" : "Banker",
"income" : 411516,
"phrase" : "oGRUxB yr72CdCt yZqQb kL5fP",
"nativeResident" : true,
"dateOfBirth" : 103573800000,
"createdOn" : 1570885284000
},
"highlight" : {
"phrase" : [
"oGRUxB yr72CdCt yZqQb <em>kL5fP</em>"
]
}
}
]
}
}
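The operator flag mentioned earlier can be added to the match query. As a sketch using two terms from the sample document's phrase, this variant only matches documents containing both terms:

```
GET user/_search
{
"query": {
"match": {
"phrase": {
"query": "yZqQb kL5fP",
"operator": "and"
}
}
}
}
```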
The multi_match query builds on the match query to allow multi-field queries.
Types of multi_match query
The way the multi_match query is executed internally depends on the type parameter, which can be set to:
best_fields: (default) Finds documents that match any field but uses the _score from the best field. See best_fields.
The best_fields type is most useful when you are searching for multiple words best found in the same field. For instance “brown fox” in a single field is more meaningful than “brown” in one field and “fox” in the other.
most_fields: Finds documents which match any field and combines the _score from each field. See most_fields.
The most_fields type is most useful when querying multiple fields that contain the same text analyzed in different ways. For instance, the main field may contain synonyms, stemming, and terms without diacritics. A second field may contain the original terms, and a third field might contain shingles. By combining scores from all three fields we can match as many documents as possible with the main field, but use the second and third fields to push the most similar results to the top of the list.
cross_fields: Treats fields with the same analyzer as though they were one big field. Looks for each word in any field. See cross_fields.
The cross_fields type is particularly useful with structured documents where multiple fields should match. For instance, when querying the first_name and last_name fields for “Will Smith”, the best match is likely to have “Will” in one field and “Smith” in the other.
phrase: Runs a match_phrase query on each field and uses the _score from the best field. See phrase and phrase_prefix.
The phrase and phrase_prefix types behave just like best_fields, but they use a match_phrase or match_phrase_prefix query instead of a match query.
Default behavior: Searches for the best match (full-text search) on mobileNumber, firstName, lastName, and uniqueId.
GET user/_search
{
"query": {
"multi_match": {
"query": "HyWHrsVr",
"fields": ["mobileNumber", "firstName", "lastName", "uniqueId"]
}
}
}
phrase_prefix: Searches for a prefix match (hywh*) on mobileNumber, firstName, lastName, and uniqueId.
GET user/_search
{
"query": {
"multi_match": {
"query": "hywh",
"type": "phrase_prefix",
"fields": ["mobileNumber", "firstName", "lastName", "uniqueId"]
}
}
}
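The cross_fields type described above can be sketched against the same index; here firstName and lastName are treated as one combined field (values borrowed from the earlier sample document):

```
GET user/_search
{
"query": {
"multi_match": {
"query": "URuMyV XbonCWM",
"type": "cross_fields",
"fields": ["firstName", "lastName"]
}
}
}
```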
Sometimes we are more interested in a structured search in which we want to find an exact match and return the results. The term and terms queries help us here. In the example below, we search for all users in our index whose profession is Singer. The term query returns documents that contain an exact term in a provided field. You can use it to find documents based on a precise value such as a price, a product ID, or a username.
Note:
Avoid using the term query for text fields. By default, Elasticsearch changes the values of text fields as part of analysis. This can make finding exact matches for text field values difficult.
To search text field values, use the match query instead.
GET user/_search
{
"query": {
"term": {
"profession.keyword": "Singer"
}
}
}
-------
Will return results.
The reason we used keyword here is to search for an exact match. If keyword is not used, we have to search with the criteria in lower case, because the values of a text field are lowercased by the standard analyzer at index time.
GET user/_search
{
"query": {
"term": {
"profession": "Singer"
}
}
}
Will not return results
-------------------------------------------------------------
GET user/_search
{
"query": {
"term": {
"profession": "singer".
}
}
}
Will return results.
We can also search for multiple values with the help of the terms query. We will refine the above query to match users whose profession is Singer or Farmer.
GET user/_search
{
"query": {
"terms": {
"profession.keyword": ["Singer", "Farmer"]
}
}
}
The AND/OR/NOT operators can be used to fine-tune our search queries in order to provide more relevant or specific results. This is implemented in the search API as a bool query. The bool query accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT), and a should parameter (equivalent to OR).
A query that matches documents matching boolean combinations of other queries. The bool query maps to Lucene's BooleanQuery. It is built using one or more boolean clauses, each clause with a typed occurrence. The occurrence types are:
must: The clause (query) must appear in matching documents and will contribute to the score.
filter: The clause (query) must appear in matching documents. However, unlike must, the score of the query will be ignored. Filter clauses are executed in filter context, meaning that scoring is ignored and clauses are considered for caching.
should: The clause (query) should appear in the matching document. If the bool query is in a query context and has a must or filter clause, then a document will match the bool query even if none of the should queries match; in this case the should clauses are only used to influence the score. If the bool query is in a filter context, or has neither must nor filter, then at least one of the should queries must match for the document to match the bool query. This behavior can be explicitly controlled by setting the minimum_should_match parameter.
must_not: The clause (query) must not appear in the matching documents. Clauses are executed in filter context meaning that scoring is ignored and clauses are considered for caching. Because scoring is ignored, a score of 0 for all documents is returned.
For example: I want to search for users whose profession is Athlete, whose marital status is Married, and whose mobile number matches 12360. must works as an AND operator.
GET user/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"profession": "Athlete"
}
},
{
"wildcard": {
"mobileNumber.keyword": "*12360*"
}
},
{
"match": {
"maritalStatus": "Married"
}
}
]
}
}
}
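The filter clause described earlier can host the clauses that do not need to influence scoring. A sketch of the same search, with the marital-status and mobile-number conditions moved into filter context where they are cacheable and skip scoring:

```
GET user/_search
{
"query": {
"bool": {
"must": [
{ "match": { "profession": "Athlete" } }
],
"filter": [
{ "match": { "maritalStatus": "Married" } },
{ "wildcard": { "mobileNumber.keyword": "*12360*" } }
]
}
}
}
```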
Returns documents that contain terms matching a wildcard pattern. A wildcard operator is a placeholder that matches one or more characters. For example, the * wildcard operator matches zero or more characters. You can combine wildcard operators with other characters to create a wildcard pattern.
case_insensitive (Optional, Boolean; added in 7.10.0): Allows case-insensitive matching of the pattern with the indexed field values when set to true. The default is false, which means the case sensitivity of matching depends on the underlying field's mapping.
If we want to search users by first name matching abc, or last name matching abc, or unique ID matching abc, it is a logical grouping with the OR operator (the should clause).
For versions below 7.10.0, do not use keyword fields if you need case insensitivity, or apply a normalizer during index creation.
GET user/_search
{
"query": {
"bool": {
"should": [
{
"wildcard": {
"firstName": "*abc*"
}
},
{
"wildcard": {
"lastName": "*abc*"
}
},
{
"wildcard": {
"uniqueId": "*abc*"
}
}
]
}
}
}
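On Elasticsearch 7.10.0 or later (not the 7.3.0 used in this demo), the case_insensitive flag could be applied directly; a sketch:

```
GET user/_search
{
"query": {
"wildcard": {
"firstName": {
"value": "*ABC*",
"case_insensitive": true
}
}
}
}
```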
Returns documents that contain terms matching a regular expression.
A regular expression is a way to match patterns in data using placeholder characters, called operators. For a list of operators supported by the regexp query, see Regular expression syntax.
Let us query the user index to get the list of users whose siblings match the regular expression e[a-z]*h.
GET user/_search
{
"query": {
"regexp": {
"siblings": "e[a-z]*h"
}
},
"_source": ["siblings", "id"],
"highlight": {
"fields" : {
"siblings": {}
}
}
}
----------------------------------------------------
Just one result is shown for understanding; this query matched around 5k documents.
{
"_index" : "user",
"_type" : "_doc",
"_id" : "00159cc8-313c-4492-a5b9-ebaf884ca0e2",
"_score" : 1.0,
"_source" : {
"siblings" : [
"hcbKUQCki juVLjF",
"KYSuPQM jTraY",
"eMDTLh gGgSY"
],
"id" : "00159cc8-313c-4492-a5b9-ebaf884ca0e2"
},
"highlight" : {
"siblings" : [
"<em>eMDTLh</em> gGgSY"
]
}
}
Returns documents based on a provided query string, using a parser with a strict syntax. This query uses syntax to parse and split the provided query string based on operators, such as AND or NOT. The query then analyzes each split text independently before returning matching documents.
You can use the query_string query to create a complex search that includes wildcard characters, searches across multiple fields, and more. While versatile, the query is strict and returns an error if the query string includes any invalid syntax.
The ~ in the query indicates a fuzzy query; ~1 allows one character edit (insertion, deletion, or substitution) when matching a term.
GET user/_search
{
"query": {
"query_string": {
"query": "saad~1 or zojmi~1", -- Skips one word during search.
"fields": ["lastName", "firstName"]
}
},
"highlight": {
"fields" : {
"maritalStatus": {}
}
}
}
------------------------------
{
"_index" : "user",
"_type" : "_doc",
"_id" : "0e167d85-281a-4b6a-ab0b-8fc9f496245a",
"_score" : 10.263874,
"_source" : {
"_class" : "com.elastic.demo.entity.User",
"id" : "0e167d85-281a-4b6a-ab0b-8fc9f496245a",
"firstName" : "ZOjLi",
"lastName" : "ROmwpwGE"
},
"highlight" : {
"firstName" : [
"<em>ZOjLi</em>" - Here m is replaced by i. still search is successful becuase of fuzzy logic.
]
}
}
The simple_query_string query is a version of the query_string query that is more suitable for use in a single search box that is exposed to users because it replaces the use of AND/OR/NOT with +/|/-, respectively, and it discards invalid parts of a query instead of throwing an exception if a user makes a mistake.
GET user/_search
{
"query": {
"query_string": {
"query": "saad~1 | zojmi~1", -- Skips one word during search.
"fields": ["lastName", "firstName"]
}
},
"highlight": {
"fields" : {
"maritalStatus": {}
}
}
}
Another commonly used query in the Elasticsearch world is the range query, which allows us to get documents containing terms within a specified range. The range query is a term-level query (meaning it is used to query structured data) and can be used against numerical fields, date fields, etc.
On Numeric Field: We will query the user whose income ≥ 100000 and ≤ 500000.
GET user/_search
{
"query": {
"range": {
"income": {
"gte": 100000,
"lte": 500000
}
}
}
}
On Date Field: The date field can be stored in many formats; please visit here for more details. By default, it is epoch milliseconds, and I have used that format. Even though the data is stored in epoch format, we can query it using any supported date/date-time format.
GET user/_search
{
"query": {
"range" : {
"dateOfBirth": {
"gte": "2001-08-01",
"lte": "2001-12-31"
}
}
}
}
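Range queries on dates also accept date math. For example, a sketch that finds users born at least 18 years before the query runs:

```
GET user/_search
{
"query": {
"range": {
"dateOfBirth": {
"lte": "now-18y"
}
}
}
}
```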
In the user index, a field point is created as geo_point. There are many other options available for geographic fields; I will write a more detailed blog on them. For now, we have created the geo_point as [lon, lat]. We will query to find users whose locations are within a 1 km radius of a specific point.
GET user/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_distance": {
"distance": "1km",
"point": [79.7397, 15.684453142518711] }
}
}
The multi-search API allows the execution of several search requests within the same API call. The endpoint for it is _msearch.
Use case: We have two indices, user and user_address. In a traditional RDBMS, the user and user_address tables would be linked via a foreign-key constraint, and we would join the two tables to find users and their associated addresses. How can we achieve this in Elasticsearch?
GET _msearch
{"index" : "user"}
{"query": {"bool": {"must": [{"match": {"id.keyword": "e25b9ecf-b6fa-4ef4-a5df-2fc7dd62691d"}}]}}}
{"index" : "user_address"}
{"query": {"bool": {"must": [{"match": {"userId.keyword": "e25b9ecf-b6fa-4ef4-a5df-2fc7dd62691d"}}]}}}
---------------------------------------------------------------
{
"took" : 22,
"responses" : [
{
"took" : 22,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 12.832057,
"hits" : [
{
"_index" : "user",
"_type" : "_doc",
"_id" : "e25b9ecf-b6fa-4ef4-a5df-2fc7dd62691d",
"_score" : 12.832057,
"_source" : {
"_class" : "com.elastic.demo.entity.User",
"id" : "e25b9ecf-b6fa-4ef4-a5df-2fc7dd62691d",
"firstName" : "iUWvggrjb",
"lastName" : "PwZSA",
"uniqueId" : "YQWRQH2NVROOTC3R",
"country" : "India",
"city" : "Hyderabad",
"mobileNumber" : "73642600218",
"point" : [
76.49461,
16.46887
],
"maritalStatus" : "Widowed",
"numberOfSiblings" : 1,
"siblings" : [
"pqHpPibr KKcXioB"
],
"profession" : "Actor",
"income" : 209060,
"phrase" : "6q0y1uOgDZD VG6bWkUGT uGlRAJ RICgp",
"nativeResident" : false,
"dateOfBirth" : -110525400000,
"createdOn" : 1563241250000
}
}
]
},
"status" : 200
},
{
"took" : 22,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 13.525113,
"hits" : [
{
"_index" : "user_address",
"_type" : "_doc",
"_id" : "a4dd76f1-7003-4ab7-876a-a87646101f4d",
"_score" : 13.525113,
"_source" : {
"_class" : "com.elastic.demo.entity.UserAddress",
"id" : "a4dd76f1-7003-4ab7-876a-a87646101f4d",
"userId" : "e25b9ecf-b6fa-4ef4-a5df-2fc7dd62691d",
"address1" : "LcjcBGazLfGHAzNMxcnskeSaP",
"address2" : "xeDrDTPhNDDYZJR",
"street" : "iVmGEclcCBLoTdfzQdhK",
"landmark" : "dyHKqajaFaGJSsa",
"city" : "Hyderabad",
"state" : "Telangana",
"zipCode" : "271066",
"createdOn" : 1571685580000
}
},
{
"_index" : "user_address",
"_type" : "_doc",
"_id" : "455634b6-0996-4a1c-8e9d-3a9172eae01f",
"_score" : 13.523252,
"_source" : {
"_class" : "com.elastic.demo.entity.UserAddress",
"id" : "455634b6-0996-4a1c-8e9d-3a9172eae01f",
"userId" : "e25b9ecf-b6fa-4ef4-a5df-2fc7dd62691d",
"address1" : "qBcBGHtnYmsCZSYINqFMHJpfB",
"address2" : "khTLQUhsipHzGRy",
"street" : "xcnTmHhyzJCNVqUPYSoo",
"landmark" : "YxgEMZHGAxPDKly",
"city" : "Hyderabad",
"state" : "Telangana",
"zipCode" : "054947",
"createdOn" : 1565102726000
}
}
]
},
"status" : 200
}
]
}
So far we have seen how to query the data from Elasticsearch using the dev console. Now we will see how we can do it using Spring Boot.
You can find the full source code from GitHub.
We can use the Elasticsearch RestHighLevelClient to write queries and get results. We can also call Elasticsearch's REST APIs directly, but using the SDK is generally preferred.
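For contrast, here is a minimal sketch of the raw-REST approach using only the JDK's built-in java.net.http client. The localhost:9200 endpoint matches the configuration below; the helper names are mine, and search() requires a running cluster:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RawSearchSketch {

    // Builds the same match query shown earlier in the Kibana console.
    static String buildMatchQuery(String field, String value) {
        return "{\"query\":{\"match\":{\"" + field + "\":{\"query\":\"" + value + "\"}}}}";
    }

    // Posts the query to a local cluster and returns the raw JSON response.
    static String search(String index, String queryJson) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(queryJson))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) {
        // Only prints the request body; calling search(...) needs the cluster up.
        System.out.println(buildMatchQuery("phrase", "kL5fP"));
    }
}
```

The SDK handles connection pooling, serialization, and typed responses for you, which is why the RestHighLevelClient route below is preferred.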
Include dependencies
plugins {
id 'org.springframework.boot' version '2.5.4'
id 'io.spring.dependency-management' version '1.0.11.RELEASE'
id 'java'
}

group = 'com.elastic.demo'
version = '0.0.1-SNAPSHOT'
sourceCompatibility = '11'
configurations {
compileOnly {
extendsFrom annotationProcessor
}
}
repositories {
mavenCentral()
}
dependencies {
implementation 'org.springframework.boot:spring-boot-starter-data-elasticsearch'
implementation group: 'org.elasticsearch.client', name: 'elasticsearch-rest-high-level-client', version: '7.3.0'
implementation group: 'org.springdoc', name: 'springdoc-openapi-ui', version: '1.5.10'
implementation group: 'org.elasticsearch', name: 'elasticsearch', version: '7.3.0'
implementation group: 'org.apache.commons', name: 'commons-lang3', version: '3.11'
implementation 'org.springframework.boot:spring-boot-starter-web'
compileOnly 'org.projectlombok:lombok'
annotationProcessor 'org.projectlombok:lombok'
testImplementation 'org.springframework.boot:spring-boot-starter-test'
}
test {
useJUnitPlatform()
}
Configure Elastic Search
@Configuration
@Slf4j
public class ElasticSearchConfiguration {
@Bean(name = "highLevelClient", destroyMethod = "close")
public RestHighLevelClient client(){
RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
builder.setRequestConfigCallback(requestConfigBuilder -> requestConfigBuilder.setConnectTimeout(600 * 1000).setSocketTimeout(600 * 1000)
.setConnectionRequestTimeout(-1));
RestHighLevelClient client = new RestHighLevelClient(builder);
return client;
}
}
Elastic Search Connection Proxy
@Service
@Slf4j
@RequiredArgsConstructor
public class HighLevelRestClient {
private final RestHighLevelClient restHighLevelClient;
@SneakyThrows
public SearchResponse postSearchQueries(SearchRequest searchRequest) {
log.info("Search JSON query: {}", searchRequest.source().toString());
return restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
}
@SneakyThrows
public MultiSearchResponse postMSearch(MultiSearchRequest multiSearchRequest) {
log.info("Search JSON query: {}", multiSearchRequest.requests().toString());
return restHighLevelClient.msearch(multiSearchRequest, RequestOptions.DEFAULT);
}
}
A Sample Service Method to Query Data
public WSUsersResponse searchDateRange(String fromDate, String toDate, Integer offset, Integer limit) {
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("user");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.from(offset);
sourceBuilder.size(limit);
sourceBuilder.query(QueryBuilders.rangeQuery("dateOfBirth").gte(fromDate).lte(toDate));
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = highLevelRestClient.postSearchQueries(searchRequest);
log.info("Search JSON query: {}", searchRequest.source().toString());
return extractUserResponse(searchResponse);
}
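For reference, the sourceBuilder above serializes to roughly the JSON you would run in the console, with offset and limit mapped to from and size (illustrative values):

```
GET user/_search
{
"from": 0,
"size": 10,
"query": {
"range": {
"dateOfBirth": {
"gte": "2001-08-01",
"lte": "2001-12-31"
}
}
}
}
```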
Similarly, npm modules and Go modules are also available to query data from Elasticsearch, and they are pretty straightforward. The tricky part is figuring out how to query the data in Elasticsearch; the actual implementation in any language or framework is straightforward.