Tuesday, December 1, 2015

Elasticsearch: What you see is NOT what you get !!!

Recently, I troubleshooted two common issues with ES queries. Suddenly, the queries stopped returning data using some filter criteria. But when we looked at all the records for the type then the filter should have evaluated to true and returned data. We often forget that ES mostly returns original document which was indexed. But while indexing the fields in document there are analyzers being applied. So the field value we see in document is not exactly same as indexed value. 

Consider the following simple example:

Line 1: We have created an employee record with displayName as "Ajey Dudhe".
Line 6: We are retrieving all the employee records.

If you see on the right hand side under _source the document is returned as it is. But does Elasticsearch see the value of displayName as "Ajey Dudhe". The answer is no. In this example, when the document is inserted then default analyzers are used. By default the string value will be broken into tokens and lower-cased. And this is the value which Elasticsearch sees. In order to have a look at the actual value visible to Elasticsearch we need to use the fielddata_fields option while fetching the documents as follows:

Line 22 on right hand side: Notice that the fielddata_fields now has values as seen by ES i.e. tokenized and lower-cased. In case the value is not at all indexed by ES then it will not appear under fields.

No comments:

Post a Comment