In the last article, I explained how to create an Azure Search service using the Azure portal. If you are new to Azure Search, I would recommend having a look at the articles below:
In this article, I will explain how to set up an Azure Search index.
First of all, what is an index?
An index is the primary means of organizing and searching documents in Azure Search, similar to how a table organizes records in a database. An index is like a database table, while a document is like a row in that table.
An index is a persistent store of documents and other constructs used for filtered and full-text search on an Azure Search service. Conceptually, a document is a single unit of searchable data in your index.
Let's take the example of an eCommerce site like Flipkart or Amazon. To implement search for such a site, there would be one document for every product it sells. That way, users can search for any product.
Similarly, for a hotel booking site, the index should contain a document for every hotel. Only then can users search for every hotel.
You have to define an index to enable search in your solution. When you add or update an index, Azure Search creates physical structures based on the schema information you provide. Later, when you add or update documents or submit search queries, you send requests to a specific index in your search service.
Loading documents into the index is called indexing or data ingestion. In the push model, your application uses the REST API or the .NET SDK to push data into the index. In the pull model, indexers perform the indexing.
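As a rough illustration of the push model, the sketch below builds the URL and JSON body of a REST request that uploads two documents. It only prepares the request and does not send it. The service name, index name, api-version value and hotel fields are sample values assumed for illustration, and a real request would also need an api-key header.

```python
import json

# Sample values assumed for illustration only.
SERVICE = "my-demo-service"
INDEX = "hotels"
API_VERSION = "2020-06-30"


def build_push_request(documents):
    """Return (url, body) for a batch that uploads the given documents."""
    url = (
        f"https://{SERVICE}.search.windows.net"
        f"/indexes/{INDEX}/docs/index?api-version={API_VERSION}"
    )
    # Each document in the batch carries an action; "upload" inserts the
    # document, or replaces it if a document with the same key exists.
    batch = {"value": [{"@search.action": "upload", **doc} for doc in documents]}
    return url, json.dumps(batch)


url, body = build_push_request([
    {"hotelId": "1", "hotelName": "Sea View Inn"},
    {"hotelId": "2", "hotelName": "Hill Top Lodge"},
])
print(url)
print(body)
```

The body would be sent with an HTTP POST. The SDKs wrap the same kind of batch in typed helpers.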
How to create an index?
Now that we understand what an index is, let's look at how to create one. For the steps below, you need to log in to the Azure portal and open the Azure Search service instance.
Add Index Link
Sign in to the Azure portal and open the service dashboard. You can click All services in the jump bar to search for existing “search services” in the current subscription.
Then click the “Add Index” button at the top of the page.
You will have to provide a name for the Azure Search index. Please note that this name is used in the URL.
The index name becomes part of the URL of every request to that index. For example, with a search service named my-demo-service and an index named hotels, request URLs start with https://my-demo-service.search.windows.net/indexes/hotels.
I generally prefer a plural name, as it makes the URL more meaningful.
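As a rough sketch, a query URL for this service and index could be built as follows. The helper name, the api-version value and the search text are assumptions for illustration.

```python
from urllib.parse import urlencode


def index_query_url(service_name, index_name, search_text, api_version="2020-06-30"):
    """Build a search query URL; the index name is part of the path."""
    base = f"https://{service_name}.search.windows.net/indexes/{index_name}/docs"
    query = urlencode({"api-version": api_version, "search": search_text})
    return f"{base}?{query}"


url = index_query_url("my-demo-service", "hotels", "beach")
print(url)
```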
Every index has the following components:
- Fields – every field has a name, a data type and attributes
- Scoring Profiles
- CORS options to allow cross-origin queries
If every index is like a database table and every document is like a row in that table, then every field is like a column in that table.
For every field, you have to specify a name, a data type and other attributes. The data type specifies what kind of data is stored in the field. The attributes specify how the field is used.
In the field collection, one field must be marked as the key. The key is the unique ID of each document and is used for document lookup.
Every index must have exactly one key field, and that field must be of type Edm.String.
The following data types are supported:
- Edm.String – text that can be optionally tokenized for full-text search (word breaking, stemming and so forth)
- Collection(Edm.String) – list of strings that can be optionally tokenized for full-text search.
- Edm.Boolean – allows true / false values to be stored in field
- Edm.Int32 – to store 32-bit integer values
- Edm.Int64 – to store long values (i.e. 64 bit integer)
- Edm.Double – to store double precision numeric data
- Edm.DateTimeOffset – to store date and time values in OData V4 format
- Edm.GeographyPoint – to represent a geographic location (a point) on the globe.
Below are the attributes which can be set on every field:
- Key – unique ID of every document. Each index must have exactly one key field and its type must be Edm.String.
- Retrievable – specifies whether the field is returned in search results. Turning it off is useful when you want to use a field for filtering, sorting or scoring but do not want it to be visible to the end user.
- Filterable – allows the field to be used in filter queries
- Sortable – allows the query to sort search results based on this field. By default, results are sorted by score. This attribute lets the user sort the search results by the field instead.
- Facetable – allows field to be used in faceted navigation for user self-directed filtering. Generally repetitive values which group multiple objects together (e.g. product brand) are best candidates for this.
- Searchable – marks the field as full-text searchable
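To make the fields, data types and attributes concrete, here is a minimal sketch of an index definition as it could be sent to the REST API, where each attribute is a boolean property on the field. The hotel fields are hypothetical, and validate_key is an illustrative helper, not part of any SDK.

```python
import json

# Hypothetical hotel index; the field names are assumptions for illustration.
index_definition = {
    "name": "hotels",
    "fields": [
        {"name": "hotelId", "type": "Edm.String", "key": True, "retrievable": True},
        {"name": "hotelName", "type": "Edm.String", "searchable": True, "sortable": True},
        {"name": "tags", "type": "Collection(Edm.String)", "searchable": True,
         "filterable": True, "facetable": True},
        {"name": "rating", "type": "Edm.Double", "filterable": True,
         "sortable": True, "facetable": True},
        {"name": "parkingIncluded", "type": "Edm.Boolean", "filterable": True},
        {"name": "lastRenovationDate", "type": "Edm.DateTimeOffset",
         "filterable": True, "sortable": True},
        {"name": "location", "type": "Edm.GeographyPoint", "filterable": True},
    ],
}


def validate_key(definition):
    """Check the key rule: exactly one key field, of type Edm.String."""
    keys = [f for f in definition["fields"] if f.get("key")]
    if len(keys) != 1:
        raise ValueError(f"expected exactly one key field, found {len(keys)}")
    if keys[0]["type"] != "Edm.String":
        raise ValueError("the key field must be of type Edm.String")
    return keys[0]["name"]


key_name = validate_key(index_definition)
print(key_name)
print(json.dumps(index_definition, indent=2))
```

Running a check like this before creating the index catches a missing or duplicate key early, instead of waiting for the service to reject the schema.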
An analyzer can be set on a field only if the field is marked as Searchable.
It sets the language analyzer to be used for the field. Below are some of the things language analyzers do:
- Non-essential words (stopwords) and punctuation are removed.
- Phrases and hyphenated words are broken down into component parts.
- Upper-case words are lower-cased.
- Words are reduced to root forms so that a match can be found regardless of tense.
Language analyzers convert a text input into primitive or root forms that are efficient for information storage and retrieval. Conversion occurs during indexing, when the index is built, and then again during search when the index is read.
You are more likely to get the search results you expect if you use the same analyzer for both operations.
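The toy analyzer below is only a sketch of the idea, not how the Lucene or Microsoft analyzers are implemented. The stopword list and the suffix-stripping rule are simplified assumptions. It shows why running the same analysis at indexing time and at query time lets "booking hotel" match text that contains "Hotels" and "BOOKED".

```python
import re

# Tiny assumed stopword list; real analyzers use larger, language-specific lists.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to", "were"}


def naive_stem(token):
    """Very rough stand-in for stemming: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def toy_analyze(text):
    # Lower-case, then keep runs of letters and digits. This drops
    # punctuation and splits hyphenated words into their parts.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


indexed_terms = toy_analyze("The well-known Hotels were BOOKED.")
query_terms = toy_analyze("booking hotel")
print(indexed_terms)  # ['well', 'known', 'hotel', 'book']
print(query_terms)    # ['book', 'hotel']
print(set(query_terms) <= set(indexed_terms))  # True
```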
For most languages, you will find two types of analyzers: Lucene analyzers and Microsoft analyzers. You can use either. Generally, Microsoft analyzers support more features.
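In an index definition, the analyzer is chosen per field by name, for example en.lucene or en.microsoft for English. The sketch below uses hypothetical field names, and the helper is an illustrative local check rather than a service API. It flags fields that set an analyzer without being marked searchable.

```python
# Hypothetical field definitions; the field names are assumptions.
fields = [
    {"name": "description", "type": "Edm.String",
     "searchable": True, "analyzer": "en.microsoft"},
    {"name": "description_fr", "type": "Edm.String",
     "searchable": True, "analyzer": "fr.lucene"},
    {"name": "rating", "type": "Edm.Double",
     "searchable": False, "filterable": True},
]


def fields_with_invalid_analyzer(field_defs):
    """Return names of fields that set an analyzer but are not searchable.

    This sketch expects "searchable" to be set explicitly on every field.
    """
    return [
        f["name"]
        for f in field_defs
        if "analyzer" in f and f.get("searchable") is not True
    ]


print(fields_with_invalid_analyzer(fields))  # []
```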
A suggester is used to define which fields in the index would be used for auto-complete or type-ahead queries in searches.
Fields added to a suggester are used to build type-ahead search terms. All of the search terms are created during indexing and stored separately.
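A suggester is declared in the index definition next to the fields. The fragment below is a minimal sketch. The suggester name sg and the field names are assumptions, and the helper is a local sanity check rather than a service API.

```python
# Hypothetical index fragment with one suggester over two fields.
index_fragment = {
    "name": "hotels",
    "fields": [
        {"name": "hotelId", "type": "Edm.String", "key": True},
        {"name": "hotelName", "type": "Edm.String", "searchable": True},
        {"name": "city", "type": "Edm.String", "searchable": True},
    ],
    "suggesters": [
        {
            "name": "sg",
            "searchMode": "analyzingInfixMatching",
            "sourceFields": ["hotelName", "city"],
        }
    ],
}


def missing_suggester_fields(definition):
    """Return suggester source fields that are not defined in the index."""
    known = {f["name"] for f in definition["fields"]}
    return [
        source
        for suggester in definition["suggesters"]
        for source in suggester["sourceFields"]
        if source not in known
    ]


print(missing_suggester_fields(index_fragment))  # []
```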
A scoring profile is a section of the schema that defines custom scoring behaviors that let you influence which items appear higher in the search results. Scoring profiles are made up of field weights and functions.
To use a scoring profile, you specify it by name in the query string. Alternatively, you can set a custom profile as the default scoring profile of the index.
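As a hedged sketch, a scoring profile with field weights and one function could look like the fragment below, followed by a query string that selects the profile by name. The profile name, field names, boost values and api-version are assumptions for illustration.

```python
from urllib.parse import urlencode

# Hypothetical schema fragment: matches in hotelName count five times as much
# as matches in description, and higher ratings receive an extra boost.
scoring_fragment = {
    "scoringProfiles": [
        {
            "name": "boostNameAndRating",
            "text": {"weights": {"hotelName": 5, "description": 1}},
            "functions": [
                {
                    "type": "magnitude",
                    "fieldName": "rating",
                    "boost": 2,
                    "interpolation": "linear",
                    "magnitude": {"boostingRangeStart": 1, "boostingRangeEnd": 5},
                }
            ],
        }
    ],
    # Optional: apply this profile when a query does not name one.
    "defaultScoringProfile": "boostNameAndRating",
}

# Selecting the profile explicitly on a query.
query = urlencode({
    "search": "beach",
    "scoringProfile": "boostNameAndRating",
    "api-version": "2020-06-30",
})
print(query)
```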
To allow cross-origin queries to your index, enable CORS (Cross-Origin Resource Sharing) by setting the corsOptions attribute. For security reasons, only query APIs support CORS.
This option has two properties:
- allowedOrigins – list of origins allowed to access index
- maxAgeInSeconds – browsers use this value to determine the duration (in seconds) to cache CORS preflight responses.
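In the index definition, these two properties sit under corsOptions. The fragment below is a minimal sketch. The origin URL and the max age are assumed values, and origin_allowed only illustrates the matching idea locally (a single "*" entry allows all origins).

```python
import json

# Hypothetical CORS settings for an index.
cors_fragment = {
    "corsOptions": {
        "allowedOrigins": ["https://www.example.com"],
        "maxAgeInSeconds": 300,  # cache preflight responses for 5 minutes
    }
}


def origin_allowed(cors_options, origin):
    """Illustrative local check of whether an origin would be allowed."""
    allowed = cors_options["allowedOrigins"]
    return "*" in allowed or origin in allowed


print(json.dumps(cors_fragment, indent=2))
print(origin_allowed(cors_fragment["corsOptions"], "https://www.example.com"))
print(origin_allowed(cors_fragment["corsOptions"], "https://other.example.org"))
```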
I hope this has provided enough information to understand how to create an index and which factors to consider while creating it.
Please comment and let me know your thoughts.