Search API

Cantemo’s search API is an example of a v2 API. It is possible to test this API via the Help menu in Cantemo.

Intro

The search API is available via /API/v2/search/ with a PUT request. It takes a JSON search document in the request body.
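
As a rough sketch of what such a call could look like from Python, the snippet below builds (but does not send) a PUT request with a JSON body using only the standard library. The host name `cantemo.example.com` and the helper name `build_search_request` are hypothetical, and authentication is left out because it depends on how your server is set up.

```python
import json
import urllib.request

# Hypothetical base URL; replace it with the address of your own server.
BASE_URL = "https://cantemo.example.com"


def build_search_request(document, base_url=BASE_URL):
    """Return an unsent PUT request carrying the JSON search document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/API/v2/search/",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


request = build_search_request({"query": "squirrel"})
print(request.get_method(), request.full_url)

# Sending the request (add whatever authentication your server requires):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Building the request separately from sending it makes it easy to inspect the method, URL and body before anything goes over the network.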

PUT

Specific fields

You can specify a list of fields that you want the API to return:

"fields": [
  "id",
  "title",
  "parent_collection",
  "portal_mf201890",
  "portal_mf268857",
  "portal_mf551902",
  "metadata_main_group"
]

Filtering

In addition to searching using a query string, you can also filter results by a specific field. This can be done using a document of the following form:

{
  "filter": {
    "operator": "AND",
    "terms": [
      {"name": "title", "value": "squirrel"}
    ]
  }
}

You can search for multiple values at the same time. In this case an implicit AND is used, so:

{
  "filter": {
    "operator": "AND",
    "terms": [
      {"name": "portal_mf471117", "value": ["en", "sv"]}
    ]
  }
}

will return all hits which contain both the values en and sv in field portal_mf471117.

For an implicit OR on a single field, use “value_in”, for example:

{
  "filter": {
    "operator": "AND",
    "terms": [
      {"name": "id", "value_in": ["VX-9", "VX-2", "VX-5"]}
    ]
  }
}

will return all hits which contain either the value VX-9 or VX-2 or VX-5 in field id.

You can also search in multiple fields at the same time. In this case, the “operator” defines how the terms are combined, in this example with OR:

{
  "filter": {
    "operator": "OR",
    "terms": [
      {"name": "portal_mf551902", "value": "12"},
      {"name": "title", "value": "squirrel"}
    ]
  }
}

This query will return all items which have _either_ squirrel in the title field _or_ 12 as value of the field portal_mf551902.
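
The filter documents above are plain dictionaries, so they can be assembled with a few small helpers. The following is a minimal sketch; the function names `term`, `term_any` and `make_filter` are made up for illustration and are not part of the API.

```python
def term(name, value):
    """Term on `value`; a list of values means all must match (implicit AND)."""
    return {"name": name, "value": value}


def term_any(name, values):
    """Term matching any of `values` (implicit OR through "value_in")."""
    return {"name": name, "value_in": list(values)}


def make_filter(operator, *terms):
    """Wrap terms in a filter document using one of the documented operators."""
    if operator not in ("AND", "OR", "NOT"):
        raise ValueError("operator must be 'AND', 'OR' or 'NOT'")
    return {"filter": {"operator": operator, "terms": list(terms)}}


# Items with squirrel in the title OR 12 in portal_mf551902.
document = make_filter(
    "OR",
    term("portal_mf551902", "12"),
    term("title", "squirrel"),
)

# Items whose id is any of the three values.
by_id = make_filter("AND", term_any("id", ["VX-9", "VX-2", "VX-5"]))
```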

Range filters

Certain metadata field types support a range search. The field types which support this are: Timestamp, Date, Integer and Float.

A range search can be performed as follows:

{
  "filter": {
    "operator": "AND",
    "terms": [
      {"name": "portal_mf551902", "range": {"max": "14", "min": "2"}},
      {"name": "portal_mf268857", "range": {"min": "2016-12-24"}},
      {"name": "portal_mf201890", "range": {"max": "2"}}
    ]
  }
}

In order to perform an open-ended search you can leave out either min or max.
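
A small helper can make the open-ended case explicit by only emitting the bounds that were given. This is a sketch with a hypothetical function name (`range_term`); it converts bounds to strings only because the examples above pass them as strings.

```python
def range_term(name, minimum=None, maximum=None):
    """Build a range term, leaving out whichever bound is not given."""
    if minimum is None and maximum is None:
        raise ValueError("a range needs at least one of minimum or maximum")
    bounds = {}
    if minimum is not None:
        bounds["min"] = str(minimum)
    if maximum is not None:
        bounds["max"] = str(maximum)
    return {"name": name, "range": bounds}


closed = range_term("portal_mf551902", minimum=2, maximum=14)
open_ended = range_term("portal_mf268857", minimum="2016-12-24")
```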

When using range filters, the “now”, “now_utc” and “now_local_tz” keywords can be used. Both “now_utc” and “now” (which are aliases) are interpreted as the current datetime in the UTC timezone. With “now_local_tz”, the date used in the range filter is the current datetime in the Cantemo server’s timezone (which is also UTC unless it has been changed).

Important: “now”, “now_utc” and “now_local_tz” create datetime objects. To filter against date values (without a time), floor the value to 00:00 by applying the “/d” operator. For example, when querying Cantemo at 2019-01-02 01:55:00 UTC, the values will be:

  • 2019-01-02T01:55:00 for “now”

  • 2019-01-02T00:00:00 for “now/d”
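
The flooring that “/d” performs can be reproduced locally with Python's datetime module, which may help when checking what a filter will compare against. This only illustrates the arithmetic; the server does the actual evaluation, and `floor_to_day` is a name chosen for this sketch.

```python
from datetime import datetime, timezone


def floor_to_day(moment):
    """Drop the time-of-day part, like applying "/d" to a datetime."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


now = datetime(2019, 1, 2, 1, 55, 0, tzinfo=timezone.utc)
as_now = now.strftime("%Y-%m-%dT%H:%M:%S")
as_now_floored = floor_to_day(now).strftime("%Y-%m-%dT%H:%M:%S")

print(as_now)          # 2019-01-02T01:55:00
print(as_now_floored)  # 2019-01-02T00:00:00
```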

Missing filters

If you want to find documents that do not contain a particular field, you can specify the missing or exists key in the document:

{
  "filter": {
    "operator": "AND",
    "terms": [
      {"name": "portal_mf619153", "missing": true}
    ]
  }
}

or:

{
  "filter": {
    "operator": "AND",
    "terms": [
      {"name": "portal_mf619153", "exists": false}
    ]
  }
}

Often you want to filter out deleted items. This works by including the following:

{
  "filter": {
    "operator": "AND",
    "terms": [
      {"name": "portal_deleted", "missing": true}
    ]
  }
}

Another example, items that do not have a metadata group set:

{
  "filter": {
    "operator": "AND",
    "terms": [
      {"name": "metadata_main_group", "missing": true}
    ]
  }
}

Exists filters

If you want to find documents that contain a particular field, you can specify the exists or missing key in the document:

{
  "filter": {
    "operator": "AND",
    "terms": [
      {"name": "portal_mf619153", "exists": true}
    ]
  }
}

or:

{
  "filter": {
    "operator": "AND",
    "terms": [
      {"name": "portal_mf619153", "missing": false}
    ]
  }
}

Filter operators

Operators allow more complex boolean expressions and nesting. Supported operators are AND, OR and NOT.

An example using an OR operator:

{
  "filter": {
    "operator": "OR",
    "terms": [
      {"name": "id", "value": "VX-9"},
      {"name": "id", "value": "VX-2"},
      {"name": "id", "value": "VX-5"}
    ]
  }
}

This is equivalent to the value_in example above: the query will return all hits which contain either the value VX-9 or VX-2 or VX-5 in field id.

You can also nest filters:

{
  "filter": {
    "operator": "AND",
    "terms": [
      {"name": "metadata_main_group", "value": "Film"}
    ],
    "filters": [
      {
        "operator": "OR",
        "terms": [
          {"name": "portal_mf201890", "value": "1"},
          {"name": "title", "value": "squirrel"}
        ]
      }
    ]
  }
}

This query will return all items which have _either_ squirrel in the title field _or_ 1 as value of the field portal_mf201890, and where the main metadata group is Film.
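
Nesting is convenient when an existing filter has to be narrowed without rewriting it. The sketch below wraps any filter in an outer AND that also excludes deleted items, using the portal_deleted term from the Missing filters section. The helper name `exclude_deleted` is hypothetical.

```python
import json


def exclude_deleted(inner_filter):
    """Nest `inner_filter` under an AND that also requires portal_deleted to be missing."""
    return {
        "operator": "AND",
        "terms": [{"name": "portal_deleted", "missing": True}],
        "filters": [inner_filter],
    }


either = {
    "operator": "OR",
    "terms": [
        {"name": "portal_mf201890", "value": "1"},
        {"name": "title", "value": "squirrel"},
    ],
}

document = {"filter": exclude_deleted(either)}
print(json.dumps(document, indent=2))
```

Because the inner filter is passed through untouched, its own operator (OR in this case) keeps its meaning inside the outer AND.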

Subgroups

Metadata in subgroups can be searched for by using a filter on the root level. This will return items which have these values in any fields in the item metadata, regardless of which subgroup instance the value is in. For example, the following will return any item which has the value sv in metadata field portal_mf771345 and value 5 in metadata field portal_mf642112 even if these values are in two different subgroup instances:

{
  "filter": {
    "operator": "AND",
    "terms": [
      {"name": "portal_mf771345", "value": "sv"},
      {"name": "portal_mf642112", "value": "5"}
    ]
  }
}

To have more granular control, you can search for fields in a specific subgroup. This will return items where the fields are present in the same subgroup instance. The following example shows how to search for the two fields used above, but this time they have to be present in the same subgroup. It will not hit items which have these values in two different subgroup instances:

{
  "filter": {
    "operator": "AND",
    "groups": [{
      "name": "Subgroup",
      "filter": {
        "operator": "AND",
        "terms": [
          {"name": "portal_mf771345", "value": "sv"},
          {"name": "portal_mf642112", "value": "5"}
        ]
      }
    }]
  }
}

The groups filter can also be combined with other filters. The following searches for items which have either the value sv in field portal_mf771345 or the value 5 in field portal_mf642112 in an instance of the group Subgroup, and which do not have the value The Matrix in field portal_mf112753 in any instance of the group SubSubgroup inside that instance of Subgroup:

{
  "filter": {
    "groups": [{
        "name": "Subgroup",
        "filter": {
            "operator": "OR",
            "terms": [
                {"name": "portal_mf771345", "value": "sv"},
                {"name": "portal_mf642112", "value": "5"}
            ],
            "groups": [{
                "name": "SubSubgroup",
                "filter": {
                    "operator": "NOT",
                    "terms": [
                        {"name": "portal_mf112753", "value": "The Matrix"}
                    ]
                }
            }]
        }
    }]
  }
}

Sorting

You can sort the search result using one or multiple fields and each field can be sorted in descending or ascending order. The search result is first sorted using the first field and any results which have the same value will be sorted using the second field, and so on:

{
  "query": "squirrel",
  "sort": [
    {"name": "portal_mf268857", "order": "desc"},
    {"name": "title", "order": "asc"}
  ]
}

Field name _relevance (or its alias _score) can be used for sorting on relevance, based on matches on a search query.
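
To make the multi-field ordering concrete, the following sketch applies the same kind of sort specification to a small in-memory list. The sample records are invented, and the code only mimics the described ordering locally; it says nothing about how the server implements sorting.

```python
# Invented sample records, using field names from the example above.
hits = [
    {"title": "b", "portal_mf268857": "2016-12-24"},
    {"title": "a", "portal_mf268857": "2016-12-24"},
    {"title": "c", "portal_mf268857": "2017-01-01"},
]

sort_spec = [
    {"name": "portal_mf268857", "order": "desc"},
    {"name": "title", "order": "asc"},
]


def multi_sort(records, spec):
    """Order records by the first rule, breaking ties with the following rules."""
    ordered = list(records)
    # Python's sort is stable, so apply the least significant rule first.
    for rule in reversed(spec):
        ordered.sort(
            key=lambda record: record[rule["name"]],
            reverse=(rule["order"] == "desc"),
        )
    return ordered


titles = [record["title"] for record in multi_sort(hits, sort_spec)]
print(titles)  # ['c', 'a', 'b']
```

The newest record comes first, and the two records sharing a date fall back to ascending title order.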

Pagination

You can paginate using the query parameters page and page_size. This allows you to page through the search result one page at a time.
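
A minimal sketch of building the paginated URL with the standard library is shown below. The host name is hypothetical, and the text above does not say whether the first page is numbered 0 or 1, so check that against your server before relying on it.

```python
from urllib.parse import urlencode


def search_url(base_url, page, page_size):
    """Return the search endpoint URL with page and page_size query parameters."""
    query = urlencode({"page": page, "page_size": page_size})
    return f"{base_url}/API/v2/search/?{query}"


url = search_url("https://cantemo.example.com", page=2, page_size=50)
print(url)
```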

Search History

By default, calls to the Search API do not modify the user’s search history and do not return a search history ID.

This can be controlled with the query parameter update_search_history: if true, the request will update the user’s search history with the given search document, and the search history ID is returned as “search_id”.

Please note that filters for complex search documents cannot be rendered in the Cantemo UI; see Saved Searches for more information.

Doc types

You can specify which document types to search. The valid values are item, subclip and collection:

{
  "doc_types": ["item", "subclip", "collection"]
}

The default is all three types: [“item”, “subclip”, “collection”].

Note: even when doc_types is limited to item documents, values in timed subclips (i.e. timed annotation metadata) are also searched by default. This can be changed with the search_interval parameter.

Search interval

You can define what type of metadata is searched: Only non-timed item metadata (item), annotation timed metadata (subclip), or all metadata (all):

{
  "search_interval": "item"
}

The default is all, which searches all metadata.

Wildcards

Wildcard characters can be used when searching in String and Text fields. Supported characters are * which matches any string of characters, and ? which matches a single character.

Note that the wildcard is matched against each individual word in the field, not the whole text field. Stop words, i.e. very common words such as “this” and “is”, are not indexed and are not searchable.

Field types

The following field types are supported by the Search API:

  • String

  • Text

  • Date

  • Timestamp

  • Dropdown

  • Radio

  • Checkbox

  • Lookup

  • Tags

  • Integer

  • Float

String and Text field

From the Search API perspective, String and Text fields behave the same way. These fields can be searched for using value and wildcard searches.

Filtering is against each individual word of the value, not the whole text field. Stop words are not searchable.

Date and Timestamp

All dates are sent as ISO 8601 timestamps of the form YYYY-MM-DDThh:mm:ss.xxx with an optional timezone, specified either as Z or as an offset from UTC of the form [+-]HH:MM. If the timezone is unspecified, the system local timezone is assumed.

When searching, it is also possible to do arithmetic relative to the current date. For example, you can search for items created during the past 5 days using the following query:

{
  "filter": {
    "operator": "AND",
    "terms": [
      {"name": "portal_mf471117", "range": {"min": "now-5d"}}
    ]
  }
}

Numbers

The API supports integers and floats. You can search either for an exact value or with a range search.

Multiple choice fields

Dropdowns, Checkboxes, Radio buttons, and Lookups can be used when searching. The key must be used when searching for these values. For example, to search for all items which have English or Swedish in the Language field:

{
  "filter": {
    "operator": "AND",
    "terms": [
      {"name": "portal_mf471117", "value_in": ["en", "sv"]}
    ]
  }
}

For an implicit AND, send in the keys as a list to “value”:

{
  "filter": {
    "operator": "AND",
    "terms": [
      {"name": "portal_mf471117", "value": ["en", "sv"]}
    ]
  }
}

I.e. this would match items which have both English and Swedish in the Language field.

Exact Search on Terms

In most cases, for fields containing “text” values, the search retrieves Items where the contained value on the specified field is “similar” to the value used to search. This applies to fields of the following types:

  • string

  • textarea

  • checkbox

  • tags

  • radio

  • lookup

  • dropdown

For example, this query:

{
  "filter": {
    "operator": "AND",
    "terms": [
      {
        "name": "title",
        "value":"squirrel"
      }
    ]
  }
}

will retrieve Items having “squirrel” as the value of the “title” field, but will also return items where the value of this field is “squirrel.1213” or “my squirrel”.

In some scenarios, it’s useful to search only for Items where the value contained in the specified field is exactly the same as the value specified in the search term. To achieve this, the “exact” option can be used as in the following example:

{
  "filter": {
    "operator": "AND",
    "terms": [
      {
        "name": "title",
        "value":"squirrel",
        "exact": true
      }
    ]
  }
}

This will match items where the title is exactly “squirrel” (the match is also case-sensitive, so “Squirrel”, for example, will not match). Important: the “exact” option is not valid for the following field types:

  • date

  • float

  • integer

  • timestamp

Aggregations

Aggregations can be used to do calculations over the results from searches. This can for example be used to drive a faceted search where the user is presented with various metadata fields and the most popular values for each field. This functionality is built on the aggregations functionality in Elasticsearch which is the search engine bundled with Cantemo. The full documentation and usage examples of Elasticsearch Aggregations can be found at https://www.elastic.co/guide/en/elasticsearch/reference/6.2/search-aggregations.html

Cantemo supports some, but not all aggregations available in Elasticsearch. Cantemo uses an internal document structure in Elasticsearch so even though you can query Elasticsearch directly, the data you get back will not be in a usable form, and the document format is not guaranteed to be stable over time.

Common options

All aggregations support the option missing, which you can use to set a default value for documents which are missing a value. Normally, these documents are ignored when calculating aggregations but with the option missing you can assign a value when doing these calculations. This can for example be used to include documents without any tags in a terms aggregation, in order to show them in the GUI:

{
  "query": "swedish",
  "aggregations": {
    "language": {
      "terms": {
        "missing": "NO_LANGUAGE",
        "name": "portal_mf471117"
      }
    }
  }
}

This results in the following response:

{
  "aggregations": {
    "language": {
      "buckets": [{"key": "NO_LANGUAGE", "doc_count": 15},
                  {"key": "sv", "doc_count": 8},
                  {"key": "ru", "doc_count": 4},
                  {"key": "uk", "doc_count": 2},
                  {"key": "ro", "doc_count": 1}]
    }
  }
}
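
On the client side, a bucket list like the one above is often easier to work with as a mapping from key to count. The sketch below does that for the response shown; `bucket_counts` is a helper name chosen for this example.

```python
def bucket_counts(response, aggregation_name):
    """Map each bucket key of a named aggregation to its doc_count."""
    buckets = response["aggregations"][aggregation_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# The response from the example above.
response = {
    "aggregations": {
        "language": {
            "buckets": [
                {"key": "NO_LANGUAGE", "doc_count": 15},
                {"key": "sv", "doc_count": 8},
                {"key": "ru", "doc_count": 4},
                {"key": "uk", "doc_count": 2},
                {"key": "ro", "doc_count": 1},
            ]
        }
    }
}

counts = bucket_counts(response, "language")
print(counts["NO_LANGUAGE"])  # 15
```

Dictionaries keep insertion order in Python, so the order in which the buckets were returned is preserved when iterating over `counts`.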

Terms

The Terms aggregation can be used to calculate the number of objects with different values for a given metadata field. For example, it can be used to return the number of items with different values set in a lookup field:

{
  "query": "swedish",
  "aggregations": {
    "language": {
      "terms": {
        "name": "portal_mf471117"
      }
    }
  }
}

This results in the following response:

{
  "aggregations": {
    "language": {
      "buckets": [{"key": "sv", "doc_count": 8},
                  {"key": "ru", "doc_count": 4},
                  {"key": "uk", "doc_count": 2},
                  {"key": "ro", "doc_count": 1}]
    }
  }
}

Options

The Terms aggregation supports options which can be used to tune which results are returned.

  • size - Limit the number of buckets returned

Query:

{
  "query": "swedish",
  "aggregations": {
    "language": {
      "terms": {
        "name": "portal_mf471117",
        "size": 3
      }
    }
  }
}

Response:

{
  "aggregations": {
    "language": {
      "buckets": [{"key": "sv", "doc_count": 8},
                  {"key": "ru", "doc_count": 4},
                  {"key": "uk", "doc_count": 2}]
    }
  }
}

  • min_doc_count - only return buckets with at least this number of occurrences

Query:

{
  "query": "swedish",
  "aggregations": {
    "language": {
      "terms": {
        "name": "portal_mf471117",
        "min_doc_count": 4
      }
    }
  }
}

Response:

{
  "aggregations": {
    "language": {
      "buckets": [{"key": "sv", "doc_count": 8},
                  {"key": "ru", "doc_count": 4}]
    }
  }
}

  • include - an array of strings with exact match of key

Query:

{
  "query": "swedish",
  "aggregations": {
    "language": {
      "terms": {
        "name": "portal_mf471117",
        "include": ["sv", "u"]
      }
    }
  }
}

Response:

{
  "aggregations": {
    "language": {
      "buckets": [{"key": "sv", "doc_count": 4}]
    }
  }
}

  • exclude - an array of strings with exact match of key

Query:

{
  "query": "swedish",
  "aggregations": {
    "language": {
      "terms": {
        "name": "portal_mf471117",
        "exclude": ["sv", "u"]
      }
    }
  }
}

Response:

{
  "aggregations": {
    "language": {
      "buckets": [{"key": "ro", "doc_count": 8},
                  {"key": "ru", "doc_count": 3},
                  {"key": "uk", "doc_count": 2}]
    }
  }
}

  • include_pattern - a regular expression of the keys to include

Query:

{
  "query": "swedish",
  "aggregations": {
    "language": {
      "terms": {
        "name": "portal_mf471117",
        "include_pattern": "r.*"
      }
    }
  }
}

Response:

{
  "aggregations": {
    "language": {
      "buckets": [{"key": "ru", "doc_count": 4},
                  {"key": "ro", "doc_count": 1}]
    }
  }
}

  • exclude_pattern - a regular expression of the keys to exclude

Query:

{
  "query": "swedish",
  "aggregations": {
    "language": {
      "terms": {
        "name": "portal_mf471117",
        "exclude_pattern": "r.*"
      }
    }
  }
}

Response:

{
  "aggregations": {
    "language": {
      "buckets": [{"key": "sv", "doc_count": 8},
                  {"key": "uk", "doc_count": 2}]
    }
  }
}

  • order - Return the resulting buckets in a particular order

By default, the buckets are returned sorted by count descending, with the most numerous term first in the list. This behavior can be altered by specifying an order. The order can be either count or key, and the direction can be either asc or desc.

Query:

{
  "query": "swedish",
  "aggregations": {
    "language": {
      "terms": {
        "name": "portal_mf471117",
        "order": {"key": "asc"}
      }
    }
  }
}

Response:

{
  "aggregations": {
    "language": {
      "buckets": [{"key": "ru", "doc_count": 4},
                  {"key": "sv", "doc_count": 8},
                  {"key": "uk", "doc_count": 2}]
    }
  }
}

Terms aggregations work best on fields with a limited number of distinct values, but there is no restriction on which fields you can run a terms aggregation on. For example, if you have a date field in which you only store a year value, you can still do a terms aggregation. For that use case, however, Histogram is probably more suitable.

Range

The Range aggregation can be used to divide number or date metadata fields into custom ranges.

Query:

{
  "query": "swedish",
  "aggregations": {
    "episode": {
      "range": {
        "name": "portal_mf551902",
        "ranges": [
          {"to": 2},
          {"from": 3, "to": 6},
          {"from": 7}
        ]
      }
    }
  }
}

Response:

{
  "aggregations": {
    "episode": {
      "buckets": [{"to": 2, "key": "*-2.0", "doc_count": 1},
                  {"to": 6, "from": 3, "key": "3.0-6.0", "doc_count": 0},
                  {"from": 7, "key": "7.0-*", "doc_count": 7}]
    }
  }
}

As you can see, the response contains the ranges specified in the query, with the respective ranges and a key identifying the range. You can also specify your own key for each range.

Query:

{
  "query": "swedish",
  "aggregations": {
    "episode": {
      "range": {
        "name": "portal_mf551902",
        "ranges": [
          {"to": 2, "key": "low"},
          {"from": 3, "to": 6, "key": "middle"},
          {"from": 7, "key": "high"}
        ]
      }
    }
  }
}

Response:

{
  "aggregations": {
    "episode": {
      "buckets": [{"to": 2, "key": "low", "doc_count": 1},
                  {"to": 6, "from": 3, "key": "middle", "doc_count": 0},
                  {"from": 7, "key": "high", "doc_count": 7}]
    }
  }
}

Range aggregations are supported for number metadata types, integers and floats, as well as timestamps and date fields.

Options

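For date and timestamp fields, the API supports additional options.

  • format

The format option sets the date format used for the range boundaries in the query and in the response.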

Query:

{
  "aggregations": {
    "release_date": {
      "range": {
        "name": "portal_mf268857",
        "format": "MM/dd/YYYY",
        "ranges": [
          {"to": "12/24/2016"},
          {"from": "12/24/2016", "to": "12/25/2016"},
          {"from": "12/26/2016"}
        ]
      }
    }
  }
}

Response:

{
  "aggregations": {
      "release_date": {
          "buckets": [
              {
                  "to": "12/24/2016",
                  "key": "*-12/24/2016",
                  "doc_count": 8
              },
              {
                  "from": "12/24/2016",
                  "doc_count": 4,
                  "to": "12/25/2016",
                  "key": "12/24/2016-12/25/2016"
              },
              {
                  "from": "12/26/2016",
                  "key": "12/26/2016-*",
                  "doc_count": 10
              }
          ]
      }
  }
}

  • time_zone

By default, date and timestamp values are returned in the system’s configured timezone, but you can override that with the time_zone setting. Available timezones are listed at http://www.joda.org/joda-time/timezones.html

Query:

{
  "aggregations": {
    "create_date": {
      "range": {
        "name": "xmp_xmp_CreateDate",
        "time_zone": "America/Los_Angeles",
        "ranges": [
          {"to": "2015-07-01"},
          {"from": "2015-07-01", "to": "2015-09-01"},
          {"from": "2015-09-01"}
        ]
      }
    }
  }
}

Response:

{
  "aggregations": {
    "create_date": {
      "buckets": [
        {
          "to": "2015-07-01T00:00:00.000-07:00",
          "key": "*-2015-07-01T00:00:00.000-07:00",
          "doc_count": 3
        },
        {
          "from": "2015-07-01T00:00:00.000-07:00",
          "to": "2015-09-01T00:00:00.000-07:00",
          "doc_count": 1,
          "key": "2015-07-01T00:00:00.000-07:00-2015-09-01T00:00:00.000-07:00"
        },
        {
          "from": "2015-09-01T00:00:00.000-07:00",
          "key": "2015-09-01T00:00:00.000-07:00-*",
          "doc_count": 6
        }
      ]
    }
  }
}

Histogram

The Histogram aggregation can be used to automatically group documents into intervals and return the counts for the different intervals. This can be useful, for example, to display a timestamp field as a bar chart showing the number of occurrences per month.

When performing a histogram aggregation, you have to specify what interval you want the data aggregated over. The interval can be specified in different ways, depending on the type of field you are querying. If the field is an integer or a float field then you must specify interval as a number, which will be the size of each bucket in the histogram. If the field is of type date or timestamp then you can specify interval as one of the following:

year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m), second (1s)

or as one of the time units documented at https://www.elastic.co/guide/en/elasticsearch/reference/6.2/common-options.html#time-units

An example histogram query for a timestamp field may look as follows.

Query:

{
  "query": "swedish",
  "aggregations": {
    "create_date": {
      "histogram": {
        "name": "xmp_xmp_CreateDate",
        "interval": "month"
      }
    }
  }
}

Response:

{
  "aggregations": {
    "create_date": {
      "buckets": [{"key": "2015-06-01T00:00:00.000-07:00", "doc_count": 1},
                  {"key": "2015-07-01T00:00:00.000-07:00", "doc_count": 0},
                  {"key": "2015-08-01T00:00:00.000-07:00", "doc_count": 1},
                  {"key": "2015-09-01T00:00:00.000-07:00", "doc_count": 6}]
    }
  }
}

Histogram aggregations are supported for integer, float, timestamp and date fields.

Options

  • offset

By default, each bucket will start at an offset of 0, so if interval is specified as 10 for an integer field with only positive numbers, then the first bucket will contain documents from 0 to 9, the second bucket from 10 to 19, and so on. Offset can be used to start at a different number: if offset is 5, the first bucket will contain documents from -5 to 4 and the second bucket 5 to 14. Offset works the same way for timestamp and date fields, so you can create a histogram of each month where buckets start on the 15th of the month instead of the first.

  • format

Format can be specified for histograms over date and timestamp fields and works the same way as for ranges.

  • time_zone

Timezone can be specified for histograms over date and timestamp fields and works the same way as for ranges.

  • order

Order can be specified the same way as for terms aggregations. The only difference is that histograms are sorted by key ascending by default.
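
The effect of interval and offset on numeric fields can be worked through with a few lines of Python. The sketch below uses the bucket formula given in the Elasticsearch histogram documentation, on the assumption that Cantemo passes these options through unchanged; `bucket_key` is a name chosen for this example.

```python
import math


def bucket_key(value, interval, offset=0):
    """Lower bound of the histogram bucket that `value` falls into."""
    return math.floor((value - offset) / interval) * interval + offset


# interval 10, no offset: buckets are 0-9, 10-19, ...
print(bucket_key(7, 10))      # 0
print(bucket_key(12, 10))     # 10

# interval 10, offset 5: buckets are -5 to 4, 5 to 14, ...
print(bucket_key(3, 10, 5))   # -5
print(bucket_key(12, 10, 5))  # 5
```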

Metrics Aggregations

Cantemo supports most, but not all, of the Metrics Aggregations supported by Elasticsearch. Please see the Elasticsearch documentation for specific notes about implementation details and limitations: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/search-aggregations-metrics.html

Avg

Can be used to calculate the average of a field. Only supported for integer and float fields.

Query:

{
  "query": "swedish",
  "aggregations": {
    "episode": {
      "avg": {
          "name": "portal_mf551902"
      }
    }
  }
}

Response:

{
  "aggregations": {
    "episode": {
      "value": 9
    }
  }
}

Max

Can be used to calculate the maximum value of a field in any document. Only supported for integer and float fields.

Query:

{
  "query": "swedish",
  "aggregations": {
    "episode": {
      "max": {
          "name": "portal_mf551902"
      }
    }
  }
}

Response:

{
  "aggregations": {
    "episode": {
      "value": 15
    }
  }
}

Min

Can be used to calculate the minimum value of a field in any document. Only supported for integer and float fields.

Query:

{
  "query": "swedish",
  "aggregations": {
    "episode": {
      "min": {
          "name": "portal_mf551902"
      }
    }
  }
}

Response:

{
  "aggregations": {
    "episode": {
      "value": 1
    }
  }
}

Sum

Can be used to calculate the sum of the values of a field across all documents. Only supported for integer and float fields.

Query:

{
  "query": "swedish",
  "aggregations": {
    "episode": {
      "sum": {
          "name": "portal_mf551902"
      }
    }
  }
}

Response:

{
  "aggregations": {
    "episode": {
      "value": 72
    }
  }
}

Value Count

Can be used to calculate the number of documents in a bucket.

Query:

{
  "query": "swedish",
  "aggregations": {
    "episode": {
      "value_count": {
          "name": "portal_mf551902"
      }
    }
  }
}

Response:

{
  "aggregations": {
    "episode": {
      "value": 8
    }
  }
}

Stats

An aggregator which returns all of the above stats.

Query:

{
  "query": "swedish",
  "aggregations": {
    "episode": {
      "stats": {
          "name": "portal_mf551902"
      }
    }
  }
}

Response:

{
  "aggregations": {
    "episode": {
      "count": 8,
      "max": 15.0,
      "sum": 72.0,
      "avg": 9.0,
      "min": 1.0
    }
  }
}

Cardinality

The cardinality aggregation can be used to calculate the number of different values in the result set. The response is approximate and we recommend reading the documentation at https://www.elastic.co/guide/en/elasticsearch/reference/6.2/search-aggregations-metrics-cardinality-aggregation.html before using this aggregator.

The option precision_threshold is supported.

Query:

{
  "query": "swedish",
  "aggregations": {
    "episode": {
      "cardinality": {
          "name": "portal_mf551902"
      }
    }
  }
}

Response:

{
  "aggregations": {
    "episode": {
      "value": 5
    }
  }
}

Percentiles

The percentiles aggregator returns a list of percentiles for the requested metadata field. To quote from the Elasticsearch documentation:

Percentiles show the point at which a certain percentage of observed values occur. For example, the 95th percentile is the value which is greater than 95% of the observed values.

Percentiles are often used to find outliers. In normal distributions, the 0.13th and 99.87th percentiles represents three standard deviations from the mean. Any data which falls outside three standard deviations is often considered an anomaly.

When a range of percentiles are retrieved, they can be used to estimate the data distribution and determine if the data is skewed, bimodal, etc.

Query:

{
  "query": "swedish",
  "aggregations": {
    "episode": {
      "percentiles": {
        "name": "portal_mf551902"
      }
    }
  }
}

Response:

{
  "aggregations": {
     "episode": {
       "values": {
          "1.0": 1.0,
          "5.0": 1.0,
          "25.0": 8.0,
          "50.0": 9.0,
          "75.0": 11.0,
          "95.0": 15.0,
          "99.0": 15.0
       }
     }
  }
}

The percentiles aggregation by default returns the following range of percentiles: [ 1, 5, 25, 50, 75, 95, 99 ]. This can be overridden with the percents option.

Query:

{
  "query": "swedish",
  "aggregations": {
    "episode": {
      "percentiles": {
        "name": "portal_mf551902",
        "percents": [5, 75]
      }
    }
  }
}

Response:

{
  "aggregations": {
    "episode": {
      "values": {
        "5.0": 1,
        "75.0": 11.0
      }
    }
  }
}

Extended stats

The extended stats aggregation can be used to return a large number of statistical metrics for a given field.

Query:

{
  "query": "swedish",
  "aggregations": {
    "episode": {
      "extended_stats": {
        "name": "portal_mf551902"
      }
    }
  }
}

Response:

{
  "aggregations": {
    "episode": {
      "avg": 9.0,
      "count": 8,
      "max": 15.0,
      "min": 1.0,
      "std_deviation": 3.774917217635375,
      "std_deviation_bounds": {
        "lower": 1.4501655647292502,
        "upper": 16.54983443527075
      },
      "sum": 72.0,
      "sum_of_squares": 762.0,
      "variance": 14.25
    }
  }
}
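
The derived figures in this response follow from count, sum and sum_of_squares. The sketch below recomputes them under two assumptions that are consistent with the numbers shown: the variance is the population variance, and the bounds lie two standard deviations either side of the average. The function name is made up for this example.

```python
import math


def derive_extended_stats(count, total, sum_of_squares, sigma=2.0):
    """Recompute avg, variance, std_deviation and bounds from the raw sums."""
    avg = total / count
    # Population variance: mean of the squares minus the square of the mean.
    variance = sum_of_squares / count - avg ** 2
    std_deviation = math.sqrt(variance)
    return {
        "avg": avg,
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "lower": avg - sigma * std_deviation,
            "upper": avg + sigma * std_deviation,
        },
    }


stats = derive_extended_stats(count=8, total=72.0, sum_of_squares=762.0)
print(stats["avg"])                 # 9.0
print(round(stats["variance"], 2))  # 14.25
```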

Sub Aggregations

Both bucket and metric aggregations can be nested inside another bucket aggregation. This allows for complex calculations. In the following example, we group items together by release date by month, and then calculate an average rating for each month’s releases.

Query:

{
  "query": "swedish",
  "aggregations": {
    "release_date": {
      "histogram": {
        "name": "portal_mf268857",
        "interval": "month"
      },
      "rating": {"avg": {"name": "portal_mf365234"}}
    }
  }
}

Response:

{
  "aggregations": {
    "release_date": {
      "buckets": [{"key": "2015-06-01", "doc_count": 1, "rating": {"value": 4}},
                  {"key": "2015-07-01", "doc_count": 5, "rating": {"value": 2.25}},
                  {"key": "2015-08-01", "doc_count": 1, "rating": {"value": 3}},
                  {"key": "2015-09-01", "doc_count": 6, "rating": {"value": 3.5}}]
    }
  }
}

Note

It is only possible to nest aggregations inside a bucket aggregation. Attempting to nest an aggregation inside a metric aggregation will result in a validation error.
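
Reading a nested result back out is a matter of walking the buckets and picking the sub-aggregation by name. The sketch below does this for the release_date response shown earlier in this section; `sub_aggregation_values` is a helper name chosen for this example.

```python
def sub_aggregation_values(response, bucket_aggregation, sub_aggregation):
    """Map each bucket key to the value of a metric nested inside that bucket."""
    buckets = response["aggregations"][bucket_aggregation]["buckets"]
    return {bucket["key"]: bucket[sub_aggregation]["value"] for bucket in buckets}


# The monthly histogram response with a nested "rating" average.
response = {
    "aggregations": {
        "release_date": {
            "buckets": [
                {"key": "2015-06-01", "doc_count": 1, "rating": {"value": 4}},
                {"key": "2015-07-01", "doc_count": 5, "rating": {"value": 2.25}},
                {"key": "2015-08-01", "doc_count": 1, "rating": {"value": 3}},
                {"key": "2015-09-01", "doc_count": 6, "rating": {"value": 3.5}},
            ]
        }
    }
}

ratings = sub_aggregation_values(response, "release_date", "rating")
print(ratings["2015-07-01"])  # 2.25
```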

Post Filters

Cantemo’s search API supports the use of post filters when combined with aggregations. A post filter is applied after the query has executed, so it can be used to apply additional filters to the search hits without affecting things like facets in your UI. Be aware that post filters should be used only in combination with aggregations, and only when you need differential filtering. The full documentation and usage examples of the Elasticsearch post filter can be found at: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-post-filter

Post filters have the same structure as normal terms filtering.

Query:

{
  "aggregations": {
    "doc_type": {
      "terms": {
        "name": "type", "includes": ["item"]
      }
    }
  },
  "post_filter": {
    "terms": [{"name": "mediaType", "value_in": ["video"]}]
  }
}

Response:

{
  "aggregations": {
    "doc_type": {
      "buckets": [
        {"key": "item", "doc_count": 115},
        {"key": "collection", "doc_count": 3}
      ]
    }
  }
}

Note

The aggregations are unchanged because post filters are applied after the search query has been executed. The result, however, will only include video assets.
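
The following Python sketch shows one way a client might attach a post filter to an existing search document. The function with_post_filter is a hypothetical helper, not part of the Cantemo API, and the check for an aggregations key only mirrors the recommendation above; it is not something the API is known to enforce.

```python
import copy


def with_post_filter(search_document, terms):
    """Return a copy of search_document with a post_filter added.

    The aggregations are left untouched, so bucket counts still describe
    the unfiltered query while the returned hits are narrowed by `terms`.
    """
    if "aggregations" not in search_document:
        raise ValueError("post filters are meant to be combined with aggregations")
    document = copy.deepcopy(search_document)
    document["post_filter"] = {"terms": list(terms)}
    return document


base = {"aggregations": {"doc_type": {"terms": {"name": "type", "includes": ["item"]}}}}
document = with_post_filter(base, [{"name": "mediaType", "value_in": ["video"]}])
```

Copying the document keeps the original reusable, for example when the same aggregations are sent again with a different post filter.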

Saved Searches

Searches formatted for the PUT variant of the API can also be saved, using a POST request on the /API/v2/search/saved/ endpoint. For example:

{
  "document": {
    "category": "Film",
    "document": {
      "filter": {
        "operator": "AND",
        "terms": [
          {
            "name": "portal_deleted",
            "missing": true
          },
          {
            "category": "usermetadata",
            "name": "portal_mf201890",
            "value": "1"
          },
          {
            "category": "filemetadata",
            "name": "portal_itemtype",
            "value_in": [
              "video",
              "audio"
            ]
          }
        ]
      }
    },
    "sort": [
      {"name": "title", "order": "desc"}
    ]
  },
  "name": "Video or audio, season is 1"
}

The created saved search:

  • Matches items where metadata field Season (portal_mf201890) is 1, and type is video or audio

  • Results are sorted by Title, descending order

  • “category”: “usermetadata” for the filter on portal_mf201890 defines that this is shown in the User Metadata section of the search form.

  • “category”: “filemetadata” for the portal_itemtype filter defines that this is shown in the File Metadata section of the search form.

  • The high-level “category”: “Film” defines the metadata group used for user metadata filters

  • Name for the Saved Search is “Video or audio, season is 1”

For more concrete examples, check search documents created with the Cantemo UI. This also shows how the User Interface generates filters such as date ranges, so that your documents can match them.

This endpoint returns the ID of the new Saved Search, together with the search document and name.

Note: Complex Saved Search documents cannot be rendered by the Cantemo search form, and their filters can only be shown and modified with the REST APIs.

Additionally:

  • GET /API/v2/search/saved/{search_id}/ returns results for an existing Saved Search

  • PUT /API/v2/search/saved/{search_id}/ can be used to update an existing Saved Search

  • DELETE /API/v2/search/saved/{search_id} deletes an existing Saved Search

  • GET /API/v2/search/{search_id}/document/ returns the search document, id, and name of an existing Saved Search
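
As an illustration, the request for creating a Saved Search could be prepared in Python like this. It is a sketch under assumptions: the host portal.example.com is a placeholder, authentication is omitted because it depends on the installation, and the helper names are hypothetical. The request is only built, not sent.

```python
import json
import urllib.request

BASE_URL = "https://portal.example.com"  # placeholder host, replace with your own


def saved_search_payload(name, category, search_document, sort=None):
    """Wrap a search document in the structure shown in the example above."""
    wrapper = {"category": category, "document": search_document}
    if sort:
        wrapper["sort"] = sort
    return {"document": wrapper, "name": name}


def saved_search_request(payload, base_url=BASE_URL):
    """Build (but do not send) the POST request that creates a Saved Search."""
    return urllib.request.Request(
        url=base_url + "/API/v2/search/saved/",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
```

Passing the returned object to urllib.request.urlopen would send it, once the credentials your installation requires have been added to the headers.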

Autocomplete suggestions

The search API also supports retrieving autocomplete suggestions. This will return autocomplete matches within the search result. This can be used for example to limit the autocomplete results to only the items in a particular collection, or only the ones limited by a certain metadata field.

The following examples all use page_size set to 0, which makes the API only return the suggestions and not the search result.

Simple query

In the simplest form, you can include a top-level element with the name autocomplete in the search document. This will return autocomplete suggestions for all assets in the system.

Query:

PUT to /API/v2/search/?page_size=0

{
  "autocomplete": {
    "simple_autocomplete_query": {
      "query": "How m"
    }
  }
}

Response:

{
  "autocomplete": {
    "simple_autocomplete_query": [
      "How much wood could a woodchuck chuck if a woodchuck?",
      "How much would it cost?",
      "How many roads must a man walk down?"
    ]
  }
}
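
A minimal Python sketch of the simple query above, showing how page_size=0 goes into the query string while the autocomplete element goes into the body. The host name is a placeholder, authentication is left out, and both function names are hypothetical; the request is built but not sent.

```python
import json
import urllib.parse
import urllib.request


def autocomplete_request(prefix, base_url="https://portal.example.com",
                         label="simple_autocomplete_query"):
    """Build a PUT request that asks only for suggestions (page_size=0)."""
    params = urllib.parse.urlencode({"page_size": 0})
    body = {"autocomplete": {label: {"query": prefix}}}
    return urllib.request.Request(
        url=f"{base_url}/API/v2/search/?{params}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def suggestions(response, label="simple_autocomplete_query"):
    """Return the list of suggestions for `label`, or an empty list if absent."""
    return response.get("autocomplete", {}).get(label, [])
```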

Limited scope

You can also include a search query or filter with the search document, to limit which items are included in the autocomplete result.

The following example searches only among items which match the query woodchuck.

Query:

PUT to /API/v2/search/?page_size=0

{
  "query": "woodchuck",
  "autocomplete": {
    "filtered_autocomplete_query": {
      "query": "would chuck"
    }
  }
}

Response:

{
  "autocomplete": {
    "filtered_autocomplete_query": [
      "How much wood could a woodchuck chuck if a woodchuck?",
      "A woodchuck would chuck no amount of wood since a woodchuck can’t chuck wood.",
      "But if a woodchuck could chuck and would chuck some amount of wood",
      "what amount of wood would a woodchuck chuck?"
    ]
  }
}

The following example searches only among items which have the word Nature in their title field.

Query:

PUT to /API/v2/search/?page_size=0

{
  "filter": {
    "operator": "AND",
    "terms": [
      {
        "name": "title",
        "value":"Nature"
      }
    ]
  },
  "autocomplete": {
    "filtered_autocomplete_query": {
      "query": "woodc"
    }
  }
}

Response:

{
  "autocomplete": {
    "filtered_autocomplete_query": [
      "Woodchucks in their natural habitats",
      "Woodchopping for beginners"
    ]
  }
}

Specific field

You can limit the fields which are queried for the autocomplete matching. Only free-text fields are supported for autocompletion.

Query:

PUT to /API/v2/search/?page_size=0

{
  "autocomplete": {
    "title_autocomplete_query": {
      "query": "wood",
      "field": "title"
    }
  }
}

Response:

{
  "autocomplete": {
    "title_autocomplete_query": [
      "Woody Woodpecker",
      "Woody Allen",
      "Woodstock"
    ]
  }
}

Limit size

You can control how many autocomplete suggestions are returned with the size parameter.

Query:

PUT to /API/v2/search/?page_size=0

{
  "autocomplete": {
    "size_autocomplete_query": {
      "query": "How m",
      "size": 2
    }
  }
}

Response:

{
  "autocomplete": {
    "size_autocomplete_query": [
      "How much wood could a woodchuck chuck if a woodchuck?",
      "How much would it cost?"
    ]
  }
}

Multiple queries

It is also possible to request multiple sets of autocomplete suggestions in one search query.

Query:

PUT to /API/v2/search/?page_size=0

{
  "autocomplete": {
    "simple_autocomplete_query": {
      "query": "How m"
    },
    "title_autocomplete_query": {
      "query": "wood",
      "field": "title"
    }
  }
}

Response:

{
  "autocomplete": {
    "simple_autocomplete_query": [
      "How much wood could a woodchuck chuck if a woodchuck?",
      "How much would it cost?",
      "How many roads must a man walk down?"
    ],
    "title_autocomplete_query": [
      "Woody Woodpecker",
      "Woody Allen",
      "Woodstock"
    ]
  }
}
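
The options from the sections above (field, size, and an optional query or filter as scope) can be combined in one document. The Python sketch below assumes, based on the examples, that each key under autocomplete is a caller-chosen label that is echoed back in the response; that is an inference from the examples, and autocomplete_document is a hypothetical helper.

```python
def autocomplete_document(queries, query=None, filter_terms=None):
    """Build a search document holding one or more autocomplete queries.

    `queries` maps a label to a dict with "query" and optional "field"/"size".
    `query` and `filter_terms` optionally limit which items are considered.
    """
    allowed = {"query", "field", "size"}
    document = {"autocomplete": {}}
    for label, spec in queries.items():
        unknown = set(spec) - allowed
        if unknown:
            raise ValueError(f"unsupported keys for {label}: {sorted(unknown)}")
        if "query" not in spec:
            raise ValueError(f"{label} needs a 'query' value")
        document["autocomplete"][label] = dict(spec)
    if query is not None:
        document["query"] = query
    if filter_terms:
        # Same filter structure as in the Filtering section.
        document["filter"] = {"operator": "AND", "terms": list(filter_terms)}
    return document
```

For instance, autocomplete_document({"title_autocomplete_query": {"query": "wood", "field": "title"}}, query="woodchuck") would produce a title-only autocomplete limited to items matching woodchuck.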

Nested queries

You can also search for nested objects using the search API. In Cantemo this is needed, for example, when searching for relations. Read more about nested objects here: https://www.elastic.co/guide/en/elasticsearch/reference/7.7/nested.html

Query:

PUT to /API/v2/search/?page_size=0

{
    "filter": {
        "operator": "AND",
        "nested": {
            "operator": "AND",
            "path": "relations",
            "terms": [
                {
                    "name": "related_to",
                    "value": "VX-51"
                },
                {
                    "name": "type",
                    "value": "portal_metadata_cascade"
                }
            ]
        }
    }
}
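
A small Python sketch that builds the nested filter shown above from a list of name and value pairs. The function nested_filter is a hypothetical helper, and it only covers the plain "value" form of a term used in this example.

```python
def nested_filter(path, pairs, operator="AND", outer_operator="AND"):
    """Build a filter document that matches terms inside one nested object.

    `path` is the nested field (for example "relations") and `pairs` is a
    list of (name, value) tuples that are combined using `operator`.
    """
    return {
        "filter": {
            "operator": outer_operator,
            "nested": {
                "operator": operator,
                "path": path,
                "terms": [{"name": name, "value": value} for name, value in pairs],
            },
        }
    }


document = nested_filter(
    "relations",
    [("related_to", "VX-51"), ("type", "portal_metadata_cascade")],
)
```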