elasticsearch

package
v0.27.0-beta Latest

This package is not in the latest version of its module.

Published: Sep 10, 2024 License: MIT Imports: 13 Imported by: 0

README

---
title: "Elasticsearch"
lang: "en-US"
draft: false
description: "Learn about how to set up a VDP Elasticsearch component https://github.com/instill-ai/instill-core"
---

The Elasticsearch component is a data component that allows users to access the Elasticsearch database.
It can carry out the following tasks:

- [Search](#search)
- [Vector Search](#vector-search)
- [Index](#index)
- [Multi Index](#multi-index)
- [Update](#update)
- [Delete](#delete)
- [Create Index](#create-index)
- [Delete Index](#delete-index)



## Release Stage

`Alpha`



## Configuration

The component configuration is defined and maintained [here](https://github.com/instill-ai/component/blob/main/application/elasticsearch/v0/config/definition.json).




## Setup




To communicate with Elasticsearch, the following connection details need to be
provided. You may specify them directly in a pipeline recipe as key-value pairs
within the component's `setup` block, or you can create a **Connection** from
the [**Integration Settings**](https://www.instill.tech/docs/vdp/integration)
page and reference the whole `setup` as
`setup: ${connection.<my-connection-id>}`.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Cloud ID (required) | `cloud-id` | string | Fill in the Cloud ID for the Elasticsearch instance |
| API Key (required) | `api-key` | string | Fill in the API key for the Elasticsearch instance (use the encoded form of the key) |
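
The note on `api-key` asks for the encoded form of the key. In Elasticsearch, the encoded API key is the Base64 encoding of the key ID and the key secret joined by a colon. The following is a minimal sketch of producing that value and assembling the two setup fields; the helper names `encodeAPIKey` and `buildSetup` and all sample values are hypothetical and are not part of this package.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// encodeAPIKey is a hypothetical helper. It joins an API key ID and secret
// with a colon and Base64-encodes the result, which matches the "encoded"
// value Elasticsearch returns when an API key is created.
func encodeAPIKey(id, secret string) string {
	return base64.StdEncoding.EncodeToString([]byte(id + ":" + secret))
}

// buildSetup assembles the two fields listed in the setup table.
func buildSetup(cloudID, encodedKey string) map[string]string {
	return map[string]string{
		"cloud-id": cloudID,
		"api-key":  encodedKey,
	}
}

func main() {
	// Placeholder values for illustration only.
	setup := buildSetup("my-deployment:placeholder", encodeAPIKey("a", "b"))
	out, err := json.MarshalIndent(setup, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

If Elasticsearch or Kibana already gave you the encoded value when the key was created, you can use it directly and skip the encoding step.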




## Supported Tasks

### Search

Search for documents in Elasticsearch. Full-text search is supported.


| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_SEARCH` |
| Index Name (required) | `index-name` | string | Name of the Elasticsearch index |
| ID | `id` | string | The ID of the document |
| Query | `query` | string | Full-text search query for the search task. `query` takes priority over `filter` if both are provided. If neither is provided, all documents are selected |
| Filter | `filter` | object | The Query DSL filter, which starts with the `query` field. Refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html |
| Filter SQL | `filter-sql` | string | Filter to apply to the data, in SQL syntax, starting with the WHERE clause. Leave empty for no filter |
| Size | `size` | integer | Number of documents to return. If empty then all documents will be returned |
| Fields | `fields` | array[string] | The fields to return in the documents. If empty then all fields will be returned |
| Minimum Score | `min-score` | number | Minimum score to consider for search results. If empty then no minimum score will be considered |



| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Status | `status` | string | Search operation status |
| Result | `result` | object | Result of the search operation |
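
The `query` description above states that a full-text query takes priority over `filter`, and that all documents are selected when neither is given. The sketch below models only that rule. The type `searchInput`, the function `chooseSelector`, and its return labels are hypothetical; how `id` and `filter-sql` rank against the other two is not stated in the table, so they are left out.

```go
package main

import "fmt"

// searchInput mirrors only the two inputs discussed here.
type searchInput struct {
	Query  string
	Filter map[string]any
}

// chooseSelector reports which input would drive document selection under
// the documented rule: query first, then filter, otherwise everything.
func chooseSelector(in searchInput) string {
	switch {
	case in.Query != "":
		return "query"
	case len(in.Filter) > 0:
		return "filter"
	default:
		return "match-all"
	}
}

func main() {
	// A Query DSL filter that starts with the "query" field, as the table requires.
	filter := map[string]any{
		"query": map[string]any{
			"term": map[string]any{"status": "published"},
		},
	}
	fmt.Println(chooseSelector(searchInput{Query: "vector database", Filter: filter}))
	fmt.Println(chooseSelector(searchInput{Filter: filter}))
	fmt.Println(chooseSelector(searchInput{}))
}
```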






### Vector Search

Perform a vector similarity search in Elasticsearch.


| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_VECTOR_SEARCH` |
| Index Name (required) | `index-name` | string | Name of the Elasticsearch index |
| Field (required) | `field` | string | Field name of the vector to search for similar vectors |
| Query Vector | `query-vector` | array[number] | Query vector to search for similar vectors |
| K | `k` | integer | Number of nearest-neighbor documents (k) for the kNN vector search |
| Num Candidates | `num-candidates` | integer | Number of candidates to consider for the kNN vector search. Defaults to 2 times `k` |
| Filter | `filter` | object | The Query DSL filter, which starts with the `filter` field. Refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html#knn-search-filter-example |
| Filter SQL | `filter-sql` | string | Filter to apply to the data, in SQL syntax, starting with the WHERE clause. Leave empty for no filter |
| Fields | `fields` | array[string] | The fields to return in the documents. If empty then all fields will be returned |
| Minimum Score | `min-score` | number | Minimum score to consider for search results. If empty then no minimum score will be considered |



| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Status | `status` | string | Search operation status |
| Result | `result` | object | Result of the vector search operation |
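
The table above says `num-candidates` defaults to 2 times `k`. A minimal sketch of that default follows, assuming an unset value arrives as zero (which is what an omitted integer decodes to in Go). The function name `effectiveNumCandidates` is hypothetical.

```go
package main

import "fmt"

// effectiveNumCandidates returns numCandidates when it is set (greater than
// zero) and otherwise falls back to the documented default of 2 * k.
func effectiveNumCandidates(k, numCandidates int) int {
	if numCandidates > 0 {
		return numCandidates
	}
	return 2 * k
}

func main() {
	fmt.Println(effectiveNumCandidates(5, 0))  // default applied
	fmt.Println(effectiveNumCandidates(5, 50)) // explicit value kept
}
```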






### Index

Index a document into Elasticsearch


| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_INDEX` |
| Index Name (required) | `index-name` | string | Name of the Elasticsearch index |
| ID | `id` | string | The ID of the document |
| Data (required) | `data` | object | Data to be indexed |



| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Status | `status` | string | Index operation status |






### Multi Index

Index multiple documents into Elasticsearch with the bulk API


| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_MULTI_INDEX` |
| Index Name (required) | `index-name` | string | Name of the Elasticsearch index |
| Array ID | `array-id` | array[string] | Array of document IDs |
| Array Data (required) | `array-data` | array[object] | Array data to be indexed |



| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Status | `status` | string | Index operation status |
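
The Elasticsearch bulk API expects a newline-delimited JSON body in which each document is preceded by an action line. The sketch below shows how `array-id` and `array-data` could be paired into such a body. It illustrates the wire format only and is not taken from this package's implementation; `buildBulkBody` is a hypothetical name, and treating an empty ID list as "let Elasticsearch assign IDs" is an assumption.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// buildBulkBody pairs each document with an "index" action line and returns
// the newline-delimited body. ids may be empty; otherwise it must have the
// same length as docs.
func buildBulkBody(indexName string, ids []string, docs []map[string]any) ([]byte, error) {
	if len(ids) != 0 && len(ids) != len(docs) {
		return nil, fmt.Errorf("got %d ids for %d documents", len(ids), len(docs))
	}
	var buf bytes.Buffer
	for i, doc := range docs {
		meta := map[string]any{"_index": indexName}
		if len(ids) != 0 {
			meta["_id"] = ids[i]
		}
		action, err := json.Marshal(map[string]any{"index": meta})
		if err != nil {
			return nil, err
		}
		source, err := json.Marshal(doc)
		if err != nil {
			return nil, err
		}
		// Every line, including the last one, must end with a newline.
		buf.Write(action)
		buf.WriteByte('\n')
		buf.Write(source)
		buf.WriteByte('\n')
	}
	return buf.Bytes(), nil
}

func main() {
	body, err := buildBulkBody("books",
		[]string{"1", "2"},
		[]map[string]any{{"title": "Go"}, {"title": "Search"}},
	)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(body))
}
```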






### Update

Update a document in Elasticsearch


| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_UPDATE` |
| Index Name (required) | `index-name` | string | Name of the Elasticsearch index |
| ID | `id` | string | The ID of the document |
| Query | `query` | string | Full-text search query for the update task. `query` takes priority over `filter` if both are provided. If neither is provided, all documents are selected |
| Filter | `filter` | object | The Query DSL filter, which starts with the `query` field. Refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html |
| Filter SQL | `filter-sql` | string | Filter to apply to the data, in SQL syntax, starting with the WHERE clause. Leave empty for no filter |
| Update (required) | `update-data` | object | Update data |



| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Status | `status` | string | Update operation status |
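
The input IDs in the table above are the JSON keys that the package's `UpdateInput` type decodes. The sketch below uses a local copy of that type (same fields and tags) to show how an update input with a `filter-sql` clause maps onto the struct. The index name, field names, and the helper `parseUpdateInput` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// updateInput is a local copy of the fields and JSON tags of UpdateInput.
type updateInput struct {
	ID        string         `json:"id"`
	Update    map[string]any `json:"update-data"`
	Filter    map[string]any `json:"filter"`
	FilterSQL string         `json:"filter-sql"`
	Query     string         `json:"query"`
	IndexName string         `json:"index-name"`
}

// parseUpdateInput decodes a JSON task input into updateInput.
func parseUpdateInput(raw string) (updateInput, error) {
	var in updateInput
	err := json.Unmarshal([]byte(raw), &in)
	return in, err
}

func main() {
	raw := `{
		"index-name": "articles",
		"filter-sql": "WHERE status = 'draft'",
		"update-data": {"status": "published"}
	}`
	in, err := parseUpdateInput(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("index=%s filter-sql=%q update=%v\n", in.IndexName, in.FilterSQL, in.Update)
}
```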






### Delete

Delete documents from Elasticsearch


| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_DELETE` |
| Index Name (required) | `index-name` | string | Name of the Elasticsearch index |
| ID | `id` | string | The ID of the document |
| Query | `query` | string | Full-text search query for the delete task. `query` takes priority over `filter` if both are provided. If neither is provided, all documents are selected |
| Filter | `filter` | object | The Query DSL filter, which starts with the `query` field. Refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html |
| Filter SQL | `filter-sql` | string | Filter to apply to the data, in SQL syntax, starting with the WHERE clause. Leave empty for no filter |



| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Status | `status` | string | Delete operation status |






### Create Index

Create an index in Elasticsearch


| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_CREATE_INDEX` |
| Index Name (required) | `index-name` | string | Name of the Elasticsearch index |
| Mappings | `mappings` | object | Index mappings, which start with the \{"mappings":\{"properties"\}\} field. Refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html for vector search and https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html for other mappings |



| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Status | `status` | string | Create index operation status |
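
For vector search, the mappings need a `dense_vector` field with `dims` and `similarity` set. The sketch below builds such a `mappings` object in the shape the table describes. The function names, the field name `embedding`, and the chosen dimension and similarity are hypothetical; check the `dense_vector` documentation for any further parameters your Elasticsearch version requires.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// vectorMappings returns a mappings object that declares one dense_vector
// field with the given dimension count and similarity metric.
func vectorMappings(field string, dims int, similarity string) map[string]any {
	return map[string]any{
		"mappings": map[string]any{
			"properties": map[string]any{
				field: map[string]any{
					"type":       "dense_vector",
					"dims":       dims,
					"similarity": similarity,
				},
			},
		},
	}
}

// vectorMappingsJSON renders the mappings as compact JSON.
func vectorMappingsJSON(field string, dims int, similarity string) (string, error) {
	out, err := json.Marshal(vectorMappings(field, dims, similarity))
	return string(out), err
}

func main() {
	s, err := vectorMappingsJSON("embedding", 3, "cosine")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```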






### Delete Index

Delete an index in Elasticsearch


| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_DELETE_INDEX` |
| Index Name (required) | `index-name` | string | Name of the Elasticsearch index |



| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Status | `status` | string | Delete index operation status |







Documentation

Index

Constants

const (
	TaskSearch       = "TASK_SEARCH"
	TaskVectorSearch = "TASK_VECTOR_SEARCH"
	TaskIndex        = "TASK_INDEX"
	TaskMultiIndex   = "TASK_MULTI_INDEX"
	TaskUpdate       = "TASK_UPDATE"
	TaskDelete       = "TASK_DELETE"
	TaskCreateIndex  = "TASK_CREATE_INDEX"
	TaskDeleteIndex  = "TASK_DELETE_INDEX"
)

Variables

This section is empty.

Functions

func CreateIndex

func CreateIndex(es *esapi.IndicesCreate, indexName string, mappings map[string]any) error

For mappings, refer to the Elasticsearch documentation for more information. Predefined mappings that use the dense_vector type with the similarity and dims fields are mandatory for vector search. If the index isn't created with mappings, vector search will not work, because the dense_vector type is not explicitly defined.

func DeleteDocument

func DeleteDocument(es *esapi.DeleteByQuery, esSQLTranslate *esapi.SQLTranslate, inputStruct DeleteInput) (int, error)

func DeleteIndex

func DeleteIndex(es *esapi.IndicesDelete, indexName string) error

func IndexDocument

func IndexDocument(es *esapi.Index, inputStruct IndexInput) error

func Init

func Init(bc base.Component) *component

Init returns an implementation of IConnector that interacts with Elasticsearch.

func MultiIndexDocument

func MultiIndexDocument(es *esapi.Bulk, inputStruct MultiIndexInput) (int, error)

func UpdateDocument

func UpdateDocument(es *esapi.UpdateByQuery, esSQLTranslate *esapi.SQLTranslate, inputStruct UpdateInput) (int, error)

Types

type CountResponse

type CountResponse struct {
	Count int `json:"count"`
}

type CreateIndexInput

type CreateIndexInput struct {
	IndexName string         `json:"index-name"`
	Mappings  map[string]any `json:"mappings"`
}

type CreateIndexOutput

type CreateIndexOutput struct {
	Status string `json:"status"`
}

type DeleteIndexInput

type DeleteIndexInput struct {
	IndexName string `json:"index-name"`
}

type DeleteIndexOutput

type DeleteIndexOutput struct {
	Status string `json:"status"`
}

type DeleteInput

type DeleteInput struct {
	ID        string         `json:"id"`
	Filter    map[string]any `json:"filter"`
	FilterSQL string         `json:"filter-sql"`
	Query     string         `json:"query"`
	IndexName string         `json:"index-name"`
}

type DeleteOutput

type DeleteOutput struct {
	Status string `json:"status"`
}

type DeleteUpdateResponse

type DeleteUpdateResponse struct {
	Took     int  `json:"took"`
	TimedOut bool `json:"timed_out"`
	Total    int  `json:"total"`
	Deleted  int  `json:"deleted"`
	Updated  int  `json:"updated"`
}

type ESBulk

type ESBulk func(body io.Reader, o ...func(*esapi.BulkRequest)) (*esapi.Response, error)

type ESClient

type ESClient struct {
	// contains filtered or unexported fields
}

type ESCount

type ESCount func(o ...func(*esapi.CountRequest)) (*esapi.Response, error)

type ESCreateIndex

type ESCreateIndex func(index string, o ...func(*esapi.IndicesCreateRequest)) (*esapi.Response, error)

type ESDelete

type ESDelete func(index []string, body io.Reader, o ...func(*esapi.DeleteByQueryRequest)) (*esapi.Response, error)

type ESDeleteIndex

type ESDeleteIndex func(index []string, o ...func(*esapi.IndicesDeleteRequest)) (*esapi.Response, error)

type ESIndex

type ESIndex func(index string, body io.Reader, o ...func(*esapi.IndexRequest)) (*esapi.Response, error)

type ESSQLTranslate

type ESSQLTranslate func(body io.Reader, o ...func(*esapi.SQLTranslateRequest)) (*esapi.Response, error)

type ESSearch

type ESSearch func(o ...func(*esapi.SearchRequest)) (*esapi.Response, error)

type ESUpdate

type ESUpdate func(index []string, o ...func(*esapi.UpdateByQueryRequest)) (*esapi.Response, error)

type Hit

type Hit struct {
	Index  string         `json:"_index"`
	ID     string         `json:"_id"`
	Score  float64        `json:"_score"`
	Source map[string]any `json:"_source"`
}

func SearchDocument

func SearchDocument(es *esapi.Search, esSQLTranslate *esapi.SQLTranslate, inputStruct SearchInput) ([]Hit, error)

- size is optional; empty means all documents.
- min-score is optional; empty means no minimum score.
- fields is optional; empty means all fields.
- filter is optional; empty means no filter. Choose one of id, filter, or filter-sql.
- id is optional; empty means no id. Choose one of id, filter, or filter-sql.
- filter-sql is optional; empty means no filter-sql. Choose one of id, filter, or filter-sql.
- query is optional; empty means no query. It is only used for full-text search.

func VectorSearchDocument

func VectorSearchDocument(es *esapi.Search, esSQLTranslate *esapi.SQLTranslate, inputStruct VectorSearchInput) ([]Hit, error)

Only vector search is supported for now. For semantic search, an external model in another component can be combined with vector search.

- size is optional; empty means all documents.
- source-only: if true, only the source of each document is returned; if false, all fields (_id, _index, _score, _source) are returned.
- min-score is optional; empty means no minimum score.
- fields is optional; empty means all fields.
- filter is optional; empty means no filter.

type IndexInput

type IndexInput struct {
	ID        string         `json:"id"`
	Data      map[string]any `json:"data"`
	IndexName string         `json:"index-name"`
}

type IndexOutput

type IndexOutput struct {
	Status string `json:"status"`
}

type MultiIndexInput

type MultiIndexInput struct {
	ArrayID   []string         `json:"array-id"`
	ArrayData []map[string]any `json:"array-data"`
	IndexName string           `json:"index-name"`
}

type MultiIndexOutput

type MultiIndexOutput struct {
	Status string `json:"status"`
}

type MultiIndexResponse

type MultiIndexResponse struct {
	Errors bool  `json:"errors"`
	Items  []any `json:"items"`
	Took   int   `json:"took"`
}

type SearchInput

type SearchInput struct {
	ID        string         `json:"id"`
	Fields    []string       `json:"fields"`
	MinScore  float64        `json:"min-score"`
	Filter    map[string]any `json:"filter"`
	FilterSQL string         `json:"filter-sql"`
	Query     string         `json:"query"`
	IndexName string         `json:"index-name"`
	Size      int            `json:"size"`
}

type SearchOutput

type SearchOutput struct {
	Result SearchResult `json:"result"`
	Status string       `json:"status"`
}

type SearchResponse

type SearchResponse struct {
	Took     int  `json:"took"`
	TimedOut bool `json:"timed_out"`
	Shards   struct {
		Total      int `json:"total"`
		Successful int `json:"successful"`
		Skipped    int `json:"skipped"`
		Failed     int `json:"failed"`
	} `json:"_shards"`
	Hits struct {
		Total struct {
			Value    int    `json:"value"`
			Relation string `json:"relation"`
		} `json:"total"`
		MaxScore float64 `json:"max_score"`
		Hits     []Hit   `json:"hits"`
	} `json:"hits"`
}
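
As an illustration of how the response types fit together, the sketch below decodes a hand-written search response body into trimmed local copies of Hit and SearchResponse and collects the document IDs. The sample JSON and the helper `hitIDs` are hypothetical; only the struct fields and tags come from the definitions above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hit is a local copy of the Hit type.
type hit struct {
	Index  string         `json:"_index"`
	ID     string         `json:"_id"`
	Score  float64        `json:"_score"`
	Source map[string]any `json:"_source"`
}

// searchResponse keeps only the part of SearchResponse needed here.
type searchResponse struct {
	Hits struct {
		Hits []hit `json:"hits"`
	} `json:"hits"`
}

// hitIDs decodes a raw response body and returns the IDs of all hits.
func hitIDs(raw []byte) ([]string, error) {
	var resp searchResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	ids := make([]string, 0, len(resp.Hits.Hits))
	for _, h := range resp.Hits.Hits {
		ids = append(ids, h.ID)
	}
	return ids, nil
}

func main() {
	raw := []byte(`{"hits":{"hits":[
		{"_index":"books","_id":"1","_score":1.5,"_source":{"title":"Go"}},
		{"_index":"books","_id":"2","_score":0.7,"_source":{"title":"Search"}}
	]}}`)
	ids, err := hitIDs(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids)
}
```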

type SearchResult

type SearchResult struct {
	IDs       []string         `json:"ids"`
	Documents []map[string]any `json:"documents"`
	Data      []map[string]any `json:"data"`
}

type UpdateInput

type UpdateInput struct {
	ID        string         `json:"id"`
	Update    map[string]any `json:"update-data"`
	Filter    map[string]any `json:"filter"`
	FilterSQL string         `json:"filter-sql"`
	Query     string         `json:"query"`
	IndexName string         `json:"index-name"`
}

type UpdateOutput

type UpdateOutput struct {
	Status string `json:"status"`
}

type VectorResult

type VectorResult struct {
	IDs       []string         `json:"ids"`
	Documents []map[string]any `json:"documents"`
	Vectors   [][]float64      `json:"vectors"`
	Metadata  []map[string]any `json:"metadata"`
}

type VectorSearchInput

type VectorSearchInput struct {
	Filter        map[string]any `json:"filter"`
	FilterSQL     string         `json:"filter-sql"`
	IndexName     string         `json:"index-name"`
	Field         string         `json:"field"`
	Fields        []string       `json:"fields"`
	QueryVector   []float64      `json:"query-vector"`
	K             int            `json:"k"`
	NumCandidates int            `json:"num-candidates"`
	MinScore      float64        `json:"min-score"`
}

type VectorSearchOutput

type VectorSearchOutput struct {
	Status string       `json:"status"`
	Result VectorResult `json:"result"`
}
