instill

package
v0.29.0-beta Latest
Warning

This package is not in the latest version of its module.

Published: Sep 30, 2024 License: MIT Imports: 20 Imported by: 0

README

---
title: "Instill Model"
lang: "en-US"
draft: false
description: "Learn how to set up a VDP Instill Model component: https://github.com/instill-ai/instill-core"
---

The Instill Model component is an AI component that allows users to connect to the AI models served on the Instill Model Platform.
It can carry out the following tasks:
- [Classification](#classification)
- [Instance Segmentation](#instance-segmentation)
- [Keypoint](#keypoint)
- [Detection](#detection)
- [OCR](#ocr)
- [Semantic Segmentation](#semantic-segmentation)
- [Text Generation](#text-generation)
- [Text Generation Chat](#text-generation-chat)
- [Text to Image](#text-to-image)
- [Visual Question Answering](#visual-question-answering)
- [Chat](#chat)

## Release Stage

`Alpha`

## Configuration

The component definition and tasks are defined in the [definition.json](https://github.com/instill-ai/component/blob/main/ai/instill/v0/config/definition.json) and [tasks.json](https://github.com/instill-ai/component/blob/main/ai/instill/v0/config/tasks.json) files respectively.



## Supported Tasks

### Classification

Classify images into predefined categories.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_CLASSIFICATION` |
| Model Name (required) | `model-name` | string | The Instill Model model to be used. |
| Image (required) | `image-base64` | string | Image base64 |
</div>






<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Category | `category` | string | The predicted category of the input. |
| Score | `score` | number | The confidence score of the predicted category of the input. |
</div>
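
As a minimal sketch of the input table above, the following Go snippet assembles a classification input and prints it as JSON. The `classificationInput` struct, the `newClassificationInput` helper and the model name are hypothetical and only mirror the documented field IDs. It also assumes that `image-base64` takes standard base64 of the raw image bytes, which the table does not state explicitly.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// classificationInput mirrors the input table above. It is an illustrative
// type, not one exported by the component package.
type classificationInput struct {
	Task        string `json:"task"`
	ModelName   string `json:"model-name"`
	ImageBase64 string `json:"image-base64"`
}

// newClassificationInput base64-encodes raw image bytes (assumption: standard
// base64 without a data-URI prefix) and fills in the task ID.
func newClassificationInput(modelName string, image []byte) classificationInput {
	return classificationInput{
		Task:        "TASK_CLASSIFICATION",
		ModelName:   modelName,
		ImageBase64: base64.StdEncoding.EncodeToString(image),
	}
}

func main() {
	// "example-org/example-model" is a placeholder, not a real model name.
	in := newClassificationInput("example-org/example-model", []byte("fake image bytes"))
	out, err := json.MarshalIndent(in, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The other image tasks (instance segmentation, keypoint, detection, OCR, semantic segmentation) list the same three input fields, so only the `task` value would change.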

### Instance Segmentation

Detect, localize and delineate multiple objects in images.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_INSTANCE_SEGMENTATION` |
| Model Name (required) | `model-name` | string | The Instill Model model to be used. |
| Image (required) | `image-base64` | string | Image base64 |
</div>






<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| [Objects](#instance-segmentation-objects) | `objects` | array[object] | A list of detected instance bounding boxes. |
</div>

<details>
<summary> Output Objects in Instance Segmentation</summary>

<h4 id="instance-segmentation-objects">Objects</h4>

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| [Bounding Box](#instance-segmentation-bounding-box) | `bounding-box` | object | The detected bounding box in (left, top, width, height) format. |
| Category | `category` | string | The predicted category of the bounding box. |
| RLE | `rle` | string | Run Length Encoding (RLE) of instance mask within the bounding box. |
| Score | `score` | number | The confidence score of the predicted instance object. |
</div>

<h4 id="instance-segmentation-bounding-box">Bounding Box</h4>

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Height | `height` | number | Bounding box height value |
| Left | `left` | number | Bounding box left x-axis value |
| Top | `top` | number | Bounding box top y-axis value |
| Width | `width` | number | Bounding box width value |
</div>
</details>

### Keypoint

Detect and localize multiple keypoints of objects in images.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_KEYPOINT` |
| Model Name (required) | `model-name` | string | The Instill Model model to be used. |
| Image (required) | `image-base64` | string | Image base64 |
</div>






<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| [Objects](#keypoint-objects) | `objects` | array[object] | A list of keypoint objects. Each keypoint object includes all the pre-defined keypoints of a detected object. |
</div>

<details>
<summary> Output Objects in Keypoint</summary>

<h4 id="keypoint-objects">Objects</h4>

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| [Bounding Box](#keypoint-bounding-box) | `bounding-box` | object | The detected bounding box in (left, top, width, height) format. |
| [Keypoints](#keypoint-keypoints) | `keypoints` | array | A keypoint group is composed of a list of pre-defined keypoints of a detected object. |
| Score | `score` | number | The confidence score of the predicted object. |
</div>

<h4 id="keypoint-keypoints">Keypoints</h4>

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Visibility Score | `v` | number | Visibility score of the keypoint. |
| X Coordinate | `x` | number | X coordinate of the keypoint. |
| Y Coordinate | `y` | number | Y coordinate of the keypoint. |
</div>

<h4 id="keypoint-bounding-box">Bounding Box</h4>

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Height | `height` | number | Bounding box height value |
| Left | `left` | number | Bounding box left x-axis value |
| Top | `top` | number | Bounding box top y-axis value |
| Width | `width` | number | Bounding box width value |
</div>
</details>
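
The keypoint fields above can be consumed with a small filter on the visibility score. This is only a sketch: the `keypoint` struct and the `visible` helper are hypothetical, and the 0.5 threshold is an arbitrary assumption because the tables do not document the range or meaning of `v`.

```go
package main

import "fmt"

// keypoint mirrors the Keypoints table above (illustrative, not a package type).
type keypoint struct {
	X float64 `json:"x"`
	Y float64 `json:"y"`
	V float64 `json:"v"`
}

// visible returns the keypoints whose visibility score is at least minV.
func visible(kps []keypoint, minV float64) []keypoint {
	var out []keypoint
	for _, k := range kps {
		if k.V >= minV {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	// Made-up sample values for illustration only.
	kps := []keypoint{{X: 12, Y: 30, V: 0.95}, {X: 40, Y: 8, V: 0.10}}
	for _, k := range visible(kps, 0.5) {
		fmt.Printf("(%.0f, %.0f) v=%.2f\n", k.X, k.Y, k.V)
	}
}
```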

### Detection

Detect and localize multiple objects in images.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_DETECTION` |
| Model Name (required) | `model-name` | string | The Instill Model model to be used. |
| Image (required) | `image-base64` | string | Image base64 |
</div>






<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| [Objects](#detection-objects) | `objects` | array[object] | A list of detected objects. |
</div>

<details>
<summary> Output Objects in Detection</summary>

<h4 id="detection-objects">Objects</h4>

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| [Bounding Box](#detection-bounding-box) | `bounding-box` | object | The detected bounding box in (left, top, width, height) format. |
| Category | `category` | string | The predicted category of the bounding box. |
| Score | `score` | number | The confidence score of the predicted category of the bounding box. |
</div>

<h4 id="detection-bounding-box">Bounding Box</h4>

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Height | `height` | number | Bounding box height value |
| Left | `left` | number | Bounding box left x-axis value |
| Top | `top` | number | Bounding box top y-axis value |
| Width | `width` | number | Bounding box width value |
</div>
</details>
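
To illustrate the (left, top, width, height) format, the sketch below decodes a detection output, keeps objects above a score threshold and converts each box to corner coordinates. The struct names, the helpers and the sample payload are hypothetical; only the JSON field IDs come from the tables above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// boundingBox mirrors the Bounding Box table above.
type boundingBox struct {
	Left   float64 `json:"left"`
	Top    float64 `json:"top"`
	Width  float64 `json:"width"`
	Height float64 `json:"height"`
}

// detectedObject mirrors the Objects table above.
type detectedObject struct {
	BoundingBox boundingBox `json:"bounding-box"`
	Category    string      `json:"category"`
	Score       float64     `json:"score"`
}

type detectionOutput struct {
	Objects []detectedObject `json:"objects"`
}

// corners converts (left, top, width, height) to (x1, y1, x2, y2).
func (b boundingBox) corners() (x1, y1, x2, y2 float64) {
	return b.Left, b.Top, b.Left + b.Width, b.Top + b.Height
}

// filterByScore keeps objects whose score is at least minScore.
func filterByScore(objs []detectedObject, minScore float64) []detectedObject {
	var kept []detectedObject
	for _, o := range objs {
		if o.Score >= minScore {
			kept = append(kept, o)
		}
	}
	return kept
}

func main() {
	// Made-up payload shaped like the documented output.
	raw := []byte(`{"objects":[
		{"bounding-box":{"left":10,"top":20,"width":30,"height":40},"category":"dog","score":0.9},
		{"bounding-box":{"left":0,"top":0,"width":5,"height":5},"category":"cat","score":0.2}]}`)
	var out detectionOutput
	if err := json.Unmarshal(raw, &out); err != nil {
		panic(err)
	}
	for _, o := range filterByScore(out.Objects, 0.5) {
		x1, y1, x2, y2 := o.BoundingBox.corners()
		fmt.Printf("%s %.2f (%.0f,%.0f)-(%.0f,%.0f)\n", o.Category, o.Score, x1, y1, x2, y2)
	}
}
```

The same bounding box shape appears in the instance segmentation, keypoint and OCR outputs, so `corners` would apply there unchanged.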

### OCR

Detect and recognize text in images.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_OCR` |
| Model Name (required) | `model-name` | string | The Instill Model model to be used. |
| Image (required) | `image-base64` | string | Image base64 |
</div>






<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| [Objects](#ocr-objects) | `objects` | array[object] | A list of detected bounding boxes. |
</div>

<details>
<summary> Output Objects in OCR</summary>

<h4 id="ocr-objects">Objects</h4>

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| [Bounding Box](#ocr-bounding-box) | `bounding-box` | object | The detected bounding box in (left, top, width, height) format. |
| Score | `score` | number | The confidence score of the predicted object. |
| Text | `text` | string | Text string recognised per bounding box. |
</div>

<h4 id="ocr-bounding-box">Bounding Box</h4>

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Height | `height` | number | Bounding box height value |
| Left | `left` | number | Bounding box left x-axis value |
| Top | `top` | number | Bounding box top y-axis value |
| Width | `width` | number | Bounding box width value |
</div>
</details>

### Semantic Segmentation

Classify image pixels into predefined categories.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_SEMANTIC_SEGMENTATION` |
| Model Name (required) | `model-name` | string | The Instill Model model to be used. |
| Image (required) | `image-base64` | string | Image base64 |
</div>






<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| [Stuffs](#semantic-segmentation-stuffs) | `stuffs` | array[object] | A list of RLE binary masks. |
</div>

<details>
<summary> Output Objects in Semantic Segmentation</summary>

<h4 id="semantic-segmentation-stuffs">Stuffs</h4>

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Category | `category` | string | Category text string corresponding to each stuff mask. |
| RLE | `rle` | string | Run Length Encoding (RLE) of each stuff mask within the image. |
</div>
</details>

### Text Generation

Generate texts from input text prompts.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_TEXT_GENERATION` |
| Model Name (required) | `model-name` | string | The Instill Model model to be used. |
| Prompt (required) | `prompt` | string | The prompt text |
| System message | `system-message` | string | The system message helps set the behavior of the assistant. For example, you can modify the personality of the assistant or provide specific instructions about how it should behave throughout the conversation. By default, the model uses the generic message "You are a helpful assistant." |
| Seed | `seed` | integer | The seed |
| Temperature | `temperature` | number | The temperature for sampling |
| Max new tokens | `max-new-tokens` | integer | The maximum number of tokens for the model to generate |
</div>






<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Text | `text` | string | Text |
</div>

### Text Generation Chat

Generate texts from input text prompts and chat history.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_TEXT_GENERATION_CHAT` |
| Model Name (required) | `model-name` | string | The Instill Model model to be used. |
| Prompt (required) | `prompt` | string | The prompt text |
| System message | `system-message` | string | The system message helps set the behavior of the assistant. For example, you can modify the personality of the assistant or provide specific instructions about how it should behave throughout the conversation. By default, the model uses the generic message "You are a helpful assistant." |
| Prompt Images | `prompt-images` | array[string] | The prompt images |
| [Chat history](#text-generation-chat-chat-history) | `chat-history` | array[object] | Incorporate external chat history, specifically previous messages within the conversation. Note that System Message is ignored when this field is populated. Each message should adhere to the format: \{"role": "The message role, i.e. 'system', 'user' or 'assistant'", "content": "message content"\}. |
| Seed | `seed` | integer | The seed |
| Temperature | `temperature` | number | The temperature for sampling |
| Max new tokens | `max-new-tokens` | integer | The maximum number of tokens for the model to generate |
</div>


<details>
<summary> Input Objects in Text Generation Chat</summary>

<h4 id="text-generation-chat-chat-history">Chat History</h4>

Incorporate external chat history, specifically previous messages within the conversation. Note that System Message is ignored when this field is populated. Each message should adhere to the format: \{"role": "The message role, i.e. 'system', 'user' or 'assistant'", "content": "message content"\}.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| [Content](#text-generation-chat-content) | `content` | array | The message content  |
| Role | `role` | string | The message role, i.e. 'system', 'user' or 'assistant'  |
</div>
<h4 id="text-generation-chat-content">Content</h4>

The message content

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| [Image URL](#text-generation-chat-image-url) | `image-url` | object | The image URL  |
| Text | `text` | string | The text content.  |
| Type | `type` | string | The type of the content part.  <br/><details><summary><strong>Enum values</strong></summary><ul><li>`text`</li><li>`image-url`</li></ul></details>  |
</div>
<h4 id="text-generation-chat-image-url">Image URL</h4>

The image URL

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| URL | `url` | string | Either a URL of the image or the base64 encoded image data.  |
</div>
</details>
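
A chat history shaped like the tables above could be assembled as follows. This is a sketch under assumptions: the `chatMessage`, `contentPart` and `imageURL` types and both helper functions are hypothetical, and the model name and image URL are placeholders. Only the field IDs and the `text` / `image-url` enum values come from the tables.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// imageURL mirrors the Image URL table above.
type imageURL struct {
	URL string `json:"url"`
}

// contentPart mirrors the Content table; Type is "text" or "image-url".
type contentPart struct {
	Type     string    `json:"type"`
	Text     string    `json:"text,omitempty"`
	ImageURL *imageURL `json:"image-url,omitempty"`
}

// chatMessage mirrors the Chat History table.
type chatMessage struct {
	Role    string        `json:"role"`
	Content []contentPart `json:"content"`
}

// textMessage builds a message with a single text part.
func textMessage(role, text string) chatMessage {
	return chatMessage{Role: role, Content: []contentPart{{Type: "text", Text: text}}}
}

// imageMessage builds a message with a text part followed by an image part.
func imageMessage(role, text, url string) chatMessage {
	return chatMessage{Role: role, Content: []contentPart{
		{Type: "text", Text: text},
		{Type: "image-url", ImageURL: &imageURL{URL: url}},
	}}
}

func main() {
	input := map[string]any{
		"task":       "TASK_TEXT_GENERATION_CHAT",
		"model-name": "example-org/example-model", // placeholder
		"prompt":     "And what colour is it?",
		"chat-history": []chatMessage{
			imageMessage("user", "What is in this picture?", "https://example.com/cat.png"),
			textMessage("assistant", "A cat sitting on a sofa."),
		},
	}
	out, err := json.MarshalIndent(input, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because the system message is ignored once `chat-history` is populated, any system instruction would presumably need to be supplied as a `system` role message inside the history instead.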



<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Text | `text` | string | Text |
</div>

### Text to Image

Generate images from input text prompts.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_TEXT_TO_IMAGE` |
| Model Name (required) | `model-name` | string | The Instill Model model to be used. |
| Prompt (required) | `prompt` | string | The prompt text |
| Samples | `samples` | integer | The number of generated samples, default is 1 |
| Seed | `seed` | integer | The seed, default is 0 |
| Negative prompt | `negative-prompt` | string | Keywords of what you do not wish to see in the output image. |
| Aspect ratio | `aspect-ratio` | string | Controls the aspect ratio of the generated image. Defaults to 1:1. |
</div>






<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Images | `images` | array[string] | Images |
</div>

### Visual Question Answering

Answer questions based on a prompt and an image.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_VISUAL_QUESTION_ANSWERING` |
| Model Name (required) | `model-name` | string | The Instill Model model to be used. |
| Prompt (required) | `prompt` | string | The prompt text |
| System message | `system-message` | string | The system message helps set the behavior of the assistant. For example, you can modify the personality of the assistant or provide specific instructions about how it should behave throughout the conversation. By default, the model uses the generic message "You are a helpful assistant." |
| Prompt Images | `prompt-images` | array[string] | The prompt images |
| [Chat history](#visual-question-answering-chat-history) | `chat-history` | array[object] | Incorporate external chat history, specifically previous messages within the conversation. Note that System Message is ignored when this field is populated. Each message should adhere to the format: \{"role": "The message role, i.e. 'system', 'user' or 'assistant'", "content": "message content"\}. |
| Seed | `seed` | integer | The seed |
| Temperature | `temperature` | number | The temperature for sampling |
| Max new tokens | `max-new-tokens` | integer | The maximum number of tokens for the model to generate |
</div>


<details>
<summary> Input Objects in Visual Question Answering</summary>

<h4 id="visual-question-answering-chat-history">Chat History</h4>

Incorporate external chat history, specifically previous messages within the conversation. Note that System Message is ignored when this field is populated. Each message should adhere to the format: \{"role": "The message role, i.e. 'system', 'user' or 'assistant'", "content": "message content"\}.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| [Content](#visual-question-answering-content) | `content` | array | The message content  |
| Role | `role` | string | The message role, i.e. 'system', 'user' or 'assistant'  |
</div>
<h4 id="visual-question-answering-content">Content</h4>

The message content

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| [Image URL](#visual-question-answering-image-url) | `image-url` | object | The image URL  |
| Text | `text` | string | The text content.  |
| Type | `type` | string | The type of the content part.  <br/><details><summary><strong>Enum values</strong></summary><ul><li>`text`</li><li>`image-url`</li></ul></details>  |
</div>
<h4 id="visual-question-answering-image-url">Image URL</h4>

The image URL

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| URL | `url` | string | Either a URL of the image or the base64 encoded image data.  |
</div>
</details>



<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Text | `text` | string | Text |
</div>

### Chat

Generate texts from input text prompts and chat history.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_CHAT` |
| Model Name (required) | `model-name` | string | The Instill Model model to be used. |
| Prompt (required) | `prompt` | string | The prompt text |
| System message | `system-message` | string | The system message helps set the behavior of the assistant. For example, you can modify the personality of the assistant or provide specific instructions about how it should behave throughout the conversation. By default, the model uses the generic message "You are a helpful assistant." |
| Prompt Images | `prompt-images` | array[string] | The prompt images |
| [Chat history](#chat-chat-history) | `chat-history` | array[object] | Incorporate external chat history, specifically previous messages within the conversation. Note that System Message is ignored when this field is populated. Each message should adhere to the format: \{"role": "The message role, i.e. 'system', 'user' or 'assistant'", "content": "message content"\}. |
| Seed | `seed` | integer | The seed |
| Temperature | `temperature` | number | The temperature for sampling |
| Max new tokens | `max-new-tokens` | integer | The maximum number of tokens for the model to generate |
</div>


<details>
<summary> Input Objects in Chat</summary>

<h4 id="chat-chat-history">Chat History</h4>

Incorporate external chat history, specifically previous messages within the conversation. Note that System Message is ignored when this field is populated. Each message should adhere to the format: \{"role": "The message role, i.e. 'system', 'user' or 'assistant'", "content": "message content"\}.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| [Content](#chat-content) | `content` | array | The message content  |
| Role | `role` | string | The message role, i.e. 'system', 'user' or 'assistant'  |
</div>
<h4 id="chat-content">Content</h4>

The message content

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| [Image URL](#chat-image-url) | `image-url` | object | The image URL  |
| Text | `text` | string | The text content.  |
| Type | `type` | string | The type of the content part.  <br/><details><summary><strong>Enum values</strong></summary><ul><li>`text`</li><li>`image-url`</li></ul></details>  |
</div>
<h4 id="chat-image-url">Image URL</h4>

The image URL

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| URL | `url` | string | Either a URL of the image or the base64 encoded image data.  |
</div>
</details>



<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Text | `text` | string | Text |
</div>

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Init

func Init(bc base.Component) *component

Types

type ChatParameter

type ChatParameter struct {
	MaxTokens   int     `json:"max-tokens,omitempty"`
	Seed        int     `json:"seed,omitempty"`
	N           int     `json:"n,omitempty"`
	Temperature float32 `json:"temperature,omitempty"`
	TopP        int     `json:"top-p,omitempty"`
}

type ChatRequestData

type ChatRequestData struct {
	Messages []Message `json:"messages,omitempty"`
}

type Content

type Content struct {
	Text        string `json:"text,omitempty"`
	ImageBase64 string `json:"image-base64,omitempty"`
	Type        string `json:"type,omitempty"`
}

type Message

type Message struct {
	Content []Content `json:"content,omitempty"`
	Role    string    `json:"role,omitempty"`
}

type ModelsResp

type ModelsResp struct {
	Models []struct {
		Name string `json:"name"`
		Task string `json:"task"`
	} `json:"models"`
}
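
The shape of `ModelsResp` suggests a model listing that pairs each model name with its task. The sketch below decodes such a payload and collects the names for one task. It uses a local copy of the struct so that it compiles on its own. The `namesForTask` helper and the sample payload are hypothetical, and the endpoint that would return this response is not documented here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelsResp is a local copy of the ModelsResp type listed above.
type modelsResp struct {
	Models []struct {
		Name string `json:"name"`
		Task string `json:"task"`
	} `json:"models"`
}

// namesForTask decodes a models payload and returns the names whose task matches.
func namesForTask(raw []byte, task string) ([]string, error) {
	var resp modelsResp
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	var names []string
	for _, m := range resp.Models {
		if m.Task == task {
			names = append(names, m.Name)
		}
	}
	return names, nil
}

func main() {
	// Made-up payload for illustration only.
	raw := []byte(`{"models":[{"name":"a","task":"TASK_OCR"},{"name":"b","task":"TASK_CHAT"}]}`)
	names, err := namesForTask(raw, "TASK_OCR")
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```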

type RequestWrapper

type RequestWrapper struct {
	Data      any `json:"data,omitempty"`
	Parameter any `json:"parameter,omitempty"`
}

type TextCompletionRequestData

type TextCompletionRequestData struct {
	Prompt        string `json:"prompt"`
	SystemMessage string `json:"system-message,omitempty"`
}

type TextCompletionRequestParameter

type TextCompletionRequestParameter struct {
	MaxTokens   int     `json:"max-tokens,omitempty"`
	Seed        int     `json:"seed,omitempty"`
	N           int     `json:"n,omitempty"`
	Temperature float32 `json:"temperature,omitempty"`
	TopP        int     `json:"top-p,omitempty"`
}
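
The struct tags above determine how a request would serialize. As a sketch, the following uses local copies of `RequestWrapper`, `TextCompletionRequestData` and `TextCompletionRequestParameter` to show the resulting JSON. The `encodeTextCompletion` helper is hypothetical, and how the component itself assembles and sends requests is not shown in this documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the package types listed in this section, so the sketch
// compiles without importing the component module.
type RequestWrapper struct {
	Data      any `json:"data,omitempty"`
	Parameter any `json:"parameter,omitempty"`
}

type TextCompletionRequestData struct {
	Prompt        string `json:"prompt"`
	SystemMessage string `json:"system-message,omitempty"`
}

type TextCompletionRequestParameter struct {
	MaxTokens   int     `json:"max-tokens,omitempty"`
	Seed        int     `json:"seed,omitempty"`
	N           int     `json:"n,omitempty"`
	Temperature float32 `json:"temperature,omitempty"`
	TopP        int     `json:"top-p,omitempty"`
}

// encodeTextCompletion is a hypothetical helper that wraps data and parameter
// values and returns the JSON encoding.
func encodeTextCompletion(prompt string, maxTokens int, temperature float32) (string, error) {
	req := RequestWrapper{
		Data:      TextCompletionRequestData{Prompt: prompt},
		Parameter: TextCompletionRequestParameter{MaxTokens: maxTokens, Temperature: temperature},
	}
	b, err := json.Marshal(req)
	return string(b), err
}

func main() {
	s, err := encodeTextCompletion("Hello", 64, 0.5)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	// Prints: {"data":{"prompt":"Hello"},"parameter":{"max-tokens":64,"temperature":0.5}}
}
```

Because every parameter field is tagged `omitempty`, zero values such as `Seed: 0` or `Temperature: 0` are dropped from the JSON rather than sent explicitly, so the receiver cannot tell an explicit zero from an unset value.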

type TextGenerationInput

type TextGenerationInput struct {
	Prompt        string   `json:"prompt"`
	SystemMessage *string  `json:"system-message,omitempty"`
	PromptImages  []string `json:"prompt-images,omitempty"`

	// Note: We're currently sharing the same struct in the OpenAI component,
	// but this will be moved to the standardized format later.
	ChatHistory []*openai.TextMessage `json:"chat-history,omitempty"`
}

type TextToImageRequestData

type TextToImageRequestData struct {
	Prompt string `json:"prompt"`
}

type TextToImageRequestParameter

type TextToImageRequestParameter struct {
	AspectRatio    string `json:"aspect-ratio,omitempty"`
	NegativePrompt string `json:"negative-prompt,omitempty"`
	N              int    `json:"n,omitempty"`
	Seed           int    `json:"seed,omitempty"`
}

type VisionRequestData

type VisionRequestData struct {
	ImageBase64 string `json:"image-base64"`
	Type        string `json:"type"`
}
