openai

package

v0.29.0-beta Latest Latest Go to latest Published: Sep 30, 2024 License: MIT Imports: 20 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

README ¶

---
title: "OpenAI"
lang: "en-US"
draft: false
description: "Learn about how to set up a VDP OpenAI component https://github.com/instill-ai/instill-core"
---

The OpenAI component is an AI component that allows users to connect the AI models served on the OpenAI Platform.
It can carry out the following tasks:
- [Text Generation](#text-generation)
- [Text Embeddings](#text-embeddings)
- [Speech Recognition](#speech-recognition)
- [Text to Speech](#text-to-speech)
- [Text to Image](#text-to-image)

## Release Stage

`Alpha`

## Configuration

The component definition and tasks are defined in the [definition.json](https://github.com/instill-ai/component/blob/main/ai/openai/v0/config/definition.json) and [tasks.json](https://github.com/instill-ai/component/blob/main/ai/openai/v0/config/tasks.json) files respectively.

## Setup


In order to communicate with OpenAI, the following connection details need to be
provided. You may specify them directly in a pipeline recipe as key-value pairs
within the component's `setup` block, or you can create a **Connection** from
the [**Integration Settings**](https://www.instill.tech/docs/vdp/integration)
page and reference the whole `setup` as `setup:
${connection.<my-connection-id>}`.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| API Key | `api-key` | string | Fill in your OpenAI API key. To find your keys, visit your OpenAI's API Keys page.  |
| Organization ID | `organization` | string | Specify which organization is used for the requests. Usage will count against the specified organization's subscription quota.  |

</div>




## Supported Tasks

### Text Generation

OpenAI's text generation models (often called generative pre-trained transformers or large language models) have been trained to understand natural language, code, and images. The models provide text outputs in response to their inputs. The inputs to these models are also referred to as "prompts". Designing a prompt is essentially how you “program” a large language model model, usually by providing instructions or some examples of how to successfully complete a task.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_TEXT_GENERATION` |
| Model (required) | `model` | string | ID of the model to use. |
| Prompt (required) | `prompt` | string | The prompt text |
| System message | `system-message` | string | The system message helps set the behavior of the assistant. For example, you can modify the personality of the assistant or provide specific instructions about how it should behave throughout the conversation. By default, the model’s behavior is using a generic message as "You are a helpful assistant." |
| Image | `images` | array[string] | The images |
| [Chat history](#text-generation-chat-history) | `chat-history` | array[object] | Incorporate external chat history, specifically previous messages within the conversation. Please note that System Message will be ignored and will not have any effect when this field is populated. Each message should adhere to the format \{"role": "The message role, i.e. 'system', 'user' or 'assistant'", "content": "message content"\}. |
| Temperature | `temperature` | number | What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.  We generally recommend altering this or `top-p` but not both.  |
| N | `n` | integer | How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep `n` as `1` to minimize costs. |
| Max Tokens | `max-tokens` | integer | The maximum number of tokens that can be generated in the chat completion.  The total length of input tokens and generated tokens is limited by the model's context length. |
| [Response Format](#text-generation-response-format) | `response-format` | object | Response format. |
| Top P | `top-p` | number | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.  We generally recommend altering this or `temperature` but not both.  |
| Presence Penalty | `presence-penalty` | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. |
| Frequency Penalty | `frequency-penalty` | number | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. |
</div>


<details>
<summary> Input Objects in Text Generation</summary>

<h4 id="text-generation-chat-history">Chat History</h4>

Incorporate external chat history, specifically previous messages within the conversation. Please note that System Message will be ignored and will not have any effect when this field is populated. Each message should adhere to the format \{"role": "The message role, i.e. 'system', 'user' or 'assistant'", "content": "message content"\}.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| [Content](#text-generation-content) | `content` | array | The message content  |
| Role | `role` | string | The message role, i.e. 'system', 'user' or 'assistant'  |
</div>
<h4 id="text-generation-content">Content</h4>

The message content

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| [Image URL](#text-generation-image-url) | `image-url` | object | The image URL  |
| Text | `text` | string | The text content.  |
| Type | `type` | string | The type of the content part.  <br/><details><summary><strong>Enum values</strong></summary><ul><li>`text`</li><li>`image-url`</li></ul></details>  |
</div>
<h4 id="text-generation-image-url">Image URL</h4>

The image URL

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| URL | `url` | string | Either a URL of the image or the base64 encoded image data.  |
</div>
</details>

<details>
<summary>The <code>response-format</code> Object </summary>

<h4 id="text-generation-response-format">Response Format</h4>

`response-format` must fulfill one of the following schemas:

<h5 id="text-generation-text"><code>Text</code></h5>

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Type | `type` | string |  Must be `"text"`   |
</div>

<h5 id="text-generation-json-object"><code>JSON Object</code></h5>

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Type | `type` | string |  Must be `"json_object"`   |
</div>

<h5 id="text-generation-json-schema"><code>JSON Schema</code></h5>

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| JSON Schema | `json-schema` | string |  Set up the schema of the structured output.  |
| Type | `type` | string |  Must be `"json_schema"`   |
</div>
</details>

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Texts | `texts` | array[string] | Texts |
| [Usage](#text-generation-usage) (optional) | `usage` | object | Usage statistics related to the query |
</div>

<details>
<summary> Output Objects in Text Generation</summary>

<h4 id="text-generation-usage">Usage</h4>

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Completion tokens | `completion-tokens` | integer | Total number of tokens used (completion) |
| Prompt tokens | `prompt-tokens` | integer | Total number of tokens used (prompt) |
| Total tokens | `total-tokens` | integer | Total number of tokens used (prompt + completion) |
</div>
</details>

### Text Embeddings

Turn text into numbers, unlocking use cases like search.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_TEXT_EMBEDDINGS` |
| Model (required) | `model` | string | ID of the model to use. |
| Text (required) | `text` | string | The text |
| Dimensions | `dimensions` | integer | The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models. |
</div>






<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Embedding | `embedding` | array[number] | Embedding of the input text |
</div>

### Speech Recognition

Turn audio into text.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_SPEECH_RECOGNITION` |
| Model (required) | `model` | string | ID of the model to use. Only `whisper-1` is currently available.  |
| Audio (required) | `audio` | string | The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.  |
| Prompt | `prompt` | string | An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.  |
| Language | `language` | string | The language of the input audio. Supplying the input language in <a href="https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes">ISO-639-1</a> format will improve accuracy and latency.  |
| Temperature | `temperature` | number | The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use <a href="https://en.wikipedia.org/wiki/Log_probability">log probability</a> to automatically increase the temperature until certain thresholds are hit.  |
</div>






<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Text | `text` | string | Generated text |
</div>

### Text to Speech

Turn text into lifelike spoken audio

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_TEXT_TO_SPEECH` |
| Model (required) | `model` | string | One of the available TTS models: `tts-1` or `tts-1-hd`  |
| Text (required) | `text` | string | The text to generate audio for. The maximum length is 4096 characters. |
| Voice (required) | `voice` | string | The voice to use when generating the audio. Supported voices are `alloy`, `echo`, `fable`, `onyx`, `nova`, and `shimmer`. |
| Response Format | `response-format` | string | The format to audio in. Supported formats are `mp3`, `opus`, `aac`, and `flac`. |
| Speed | `speed` | number | The speed of the generated audio. Select a value from `0.25` to `4.0`. `1.0` is the default. |
</div>






<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Audio (optional) | `audio` | string | AI generated audio |
</div>

### Text to Image

Generate or manipulate images with DALL·E.

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Input | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_TEXT_TO_IMAGE` |
| Model (required) | `model` | string | The model to use for image generation. |
| Prompt (required) | `prompt` | string | A text description of the desired image(s). The maximum length is 1000 characters for `dall-e-2` and 4000 characters for `dall-e-3`. |
| N | `n` | integer | The number of images to generate. Must be between 1 and 10. For `dall-e-3`, only `n=1` is supported. |
| Quality | `quality` | string | The quality of the image that will be generated. `hd` creates images with finer details and greater consistency across the image. This param is only supported for `dall-e-3`. |
| Size | `size` | string | The size of the generated images. Must be one of `256x256`, `512x512`, or `1024x1024` for `dall-e-2`. Must be one of `1024x1024`, `1792x1024`, or `1024x1792` for `dall-e-3` models. |
| N | `style` | string | The style of the generated images. Must be one of `vivid` or `natural`. Vivid causes the model to lean towards generating hyper-real and dramatic images. Natural causes the model to produce more natural, less hyper-real looking images. This param is only supported for `dall-e-3`. |
</div>






<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Output | ID | Type | Description |
| :--- | :--- | :--- | :--- |
| [Images](#text-to-image-images) | `results` | array[object] | Generated results |
</div>

<details>
<summary> Output Objects in Text to Image</summary>

<h4 id="text-to-image-images">Images</h4>

<div class="markdown-col-no-wrap" data-col-1 data-col-2>

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Generated Image | `image` | string | Generated image |
| Revised Prompt | `revised-prompt` | string | Revised prompt |
</div>
</details>
## Example Recipes

Recipe for the [PicassoAI: Cubist Creations at Your Command!](https://instill.tech/instill-ai/pipelines/picasso-ai/playground) pipeline.

```yaml
version: v1beta
component:
  mistral-0:
    type: mistral-ai
    task: TASK_TEXT_GENERATION_CHAT
    input:
      max-new-tokens: 100
      model-name: open-mixtral-8x22b
      prompt: |-
        Generate a Picasso-inspired image based on the following user input:

        ${variable.prompt}

        Using the specified Picasso period: ${variable.period}


        Transform this input into a detailed text-to-image prompt by:

        1. Identifying the key elements or subjects in the user's description

        2. Adding artistic elements and techniques specific to the ${variable.period} period of Picasso's work

        3. Including cubist or abstract features characteristic of the ${variable.period}

        4. Suggesting a composition or scene layout typical of Picasso's work from this era

        Enhance the prompt with vivid, descriptive language and specific Picasso-style elements from the ${variable.period}. The final prompt should begin with "Create an image in the style of Picasso's ${variable.period} period:" followed by the enhanced description.
      safe: false
      system-message: You are a helpful assistant.
      temperature: 0.7
      top-k: 10
      top-p: 0.5
    setup:
      api-key: ${secret.INSTILL_SECRET}
  openai-0:
    type: openai
    task: TASK_TEXT_TO_IMAGE
    input:
      model: dall-e-3
      n: 1
      prompt: |-
        Using this primary color palette: ${variable.colour}

        ${mistral-0.output.text}
      quality: standard
      size: 1024x1024
      style: vivid
    setup:
      api-key: ${secret.INSTILL_SECRET}
variable:
  colour:
    title: Colour
    description: Describe the main colour to use i.e. blue, random
    instill-format: string
    instill-ui-order: 1
  period:
    title: Period
    description: |
      Input different Picasso periods i.e. Blue, Rose, African, Synthetic Cubism, etc.
    instill-format: string
  prompt:
    title: Prompt
    description: Input prompt here i.e. "A cute baby wombat"
    instill-format: string
output:
  image:
    title: Image
    value: ${openai-0.output.results}
```

Recipe for the [Explain this topic to me in another language](https://instill.tech/instill-ai/pipelines/gpt-4o-mini-demo/playground) pipeline.

```yaml
version: v1beta
component:
  openai:
    type: openai
    task: TASK_TEXT_GENERATION
    input:
      model: gpt-4o-mini
      n: 1
      prompt: |-
        Talk about this topic in ${variable.language}  in a concise and beginner-friendly way:
        ${variable.prompt}
      response-format:
        type: text
      system-message: You are a helpful assistant.
      temperature: 1
      top-p: 1
    setup:
      api-key: ${secret.INSTILL_SECRET}
variable:
  language:
    title: Language
    description: Input a language i.e. Chinese, Japanese, French, etc.
    instill-format: string
  prompt:
    title: Prompt
    description: Write the topic you want to ask about here i.e. "Tell me about small LLMs"
    instill-format: string
output:
  result:
    title: Result
    value: ${openai.output.texts}
```

Documentation ¶

Index ¶

Constants
func Init(bc base.Component) *component
type AudioTranscriptionInput
type AudioTranscriptionReq
type AudioTranscriptionResp
type Content
type Data
type ImageGenerationsOutput
type ImageGenerationsOutputResult
type ImageGenerationsReq
type ImageGenerationsResp
type ImageGenerationsRespData
type ImageURL
type ImagesGenerationInput
type ListModelsResponse
type Model
type ModelPermission
type TextCompletionInput
type TextCompletionOutput
type TextEmbeddingsInput
type TextEmbeddingsOutput
type TextEmbeddingsReq
type TextEmbeddingsResp
type TextMessage
type TextToSpeechInput
type TextToSpeechOutput
type TextToSpeechReq

Constants ¶

View Source

const (
	TextGenerationTask    = "TASK_TEXT_GENERATION"
	TextEmbeddingsTask    = "TASK_TEXT_EMBEDDINGS"
	SpeechRecognitionTask = "TASK_SPEECH_RECOGNITION"
	TextToSpeechTask      = "TASK_TEXT_TO_SPEECH"
	TextToImageTask       = "TASK_TEXT_TO_IMAGE"
)

Variables ¶

This section is empty.

Functions ¶

func Init ¶

func Init(bc base.Component) *component

Init returns an initialized OpenAI connector.

Types ¶

type AudioTranscriptionInput ¶

type AudioTranscriptionInput struct {
	Audio       string   `json:"audio"`
	Model       string   `json:"model"`
	Prompt      *string  `json:"prompt,omitempty"`
	Temperature *float64 `json:"temperature,omitempty"`
	Language    *string  `json:"language,omitempty"`
}

type AudioTranscriptionReq ¶

type AudioTranscriptionReq struct {
	File           []byte   `json:"file"`
	Model          string   `json:"model"`
	Prompt         *string  `json:"prompt,omitempty"`
	Language       *string  `json:"language,omitempty"`
	Temperature    *float64 `json:"temperature,omitempty"`
	ResponseFormat string   `json:"response_format,omitempty"`
}

type AudioTranscriptionResp ¶

type AudioTranscriptionResp struct {
	Text     string  `json:"text"`
	Duration float32 `json:"duration"`
}

type Content ¶

type Content struct {
	Type     string    `json:"type"`
	Text     *string   `json:"text,omitempty"`
	ImageURL *ImageURL `json:"image_url,omitempty"`
}

type Data ¶

type Data struct {
	Object    string    `json:"object"`
	Embedding []float64 `json:"embedding"`
	Index     int       `json:"index"`
}

type ImageGenerationsOutput ¶

type ImageGenerationsOutput struct {
	Results []ImageGenerationsOutputResult `json:"results"`
}

type ImageGenerationsOutputResult ¶

type ImageGenerationsOutputResult struct {
	Image         string `json:"image"`
	RevisedPrompt string `json:"revised-prompt"`
}

type ImageGenerationsReq ¶

type ImageGenerationsReq struct {
	Prompt         string  `json:"prompt"`
	Model          string  `json:"model"`
	N              *int    `json:"n,omitempty"`
	Quality        *string `json:"quality,omitempty"`
	Size           *string `json:"size,omitempty"`
	Style          *string `json:"style,omitempty"`
	ResponseFormat string  `json:"response_format"`
}

type ImageGenerationsResp ¶

type ImageGenerationsResp struct {
	Data []ImageGenerationsRespData `json:"data"`
}

type ImageGenerationsRespData ¶

type ImageGenerationsRespData struct {
	Image         string `json:"b64_json"`
	RevisedPrompt string `json:"revised_prompt"`
}

type ImageURL ¶

type ImageURL struct {
	URL string `json:"url"`
}

type ImagesGenerationInput ¶

type ImagesGenerationInput struct {
	Prompt  string  `json:"prompt"`
	Model   string  `json:"model"`
	N       *int    `json:"n,omitempty"`
	Quality *string `json:"quality,omitempty"`
	Size    *string `json:"size,omitempty"`
	Style   *string `json:"style,omitempty"`
}

type ListModelsResponse ¶

type ListModelsResponse struct {
	Object string  `json:"object"`
	Data   []Model `json:"data"`
}

type Model ¶

type Model struct {
	ID         string            `json:"id"`
	Object     string            `json:"object"`
	Created    int               `json:"created"`
	OwnedBy    string            `json:"owned_by"`
	Permission []ModelPermission `json:"permission"`
	Root       string            `json:"root"`
}

Model represents a OpenAI Model

type ModelPermission ¶

type ModelPermission struct {
	ID                 string `json:"id"`
	Object             string `json:"object"`
	Created            int    `json:"created"`
	AllowCreateEngine  bool   `json:"allow_create_engine"`
	AllowSampling      bool   `json:"allow_sampling"`
	AllowLogprobs      bool   `json:"allow_logprobs"`
	AllowSearchIndices bool   `json:"allow_search_indices"`
	AllowView          bool   `json:"allow_view"`
	AllowFineTuning    bool   `json:"allow_fine_tuning"`
	Organization       string `json:"organization"`
	IsBlocking         bool   `json:"is_blocking"`
}

type TextCompletionInput ¶

type TextCompletionInput struct {
	Prompt           string                     `json:"prompt"`
	Images           []string                   `json:"images"`
	ChatHistory      []*TextMessage             `json:"chat-history,omitempty"`
	Model            string                     `json:"model"`
	SystemMessage    *string                    `json:"system-message,omitempty"`
	Temperature      *float32                   `json:"temperature,omitempty"`
	TopP             *float32                   `json:"top-p,omitempty"`
	N                *int                       `json:"n,omitempty"`
	Stop             *string                    `json:"stop,omitempty"`
	MaxTokens        *int                       `json:"max-tokens,omitempty"`
	PresencePenalty  *float32                   `json:"presence-penalty,omitempty"`
	FrequencyPenalty *float32                   `json:"frequency-penalty,omitempty"`
	ResponseFormat   *responseFormatInputStruct `json:"response-format,omitempty"`
}

type TextCompletionOutput ¶

type TextCompletionOutput struct {
	Texts []string `json:"texts"`
	Usage usage    `json:"usage"`
}

type TextEmbeddingsInput ¶

type TextEmbeddingsInput struct {
	Text       string `json:"text"`
	Model      string `json:"model"`
	Dimensions int    `json:"dimensions"`
}

type TextEmbeddingsOutput ¶

type TextEmbeddingsOutput struct {
	Embedding []float64 `json:"embedding"`
}

type TextEmbeddingsReq ¶

type TextEmbeddingsReq struct {
	Model      string   `json:"model"`
	Dimensions int      `json:"dimensions,omitempty"`
	Input      []string `json:"input"`
}

type TextEmbeddingsResp ¶

type TextEmbeddingsResp struct {
	Object string      `json:"object"`
	Data   []Data      `json:"data"`
	Model  string      `json:"model"`
	Usage  usageOpenAI `json:"usage"`
}

type TextMessage ¶

type TextMessage struct {
	Role    string    `json:"role"`
	Content []Content `json:"content"`
}

type TextToSpeechInput ¶

type TextToSpeechInput struct {
	Text           string   `json:"text"`
	Model          string   `json:"model"`
	Voice          string   `json:"voice"`
	ResponseFormat *string  `json:"response-format,omitempty"`
	Speed          *float64 `json:"speed,omitempty"`
}

type TextToSpeechOutput ¶

type TextToSpeechOutput struct {
	Audio string `json:"audio"`
}

type TextToSpeechReq ¶

type TextToSpeechReq struct {
	Input          string   `json:"input"`
	Model          string   `json:"model"`
	Voice          string   `json:"voice"`
	ResponseFormat *string  `json:"response_format,omitempty"`
	Speed          *float64 `json:"speed,omitempty"`
}

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL