geminiclient

Published: Aug 26, 2024 License: Apache-2.0 Imports: 17 Imported by: 0

README

Gemini Client

A simple way to use the Gemini API.

Features and limitations

  • It is possible to submit a prompt and receive a response.
  • The package can run both locally (calling the Gemini API) and in Google Cloud (for example as a Google Cloud Run instance).
  • Supports multi-modal prompts (prompts that combine text, images or other data).
  • Supports tool / function calling: you can supply custom Go functions to the Gemini client, and Gemini can call them as needed (but only 1 tool/function, for now).
  • This package is a work in progress!
  • The only currently known issue is that registering more than 1 tool/function does not appear to work; see the multicall branch.
  • The functions starting with Must are alternatives to the ones that return a value and an error. They return only the value and panic on failure. This is handy for testing and quick examples, but larger applications should probably not use them.
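
The Must convention described above can be sketched with the standard library alone. This is an illustration of the pattern, not the package's own code; ask, mustAsk and didPanic are hypothetical names used only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// ask is a hypothetical stand-in for any function that returns a value and an error.
func ask(prompt string) (string, error) {
	if prompt == "" {
		return "", errors.New("empty prompt")
	}
	return "answer to: " + prompt, nil
}

// mustAsk follows the Must convention: it returns only the value and panics on error.
func mustAsk(prompt string) string {
	answer, err := ask(prompt)
	if err != nil {
		panic(err)
	}
	return answer
}

// didPanic reports whether calling f panicked, recovering if it did.
func didPanic(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	fmt.Println(mustAsk("Write a haiku about cows."))
	fmt.Println("panicked on empty prompt:", didPanic(func() { mustAsk("") }))
}
```

In a larger application, calling the error-returning variant and handling the error is usually the safer choice, since a panic in a request handler or a long-running job is harder to recover from cleanly.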

Example use

  1. Run gcloud auth login and/or gcloud auth application-default login, if needed.
  2. Get the Google Project ID at https://console.cloud.google.com/.
  3. export GCP_PROJECT_ID=123, where "123" is your own Google Project ID.
  4. (optionally) export GCP_LOCATION=us-west1, if "us-west1" is the location you prefer.
  5. Create a directory for this experiment, for instance: mkdir -p ~/geminitest && cd ~/geminitest.
  6. Create a main.go file that looks like this (0.4 is the temperature; values near 0.0 give less creative output, values near 1.0 give more creative output):
package main

import (
    "fmt"

    "github.com/xyproto/geminiclient"
)

func main() {
    fmt.Println(geminiclient.MustAsk("Write a haiku about cows.", 0.4))
}
  7. Prepare a simple go.mod project file with e.g. go mod init cows
  8. Fetch the dependencies (this geminiclient package) with go mod tidy
  9. Build and run the executable: go build && ./cows
  10. Observe the output, which should look something like this:
Black and white patches,
Chewing grass in sunlit fields,
Mooing gentle song.

A note about Google Cloud

If an application that uses geminiclient is deployed to e.g. Google Cloud Run, then a new service account with "Vertex AI User" permissions is probably needed. It can be created in the "IAM & Admin" section and then selected when deploying to Cloud Run.

Function calling / tool use

package main

import (
    "fmt"
    "log"
    "strings"

    "github.com/xyproto/geminiclient"
)

func main() {
    gc := geminiclient.MustNew()

    // Define a custom function for getting the weather, that Gemini can choose to call
    getWeatherRightNow := func(location string) string {
        fmt.Println("getWeatherRightNow was called")
        switch location {
        case "NY":
            return "It's sunny in New York."
        case "London":
            return "It's rainy in London."
        default:
            return "Weather data not available."
        }
    }

    // Add the weather function as a tool
    err := gc.AddFunctionTool("get_weather_right_now", "Get the current weather for a specific location", getWeatherRightNow)
    if err != nil {
        log.Fatalf("Failed to add function tool: %v", err)
    }

    // Query Gemini with a prompt that requires using the custom weather tool
    result, err := gc.Query("What is the weather in NY?")
    if err != nil {
        log.Fatalf("Failed to query Gemini: %v", err)
    }

    // Check and print the weather response
    if !strings.Contains(result, "sunny") {
        log.Fatalf("Expected 'sunny' to be in the response, but got: %v", result)
    }
    fmt.Println("Weather AI Response:", result)

    gc.Clear() // Clear the current prompt parts, tools and functions

    // Define a custom function for reversing a string
    reverseString := func(input string) string {
        fmt.Println("reverseString was called")
        runes := []rune(input)
        for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
            runes[i], runes[j] = runes[j], runes[i]
        }
        return string(runes)
    }

    // Add the string reversal function as a tool
    err = gc.AddFunctionTool("reverse_string", "Reverse the given string", reverseString)
    if err != nil {
        log.Fatalf("Failed to add function tool: %v", err)
    }

    // Query Gemini with a prompt that requires using the string reversal tool
    result, err = gc.Query("Reverse the string 'hello'. Reply with a single word.")
    if err != nil {
        log.Fatalf("Failed to query Gemini: %v", err)
    }

    // Check and print the string reversal response
    expected := "olleh"
    if !strings.Contains(result, expected) {
        log.Fatalf("Expected '%s' to be in the response, but got: %v", expected, result)
    }
    fmt.Println("Response:", result)
}

Multimodal prompts / analyzing images

package main

import (
    "fmt"
    "log"

    "github.com/xyproto/geminiclient"
    "github.com/xyproto/wordwrap"
)

func main() {
    const (
        multiModalModelName = "gemini-1.0-pro-vision" // "gemini-1.5-pro" also works, if only text is sent
        temperature         = 0.4
        descriptionPrompt   = "Describe what is common for these two images."
    )

    gc, err := geminiclient.NewMultiModal(multiModalModelName, temperature)
    if err != nil {
        log.Fatalf("Could not initialize the Gemini client with the %s model: %v\n", multiModalModelName, err)
    }

    // Build a prompt
    if err := gc.AddImage("frog.png"); err != nil {
        log.Fatalf("Could not add frog.png: %v\n", err)
    }
    gc.AddURI("gs://generativeai-downloads/images/scones.jpg")
    gc.AddText(descriptionPrompt)

    // Count the tokens that are about to be sent
    tokenCount, err := gc.CountTokens()
    if err != nil {
        log.Fatalln(err)
    }
    fmt.Printf("Sending %d tokens.\n\n", tokenCount)

    // Submit the images and the text prompt
    response, err := gc.Submit()
    if err != nil {
        log.Fatalln(err)
    }

    // Format and print out the response
    if lines, err := wordwrap.WordWrap(response, 79); err == nil { // success
        for _, line := range lines {
            fmt.Println(line)
        }
        return
    }

    fmt.Println(response)
}

Producing JSON

package main

import (
    "fmt"
    "log"
    "time"

    "github.com/xyproto/geminiclient"
)

func main() {
    const (
        prompt      = `What color is the sky? Answer with a JSON struct where the only key is "color" and the value is a lowercase string.`
        modelName   = "gemini-1.5-pro"
        temperature = 0.0
        timeout     = 10 * time.Second
    )

    gc, err := geminiclient.NewWithTimeout(modelName, temperature, timeout)
    if err != nil {
        log.Fatalln(err)
    }

    fmt.Println(prompt)

    result, err := gc.Query(prompt)
    if err != nil {
        log.Fatalln(err)
    }

    fmt.Println(result)
}
  • gemini-1.5-flash is the default model.
  • gemini-1.5-pro is smarter, but slower and more expensive.
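
A reply to a prompt like the one above still has to be decoded before use. The following is a minimal sketch, assuming the reply contains a single JSON object, possibly surrounded by extra text or code fences. skyAnswer and parseColor are hypothetical names and are not part of geminiclient.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// skyAnswer matches the JSON shape requested in the prompt: a single "color" key.
type skyAnswer struct {
	Color string `json:"color"`
}

// parseColor decodes the span from the first "{" to the last "}" in reply.
// Anything before or after that span is ignored.
func parseColor(reply string) (string, error) {
	start := strings.Index(reply, "{")
	end := strings.LastIndex(reply, "}")
	if start == -1 || end < start {
		return "", errors.New("no JSON object found in reply")
	}
	var a skyAnswer
	if err := json.Unmarshal([]byte(reply[start:end+1]), &a); err != nil {
		return "", fmt.Errorf("decoding reply: %w", err)
	}
	if a.Color == "" {
		return "", errors.New(`reply has no "color" value`)
	}
	return a.Color, nil
}

func main() {
	// A made-up reply, used here instead of a real model response.
	color, err := parseColor("Here you go:\n{\"color\": \"blue\"}\n")
	if err != nil {
		fmt.Println("could not parse reply:", err)
		return
	}
	fmt.Println("color:", color)
}
```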

Environment variables

These environment variables are supported:

  • GCP_PROJECT_ID or PROJECT_ID for the Google Cloud Project ID
  • GCP_LOCATION or PROJECT_LOCATION for the Google Cloud Project location (like us-west1)
  • MODEL_NAME for the Gemini model name (like gemini-1.5-flash or gemini-1.5-pro)
  • MULTI_MODAL_MODEL_NAME for the Gemini multi-modal model name (like gemini-1.0-pro-vision)
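
The "A or B" pairs in the list above amount to a first-non-empty lookup. A minimal sketch follows, assuming the first name listed takes precedence and using us-west1 as an illustrative fallback location; neither assumption is confirmed by the package. firstNonEmpty is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
)

// firstNonEmpty returns the first non-empty value that lookup yields for names,
// or fallback if every lookup comes back empty.
func firstNonEmpty(lookup func(string) string, fallback string, names ...string) string {
	for _, name := range names {
		if v := lookup(name); v != "" {
			return v
		}
	}
	return fallback
}

func main() {
	// os.Getenv returns "" for unset variables, so it fits the lookup signature.
	projectID := firstNonEmpty(os.Getenv, "", "GCP_PROJECT_ID", "PROJECT_ID")
	location := firstNonEmpty(os.Getenv, "us-west1", "GCP_LOCATION", "PROJECT_LOCATION") // fallback is an assumption
	model := firstNonEmpty(os.Getenv, "gemini-1.5-flash", "MODEL_NAME")
	fmt.Printf("project=%q location=%q model=%q\n", projectID, location, model)
}
```

Passing the lookup function as a parameter keeps the helper testable without touching the real process environment.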

General info

  • Version: 1.7.0
  • License: Apache 2
  • Author: Alexander F. Rødseth

Documentation

Constants

This section is empty.

Variables

var ErrEmptyPrompt = errors.New("empty prompt")
var (
	ErrGoogleCloudProjectID = errors.New("please set GCP_PROJECT_ID or PROJECT_ID to your Google Cloud project ID")
)
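
Sentinel errors like these are usually matched with errors.Is rather than by comparing message strings. The sketch below uses a local stand-in, errEmptyPrompt, so that it runs without the package. Whether geminiclient wraps its errors before returning them is not stated here, so treat the wrapping as an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errEmptyPrompt is a local stand-in for a sentinel error such as ErrEmptyPrompt.
var errEmptyPrompt = errors.New("empty prompt")

// checkPrompt returns errEmptyPrompt, wrapped with context, for blank prompts.
func checkPrompt(prompt string) error {
	if strings.TrimSpace(prompt) == "" {
		return fmt.Errorf("cannot submit: %w", errEmptyPrompt)
	}
	return nil
}

// isEmptyPrompt reports whether err is, or wraps, errEmptyPrompt.
func isEmptyPrompt(err error) bool {
	return errors.Is(err, errEmptyPrompt)
}

func main() {
	if err := checkPrompt("   "); isEmptyPrompt(err) {
		fmt.Println("nothing to send:", err)
	}
}
```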

Functions

func Ask

func Ask(prompt string, temperature float32) (string, error)

func MustAsk

func MustAsk(prompt string, temperature float32) string

Types

type FunctionCallHandler

type FunctionCallHandler func(response map[string]any) (map[string]any, error)

FunctionCallHandler defines a callback type for handling function responses.
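
A handler with this signature can be sketched as follows. The type is redeclared locally so that the example runs on its own. The "location" and "weather" keys and the dispatch helper are hypothetical, since the contents of the map that the package passes to a handler are not documented here.

```go
package main

import (
	"errors"
	"fmt"
)

// functionCallHandler copies the FunctionCallHandler signature for this sketch.
type functionCallHandler func(response map[string]any) (map[string]any, error)

// weatherHandler reads a hypothetical "location" key and returns a new map.
func weatherHandler(response map[string]any) (map[string]any, error) {
	location, ok := response["location"].(string)
	if !ok {
		return nil, errors.New(`missing or non-string "location"`)
	}
	return map[string]any{"location": location, "weather": "sunny"}, nil
}

// dispatch looks up the handler registered under name and calls it with payload.
func dispatch(callbacks map[string]functionCallHandler, name string, payload map[string]any) (map[string]any, error) {
	handler, ok := callbacks[name]
	if !ok {
		return nil, fmt.Errorf("no handler registered for %q", name)
	}
	return handler(payload)
}

func main() {
	callbacks := map[string]functionCallHandler{"get_weather_right_now": weatherHandler}
	out, err := dispatch(callbacks, "get_weather_right_now", map[string]any{"location": "NY"})
	fmt.Println(out, err)
}
```

A map keyed by function name has the same shape as the callbacks parameter of the sequential-callback methods listed further down.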

type GeminiClient

type GeminiClient struct {
	Client              *genai.Client
	Functions           map[string]reflect.Value // For custom functions that the LLM can call
	ModelName           string
	MultiModalModelName string
	ProjectLocation     string
	ProjectID           string
	Tools               []*genai.Tool
	Parts               []genai.Part
	Timeout             time.Duration
	Temperature         float32
	Trim                bool
	Verbose             bool
}

func MustNew

func MustNew() *GeminiClient

func MustNewText

func MustNewText(modelName string, temperature float32) *GeminiClient

func MustNewWithTimeout

func MustNewWithTimeout(modelName string, temperature float32, timeout time.Duration) *GeminiClient

func New

func New(modelName string, temperature float32) (*GeminiClient, error)

func NewCustom

func NewCustom(modelName, multiModalModelName, projectLocation, projectID string, temperature float32, timeout time.Duration) (*GeminiClient, error)

func NewMultiModal

func NewMultiModal(modelName string, temperature float32) (*GeminiClient, error)

NewMultiModal creates a new multi-modal GeminiClient with the specified model name and temperature, initializing it with default values for parts, trim, and verbose settings.

func NewText

func NewText(modelName, projectLocation, projectID string, temperature float32) (*GeminiClient, error)

func NewWithTimeout

func NewWithTimeout(modelName string, temperature float32, timeout time.Duration) (*GeminiClient, error)

func (*GeminiClient) AddData

func (gc *GeminiClient) AddData(mimeType string, data []byte)

AddData adds arbitrary data with a specified MIME type to the parts of the client.

func (*GeminiClient) AddFunctionTool

func (gc *GeminiClient) AddFunctionTool(name, description string, fn interface{}) error

AddFunctionTool registers a custom Go function as a tool that the model can call.
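
The Functions field of GeminiClient has the type map[string]reflect.Value, which suggests that registered functions are stored and invoked through reflection. The following is a minimal sketch of that general technique, not the package's implementation. The registry type and its add and call methods are hypothetical, and the sketch only accepts functions that take string parameters and return a single string.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// registry maps tool names to Go functions stored as reflect.Value.
type registry map[string]reflect.Value

// add stores fn under name after checking that it is a function that takes
// only string parameters and returns exactly one string.
func (r registry) add(name string, fn any) error {
	v := reflect.ValueOf(fn)
	if v.Kind() != reflect.Func {
		return errors.New("not a function: " + name)
	}
	t := v.Type()
	if t.NumOut() != 1 || t.Out(0).Kind() != reflect.String {
		return errors.New("function must return a single string: " + name)
	}
	for i := 0; i < t.NumIn(); i++ {
		if t.In(i).Kind() != reflect.String {
			return errors.New("only string parameters are supported: " + name)
		}
	}
	r[name] = v
	return nil
}

// call invokes the function registered under name with the given arguments.
func (r registry) call(name string, args ...string) (string, error) {
	fn, ok := r[name]
	if !ok {
		return "", errors.New("unknown function: " + name)
	}
	if fn.Type().NumIn() != len(args) {
		return "", errors.New("wrong number of arguments for " + name)
	}
	in := make([]reflect.Value, len(args))
	for i, a := range args {
		// Convert also handles named string types as parameters.
		in[i] = reflect.ValueOf(a).Convert(fn.Type().In(i))
	}
	return fn.Call(in)[0].String(), nil
}

func main() {
	tools := registry{}
	getWeather := func(location string) string {
		if location == "NY" {
			return "It's sunny in New York."
		}
		return "Weather data not available."
	}
	if err := tools.add("get_weather_right_now", getWeather); err != nil {
		fmt.Println("add failed:", err)
		return
	}
	reply, err := tools.call("get_weather_right_now", "NY")
	fmt.Println(reply, err)
}
```

Validating the signature at registration time means a mismatched function is reported as an error up front instead of causing a panic inside reflect.Value.Call later.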

func (*GeminiClient) AddImage

func (gc *GeminiClient) AddImage(filename string) error

AddImage reads an image from a file, prepares it for processing, and adds it to the list of parts to be used by the model. It supports verbose logging of operations if enabled.

func (*GeminiClient) AddText

func (gc *GeminiClient) AddText(prompt string)

AddText adds a textual part to the client's prompt parts.

func (*GeminiClient) AddURI

func (gc *GeminiClient) AddURI(URI string)

AddURI adds a file part to the client's prompt parts from a Google Cloud URI, allowing for integration with cloud resources directly. Example URI: "gs://generativeai-downloads/images/scones.jpg"

func (*GeminiClient) AddURL

func (gc *GeminiClient) AddURL(URL string) error

AddURL downloads the file from the given URL, identifies the MIME type, and adds it as a genai.Part.

func (*GeminiClient) Clear

func (gc *GeminiClient) Clear()

Clear clears the prompt parts, tools, and functions registered with the client.

func (*GeminiClient) ClearParts

func (gc *GeminiClient) ClearParts()

func (*GeminiClient) ClearToolsAndFunctions

func (gc *GeminiClient) ClearToolsAndFunctions()

ClearToolsAndFunctions clears all registered tools and functions.

func (*GeminiClient) CountPartTokensWithContext

func (gc *GeminiClient) CountPartTokensWithContext(ctx context.Context) (int, error)

CountPartTokensWithContext counts the tokens in the current multimodal parts using the default client and model.

func (*GeminiClient) CountPromptTokens

func (gc *GeminiClient) CountPromptTokens(prompt string) (int, error)

CountPromptTokens counts the number of tokens in the given text prompt using the default client and model.

func (*GeminiClient) CountPromptTokensWithClient

func (gc *GeminiClient) CountPromptTokensWithClient(ctx context.Context, client *genai.Client, prompt, modelName string) (int, error)

CountPromptTokensWithClient counts the tokens in the given text prompt using a specific client and model.

func (*GeminiClient) CountPromptTokensWithModel

func (gc *GeminiClient) CountPromptTokensWithModel(ctx context.Context, prompt, modelName string) (int, error)

CountPromptTokensWithModel counts the tokens in the given text prompt using the specified model within the default client.

func (*GeminiClient) CountTextTokens

func (gc *GeminiClient) CountTextTokens(text string) (int, error)

CountTextTokens counts the tokens in the given text using the default client and model.

func (*GeminiClient) CountTextTokensWithClient

func (gc *GeminiClient) CountTextTokensWithClient(ctx context.Context, client *genai.Client, text, modelName string) (int, error)

CountTextTokensWithClient counts the tokens in the given text using a specific client and model.

func (*GeminiClient) CountTextTokensWithModel

func (gc *GeminiClient) CountTextTokensWithModel(ctx context.Context, text, modelName string) (int, error)

CountTextTokensWithModel counts the tokens in the given text using the specified model within the default client.

func (*GeminiClient) CountTokens

func (gc *GeminiClient) CountTokens() (int, error)

CountTokens counts the tokens in the current multimodal parts using the default client, model, and a new context.

func (*GeminiClient) MultiQuery

func (gc *GeminiClient) MultiQuery(prompt string, base64Data, dataMimeType *string, temperature *float32) (string, error)

MultiQuery processes a prompt with optional base64-encoded data and MIME type for the data.
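
Since base64Data, dataMimeType and temperature are pointers, a caller has to build addressable values for them. The sketch below shows one way to prepare such arguments with the standard library. It assumes that standard (padded) base64 is the expected encoding, which is not confirmed here. ptr and encodeForPrompt are hypothetical helpers, and generics require Go 1.18 or later.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
)

// ptr returns a pointer to a copy of v, handy for optional pointer parameters.
func ptr[T any](v T) *T {
	return &v
}

// encodeForPrompt returns the base64 text and the sniffed MIME type of data.
func encodeForPrompt(data []byte) (b64, mimeType *string) {
	return ptr(base64.StdEncoding.EncodeToString(data)), ptr(http.DetectContentType(data))
}

func main() {
	// The eight-byte PNG signature is enough for content sniffing.
	pngSignature := []byte("\x89PNG\r\n\x1a\n")
	b64, mimeType := encodeForPrompt(pngSignature)
	temperature := ptr(float32(0.4))
	fmt.Println(*b64, *mimeType, *temperature)
}
```

http.DetectContentType only inspects the first 512 bytes and falls back to application/octet-stream, so a MIME type that is already known from the file's origin is usually the better value to pass.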

func (*GeminiClient) MultiQueryWithCallbacks

func (gc *GeminiClient) MultiQueryWithCallbacks(prompt string, base64Data, dataMimeType *string, temperature *float32, callback FunctionCallHandler) (string, error)

MultiQueryWithCallbacks processes a prompt, supports function tools, and uses a callback function to handle function responses.

func (*GeminiClient) MultiQueryWithSequentialCallbacks

func (gc *GeminiClient) MultiQueryWithSequentialCallbacks(prompt string, callbacks map[string]FunctionCallHandler) (string, error)

MultiQueryWithSequentialCallbacks handles multiple function calls in sequence, using callback functions to manage responses.

func (*GeminiClient) MustAddImage

func (gc *GeminiClient) MustAddImage(filename string)

MustAddImage is a convenience function that adds an image to the client's prompt parts, terminating the program if adding the image fails.

func (*GeminiClient) Query

func (gc *GeminiClient) Query(prompt string) (string, error)

func (*GeminiClient) QueryWithCallbacks

func (gc *GeminiClient) QueryWithCallbacks(prompt string, callback FunctionCallHandler) (string, error)

QueryWithCallbacks allows querying with a prompt and processing function calls via a callback handler.

func (*GeminiClient) QueryWithSequentialCallbacks

func (gc *GeminiClient) QueryWithSequentialCallbacks(prompt string, callbacks map[string]FunctionCallHandler) (string, error)

QueryWithSequentialCallbacks allows querying with a prompt and processing multiple function calls in sequence via a map of callback handlers.

func (*GeminiClient) SetTimeout

func (gc *GeminiClient) SetTimeout(timeout time.Duration)

func (*GeminiClient) SetTrim

func (gc *GeminiClient) SetTrim(trim bool)

SetTrim updates the trim flag of the client, controlling whether the output is trimmed for whitespace.

func (*GeminiClient) SetVerbose

func (gc *GeminiClient) SetVerbose(verbose bool)

SetVerbose updates the verbose logging flag of the client, allowing for more detailed output during operations.

func (*GeminiClient) Submit

func (gc *GeminiClient) Submit() (string, error)

Submit sends all added parts to the specified Vertex AI model for processing, returning the model's response. It supports temperature configuration and response trimming. This function creates a temporary client and is not meant to be used within Google Cloud (use SubmitToClient instead).

func (*GeminiClient) SubmitToClient

func (gc *GeminiClient) SubmitToClient(ctx context.Context) (result string, err error)

SubmitToClient sends all added parts to the specified Vertex AI model for processing, returning the model's response. It supports temperature configuration and response trimming.

func (*GeminiClient) SubmitToClientStreaming

func (gc *GeminiClient) SubmitToClientStreaming(ctx context.Context, streamCallback func(string)) (result string, err error)

SubmitToClientStreaming sends the current parts to Gemini, and streams the response back by calling the streamCallback function.
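
The callback contract can be illustrated without a network call. In the sketch below, fakeStream is a hypothetical stand-in that feeds fixed chunks to a callback of the same func(string) shape. It also assumes that the returned result is the concatenation of all chunks, which the description above does not state explicitly.

```go
package main

import (
	"fmt"
	"strings"
)

// fakeStream is a hypothetical stand-in for a streaming call: it passes each
// chunk to streamCallback as it "arrives" and returns the concatenated text.
func fakeStream(chunks []string, streamCallback func(string)) string {
	var full strings.Builder
	for _, chunk := range chunks {
		streamCallback(chunk)
		full.WriteString(chunk)
	}
	return full.String()
}

func main() {
	chunkCount := 0
	result := fakeStream([]string{"Black and ", "white ", "patches"}, func(chunk string) {
		chunkCount++
		fmt.Print(chunk) // show partial output immediately
	})
	fmt.Printf("\nreceived %d chunks, %d bytes in total\n", chunkCount, len(result))
}
```

A callback like this is a natural place to flush partial output to a terminal or an HTTP response while the full result is still being produced.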

Directories

Path Synopsis
cmd
