model-proxy

Version: v0.0.0-...-009dced | Published: Apr 9, 2024 | License: MIT | Imports: 14 | Imported by: 0

Repository

github.com/jsfour/model-proxy

README

LLM Model Proxy Cache

Develop against large language models without the big bill

This application is a caching reverse proxy tailored to language model API requests. Written in Go, it sits between your code and models hosted on platforms such as OpenAI, caching responses so that repeated requests do not trigger redundant external API calls.

The goal is to let you develop against LLM APIs without running up a bill.

Features

Reverse Proxy Functionality: Directs API requests to the appropriate machine learning model service provider. Currently only supports OpenAI.
Caching Mechanism: Stores successful responses to reduce API calls and improve performance. Cache hits serve responses directly from the cache.
Token Counting: Leveraging tiktoken-go, the service estimates the number of tokens in each request's payload to keep track of usage.
Dynamic Service Resolution: Looks up the configured model service provider based on the requested model in the API call.
Extensibility: Supports registering multiple model providers through the IModelProvider interface, each with its own set of API models and endpoints.

How It Works

The entry point main() initiates the application by setting up a response cache and a service resolver that includes an OpenAIProvider responsible for handling OpenAI API requests. The HTTP server listens on port 8080 and processes incoming requests through a handler which:

Generates a cache key from the request's path, body, and Authorization header.
Checks the cache for a stored response corresponding to the cache key.
If a cache hit occurs, it serves the response directly from the cache.
On a cache miss, it determines the correct service provider and reverse proxies the request to the target machine learning model API.

API

The application exposes a single HTTP endpoint, /, that accepts requests specifying the model in the body. Examples of supported model names include "gpt-4", "gpt-3.5-turbo", and "text-embedding-3-large".

Setup

To run the service, ensure you have the following:

Golang installed and configured.
tiktoken-go library installed (go get github.com/pkoukk/tiktoken-go).

To start the server, execute:

go run main.go

The server will listen on http://localhost:8080.

Installation

Download the repository and install the dependencies:

go get

Then install via:

go install

Usage

Make an API request to the service with the desired machine learning model name and payload:

curl -X POST http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" \
    -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'

Replace the model value and the messages array content with your specific requirements.

Node.js

You can drop in the proxy via the baseURL option of the OpenAI client:

import OpenAI from "openai";

const llm = new OpenAI({
  baseURL: "http://localhost:8080/v1",
});

const res = await llm.chat.completions.create({
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
  ],
  model: "gpt-3.5-turbo",
  temperature: 1,
});

Caching Key Generation

The generateCacheKey() function creates a cache key by hashing the request's path, body, and the Authorization bearer token, if present.

Token Counting

The CountTokens() method, part of the OpenAIProvider implementation of IModelProvider, counts tokens in the request content for a given model, aiding in managing token usage.

Note

This application serves as an example and may require additional security and error handling features to be production-ready.

Source Files

main.go

Directories

cache
providers
