ogem

package module

README

Ogem

Ogem is a unified proxy server that provides access to multiple AI language models through an OpenAI-compatible API interface. It supports OpenAI, Google's Gemini (both Studio and Vertex AI), and Anthropic's Claude models.

Features

  • OpenAI API-compatible interface
  • Support for multiple AI providers:
    • OpenAI (e.g., GPT-4, GPT-3.5)
    • Google Gemini (e.g., Gemini 1.5 Flash, Pro)
    • Anthropic Claude (e.g., Claude 3 Opus, Sonnet, Haiku)
  • Smart routing based on latency
  • Rate limiting and quota management
  • Response caching for deterministic requests
  • Batch processing support
  • Regional endpoint selection
  • Multi-provider fallback support

Quick Start

Using Docker
# Pull and run the latest version
docker pull ynext/ogem:latest
docker run -p 8080:8080 \
  -e OPEN_GEMINI_API_KEY="your-api-key" \
  -e CONFIG_SOURCE="path-or-url-to-config" \
  ynext/ogem:latest

# Or use a specific version
docker pull ynext/ogem:0.0.1
Building from Source
go run cmd/main.go

Configuration

Configuration can be provided through a local file or remote URL using the CONFIG_SOURCE environment variable.

Example config.yaml:

# Amount of time to wait before retrying the request when there are no available endpoints due to rate limiting.
retry_interval: "1m"
# How frequently to check the health of the providers. If you don't want to check the health, set it to 0.
ping_interval: "1h"
providers:
  openai:
    regions:
      # For providers that do not support multiple regions, you can use the provider name as the region name.
      openai:
        models:
          - name: "gpt-4"
            rate_key: "gpt-4"
            rpm: 10_000
            tpm: 1_000_000
  vertex:
    regions:
      default:
        # Models listed under `default` serve as a template and will be automatically copied to all regions.
        # However, `default` itself is not a valid region - you must define at least one actual region
        # (like 'us-central1') for the provider to work.
        #
        # Example:
        # default:
        #   models: [model-a, model-b]  # These will be copied to all regions
        # us-central1:                  # This is a real region
        #   models: [model-c]           # Final models: model-a, model-b, model-c
        models:
          - name: "gemini-1.5-flash"
            rate_key: "gemini-1.5-flash"
            rpm: 200
            tpm: 4_000_000
      us-central1:
        models:
          - name: "gemini-1.5-pro"
            rate_key: "gemini-1.5-pro"
            rpm: 60
            tpm: 4_000_000

Providers and Models

Ogem supports multiple AI providers through different integration methods:

  • openai: Direct integration with OpenAI's API

    • Supports: GPT-4, GPT-3.5, and other OpenAI models
    • Requires: OPENAI_API_KEY
  • studio: Google's Gemini API (via AI Studio)

    • Supports: Gemini 1.5 Pro, Gemini 1.5 Flash
    • Requires: GENAI_STUDIO_API_KEY
  • vertex: Google Cloud's Vertex AI platform

    • Supports: Gemini models, custom/finetuned models
    • Requires: GOOGLE_CLOUD_PROJECT and GCP authentication
  • claude: Direct integration with Anthropic's Claude API

    • Supports: Claude 3 Opus, Sonnet, Haiku
    • Requires: CLAUDE_API_KEY
  • vclaude: Claude models via Vertex AI

    • Supports: Claude models deployed on GCP
    • Requires: GOOGLE_CLOUD_PROJECT and GCP authentication
  • custom: Custom endpoint

    • Supports: Any API that is OpenAI-compatible
    • Requires: BASE_URL, PROTOCOL, API_KEY_ENV
Using Custom Endpoint

For custom endpoints, you can specify the base URL, protocol, and API key environment variable.

providers:
  custom:  # Choose any name for the custom provider
    base_url: https://api.example.com/v1
    protocol: openai  # Only openai protocol is supported for custom endpoints
    api_key_env: EXAMPLE_API_KEY
    regions:
      custom:  # This region name must match the provider name
        models:
          - name: some-model
            rate_key: some-model
            rpm: 10_000
            tpm: 30_000_000
  another:
    base_url: https://api.another.com/v1
    protocol: openai
    api_key_env: ANOTHER_API_KEY
    regions:
      another:
        models:
          - name: your-model-name
            rate_key: your-rate-key

API keys must not be written directly in the config.yaml file. Instead, set the key as an environment variable and reference that variable's name in the api_key_env field. Currently, only the OpenAI protocol is supported for custom endpoints.

Using Finetuned Models

For custom or finetuned models on Vertex AI, you can map the full endpoint path to a friendly name:

models:
  - name: "projects/1234567890123/locations/us-central1/endpoints/45678901234567890123"
    other_names:
      - "finetuned-flash"    # This becomes the model name you use in API calls
    rate_key: "gemini-1.5-flash"
    rpm: 200    # Requests per minute limit
    tpm: 4_000_000    # Tokens per minute limit

You can then use finetuned-flash in your API calls instead of the full endpoint path.

Rate Limiting and Quotas

Each model configuration includes rate limiting parameters:

  • rpm: Requests Per Minute limit
  • tpm: Tokens Per Minute limit (total tokens including both input and output)

Example configuration:

models:
  - name: "gemini-1.5-pro"
    rate_key: "gemini-1.5-pro"
    rpm: 60      # Maximum 60 requests per minute
    tpm: 4000000 # Maximum 4 million tokens per minute

State Management with Valkey (Redis-compatible)

Ogem can use Valkey for distributed state management, which is recommended for multi-instance deployments:

  • Purpose: Manages rate limiting, quotas, and request caching across multiple Ogem instances
  • Configuration: Set via VALKEY_ENDPOINT environment variable
  • Format: localhost:6379
  • Optional: If not configured, Ogem will use in-memory storage (suitable for single-instance deployments)

Example configuration:

export VALKEY_ENDPOINT="localhost:6379"

Ogem targets Valkey, an open-source, Redis-compatible fork of Redis, so any Redis-compatible endpoint can be used for state management.

Batch Processing

Batch processing is a cost-optimization feature that uses OpenAI's batch API to reduce costs. Here's how it works:

How It Works
  1. Add a batch model to the config (e.g., gpt-4o@batch).
models:
  - name: "gpt-4o@batch"
    rate_key: "gpt-4o@batch"
    rpm: 10_000
    tpm: 30_000_000
  2. When you send a request with a @batch suffix (e.g., gpt-4o@batch):

    • Your request joins a batch queue
    • Batches are processed every 10 seconds or when 50,000 requests accumulate
    • The request waits for the batch to complete
  3. Response Behavior:

    • The request blocks until the batch is completed
    • If the batch completes within your request timeout: You get results
    • If timeout occurs: You can retry with the same request
    • Each identical request gets the same request_id internally, preventing duplicate processing
Usage Example
{
  "model": "gpt-4o@batch",
  "messages": [
    [{"role": "user", "content": "Hello! How can you help me today?"}]
  ]
}
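
Because every identical request maps to the same internal request_id, a client that hits its own timeout can simply resend the same body. The Go sketch below illustrates this retry pattern only; the endpoint URL and the OGEM_API_KEY variable follow the API Usage example later in this README and are assumptions, not part of Ogem itself.

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

func main() {
	// Same request body on every attempt, so retries are deduplicated server-side.
	body := []byte(`{
		"model": "gpt-4o@batch",
		"messages": [{"role": "user", "content": "Hello! How can you help me today?"}]
	}`)

	client := &http.Client{Timeout: 30 * time.Second}
	for attempt := 1; attempt <= 5; attempt++ {
		// Endpoint and env var name are assumed from the curl example in this README.
		req, err := http.NewRequest("POST", "http://localhost:8080/v1/chat/completions", bytes.NewReader(body))
		if err != nil {
			panic(err)
		}
		req.Header.Set("Content-Type", "application/json")
		req.Header.Set("Authorization", "Bearer "+os.Getenv("OGEM_API_KEY"))

		resp, err := client.Do(req)
		if err != nil {
			// Timed out waiting for the batch; resending the identical body resolves
			// to the same batched request, so no duplicate work is done.
			fmt.Println("retrying after timeout:", err)
			continue
		}
		out, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Println(resp.Status, string(out))
		return
	}
}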
Benefits
  • Up to 50% cost reduction using OpenAI's batch API pricing
  • Automatic request batching and management
  • Identical requests are deduplicated
Limitations
  • Currently only supported for OpenAI models
  • Identical requests are deduplicated, so you cannot expect different results from repeated identical requests, even if the temperature is set above 0

Environment Variables

Core Settings
  • CONFIG_SOURCE: Path or URL to config file (default: "config.yaml")
  • CONFIG_TOKEN: Bearer token for authenticated config URL (optional)
  • PORT: Server port (default: 8080)
API Keys
  • OPEN_GEMINI_API_KEY: API key for accessing Ogem
  • OPENAI_API_KEY: OpenAI API key
  • CLAUDE_API_KEY: Anthropic Claude API key
  • GENAI_STUDIO_API_KEY: Google Gemini Studio API key
  • GOOGLE_CLOUD_PROJECT: GCP project ID for Vertex AI
Performance Settings
  • VALKEY_ENDPOINT: Redis-compatible endpoint for state management
  • RETRY_INTERVAL: Wait duration before retrying failed requests
  • PING_INTERVAL: Health check interval

API Usage

Send requests using the OpenAI API format:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OGEM_API_KEY" \
  -d '{
    "model": "gemini-1.5-flash",
    "messages": [
      {
        "role": "user",
        "content": "Hello! How can you help me today?"
      }
    ],
    "temperature": 0.7
  }'
Model Selection

Three formats for model selection:

  1. Simple model name:
{
  "model": "gpt-4"
}
  2. Provider and model:
{
  "model": "vertex/gemini-1.5-pro"
}
  3. Provider, region, and model:
{
  "model": "vertex/us-central1/gemini-1.5-pro"
}
Fallback Chain

Specify multiple models for automatic fallback:

{
  "model": "gpt-4,claude-3-opus,gemini-1.5-pro"
}
Batch Processing

Add @batch suffix for batch processing:

{
  "model": "gpt-4@batch"
}

Currently, batch processing is only supported for OpenAI models.

Docker Support

Running with Docker
# Basic run
docker run -p 8080:8080 ynext/ogem:latest

# With configuration
docker run -p 8080:8080 \
  -e CONFIG_SOURCE="https://api.example.com/config.yaml" \
  -e CONFIG_TOKEN="your-token" \
  -e OPENAI_API_KEY="your-key" \
  ynext/ogem:latest
Building Docker Image

The image supports both AMD64 (Intel/AMD) and ARM64 architectures.

# Setup buildx for multi-architecture support
docker buildx create --name mybuilder --use

# Build and push with version tag
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t ynext/ogem:0.0.1 \
  -t ynext/ogem:latest \
  --push .

# Verify architectures
docker buildx imagetools inspect ynext/ogem:latest

Error Handling

Standard HTTP status codes:

  • 400: Bad Request
  • 401: Unauthorized
  • 429: Too Many Requests
  • 500: Internal Server Error
  • 503: Service Unavailable
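
As an illustration only, the Go sketch below maps these status codes to client-side actions; the endpoint and the OGEM_API_KEY variable mirror the curl example above, and the backoff policy is an assumption rather than Ogem behavior.

package main

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
	"time"
)

func main() {
	body := []byte(`{"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}]}`)

	for attempt := 0; attempt < 3; attempt++ {
		// Endpoint and env var name are assumed from the curl example above.
		req, err := http.NewRequest("POST", "http://localhost:8080/v1/chat/completions", bytes.NewReader(body))
		if err != nil {
			panic(err)
		}
		req.Header.Set("Content-Type", "application/json")
		req.Header.Set("Authorization", "Bearer "+os.Getenv("OGEM_API_KEY"))

		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			panic(err)
		}
		resp.Body.Close()

		switch resp.StatusCode {
		case http.StatusOK:
			fmt.Println("success")
			return
		case http.StatusTooManyRequests, http.StatusServiceUnavailable:
			// 429/503: back off and retry with the same request.
			time.Sleep(time.Duration(attempt+1) * time.Second)
		case http.StatusUnauthorized:
			fmt.Println("401: check the API key passed in the Authorization header")
			return
		default:
			fmt.Println("request failed:", resp.Status)
			return
		}
	}
}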

Development

Requirements:

  • Go 1.23+
  • Docker (optional)
  • Valkey (Redis-compatible, optional)
# Clone repository
git clone https://github.com/yanolja/ogem.git
cd ogem

# Run tests
go test ./...

# Build binary
go build ./cmd/main.go

License

This project is licensed under the terms of the Apache 2.0 license. See the LICENSE file for more details.

Contributing

Before you submit any contributions, please make sure to review and agree to our Contributor License Agreement.

Code of Conduct

Please read our Code of Conduct before engaging with our community.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ProviderStatus

type ProviderStatus struct {
	// Base URL of the endpoint. E.g., "http://localhost:8080/v1"
	BaseUrl string `yaml:"base_url" json:"base_url"`

	// API protocol used by the endpoint. E.g., "openai"
	Protocol string `yaml:"protocol" json:"protocol"`

	// Environment variable name for the API key. E.g., "SELF_HOST_API_KEY"
	ApiKeyEnv string `yaml:"api_key_env" json:"api_key_env"`

	// Regions maps region names to their status.
	// The "default" region configures provider-wide settings.
	// E.g., Regions["us-central1"]
	Regions map[string]*RegionStatus `yaml:"regions" json:"regions"`
}

type ProvidersStatus

type ProvidersStatus map[string]*ProviderStatus

ProvidersStatus is a map of provider names to their status.

func (ProvidersStatus) ForEach

func (providers ProvidersStatus) ForEach(callback func(
	provider string,
	providerStatus ProviderStatus,
	region string,
	regionStatus RegionStatus,
	models []*SupportedModel,
) bool,
) bool

ForEach iterates over all models of all providers and regions.

  • @param callback - The callback function to be executed for each model. Should return true to stop the iteration. Its parameters are:
    • provider - The provider name.
    • providerStatus - The provider status. Read-only.
    • region - The region name.
    • regionStatus - The region status. Read-only.
    • models - The list of models available in the region.
  • @returns true if the iteration was stopped by the callback, false otherwise.
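
A minimal usage sketch, assuming the package is imported as github.com/yanolja/ogem (the module path implied by the repository URL above):

package main

import (
	"fmt"

	// Import path assumed from the repository URL.
	"github.com/yanolja/ogem"
)

func main() {
	providers := ogem.ProvidersStatus{
		"openai": {
			Regions: map[string]*ogem.RegionStatus{
				"openai": {
					Models: []*ogem.SupportedModel{
						{Name: "gpt-4", RateKey: "gpt-4", MaxRequestsPerMinute: 10000, MaxTokensPerMinute: 1000000},
					},
				},
			},
		},
	}

	// Print every model of every provider/region; returning false keeps iterating.
	stopped := providers.ForEach(func(
		provider string,
		_ ogem.ProviderStatus,
		region string,
		_ ogem.RegionStatus,
		models []*ogem.SupportedModel,
	) bool {
		for _, model := range models {
			fmt.Printf("%s/%s: %s\n", provider, region, model.Name)
		}
		return false
	})
	fmt.Println("stopped early:", stopped)
}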

func (ProvidersStatus) Update

func (providers ProvidersStatus) Update(provider, region string, callback func(*RegionStatus) error) error

Update modifies or creates a region status for a specific provider and region. It applies the provided callback to update the region status. Note that the model list in the region status is read-only and any changes to it will be ignored.

  • @param provider - The provider name.
  • @param region - The region name.
  • @param callback - The callback function to be executed to update the region status. Should return an error to stop the update.
  • @returns An error if the provider or region is not found, or if the callback returns an error.
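
A minimal sketch of Update, again assuming the github.com/yanolja/ogem import path; the latency value is purely illustrative:

package main

import (
	"fmt"
	"time"

	// Import path assumed from the repository URL.
	"github.com/yanolja/ogem"
)

func main() {
	providers := ogem.ProvidersStatus{
		"vertex": {
			Regions: map[string]*ogem.RegionStatus{
				"us-central1": {},
			},
		},
	}

	// Record a fresh health-check result for one provider/region pair.
	err := providers.Update("vertex", "us-central1", func(region *ogem.RegionStatus) error {
		region.Latency = 120 * time.Millisecond // illustrative value
		region.LastChecked = time.Now()
		return nil
	})
	if err != nil {
		fmt.Println("update failed:", err)
	}
}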

type RegionStatus

type RegionStatus struct {
	// Models supported by this region. Actual supported models are
	// a combination of this list and the default models of the provider.
	Models []*SupportedModel `yaml:"models" json:"models"`

	// Latency to this region.
	// Measured with minimal token completion and the fastest model.
	Latency time.Duration `json:"latency"`

	// Last time the region status was updated.
	LastChecked time.Time `json:"last_checked"`
}

type SupportedModel

type SupportedModel struct {
	// Model name. E.g., "gpt-4o"
	Name string `yaml:"name" json:"name"`

	// Model aliases. All names here and in `name` must refer to the same model.
	// E.g., {"gpt-4o-2024-05-13", "gpt-4o-2024-08-06"}
	OtherNames []string `yaml:"other_names" json:"other_names,omitempty"`

	// Rate key. Models sharing this key have the same rate limits.
	// E.g., "gpt-4o"
	RateKey string `yaml:"rate_key" json:"rate_key"`

	// Maximum tokens per minute (TPM).
	// Cannot send more than this number of tokens per minute for this model.
	MaxTokensPerMinute int `yaml:"tpm" json:"tpm,omitempty"`

	// Maximum requests per minute.
	// Cannot send more than this number of requests per minute for this model.
	MaxRequestsPerMinute int `yaml:"rpm" json:"rpm,omitempty"`
}
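
Since the fields above carry yaml tags matching the config.yaml keys shown in the README, a providers block can be decoded directly into these types. The sketch below assumes the github.com/yanolja/ogem import path and uses gopkg.in/yaml.v3 as the decoder, which is an assumption rather than a documented dependency:

package main

import (
	"fmt"

	"gopkg.in/yaml.v3" // assumed decoder; any YAML library honoring the yaml tags works

	"github.com/yanolja/ogem" // import path assumed from the repository URL
)

// A providers block in the same shape as the README's config.yaml example.
const providersYaml = `
openai:
  regions:
    openai:
      models:
        - name: "gpt-4"
          rate_key: "gpt-4"
          rpm: 10000
          tpm: 1000000
`

func main() {
	var providers ogem.ProvidersStatus
	if err := yaml.Unmarshal([]byte(providersYaml), &providers); err != nil {
		panic(err)
	}
	model := providers["openai"].Regions["openai"].Models[0]
	fmt.Println(model.Name, model.RateKey, model.MaxRequestsPerMinute, model.MaxTokensPerMinute)
}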

Directories

Path Synopsis
env
