Go OpenAI Router
What is it?
A simple routing layer for using OpenAI and Azure OpenAI endpoints from a Go project in production. It wraps the official Azure OpenAI SDK for Go: https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/ai/azopenai
When should you use it?
- When a single Azure OpenAI or OpenAI endpoint doesn't provide enough tokens-per-minute capacity for your use case, and you need a router/load balancer to spread requests across several endpoints.
- When, like us, you are building AI/GenAI products in Go rather than Python.
The router provides three load-balancing strategies:
- Round Robin
- Least Busy
- Least Latency
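To make the difference between the three strategies concrete, here is a minimal, self-contained sketch. It is not the router's actual implementation: the `backend` type, its fields, and the `roundRobin`, `leastBusy`, and `leastLatency` names are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// backend is a hypothetical stand-in for one configured endpoint.
type backend struct {
	name     string
	inFlight int           // requests currently being served
	latency  time.Duration // recent average response time
}

// roundRobin cycles through the backends in order, ignoring their load.
type roundRobin struct{ next int }

func (r *roundRobin) pick(bs []*backend) *backend {
	b := bs[r.next%len(bs)]
	r.next++
	return b
}

// leastBusy returns the backend with the fewest in-flight requests.
func leastBusy(bs []*backend) *backend {
	best := bs[0]
	for _, b := range bs[1:] {
		if b.inFlight < best.inFlight {
			best = b
		}
	}
	return best
}

// leastLatency returns the backend with the lowest recent latency.
func leastLatency(bs []*backend) *backend {
	best := bs[0]
	for _, b := range bs[1:] {
		if b.latency < best.latency {
			best = b
		}
	}
	return best
}

func main() {
	bs := []*backend{
		{name: "azure", inFlight: 4, latency: 300 * time.Millisecond},
		{name: "openai", inFlight: 1, latency: 800 * time.Millisecond},
	}
	rr := &roundRobin{}
	fmt.Println(rr.pick(bs).name, rr.pick(bs).name, rr.pick(bs).name)
	fmt.Println(leastBusy(bs).name)
	fmt.Println(leastLatency(bs).name)
}
```

The sketch is not safe for concurrent use; a real router would need to guard the counter and the per-backend statistics with a mutex or atomics.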
The router expects that <DEPLOYMENT_NAME> exists in all the underlying servers that the router uses.
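One way to catch a violation of this requirement early is to check every server's model list before sending traffic. The sketch below assumes that a deployment name can be compared directly against each server's list of available models; the `endpoint` type and the `missingDeployment` helper are hypothetical and not part of this library.

```go
package main

import "fmt"

// endpoint is a hypothetical, simplified view of one server configuration.
type endpoint struct {
	name        string
	deployments []string
}

// missingDeployment returns the names of the endpoints that do not list
// the given deployment. An empty result means every endpoint can serve it.
func missingDeployment(eps []endpoint, deployment string) []string {
	var missing []string
	for _, ep := range eps {
		found := false
		for _, d := range ep.deployments {
			if d == deployment {
				found = true
				break
			}
		}
		if !found {
			missing = append(missing, ep.name)
		}
	}
	return missing
}

func main() {
	eps := []endpoint{
		{name: "azure", deployments: []string{"gpt-3.5-turbo", "gpt-4-turbo"}},
		{name: "openai", deployments: []string{"gpt-3.5-turbo", "gpt-4-turbo", "gpt-4-vision-preview"}},
	}
	// Only one endpoint lists the vision model, so routing it would be unsafe.
	fmt.Println(missingDeployment(eps, "gpt-4-vision-preview"))
	// Both endpoints list gpt-4-turbo, so nothing is missing.
	fmt.Println(len(missingDeployment(eps, "gpt-4-turbo")))
}
```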
Example -
// Create server configuration 1 (Azure OpenAI)
config1 := server.ServerConfig{
	Endpoint:        "https://<YOUR_AZURE_RESOURCE>.openai.azure.com/",
	Type:            server.AzureOpenAiServerType,
	AvailableModels: []string{"gpt-3.5-turbo", "gpt-4-turbo"},
}

// Create server configuration 2 (OpenAI)
config2 := server.ServerConfig{
	Endpoint:        "https://api.openai.com/v1",
	Type:            server.OpenAiServerType,
	AvailableModels: []string{"gpt-3.5-turbo", "gpt-4-turbo", "gpt-4-vision-preview"},
}

// Create the router using the configurations and a strategy
router, _ := router.NewRouter(
	[]server.ServerConfig{config1, config2},
	// followed by the load-balancing strategy to use (one of the three listed above)
)

// Prepare a sample Azure request
azureMessages := []azopenai.ChatRequestMessageClassification{}
azureMessages = append(azureMessages, &azopenai.ChatRequestUserMessage{
	Content: azopenai.NewChatRequestUserMessageContent("Who wrote the Jungle Book?"),
})
deployment := "<DEPLOYMENT_NAME>"
body := azopenai.ChatCompletionsOptions{
	Messages:       azureMessages,
	DeploymentName: &deployment, // the SDK field is a *string
}

// Make the call using the router
router.GetChatCompletions(context.TODO(), body, nil)
We decided to build and open-source this project for two reasons. First, we believe this is a key challenge people will face when deploying GenAI products in production to large enterprises and user bases. Second, we didn't find a suitable Go alternative to the utilities that exist for Python, for example https://github.com/BerriAI/litellm
If you are interested in contributing to the project, feel free to open an issue, a feature request, or a PR.
We don't currently have a structured or formal roadmap for the project; we will add features as we need them. Some of the things we expect to add soon are:
- Routing based on token usage/requests per minute.
- Rate limiting/throttling.
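As a rough illustration of what token-usage-aware routing could look like, the sketch below tracks a tokens-per-minute budget per endpoint with a fixed one-minute window and picks the first endpoint that still has room. It only shows one possible shape for the idea and is not a planned design: the `tpmBudget` type, `allow`, and `pickWithBudget` are hypothetical names, and time is passed in as Unix seconds to keep the example deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// tpmBudget is a hypothetical fixed-window tokens-per-minute counter
// for a single endpoint.
type tpmBudget struct {
	limit       int   // tokens allowed per one-minute window
	used        int   // tokens consumed in the current window
	windowStart int64 // Unix seconds at which the current window began
}

// allow reports whether `tokens` fit into the current window and, if so,
// records them. A new window starts once 60 seconds have elapsed.
func (b *tpmBudget) allow(nowUnix int64, tokens int) bool {
	if nowUnix-b.windowStart >= 60 {
		b.windowStart = nowUnix
		b.used = 0
	}
	if b.used+tokens > b.limit {
		return false
	}
	b.used += tokens
	return true
}

// pickWithBudget returns the index of the first endpoint whose budget
// admits the request, or -1 if every endpoint is exhausted.
func pickWithBudget(budgets []*tpmBudget, nowUnix int64, tokens int) int {
	for i, b := range budgets {
		if b.allow(nowUnix, tokens) {
			return i
		}
	}
	return -1
}

func main() {
	now := time.Now().Unix()
	budgets := []*tpmBudget{{limit: 1000}, {limit: 5000}}
	fmt.Println(pickWithBudget(budgets, now, 800))    // first endpoint has room
	fmt.Println(pickWithBudget(budgets, now, 800))    // first is full, falls through to the second
	fmt.Println(pickWithBudget(budgets, now+60, 800)) // a new window frees the first endpoint
}
```

A fixed window is the simplest option but allows bursts at window boundaries; a sliding window or token bucket would smooth that out at the cost of more bookkeeping.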