Flyflow
Flyflow is API middleware to optimize LLM applications, same response quality, 5x lower latency, secure, and much higher token limits
When building on top of LLMs, builders care about the following:
- Response quality
- Latency (both time to first token and tokens / second)
- Rate limits
- Reliability
- Enterprise grade security
Flyflow is middleware designed to optimize for all of these qualities, built to be open source, high performance written in golang, and optionally self-hosted for maximum flexiblity.
Fine tuning
The flyflow completions API is a drop in replacement for the openai completions API. Use flyflow directly in your openai provider and start using the API. All openai features will work including embeddings.
from openai import OpenAI
client = OpenAI(
base_url="https://api.flyflow.dev",
api_key='demo'
)
chat_completion = client.chat.completions.create(...)
Flyflow automatically tracks your query patterns with openai and you can use it to fine tune mixtral MoE or llama 70b to match the quality of GPT4 on your query patterns.
Inference
Flyflow allows for substantially higher token limits and reliability by load balancing across many different inference providers.
We host your custom fine turned models with providers like anyscale, together.ai, and fal, and optimize for latency, tokens / second, and rate limits, with a model that's the same level of quality as GPT4 for your queries.
This also enables significantly higher reliability, because if a provider fails we can drop in fallbacks to pick up the load.
Security and observability
Flyflow can also act as security middleware, preventing sensitive information from reaching the inference provider (including openai and microsoft).
We provide easy to configure plugins that allow you to filter PII from your queries, and advanced observability tools that help you understand how LLMs are being used by your organization.
Configurability
Flyflow is designed to be extremely configurable. Want to just use us as security middleware, but run all of your inference through GPT4? We got you. Want to fine tune and back up you backend with 5 inference providers for the highest possible rate limits and tokens / second? No problem.
Written in golang, our backend is designed to maximize for performance, without compromising on flexibility for developers.
Installation
Install cobra:
go get -u github.com/spf13/cobra@latest
go install github.com/spf13/cobra-cli@latest