raptor

Go module · v0.3.3 (latest) · Published: Apr 21, 2024 · License: Apache-2.0

Repository

github.com/raptor-ml/raptor

README

From notebook to production
Transform your data science to production-ready artifacts

Raptor frees data scientists and ML engineers to build and deploy operational models and ML-driven functionality, without learning backend engineering.

It compiles your Python research code into production artifacts and takes care of engineering concerns such as scalability and reliability, using best practices on Kubernetes.

🧐 What is Raptor?

Raptor frees data scientists and ML engineers to focus on the data science and research work, and build operational models and ML-driven functionality without learning backend engineering. Focus on what you're good at, increase your end-to-end velocity, and close the gap between research and production.

With Raptor, you can export your Python research code as standard production artifacts and deploy them to Kubernetes. Once they are deployed, Raptor optimizes data processing and feature calculation for production, deploys models to Sagemaker or Docker containers, connects to your production data sources, and handles scaling, high availability, caching, monitoring, and all other backend concerns.

😍 Why do people love Raptor, and how does it change their lives?

Raptor is made by and for data scientists and ML engineers. We know how hard it is to build models and deploy them as an integral part of your products, and we want to make it easier.

Before Raptor, data scientists had to work closely with backend engineers to build a "production version" of their work: connecting to data sources, transforming data with Flink/Spark or even Java, creating APIs, dockerizing the model, handling scaling and high availability, and more.

Figure: High-level view of Raptor

With Raptor, data scientists can focus on research and model development, then export their work to production. Raptor takes care of the rest, including connecting to data sources, transforming the data, and deploying and connecting the model, so data scientists can focus on what they do best.

⭐️ Key Features

Focus on your work: Raptor frees data scientists and ML engineers to focus on the model, without learning backend engineering. Stop worrying about engineering concerns and focus on what you're good at.
Eliminate training/serving skew: Use the same code for training and production, so features are computed the same way in both.
Real-time/on-demand: Raptor optimizes feature calculations and predictions to be performed at request time.
Seamless caching and storage: Raptor uses an integrated caching system and stores your historical data for training purposes, so you don't need a separate data storage system such as a feature store.
Turns data science work into production artifacts: Raptor applies Kubernetes best practices such as scaling, health checks, auto-recovery, monitoring, logging, and more.
Integrates with your R&D team: Raptor extends existing DevOps tools and infrastructure, letting you connect your ML research to the rest of your organization's R&D ecosystem, including CI/CD and monitoring.
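The skew point can be illustrated without Raptor at all. In this minimal plain-Python sketch (the function names are hypothetical, not part of the LabSDK), one feature function is written once and applied both to a batch of historical rows for training and to a single row at request time, so the two paths cannot drift apart.

```python
from typing import Dict, List

Row = Dict[str, float]


def amount_in_cents(row: Row) -> int:
    """Hypothetical feature logic, written exactly once."""
    return round(row["amount"] * 100)


def build_training_column(rows: List[Row]) -> List[int]:
    # Offline path: apply the same function to every historical row.
    return [amount_in_cents(r) for r in rows]


def serve(row: Row) -> int:
    # Online path: apply the same function to one request-time row.
    return amount_in_cents(row)


history = [{"amount": 12.5}, {"amount": 3.2}]
print(build_training_column(history))  # [1250, 320]
print(serve({"amount": 12.5}))         # 1250
```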

🚀 Getting Started

To start, install the Raptor LabSDK, a Python package that helps you develop models and features in notebooks or IDEs.

pip install raptor-labsdk

⚡ Quick Example

import pandas as pd
from raptor import *
from typing_extensions import TypedDict


@data_source(
    training_data=pd.read_csv(
        'https://gist.githubusercontent.com/AlmogBaku/8be77c2236836177b8e54fa8217411f2/raw/hello_world_transactions.csv'),
    production_config=StreamingConfig()
)
class BankTransaction(TypedDict):
    customer_id: str
    amount: float
    timestamp: str


# Define features 🧪
@feature(keys='customer_id', data_source=BankTransaction)
@aggregation(function=AggregationFunction.Sum, over='10h', granularity='1h')
def total_spend(this_row: BankTransaction, ctx: Context) -> float:
    """total spend by a customer in the last 10 hours"""
    return this_row['amount']


@feature(keys='customer_id', data_source=BankTransaction)
@freshness(max_age='5h', max_stale='1d')
def amount(this_row: BankTransaction, ctx: Context) -> float:
    """the amount of the customer's latest transaction"""
    return this_row['amount']


# Train the model 🤓
@model(
    keys='customer_id',
    input_features=['total_spend+sum'],
    input_labels=[amount],
    model_framework='sklearn',
    model_server='sagemaker-ack',
)
@freshness(max_age='1h', max_stale='100h')
def amount_prediction(ctx: TrainingContext):
    from sklearn.linear_model import LinearRegression
    df = ctx.features_and_labels()
    trainer = LinearRegression()
    trainer.fit(df[ctx.input_features], df[ctx.input_labels])
    return trainer


amount_prediction.export()  # Export to production 🎉

Running this generates a set of production artifacts in the "out" directory, including a Makefile that you can integrate into any CI/CD pipeline or invoke manually.
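The @aggregation decorator above declares a Sum over a 10-hour window with 1-hour granularity. A common way to compute such a window, sketched here in plain Python as an assumption about the general technique rather than a description of Raptor's engine, is to keep one running sum per customer per hourly bucket and add up the ten most recent buckets at request time. WindowedSum and bucket_start are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Tuple

GRANULARITY = timedelta(hours=1)  # mirrors granularity='1h'
WINDOW = timedelta(hours=10)      # mirrors over='10h'


def bucket_start(ts: datetime) -> datetime:
    # Truncate to the start of the hour (assumes 1-hour granularity).
    return ts.replace(minute=0, second=0, microsecond=0)


class WindowedSum:
    """Hypothetical per-key sliding-window sum built from hourly buckets."""

    def __init__(self) -> None:
        self.buckets: Dict[Tuple[str, datetime], float] = defaultdict(float)

    def add(self, key: str, ts: datetime, value: float) -> None:
        self.buckets[(key, bucket_start(ts))] += value

    def get(self, key: str, now: datetime) -> float:
        # Sum the 10 buckets ending with the (possibly partial) current one.
        newest = bucket_start(now)
        oldest = newest - WINDOW + GRANULARITY
        return sum(
            v for (k, start), v in self.buckets.items()
            if k == key and oldest <= start <= newest
        )


agg = WindowedSum()
t0 = datetime(2024, 4, 21, 0, 30)
agg.add("c1", t0, 10.0)
agg.add("c1", t0 + timedelta(hours=5), 5.0)
agg.add("c1", t0 + timedelta(hours=11), 2.0)
print(agg.get("c1", t0 + timedelta(hours=11)))  # 7.0 (the 00:00 bucket has expired)
```

Bucketing bounds the work per request by the number of buckets in the window, at the cost of aligning the window edge to the hour rather than to the exact request time.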

🥊 How is Raptor different from ___?

MLOps platforms (MLFlow, Kubeflow, Metaflow, Sagemaker, VertexAI, etc.)

Traditional MLOps platforms focus on managing the lifecycle of ML resources and are not designed for building operational models and features. Raptor is designed for exactly that, and can be integrated with MLOps platforms.

Feature Stores (Hopsworks, Feast, etc.)

A feature store is a data storage system that stores pre-computed features for training and online serving. This means you need to orchestrate the pre-computation of the features, store them, connect them to your model, and write ad-hoc backend code.

Raptor takes a radically different approach: you focus on the model, and Raptor takes care of the rest. Its built-in caching system achieves results similar to a feature store, without requiring you to orchestrate the data pipeline and the model deployment yourself.
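The quick example declares caching behavior with @freshness(max_age=..., max_stale=...). One plausible reading of those two parameters, sketched below as an assumption rather than Raptor's documented behavior, is a cache that serves a value as-is while it is younger than max_age, recomputes it once it is older, and, if recomputation fails, keeps serving the old value until it exceeds max_stale. FreshnessCache and the compute callbacks are hypothetical names.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class FreshnessCache:
    """Hypothetical cache illustrating assumed max_age / max_stale semantics."""

    def __init__(self, max_age: float, max_stale: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_age = max_age      # seconds a value counts as fresh
        self.max_stale = max_stale  # seconds an old value may still be served
        self.clock = clock
        self._store: Dict[str, Tuple[float, float]] = {}  # key -> (value, stored_at)

    def get(self, key: str, compute: Callable[[str], float]) -> Optional[float]:
        now = self.clock()
        cached = self._store.get(key)
        if cached is not None and now - cached[1] < self.max_age:
            return cached[0]  # fresh: serve from cache
        try:
            value = compute(key)  # missing or too old: recompute
        except Exception:
            # Recompute failed: fall back to the old value if not too stale.
            if cached is not None and now - cached[1] < self.max_stale:
                return cached[0]
            return None
        self._store[key] = (value, now)
        return value


# Usage with a fake clock (seconds) so the behavior is deterministic.
now = [0.0]
cache = FreshnessCache(max_age=5 * 3600, max_stale=24 * 3600, clock=lambda: now[0])
calls = []


def compute_ok(key: str) -> float:
    calls.append(key)
    return 42.0


def compute_fail(key: str) -> float:
    raise RuntimeError("source unavailable")


r1 = cache.get("c1", compute_ok)    # computed: 42.0
now[0] = 3600.0
r2 = cache.get("c1", compute_ok)    # still fresh, no recompute: 42.0
now[0] = 6 * 3600.0
r3 = cache.get("c1", compute_fail)  # past max_age, recompute fails, stale value served: 42.0
now[0] = 25 * 3600.0
r4 = cache.get("c1", compute_fail)  # past max_stale: None
```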

Model Servers (Sagemaker, BentoML, KServe, etc.)

Model servers are designed for serving models in production, not for building models and features for production. In fact, Raptor integrates seamlessly with model servers (such as Sagemaker, BentoML, etc.) to serve your models.

💡 How does it work?

Working with Raptor starts in the research phase, in your notebook or IDE. Raptor lets you write your ML work in a form that can be translated for production.

Models and features in Raptor are composed of a declarative part (via Python decorators) and function code. This way, Raptor can take over heavy-lifting engineering concerns (such as aggregations or caching) by implementing the declarative part and optimizing that implementation for production.

Figure: Features are composed of a declarative part and function code
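To illustrate the split between a declarative part and function code, here is a toy decorator in plain Python. It is not Raptor's implementation: feature_spec, FeatureSpec, and REGISTRY are hypothetical names. The decorator records declarative metadata (keys, aggregation, window) in a registry and returns the function unchanged, so a separate component could later read the registry and choose a production implementation while the function stays callable as ordinary Python in a notebook.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class FeatureSpec:
    name: str
    keys: str
    aggregation: Optional[str]
    window: Optional[str]
    fn: Callable


REGISTRY: Dict[str, FeatureSpec] = {}


def feature_spec(keys: str, aggregation: Optional[str] = None,
                 window: Optional[str] = None):
    """Hypothetical decorator: record declarative metadata, keep the code."""
    def wrap(fn: Callable) -> Callable:
        REGISTRY[fn.__name__] = FeatureSpec(fn.__name__, keys, aggregation, window, fn)
        return fn  # the function itself is unchanged and still callable
    return wrap


@feature_spec(keys="customer_id", aggregation="sum", window="10h")
def total_spend(row: dict) -> float:
    return row["amount"]


spec = REGISTRY["total_spend"]
print(spec.aggregation, spec.window)  # sum 10h
print(total_spend({"amount": 3.5}))   # 3.5
```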

After you are satisfied with your research results, export these definitions and deploy them to Kubernetes using standard tools. Once they are deployed, Raptor Core (the server-side component) extends Kubernetes with the ability to implement them: it takes care of the engineering concerns by managing and controlling Kubernetes-native resources, such as deployments, to connect your production data sources and run your business logic at scale.
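Exported definitions reach Raptor Core as Kubernetes resources. The sketch below shows the general shape of that step under stated assumptions: the API group k8s.raptor.ml/v1alpha1 comes from this module's api/v1alpha1 package, but the kind name and every field under spec are hypothetical placeholders, not Raptor's actual schema. It wraps a declarative definition in a custom-resource envelope and serializes it to JSON, a format kubectl apply also accepts.

```python
import json
from typing import Any, Dict


def to_manifest(name: str, namespace: str, spec: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a declarative definition in a Kubernetes custom-resource envelope.

    apiVersion uses the k8s.raptor.ml/v1alpha1 group named in this module;
    the kind and the spec fields are hypothetical placeholders.
    """
    return {
        "apiVersion": "k8s.raptor.ml/v1alpha1",
        "kind": "Feature",  # assumed kind name
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


manifest = to_manifest(
    name="total-spend",  # Kubernetes names must be lowercase DNS labels
    namespace="default",
    spec={  # hypothetical field names, for illustration only
        "keys": ["customer_id"],
        "aggregation": {"function": "sum", "over": "10h", "granularity": "1h"},
    },
)
print(json.dumps(manifest, indent=2))
```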

You can read more about Raptor's architecture in the docs.

⎈ Production Installation

Raptor installation is not required for training purposes. You only need to install Raptor when deploying to production (or staging).

Learn more about production installation in the docs.

🏗️ Prerequisites

Kubernetes cluster (including EKS, GKE, etc.)
Redis server (> 2.8.9)
Optional: Snowflake or S3 bucket (to record historical data for retraining purposes)
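Before installing, you may want to confirm the Redis prerequisite. The sketch below compares a version string, such as the redis_version field reported by the Redis INFO command, against the minimum listed above (strictly greater than 2.8.9). The helper names are hypothetical, and fetching the string from a live server is left out so the example stays self-contained.

```python
from typing import Tuple

MIN_EXCLUSIVE = (2, 8, 9)  # "Redis server (> 2.8.9)"


def parse_version(v: str) -> Tuple[int, ...]:
    # "7.2.4" -> (7, 2, 4); a non-numeric suffix such as "-rc1" is ignored.
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def redis_version_ok(v: str) -> bool:
    # Tuple comparison is element-wise, so (2, 8, 10) > (2, 8, 9).
    return parse_version(v) > MIN_EXCLUSIVE


print(redis_version_ok("7.2.4"))  # True
print(redis_version_ok("2.8.9"))  # False
```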

🏔 Roadmap

S3 historical storage plugins
- S3 storing
- S3 data fetching (Spark)
Deploy models to model servers
- Sagemaker ACK
- VertexAI
- Seldon
- Kubeflow
- KFServing
- Standalone
Large-scale training
Support more data sources
- Kafka
- GCP Pub/Sub
- REST
- Snowflake
- BigQuery
- gRPC
- Redis
- Postgres
- GraphQL

See the open issues for a full list of proposed features (and known issues).

👷‍ Contributing

Contributions make the open-source community a fantastic place to learn, inspire, and create. Any contributions you make are greatly appreciated, and not only code: documentation, blog posts, and feedback count too 😍.

If you have a suggestion, please fork the repo and create a pull request. You can also open an issue and choose "Feature Request" to give us feedback.

Don't forget to give the project a star! ⭐️

For more information about contributing code to the project, read the CONTRIBUTING.md file.

📃 License

Distributed under the Apache License 2.0. Read the LICENSE file for more information.

👫 Joining the community

You can join the Raptor community on Slack, follow us on Twitter, and participate in the issues and pull requests.

Directories

Path	Synopsis
api
v1alpha1	Package v1alpha1 contains API schema definitions for the k8s.raptor.ml v1alpha1 API group.
proto/gen/go	Separate module
cmd
core
core/internal/setup
historian
internal
accessor
engine
engine/controllers
historian
operator
plugins
plugins/builders/model
plugins/builders/rest
plugins/builders/sourceless
plugins/builders/streaming
plugins/modelservers/sagemaker-ack
plugins/providers/historical/parquet
plugins/providers/historical/parquet/s3
plugins/providers/historical/snowflake
plugins/providers/state/redis
stats
version
pkg
plugins
protoregistry
querybuilder
runner
runtimemanager
sdk
