merlin

package
v0.11.0
Published: Oct 31, 2024 | License: Apache-2.0 | Imports: 22 | Imported by: 0

README

merlin

Extractor for Machine Learning (ML) models from Merlin.

The extractor uses the REST API exposed by Merlin to extract models. The REST API is documented with Swagger.

Usage

source:
  name: merlin
  scope: staging
  config:
    url: my-company.com/api/merlin/
    service_account_base64: |
      ____base64_encoded_service_account_credentials____

Inputs

| Key | Value | Example | Description | Required? |
| :-- | :---- | :------ | :---------- | :-------- |
| url | string | my-company.com/api/merlin/ | Merlin's API base URL | Yes |
| service_account_base64 | string | ____BASE64_ENCODED_SERVICE_ACCOUNT____ | Service account credentials as a base64-encoded string | No |
| request_timeout | string | 10s | Timeout for HTTP requests to the Merlin API | No |
| worker_count | int | 5 | Number of workers to spawn for extracting projects from Merlin in parallel | No |

Notes
  • Leaving service_account_base64 blank falls back to Google's default authentication. This is recommended if the Meteor instance runs inside the same Google Cloud environment as the BigQuery project.
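The fallback described in the note can be pictured with a short sketch. The helper name `decodeServiceAccount` is hypothetical and is not part of this package's API; it only illustrates treating a blank value as "use default authentication" and anything else as base64-encoded credentials.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeServiceAccount is a hypothetical helper, not part of the extractor.
// It returns the decoded credentials, or nil when the value is blank, which
// signals that the caller should fall back to default authentication.
func decodeServiceAccount(encoded string) ([]byte, error) {
	encoded = strings.TrimSpace(encoded)
	if encoded == "" {
		return nil, nil
	}
	creds, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, fmt.Errorf("decode service account: %w", err)
	}
	return creds, nil
}

func main() {
	// Encode a placeholder credential document, then decode it again.
	sample := base64.StdEncoding.EncodeToString([]byte(`{"type":"service_account"}`))
	creds, err := decodeServiceAccount(sample)
	fmt.Println(string(creds), err)

	// A blank value yields nil credentials and no error.
	creds, err = decodeServiceAccount("")
	fmt.Println(creds == nil, err)
}
```

Trimming whitespace first is a design choice for this sketch, made because the YAML block scalar in the usage example leaves a trailing newline on the value.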

Outputs

The models are mapped to an Asset, with model-specific metadata stored using Model. Please refer to the proto definitions for more information.

A single model asset includes all the active model versions. A model version is considered active if it has an endpoint.
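The "active" rule above amounts to a simple filter. The following is a minimal sketch with simplified, hypothetical stand-in types; the real Merlin API types carry many more fields and may be shaped differently.

```go
package main

import "fmt"

// Endpoint and ModelVersion are hypothetical stand-ins for the Merlin API types.
type Endpoint struct {
	URL string
}

type ModelVersion struct {
	ID        int64
	Endpoints []Endpoint // empty when the version is not deployed anywhere
}

// activeVersions keeps only the versions that have at least one endpoint.
func activeVersions(all []ModelVersion) []ModelVersion {
	var active []ModelVersion
	for _, v := range all {
		if len(v.Endpoints) > 0 {
			active = append(active, v)
		}
	}
	return active
}

func main() {
	versions := []ModelVersion{
		{ID: 10},
		{ID: 11, Endpoints: []Endpoint{{URL: "tensorflow-sample.integration-test.models.mycompany.com"}}},
	}
	for _, v := range activeVersions(versions) {
		fmt.Println(v.ID)
	}
}
```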

| Field | Value | Sample Value |
| :---- | :---- | :----------- |
| resource.urn | urn:merlin:{scope}:model:{model.project_name}.{model.name} | urn:merlin:staging:model:food.restaurant-image |
| resource.name | {model.name} | tensorflow-sample |
| resource.service | merlin | merlin |
| resource.type | model | model |
| resource.url | {model.endpoints[0].url} | tensorflow-sample.integration-test.models.mycompany.com |
| namespace | {project.name} | integration-test |
| flavor | model.type | pyfunc |
| versions | []ModelVersion | |
| attributes.merlin_project_id | project.id | 23 |
| attributes.mlflow_experiment_id | model.mlflow_experiment_id | 721 |
| attributes.mlflow_experiment_url | model.mlflow_url | http://mlflow.mycompany.com/#/experiments/721 |
| attributes.endpoint_urls[] | model.endpoints[].url | ["tensorflow-sample.integration-test.models.mycompany.com"] |
| create_time | model.created_at | 2021-03-01T18:42:50.564685Z |
| update_time | model.updated_at | 2022-01-27T10:21:26.121941Z |
| resource.owners[].urn | {project.administrators[]} | giga.chad@knowyourmeme.com |
| resource.owners[].email | {project.administrators[]} | giga.chad@knowyourmeme.com |
| lineage.upstreams | []Resource | upstreams |
| resource.labels | {"team": {project.team}, "stream": {project.stream}} + project.labels | {"stream": "relevance","team": "search"} |

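As an illustration of the resource.urn pattern in the table above, here is a minimal sketch. The function name `modelURN` is an assumption made for this example and is not taken from the package.

```go
package main

import "fmt"

// modelURN is a hypothetical helper that fills in the documented pattern
// urn:merlin:{scope}:model:{model.project_name}.{model.name}.
func modelURN(scope, projectName, modelName string) string {
	return fmt.Sprintf("urn:merlin:%s:model:%s.%s", scope, projectName, modelName)
}

func main() {
	// Reproduces the sample value from the table.
	fmt.Println(modelURN("staging", "food", "restaurant-image"))
}
```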
ModelVersion

A ModelVersion represents each combination of a Merlin model's version and its 'endpoint' destination. A single model version will have an 'endpoint' for each environment it is deployed in. Please refer to the proto definitions for more information.

| Field | Value | Sample Value |
| :---- | :---- | :----------- |
| status | model_version.status | running |
| version | model_version.id | 11 |
| attributes.endpoint_id | endpoint.id | 187 |
| attributes.mlflow_run_id | model_version.mlflow_run_id | 3c7067f3770441ebbd66a0dce91b8724 |
| attributes.mlflow_run_url | model_version.mlflow_url | http://mlflow.mycompany.com/#/experiments/721/runs/3c7067f3770441ebbd66a0dce91b8724 |
| attributes.endpoint_url | endpoint.url | tensorflow-sample.integration-test.models.mycompany.com |
| attributes.version_endpoint_url | version_endpoint.url | http://tensorflow-sample-11.integration-test.models.mycompany.com/v1/models/tensorflow-sample-11 |
| attributes.monitoring_url | version_endpoint.monitoring_url | https://grafana.mycompany.com/graph/d/z9MBKR1Az/model-version-dashboard?params |
| attributes.message | version_endpoint.message | timeout creating inference service |
| attributes.environment_name | endpoint.environment_name | aws-staging |
| attributes.deployment_mode | version_endpoint.deployment_mode | serverless |
| attributes.service_name | version_endpoint.service_name | tensorflow-sample-11-predictor-default.integration-test.models.mycompany.com |
| attributes.env_vars | version_endpoint.env_vars | {"INIT_HEAP_SIZE_IN_MB": "2250","WORKERS": "1"} |
| attributes.transformer | version_endpoint.transformer | Attributes including transformer.{enabled, type, image, command, args, env_vars} |
| attributes.weight | endpoint.rule.destinations[].weight | 100 |
| labels | model_version.labels | |
| create_time | model_version.created_at | 2022-11-13T07:21:07.888150Z |
| update_time | model_version.updated_at | 2022-11-13T07:21:07.888150Z |

Resource upstreams

The extractor currently has limited support for constructing the upstreams of a Model that uses env vars for the standard transformer. It parses the feature table specs from those env vars; each spec gives the project name and feature table name of a CaraML Store feature table. This information is used to construct the upstreams for the model.

| Field | Value | Sample Value |
| :---- | :---- | :----------- |
| urn | urn:caramlstore:{scope}:feature_table:{ft.project}.{ft.name} | urn:kafka:int-kafka.yonkou.io:topic:staging_30min_demand |
| type | feature_table | topic |
| service | caramlstore | kafka |
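The Value column above can be read as a small mapping from a parsed feature table reference to an upstream resource. The sketch below assumes the project and table names have already been parsed out of the env vars. The types `FeatureTableRef` and `Resource` and the function `upstreamFor` are hypothetical, and the example project and table names are invented for illustration.

```go
package main

import "fmt"

// FeatureTableRef is a hypothetical holder for the two values the extractor
// reads from a feature table spec: the project name and the table name.
type FeatureTableRef struct {
	Project string
	Name    string
}

// Resource mirrors only the three upstream fields listed in the table above.
type Resource struct {
	URN     string
	Type    string
	Service string
}

// upstreamFor fills in the documented pattern
// urn:caramlstore:{scope}:feature_table:{ft.project}.{ft.name}.
func upstreamFor(scope string, ft FeatureTableRef) Resource {
	return Resource{
		URN:     fmt.Sprintf("urn:caramlstore:%s:feature_table:%s.%s", scope, ft.Project, ft.Name),
		Type:    "feature_table",
		Service: "caramlstore",
	}
}

func main() {
	r := upstreamFor("staging", FeatureTableRef{Project: "food", Name: "restaurant_features"})
	fmt.Println(r.URN, r.Type, r.Service)
}
```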

Contributing

Refer to the contribution guidelines for information on contributing to this module.

Documentation

Types

type Client

type Client interface {
	Projects(ctx context.Context) ([]merlin.Project, error)
	Models(ctx context.Context, projectID int64) ([]merlin.Model, error)
	ModelVersion(ctx context.Context, modelID, versionID int64) (merlin.ModelVersion, error)
}
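To illustrate how an interface like Client could be driven by a fixed number of workers (the worker_count setting), here is a minimal sketch. It does not reproduce the extractor's actual implementation. `Project` and `Model` are simplified stand-ins for merlin.Project and merlin.Model, and `modelLister`, `listAllModels`, `fakeClient` and `countModels` are hypothetical names introduced for this example.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Project and Model are simplified, hypothetical stand-ins for the Merlin types.
type Project struct {
	ID   int64
	Name string
}

type Model struct {
	ID   int64
	Name string
}

// modelLister is a narrowed, illustrative version of the Client interface.
type modelLister interface {
	Projects(ctx context.Context) ([]Project, error)
	Models(ctx context.Context, projectID int64) ([]Model, error)
}

// listAllModels fans project IDs out to workerCount goroutines and collects
// the models of every project. It reports the first error encountered.
func listAllModels(ctx context.Context, c modelLister, workerCount int) ([]Model, error) {
	if workerCount < 1 {
		workerCount = 1 // avoid blocking forever with no workers
	}
	projects, err := c.Projects(ctx)
	if err != nil {
		return nil, fmt.Errorf("list projects: %w", err)
	}

	jobs := make(chan Project)
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		all      []Model
		firstErr error
	)
	for i := 0; i < workerCount; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				models, err := c.Models(ctx, p.ID)
				mu.Lock()
				if err != nil && firstErr == nil {
					firstErr = fmt.Errorf("list models for project %d: %w", p.ID, err)
				}
				all = append(all, models...)
				mu.Unlock()
			}
		}()
	}
	for _, p := range projects {
		jobs <- p
	}
	close(jobs)
	wg.Wait()
	return all, firstErr
}

// fakeClient is an in-memory stand-in used only to exercise the sketch.
type fakeClient struct{}

func (fakeClient) Projects(context.Context) ([]Project, error) {
	return []Project{{ID: 1, Name: "food"}, {ID: 2, Name: "search"}}, nil
}

func (fakeClient) Models(_ context.Context, projectID int64) ([]Model, error) {
	return []Model{{ID: projectID * 10, Name: fmt.Sprintf("model-%d", projectID)}}, nil
}

// countModels runs the sketch against the fake client.
func countModels(workerCount int) (int, error) {
	models, err := listAllModels(context.Background(), fakeClient{}, workerCount)
	return len(models), err
}

func main() {
	n, err := countModels(5)
	fmt.Println(n, err)
}
```

Depending on an interface rather than a concrete HTTP client is what makes a fake like this possible in tests, which is presumably why New accepts a NewClientFunc.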

type Config

type Config struct {
	URL                  string        `json:"url" yaml:"url" mapstructure:"url" validate:"required"`
	ServiceAccountBase64 string        `json:"service_account_base64" yaml:"service_account_base64" mapstructure:"service_account_base64"`
	RequestTimeout       time.Duration `json:"request_timeout" yaml:"request_timeout" mapstructure:"request_timeout" validate:"min=1ms" default:"10s"`
	WorkerCount          int           `json:"worker_count" yaml:"worker_count" mapstructure:"worker_count" validate:"min=1" default:"5"`
}

Config holds the set of configuration for the Merlin extractor.
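The struct tags above encode defaults (10s, 5) and validation rules (url required, request_timeout at least 1ms, worker_count at least 1). The following sketch spells those rules out by hand on a local copy of the struct. It is an illustration only: the real extractor presumably relies on tag-driven decoding and validation, and the methods `withDefaults` and `validate` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Config is a tag-free local copy of the struct above, for illustration only.
type Config struct {
	URL                  string
	ServiceAccountBase64 string
	RequestTimeout       time.Duration
	WorkerCount          int
}

// withDefaults fills unset fields with the values from the `default` tags.
func (c Config) withDefaults() Config {
	if c.RequestTimeout == 0 {
		c.RequestTimeout = 10 * time.Second
	}
	if c.WorkerCount == 0 {
		c.WorkerCount = 5
	}
	return c
}

// validate applies the constraints expressed by the `validate` tags.
func (c Config) validate() error {
	switch {
	case c.URL == "":
		return errors.New("url is required")
	case c.RequestTimeout < time.Millisecond:
		return errors.New("request_timeout must be at least 1ms")
	case c.WorkerCount < 1:
		return errors.New("worker_count must be at least 1")
	}
	return nil
}

func main() {
	cfg := Config{URL: "my-company.com/api/merlin/"}.withDefaults()
	fmt.Println(cfg.RequestTimeout, cfg.WorkerCount, cfg.validate())
}
```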

type Extractor

type Extractor struct {
	plugins.BaseExtractor
	// contains filtered or unexported fields
}

Extractor manages the communication with the Merlin service.

func New

func New(logger log.Logger, newClient NewClientFunc) *Extractor

New returns a pointer to an initialized Extractor Object

func (*Extractor) Extract

func (e *Extractor) Extract(ctx context.Context, emit plugins.Emit) error

func (*Extractor) Init

func (e *Extractor) Init(ctx context.Context, config plugins.Config) error

Init initializes the extractor

type NewClientFunc

type NewClientFunc func(ctx context.Context, cfg Config) (Client, error)

Directories

Path Synopsis
internal
