Documentation ¶
Overview ¶
The Serving Endpoints API allows you to create, update, and delete model serving endpoints.
Index ¶
- type BuildLogsRequest
- type BuildLogsResponse
- type CreateServingEndpoint
- type DeleteServingEndpointRequest
- type EndpointCoreConfigInput
- type EndpointCoreConfigOutput
- type EndpointCoreConfigSummary
- type EndpointPendingConfig
- type EndpointState
- type EndpointStateConfigUpdate
- type EndpointStateReady
- type ExportMetricsRequest
- type GetServingEndpointRequest
- type ListEndpointsResponse
- type LogsRequest
- type QueryEndpointResponse
- type QueryRequest
- type Route
- type ServedModelInput
- type ServedModelOutput
- type ServedModelSpec
- type ServedModelState
- type ServedModelStateDeployment
- type ServerLogsResponse
- type ServingEndpoint
- type ServingEndpointDetailed
- type ServingEndpointDetailedPermissionLevel
- type ServingEndpointsAPI
- func (a *ServingEndpointsAPI) BuildLogs(ctx context.Context, request BuildLogsRequest) (*BuildLogsResponse, error)
- func (a *ServingEndpointsAPI) BuildLogsByNameAndServedModelName(ctx context.Context, name string, servedModelName string) (*BuildLogsResponse, error)
- func (a *ServingEndpointsAPI) Create(ctx context.Context, createServingEndpoint CreateServingEndpoint) (*WaitGetServingEndpointNotUpdating[ServingEndpointDetailed], error)
- func (a *ServingEndpointsAPI) CreateAndWait(ctx context.Context, createServingEndpoint CreateServingEndpoint, ...) (*ServingEndpointDetailed, error) (deprecated)
- func (a *ServingEndpointsAPI) Delete(ctx context.Context, request DeleteServingEndpointRequest) error
- func (a *ServingEndpointsAPI) DeleteByName(ctx context.Context, name string) error
- func (a *ServingEndpointsAPI) ExportMetrics(ctx context.Context, request ExportMetricsRequest) error
- func (a *ServingEndpointsAPI) ExportMetricsByName(ctx context.Context, name string) error
- func (a *ServingEndpointsAPI) Get(ctx context.Context, request GetServingEndpointRequest) (*ServingEndpointDetailed, error)
- func (a *ServingEndpointsAPI) GetByName(ctx context.Context, name string) (*ServingEndpointDetailed, error)
- func (a *ServingEndpointsAPI) Impl() ServingEndpointsService
- func (a *ServingEndpointsAPI) ListAll(ctx context.Context) ([]ServingEndpoint, error)
- func (a *ServingEndpointsAPI) Logs(ctx context.Context, request LogsRequest) (*ServerLogsResponse, error)
- func (a *ServingEndpointsAPI) LogsByNameAndServedModelName(ctx context.Context, name string, servedModelName string) (*ServerLogsResponse, error)
- func (a *ServingEndpointsAPI) Query(ctx context.Context, request QueryRequest) (*QueryEndpointResponse, error)
- func (a *ServingEndpointsAPI) UpdateConfig(ctx context.Context, endpointCoreConfigInput EndpointCoreConfigInput) (*WaitGetServingEndpointNotUpdating[ServingEndpointDetailed], error)
- func (a *ServingEndpointsAPI) UpdateConfigAndWait(ctx context.Context, endpointCoreConfigInput EndpointCoreConfigInput, ...) (*ServingEndpointDetailed, error) (deprecated)
- func (a *ServingEndpointsAPI) WaitGetServingEndpointNotUpdating(ctx context.Context, name string, timeout time.Duration, ...) (*ServingEndpointDetailed, error)
- func (a *ServingEndpointsAPI) WithImpl(impl ServingEndpointsService) *ServingEndpointsAPI
- type ServingEndpointsService
- type TrafficConfig
- type WaitGetServingEndpointNotUpdating
- func (w *WaitGetServingEndpointNotUpdating[R]) Get() (*ServingEndpointDetailed, error)
- func (w *WaitGetServingEndpointNotUpdating[R]) GetWithTimeout(timeout time.Duration) (*ServingEndpointDetailed, error)
- func (w *WaitGetServingEndpointNotUpdating[R]) OnProgress(callback func(*ServingEndpointDetailed)) *WaitGetServingEndpointNotUpdating[R]
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type BuildLogsRequest ¶
type BuildLogsRequest struct {
	// The name of the serving endpoint that the served model belongs to. This
	// field is required.
	Name string `json:"-" url:"-"`
	// The name of the served model that build logs will be retrieved for. This
	// field is required.
	ServedModelName string `json:"-" url:"-"`
}
Retrieve the logs associated with building the model's environment for a given serving endpoint's served model.
type BuildLogsResponse ¶
type BuildLogsResponse struct {
	// The logs associated with building the served model's environment.
	Logs string `json:"logs"`
}
type CreateServingEndpoint ¶
type CreateServingEndpoint struct {
	// The core config of the serving endpoint.
	Config EndpointCoreConfigInput `json:"config"`
	// The name of the serving endpoint. This field is required and must be
	// unique across a Databricks workspace. An endpoint name can consist of
	// alphanumeric characters, dashes, and underscores.
	Name string `json:"name"`
}
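The naming rule for CreateServingEndpoint.Name can be checked on the client before a request is sent. The following is a minimal sketch, not part of this package: the helper name validEndpointName and the regular expression are assumptions, and "alphanumeric" is read here as ASCII letters and digits.

```go
package main

import (
	"fmt"
	"regexp"
)

// endpointNamePattern is an assumed reading of the documented rule:
// ASCII letters, digits, dashes, and underscores, at least one character.
var endpointNamePattern = regexp.MustCompile(`^[A-Za-z0-9_-]+$`)

// validEndpointName is a hypothetical helper that reports whether name
// uses only the documented character set.
func validEndpointName(name string) bool {
	return endpointNamePattern.MatchString(name)
}

func main() {
	for _, name := range []string{"churn-model_v2", "bad name", ""} {
		fmt.Printf("%q valid: %v\n", name, validEndpointName(name))
	}
}
```

Uniqueness across the workspace cannot be checked locally; only the service can enforce it.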
type DeleteServingEndpointRequest ¶
type DeleteServingEndpointRequest struct {
	// The name of the serving endpoint. This field is required.
	Name string `json:"-" url:"-"`
}
Delete a serving endpoint
type EndpointCoreConfigInput ¶
type EndpointCoreConfigInput struct {
	// The name of the serving endpoint to update. This field is required.
	Name string `json:"-" url:"-"`
	// A list of served models for the endpoint to serve. A serving endpoint can
	// have up to 10 served models.
	ServedModels []ServedModelInput `json:"served_models"`
	// The traffic config defining how invocations to the serving endpoint
	// should be routed.
	TrafficConfig *TrafficConfig `json:"traffic_config,omitempty"`
}
type EndpointCoreConfigOutput ¶
type EndpointCoreConfigOutput struct {
	// The config version that the serving endpoint is currently serving.
	ConfigVersion int `json:"config_version,omitempty"`
	// The list of served models under the serving endpoint config.
	ServedModels []ServedModelOutput `json:"served_models,omitempty"`
	// The traffic configuration associated with the serving endpoint config.
	TrafficConfig *TrafficConfig `json:"traffic_config,omitempty"`
}
type EndpointCoreConfigSummary ¶
type EndpointCoreConfigSummary struct {
	// The list of served models under the serving endpoint config.
	ServedModels []ServedModelSpec `json:"served_models,omitempty"`
}
type EndpointPendingConfig ¶
type EndpointPendingConfig struct {
	// The config version that the serving endpoint is currently serving.
	ConfigVersion int `json:"config_version,omitempty"`
	// The list of served models belonging to the last issued update to the
	// serving endpoint.
	ServedModels []ServedModelOutput `json:"served_models,omitempty"`
	// The timestamp when the update to the pending config started.
	StartTime int64 `json:"start_time,omitempty"`
	// The traffic config defining how invocations to the serving endpoint
	// should be routed.
	TrafficConfig *TrafficConfig `json:"traffic_config,omitempty"`
}
type EndpointState ¶
type EndpointState struct {
	// The state of an endpoint's config update. This informs the user if the
	// pending_config is in progress, if the update failed, or if there is no
	// update in progress. Note that if the endpoint's config_update state value
	// is IN_PROGRESS, another update cannot be made until the update completes
	// or fails.
	ConfigUpdate EndpointStateConfigUpdate `json:"config_update,omitempty"`
	// The state of an endpoint, indicating whether or not the endpoint is
	// queryable. An endpoint is READY if all of the served models in its active
	// configuration are ready. If any of the actively served models are in a
	// non-ready state, the endpoint state will be NOT_READY.
	Ready EndpointStateReady `json:"ready,omitempty"`
}
type EndpointStateConfigUpdate ¶
type EndpointStateConfigUpdate string
The state of an endpoint's config update. This informs the user if the pending_config is in progress, if the update failed, or if there is no update in progress. Note that if the endpoint's config_update state value is IN_PROGRESS, another update cannot be made until the update completes or fails.
const EndpointStateConfigUpdateInProgress EndpointStateConfigUpdate = `IN_PROGRESS`
const EndpointStateConfigUpdateNotUpdating EndpointStateConfigUpdate = `NOT_UPDATING`
const EndpointStateConfigUpdateUpdateFailed EndpointStateConfigUpdate = `UPDATE_FAILED`
func (*EndpointStateConfigUpdate) Set ¶
func (f *EndpointStateConfigUpdate) Set(v string) error
Set raw string value and validate it against allowed values
func (*EndpointStateConfigUpdate) String ¶
func (f *EndpointStateConfigUpdate) String() string
String representation for fmt.Print
func (*EndpointStateConfigUpdate) Type ¶
func (f *EndpointStateConfigUpdate) Type() string
Type always returns EndpointStateConfigUpdate to satisfy [pflag.Value] interface
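The Set, String, and Type methods above follow the shape of the pflag.Value interface. As a rough illustration of how such a validated string enum can work, here is a sketch with a local stand-in type. The type name ConfigUpdate, the shortened constant names, and the error text are assumptions; this is not the generated SDK code.

```go
package main

import "fmt"

// ConfigUpdate is a local stand-in for EndpointStateConfigUpdate.
type ConfigUpdate string

const (
	InProgress   ConfigUpdate = "IN_PROGRESS"
	NotUpdating  ConfigUpdate = "NOT_UPDATING"
	UpdateFailed ConfigUpdate = "UPDATE_FAILED"
)

// Set stores v only if it is one of the allowed values; on error the
// receiver keeps its previous value.
func (f *ConfigUpdate) Set(v string) error {
	switch ConfigUpdate(v) {
	case InProgress, NotUpdating, UpdateFailed:
		*f = ConfigUpdate(v)
		return nil
	default:
		return fmt.Errorf("value must be one of IN_PROGRESS, NOT_UPDATING, UPDATE_FAILED; got %q", v)
	}
}

// String returns the raw value, which is what fmt.Print shows.
func (f *ConfigUpdate) String() string { return string(*f) }

// Type returns the name reported to flag libraries.
func (f *ConfigUpdate) Type() string { return "EndpointStateConfigUpdate" }

func main() {
	var s ConfigUpdate
	fmt.Println(s.Set("IN_PROGRESS"), s.String())
	fmt.Println(s.Set("bogus"), s.String())
}
```

Because the three methods use a pointer receiver, a *ConfigUpdate could be bound to a command-line flag by any library that accepts a value with this method set.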
type EndpointStateReady ¶
type EndpointStateReady string
The state of an endpoint, indicating whether or not the endpoint is queryable. An endpoint is READY if all of the served models in its active configuration are ready. If any of the actively served models are in a non-ready state, the endpoint state will be NOT_READY.
const EndpointStateReadyNotReady EndpointStateReady = `NOT_READY`
const EndpointStateReadyReady EndpointStateReady = `READY`
func (*EndpointStateReady) Set ¶
func (f *EndpointStateReady) Set(v string) error
Set raw string value and validate it against allowed values
func (*EndpointStateReady) String ¶
func (f *EndpointStateReady) String() string
String representation for fmt.Print
func (*EndpointStateReady) Type ¶
func (f *EndpointStateReady) Type() string
Type always returns EndpointStateReady to satisfy [pflag.Value] interface
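The READY rule described for EndpointStateReady (every actively served model must be ready) can be expressed as a small reduction over deployment states. This sketch uses plain strings and a hypothetical helper named readiness. The service computes the real value, so this only mirrors the documented rule for illustration.

```go
package main

import "fmt"

// readiness returns "READY" only when every served model deployment is
// DEPLOYMENT_READY, and "NOT_READY" as soon as one is not.
// Assumption: an empty list yields "READY" because no model is non-ready;
// the source does not describe that edge case.
func readiness(deployments []string) string {
	for _, d := range deployments {
		if d != "DEPLOYMENT_READY" {
			return "NOT_READY"
		}
	}
	return "READY"
}

func main() {
	fmt.Println(readiness([]string{"DEPLOYMENT_READY", "DEPLOYMENT_READY"}))
	fmt.Println(readiness([]string{"DEPLOYMENT_READY", "DEPLOYMENT_RECOVERING"}))
}
```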
type ExportMetricsRequest ¶
type ExportMetricsRequest struct {
	// The name of the serving endpoint to retrieve metrics for. This field is
	// required.
	Name string `json:"-" url:"-"`
}
Retrieve the metrics associated with a serving endpoint
type GetServingEndpointRequest ¶
type GetServingEndpointRequest struct {
	// The name of the serving endpoint. This field is required.
	Name string `json:"-" url:"-"`
}
Get a single serving endpoint
type ListEndpointsResponse ¶
type ListEndpointsResponse struct {
	// The list of endpoints.
	Endpoints []ServingEndpoint `json:"endpoints,omitempty"`
}
type LogsRequest ¶
type LogsRequest struct {
	// The name of the serving endpoint that the served model belongs to. This
	// field is required.
	Name string `json:"-" url:"-"`
	// The name of the served model that logs will be retrieved for. This field
	// is required.
	ServedModelName string `json:"-" url:"-"`
}
Retrieve the most recent log lines associated with a given serving endpoint's served model
type QueryEndpointResponse ¶
type QueryEndpointResponse struct {
	// The predictions returned by the serving endpoint.
	Predictions []any `json:"predictions"`
}
type QueryRequest ¶
type QueryRequest struct {
	// The name of the serving endpoint. This field is required.
	Name string `json:"-" url:"-"`
}
Query a serving endpoint with provided model input.
type Route ¶
type Route struct {
	// The name of the served model this route configures traffic for.
	ServedModelName string `json:"served_model_name"`
	// The percentage of endpoint traffic to send to this route. It must be an
	// integer between 0 and 100 inclusive.
	TrafficPercentage int `json:"traffic_percentage"`
}
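A client can check route percentages before submitting a traffic config. The sketch below redeclares Route locally and adds a hypothetical validateRoutes helper. The 0 to 100 range comes from the field comment above; the requirement that the percentages total 100 is an assumption not stated on this page.

```go
package main

import "fmt"

// Route is a local copy of the documented struct.
type Route struct {
	ServedModelName   string `json:"served_model_name"`
	TrafficPercentage int    `json:"traffic_percentage"`
}

// validateRoutes is a hypothetical pre-flight check for a list of routes.
func validateRoutes(routes []Route) error {
	total := 0
	for _, r := range routes {
		// Documented rule: each percentage is an integer in 0..100.
		if r.TrafficPercentage < 0 || r.TrafficPercentage > 100 {
			return fmt.Errorf("route %q: traffic percentage %d is outside 0..100",
				r.ServedModelName, r.TrafficPercentage)
		}
		total += r.TrafficPercentage
	}
	// Assumption (not stated in the source): percentages should total 100.
	if total != 100 {
		return fmt.Errorf("traffic percentages total %d, want 100", total)
	}
	return nil
}

func main() {
	routes := []Route{
		{ServedModelName: "churn-3", TrafficPercentage: 90},
		{ServedModelName: "churn-4", TrafficPercentage: 10},
	}
	fmt.Println(validateRoutes(routes))
}
```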
type ServedModelInput ¶
type ServedModelInput struct {
	// An object containing a set of optional, user-specified environment
	// variable key-value pairs used for serving this model. Note: this is an
	// experimental feature and subject to change. Example model environment
	// variables that refer to Databricks secrets: `{"OPENAI_API_KEY":
	// "{{secrets/my_scope/my_key}}", "DATABRICKS_TOKEN":
	// "{{secrets/my_scope2/my_key2}}"}`
	EnvironmentVars map[string]string `json:"environment_vars,omitempty"`
	// The name of the model in Databricks Model Registry to be served or if the
	// model resides in Unity Catalog, the full name of model, in the form of
	// __catalog_name__.__schema_name__.__model_name__.
	ModelName string `json:"model_name"`
	// The version of the model in Databricks Model Registry or Unity Catalog to
	// be served.
	ModelVersion string `json:"model_version"`
	// The name of a served model. It must be unique across an endpoint. If not
	// specified, this field will default to <model-name>-<model-version>. A
	// served model name can consist of alphanumeric characters, dashes, and
	// underscores.
	Name string `json:"name,omitempty"`
	// Whether the compute resources for the served model should scale down to
	// zero.
	ScaleToZeroEnabled bool `json:"scale_to_zero_enabled"`
	// The workload size of the served model. The workload size corresponds to a
	// range of provisioned concurrency that the compute will autoscale between.
	// A single unit of provisioned concurrency can process one request at a
	// time. Valid workload sizes are "Small" (4 - 4 provisioned concurrency),
	// "Medium" (8 - 16 provisioned concurrency), and "Large" (16 - 64
	// provisioned concurrency). If scale-to-zero is enabled, the lower bound of
	// the provisioned concurrency for each workload size will be 0.
	WorkloadSize string `json:"workload_size"`
}
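The JSON tags on ServedModelInput determine the request body, and the Name comment describes a default of <model-name>-<model-version>. The sketch below uses a trimmed local copy of the struct (EnvironmentVars is left out) to show both. The helpers effectiveName and payload are hypothetical, and the default is applied by the service, so effectiveName only predicts it for a Model Registry model.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ServedModelInput is a trimmed local copy of the documented struct.
type ServedModelInput struct {
	ModelName          string `json:"model_name"`
	ModelVersion       string `json:"model_version"`
	Name               string `json:"name,omitempty"`
	ScaleToZeroEnabled bool   `json:"scale_to_zero_enabled"`
	WorkloadSize       string `json:"workload_size"`
}

// effectiveName predicts the served model name: the explicit Name if set,
// otherwise the documented default <model-name>-<model-version>.
func effectiveName(m ServedModelInput) string {
	if m.Name != "" {
		return m.Name
	}
	return m.ModelName + "-" + m.ModelVersion
}

// payload returns the JSON encoding of m as a string.
func payload(m ServedModelInput) (string, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	m := ServedModelInput{
		ModelName:          "churn",
		ModelVersion:       "3",
		ScaleToZeroEnabled: true,
		WorkloadSize:       "Small",
	}
	body, err := payload(m)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
	fmt.Println(effectiveName(m))
}
```

Because Name carries omitempty, an unset name is left out of the encoded body entirely, while scale_to_zero_enabled is always sent, even when false.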
type ServedModelOutput ¶
type ServedModelOutput struct {
	// The creation timestamp of the served model in Unix time.
	CreationTimestamp int64 `json:"creation_timestamp,omitempty"`
	// The email of the user who created the served model.
	Creator string `json:"creator,omitempty"`
	// An object containing a set of optional, user-specified environment
	// variable key-value pairs used for serving this model. Note: this is an
	// experimental feature and subject to change. Example model environment
	// variables that refer to Databricks secrets: `{"OPENAI_API_KEY":
	// "{{secrets/my_scope/my_key}}", "DATABRICKS_TOKEN":
	// "{{secrets/my_scope2/my_key2}}"}`
	EnvironmentVars map[string]string `json:"environment_vars,omitempty"`
	// The name of the model in Databricks Model Registry or the full name of
	// the model in Unity Catalog.
	ModelName string `json:"model_name,omitempty"`
	// The version of the model in Databricks Model Registry or Unity Catalog to
	// be served.
	ModelVersion string `json:"model_version,omitempty"`
	// The name of the served model.
	Name string `json:"name,omitempty"`
	// Whether the compute resources for the Served Model should scale down to
	// zero.
	ScaleToZeroEnabled bool `json:"scale_to_zero_enabled,omitempty"`
	// Information corresponding to the state of the Served Model.
	State *ServedModelState `json:"state,omitempty"`
	// The workload size of the served model. The workload size corresponds to a
	// range of provisioned concurrency that the compute will autoscale between.
	// A single unit of provisioned concurrency can process one request at a
	// time. Valid workload sizes are "Small" (4 - 4 provisioned concurrency),
	// "Medium" (8 - 16 provisioned concurrency), and "Large" (16 - 64
	// provisioned concurrency). If scale-to-zero is enabled, the lower bound of
	// the provisioned concurrency for each workload size will be 0.
	WorkloadSize string `json:"workload_size,omitempty"`
}
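The workload size comment maps each size to a provisioned concurrency range and lowers the bottom of the range to 0 when scale-to-zero is enabled. A small lookup makes that mapping concrete. The type concurrencyRange and the function provisionedConcurrency are hypothetical names; the numbers are the ones quoted in the field comment.

```go
package main

import "fmt"

// concurrencyRange is a hypothetical type holding the autoscaling bounds.
type concurrencyRange struct {
	Min, Max int
}

// provisionedConcurrency returns the documented range for a workload size,
// with the lower bound set to 0 when scale-to-zero is enabled.
func provisionedConcurrency(size string, scaleToZero bool) (concurrencyRange, error) {
	ranges := map[string]concurrencyRange{
		"Small":  {Min: 4, Max: 4},
		"Medium": {Min: 8, Max: 16},
		"Large":  {Min: 16, Max: 64},
	}
	r, ok := ranges[size]
	if !ok {
		return concurrencyRange{}, fmt.Errorf("unknown workload size %q", size)
	}
	if scaleToZero {
		r.Min = 0
	}
	return r, nil
}

func main() {
	for _, size := range []string{"Small", "Medium", "Large"} {
		r, _ := provisionedConcurrency(size, true)
		fmt.Printf("%s with scale-to-zero: %d to %d\n", size, r.Min, r.Max)
	}
}
```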
type ServedModelSpec ¶
type ServedModelSpec struct {
	// The name of the model in Databricks Model Registry or the full name of
	// the model in Unity Catalog.
	ModelName string `json:"model_name,omitempty"`
	// The version of the model in Databricks Model Registry or Unity Catalog to
	// be served.
	ModelVersion string `json:"model_version,omitempty"`
	// The name of the served model.
	Name string `json:"name,omitempty"`
}
type ServedModelState ¶
type ServedModelState struct {
	// The state of the served model deployment. DEPLOYMENT_CREATING indicates
	// that the served model is not ready yet because the deployment is still
	// being created (e.g., the container image is building or the model server
	// is deploying for the first time). DEPLOYMENT_RECOVERING indicates that
	// the served model was previously in a ready state but no longer is and is
	// attempting to recover. DEPLOYMENT_READY indicates that the served model
	// is ready to receive traffic. DEPLOYMENT_FAILED indicates that there was
	// an error trying to bring up the served model (e.g., the container image
	// build failed or the model server failed to start due to a model loading
	// error). DEPLOYMENT_ABORTED indicates that the deployment was terminated,
	// likely due to a failure in bringing up another served model under the
	// same endpoint and config version.
	Deployment ServedModelStateDeployment `json:"deployment,omitempty"`
	// More information about the state of the served model, if available.
	DeploymentStateMessage string `json:"deployment_state_message,omitempty"`
}
type ServedModelStateDeployment ¶
type ServedModelStateDeployment string
The state of the served model deployment. DEPLOYMENT_CREATING indicates that the served model is not ready yet because the deployment is still being created (e.g., the container image is building or the model server is deploying for the first time). DEPLOYMENT_RECOVERING indicates that the served model was previously in a ready state but no longer is and is attempting to recover. DEPLOYMENT_READY indicates that the served model is ready to receive traffic. DEPLOYMENT_FAILED indicates that there was an error trying to bring up the served model (e.g., the container image build failed or the model server failed to start due to a model loading error). DEPLOYMENT_ABORTED indicates that the deployment was terminated, likely due to a failure in bringing up another served model under the same endpoint and config version.
const ServedModelStateDeploymentDeploymentAborted ServedModelStateDeployment = `DEPLOYMENT_ABORTED`
const ServedModelStateDeploymentDeploymentCreating ServedModelStateDeployment = `DEPLOYMENT_CREATING`
const ServedModelStateDeploymentDeploymentFailed ServedModelStateDeployment = `DEPLOYMENT_FAILED`
const ServedModelStateDeploymentDeploymentReady ServedModelStateDeployment = `DEPLOYMENT_READY`
const ServedModelStateDeploymentDeploymentRecovering ServedModelStateDeployment = `DEPLOYMENT_RECOVERING`
func (*ServedModelStateDeployment) Set ¶
func (f *ServedModelStateDeployment) Set(v string) error
Set raw string value and validate it against allowed values
func (*ServedModelStateDeployment) String ¶
func (f *ServedModelStateDeployment) String() string
String representation for fmt.Print
func (*ServedModelStateDeployment) Type ¶
func (f *ServedModelStateDeployment) Type() string
Type always returns ServedModelStateDeployment to satisfy [pflag.Value] interface
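Callers often want to collapse the five ServedModelStateDeployment values into a coarser status. One possible grouping, based only on the descriptions above, is sketched here. The function phase and its labels "serving", "pending", and "stopped" are illustrative assumptions, not SDK concepts.

```go
package main

import "fmt"

// phase groups a deployment state into a coarse, hypothetical label.
func phase(deployment string) string {
	switch deployment {
	case "DEPLOYMENT_READY":
		// Ready to receive traffic.
		return "serving"
	case "DEPLOYMENT_CREATING", "DEPLOYMENT_RECOVERING":
		// Still being created, or trying to get back to a ready state.
		return "pending"
	case "DEPLOYMENT_FAILED", "DEPLOYMENT_ABORTED":
		// Bring-up errored, or was terminated because of another failure.
		return "stopped"
	default:
		return "unknown"
	}
}

func main() {
	for _, s := range []string{"DEPLOYMENT_READY", "DEPLOYMENT_RECOVERING", "DEPLOYMENT_ABORTED"} {
		fmt.Println(s, "->", phase(s))
	}
}
```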
type ServerLogsResponse ¶
type ServerLogsResponse struct {
	// The most recent log lines of the model server processing invocation
	// requests.
	Logs string `json:"logs"`
}
type ServingEndpoint ¶
type ServingEndpoint struct {
	// The config that is currently being served by the endpoint.
	Config *EndpointCoreConfigSummary `json:"config,omitempty"`
	// The timestamp when the endpoint was created in Unix time.
	CreationTimestamp int64 `json:"creation_timestamp,omitempty"`
	// The email of the user who created the serving endpoint.
	Creator string `json:"creator,omitempty"`
	// System-generated ID of the endpoint. This is used to refer to the
	// endpoint in the Permissions API
	Id string `json:"id,omitempty"`
	// The timestamp when the endpoint was last updated by a user in Unix time.
	LastUpdatedTimestamp int64 `json:"last_updated_timestamp,omitempty"`
	// The name of the serving endpoint.
	Name string `json:"name,omitempty"`
	// Information corresponding to the state of the serving endpoint.
	State *EndpointState `json:"state,omitempty"`
}
type ServingEndpointDetailed ¶
type ServingEndpointDetailed struct {
	// The config that is currently being served by the endpoint.
	Config *EndpointCoreConfigOutput `json:"config,omitempty"`
	// The timestamp when the endpoint was created in Unix time.
	CreationTimestamp int64 `json:"creation_timestamp,omitempty"`
	// The email of the user who created the serving endpoint.
	Creator string `json:"creator,omitempty"`
	// System-generated ID of the endpoint. This is used to refer to the
	// endpoint in the Permissions API
	Id string `json:"id,omitempty"`
	// The timestamp when the endpoint was last updated by a user in Unix time.
	LastUpdatedTimestamp int64 `json:"last_updated_timestamp,omitempty"`
	// The name of the serving endpoint.
	Name string `json:"name,omitempty"`
	// The config that the endpoint is attempting to update to.
	PendingConfig *EndpointPendingConfig `json:"pending_config,omitempty"`
	// The permission level of the principal making the request.
	PermissionLevel ServingEndpointDetailedPermissionLevel `json:"permission_level,omitempty"`
	// Information corresponding to the state of the serving endpoint.
	State *EndpointState `json:"state,omitempty"`
}
type ServingEndpointDetailedPermissionLevel ¶
type ServingEndpointDetailedPermissionLevel string
The permission level of the principal making the request.
const ServingEndpointDetailedPermissionLevelCanManage ServingEndpointDetailedPermissionLevel = `CAN_MANAGE`
const ServingEndpointDetailedPermissionLevelCanQuery ServingEndpointDetailedPermissionLevel = `CAN_QUERY`
const ServingEndpointDetailedPermissionLevelCanView ServingEndpointDetailedPermissionLevel = `CAN_VIEW`
func (*ServingEndpointDetailedPermissionLevel) Set ¶
func (f *ServingEndpointDetailedPermissionLevel) Set(v string) error
Set raw string value and validate it against allowed values
func (*ServingEndpointDetailedPermissionLevel) String ¶
func (f *ServingEndpointDetailedPermissionLevel) String() string
String representation for fmt.Print
func (*ServingEndpointDetailedPermissionLevel) Type ¶
func (f *ServingEndpointDetailedPermissionLevel) Type() string
Type always returns ServingEndpointDetailedPermissionLevel to satisfy [pflag.Value] interface
type ServingEndpointsAPI ¶
type ServingEndpointsAPI struct {
// contains filtered or unexported fields
}
The Serving Endpoints API allows you to create, update, and delete model serving endpoints.
You can use a serving endpoint to serve models from the Databricks Model Registry or from Unity Catalog. Endpoints expose the underlying models as scalable REST API endpoints using serverless compute. This means the endpoints and associated compute resources are fully managed by Databricks and will not appear in your cloud account. A serving endpoint can consist of one or more MLflow models from the Databricks Model Registry, called served models. A serving endpoint can have at most ten served models. You can configure traffic settings to define how requests should be routed to your served models behind an endpoint. Additionally, you can configure the scale of resources that should be applied to each served model.
func NewServingEndpoints ¶
func NewServingEndpoints(client *client.DatabricksClient) *ServingEndpointsAPI
func (*ServingEndpointsAPI) BuildLogs ¶
func (a *ServingEndpointsAPI) BuildLogs(ctx context.Context, request BuildLogsRequest) (*BuildLogsResponse, error)
Retrieve the logs associated with building the model's environment for a given serving endpoint's served model.
Retrieves the build logs associated with the provided served model.
func (*ServingEndpointsAPI) BuildLogsByNameAndServedModelName ¶
func (a *ServingEndpointsAPI) BuildLogsByNameAndServedModelName(ctx context.Context, name string, servedModelName string) (*BuildLogsResponse, error)
Retrieve the logs associated with building the model's environment for a given serving endpoint's served model.
Retrieves the build logs associated with the provided served model.
func (*ServingEndpointsAPI) Create ¶
func (a *ServingEndpointsAPI) Create(ctx context.Context, createServingEndpoint CreateServingEndpoint) (*WaitGetServingEndpointNotUpdating[ServingEndpointDetailed], error)
Create a new serving endpoint.
func (*ServingEndpointsAPI) CreateAndWait
deprecated
func (a *ServingEndpointsAPI) CreateAndWait(ctx context.Context, createServingEndpoint CreateServingEndpoint, options ...retries.Option[ServingEndpointDetailed]) (*ServingEndpointDetailed, error)
Calls ServingEndpointsAPI.Create and waits for the endpoint to reach the NOT_UPDATING state.
You can override the default timeout of 20 minutes by adding the retries.Timeout[ServingEndpointDetailed](60*time.Minute) functional option.
Deprecated: use ServingEndpointsAPI.Create.Get() or ServingEndpointsAPI.WaitGetServingEndpointNotUpdating
func (*ServingEndpointsAPI) Delete ¶
func (a *ServingEndpointsAPI) Delete(ctx context.Context, request DeleteServingEndpointRequest) error
Delete a serving endpoint.
func (*ServingEndpointsAPI) DeleteByName ¶
func (a *ServingEndpointsAPI) DeleteByName(ctx context.Context, name string) error
Delete a serving endpoint.
func (*ServingEndpointsAPI) ExportMetrics ¶
func (a *ServingEndpointsAPI) ExportMetrics(ctx context.Context, request ExportMetricsRequest) error
Retrieve the metrics associated with a serving endpoint.
Retrieves the metrics associated with the provided serving endpoint in either Prometheus or OpenMetrics exposition format.
func (*ServingEndpointsAPI) ExportMetricsByName ¶
func (a *ServingEndpointsAPI) ExportMetricsByName(ctx context.Context, name string) error
Retrieve the metrics associated with a serving endpoint.
Retrieves the metrics associated with the provided serving endpoint in either Prometheus or OpenMetrics exposition format.
func (*ServingEndpointsAPI) Get ¶
func (a *ServingEndpointsAPI) Get(ctx context.Context, request GetServingEndpointRequest) (*ServingEndpointDetailed, error)
Get a single serving endpoint.
Retrieves the details for a single serving endpoint.
func (*ServingEndpointsAPI) GetByName ¶
func (a *ServingEndpointsAPI) GetByName(ctx context.Context, name string) (*ServingEndpointDetailed, error)
Get a single serving endpoint.
Retrieves the details for a single serving endpoint.
func (*ServingEndpointsAPI) Impl ¶
func (a *ServingEndpointsAPI) Impl() ServingEndpointsService
Impl returns low-level ServingEndpoints API implementation
func (*ServingEndpointsAPI) ListAll ¶
func (a *ServingEndpointsAPI) ListAll(ctx context.Context) ([]ServingEndpoint, error)
Retrieve all serving endpoints.
This method is generated by Databricks SDK Code Generator.
func (*ServingEndpointsAPI) Logs ¶
func (a *ServingEndpointsAPI) Logs(ctx context.Context, request LogsRequest) (*ServerLogsResponse, error)
Retrieve the most recent log lines associated with a given serving endpoint's served model.
Retrieves the service logs associated with the provided served model.
func (*ServingEndpointsAPI) LogsByNameAndServedModelName ¶
func (a *ServingEndpointsAPI) LogsByNameAndServedModelName(ctx context.Context, name string, servedModelName string) (*ServerLogsResponse, error)
Retrieve the most recent log lines associated with a given serving endpoint's served model.
Retrieves the service logs associated with the provided served model.
func (*ServingEndpointsAPI) Query ¶
func (a *ServingEndpointsAPI) Query(ctx context.Context, request QueryRequest) (*QueryEndpointResponse, error)
Query a serving endpoint with provided model input.
func (*ServingEndpointsAPI) UpdateConfig ¶
func (a *ServingEndpointsAPI) UpdateConfig(ctx context.Context, endpointCoreConfigInput EndpointCoreConfigInput) (*WaitGetServingEndpointNotUpdating[ServingEndpointDetailed], error)
Update a serving endpoint with a new config.
Updates any combination of the serving endpoint's served models, the compute configuration of those served models, and the endpoint's traffic config. An endpoint that already has an update in progress cannot be updated until the current update completes or fails.
func (*ServingEndpointsAPI) UpdateConfigAndWait
deprecated
func (a *ServingEndpointsAPI) UpdateConfigAndWait(ctx context.Context, endpointCoreConfigInput EndpointCoreConfigInput, options ...retries.Option[ServingEndpointDetailed]) (*ServingEndpointDetailed, error)
Calls ServingEndpointsAPI.UpdateConfig and waits for the endpoint to reach the NOT_UPDATING state.
You can override the default timeout of 20 minutes by adding the retries.Timeout[ServingEndpointDetailed](60*time.Minute) functional option.
Deprecated: use ServingEndpointsAPI.UpdateConfig.Get() or ServingEndpointsAPI.WaitGetServingEndpointNotUpdating
func (*ServingEndpointsAPI) WaitGetServingEndpointNotUpdating ¶
func (a *ServingEndpointsAPI) WaitGetServingEndpointNotUpdating(ctx context.Context, name string, timeout time.Duration, callback func(*ServingEndpointDetailed)) (*ServingEndpointDetailed, error)
WaitGetServingEndpointNotUpdating repeatedly calls ServingEndpointsAPI.Get and waits for the endpoint to reach the NOT_UPDATING state.
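The polling behavior described here can be pictured as a loop with a deadline and an optional progress callback. The following sketch is not the SDK implementation: endpoint, getFunc, waitNotUpdating, fakeGetter, and demo are hypothetical names, the explicit poll interval is an added parameter, and treating UPDATE_FAILED as an error is an assumption.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// endpoint is a hypothetical stand-in for ServingEndpointDetailed.
type endpoint struct {
	Name         string
	ConfigUpdate string // IN_PROGRESS, NOT_UPDATING, or UPDATE_FAILED
}

// getFunc is a hypothetical stand-in for ServingEndpointsAPI.Get.
type getFunc func(ctx context.Context, name string) (*endpoint, error)

// waitNotUpdating polls get until the endpoint reports NOT_UPDATING, the
// update fails, or the timeout elapses. callback may be nil.
func waitNotUpdating(ctx context.Context, get getFunc, name string,
	timeout, interval time.Duration, callback func(*endpoint)) (*endpoint, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	for {
		e, err := get(ctx, name)
		if err != nil {
			return nil, err
		}
		if callback != nil {
			callback(e)
		}
		switch e.ConfigUpdate {
		case "NOT_UPDATING":
			return e, nil
		case "UPDATE_FAILED":
			// Assumption: a failed update is surfaced as an error.
			return nil, fmt.Errorf("endpoint %q: config update failed", name)
		}
		select {
		case <-ctx.Done():
			return nil, fmt.Errorf("endpoint %q: timed out in state %s: %w",
				name, e.ConfigUpdate, ctx.Err())
		case <-time.After(interval):
		}
	}
}

// fakeGetter returns the given states in order, repeating the last one.
func fakeGetter(states ...string) getFunc {
	i := 0
	return func(_ context.Context, name string) (*endpoint, error) {
		s := states[i]
		if i < len(states)-1 {
			i++
		}
		return &endpoint{Name: name, ConfigUpdate: s}, nil
	}
}

// demo runs the loop against a fake and returns the final state and the
// number of polls observed by the callback.
func demo(states ...string) (string, int, error) {
	polls := 0
	e, err := waitNotUpdating(context.Background(), fakeGetter(states...), "demo",
		50*time.Millisecond, time.Millisecond, func(*endpoint) { polls++ })
	if err != nil {
		return "", polls, err
	}
	return e.ConfigUpdate, polls, nil
}

func main() {
	state, polls, err := demo("IN_PROGRESS", "IN_PROGRESS", "NOT_UPDATING")
	fmt.Println(state, polls, err)
}
```

Running the loop against a fake getter keeps the example independent of a workspace; with the real client, the getter would wrap a call such as GetByName.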
func (*ServingEndpointsAPI) WithImpl ¶
func (a *ServingEndpointsAPI) WithImpl(impl ServingEndpointsService) *ServingEndpointsAPI
WithImpl can be used to override the low-level API implementation for unit testing, with github.com/golang/mock or other mocking frameworks.
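The override pattern can be illustrated without a mocking framework by swapping in a hand-written fake. The sketch below uses local stand-ins only: service is a one-method cut of ServingEndpointsService, api mimics the wrapper, and fakeService and lookup are hypothetical test helpers. None of this is the SDK's own code.

```go
package main

import (
	"context"
	"fmt"
)

// endpoint is a hypothetical stand-in for ServingEndpointDetailed.
type endpoint struct {
	Name string
}

// service is a trimmed-down stand-in for ServingEndpointsService.
type service interface {
	Get(ctx context.Context, name string) (*endpoint, error)
}

// api is a stand-in for ServingEndpointsAPI: a thin wrapper around a service.
type api struct {
	impl service
}

// WithImpl swaps the low-level implementation and returns the same wrapper.
func (a *api) WithImpl(impl service) *api {
	a.impl = impl
	return a
}

// GetByName delegates to whatever implementation is installed.
func (a *api) GetByName(ctx context.Context, name string) (*endpoint, error) {
	return a.impl.Get(ctx, name)
}

// fakeService records the names it was asked for and returns canned data.
type fakeService struct {
	calls []string
}

func (f *fakeService) Get(_ context.Context, name string) (*endpoint, error) {
	f.calls = append(f.calls, name)
	return &endpoint{Name: name}, nil
}

// lookup exercises the wrapper against the fake and reports what happened.
func lookup(name string) (string, []string, error) {
	fake := &fakeService{}
	a := (&api{}).WithImpl(fake)
	e, err := a.GetByName(context.Background(), name)
	if err != nil {
		return "", fake.calls, err
	}
	return e.Name, fake.calls, nil
}

func main() {
	got, calls, err := lookup("demo")
	fmt.Println(got, calls, err)
}
```

A generated mock would replace fakeService in a real test; the wrapper code under test stays the same either way.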
type ServingEndpointsService ¶
type ServingEndpointsService interface {
	// Retrieve the logs associated with building the model's environment for a
	// given serving endpoint's served model.
	//
	// Retrieves the build logs associated with the provided served model.
	BuildLogs(ctx context.Context, request BuildLogsRequest) (*BuildLogsResponse, error)
	// Create a new serving endpoint.
	Create(ctx context.Context, request CreateServingEndpoint) (*ServingEndpointDetailed, error)
	// Delete a serving endpoint.
	Delete(ctx context.Context, request DeleteServingEndpointRequest) error
	// Retrieve the metrics associated with a serving endpoint.
	//
	// Retrieves the metrics associated with the provided serving endpoint in
	// either Prometheus or OpenMetrics exposition format.
	ExportMetrics(ctx context.Context, request ExportMetricsRequest) error
	// Get a single serving endpoint.
	//
	// Retrieves the details for a single serving endpoint.
	Get(ctx context.Context, request GetServingEndpointRequest) (*ServingEndpointDetailed, error)
	// Retrieve all serving endpoints.
	//
	// Use ListAll() to get all ServingEndpoint instances
	List(ctx context.Context) (*ListEndpointsResponse, error)
	// Retrieve the most recent log lines associated with a given serving
	// endpoint's served model.
	//
	// Retrieves the service logs associated with the provided served model.
	Logs(ctx context.Context, request LogsRequest) (*ServerLogsResponse, error)
	// Query a serving endpoint with provided model input.
	Query(ctx context.Context, request QueryRequest) (*QueryEndpointResponse, error)
	// Update a serving endpoint with a new config.
	//
	// Updates any combination of the serving endpoint's served models, the
	// compute configuration of those served models, and the endpoint's traffic
	// config. An endpoint that already has an update in progress cannot be
	// updated until the current update completes or fails.
	UpdateConfig(ctx context.Context, request EndpointCoreConfigInput) (*ServingEndpointDetailed, error)
}
The Serving Endpoints API allows you to create, update, and delete model serving endpoints.
You can use a serving endpoint to serve models from the Databricks Model Registry or from Unity Catalog. Endpoints expose the underlying models as scalable REST API endpoints using serverless compute. This means the endpoints and associated compute resources are fully managed by Databricks and will not appear in your cloud account. A serving endpoint can consist of one or more MLflow models from the Databricks Model Registry, called served models. A serving endpoint can have at most ten served models. You can configure traffic settings to define how requests should be routed to your served models behind an endpoint. Additionally, you can configure the scale of resources that should be applied to each served model.
type TrafficConfig ¶
type TrafficConfig struct {
	// The list of routes that define traffic to each served model.
	Routes []Route `json:"routes,omitempty"`
}
type WaitGetServingEndpointNotUpdating ¶
type WaitGetServingEndpointNotUpdating[R any] struct {
	Response *R
	Name     string `json:"name"`
	// contains filtered or unexported fields
}
WaitGetServingEndpointNotUpdating is a wrapper that calls ServingEndpointsAPI.WaitGetServingEndpointNotUpdating and waits to reach NOT_UPDATING state.
func (*WaitGetServingEndpointNotUpdating[R]) Get ¶
func (w *WaitGetServingEndpointNotUpdating[R]) Get() (*ServingEndpointDetailed, error)
Get the ServingEndpointDetailed with the default timeout of 20 minutes.
func (*WaitGetServingEndpointNotUpdating[R]) GetWithTimeout ¶
func (w *WaitGetServingEndpointNotUpdating[R]) GetWithTimeout(timeout time.Duration) (*ServingEndpointDetailed, error)
Get the ServingEndpointDetailed with custom timeout.
func (*WaitGetServingEndpointNotUpdating[R]) OnProgress ¶
func (w *WaitGetServingEndpointNotUpdating[R]) OnProgress(callback func(*ServingEndpointDetailed)) *WaitGetServingEndpointNotUpdating[R]
OnProgress invokes a callback every time it polls for the status update.
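Get, GetWithTimeout, and OnProgress together form a small fluent wrapper around a polling function. As a hedged sketch of how such a wrapper could be structured, the code below uses local stand-ins: waiter, its unexported poll and callback fields, newFakeWaiter, and progressLog are all assumptions, and only the 20-minute default comes from this page.

```go
package main

import (
	"fmt"
	"time"
)

// endpoint is a hypothetical stand-in for ServingEndpointDetailed.
type endpoint struct {
	Name         string
	ConfigUpdate string
}

// waiter is a stand-in for WaitGetServingEndpointNotUpdating[R].
type waiter[R any] struct {
	Response *R
	Name     string
	poll     func(timeout time.Duration, callback func(*endpoint)) (*endpoint, error)
	callback func(*endpoint)
}

// OnProgress registers a callback and returns the waiter for chaining.
func (w *waiter[R]) OnProgress(callback func(*endpoint)) *waiter[R] {
	w.callback = callback
	return w
}

// Get waits with the documented default timeout of 20 minutes.
func (w *waiter[R]) Get() (*endpoint, error) {
	return w.GetWithTimeout(20 * time.Minute)
}

// GetWithTimeout waits with a caller-supplied timeout.
func (w *waiter[R]) GetWithTimeout(timeout time.Duration) (*endpoint, error) {
	return w.poll(timeout, w.callback)
}

// newFakeWaiter builds a waiter whose poll function replays two states
// instead of calling a real service; the timeout is ignored by the fake.
func newFakeWaiter(name string) *waiter[endpoint] {
	return &waiter[endpoint]{
		Response: &endpoint{Name: name, ConfigUpdate: "IN_PROGRESS"},
		Name:     name,
		poll: func(_ time.Duration, callback func(*endpoint)) (*endpoint, error) {
			for _, s := range []string{"IN_PROGRESS", "NOT_UPDATING"} {
				e := &endpoint{Name: name, ConfigUpdate: s}
				if callback != nil {
					callback(e)
				}
				if s == "NOT_UPDATING" {
					return e, nil
				}
			}
			return nil, fmt.Errorf("endpoint %q never settled", name)
		},
	}
}

// progressLog collects every state the callback observes during the wait.
func progressLog(name string) ([]string, error) {
	var seen []string
	_, err := newFakeWaiter(name).OnProgress(func(e *endpoint) {
		seen = append(seen, e.ConfigUpdate)
	}).Get()
	return seen, err
}

func main() {
	seen, err := progressLog("demo")
	fmt.Println(seen, err)
}
```

Returning the waiter from OnProgress is what allows the chained form, where a call such as Create would be followed by OnProgress and then Get.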