Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
View Source
var ErrNotAccountClient = errors.New("invalid Databricks Account configuration")
Functions ¶
func Must ¶
Must panics if error is not nil. It's intended to be used with databricks.NewWorkspaceClient and databricks.NewAccountClient.
func WithProduct ¶
func WithProduct(name, version string)
WithProduct is expected to be set by developers to differentiate their app from others.
Example setting is:
func init() { databricks.WithProduct("your-product", "0.0.1") }
Types ¶
type AccountClient ¶
type AccountClient struct { Config *config.Config // These APIs manage access rules on resources in an account. Currently, // only grant rules are supported. A grant rule specifies a role assigned to // a set of principals. A list of rules attached to a resource is called a // rule set. AccessControl *iam.AccountAccessControlAPI // This API allows you to download billable usage logs for the specified // account and date range. This feature works with all account types. BillableUsage *billing.BillableUsageAPI // These APIs manage budget configuration including notifications for // exceeding a budget for a period. They can also retrieve the status of // each budget. Budgets *billing.BudgetsAPI // These APIs manage credential configurations for this workspace. // Databricks needs access to a cross-account service IAM role in your AWS // account so that Databricks can deploy clusters in the appropriate VPC for // the new workspace. A credential configuration encapsulates this role // information, and its ID is used when creating a new workspace. Credentials *provisioning.CredentialsAPI // These APIs enable administrators to manage custom oauth app integrations, // which is required for adding/using Custom OAuth App Integration like // Tableau Cloud for Databricks in AWS cloud. // // **Note:** You can only add/use the OAuth custom application integrations // when OAuth enrollment status is enabled. For more details see // :method:OAuthEnrollment/create CustomAppIntegration *oauth2.CustomAppIntegrationAPI // These APIs manage encryption key configurations for this workspace // (optional). A key configuration encapsulates the AWS KMS key information // and some information about how the key configuration can be used. There // are two possible uses for key configurations: // // * Managed services: A key configuration can be used to encrypt a // workspace's notebook and secret data in the control plane, as well as // Databricks SQL queries and query history. * Storage: A key configuration // can be used to encrypt a workspace's DBFS and EBS data in the data plane. // // In both of these cases, the key configuration's ID is used when creating // a new workspace. This Preview feature is available if your account is on // the E2 version of the platform. Updating a running workspace with // workspace storage encryption requires that the workspace is on the E2 // version of the platform. If you have an older workspace, it might not be // on the E2 version of the platform. If you are not sure, contact your // Databricks representative. EncryptionKeys *provisioning.EncryptionKeysAPI // Groups simplify identity management, making it easier to assign access to // Databricks account, data, and other securable objects. // // It is best practice to assign access to workspaces and access-control // policies in Unity Catalog to groups, instead of to users individually. // All Databricks account identities can be assigned as members of groups, // and members inherit permissions that are assigned to their group. Groups *iam.AccountGroupsAPI // The Accounts IP Access List API enables account admins to configure IP // access lists for access to the account console. // // Account IP Access Lists affect web application access and REST API access // to the account console and account APIs. If the feature is disabled for // the account, all access is allowed for this account. There is support for // allow lists (inclusion) and block lists (exclusion). // // When a connection is attempted: 1. **First, all block lists are // checked.** If the connection IP address matches any block list, the // connection is rejected. 2. **If the connection was not rejected by block // lists**, the IP address is compared with the allow lists. // // If there is at least one allow list for the account, the connection is // allowed only if the IP address matches an allow list. If there are no // allow lists for the account, all IP addresses are allowed. // // For all allow lists and block lists combined, the account supports a // maximum of 1000 IP/CIDR values, where one CIDR counts as a single value. // // After changes to the account-level IP access lists, it can take a few // minutes for changes to take effect. IpAccessLists *settings.AccountIpAccessListsAPI // These APIs manage log delivery configurations for this account. The two // supported log types for this API are _billable usage logs_ and _audit // logs_. This feature is in Public Preview. This feature works with all // account ID types. // // Log delivery works with all account types. However, if your account is on // the E2 version of the platform or on a select custom plan that allows // multiple workspaces per account, you can optionally configure different // storage destinations for each workspace. Log delivery status is also // provided to know the latest status of log delivery attempts. The // high-level flow of billable usage delivery: // // 1. **Create storage**: In AWS, [create a new AWS S3 bucket] with a // specific bucket policy. Using Databricks APIs, call the Account API to // create a [storage configuration object](#operation/create-storage-config) // that uses the bucket name. 2. **Create credentials**: In AWS, create the // appropriate AWS IAM role. For full details, including the required IAM // role policies and trust relationship, see [Billable usage log delivery]. // Using Databricks APIs, call the Account API to create a [credential // configuration object](#operation/create-credential-config) that uses the // IAM role's ARN. 3. **Create log delivery configuration**: Using // Databricks APIs, call the Account API to [create a log delivery // configuration](#operation/create-log-delivery-config) that uses the // credential and storage configuration objects from previous steps. You can // specify if the logs should include all events of that log type in your // account (_Account level_ delivery) or only events for a specific set of // workspaces (_workspace level_ delivery). Account level log delivery // applies to all current and future workspaces plus account level logs, // while workspace level log delivery solely delivers logs related to the // specified workspaces. You can create multiple types of delivery // configurations per account. // // For billable usage delivery: * For more information about billable usage // logs, see [Billable usage log delivery]. For the CSV schema, see the // [Usage page]. * The delivery location is // `<bucket-name>/<prefix>/billable-usage/csv/`, where `<prefix>` is the // name of the optional delivery path prefix you set up during log delivery // configuration. Files are named // `workspaceId=<workspace-id>-usageMonth=<month>.csv`. * All billable usage // logs apply to specific workspaces (_workspace level_ logs). You can // aggregate usage for your entire account by creating an _account level_ // delivery configuration that delivers logs for all current and future // workspaces in your account. * The files are delivered daily by // overwriting the month's CSV file for each workspace. // // For audit log delivery: * For more information about about audit log // delivery, see [Audit log delivery], which includes information about the // used JSON schema. * The delivery location is // `<bucket-name>/<delivery-path-prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/auditlogs_<internal-id>.json`. // Files may get overwritten with the same content multiple times to achieve // exactly-once delivery. * If the audit log delivery configuration included // specific workspace IDs, only _workspace-level_ audit logs for those // workspaces are delivered. If the log delivery configuration applies to // the entire account (_account level_ delivery configuration), the audit // log delivery includes workspace-level audit logs for all workspaces in // the account as well as account-level audit logs. See [Audit log delivery] // for details. * Auditable events are typically available in logs within 15 // minutes. // // [Audit log delivery]: https://docs.databricks.com/administration-guide/account-settings/audit-logs.html // [Billable usage log delivery]: https://docs.databricks.com/administration-guide/account-settings/billable-usage-delivery.html // [Usage page]: https://docs.databricks.com/administration-guide/account-settings/usage.html // [create a new AWS S3 bucket]: https://docs.databricks.com/administration-guide/account-api/aws-storage.html LogDelivery *billing.LogDeliveryAPI // These APIs manage metastore assignments to a workspace. MetastoreAssignments *catalog.AccountMetastoreAssignmentsAPI // These APIs manage Unity Catalog metastores for an account. A metastore // contains catalogs that can be associated with workspaces Metastores *catalog.AccountMetastoresAPI // These APIs manage network configurations for customer-managed VPCs // (optional). Its ID is used when creating a new workspace if you use // customer-managed VPCs. Networks *provisioning.NetworksAPI // These APIs enable administrators to enroll OAuth for their accounts, // which is required for adding/using any OAuth published/custom application // integration. // // **Note:** Your account must be on the E2 version to use these APIs, this // is because OAuth is only supported on the E2 version. OAuthEnrollment *oauth2.OAuthEnrollmentAPI // These APIs manage private access settings for this account. PrivateAccess *provisioning.PrivateAccessAPI // These APIs enable administrators to manage published oauth app // integrations, which is required for adding/using Published OAuth App // Integration like Tableau Cloud for Databricks in AWS cloud. // // **Note:** You can only add/use the OAuth published application // integrations when OAuth enrollment status is enabled. For more details // see :method:OAuthEnrollment/create PublishedAppIntegration *oauth2.PublishedAppIntegrationAPI // These APIs enable administrators to manage service principal secrets. // // You can use the generated secrets to obtain OAuth access tokens for a // service principal, which can then be used to access Databricks Accounts // and Workspace APIs. For more information, see [Authentication using OAuth // tokens for service principals], // // In addition, the generated secrets can be used to configure the // Databricks Terraform Provider to authenticate with the service principal. // For more information, see [Databricks Terraform Provider]. // // [Authentication using OAuth tokens for service principals]: https://docs.databricks.com/dev-tools/authentication-oauth.html // [Databricks Terraform Provider]: https://github.com/xuxiaoshuo/terraform-provider-databricks/blob/master/docs/index.md#authenticating-with-service-principal ServicePrincipalSecrets *oauth2.ServicePrincipalSecretsAPI // Identities for use with jobs, automated tools, and systems such as // scripts, apps, and CI/CD platforms. Databricks recommends creating // service principals to run production jobs or modify production data. If // all processes that act on production data run with service principals, // interactive users do not need any write, delete, or modify privileges in // production. This eliminates the risk of a user overwriting production // data by accident. ServicePrincipals *iam.AccountServicePrincipalsAPI // The Personal Compute enablement setting lets you control which users can // use the Personal Compute default policy to create compute resources. By // default all users in all workspaces have access (ON), but you can change // the setting to instead let individual workspaces configure access control // (DELEGATE). // // There is only one instance of this setting per account. Since this // setting has a default value, this setting is present on all accounts even // though it's never set on a given account. Deletion reverts the value of // the setting back to the default value. Settings *settings.AccountSettingsAPI // These APIs manage storage configurations for this workspace. A root // storage S3 bucket in your account is required to store objects like // cluster logs, notebook revisions, and job results. You can also use the // root storage S3 bucket for storage of non-production DBFS data. A storage // configuration encapsulates this bucket information, and its ID is used // when creating a new workspace. Storage *provisioning.StorageAPI // These APIs manage storage credentials for a particular metastore. StorageCredentials *catalog.AccountStorageCredentialsAPI // User identities recognized by Databricks and represented by email // addresses. // // Databricks recommends using SCIM provisioning to sync users and groups // automatically from your identity provider to your Databricks account. // SCIM streamlines onboarding a new employee or team by using your identity // provider to create users and groups in Databricks account and give them // the proper level of access. When a user leaves your organization or no // longer needs access to Databricks account, admins can terminate the user // in your identity provider and that user’s account will also be removed // from Databricks account. This ensures a consistent offboarding process // and prevents unauthorized users from accessing sensitive data. Users *iam.AccountUsersAPI // These APIs manage VPC endpoint configurations for this account. VpcEndpoints *provisioning.VpcEndpointsAPI // The Workspace Permission Assignment API allows you to manage workspace // permissions for principals in your account. WorkspaceAssignment *iam.WorkspaceAssignmentAPI // These APIs manage workspaces for this account. A Databricks workspace is // an environment for accessing all of your Databricks assets. The workspace // organizes objects (notebooks, libraries, and experiments) into folders, // and provides access to data and computational resources such as clusters // and jobs. // // These endpoints are available if your account is on the E2 version of the // platform or on a select custom plan that allows multiple workspaces per // account. Workspaces *provisioning.WorkspacesAPI }
func NewAccountClient ¶
func NewAccountClient(c ...*Config) (*AccountClient, error)
NewAccountClient creates new Databricks SDK client for Accounts or returns error in case configuration is wrong
type WorkspaceClient ¶
type WorkspaceClient struct { Config *config.Config Files *files.FilesAPI // These APIs manage access rules on resources in an account. Currently, // only grant rules are supported. A grant rule specifies a role assigned to // a set of principals. A list of rules attached to a resource is called a // rule set. A workspace must belong to an account for these APIs to work. AccountAccessControlProxy *iam.AccountAccessControlProxyAPI // The alerts API can be used to perform CRUD operations on alerts. An alert // is a Databricks SQL object that periodically runs a query, evaluates a // condition of its result, and notifies one or more users and/or // notification destinations if the condition was met. Alerts can be // scheduled using the `sql_task` type of the Jobs API, e.g. // :method:jobs/create. Alerts *sql.AlertsAPI // A catalog is the first layer of Unity Catalog’s three-level namespace. // It’s used to organize your data assets. Users can see all catalogs on // which they have been assigned the USE_CATALOG data permission. // // In Unity Catalog, admins and data stewards manage users and their access // to data centrally across all of the workspaces in a Databricks account. // Users in different workspaces can share access to the same data, // depending on privileges granted centrally in Unity Catalog. Catalogs *catalog.CatalogsAPI // A clean room is a secure, privacy-protecting environment where two or // more parties can share sensitive enterprise data, including customer // data, for measurements, insights, activation and other use cases. // // To create clean rooms, you must be a metastore admin or a user with the // **CREATE_CLEAN_ROOM** privilege. CleanRooms *sharing.CleanRoomsAPI // Cluster policy limits the ability to configure clusters based on a set of // rules. The policy rules limit the attributes or attribute values // available for cluster creation. Cluster policies have ACLs that limit // their use to specific users and groups. // // Cluster policies let you limit users to create clusters with prescribed // settings, simplify the user interface and enable more users to create // their own clusters (by fixing and hiding some values), control cost by // limiting per cluster maximum cost (by setting limits on attributes whose // values contribute to hourly price). // // Cluster policy permissions limit which policies a user can select in the // Policy drop-down when the user creates a cluster: - A user who has // cluster create permission can select the Unrestricted policy and create // fully-configurable clusters. - A user who has both cluster create // permission and access to cluster policies can select the Unrestricted // policy and policies they have access to. - A user that has access to only // cluster policies, can select the policies they have access to. // // If no policies have been created in the workspace, the Policy drop-down // does not display. // // Only admin users can create, edit, and delete policies. Admin users also // have access to all policies. ClusterPolicies *compute.ClusterPoliciesAPI // The Clusters API allows you to create, start, edit, list, terminate, and // delete clusters. // // Databricks maps cluster node instance types to compute units known as // DBUs. See the instance type pricing page for a list of the supported // instance types and their corresponding DBUs. // // A Databricks cluster is a set of computation resources and configurations // on which you run data engineering, data science, and data analytics // workloads, such as production ETL pipelines, streaming analytics, ad-hoc // analytics, and machine learning. // // You run these workloads as a set of commands in a notebook or as an // automated job. Databricks makes a distinction between all-purpose // clusters and job clusters. You use all-purpose clusters to analyze data // collaboratively using interactive notebooks. You use job clusters to run // fast and robust automated jobs. // // You can create an all-purpose cluster using the UI, CLI, or REST API. You // can manually terminate and restart an all-purpose cluster. Multiple users // can share such clusters to do collaborative interactive analysis. // // IMPORTANT: Databricks retains cluster configuration information for up to // 200 all-purpose clusters terminated in the last 30 days and up to 30 job // clusters recently terminated by the job scheduler. To keep an all-purpose // cluster configuration even after it has been terminated for more than 30 // days, an administrator can pin a cluster to the cluster list. Clusters *compute.ClustersAPI // This API allows execution of Python, Scala, SQL, or R commands on running // Databricks Clusters. CommandExecution *compute.CommandExecutionAPI // Connections allow for creating a connection to an external data source. // // A connection is an abstraction of an external data source that can be // connected from Databricks Compute. Creating a connection object is the // first step to managing external data sources within Unity Catalog, with // the second step being creating a data object (catalog, schema, or table) // using the connection. Data objects derived from a connection can be // written to or read from similar to other Unity Catalog data objects based // on cloud storage. Users may create different types of connections with // each connection having a unique set of configuration options to support // credential management and other settings. Connections *catalog.ConnectionsAPI // This API allows retrieving information about currently authenticated user // or service principal. CurrentUser *iam.CurrentUserAPI // In general, there is little need to modify dashboards using the API. // However, it can be useful to use dashboard objects to look-up a // collection of related query IDs. The API can also be used to duplicate // multiple dashboards at once since you can get a dashboard definition with // a GET request and then POST it to create a new one. Dashboards can be // scheduled using the `sql_task` type of the Jobs API, e.g. // :method:jobs/create. Dashboards *sql.DashboardsAPI // This API is provided to assist you in making new query objects. When // creating a query object, you may optionally specify a `data_source_id` // for the SQL warehouse against which it will run. If you don't already // know the `data_source_id` for your desired SQL warehouse, this API will // help you find it. // // This API does not support searches. It returns the full list of SQL // warehouses in your workspace. We advise you to use any text editor, REST // client, or `grep` to search the response from this API for the name of // your SQL warehouse as it appears in Databricks SQL. DataSources *sql.DataSourcesAPI // DBFS API makes it simple to interact with various data sources without // having to include a users credentials every time to read a file. Dbfs *files.DbfsAPI // The SQL Permissions API is similar to the endpoints of the // :method:permissions/set. However, this exposes only one endpoint, which // gets the Access Control List for a given object. You cannot modify any // permissions using this API. // // There are three levels of permission: // // - `CAN_VIEW`: Allows read-only access // // - `CAN_RUN`: Allows read access and run access (superset of `CAN_VIEW`) // // - `CAN_MANAGE`: Allows all actions: read, run, edit, delete, modify // permissions (superset of `CAN_RUN`) DbsqlPermissions *sql.DbsqlPermissionsAPI // Experiments are the primary unit of organization in MLflow; all MLflow // runs belong to an experiment. Each experiment lets you visualize, search, // and compare runs, as well as download run artifacts or metadata for // analysis in other tools. Experiments are maintained in a Databricks // hosted MLflow tracking server. // // Experiments are located in the workspace file tree. You manage // experiments using the same tools you use to manage other workspace // objects such as folders, notebooks, and libraries. Experiments *ml.ExperimentsAPI // An external location is an object that combines a cloud storage path with // a storage credential that authorizes access to the cloud storage path. // Each external location is subject to Unity Catalog access-control // policies that control which users and groups can access the credential. // If a user does not have access to an external location in Unity Catalog, // the request fails and Unity Catalog does not attempt to authenticate to // your cloud tenant on the user’s behalf. // // Databricks recommends using external locations rather than using storage // credentials directly. // // To create external locations, you must be a metastore admin or a user // with the **CREATE_EXTERNAL_LOCATION** privilege. ExternalLocations *catalog.ExternalLocationsAPI // Functions implement User-Defined Functions (UDFs) in Unity Catalog. // // The function implementation can be any SQL expression or Query, and it // can be invoked wherever a table reference is allowed in a query. In Unity // Catalog, a function resides at the same level as a table, so it can be // referenced with the form // __catalog_name__.__schema_name__.__function_name__. Functions *catalog.FunctionsAPI // Registers personal access token for Databricks to do operations on behalf // of the user. // // See [more info]. // // [more info]: https://docs.databricks.com/repos/get-access-tokens-from-git-provider.html GitCredentials *workspace.GitCredentialsAPI // The Global Init Scripts API enables Workspace administrators to configure // global initialization scripts for their workspace. These scripts run on // every node in every cluster in the workspace. // // **Important:** Existing clusters must be restarted to pick up any changes // made to global init scripts. Global init scripts are run in order. If the // init script returns with a bad exit code, the Apache Spark container // fails to launch and init scripts with later position are skipped. If // enough containers fail, the entire cluster fails with a // `GLOBAL_INIT_SCRIPT_FAILURE` error code. GlobalInitScripts *compute.GlobalInitScriptsAPI // In Unity Catalog, data is secure by default. Initially, users have no // access to data in a metastore. Access can be granted by either a // metastore admin, the owner of an object, or the owner of the catalog or // schema that contains the object. Securable objects in Unity Catalog are // hierarchical and privileges are inherited downward. // // Securable objects in Unity Catalog are hierarchical and privileges are // inherited downward. This means that granting a privilege on the catalog // automatically grants the privilege to all current and future objects // within the catalog. Similarly, privileges granted on a schema are // inherited by all current and future objects within that schema. Grants *catalog.GrantsAPI // Groups simplify identity management, making it easier to assign access to // Databricks workspace, data, and other securable objects. // // It is best practice to assign access to workspaces and access-control // policies in Unity Catalog to groups, instead of to users individually. // All Databricks workspace identities can be assigned as members of groups, // and members inherit permissions that are assigned to their group. Groups *iam.GroupsAPI // Instance Pools API are used to create, edit, delete and list instance // pools by using ready-to-use cloud instances which reduces a cluster start // and auto-scaling times. // // Databricks pools reduce cluster start and auto-scaling times by // maintaining a set of idle, ready-to-use instances. When a cluster is // attached to a pool, cluster nodes are created using the pool’s idle // instances. If the pool has no idle instances, the pool expands by // allocating a new instance from the instance provider in order to // accommodate the cluster’s request. When a cluster releases an instance, // it returns to the pool and is free for another cluster to use. Only // clusters attached to a pool can use that pool’s idle instances. // // You can specify a different pool for the driver node and worker nodes, or // use the same pool for both. // // Databricks does not charge DBUs while instances are idle in the pool. // Instance provider billing does apply. See pricing. InstancePools *compute.InstancePoolsAPI // The Instance Profiles API allows admins to add, list, and remove instance // profiles that users can launch clusters with. Regular users can list the // instance profiles available to them. See [Secure access to S3 buckets] // using instance profiles for more information. // // [Secure access to S3 buckets]: https://docs.databricks.com/administration-guide/cloud-configurations/aws/instance-profiles.html InstanceProfiles *compute.InstanceProfilesAPI // IP Access List enables admins to configure IP access lists. // // IP access lists affect web application access and REST API access to this // workspace only. If the feature is disabled for a workspace, all access is // allowed for this workspace. There is support for allow lists (inclusion) // and block lists (exclusion). // // When a connection is attempted: 1. **First, all block lists are // checked.** If the connection IP address matches any block list, the // connection is rejected. 2. **If the connection was not rejected by block // lists**, the IP address is compared with the allow lists. // // If there is at least one allow list for the workspace, the connection is // allowed only if the IP address matches an allow list. If there are no // allow lists for the workspace, all IP addresses are allowed. // // For all allow lists and block lists combined, the workspace supports a // maximum of 1000 IP/CIDR values, where one CIDR counts as a single value. // // After changes to the IP access list feature, it can take a few minutes // for changes to take effect. IpAccessLists *settings.IpAccessListsAPI // The Jobs API allows you to create, edit, and delete jobs. // // You can use a Databricks job to run a data processing or data analysis // task in a Databricks cluster with scalable resources. Your job can // consist of a single task or can be a large, multi-task workflow with // complex dependencies. Databricks manages the task orchestration, cluster // management, monitoring, and error reporting for all of your jobs. You can // run your jobs immediately or periodically through an easy-to-use // scheduling system. You can implement job tasks using notebooks, JARS, // Delta Live Tables pipelines, or Python, Scala, Spark submit, and Java // applications. // // You should never hard code secrets or store them in plain text. Use the // [Secrets CLI] to manage secrets in the [Databricks CLI]. Use the [Secrets // utility] to reference secrets in notebooks and jobs. // // [Databricks CLI]: https://docs.databricks.com/dev-tools/cli/index.html // [Secrets CLI]: https://docs.databricks.com/dev-tools/cli/secrets-cli.html // [Secrets utility]: https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-secrets Jobs *jobs.JobsAPI // The Libraries API allows you to install and uninstall libraries and get // the status of libraries on a cluster. // // To make third-party or custom code available to notebooks and jobs // running on your clusters, you can install a library. Libraries can be // written in Python, Java, Scala, and R. You can upload Java, Scala, and // Python libraries and point to external packages in PyPI, Maven, and CRAN // repositories. // // Cluster libraries can be used by all notebooks running on a cluster. You // can install a cluster library directly from a public repository such as // PyPI or Maven, using a previously installed workspace library, or using // an init script. // // When you install a library on a cluster, a notebook already attached to // that cluster will not immediately see the new library. You must first // detach and then reattach the notebook to the cluster. // // When you uninstall a library from a cluster, the library is removed only // when you restart the cluster. Until you restart the cluster, the status // of the uninstalled library appears as Uninstall pending restart. Libraries *compute.LibrariesAPI // A metastore is the top-level container of objects in Unity Catalog. It // stores data assets (tables and views) and the permissions that govern // access to them. Databricks account admins can create metastores and // assign them to Databricks workspaces to control which workloads use each // metastore. For a workspace to use Unity Catalog, it must have a Unity // Catalog metastore attached. // // Each metastore is configured with a root storage location in a cloud // storage account. This storage location is used for metadata and managed // tables data. // // NOTE: This metastore is distinct from the metastore included in // Databricks workspaces created before Unity Catalog was released. If your // workspace includes a legacy Hive metastore, the data in that metastore is // available in a catalog named hive_metastore. Metastores *catalog.MetastoresAPI // MLflow Model Registry is a centralized model repository and a UI and set // of APIs that enable you to manage the full lifecycle of MLflow Models. ModelRegistry *ml.ModelRegistryAPI // Permissions API are used to create read, write, edit, update and manage // access for various users on different objects and endpoints. Permissions *iam.PermissionsAPI // The Delta Live Tables API allows you to create, edit, delete, start, and // view details about pipelines. // // Delta Live Tables is a framework for building reliable, maintainable, and // testable data processing pipelines. You define the transformations to // perform on your data, and Delta Live Tables manages task orchestration, // cluster management, monitoring, data quality, and error handling. // // Instead of defining your data pipelines using a series of separate Apache // Spark tasks, Delta Live Tables manages how your data is transformed based // on a target schema you define for each processing step. You can also // enforce data quality with Delta Live Tables expectations. Expectations // allow you to define expected data quality and specify how to handle // records that fail those expectations. Pipelines *pipelines.PipelinesAPI // View available policy families. A policy family contains a policy // definition providing best practices for configuring clusters for a // particular use case. // // Databricks manages and provides policy families for several common // cluster use cases. You cannot create, edit, or delete policy families. // // Policy families cannot be used directly to create clusters. Instead, you // create cluster policies using a policy family. Cluster policies created // using a policy family inherit the policy family's policy definition. PolicyFamilies *compute.PolicyFamiliesAPI // Databricks Providers REST API Providers *sharing.ProvidersAPI // These endpoints are used for CRUD operations on query definitions. Query // definitions include the target SQL warehouse, query text, name, // description, tags, parameters, and visualizations. Queries can be // scheduled using the `sql_task` type of the Jobs API, e.g. // :method:jobs/create. Queries *sql.QueriesAPI // Access the history of queries through SQL warehouses. QueryHistory *sql.QueryHistoryAPI // Databricks Recipient Activation REST API RecipientActivation *sharing.RecipientActivationAPI // Databricks Recipients REST API Recipients *sharing.RecipientsAPI // The Repos API allows users to manage their git repos. Users can use the // API to access all repos that they have manage permissions on. // // Databricks Repos is a visual Git client in Databricks. It supports common // Git operations such a cloning a repository, committing and pushing, // pulling, branch management, and visual comparison of diffs when // committing. // // Within Repos you can develop code in notebooks or other files and follow // data science and engineering code development best practices using Git // for version control, collaboration, and CI/CD. Repos *workspace.ReposAPI // A schema (also called a database) is the second layer of Unity // Catalog’s three-level namespace. A schema organizes tables, views and // functions. To access (or list) a table or view in a schema, users must // have the USE_SCHEMA data permission on the schema and its parent catalog, // and they must have the SELECT permission on the table or view. Schemas *catalog.SchemasAPI // The Secrets API allows you to manage secrets, secret scopes, and access // permissions. // // Sometimes accessing data requires that you authenticate to external data // sources through JDBC. Instead of directly entering your credentials into // a notebook, use Databricks secrets to store your credentials and // reference them in notebooks and jobs. // // Administrators, secret creators, and users granted permission can read // Databricks secrets. While Databricks makes an effort to redact secret // values that might be displayed in notebooks, it is not possible to // prevent such users from reading secrets. Secrets *workspace.SecretsAPI // Identities for use with jobs, automated tools, and systems such as // scripts, apps, and CI/CD platforms. Databricks recommends creating // service principals to run production jobs or modify production data. If // all processes that act on production data run with service principals, // interactive users do not need any write, delete, or modify privileges in // production. This eliminates the risk of a user overwriting production // data by accident. ServicePrincipals *iam.ServicePrincipalsAPI // The Serving Endpoints API allows you to create, update, and delete model // serving endpoints. // // You can use a serving endpoint to serve models from the Databricks Model // Registry or from Unity Catalog. Endpoints expose the underlying models as // scalable REST API endpoints using serverless compute. This means the // endpoints and associated compute resources are fully managed by // Databricks and will not appear in your cloud account. A serving endpoint // can consist of one or more MLflow models from the Databricks Model // Registry, called served models. A serving endpoint can have at most ten // served models. You can configure traffic settings to define how requests // should be routed to your served models behind an endpoint. Additionally, // you can configure the scale of resources that should be applied to each // served model. ServingEndpoints *serving.ServingEndpointsAPI Shares *sharing.SharesAPI // The SQL Statement Execution API manages the execution of arbitrary SQL // statements and the fetching of result data. // // **Release status** // // This feature is in [Public Preview]. // // **Getting started** // // We suggest beginning with the [SQL Statement Execution API tutorial]. // // **Overview of statement execution and result fetching** // // Statement execution begins by issuing a // :method:statementexecution/executeStatement request with a valid SQL // statement and warehouse ID, along with optional parameters such as the // data catalog and output format. // // When submitting the statement, the call can behave synchronously or // asynchronously, based on the `wait_timeout` setting. When set between // 5-50 seconds (default: 10) the call behaves synchronously and waits for // results up to the specified timeout; when set to `0s`, the call is // asynchronous and responds immediately with a statement ID that can be // used to poll for status or fetch the results in a separate call. // // **Call mode: synchronous** // // In synchronous mode, when statement execution completes within the `wait // timeout`, the result data is returned directly in the response. This // response will contain `statement_id`, `status`, `manifest`, and `result` // fields. The `status` field confirms success whereas the `manifest` field // contains the result data column schema and metadata about the result set. // The `result` field contains the first chunk of result data according to // the specified `disposition`, and links to fetch any remaining chunks. // // If the execution does not complete before `wait_timeout`, the setting // `on_wait_timeout` determines how the system responds. // // By default, `on_wait_timeout=CONTINUE`, and after reaching // `wait_timeout`, a response is returned and statement execution continues // asynchronously. The response will contain only `statement_id` and // `status` fields, and the caller must now follow the flow described for // asynchronous call mode to poll and fetch the result. // // Alternatively, `on_wait_timeout` can also be set to `CANCEL`; in this // case if the timeout is reached before execution completes, the underlying // statement execution is canceled, and a `CANCELED` status is returned in // the response. // // **Call mode: asynchronous** // // In asynchronous mode, or after a timed-out synchronous request continues, // a `statement_id` and `status` will be returned. In this case polling // :method:statementexecution/getStatement calls are required to fetch the // result and metadata. // // Next, a caller must poll until execution completes (`SUCCEEDED`, // `FAILED`, etc.) by issuing :method:statementexecution/getStatement // requests for the given `statement_id`. // // When execution has succeeded, the response will contain `status`, // `manifest`, and `result` fields. These fields and the structure are // identical to those in the response to a successful synchronous // submission. The `result` field will contain the first chunk of result // data, either `INLINE` or as `EXTERNAL_LINKS` depending on `disposition`. // Additional chunks of result data can be fetched by checking for the // presence of the `next_chunk_internal_link` field, and iteratively `GET` // those paths until that field is unset: `GET // https://$DATABRICKS_HOST/{next_chunk_internal_link}`. // // **Fetching result data: format and disposition** // // To specify the result data format, set the `format` field to `JSON_ARRAY` // (JSON), `ARROW_STREAM` ([Apache Arrow Columnar]), or `CSV`. // // You can also configure how to fetch the result data in two different // modes by setting the `disposition` field to `INLINE` or `EXTERNAL_LINKS`. // // The `INLINE` disposition can only be used with the `JSON_ARRAY` format // and allows results up to 16 MiB. When a statement executed with `INLINE` // disposition exceeds this limit, the execution is aborted, and no result // can be fetched. // // The `EXTERNAL_LINKS` disposition allows fetching large result sets in // `JSON_ARRAY`, `ARROW_STREAM` and `CSV` formats, and with higher // throughput. // // The API uses defaults of `format=JSON_ARRAY` and `disposition=INLINE`. // Databricks recommends that you explicit setting the format and the // disposition for all production use cases. // // **Statement response: statement_id, status, manifest, and result** // // The base call :method:statementexecution/getStatement returns a single // response combining `statement_id`, `status`, a result `manifest`, and a // `result` data chunk or link, depending on the `disposition`. The // `manifest` contains the result schema definition and the result summary // metadata. When using `disposition=EXTERNAL_LINKS`, it also contains a // full listing of all chunks and their summary metadata. // // **Use case: small result sets with INLINE + JSON_ARRAY** // // For flows that generate small and predictable result sets (<= 16 MiB), // `INLINE` downloads of `JSON_ARRAY` result data are typically the simplest // way to execute and fetch result data. // // When the result set with `disposition=INLINE` is larger, the result can // be transferred in chunks. After receiving the initial chunk with // :method:statementexecution/executeStatement or // :method:statementexecution/getStatement subsequent calls are required to // iteratively fetch each chunk. Each result response contains a link to the // next chunk, when there are additional chunks to fetch; it can be found in // the field `.next_chunk_internal_link`. This link is an absolute `path` to // be joined with your `$DATABRICKS_HOST`, and of the form // `/api/2.0/sql/statements/{statement_id}/result/chunks/{chunk_index}`. The // next chunk can be fetched by issuing a // :method:statementexecution/getStatementResultChunkN request. // // When using this mode, each chunk may be fetched once, and in order. A // chunk without a field `next_chunk_internal_link` indicates the last chunk // was reached and all chunks have been fetched from the result set. // // **Use case: large result sets with EXTERNAL_LINKS + ARROW_STREAM** // // Using `EXTERNAL_LINKS` to fetch result data in Arrow format allows you to // fetch large result sets efficiently. The primary difference from using // `INLINE` disposition is that fetched result chunks contain resolved // `external_links` URLs, which can be fetched with standard HTTP. // // **Presigned URLs** // // External links point to data stored within your workspace's internal // DBFS, in the form of a presigned URL. The URLs are valid for only a short // period, <= 15 minutes. Alongside each `external_link` is an expiration // field indicating the time at which the URL is no longer valid. In // `EXTERNAL_LINKS` mode, chunks can be resolved and fetched multiple times // and in parallel. // // ---- // // ### **Warning: We recommend you protect the URLs in the EXTERNAL_LINKS.** // // When using the EXTERNAL_LINKS disposition, a short-lived pre-signed URL // is generated, which the client can use to download the result chunk // directly from cloud storage. As the short-lived credential is embedded in // a pre-signed URL, this URL should be protected. // // Since pre-signed URLs are generated with embedded temporary credentials, // you need to remove the authorization header from the fetch requests. // // ---- // // Similar to `INLINE` mode, callers can iterate through the result set, by // using the `next_chunk_internal_link` field. Each internal link response // will contain an external link to the raw chunk data, and additionally // contain the `next_chunk_internal_link` if there are more chunks. // // Unlike `INLINE` mode, when using `EXTERNAL_LINKS`, chunks may be fetched // out of order, and in parallel to achieve higher throughput. // // **Limits and limitations** // // Note: All byte limits are calculated based on internal storage metrics // and will not match byte counts of actual payloads. // // - Statements with `disposition=INLINE` are limited to 16 MiB and will // abort when this limit is exceeded. - Statements with // `disposition=EXTERNAL_LINKS` are limited to 100 GiB. - The maximum query // text size is 16 MiB. - Cancelation may silently fail. A successful // response from a cancel request indicates that the cancel request was // successfully received and sent to the processing engine. However, for // example, an outstanding statement may complete execution during signal // delivery, with the cancel signal arriving too late to be meaningful. // Polling for status until a terminal state is reached is a reliable way to // determine the final state. - Wait timeouts are approximate, occur // server-side, and cannot account for caller delays, network latency from // caller to service, and similarly. - After a statement has been submitted // and a statement_id is returned, that statement's status and result will // automatically close after either of 2 conditions: - The last result chunk // is fetched (or resolved to an external link). - One hour passes with no // calls to get the status or fetch the result. Best practice: in // asynchronous clients, poll for status regularly (and with backoff) to // keep the statement open and alive. - After fetching the last result chunk // (including chunk_index=0) the statement is automatically closed. // // [Apache Arrow Columnar]: https://arrow.apache.org/overview/ // [Public Preview]: https://docs.databricks.com/release-notes/release-types.html // [SQL Statement Execution API tutorial]: https://docs.databricks.com/sql/api/sql-execution-tutorial.html StatementExecution *sql.StatementExecutionAPI // A storage credential represents an authentication and authorization // mechanism for accessing data stored on your cloud tenant. Each storage // credential is subject to Unity Catalog access-control policies that // control which users and groups can access the credential. If a user does // not have access to a storage credential in Unity Catalog, the request // fails and Unity Catalog does not attempt to authenticate to your cloud // tenant on the user’s behalf. // // Databricks recommends using external locations rather than using storage // credentials directly. // // To create storage credentials, you must be a Databricks account admin. // The account admin who creates the storage credential can delegate // ownership to another user or group to manage permissions on it. StorageCredentials *catalog.StorageCredentialsAPI // A system schema is a schema that lives within the system catalog. A // system schema may contain information about customer usage of Unity // Catalog such as audit-logs, billing-logs, lineage information, etc. SystemSchemas *catalog.SystemSchemasAPI // Primary key and foreign key constraints encode relationships between // fields in tables. // // Primary and foreign keys are informational only and are not enforced. // Foreign keys must reference a primary key in another table. This primary // key is the parent constraint of the foreign key and the table this // primary key is on is the parent table of the foreign key. Similarly, the // foreign key is the child constraint of its referenced primary key; the // table of the foreign key is the child table of the primary key. // // You can declare primary keys and foreign keys as part of the table // specification during table creation. You can also add or drop constraints // on existing tables. TableConstraints *catalog.TableConstraintsAPI // A table resides in the third layer of Unity Catalog’s three-level // namespace. It contains rows of data. To create a table, users must have // CREATE_TABLE and USE_SCHEMA permissions on the schema, and they must have // the USE_CATALOG permission on its parent catalog. To query a table, users // must have the SELECT permission on the table, and they must have the // USE_CATALOG permission on its parent catalog and the USE_SCHEMA // permission on its parent schema. // // A table can be managed or external. From an API perspective, a __VIEW__ // is a particular kind of table (rather than a managed or external table). Tables *catalog.TablesAPI // Enables administrators to get all tokens and delete tokens for other // users. Admins can either get every token, get a specific token by ID, or // get all tokens for a particular user. TokenManagement *settings.TokenManagementAPI // The Token API allows you to create, list, and revoke tokens that can be // used to authenticate and access Databricks REST APIs. Tokens *settings.TokensAPI // User identities recognized by Databricks and represented by email // addresses. // // Databricks recommends using SCIM provisioning to sync users and groups // automatically from your identity provider to your Databricks workspace. // SCIM streamlines onboarding a new employee or team by using your identity // provider to create users and groups in Databricks workspace and give them // the proper level of access. When a user leaves your organization or no // longer needs access to Databricks workspace, admins can terminate the // user in your identity provider and that user’s account will also be // removed from Databricks workspace. This ensures a consistent offboarding // process and prevents unauthorized users from accessing sensitive data. Users *iam.UsersAPI // Volumes are a Unity Catalog (UC) capability for accessing, storing, // governing, organizing and processing files. Use cases include running // machine learning on unstructured data such as image, audio, video, or PDF // files, organizing data sets during the data exploration stages in data // science, working with libraries that require access to the local file // system on cluster machines, storing library and config files of arbitrary // formats such as .whl or .txt centrally and providing secure access across // workspaces to it, or transforming and querying non-tabular data files in // ETL. Volumes *catalog.VolumesAPI // A SQL warehouse is a compute resource that lets you run SQL commands on // data objects within Databricks SQL. Compute resources are infrastructure // resources that provide processing capabilities in the cloud. Warehouses *sql.WarehousesAPI // The Workspace API allows you to list, import, export, and delete // notebooks and folders. // // A notebook is a web-based interface to a document that contains runnable // code, visualizations, and explanatory text. Workspace *workspace.WorkspaceAPI // A catalog in Databricks can be configured as __OPEN__ or __ISOLATED__. An // __OPEN__ catalog can be accessed from any workspace, while an // __ISOLATED__ catalog can only be access from a configured list of // workspaces. // // A catalog's workspace bindings can be configured by a metastore admin or // the owner of the catalog. WorkspaceBindings *catalog.WorkspaceBindingsAPI // This API allows updating known workspace settings for advanced users. WorkspaceConf *settings.WorkspaceConfAPI }
func NewWorkspaceClient ¶
func NewWorkspaceClient(c ...*Config) (*WorkspaceClient, error)
NewWorkspaceClient creates new Databricks SDK client for Workspaces or returns error in case configuration is wrong
Directories ¶
Path | Synopsis |
---|---|
examples
|
|
internal
|
|
code
Package holds higher-level abstractions on top of OpenAPI that are used to generate code via text/template for Databricks SDK in different languages.
|
Package holds higher-level abstractions on top of OpenAPI that are used to generate code via text/template for Databricks SDK in different languages. |
gen
Usage: openapi-codegen
|
Usage: openapi-codegen |
Databricks SDK for Go APIs
|
Databricks SDK for Go APIs |
billing
These APIs allow you to manage Billable Usage, Budgets, Log Delivery, etc.
|
These APIs allow you to manage Billable Usage, Budgets, Log Delivery, etc. |
catalog
These APIs allow you to manage Account Metastore Assignments, Account Metastores, Account Storage Credentials, Catalogs, Connections, External Locations, Functions, Grants, Metastores, Schemas, Storage Credentials, System Schemas, Table Constraints, Tables, Volumes, Workspace Bindings, etc.
|
These APIs allow you to manage Account Metastore Assignments, Account Metastores, Account Storage Credentials, Catalogs, Connections, External Locations, Functions, Grants, Metastores, Schemas, Storage Credentials, System Schemas, Table Constraints, Tables, Volumes, Workspace Bindings, etc. |
compute
These APIs allow you to manage Cluster Policies, Clusters, Command Execution, Global Init Scripts, Instance Pools, Instance Profiles, Libraries, Policy Families, etc.
|
These APIs allow you to manage Cluster Policies, Clusters, Command Execution, Global Init Scripts, Instance Pools, Instance Profiles, Libraries, Policy Families, etc. |
files
DBFS API makes it simple to interact with various data sources without having to include a users credentials every time to read a file.
|
DBFS API makes it simple to interact with various data sources without having to include a users credentials every time to read a file. |
iam
These APIs allow you to manage Account Access Control, Account Access Control Proxy, Account Groups, Account Service Principals, Account Users, Current User, Groups, Permissions, Service Principals, Users, Workspace Assignment, etc.
|
These APIs allow you to manage Account Access Control, Account Access Control Proxy, Account Groups, Account Service Principals, Account Users, Current User, Groups, Permissions, Service Principals, Users, Workspace Assignment, etc. |
jobs
The Jobs API allows you to create, edit, and delete jobs.
|
The Jobs API allows you to create, edit, and delete jobs. |
ml
These APIs allow you to manage Experiments, Model Registry, etc.
|
These APIs allow you to manage Experiments, Model Registry, etc. |
oauth2
These APIs allow you to manage Custom App Integration, O Auth Enrollment, Published App Integration, Service Principal Secrets, etc.
|
These APIs allow you to manage Custom App Integration, O Auth Enrollment, Published App Integration, Service Principal Secrets, etc. |
pipelines
The Delta Live Tables API allows you to create, edit, delete, start, and view details about pipelines.
|
The Delta Live Tables API allows you to create, edit, delete, start, and view details about pipelines. |
provisioning
These APIs allow you to manage Credentials, Encryption Keys, Networks, Private Access, Storage, Vpc Endpoints, Workspaces, etc.
|
These APIs allow you to manage Credentials, Encryption Keys, Networks, Private Access, Storage, Vpc Endpoints, Workspaces, etc. |
serving
The Serving Endpoints API allows you to create, update, and delete model serving endpoints.
|
The Serving Endpoints API allows you to create, update, and delete model serving endpoints. |
settings
These APIs allow you to manage Account Ip Access Lists, Account Settings, Ip Access Lists, Token Management, Tokens, Workspace Conf, etc.
|
These APIs allow you to manage Account Ip Access Lists, Account Settings, Ip Access Lists, Token Management, Tokens, Workspace Conf, etc. |
sharing
These APIs allow you to manage Clean Rooms, Providers, Recipient Activation, Recipients, Shares, etc.
|
These APIs allow you to manage Clean Rooms, Providers, Recipient Activation, Recipients, Shares, etc. |
sql
These APIs allow you to manage Alerts, Dashboards, Data Sources, Dbsql Permissions, Queries, Query History, Statement Execution, Warehouses, etc.
|
These APIs allow you to manage Alerts, Dashboards, Data Sources, Dbsql Permissions, Queries, Query History, Statement Execution, Warehouses, etc. |
workspace
These APIs allow you to manage Git Credentials, Repos, Secrets, Workspace, etc.
|
These APIs allow you to manage Git Credentials, Repos, Secrets, Workspace, etc. |
Click to show internal directories.
Click to hide internal directories.