dbfs

package
v0.6.0
Published: Apr 3, 2023 License: Apache-2.0 Imports: 9 Imported by: 0

Documentation

Overview

The DBFS API makes it simple to interact with various data sources without having to include a user's credentials every time you read a file.

Databricks File System (DBFS) API

We recommend using a client created via [databricks.NewWorkspaceClient] to simplify the configuration experience.

Reading and writing files

You can open a file on DBFS for reading or writing with DbfsAPI.Open. This function returns a Handle that is compatible with a subset of io interfaces for reading, writing, and closing.

Uploading a file from an io.Reader:

// Errors are ignored here for brevity; handle them in real code.
upload, _ := os.Open("/path/to/local/file.ext")
defer upload.Close()
remote, _ := w.Dbfs.Open(ctx, "/path/to/remote/file", dbfs.FileModeWrite|dbfs.FileModeOverwrite)
io.Copy(remote, upload)
// A file opened for writing must always be closed.
remote.Close()

Downloading a file to an io.Writer:

// Errors are ignored here for brevity; handle them in real code.
download, _ := os.Create("/path/to/local")
defer download.Close()
remote, _ := w.Dbfs.Open(ctx, "/path/to/remote/file", dbfs.FileModeRead)
defer remote.Close()
io.Copy(download, remote)

Reading and writing files from buffers

You can read from or write to a DBFS file directly from a byte slice through the convenience functions DbfsAPI.ReadFile and DbfsAPI.WriteFile.

Uploading a file from a byte slice:

buf := []byte("Hello world!")
_ = w.Dbfs.WriteFile(ctx, "/path/to/remote/file", buf)

Downloading a file into a byte slice:

buf, err := w.Dbfs.ReadFile(ctx, "/path/to/remote/file")

Moving files

err := w.Dbfs.Move(ctx, dbfs.Move{
	SourcePath:      "/remote/src/path",
	DestinationPath: "/remote/dst/path",
})

Creating directories

err := w.Dbfs.MkdirsByPath(ctx, "/remote/dir/path")

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type AddBlock

type AddBlock struct {
	// The base64-encoded data to append to the stream. This has a limit of 1
	// MB.
	Data string `json:"data"`
	// The handle on an open stream.
	Handle int64 `json:"handle"`
}

type Close

type Close struct {
	// The handle on an open stream.
	Handle int64 `json:"handle"`
}

type Create

type Create struct {
	// The flag that specifies whether to overwrite existing file/files.
	Overwrite bool `json:"overwrite,omitempty"`
	// The path of the new file. The path should be the absolute DBFS path.
	Path string `json:"path"`
}

type CreateResponse

type CreateResponse struct {
	// Handle which should subsequently be passed into the AddBlock and Close
	// calls when writing to a file through a stream.
	Handle int64 `json:"handle,omitempty"`
}

type DbfsAPI

type DbfsAPI struct {
	// contains filtered or unexported fields
}

The DBFS API makes it simple to interact with various data sources without having to include a user's credentials every time you read a file.

func NewDbfs

func NewDbfs(client *client.DatabricksClient) *DbfsAPI

func (*DbfsAPI) AddBlock

func (a *DbfsAPI) AddBlock(ctx context.Context, request AddBlock) error

Append data block.

Appends a block of data to the stream specified by the input handle. If the handle does not exist, this call will throw an exception with `RESOURCE_DOES_NOT_EXIST`.

If the block of data exceeds 1 MB, this call will throw an exception with `MAX_BLOCK_SIZE_EXCEEDED`.

func (*DbfsAPI) Close

func (a *DbfsAPI) Close(ctx context.Context, request Close) error

Close the stream.

Closes the stream specified by the input handle. If the handle does not exist, this call throws an exception with `RESOURCE_DOES_NOT_EXIST`.

func (*DbfsAPI) CloseByHandle

func (a *DbfsAPI) CloseByHandle(ctx context.Context, handle int64) error

Close the stream.

Closes the stream specified by the input handle. If the handle does not exist, this call throws an exception with `RESOURCE_DOES_NOT_EXIST`.

func (*DbfsAPI) Create

func (a *DbfsAPI) Create(ctx context.Context, request Create) (*CreateResponse, error)

Open a stream.

"Opens a stream to write to a file and returns a handle to this stream. There is a 10 minute idle timeout on this handle. If a file or directory already exists on the given path and __overwrite__ is set to `false`, this call throws an exception with `RESOURCE_ALREADY_EXISTS`.

A typical workflow for file upload would be:

1. Issue a `create` call and get a handle. 2. Issue one or more `add-block` calls with the handle you have. 3. Issue a `close` call with the handle you have.
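
A sketch of this workflow using the request types below; the path is illustrative and `w` is a workspace client as in the overview:

created, err := w.Dbfs.Create(ctx, dbfs.Create{
	Path:      "/path/to/remote/file",
	Overwrite: true,
})
if err != nil {
	return err
}
// Each add-block payload is base64-encoded and limited to 1 MB.
err = w.Dbfs.AddBlock(ctx, dbfs.AddBlock{
	Handle: created.Handle,
	Data:   base64.StdEncoding.EncodeToString([]byte("Hello world!")),
})
if err != nil {
	return err
}
err = w.Dbfs.CloseByHandle(ctx, created.Handle)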

func (*DbfsAPI) Delete

func (a *DbfsAPI) Delete(ctx context.Context, request Delete) error

Delete a file/directory.

Delete the file or directory (optionally recursively delete all files in the directory). This call throws an exception with `IO_ERROR` if the path is a non-empty directory and `recursive` is set to `false` or on other similar errors.

When you delete a large number of files, the delete operation is done in increments. The call returns a response after approximately 45 seconds with an error message (503 Service Unavailable) asking you to re-invoke the delete operation until the directory structure is fully deleted.

For operations that delete more than 10K files, we discourage using the DBFS REST API, but advise you to perform such operations in the context of a cluster, using the [File system utility (dbutils.fs)](/dev-tools/databricks-utils.html#dbutils-fs). `dbutils.fs` covers the functional scope of the DBFS REST API, but from notebooks. Running such operations using notebooks provides better control and manageability, such as selective deletes, and the possibility to automate periodic delete jobs.
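
For example, recursively deleting a directory (the path is illustrative):

err := w.Dbfs.Delete(ctx, dbfs.Delete{
	Path:      "/remote/dir/path",
	Recursive: true,
})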

func (*DbfsAPI) GetStatus

func (a *DbfsAPI) GetStatus(ctx context.Context, request GetStatus) (*FileInfo, error)

Get the information of a file or directory.

Gets the file information for a file or directory. If the file or directory does not exist, this call throws an exception with `RESOURCE_DOES_NOT_EXIST`.

func (*DbfsAPI) GetStatusByPath

func (a *DbfsAPI) GetStatusByPath(ctx context.Context, path string) (*FileInfo, error)

Get the information of a file or directory.

Gets the file information for a file or directory. If the file or directory does not exist, this call throws an exception with `RESOURCE_DOES_NOT_EXIST`.
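
For example, fetching metadata with the by-path convenience method (the path is illustrative):

info, err := w.Dbfs.GetStatusByPath(ctx, "/path/to/remote/file")
if err != nil {
	return err
}
fmt.Printf("%s: %d bytes, directory: %v\n", info.Path, info.FileSize, info.IsDir)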

func (*DbfsAPI) Impl

func (a *DbfsAPI) Impl() DbfsService

Impl returns the low-level Dbfs API implementation.

func (*DbfsAPI) ListAll

func (a *DbfsAPI) ListAll(ctx context.Context, request List) ([]FileInfo, error)

List directory contents or file details.

List the contents of a directory, or details of the file. If the file or directory does not exist, this call throws an exception with `RESOURCE_DOES_NOT_EXIST`.

When calling list on a large directory, the list operation will time out after approximately 60 seconds. We strongly recommend using list only on directories containing less than 10K files and discourage using the DBFS REST API for operations that list more than 10K files. Instead, we recommend that you perform such operations in the context of a cluster, using the [File system utility (dbutils.fs)](/dev-tools/databricks-utils.html#dbutils-fs), which provides the same functionality without timing out.

This method is generated by the Databricks SDK Code Generator.
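
For example, listing a directory and printing each entry (the path is illustrative):

files, err := w.Dbfs.ListAll(ctx, dbfs.List{Path: "/remote/dir/path"})
if err != nil {
	return err
}
for _, f := range files {
	fmt.Println(f.Path, f.FileSize)
}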

func (*DbfsAPI) ListByPath

func (a *DbfsAPI) ListByPath(ctx context.Context, path string) (*ListStatusResponse, error)

List directory contents or file details.

List the contents of a directory, or details of the file. If the file or directory does not exist, this call throws an exception with `RESOURCE_DOES_NOT_EXIST`.

When calling list on a large directory, the list operation will time out after approximately 60 seconds. We strongly recommend using list only on directories containing less than 10K files and discourage using the DBFS REST API for operations that list more than 10K files. Instead, we recommend that you perform such operations in the context of a cluster, using the [File system utility (dbutils.fs)](/dev-tools/databricks-utils.html#dbutils-fs), which provides the same functionality without timing out.

func (*DbfsAPI) Mkdirs

func (a *DbfsAPI) Mkdirs(ctx context.Context, request MkDirs) error

Create a directory.

Creates the given directory and necessary parent directories if they do not exist. If a file (not a directory) exists at any prefix of the input path, this call throws an exception with `RESOURCE_ALREADY_EXISTS`. **Note**: If this operation fails, it might have succeeded in creating some of the necessary parent directories.

func (*DbfsAPI) MkdirsByPath

func (a *DbfsAPI) MkdirsByPath(ctx context.Context, path string) error

Create a directory.

Creates the given directory and necessary parent directories if they do not exist. If a file (not a directory) exists at any prefix of the input path, this call throws an exception with `RESOURCE_ALREADY_EXISTS`. **Note**: If this operation fails, it might have succeeded in creating some of the necessary parent directories.

func (*DbfsAPI) Move

func (a *DbfsAPI) Move(ctx context.Context, request Move) error

Move a file.

Moves a file from one location to another location within DBFS. If the source file does not exist, this call throws an exception with `RESOURCE_DOES_NOT_EXIST`. If a file already exists in the destination path, this call throws an exception with `RESOURCE_ALREADY_EXISTS`. If the given source path is a directory, this call always recursively moves all files.

func (*DbfsAPI) Open

func (a *DbfsAPI) Open(ctx context.Context, path string, mode FileMode) (Handle, error)

Open opens a remote DBFS file for reading or writing. The returned object implements relevant io interfaces for convenient integration with other code that reads or writes bytes.

The io.WriterTo interface is provided and maximizes throughput for bulk reads by reading data with the DBFS maximum read chunk size of 1 MB. Similarly, the io.ReaderFrom interface is provided for bulk writing.

A file opened for writing must always be closed.
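
For example, a write that checks the error from Close, since the final flush can happen there (a sketch; the path is illustrative):

remote, err := w.Dbfs.Open(ctx, "/path/to/remote/file", dbfs.FileModeWrite|dbfs.FileModeOverwrite)
if err != nil {
	return err
}
if _, err := remote.Write([]byte("Hello world!")); err != nil {
	remote.Close()
	return err
}
return remote.Close()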

func (*DbfsAPI) Put

func (a *DbfsAPI) Put(ctx context.Context, request Put) error

Upload a file.

Uploads a file through a multipart form POST. It is mainly used for streaming uploads, but can also be used as a convenient single call for data upload.

Alternatively, you can pass the contents as a base64-encoded string.

The amount of data that can be passed (when not streaming) using the __contents__ parameter is limited to 1 MB. `MAX_BLOCK_SIZE_EXCEEDED` will be thrown if this limit is exceeded.

If you want to upload large files, use the streaming upload. For details, see :method:dbfs/create, :method:dbfs/addBlock, :method:dbfs/close.
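
For example, a small non-streaming upload with base64-encoded contents (the path is illustrative):

err := w.Dbfs.Put(ctx, dbfs.Put{
	Path:      "/path/to/remote/file",
	Contents:  base64.StdEncoding.EncodeToString([]byte("Hello world!")),
	Overwrite: true,
})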

func (*DbfsAPI) Read

func (a *DbfsAPI) Read(ctx context.Context, request Read) (*ReadResponse, error)

Get the contents of a file.

"Returns the contents of a file. If the file does not exist, this call throws an exception with `RESOURCE_DOES_NOT_EXIST`. If the path is a directory, the read length is negative, or if the offset is negative, this call throws an exception with `INVALID_PARAMETER_VALUE`. If the read length exceeds 1 MB, this call throws an exception with `MAX_READ_SIZE_EXCEEDED`.

If `offset + length` exceeds the number of bytes in a file, it reads the contents until the end of file.",
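
For example, reading the first megabyte of a file and decoding the base64 response (the path is illustrative):

resp, err := w.Dbfs.Read(ctx, dbfs.Read{
	Path:   "/path/to/remote/file",
	Offset: 0,
	Length: 1024 * 1024, // at most 1 MB per call
})
if err != nil {
	return err
}
data, err := base64.StdEncoding.DecodeString(resp.Data)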

func (*DbfsAPI) ReadFile added in v0.2.0

func (a *DbfsAPI) ReadFile(ctx context.Context, name string) ([]byte, error)

ReadFile is identical to os.ReadFile but for DBFS.

func (DbfsAPI) RecursiveList

func (a DbfsAPI) RecursiveList(ctx context.Context, path string) ([]FileInfo, error)

RecursiveList traverses the DBFS tree and returns all non-directory objects under the given path.
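
For example, printing every file under a directory tree (the path is illustrative):

files, err := w.Dbfs.RecursiveList(ctx, "/remote/dir/path")
if err != nil {
	return err
}
for _, f := range files {
	fmt.Println(f.Path)
}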

func (*DbfsAPI) WithImpl

func (a *DbfsAPI) WithImpl(impl DbfsService) *DbfsAPI

WithImpl could be used to override low-level API implementations for unit testing purposes with github.com/golang/mock or other mocking frameworks.
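
A minimal sketch of a hand-written test double; fakeDbfs is a hypothetical type that embeds DbfsService so a test only stubs the methods it exercises (calls to unstubbed methods panic):

type fakeDbfs struct {
	dbfs.DbfsService
}

func (f fakeDbfs) GetStatus(ctx context.Context, request dbfs.GetStatus) (*dbfs.FileInfo, error) {
	return &dbfs.FileInfo{Path: request.Path, FileSize: 42}, nil
}

api := w.Dbfs.WithImpl(fakeDbfs{})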

func (*DbfsAPI) WriteFile added in v0.2.0

func (a *DbfsAPI) WriteFile(ctx context.Context, name string, data []byte) error

WriteFile is identical to os.WriteFile but for DBFS.

type DbfsService

type DbfsService interface {

	// Append data block.
	//
	// Appends a block of data to the stream specified by the input handle. If
	// the handle does not exist, this call will throw an exception with
	// `RESOURCE_DOES_NOT_EXIST`.
	//
	// If the block of data exceeds 1 MB, this call will throw an exception with
	// `MAX_BLOCK_SIZE_EXCEEDED`.
	AddBlock(ctx context.Context, request AddBlock) error

	// Close the stream.
	//
	// Closes the stream specified by the input handle. If the handle does not
	// exist, this call throws an exception with `RESOURCE_DOES_NOT_EXIST`.
	Close(ctx context.Context, request Close) error

	// Open a stream.
	//
	// "Opens a stream to write to a file and returns a handle to this stream.
	// There is a 10 minute idle timeout on this handle. If a file or directory
	// already exists on the given path and __overwrite__ is set to `false`,
	// this call throws an exception with `RESOURCE_ALREADY_EXISTS`.
	//
	// A typical workflow for file upload would be:
	//
	// 1. Issue a `create` call and get a handle. 2. Issue one or more
	// `add-block` calls with the handle you have. 3. Issue a `close` call with
	// the handle you have.
	Create(ctx context.Context, request Create) (*CreateResponse, error)

	// Delete a file/directory.
	//
	// Delete the file or directory (optionally recursively delete all files in
	// the directory). This call throws an exception with `IO_ERROR` if the path
	// is a non-empty directory and `recursive` is set to `false` or on other
	// similar errors.
	//
	// When you delete a large number of files, the delete operation is done in
	// increments. The call returns a response after approximately 45 seconds
	// with an error message (503 Service Unavailable) asking you to re-invoke
	// the delete operation until the directory structure is fully deleted.
	//
	// For operations that delete more than 10K files, we discourage using the
	// DBFS REST API, but advise you to perform such operations in the context
	// of a cluster, using the [File system utility
	// (dbutils.fs)](/dev-tools/databricks-utils.html#dbutils-fs). `dbutils.fs`
	// covers the functional scope of the DBFS REST API, but from notebooks.
	// Running such operations using notebooks provides better control and
	// manageability, such as selective deletes, and the possibility to automate
	// periodic delete jobs.
	Delete(ctx context.Context, request Delete) error

	// Get the information of a file or directory.
	//
	// Gets the file information for a file or directory. If the file or
	// directory does not exist, this call throws an exception with
	// `RESOURCE_DOES_NOT_EXIST`.
	GetStatus(ctx context.Context, request GetStatus) (*FileInfo, error)

	// List directory contents or file details.
	//
	// List the contents of a directory, or details of the file. If the file or
	// directory does not exist, this call throws an exception with
	// `RESOURCE_DOES_NOT_EXIST`.
	//
	// When calling list on a large directory, the list operation will time out
	// after approximately 60 seconds. We strongly recommend using list only on
	// directories containing less than 10K files and discourage using the DBFS
	// REST API for operations that list more than 10K files. Instead, we
	// recommend that you perform such operations in the context of a cluster,
	// using the [File system utility
	// (dbutils.fs)](/dev-tools/databricks-utils.html#dbutils-fs), which
	// provides the same functionality without timing out.
	//
	// Use ListAll() to get all FileInfo instances
	List(ctx context.Context, request List) (*ListStatusResponse, error)

	// Create a directory.
	//
	// Creates the given directory and necessary parent directories if they do
	// not exist. If a file (not a directory) exists at any prefix of the input
	// path, this call throws an exception with `RESOURCE_ALREADY_EXISTS`.
	// **Note**: If this operation fails, it might have succeeded in creating
	// some of the necessary parent directories.
	Mkdirs(ctx context.Context, request MkDirs) error

	// Move a file.
	//
	// Moves a file from one location to another location within DBFS. If the
	// source file does not exist, this call throws an exception with
	// `RESOURCE_DOES_NOT_EXIST`. If a file already exists in the destination
	// path, this call throws an exception with `RESOURCE_ALREADY_EXISTS`. If
	// the given source path is a directory, this call always recursively moves
	// all files.",
	Move(ctx context.Context, request Move) error

	// Upload a file.
	//
	// Uploads a file through a multipart form POST. It is mainly used for
	// streaming uploads, but can also be used as a convenient single call
	// for data upload.
	//
	// Alternatively, you can pass the contents as a base64-encoded string.
	//
	// The amount of data that can be passed (when not streaming) using the
	// __contents__ parameter is limited to 1 MB. `MAX_BLOCK_SIZE_EXCEEDED` will
	// be thrown if this limit is exceeded.
	//
	// If you want to upload large files, use the streaming upload. For details,
	// see :method:dbfs/create, :method:dbfs/addBlock, :method:dbfs/close.
	Put(ctx context.Context, request Put) error

	// Get the contents of a file.
	//
	// "Returns the contents of a file. If the file does not exist, this call
	// throws an exception with `RESOURCE_DOES_NOT_EXIST`. If the path is a
	// directory, the read length is negative, or if the offset is negative,
	// this call throws an exception with `INVALID_PARAMETER_VALUE`. If the read
	// length exceeds 1 MB, this call throws an exception with
	// `MAX_READ_SIZE_EXCEEDED`.
	//
	// If `offset + length` exceeds the number of bytes in a file, it reads the
	// contents until the end of file.",
	Read(ctx context.Context, request Read) (*ReadResponse, error)
}

The DBFS API makes it simple to interact with various data sources without having to include a user's credentials every time you read a file.

type Delete

type Delete struct {
	// The path of the file or directory to delete. The path should be the
	// absolute DBFS path.
	Path string `json:"path"`
	// Whether or not to recursively delete the directory's contents. Deleting
	// empty directories can be done without providing the recursive flag.
	Recursive bool `json:"recursive,omitempty"`
}

type FileInfo

type FileInfo struct {
	// The length of the file in bytes or zero if the path is a directory.
	FileSize int64 `json:"file_size,omitempty"`
	// True if the path is a directory.
	IsDir bool `json:"is_dir,omitempty"`
	// Last modification time of given file/dir in milliseconds since Epoch.
	ModificationTime int64 `json:"modification_time,omitempty"`
	// The path of the file or directory.
	Path string `json:"path,omitempty"`
}
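
For example, converting ModificationTime to a time.Time (info is assumed to come from a GetStatus or list call):

modified := time.UnixMilli(info.ModificationTime)
fmt.Println(info.Path, "last modified at", modified)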

type FileMode added in v0.2.0

type FileMode int

FileMode conveys user intent when opening a file.

const (
	// Exactly one of FileModeRead or FileModeWrite must be specified.
	FileModeRead FileMode = 1 << iota
	FileModeWrite
	FileModeOverwrite
)

type GetStatus

type GetStatus struct {
	// The path of the file or directory. The path should be the absolute DBFS
	// path.
	Path string `json:"-" url:"path"`
}

Get the information of a file or directory

type Handle added in v0.2.0

type Handle interface {
	io.ReadWriteCloser
	io.WriterTo
	io.ReaderFrom
}

Handle defines the interface of the object returned by DbfsAPI.Open.

type List

type List struct {
	// The path of the file or directory. The path should be the absolute DBFS
	// path.
	Path string `json:"-" url:"path"`
}

List directory contents or file details

type ListStatusResponse

type ListStatusResponse struct {
	// A list of FileInfo values that describe the contents of the directory
	// or the file itself.
	Files []FileInfo `json:"files,omitempty"`
}

type MkDirs

type MkDirs struct {
	// The path of the new directory. The path should be the absolute DBFS path.
	Path string `json:"path"`
}

type Move

type Move struct {
	// The destination path of the file or directory. The path should be the
	// absolute DBFS path.
	DestinationPath string `json:"destination_path"`
	// The source path of the file or directory. The path should be the absolute
	// DBFS path.
	SourcePath string `json:"source_path"`
}

type Put

type Put struct {
	// This parameter might be absent, and instead a posted file will be used.
	Contents string `json:"contents,omitempty"`
	// The flag that specifies whether to overwrite existing file/files.
	Overwrite bool `json:"overwrite,omitempty"`
	// The path of the new file. The path should be the absolute DBFS path.
	Path string `json:"path"`
}

type Read

type Read struct {
	// The number of bytes to read starting from the offset. This has a limit of
	// 1 MB, and a default value of 0.5 MB.
	Length int `json:"-" url:"length,omitempty"`
	// The offset to read from in bytes.
	Offset int `json:"-" url:"offset,omitempty"`
	// The path of the file to read. The path should be the absolute DBFS path.
	Path string `json:"-" url:"path"`
}

Get the contents of a file

type ReadResponse

type ReadResponse struct {
	// The number of bytes read (could be less than `length` if the end of
	// the file was reached). This is the number of unencoded bytes; the
	// response data itself is base64-encoded.
	BytesRead int64 `json:"bytes_read,omitempty"`
	// The base64-encoded contents of the file read.
	Data string `json:"data,omitempty"`
}
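
A sketch of reading an entire file by paging with successive Read calls; the higher-level DbfsAPI.ReadFile and DbfsAPI.Open helpers already do this for you (the path is illustrative):

var content []byte
offset := 0
for {
	resp, err := w.Dbfs.Read(ctx, dbfs.Read{
		Path:   "/path/to/remote/file",
		Offset: offset,
		Length: 1024 * 1024, // the per-call maximum
	})
	if err != nil {
		return err
	}
	chunk, err := base64.StdEncoding.DecodeString(resp.Data)
	if err != nil {
		return err
	}
	content = append(content, chunk...)
	// A short read means the end of the file was reached.
	if resp.BytesRead < 1024*1024 {
		break
	}
	offset += int(resp.BytesRead)
}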
