Documentation ¶
Overview ¶
The DBFS API makes it simple to interact with various data sources without having to include a user's credentials every time a file is read.
Databricks File System (DBFS) API ¶
We recommend using a client created via [databricks.NewWorkspaceClient] to simplify the configuration experience.
Reading and writing files ¶
You can open a file on DBFS for reading or writing with DbfsAPI.Open. This function returns a Handle that is compatible with a subset of io interfaces for reading, writing, and closing.
Uploading a file from an io.Reader:
upload, _ := os.Open("/path/to/local/file.ext")
remote, _ := w.Dbfs.Open(ctx, "/path/to/remote/file", dbfs.FileModeWrite|dbfs.FileModeOverwrite)
io.Copy(remote, upload)
remote.Close()
Downloading a file to an io.Writer:
download, _ := os.Create("/path/to/local")
remote, _ := w.Dbfs.Open(ctx, "/path/to/remote/file", dbfs.FileModeRead)
_ = io.Copy(download, remote)
Reading and writing files from buffers ¶
You can read from or write to a DBFS file directly from a byte slice through the convenience functions DbfsAPI.ReadFile and DbfsAPI.WriteFile.
Uploading a file from a byte slice:
buf := []byte("Hello world!")
_ = w.Dbfs.WriteFile(ctx, "/path/to/remote/file", buf)
Downloading a file into a byte slice:
buf, err := w.Dbfs.ReadFile(ctx, "/path/to/remote/file")
Moving files ¶
err := w.Dbfs.Move(ctx, dbfs.Move{
    SourcePath:      "/remote/src/path",
    DestinationPath: "/remote/dst/path",
})
Creating directories ¶
w.Dbfs.MkdirsByPath(ctx, "/remote/dir/path")
Index ¶
- type AddBlock
- type Close
- type Create
- type CreateResponse
- type DbfsAPI
- func (a *DbfsAPI) AddBlock(ctx context.Context, request AddBlock) error
- func (a *DbfsAPI) Close(ctx context.Context, request Close) error
- func (a *DbfsAPI) CloseByHandle(ctx context.Context, handle int64) error
- func (a *DbfsAPI) Create(ctx context.Context, request Create) (*CreateResponse, error)
- func (a *DbfsAPI) Delete(ctx context.Context, request Delete) error
- func (a *DbfsAPI) GetStatus(ctx context.Context, request GetStatus) (*FileInfo, error)
- func (a *DbfsAPI) GetStatusByPath(ctx context.Context, path string) (*FileInfo, error)
- func (a *DbfsAPI) Impl() DbfsService
- func (a *DbfsAPI) ListAll(ctx context.Context, request List) ([]FileInfo, error)
- func (a *DbfsAPI) ListByPath(ctx context.Context, path string) (*ListStatusResponse, error)
- func (a *DbfsAPI) Mkdirs(ctx context.Context, request MkDirs) error
- func (a *DbfsAPI) MkdirsByPath(ctx context.Context, path string) error
- func (a *DbfsAPI) Move(ctx context.Context, request Move) error
- func (a *DbfsAPI) Open(ctx context.Context, path string, mode FileMode) (Handle, error)
- func (a *DbfsAPI) Put(ctx context.Context, request Put) error
- func (a *DbfsAPI) Read(ctx context.Context, request Read) (*ReadResponse, error)
- func (a *DbfsAPI) ReadFile(ctx context.Context, name string) ([]byte, error)
- func (a DbfsAPI) RecursiveList(ctx context.Context, path string) ([]FileInfo, error)
- func (a *DbfsAPI) WithImpl(impl DbfsService) *DbfsAPI
- func (a *DbfsAPI) WriteFile(ctx context.Context, name string, data []byte) error
- type DbfsService
- type Delete
- type FileInfo
- type FileMode
- type GetStatus
- type Handle
- type List
- type ListStatusResponse
- type MkDirs
- type Move
- type Put
- type Read
- type ReadResponse
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type CreateResponse ¶
type CreateResponse struct {
    // Handle which should subsequently be passed into the AddBlock and Close
    // calls when writing to a file through a stream.
    Handle int64 `json:"handle,omitempty"`
}
type DbfsAPI ¶
type DbfsAPI struct {
// contains filtered or unexported fields
}
The DBFS API makes it simple to interact with various data sources without having to include a user's credentials every time a file is read.
func NewDbfs ¶
func NewDbfs(client *client.DatabricksClient) *DbfsAPI
func (*DbfsAPI) AddBlock ¶
Append data block.
Appends a block of data to the stream specified by the input handle. If the handle does not exist, this call will throw an exception with `RESOURCE_DOES_NOT_EXIST`.
If the block of data exceeds 1 MB, this call will throw an exception with `MAX_BLOCK_SIZE_EXCEEDED`.
func (*DbfsAPI) Close ¶
Close the stream.
Closes the stream specified by the input handle. If the handle does not exist, this call throws an exception with `RESOURCE_DOES_NOT_EXIST`.
func (*DbfsAPI) CloseByHandle ¶
Close the stream.
Closes the stream specified by the input handle. If the handle does not exist, this call throws an exception with `RESOURCE_DOES_NOT_EXIST`.
func (*DbfsAPI) Create ¶
Open a stream.
"Opens a stream to write to a file and returns a handle to this stream. There is a 10 minute idle timeout on this handle. If a file or directory already exists on the given path and __overwrite__ is set to `false`, this call throws an exception with `RESOURCE_ALREADY_EXISTS`.
A typical workflow for file upload would be:
1. Issue a `create` call and get a handle.
2. Issue one or more `add-block` calls with the handle you have.
3. Issue a `close` call with the handle you have.
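A minimal sketch of this workflow with the generated request types. The AddBlock, Close, and Create struct bodies are not reproduced in this doc, so the Path, Overwrite, Handle, and Data field names below are assumptions that mirror the REST API's parameters; data blocks are passed base64-encoded (encoding/base64 from the standard library):

created, err := w.Dbfs.Create(ctx, dbfs.Create{
    Path:      "/path/to/remote/file",
    Overwrite: true,
})
if err != nil {
    return err
}
// Append one block of at most 1 MB; repeat for larger payloads.
block := base64.StdEncoding.EncodeToString([]byte("Hello world!"))
if err := w.Dbfs.AddBlock(ctx, dbfs.AddBlock{
    Handle: created.Handle,
    Data:   block,
}); err != nil {
    return err
}
return w.Dbfs.Close(ctx, dbfs.Close{Handle: created.Handle})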
func (*DbfsAPI) Delete ¶
Delete a file/directory.
Delete the file or directory (optionally recursively delete all files in the directory). This call throws an exception with `IO_ERROR` if the path is a non-empty directory and `recursive` is set to `false` or on other similar errors.
When you delete a large number of files, the delete operation is done in increments. The call returns a response after approximately 45 seconds with an error message (503 Service Unavailable) asking you to re-invoke the delete operation until the directory structure is fully deleted.
For operations that delete more than 10K files, we discourage using the DBFS REST API, but advise you to perform such operations in the context of a cluster, using the [File system utility (dbutils.fs)](/dev-tools/databricks-utils.html#dbutils-fs). `dbutils.fs` covers the functional scope of the DBFS REST API, but from notebooks. Running such operations using notebooks provides better control and manageability, such as selective deletes, and the possibility to automate periodic delete jobs.
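For large directories a caller can simply re-invoke Delete until it stops failing with the 503 described above. A rough sketch; how that 503 surfaces as a Go error is not shown in this doc, so the undifferentiated retry below is an assumption:

var err error
for attempt := 0; attempt < 10; attempt++ {
    err = w.Dbfs.Delete(ctx, dbfs.Delete{
        Path:      "/remote/big/dir",
        Recursive: true,
    })
    if err == nil {
        break
    }
    // Back off briefly before re-invoking the incremental delete.
    time.Sleep(5 * time.Second)
}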
func (*DbfsAPI) GetStatus ¶
Get the information of a file or directory.
Gets the file information for a file or directory. If the file or directory does not exist, this call throws an exception with `RESOURCE_DOES_NOT_EXIST`.
func (*DbfsAPI) GetStatusByPath ¶
Get the information of a file or directory.
Gets the file information for a file or directory. If the file or directory does not exist, this call throws an exception with `RESOURCE_DOES_NOT_EXIST`.
func (*DbfsAPI) Impl ¶
func (a *DbfsAPI) Impl() DbfsService
Impl returns the low-level Dbfs API implementation.
func (*DbfsAPI) ListAll ¶
List directory contents or file details.
List the contents of a directory, or details of the file. If the file or directory does not exist, this call throws an exception with `RESOURCE_DOES_NOT_EXIST`.
When calling list on a large directory, the list operation will time out after approximately 60 seconds. We strongly recommend using list only on directories containing fewer than 10K files and discourage using the DBFS REST API for operations that list more than 10K files. Instead, we recommend that you perform such operations in the context of a cluster, using the [File system utility (dbutils.fs)](/dev-tools/databricks-utils.html#dbutils-fs), which provides the same functionality without timing out.
This method is generated by Databricks SDK Code Generator.
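For example, enumerating a small directory (a sketch; the List and FileInfo fields used here appear in the Types section below):

files, err := w.Dbfs.ListAll(ctx, dbfs.List{Path: "/remote/dir"})
if err != nil {
    return err
}
for _, f := range files {
    // FileInfo carries the absolute DBFS path, a directory flag, and size.
    fmt.Printf("%s dir=%v size=%d\n", f.Path, f.IsDir, f.FileSize)
}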
func (*DbfsAPI) ListByPath ¶
List directory contents or file details.
List the contents of a directory, or details of the file. If the file or directory does not exist, this call throws an exception with `RESOURCE_DOES_NOT_EXIST`.
When calling list on a large directory, the list operation will time out after approximately 60 seconds. We strongly recommend using list only on directories containing fewer than 10K files and discourage using the DBFS REST API for operations that list more than 10K files. Instead, we recommend that you perform such operations in the context of a cluster, using the [File system utility (dbutils.fs)](/dev-tools/databricks-utils.html#dbutils-fs), which provides the same functionality without timing out.
func (*DbfsAPI) Mkdirs ¶
Create a directory.
Creates the given directory and necessary parent directories if they do not exist. If a file (not a directory) exists at any prefix of the input path, this call throws an exception with `RESOURCE_ALREADY_EXISTS`. **Note**: If this operation fails, it might have succeeded in creating some of the necessary parent directories.
func (*DbfsAPI) MkdirsByPath ¶
Create a directory.
Creates the given directory and necessary parent directories if they do not exist. If a file (not a directory) exists at any prefix of the input path, this call throws an exception with `RESOURCE_ALREADY_EXISTS`. **Note**: If this operation fails, it might have succeeded in creating some of the necessary parent directories.
func (*DbfsAPI) Move ¶
Move a file.
Moves a file from one location to another location within DBFS. If the source file does not exist, this call throws an exception with `RESOURCE_DOES_NOT_EXIST`. If a file already exists in the destination path, this call throws an exception with `RESOURCE_ALREADY_EXISTS`. If the given source path is a directory, this call always recursively moves all files.
func (*DbfsAPI) Open ¶
Open opens a remote DBFS file for reading or writing. The returned object implements relevant io interfaces for convenient integration with other code that reads or writes bytes.
The io.WriterTo interface is provided and maximizes throughput for bulk reads by reading data with the DBFS maximum read chunk size of 1 MB. Similarly, the io.ReaderFrom interface is provided for bulk writing.
A file opened for writing must always be closed.
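A sketch of a checked write. That Close flushes any remaining buffered data is an assumption; what this doc guarantees is only that a file opened for writing must be closed:

remote, err := w.Dbfs.Open(ctx, "/path/to/remote/file",
    dbfs.FileModeWrite|dbfs.FileModeOverwrite)
if err != nil {
    return err
}
if _, err := remote.Write([]byte("payload")); err != nil {
    remote.Close()
    return err
}
// Check the error from Close; the write is only complete once it succeeds.
return remote.Close()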
func (*DbfsAPI) Put ¶
Upload a file.
Uploads a file through the use of multipart form post. It is mainly used for streaming uploads, but can also be used as a convenient single call for data upload.
Alternatively, you can pass contents as a base64-encoded string.
The amount of data that can be passed (when not streaming) using the __contents__ parameter is limited to 1 MB. `MAX_BLOCK_SIZE_EXCEEDED` will be thrown if this limit is exceeded.
If you want to upload large files, use the streaming upload. For details, see :method:dbfs/create, :method:dbfs/addBlock, :method:dbfs/close.
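A sketch of a small single-call upload; the Put struct's Contents, Overwrite, and Path fields appear in the Types section below, and the contents must stay within the 1 MB limit:

err := w.Dbfs.Put(ctx, dbfs.Put{
    Path:      "/path/to/remote/file",
    Contents:  base64.StdEncoding.EncodeToString([]byte("Hello world!")),
    Overwrite: true,
})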
func (*DbfsAPI) Read ¶
Get the contents of a file.
"Returns the contents of a file. If the file does not exist, this call throws an exception with `RESOURCE_DOES_NOT_EXIST`. If the path is a directory, the read length is negative, or if the offset is negative, this call throws an exception with `INVALID_PARAMETER_VALUE`. If the read length exceeds 1 MB, this call throws an exception with `MAX_READ_SIZE_EXCEEDED`.
If `offset + length` exceeds the number of bytes in a file, it reads the contents until the end of file.",
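Given the 1 MB cap, reading a larger file through this low-level call means paging with offset and length and base64-decoding each chunk; DbfsAPI.Open or DbfsAPI.ReadFile are the convenient alternatives. A sketch, assuming a read at end-of-file reports zero BytesRead:

var contents []byte
for offset := 0; ; {
    resp, err := w.Dbfs.Read(ctx, dbfs.Read{
        Path:   "/path/to/remote/file",
        Offset: offset,
        Length: 1000000, // stay within the 1 MB read limit
    })
    if err != nil {
        return err
    }
    if resp.BytesRead == 0 {
        break
    }
    // Data is base64-encoded; BytesRead counts the decoded bytes.
    chunk, err := base64.StdEncoding.DecodeString(resp.Data)
    if err != nil {
        return err
    }
    contents = append(contents, chunk...)
    offset += int(resp.BytesRead)
}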
func (*DbfsAPI) ReadFile ¶ added in v0.2.0
ReadFile is identical to os.ReadFile but for DBFS.
func (DbfsAPI) RecursiveList ¶
RecursiveList traverses the DBFS tree and returns all non-directory objects under the path.
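For example (a sketch; FileInfo fields as shown in the Types section below):

all, err := w.Dbfs.RecursiveList(ctx, "/remote/dir")
if err != nil {
    return err
}
for _, f := range all {
    fmt.Println(f.Path) // only files, never directories, per the doc above
}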
func (*DbfsAPI) WithImpl ¶
func (a *DbfsAPI) WithImpl(impl DbfsService) *DbfsAPI
WithImpl can be used to override low-level API implementations for unit testing purposes with github.com/golang/mock or other mocking frameworks.
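Besides generated mocks, a small hand-written stub can satisfy DbfsService (shown below). Embedding the interface so that unwired methods panic at call time is a stylistic assumption, not SDK machinery:

type fakeDbfs struct {
    dbfs.DbfsService // calls to methods not overridden below will panic
}

func (f fakeDbfs) GetStatus(ctx context.Context, req dbfs.GetStatus) (*dbfs.FileInfo, error) {
    return &dbfs.FileInfo{Path: req.Path, FileSize: 42}, nil
}

// In a test:
info, err := w.Dbfs.WithImpl(fakeDbfs{}).GetStatusByPath(ctx, "/remote/file")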
type DbfsService ¶
type DbfsService interface {
    // Append data block.
    //
    // Appends a block of data to the stream specified by the input handle. If
    // the handle does not exist, this call will throw an exception with
    // `RESOURCE_DOES_NOT_EXIST`.
    //
    // If the block of data exceeds 1 MB, this call will throw an exception
    // with `MAX_BLOCK_SIZE_EXCEEDED`.
    AddBlock(ctx context.Context, request AddBlock) error

    // Close the stream.
    //
    // Closes the stream specified by the input handle. If the handle does not
    // exist, this call throws an exception with `RESOURCE_DOES_NOT_EXIST`.
    Close(ctx context.Context, request Close) error

    // Open a stream.
    //
    // Opens a stream to write to a file and returns a handle to this stream.
    // There is a 10-minute idle timeout on this handle. If a file or
    // directory already exists on the given path and __overwrite__ is set to
    // `false`, this call throws an exception with `RESOURCE_ALREADY_EXISTS`.
    //
    // A typical workflow for file upload would be:
    //
    // 1. Issue a `create` call and get a handle.
    // 2. Issue one or more `add-block` calls with the handle you have.
    // 3. Issue a `close` call with the handle you have.
    Create(ctx context.Context, request Create) (*CreateResponse, error)

    // Delete a file/directory.
    //
    // Delete the file or directory (optionally recursively delete all files
    // in the directory). This call throws an exception with `IO_ERROR` if the
    // path is a non-empty directory and `recursive` is set to `false` or on
    // other similar errors.
    //
    // When you delete a large number of files, the delete operation is done
    // in increments. The call returns a response after approximately 45
    // seconds with an error message (503 Service Unavailable) asking you to
    // re-invoke the delete operation until the directory structure is fully
    // deleted.
    //
    // For operations that delete more than 10K files, we discourage using the
    // DBFS REST API, but advise you to perform such operations in the context
    // of a cluster, using the [File system utility
    // (dbutils.fs)](/dev-tools/databricks-utils.html#dbutils-fs). `dbutils.fs`
    // covers the functional scope of the DBFS REST API, but from notebooks.
    // Running such operations using notebooks provides better control and
    // manageability, such as selective deletes, and the possibility to
    // automate periodic delete jobs.
    Delete(ctx context.Context, request Delete) error

    // Get the information of a file or directory.
    //
    // Gets the file information for a file or directory. If the file or
    // directory does not exist, this call throws an exception with
    // `RESOURCE_DOES_NOT_EXIST`.
    GetStatus(ctx context.Context, request GetStatus) (*FileInfo, error)

    // List directory contents or file details.
    //
    // List the contents of a directory, or details of the file. If the file
    // or directory does not exist, this call throws an exception with
    // `RESOURCE_DOES_NOT_EXIST`.
    //
    // When calling list on a large directory, the list operation will time
    // out after approximately 60 seconds. We strongly recommend using list
    // only on directories containing fewer than 10K files and discourage
    // using the DBFS REST API for operations that list more than 10K files.
    // Instead, we recommend that you perform such operations in the context
    // of a cluster, using the [File system utility
    // (dbutils.fs)](/dev-tools/databricks-utils.html#dbutils-fs), which
    // provides the same functionality without timing out.
    //
    // Use ListAll() to get all FileInfo instances.
    List(ctx context.Context, request List) (*ListStatusResponse, error)

    // Create a directory.
    //
    // Creates the given directory and necessary parent directories if they do
    // not exist. If a file (not a directory) exists at any prefix of the
    // input path, this call throws an exception with
    // `RESOURCE_ALREADY_EXISTS`. **Note**: If this operation fails, it might
    // have succeeded in creating some of the necessary parent directories.
    Mkdirs(ctx context.Context, request MkDirs) error

    // Move a file.
    //
    // Moves a file from one location to another location within DBFS. If the
    // source file does not exist, this call throws an exception with
    // `RESOURCE_DOES_NOT_EXIST`. If a file already exists in the destination
    // path, this call throws an exception with `RESOURCE_ALREADY_EXISTS`. If
    // the given source path is a directory, this call always recursively
    // moves all files.
    Move(ctx context.Context, request Move) error

    // Upload a file.
    //
    // Uploads a file through the use of multipart form post. It is mainly
    // used for streaming uploads, but can also be used as a convenient single
    // call for data upload.
    //
    // Alternatively, you can pass contents as a base64-encoded string.
    //
    // The amount of data that can be passed (when not streaming) using the
    // __contents__ parameter is limited to 1 MB. `MAX_BLOCK_SIZE_EXCEEDED`
    // will be thrown if this limit is exceeded.
    //
    // If you want to upload large files, use the streaming upload. For
    // details, see :method:dbfs/create, :method:dbfs/addBlock,
    // :method:dbfs/close.
    Put(ctx context.Context, request Put) error

    // Get the contents of a file.
    //
    // Returns the contents of a file. If the file does not exist, this call
    // throws an exception with `RESOURCE_DOES_NOT_EXIST`. If the path is a
    // directory, the read length is negative, or if the offset is negative,
    // this call throws an exception with `INVALID_PARAMETER_VALUE`. If the
    // read length exceeds 1 MB, this call throws an exception with
    // `MAX_READ_SIZE_EXCEEDED`.
    //
    // If `offset + length` exceeds the number of bytes in a file, it reads
    // the contents until the end of file.
    Read(ctx context.Context, request Read) (*ReadResponse, error)
}
The DBFS API makes it simple to interact with various data sources without having to include a user's credentials every time a file is read.
type Delete ¶
type Delete struct {
    // The path of the file or directory to delete. The path should be the
    // absolute DBFS path.
    Path string `json:"path"`
    // Whether or not to recursively delete the directory's contents. Deleting
    // empty directories can be done without providing the recursive flag.
    Recursive bool `json:"recursive,omitempty"`
}
type FileInfo ¶
type FileInfo struct {
    // The length of the file in bytes or zero if the path is a directory.
    FileSize int64 `json:"file_size,omitempty"`
    // True if the path is a directory.
    IsDir bool `json:"is_dir,omitempty"`
    // Last modification time of given file/dir in milliseconds since Epoch.
    ModificationTime int64 `json:"modification_time,omitempty"`
    // The path of the file or directory.
    Path string `json:"path,omitempty"`
}
type GetStatus ¶
type GetStatus struct {
    // The path of the file or directory. The path should be the absolute DBFS
    // path.
    Path string `json:"-" url:"path"`
}
Get the information of a file or directory
type Handle ¶ added in v0.2.0
type Handle interface {
    io.ReadWriteCloser
    io.WriterTo
    io.ReaderFrom
}
Handle defines the interface of the object returned by DbfsAPI.Open.
type List ¶
type List struct {
    // The path of the file or directory. The path should be the absolute DBFS
    // path.
    Path string `json:"-" url:"path"`
}
List directory contents or file details
type ListStatusResponse ¶
type ListStatusResponse struct {
    // A list of FileInfo's that describe contents of directory or file. See
    // example above.
    Files []FileInfo `json:"files,omitempty"`
}
type MkDirs ¶
type MkDirs struct {
    // The path of the new directory. The path should be the absolute DBFS
    // path.
    Path string `json:"path"`
}
type Put ¶
type Put struct {
    // This parameter might be absent, and instead a posted file will be used.
    Contents string `json:"contents,omitempty"`
    // The flag that specifies whether to overwrite existing file/files.
    Overwrite bool `json:"overwrite,omitempty"`
    // The path of the new file. The path should be the absolute DBFS path.
    Path string `json:"path"`
}
type Read ¶
type Read struct {
    // The number of bytes to read starting from the offset. This has a limit
    // of 1 MB, and a default value of 0.5 MB.
    Length int `json:"-" url:"length,omitempty"`
    // The offset to read from in bytes.
    Offset int `json:"-" url:"offset,omitempty"`
    // The path of the file to read. The path should be the absolute DBFS
    // path.
    Path string `json:"-" url:"path"`
}
Get the contents of a file
type ReadResponse ¶
type ReadResponse struct {
    // The number of bytes read (could be less than `length` if we hit end of
    // file). This refers to number of bytes read in unencoded version
    // (response data is base64-encoded).
    BytesRead int64 `json:"bytes_read,omitempty"`
    // The base64-encoded contents of the file read.
    Data string `json:"data,omitempty"`
}