cloudfuse

command module
v0.1.4
Published: Oct 18, 2023 License: MIT Imports: 2 Imported by: 0

README

Cloudfuse - An S3 and Azure Storage FUSE driver

About

Cloudfuse is a fork of the open source project blobfuse2 from Microsoft that adds support for S3 storage, a GUI for configuration and mounting, and Windows support. It provides a virtual filesystem backed by either S3 or Azure Storage. It uses the libfuse open source library (fuse) to communicate with the Linux FUSE kernel module and uses WinFSP to support running on Windows. It implements the filesystem operations using the S3 and Azure Storage REST APIs.

Cloudfuse is stable, provided that it is used within its limits documented here. Cloudfuse supports both reads and writes; however, it does not guarantee continuous sync of data written to storage using other APIs or other mounts of Cloudfuse. For data integrity, it is recommended that multiple sources do not modify the same blob/object/file. Please submit an issue here for any issues, feature requests, or questions.

NOTICE

  • We have seen some customer issues around files getting corrupted when streaming is used in write mode. Kindly avoid using this feature for write while we investigate and resolve it.

Features

  • Mount an S3 bucket or Azure storage container or datalake file system on Linux and Windows.
  • Basic file system operations such as mkdir, opendir, readdir, rmdir, open, read, create, write, close, unlink, truncate, stat, rename
  • Local caching to improve subsequent access times
  • Streaming to support reading AND writing large files
  • Parallel downloads and uploads to improve access time for large files
  • Multiple mounts to the same container for read-only workloads

Health Monitor

Cloudfuse also supports a health monitor. It allows customers to gain more insight into how their Cloudfuse instance is behaving with the rest of their machine. Visit here to set it up.

Features compared to blobfuse2

  • Supports any S3 compatible storage
  • Adds a GUI to configure and start mounts
  • Runs on Windows using WinFSP in foreground or as a Windows service

Download Cloudfuse

You can install Cloudfuse by cloning this repository. In the workspace, execute the build script ./build.sh to build the binary. This builds a binary for either Linux or Windows, depending on the OS you are using.

Linux

Cloudfuse currently only supports libfuse2. On Linux, you need to install the libfuse2 package, for example on Ubuntu:

    sudo apt install libfuse2

Running on Linux

To start your mount of an S3 Bucket or Azure Container use the mount command and specify the location of your config file. See config file for information about the config file. By default, cloudfuse will run in the background which allows you to close the terminal when you start a mount. If you would like it to run in the foreground you can specify foreground: true in your config file or pass --foreground=true as an argument when mounting.

    cloudfuse mount <mount path> --config-file=<config file>

Windows

On Windows, you also need to install the third party utility WinFsp. To download WinFsp, please run the WinFsp installer found here.

See here for how to set up Cloudfuse to run on Windows.

Supported Operations

The general format of the Cloudfuse commands is cloudfuse [command] [arguments] --[flag-name]=[flag-value]

  • help - Help about any command
  • mount - Mounts a cloud storage container as a filesystem. The supported containers include
    • S3 Bucket
    • Azure Blob Container
    • Azure Datalake Gen2 Container
  • mount all - Mounts all the containers in an S3 Account or Azure account as a filesystem. The supported storage services include
  • mount list - Lists all Cloudfuse filesystems.
  • secure decrypt - Decrypts a config file.
  • secure encrypt - Encrypts a config file.
  • secure get - Gets value of a config parameter from an encrypted config file.
  • secure set - Updates value of a config parameter.
  • unmount - Unmounts the Cloudfuse filesystem.
  • unmount all - Unmounts all Cloudfuse filesystems.

Find help from your command prompt

To see a list of commands, type cloudfuse -h and then press the ENTER key. To learn about a specific command, just include the name of the command (For example: cloudfuse mount -h).

Usage

  • Mount with cloudfuse
    • cloudfuse mount <mount path> --config-file=<config file>
  • Mount all containers in your storage account
    • cloudfuse mount all <mount path> --config-file=<config file>
  • List all mount instances of cloudfuse
    • cloudfuse mount list
  • Unmount cloudfuse on Linux
    • cloudfuse unmount <mount path>
  • Unmount all cloudfuse instances on Linux
    • cloudfuse unmount all
  • Install as a Windows service
    • cloudfuse service install
  • Uninstall the cloudfuse Windows service
    • cloudfuse service uninstall
  • Start the Windows service
    • cloudfuse service start
  • Stop the Windows service
    • cloudfuse service stop
  • Mount an instance that persists across Windows restarts
    • cloudfuse service mount <mount path> --config-file=<config file>
  • Unmount a Cloudfuse mount running as a Windows service
    • cloudfuse service unmount <mount path>
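
As a rough sketch of how the Linux commands above fit together, the helpers below create the mount point, mount, list active mounts, and later unmount. The default paths, the config file name, and the function names are hypothetical; only the cloudfuse subcommands come from the list above.

```shell
#!/bin/sh
# Hypothetical defaults; override MOUNT_PATH and CONFIG_FILE for your system.
MOUNT_PATH="${MOUNT_PATH:-$HOME/cloudfuse-mnt}"
CONFIG_FILE="${CONFIG_FILE:-$HOME/cloudfuse-config.yaml}"

# mount_and_list: create the mount point if needed, mount it,
# then list all Cloudfuse mounts to confirm it is present.
mount_and_list() {
    mkdir -p "$MOUNT_PATH" || return 1
    cloudfuse mount "$MOUNT_PATH" --config-file="$CONFIG_FILE" || return 1
    cloudfuse mount list
}

# unmount_path: unmount the single mount point (Linux).
unmount_path() {
    cloudfuse unmount "$MOUNT_PATH"
}
```

Call mount_and_list to bring the mount up and unmount_path when you are done with it.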

CLI parameters

  • General options
    • --config-file=<PATH>: The path to the config file.
    • --log-level=<LOG_*>: The level of logs to capture.
    • --log-file-path=<PATH>: The path for the log file.
    • --foreground=true: Mounts the system in foreground mode.
    • --read-only=true: Mount container in read-only mode.
    • --default-working-dir: The default working directory to store log files and other cloudfuse related information.
    • --disable-version-check=true: Disable the cloudfuse version check.
    • --secure-config=true : Config file is encrypted using the 'cloudfuse secure' command.
    • --passphrase=<STRING> : Passphrase used to encrypt/decrypt config file.
    • --wait-for-mount=<TIMEOUT IN SECONDS> : Let parent process wait for given timeout before exit to ensure child has started.
  • Attribute cache options
    • --attr-cache-timeout=<TIMEOUT IN SECONDS>: The timeout for the attribute cache entries.
    • --no-symlinks=true: To improve performance disable symlink support.
  • Storage options
    • --container-name=<CONTAINER NAME>: The container to mount.
    • --cancel-list-on-mount-seconds=<TIMEOUT IN SECONDS>: Time for which list calls will be blocked after mount. (prevent billing charges on mounting)
    • --virtual-directory=true : Support virtual directories without existence of a special marker blob for block blob account (Azure only).
    • --subdirectory=<path> : Subdirectory to mount instead of entire container.
    • --disable-compression=false : Disable content encoding negotiation with the server. Turn on this flag if objects/blobs have 'content-encoding' set to 'gzip'.
    • --use-adls=false : Specifies whether the configured storage account is HNS enabled. This must be turned on when an HNS-enabled account is mounted.
  • File cache options
    • --file-cache-timeout=<TIMEOUT IN SECONDS>: Timeout for which file is cached on local system.
    • --tmp-path=<PATH>: The path to the file cache.
    • --cache-size-mb=<SIZE IN MB>: Amount of disk cache that can be used by cloudfuse.
    • --high-disk-threshold=<PERCENTAGE>: If local cache usage exceeds this, start early eviction of files from cache.
    • --low-disk-threshold=<PERCENTAGE>: If local cache usage comes below this threshold then stop early eviction.
    • --sync-to-flush=false : If set to true, a sync call forces the file to be uploaded to the storage container; otherwise, it just evicts the file from the local cache.
  • Stream options
    • --block-size-mb=<SIZE IN MB>: Size of a block to be downloaded during streaming.
  • Block-Cache options
    • --block-cache-block-size=<SIZE IN MB>: Size of a block to be downloaded as a unit.
    • --block-cache-pool-size=<SIZE IN MB>: Size of pool to be used for caching. This limits total memory used by block-cache.
    • --block-cache-path=<PATH>: Path where downloaded blocks will be persisted. Not providing this parameter will disable the disk caching.
    • --block-cache-disk-size=<SIZE IN MB>: Disk space to be used for caching.
    • --block-cache-prefetch=<Number of blocks>: Number of blocks to prefetch at max when sequential reads are in progress.
    • --block-cache-prefetch-on-open=true: Start prefetching on open system call instead of waiting for first read. Enhances perf if file is read sequentially from offset 0.
  • Fuse options
    • --attr-timeout=<TIMEOUT IN SECONDS>: Time the kernel can cache inode attributes.
    • --entry-timeout=<TIMEOUT IN SECONDS>: Time the kernel can cache directory listing.
    • --negative-timeout=<TIMEOUT IN SECONDS>: Time the kernel can cache non-existence of file or directory.
    • --allow-other: Allow other users to access this mount point.
    • --disable-writeback-cache=true: Prevent libfuse from buffering write requests if you must strictly open files in O_WRONLY or O_APPEND mode.
    • --ignore-open-flags=true: Ignore the append and write-only flags, since O_APPEND and O_WRONLY are not supported with writeback caching.
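
The flags above can be combined on a single mount command. The sketch below is an illustration, not a recommended configuration: it mounts read-only, stays in the foreground, and limits logging to LOG_ERR. The wrapper name mount_read_only is hypothetical; only the flags come from the list above.

```shell
#!/bin/sh
# mount_read_only MOUNT_PATH CONFIG_FILE
# Hypothetical wrapper: mounts a container read-only in the foreground
# and captures only error-level logs.
mount_read_only() {
    cloudfuse mount "$1" --config-file="$2" \
        --read-only=true \
        --foreground=true \
        --log-level=LOG_ERR
}
```

Because the mount runs in the foreground, the call blocks until the filesystem is unmounted.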

S3 configuration

S3 connections will be configured by options in the following order of precedence:

  • The s3storage section of the Config file
  • Environment variables
    • AWS_ACCESS_KEY_ID: key ID, used as a pair with AWS_SECRET_ACCESS_KEY
    • AWS_SECRET_ACCESS_KEY: secret key, used as a pair with AWS_ACCESS_KEY_ID
    • AWS_SESSION_TOKEN: validates a temporary key pair (key ID & secret key)
    • AWS_WEB_IDENTITY_TOKEN_FILE: temporary credential from an external identity provider
    • AWS_REGION: the service region (e.g. us-east-1)
    • AWS_PROFILE: the profile name to use from shared configuration file(s)
  • Shared configuration files (~/.aws/credentials and ~/.aws/config)
    • The formatting for these files is documented at the link below. For more information about environment variables and shared configuration files, please see the documentation here.
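
A minimal sketch of supplying S3 credentials through the environment variables listed above. The values are placeholders, not working credentials, and the s3storage section of the config file still takes precedence over them.

```shell
#!/bin/sh
# Placeholder values for illustration only; substitute your own.
export AWS_ACCESS_KEY_ID="EXAMPLEKEYID"
export AWS_SECRET_ACCESS_KEY="example-secret-key"
export AWS_REGION="us-east-1"

# With the variables exported, mount as usual:
# cloudfuse mount <mount path> --config-file=<config file>
```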

Azure storage configuration with environment variables

  • General options
    • AZURE_STORAGE_ACCOUNT: Specifies the storage account to be connected.
    • AZURE_STORAGE_ACCOUNT_TYPE: Specifies the account type 'block' or 'adls'
    • AZURE_STORAGE_ACCOUNT_CONTAINER: Specifies the name of the container to be mounted
    • AZURE_STORAGE_BLOB_ENDPOINT: Specifies the blob endpoint to use. Defaults to *.blob.core.windows.net, but is useful for targeting storage emulators.
    • AZURE_STORAGE_AUTH_TYPE: Overrides the currently specified auth type. Case insensitive. Options: Key, SAS, MSI, SPN
  • Account key auth:
    • AZURE_STORAGE_ACCESS_KEY: Specifies the storage account key to use for authentication.
  • SAS token auth:
    • AZURE_STORAGE_SAS_TOKEN: Specifies the SAS token to use for authentication.
  • Managed Identity auth:
    • AZURE_STORAGE_IDENTITY_CLIENT_ID: Only one of these three parameters is needed if multiple identities are present on the system.
    • AZURE_STORAGE_IDENTITY_OBJECT_ID: Only one of these three parameters is needed if multiple identities are present on the system.
    • AZURE_STORAGE_IDENTITY_RESOURCE_ID: Only one of these three parameters is needed if multiple identities are present on the system.
    • MSI_ENDPOINT: Specifies a custom managed identity endpoint, as IMDS may not be available under some scenarios. Uses the MSI_SECRET parameter as the Secret header.
    • MSI_SECRET: Specifies a custom secret for an alternate managed identity endpoint.
  • Service Principal Name auth:
    • AZURE_STORAGE_SPN_CLIENT_ID: Specifies the client ID for your application registration
    • AZURE_STORAGE_SPN_TENANT_ID: Specifies the tenant ID for your application registration
    • AZURE_STORAGE_AAD_ENDPOINT: Specifies a custom AAD endpoint to authenticate against
    • AZURE_STORAGE_SPN_CLIENT_SECRET: Specifies the client secret for your application registration.
    • AZURE_STORAGE_AUTH_RESOURCE : Scope to be used when requesting a token.
  • Proxy Server:
    • http_proxy: The proxy server address. Example: 10.1.22.4:8080.
    • https_proxy: The proxy server address when https is turned off forcing http. Example: 10.1.22.4:8080.
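
When authentication is configured entirely through the environment, a small pre-flight check can catch a missing variable before mounting. The helper below is a hypothetical sketch and not part of cloudfuse. Which variables it treats as required for each auth type is an assumption drawn from the descriptions above.

```shell
#!/bin/sh
# check_azure_env: hypothetical pre-flight helper, not part of cloudfuse.
# Prints the names of missing variables to stderr and returns non-zero
# if any variable assumed to be required is unset or empty.
check_azure_env() {
    missing=""
    # require VAR: record VAR as missing when it is unset or empty.
    require() {
        eval "val=\${$1:-}"
        [ -n "$val" ] || missing="$missing $1"
    }
    require AZURE_STORAGE_ACCOUNT
    # The auth type is case insensitive, so normalize it to lower case.
    auth=$(printf '%s' "${AZURE_STORAGE_AUTH_TYPE:-}" | tr '[:upper:]' '[:lower:]')
    case "$auth" in
        key) require AZURE_STORAGE_ACCESS_KEY ;;
        sas) require AZURE_STORAGE_SAS_TOKEN ;;
        spn)
            require AZURE_STORAGE_SPN_CLIENT_ID
            require AZURE_STORAGE_SPN_TENANT_ID
            require AZURE_STORAGE_SPN_CLIENT_SECRET
            ;;
        msi) ;; # identity IDs are only needed when multiple identities exist
        *) missing="$missing AZURE_STORAGE_AUTH_TYPE" ;;
    esac
    if [ -n "$missing" ]; then
        echo "missing or invalid:$missing" >&2
        return 1
    fi
}
```

A typical use is to run check_azure_env and only proceed to the mount command when it succeeds.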

Config file

  • See this sample config file.
  • See this config file for a list and description of all possible configurable options in cloudfuse.

Please note: do not use quotation marks ("") around any of the config parameters.

Frequently Asked Questions

  • How do I generate a SAS for Azure with permissions for rename?
    The az CLI has a command to generate a SAS token. Open a command prompt and make sure you are logged in to the az CLI. Run the following command, and the SAS token will be displayed in the command prompt:
    az storage container generate-sas --account-name <account name ex:myadlsaccount> --account-key <accountKey> -n <container name> --permissions dlrwac --start <today's date ex: 2021-03-26> --expiry <date greater than the current time ex:2021-03-28>
  • Why do I get EINVAL on opening a file with WRONLY or APPEND flags?
    To improve performance, Cloudfuse enables writeback caching by default, which can produce unexpected behavior for files opened with WRONLY or APPEND flags, so Cloudfuse returns EINVAL on open of a file with those flags. Depending on your workload, either use disable-writeback-cache to turn off writeback caching (which can degrade performance) or use ignore-open-flags (which replaces WRONLY with RDWR and ignores APPEND).
  • How do I mount Cloudfuse inside a container?
    Refer to the 'docker' folder in this repo, which contains a sample 'Dockerfile'. If you wish to create your own container image, try the 'buildandruncontainer.sh' script. It creates a container image and launches the container using the current environment variables that hold your storage account credentials.
  • Why am I not able to see the updated contents of file(s) that were updated through means other than the Cloudfuse mount?
    If your use case involves updating or uploading files through other means and you wish to see the updated contents on the Cloudfuse mount, you need to disable the kernel page cache by passing the -o direct_io CLI parameter while mounting. Along with this, set file-cache-timeout=0 and set all other libfuse caching parameters to 0. Be aware that disabling the kernel cache can result in more calls to S3 or Azure Storage, which has cost and performance implications.
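
The cache-disabling advice in the last answer could look like the following sketch. The wrapper name mount_uncached is hypothetical, and treating attr-timeout, entry-timeout, and negative-timeout as "all other libfuse caching parameters" is an assumption based on the Fuse options listed under CLI parameters.

```shell
#!/bin/sh
# mount_uncached MOUNT_PATH CONFIG_FILE
# Hypothetical wrapper: mounts with the kernel page cache bypassed and
# the file cache and kernel metadata cache timeouts set to 0, so changes
# made outside this mount become visible sooner.
mount_uncached() {
    cloudfuse mount "$1" --config-file="$2" \
        -o direct_io \
        --file-cache-timeout=0 \
        --attr-timeout=0 \
        --entry-timeout=0 \
        --negative-timeout=0
}
```

Expect more requests to S3 or Azure Storage with these settings, as noted above.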

Un-Supported File system operations

  • mkfifo : FIFO creation is not supported by Cloudfuse and results in a "function not implemented" error.
  • chown : Change of ownership is not supported by Azure Storage hence Cloudfuse does not support this.
  • Creation of device files or pipes is not supported by Cloudfuse.
  • Cloudfuse does not support extended-attributes (x-attrs) operations
  • Cloudfuse does not support lseek() operation on directory handles. No error is thrown but it will not work as expected.

Un-Supported Scenarios

  • Cloudfuse does not support overlapping mount paths. While running multiple instances of Cloudfuse make sure each instance has a unique and non-overlapping mount point.

  • Cloudfuse does not support co-existence with NFS on same mount path. Behavior in this case is undefined.

  • For Azure block blob accounts where data is uploaded through other means, Cloudfuse expects special directory marker files to exist in the container. Without them, some file operations might not work. For example, if you have a blob 'A/B/c.txt', then special marker files must exist for 'A' and 'A/B'; otherwise, opening 'A/B/c.txt' will fail. Once an 'ls' operation is done on the directories 'A' and 'A/B', you will be able to open 'A/B/c.txt' as well. A possible workaround is to either create the directory marker files manually through the portal or run the 'mkdir' command for 'A' and 'A/B' from Cloudfuse. See here for details on this.
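
The 'mkdir' workaround can be sketched as below, run against a mounted path. The helper name and the example paths are hypothetical. The sketch relies on mkdir -p creating each missing parent, so one call covers both 'A' and 'A/B'.

```shell
#!/bin/sh
# create_dir_markers MOUNT_PATH DIR
# Hypothetical helper: creates DIR and all of its missing parent
# directories under the mount path.
create_dir_markers() {
    mkdir -p "$1/$2"
}

# Usage (paths are illustrative):
# create_dir_markers /path/to/mount A/B
```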

Limitations

  • In case of Azure BlockBlob accounts, ACLs are not supported by Azure Storage so Cloudfuse will by default return success for 'chmod' operation. However it will work fine for Gen2 (DataLake) accounts. ACLs are not currently supported for S3 accounts.

  • When Cloudfuse is mounted inside a Docker container, SYS_ADMIN privileges are required for it to interact with the fuse driver. If the container is created without this privilege, the mount will fail. A sample command to spawn a Docker container is:

    docker run -it --rm --cap-add=SYS_ADMIN --device=/dev/fuse --security-opt apparmor:unconfined <environment variables> <docker image>

Syslog security warning

By default, Cloudfuse will log to syslog. The default settings will, in some cases, log relevant file paths to syslog. If this is sensitive information, turn off logging or set log-level to LOG_ERR.

License

This project is licensed under MIT.

Contributing

This project welcomes contributions and suggestions.

This project is governed by the code of conduct. You are expected to follow this as you contribute to the project. Please report all unacceptable behavior to opensource@seagate.com.

Directories

Path Synopsis
log
component
libfuse
Package libfuse defines the component that interacts with the FUSE filesystem and defines all functions to interface with FUSE.
Package internal is a generated GoMock package.
tools
