dxfuse

package module
v1.4.0
Published: Aug 17, 2024 License: Apache-2.0 Imports: 36 Imported by: 0

README

dxfuse: a FUSE filesystem for DNAnexus

A filesystem that allows users access to the DNAnexus storage system.

NOTE: This project is in beta . It's used on DNAnexus cloud workers, and may also be run on a Linux or macOS machine with a network connection and a DNAnexus account.

The code uses the FUSE library, implemented in golang. The DNAnexus storage system is not POSIX compliant. It holds not just files and directories, but also records, databases, applets, and workflows. It allows things that POSIX does not, for example:

  1. Files in a directory can have the same name
  2. A filename can include slashes
  3. A file and a directory may share a name

To fit these names into a POSIX compliant filesystem, as FUSE and Linux require, files are moved to avoid name collisions. For example, if file foo.txt has three versions in directory bar, dxfuse will present the following Unix directory structure:

bar/
    foo.txt
    1/foo.txt
    2/foo.txt

The directories 1 and 2 are new, and do not exist in the project. If a file name contains a slash, the slash is replaced with a triple underscore. As a rule, directories are not moved, nor are their characters modified. However, if a directory name contains a slash, it is dropped, and a warning is emitted to the log.
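The renaming rules above can be sketched in a few lines of Go. This is only an illustration: the helper names posixName and layout are hypothetical, and the real dxfuse code may order or number duplicates differently.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// posixName replaces each slash in a file name with a triple underscore.
func posixName(name string) string {
	return strings.ReplaceAll(name, "/", "___")
}

// layout assigns every file name in one directory a unique relative path.
// The first occurrence of a name stays in place; the k-th duplicate is
// placed in a new subdirectory named "k" (an assumed numbering scheme).
func layout(names []string) []string {
	seen := make(map[string]int)
	out := make([]string, 0, len(names))
	for _, n := range names {
		p := posixName(n)
		k := seen[p]
		seen[p] = k + 1
		if k == 0 {
			out = append(out, p)
		} else {
			out = append(out, path.Join(fmt.Sprint(k), p))
		}
	}
	return out
}

func main() {
	for _, p := range layout([]string{"foo.txt", "foo.txt", "foo.txt", "a/b.txt"}) {
		fmt.Println(p)
	}
}
```

Run on three copies of foo.txt, the sketch yields the same foo.txt, 1/foo.txt, 2/foo.txt layout shown above.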

dxfuse provides a read-only view of the DNAnexus storage unless launched with the optional -limitedWrite flag. limitedWrite mode allows data to be written in an append-only fashion, which is enough to support Spark file outputs.

dxfuse approximates a normal POSIX filesystem, but does not always have the same semantics. For example:

  1. Metadata such as last access time is not supported
  2. Directories have approximate create/modify times. This is because DNAx does not keep such attributes for directories.

There are several limitations currently:

  • Primarily intended for Linux, but can be used on OSX
  • Limits directories to 255,000 elements
  • Updates to the project emanating from other machines are not reflected locally
  • Does not support hard links
  • limitedWrite mode has additional limitations described in the Limited Write Mode section

Download benchmarks

Streaming a file from DNAnexus using dxfuse performs similarly to dx-toolkit. The following table shows performance across several instance types. The benchmark measures how many seconds it takes to download a file of a given size; lower is better. The two download methods were (1) dx cat, and (2) cat from a dxfuse mount point.

instance type      dx cat (seconds)  dxfuse cat (seconds)  file size
mem1_ssd1_v2_x4    207               219                   17G
mem1_ssd1_v2_x4    66                77                    5.9G
mem1_ssd1_v2_x4    6                 4                     705M
mem1_ssd1_v2_x4    3                 3                     285M
mem1_ssd1_v2_x16   57                49                    17G
mem1_ssd1_v2_x16   22                24                    5.9G
mem1_ssd1_v2_x16   3                 3                     705M
mem1_ssd1_v2_x16   2                 1                     285M
mem3_ssd1_v2_x32   52                51                    17G
mem3_ssd1_v2_x32   20                15                    5.9G
mem3_ssd1_v2_x32   4                 2                     705M
mem3_ssd1_v2_x32   2                 1                     285M

Implementation overview

The implementation uses a sqlite database, located on /var/dxfuse/metadata.db. It stores files and directories in tables, indexed to speed up common queries.

Load on the DNAnexus API servers and the cloud object system is carefully controlled. Bulk calls are used to describe data objects, and the number of parallel IO requests is bounded.
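One common way to bound the number of parallel requests in Go is a buffered channel used as a semaphore. The sketch below shows that pattern only; runBounded is a hypothetical helper, and dxfuse itself may bound its IO differently (for example with a fixed pool of worker goroutines).

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runBounded runs every task with at most limit tasks in flight at once.
// It returns the highest concurrency it observed, for demonstration.
func runBounded(limit int, tasks []func()) int32 {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var inFlight, peak int32
	for _, task := range tasks {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit tasks are already running
		go func(t func()) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when the task ends
			n := atomic.AddInt32(&inFlight, 1)
			for {
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			t()
			atomic.AddInt32(&inFlight, -1)
		}(task)
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	tasks := make([]func(), 20)
	for i := range tasks {
		tasks[i] = func() {} // stand-in for one IO request
	}
	fmt.Println("peak concurrency:", runBounded(4, tasks))
}
```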

dxfuse operations can sometimes be slow, for example, if the server is slow to respond, or has been temporarily shut down (503 mode). This may cause the filesystem to lose its interactive feel. Running it on a cloud worker reduces network latency significantly, and is the way it is used in the product. Running on a local, non-cloud machine carries the risk of network choppiness.

Limited Write Mode

dxfuse -limitedWrite mode was primarily designed to support Spark file output over the file:/// protocol.

Creating and writing to files is allowed when dxfuse is mounted with the -limitedWrite flag. Writing to files is append only. Any non-sequential writes will return ENOTSUP. Seeking or reading operations are not permitted while a file is being written.
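The append-only rule amounts to checking that each write starts exactly where the previous one ended. A minimal sketch, assuming a hypothetical appendOnlyFile type that tracks the next allowed offset (not the actual dxfuse data structure):

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// appendOnlyFile tracks the offset at which the next write must start.
type appendOnlyFile struct {
	nextOffset int64
}

// WriteAt accepts data only if it starts at the current end of the file.
// Any other offset is a non-sequential write and is rejected with ENOTSUP.
func (f *appendOnlyFile) WriteAt(data []byte, offset int64) error {
	if offset != f.nextOffset {
		return syscall.ENOTSUP
	}
	f.nextOffset += int64(len(data))
	return nil
}

func main() {
	f := &appendOnlyFile{}
	fmt.Println(f.WriteAt([]byte("hello"), 0)) // sequential write
	err := f.WriteAt([]byte("x"), 0)           // rewinds, so it is rejected
	fmt.Println(errors.Is(err, syscall.ENOTSUP))
}
```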

Supported operations

-limitedWrite mode enables the following operations: rename (mv, see below), unlink (rm), mkdir (see below), and rmdir (empty folders only). Rewriting existing files is not permitted, nor is truncating existing files.

mkdir behavior

All mkdir operations via dxfuse are treated as mkdir -p. This is because dxfuse does not present the realtime state of the project. Folders can be created outside of dxfuse (or in another dxfuse process), and therefore not be visible to the currently running dxfuse. Without this behavior, a subsequent mkdir, which maps to the project-xxxx/newFolder API call, would return a 422 error, because dxfuse did not know that the directory already exists. This design is due to Spark behavior, where multiple worker nodes sometimes attempt to create the same output directory.

rename behavior

Rename does not allow removing the target file or directory because the DNAnexus API does not support this functionality. Removal of the target file or directory must be done as a separate operation before calling the rename (mv) operation.

$ ls -lht MNT/file*
-r--r--r-- 1 root root 6 Aug 20 21:23 file1
-r--r--r-- 1 root root 6 Aug 20 21:23 file
$ mv MNT/file MNT/file1
mv: cannot move 'file' to 'file1': Operation not permitted

# Supported via 2 distinct operations:
$ rm MNT/file1
$ mv MNT/file MNT/file1

File upload and closing

Each dxfuse file open for writing is allocated a 16MiB write buffer in memory, which is uploaded as a DNAnexus file part when full. The buffer grows with each part: part n uses a buffer of 1.1^n * 16MiB, up to a maximum of 700MiB. dxfuse uploads up to 4 parts in parallel across all files being uploaded.
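The growth rule can be written out directly. The sketch below assumes that n counts parts from zero and that fractional sizes are truncated; neither detail is stated above, and partSize is a hypothetical name rather than a dxfuse function.

```go
package main

import (
	"fmt"
	"math"
)

const (
	MiB             = 1024 * 1024
	initialPartSize = 16 * MiB  // matches InitialUploadPartSize
	maxPartSize     = 700 * MiB // matches MaxUploadPartSize
)

// partSize returns the buffer size for part n (n = 0 is the first part):
// 1.1^n * 16MiB, capped at 700MiB.
func partSize(n int) int64 {
	size := float64(initialPartSize) * math.Pow(1.1, float64(n))
	if size > float64(maxPartSize) {
		return maxPartSize
	}
	return int64(size)
}

func main() {
	for _, n := range []int{0, 1, 10, 20, 50} {
		fmt.Printf("part %2d: %d bytes\n", n, partSize(n))
	}
}
```

Growing the part size geometrically keeps the part count small for very large files while keeping memory use low for small ones.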

The upload of the last DNAnexus file part and the call of file-xxxx/close DNAnexus API operation are performed by dxfuse only when the OS process that created the OS file descriptor closes the OS file descriptor, triggering FlushFile fuse operation.

File descriptor duplication and empty files

Applications which immediately duplicate a file descriptor after opening it are supported. Writing to a file descriptor and then duplicating it is not supported, because closing the original descriptor triggers the FlushFile fuse operation. See the supported syscall access pattern below, where the FlushFile op triggered by close(3) is ignored for an empty open file.

# Supported access pattern
openat(AT_FDCWD, "MNT/project/writefile", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
dup2(3, 1)                              = 1
# Triggers a FlushFile ignored by dxfuse since the file is empty
close(3)                                = 0
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024

The example below closes the file descriptor after writing, which closes the DNAnexus file, causing subsequent writes to fail.

# Unsupported access pattern
openat(AT_FDCWD, "MNT/project/writefile", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
dup2(3, 1)                              = 1
# Triggers a FlushFile --> file-xxxx/close by dxfuse since the file size is greater than 0
close(3)                                = 0
# Returns EPERM error since the file has been closed already
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = -1

Ignoring the FlushFile op for empty files creates an edge case for creating empty files in dxfuse-mounted folders. For empty files, the empty part upload and file-xxxx/close are not called until the ReleaseFileHandle fuse operation is triggered by the kernel when the last open file descriptor for a given file has been closed. The downside of this behavior is that the application creating the empty file cannot catch errors that occur during the file-xxxx/close API call, as it can for non-empty files, which are closed via the FlushFile fuse operation triggered by the application's call to close(3).

File closing error checking

dxfuse clients should check for errors from the close(3) call to make sure the corresponding DNAnexus file has been transitioned out of the open state, as DNAnexus files left in the open state are eventually removed by the DNAnexus cleanup daemon.

Spark output artifacts

Spark output through dxfuse uses the Spark file:// protocol. Because of this, each output produced by Spark will have a corresponding .crc file. These files can be removed.

Upload benchmarks

Upload benchmarks are from an Ubuntu 20.04 DNAnexus worker mem2_ssd1_v2_x32 (AWS m5d.8xlarge) instance running kernel 5.4.0-1055-aws. The dx and dxfuse benchmark commands were run as shown below. These benchmarks are not exact because they include the wait time until the uploaded file is transitioned to the closed state.

time dd if=/dev/zero bs=1M count=$SIZE | dx upload --wait -

time dd if=/dev/zero bs=1M count=$SIZE of=MNT/project/$SIZE

dx upload --wait (seconds)  dxfuse upload (seconds)  file size
4.4                         3.5                      100MiB
4.8                         4.2                      200MiB
5.9                         4.8                      400MiB
5.9                         6.8                      800MiB
7                           10                       1GiB
22.5                        19.2                     2GiB
37.8                        87                       10GiB
254                         495                      100GiB

Building

To build the code from source, you'll need, at the very least, the go and git tools.

git clone git@github.com:dnanexus/dxfuse.git
cd dxfuse
go build -o dxfuse cli/main.go

Usage

Allow regular users access to the fuse device on the local machine:

chmod u+rw /dev/fuse

In theory, it should be possible to use suid to achieve this instead, but that does not currently work.

To mount a DNAnexus project mammals in local directory /home/jonas/foo do:

dxfuse /home/jonas/foo mammals

Note that dxfuse will hide any existing content of the mount point (e.g. /home/jonas/foo directory in the example above) until the dxfuse process is stopped.

The bootstrap process has some asynchrony, so it could take a second or two to start up. It spawns a separate process for the filesystem server, waits for it to start, and exits. To get more information, use the verbose flag. Debugging output is written to the log, which is placed at $HOME/.dxfuse/dxfuse.log. The maximal verbosity level is 2.

dxfuse -verbose 1 MOUNT-POINT PROJECT-NAME

Project ids can be used instead of project names. To mount several projects, say, mammals, fish, and birds, do:

dxfuse /home/jonas/foo mammals fish birds

This will create the directory hierarchy:

/home/jonas/foo
              |_ mammals
              |_ fish
              |_ birds

Note that files may be hard linked from several projects. These will appear as a single inode with a link count greater than one.

To stop the dxfuse process do:

fusermount -u MOUNT-POINT

Extended attributes (xattrs)

DNAx data objects have properties and tags, which are exposed as POSIX extended attributes. Xattrs can be read, written, and removed. The package used here is attr, which can be installed with sudo apt-get install attr on Linux. On OSX the xattr package comes packaged with the base operating system, and can be used to the same effect.

DNAx tags and properties are prefixed. For example, if zebra.txt is a file then attr -l zebra.txt will print out all the tags, properties, and attributes that have no POSIX equivalent. These are split into three corresponding prefixes, tag, prop, and base, all under the user Linux namespace.

Here zebra.txt has no properties or tags.

$ attr -l zebra.txt

base.state: closed
base.archivalState: live
base.id: file-xxxx

Add a property named family with value mammal

$ attr -s prop.family -V mammal zebra.txt

Add a tag africa

$ attr -s tag.africa -V XXX zebra.txt

Remove the family property:

$ attr -r prop.family zebra.txt

You cannot modify base.* attributes; these are read-only. Setting and deleting xattrs can be done only for files that are closed on the platform.
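The naming scheme can be sketched as a small parser. This assumes the full attribute name seen by the filesystem has the form user.<prefix>.<key> (the attr tool adds the user. part itself); parseXattr and writable are hypothetical helpers, not dxfuse functions.

```go
package main

import (
	"fmt"
	"strings"
)

// Prefixes that mirror the XATTR_TAG, XATTR_PROP and XATTR_BASE constants.
const (
	xattrTag  = "tag"
	xattrProp = "prop"
	xattrBase = "base"
)

// parseXattr splits a full attribute name such as "user.prop.family" into
// its dxfuse prefix ("prop") and key ("family").
func parseXattr(name string) (prefix, key string, ok bool) {
	if !strings.HasPrefix(name, "user.") {
		return "", "", false
	}
	parts := strings.SplitN(strings.TrimPrefix(name, "user."), ".", 2)
	if len(parts) != 2 || parts[1] == "" {
		return "", "", false
	}
	switch parts[0] {
	case xattrTag, xattrProp, xattrBase:
		return parts[0], parts[1], true
	}
	return "", "", false
}

// writable reports whether attributes under prefix may be set or removed.
// base.* attributes are read-only.
func writable(prefix string) bool {
	return prefix == xattrTag || prefix == xattrProp
}

func main() {
	for _, name := range []string{"user.prop.family", "user.tag.africa", "user.base.id"} {
		prefix, key, ok := parseXattr(name)
		fmt.Println(name, prefix, key, ok, writable(prefix))
	}
}
```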

macOS

For OSX you will need to install macFUSE. Note that Your Mileage May Vary (YMMV) on this platform; we are mostly focused on Linux.

Features such as kernel read-ahead, pagecache, mmap, and PID tracking may not work on macOS.

mmap

dxfuse supports shared read-only mmap for remote files. With FUSE this is only possible when both the kernel pagecache and kernel readahead options are enabled for the FUSE mount, which may have the side effects of increased memory usage (pagecache) and more remote read requests (readahead).

>>> import mmap
>>> fd = open('MNT/dxfuse_test_data/README.md', 'r')
>>> mmap.mmap(fd.fileno(), 0, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ)
<mmap.mmap object at 0x7f9cadd87770>

Common problems

If a project appears empty, or is missing files, it could be that the DNAnexus token does not have permissions for it. Try to see if you can do dx ls YOUR_PROJECT:.

There is no natural match for DNAnexus applets and workflows, so they are presented as block devices. They do not behave like block devices, but the shell colors them differently from files and directories.

Documentation

Overview

Accept commands from the dxfuse_tools program. The only command right now is sync, but this is the place to implement additional ones to come in the future.

Index

Constants

const (
	// namespace for xattrs
	XATTR_TAG  = "tag"
	XATTR_PROP = "prop"
	XATTR_BASE = "base"
)
const (
	// read only file that is on the cloud
	AM_RO_Remote = 1

	// 'open' file that is being appended to on the platform
	// file is not readable until it is in the 'closed' state
	// at which point it is set to readonly and AM_RO_Remote
	AM_AO_Remote = 2
)

Files can be in two access modes: remote-read-only or remote-append-only

const (
	PFM_NIL                  = 1 // No IOs have been seen yet, cache is empty
	PFM_DETECT_SEQ           = 2 // First accesses, detecting if access is sequential
	PFM_PREFETCH_IN_PROGRESS = 3 // prefetch is ongoing
	PFM_EOF                  = 4 // reached the end of the file
)

enumerated type for the state of a PFM (file metadata)

const (
	IOV_HOLE      = 1 // empty
	IOV_IN_FLIGHT = 2 // in progress
	IOV_DONE      = 3 // completed successfully
	IOV_ERRORED   = 4 // completed with an error
)

state of an io-vector

const (
	DATA_OUTSIDE_CACHE = 1 // data not in cache
	DATA_IN_CACHE      = 2 // data is in cache
	DATA_HOLE          = 3 // we would have the data if we were doing caching
	DATA_WAIT          = 4 // need to wait for some of the IOs
)
const (
	KiB = 1024
	MiB = 1024 * KiB
	GiB = 1024 * MiB
)
const (
	DatabaseFile = "metadata.db"
	LogFile      = "dxfuse.log"
)
const (
	MinHttpClientPoolSize     = 30
	FileWriteInactivityThresh = 5 * time.Minute
	MaxDirSize                = 255 * 1000
	MaxNumFileHandles         = 1000 * 1000
	NumRetriesDefault         = 10
	InitialUploadPartSize     = 16 * MiB
	MaxUploadPartSize         = 700 * MiB
	Version                   = "v1.4.0"
)
const (
	InodeInvalid = 0
	InodeRoot    = fuseops.RootInodeID // This is an OS constant
)
const (
	// flags for writing files to disk
	DIRTY_FILES_ALL      = 14 // all modified files
	DIRTY_FILES_INACTIVE = 15 // only files there were unmodified recently
)
const (
	// Permissions
	PERM_VIEW       = 1
	PERM_UPLOAD     = 2
	PERM_CONTRIBUTE = 3
	PERM_ADMINISTER = 4
)
const (
	FK_Regular  = 10
	FK_Applet   = 12
	FK_Workflow = 13
	FK_Record   = 14
	FK_Database = 15
	FK_Other    = 16
)

Kinds of files

const (
	// A port number for accepting commands
	CmdPort = 7205
)

Variables

This section is empty.

Functions

func BytesToString

func BytesToString(numBytes int64) string

1024 => 1KB
10240 => 10KB
1100000 => 1MB
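One way to get exactly these three results is repeated integer division by 1024, which truncates rather than rounds. The sketch below is a guess at the behavior, not the package's implementation; in particular, the "B" suffix for values under 1024 and the suffixes above MB are assumptions.

```go
package main

import "fmt"

// bytesToString renders a byte count with the largest unit that leaves a
// non-zero whole number, truncating any remainder.
func bytesToString(numBytes int64) string {
	const unit = 1024
	suffixes := []string{"B", "KB", "MB", "GB", "TB"}
	i := 0
	for numBytes >= unit && i < len(suffixes)-1 {
		numBytes /= unit
		i++
	}
	return fmt.Sprintf("%d%s", numBytes, suffixes[i])
}

func main() {
	for _, n := range []int64{1024, 10240, 1100000} {
		fmt.Println(n, "=>", bytesToString(n))
	}
}
```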

func DxDescribeBulkObjects

func DxDescribeBulkObjects(
	ctx context.Context,
	httpClient *http.Client,
	dxEnv *dxda.DXEnvironment,
	projectId string,
	objIds []string) (map[string]DxDescribeDataObject, error)

func DxFindProject

func DxFindProject(
	ctx context.Context,
	dxEnv *dxda.DXEnvironment,
	projName string) (string, error)

Find the project-id for a project name. Return nil if the project does not exist

func FilenameIsPosixCompliant

func FilenameIsPosixCompliant(filename string) bool

func GetTgid added in v1.0.0

func GetTgid(pid uint32) (tgid int32, err error)

func LogMsg

func LogMsg(moduleName string, a string, args ...interface{})

add a timestamp and module name, to a log message

func MakeFSBaseDir

func MakeFSBaseDir() string

create a directory for all the dxfuse files in $HOME/.dxfuse

func MaxInt

func MaxInt(x, y int) int

func MaxInt64

func MaxInt64(x, y int64) int64

func MinInt

func MinInt(x, y int) int

func MinInt64

func MinInt64(x, y int64) int64

func SecondsToTime

func SecondsToTime(t int64) time.Time

convert time in seconds since 1-Jan 1970, to the equivalent golang structure

func Time2string

func Time2string(t time.Time) string

Types

type Cache

type Cache struct {
	// contains filtered or unexported fields
}

A cache of all the data retrieved from the platform, for one file. It is a contiguous range of chunks. All IOs are the same size.

type Chunk

type Chunk struct {
	// contains filtered or unexported fields
}

type CmdClient

type CmdClient struct {
}

func NewCmdClient

func NewCmdClient() *CmdClient

Sending commands with a client

func (*CmdClient) Sync

func (client *CmdClient) Sync()

type CmdServer

type CmdServer struct {
	// contains filtered or unexported fields
}

func NewCmdServer

func NewCmdServer(options Options, sybx *SyncDbDx) *CmdServer

func (*CmdServer) Close

func (cmdSrv *CmdServer) Close()

func (*CmdServer) Init

func (cmdSrv *CmdServer) Init()

type CmdServerBox

type CmdServerBox struct {
	// contains filtered or unexported fields
}

A separate structure used for exporting through RPC

func (*CmdServerBox) GetLine

func (box *CmdServerBox) GetLine(arg string, reply *bool) error

Note: all export functions from this module have to have this format. Nothing else will work with the RPC package.

type DeadFile

type DeadFile struct {
	Kind      int    // Kind of object this is
	Id        string // Required to build a download URL
	ProjId    string // Note: this could be a container
	Inode     int64
	LocalPath string
}

A file that is scheduled for removal

type Dir

type Dir struct {
	Parent   string // the parent directory, used for debugging
	Dname    string // This is the last part of the full path
	FullPath string // combine parent and dname, then normalize
	Inode    int64
	Ctime    time.Time   // DNAx does not record times per directory.
	Mtime    time.Time   // we use the project creation time, and mtime as an approximation.
	Mode     os.FileMode // uint32
	Uid      uint32
	Gid      uint32

	// extra information, used internally
	ProjId     string
	ProjFolder string
	Populated  bool
	// contains filtered or unexported fields
}

directories

func (Dir) GetAttrs

func (d Dir) GetAttrs() (a fuseops.InodeAttributes)

func (Dir) GetInode

func (d Dir) GetInode() fuseops.InodeID

type DirHandle

type DirHandle struct {
	// contains filtered or unexported fields
}

type Dirs

type Dirs struct {
	// contains filtered or unexported fields
}

func (Dirs) Len

func (d Dirs) Len() int

func (Dirs) Less

func (d Dirs) Less(i, j int) bool

func (Dirs) Swap

func (d Dirs) Swap(i, j int)

type DirtyFileInfo

type DirtyFileInfo struct {
	Inode int64

	// will be "" for files created locally, and not uploaded yet
	Id         string
	FileSize   int64
	Mtime      int64
	LocalPath  string
	Tags       []string
	Properties map[string]string
	Name       string
	Directory  string
	ProjFolder string
	ProjId     string
	// contains filtered or unexported fields
}

Information required to upload file data to the platform. It also includes updated tags and properties of a data-object.

Note that not only files have attributes; applets and workflows have them too.

type DxDescribeDataObject

type DxDescribeDataObject struct {
	Id            string
	ProjId        string
	Name          string
	State         string
	ArchivalState string
	Folder        string
	Size          int64
	CtimeSeconds  int64
	MtimeSeconds  int64
	Tags          []string
	Properties    map[string]string
}

Description of a DNAx data object

func DxDescribe

func DxDescribe(
	ctx context.Context,
	httpClient *http.Client,
	dxEnv *dxda.DXEnvironment,
	projectId string,
	objId string) (DxDescribeDataObject, error)

Describe just one object. Retrieve state even if the object is not closed.

type DxDescribePrj

type DxDescribePrj struct {
	Id           string
	Name         string
	Region       string
	Version      int
	DataUsageGiB float64
	CtimeSeconds int64
	MtimeSeconds int64
	UploadParams FileUploadParameters
	Level        int // one of VIEW, UPLOAD, CONTRIBUTE, ADMINISTER
}

func DxDescribeProject

func DxDescribeProject(
	ctx context.Context,
	httpClient *http.Client,
	dxEnv *dxda.DXEnvironment,
	projectId string) (*DxDescribePrj, error)

type DxDescribeRaw

type DxDescribeRaw struct {
	Id               string            `json:"id"`
	ProjId           string            `json:"project"`
	Name             string            `json:"name"`
	State            string            `json:"state"`
	ArchivalState    string            `json:"archivalState"`
	Folder           string            `json:"folder"`
	CreatedMillisec  int64             `json:"created"`
	ModifiedMillisec int64             `json:"modified"`
	Size             int64             `json:"size"`
	Tags             []string          `json:"tags"`
	Properties       map[string]string `json:"properties"`
}
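The raw structure carries times in milliseconds, while DxDescribeDataObject uses seconds. A minimal sketch of the decoding and conversion follows, using a trimmed copy of the struct and made-up sample values; describeRaw, described, and parseDescribe are hypothetical names, and truncating (rather than rounding) to whole seconds is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// describeRaw is a trimmed, hypothetical copy of DxDescribeRaw.
type describeRaw struct {
	ID               string `json:"id"`
	Name             string `json:"name"`
	CreatedMillisec  int64  `json:"created"`
	ModifiedMillisec int64  `json:"modified"`
	Size             int64  `json:"size"`
}

// described holds the same times converted to whole seconds.
type described struct {
	ID           string
	CtimeSeconds int64
	MtimeSeconds int64
}

// parseDescribe decodes one describe reply and converts its timestamps.
func parseDescribe(data []byte) (described, error) {
	var raw describeRaw
	if err := json.Unmarshal(data, &raw); err != nil {
		return described{}, err
	}
	return described{
		ID:           raw.ID,
		CtimeSeconds: raw.CreatedMillisec / 1000,
		MtimeSeconds: raw.ModifiedMillisec / 1000,
	}, nil
}

func main() {
	// Sample reply with invented values, for illustration only.
	sample := []byte(`{"id":"file-xxxx","name":"zebra.txt","created":1600000000123,"modified":1600000001999,"size":6}`)
	d, err := parseDescribe(sample)
	fmt.Println(d, err)
}
```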

type DxDescribeRawTop

type DxDescribeRawTop struct {
	Describe DxDescribeRaw `json:"describe"`
}

type DxDownloadURL

type DxDownloadURL struct {
	URL     string            `json:"url"`
	Headers map[string]string `json:"headers"`
}

A URL generated with the /file-xxxx/download API call, that is used to download file ranges.

type DxFolder

type DxFolder struct {
	// contains filtered or unexported fields
}

a DNAx directory. It holds files and sub-directories.

func DxDescribeFolder

func DxDescribeFolder(
	ctx context.Context,
	httpClient *http.Client,
	dxEnv *dxda.DXEnvironment,
	projectId string,
	folder string) (*DxFolder, error)

type DxListFolder

type DxListFolder struct {
	// contains filtered or unexported fields
}

type DxOps

type DxOps struct {
	// contains filtered or unexported fields
}

func NewDxOps

func NewDxOps(dxEnv dxda.DXEnvironment, options Options) *DxOps

func (*DxOps) DxAddTags

func (ops *DxOps) DxAddTags(
	ctx context.Context,
	httpClient *http.Client,
	projId string,
	objId string,
	tags []string) error

func (*DxOps) DxClone

func (ops *DxOps) DxClone(
	ctx context.Context,
	httpClient *http.Client,
	srcProjId string,
	srcId string,
	destProjId string,
	destProjFolder string) (bool, error)

func (*DxOps) DxFileCloseAndWait

func (ops *DxOps) DxFileCloseAndWait(
	ctx context.Context,
	httpClient *http.Client,
	projectId string,
	fid string) error

func (*DxOps) DxFileNew

func (ops *DxOps) DxFileNew(
	ctx context.Context,
	httpClient *http.Client,
	nonceStr string,
	projId string,
	fname string,
	folder string) (string, error)

func (*DxOps) DxFileUploadPart

func (ops *DxOps) DxFileUploadPart(
	ctx context.Context,
	httpClient *http.Client,
	fileId string,
	index int,
	data []byte) error

func (*DxOps) DxFolderNew

func (ops *DxOps) DxFolderNew(
	ctx context.Context,
	httpClient *http.Client,
	projId string,
	folder string) error

func (*DxOps) DxFolderRemove

func (ops *DxOps) DxFolderRemove(
	ctx context.Context,
	httpClient *http.Client,
	projId string,
	folder string) error

func (*DxOps) DxMove

func (ops *DxOps) DxMove(
	ctx context.Context,
	httpClient *http.Client,
	projId string,
	objectIds []string,
	folders []string,
	destination string) error

API method: /class-xxxx/move

Moves the specified data objects and folders to a destination folder in the same container.

func (*DxOps) DxRemoveObjects

func (ops *DxOps) DxRemoveObjects(
	ctx context.Context,
	httpClient *http.Client,
	projId string,
	objectIds []string) error

func (*DxOps) DxRemoveTags

func (ops *DxOps) DxRemoveTags(
	ctx context.Context,
	httpClient *http.Client,
	projId string,
	objId string,
	tags []string) error

func (*DxOps) DxRename

func (ops *DxOps) DxRename(
	ctx context.Context,
	httpClient *http.Client,
	projId string,
	fileId string,
	newName string) error

API method: /class-xxxx/rename

rename a data object

func (*DxOps) DxRenameFolder

func (ops *DxOps) DxRenameFolder(
	ctx context.Context,
	httpClient *http.Client,
	projId string,
	folder string,
	newName string) error

func (*DxOps) DxSetProperties

func (ops *DxOps) DxSetProperties(
	ctx context.Context,
	httpClient *http.Client,
	projId string,
	objId string,
	props map[string](*string)) error

type File

type File struct {
	Kind   int    // Kind of object this is
	Id     string // Required to build a download URL
	ProjId string // Note: this could be a container

	// One of {open, closing, closed}.
	// Closed is the only state where the file can be read
	State string

	// One of {live, archival, archived, unarchiving}.
	// Live is the only state where the file can be read.
	ArchivalState string

	Name  string
	Size  int64
	Inode int64
	Ctime time.Time
	Mtime time.Time
	Mode  os.FileMode // uint32
	Uid   uint32
	Gid   uint32

	// tags and properties
	Tags       []string
	Properties map[string]string
	// contains filtered or unexported fields
}

A Unix file can stand for any DNAx data object. For example, it could be a workflow or an applet. We distinguish between them based on the Id (file-xxxx, applet-xxxx, workflow-xxxx, ...).

Note: this struct is immutable by convention. The up-to-date data is always on the database, not in memory.

func (File) GetAttrs

func (f File) GetAttrs() (a fuseops.InodeAttributes)

func (File) GetInode

func (f File) GetInode() fuseops.InodeID

type FileHandle

type FileHandle struct {
	Id string // To avoid looking up the file-id for each /upload call

	// For writeable files only
	// Only flush from original FD
	Tgid int32
	// contains filtered or unexported fields
}

type FileUpdateReq

type FileUpdateReq struct {
	// contains filtered or unexported fields
}

type FileUploadParameters

type FileUploadParameters struct {
	MinimumPartSize      int64 `json:"minimumPartSize"`
	MaximumPartSize      int64 `json:"maximumPartSize"`
	EmptyLastPartAllowed bool  `json:"emptyLastPartAllowed"`
	MaximumNumParts      int64 `json:"maximumNumParts"`
	MaximumFileSize      int64 `json:"maximumFileSize"`
}

https://documentation.dnanexus.com/developer/api/data-containers/projects#api-method-project-xxxx-describe

type FileUploader added in v1.0.0

type FileUploader struct {
	// contains filtered or unexported fields
}

func NewFileUploader added in v1.0.0

func NewFileUploader(verboseLevel int, options Options, dxEnv dxda.DXEnvironment) *FileUploader

func (*FileUploader) AllocateWriteBuffer added in v1.0.0

func (uploader *FileUploader) AllocateWriteBuffer(partId int, block bool) []byte

TODO replace this with a more reasonable buffer pool for managing memory use

func (*FileUploader) Shutdown added in v1.0.0

func (uploader *FileUploader) Shutdown()

type Filesys

type Filesys struct {
	// inherit empty implementations for all the filesystem
	// methods we do not implement
	fuseutil.NotImplementedFileSystem
	// contains filtered or unexported fields
}

func NewDxfuse

func NewDxfuse(
	dxEnv dxda.DXEnvironment,
	manifest Manifest,
	options Options) (*Filesys, error)

func (*Filesys) CreateFile

func (fsys *Filesys) CreateFile(ctx context.Context, op *fuseops.CreateFileOp) error

A CreateRequest asks to create and open a file (not a directory).

func (*Filesys) CreateLink

func (fsys *Filesys) CreateLink(ctx context.Context, op *fuseops.CreateLinkOp) error

func (*Filesys) FlushFile

func (fsys *Filesys) FlushFile(ctx context.Context, op *fuseops.FlushFileOp) error

func (*Filesys) ForgetInode

func (fsys *Filesys) ForgetInode(ctx context.Context, op *fuseops.ForgetInodeOp) error

This may be the wrong way to do it. We may need to actually delete the inode at this point, instead of inside RmDir/Unlink.

func (*Filesys) GetInodeAttributes

func (fsys *Filesys) GetInodeAttributes(ctx context.Context, op *fuseops.GetInodeAttributesOp) error

func (*Filesys) GetXattr

func (fsys *Filesys) GetXattr(ctx context.Context, op *fuseops.GetXattrOp) error

func (*Filesys) ListXattr

func (fsys *Filesys) ListXattr(ctx context.Context, op *fuseops.ListXattrOp) error

Make a list of all the extended attributes

func (*Filesys) LookUpInode

func (fsys *Filesys) LookUpInode(ctx context.Context, op *fuseops.LookUpInodeOp) error

func (*Filesys) MkDir

func (fsys *Filesys) MkDir(ctx context.Context, op *fuseops.MkDirOp) error

All mkdir operations are treated as "mkdir -p"

func (*Filesys) OpenDir

func (fsys *Filesys) OpenDir(ctx context.Context, op *fuseops.OpenDirOp) error

OpenDir return nil error allows open dir COMMON for drivers

func (*Filesys) OpenFile

func (fsys *Filesys) OpenFile(ctx context.Context, op *fuseops.OpenFileOp) error

Note: What happens if the file is opened for writing?

func (*Filesys) ReadDir

func (fsys *Filesys) ReadDir(ctx context.Context, op *fuseops.ReadDirOp) (err error)

ReadDir lists files into readdirop

func (*Filesys) ReadFile

func (fsys *Filesys) ReadFile(ctx context.Context, op *fuseops.ReadFileOp) error

func (*Filesys) ReleaseDirHandle

func (fsys *Filesys) ReleaseDirHandle(ctx context.Context, op *fuseops.ReleaseDirHandleOp) error

ReleaseDirHandle deletes file handle entry

func (*Filesys) ReleaseFileHandle

func (fsys *Filesys) ReleaseFileHandle(ctx context.Context, op *fuseops.ReleaseFileHandleOp) error

func (*Filesys) RemoveXattr

func (fsys *Filesys) RemoveXattr(ctx context.Context, op *fuseops.RemoveXattrOp) error

func (*Filesys) Rename

func (fsys *Filesys) Rename(ctx context.Context, op *fuseops.RenameOp) error

func (*Filesys) RmDir

func (fsys *Filesys) RmDir(ctx context.Context, op *fuseops.RmDirOp) error

func (*Filesys) SetInodeAttributes

func (fsys *Filesys) SetInodeAttributes(ctx context.Context, op *fuseops.SetInodeAttributesOp) error

if the file is writable, we can modify some of the attributes. otherwise, this is a permission error.

func (*Filesys) SetXattr

func (fsys *Filesys) SetXattr(ctx context.Context, op *fuseops.SetXattrOp) error

func (*Filesys) Shutdown

func (fsys *Filesys) Shutdown()

func (*Filesys) StatFS

func (fsys *Filesys) StatFS(ctx context.Context, op *fuseops.StatFSOp) error

func (*Filesys) SyncFile

func (fsys *Filesys) SyncFile(ctx context.Context, op *fuseops.SyncFileOp) error

func (*Filesys) Unlink

func (fsys *Filesys) Unlink(ctx context.Context, op *fuseops.UnlinkOp) error

Decrement the link count, and remove the file if it hits zero.

func (*Filesys) WriteFile

func (fsys *Filesys) WriteFile(ctx context.Context, op *fuseops.WriteFileOp) error

Writes to files.

Note: the file-open operation doesn't state if the file is going to be opened for reading or writing.

type FindProjectReply

type FindProjectReply struct {
	Results []FindResult `json:"results"`
}

type FindProjectRequest

type FindProjectRequest struct {
	Name  string `json:"name"`
	Level string `json:"level"`
}

type FindResult

type FindResult struct {
	Id string `json:"id"`
}

type IoReq

type IoReq struct {
	// contains filtered or unexported fields
}

A request that one of the IO-threads will pick up

type Iovec

type Iovec struct {
	// contains filtered or unexported fields
}

type ListFolderRequest

type ListFolderRequest struct {
	Folder        string `json:"folder"`
	Only          string `json:"only"`
	IncludeHidden bool   `json:"includeHidden"`
}

type ListFolderResponse

type ListFolderResponse struct {
	Objects []ObjInfo `json:"objects"`
	Folders []string  `json:"folders"`
}

type MProperties

type MProperties struct {
	Elements map[string]string `json:"elements"`
}

Marshal a DNAx object's properties to/from a string that is stored in a database table. We use base64 encoding for the same reason as for tags (see MTags).

type MTags

type MTags struct {
	Elements []string `json:"elements"`
}

Marshal a DNAx object's tags to/from a string that is stored in a database table.

We use base64 encoding to avoid problematic characters (`) when putting this string into SQL statements.
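
As an illustration of the base64 approach described above, here is a minimal sketch. It assumes the tags are JSON-encoded first (suggested by the struct's json tags, but not confirmed by this page) and then base64-encoded. The helpers encodeTags and decodeTags are hypothetical names, not part of the dxfuse API.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// MTags mirrors the struct shown above.
type MTags struct {
	Elements []string `json:"elements"`
}

// encodeTags is a hypothetical helper: JSON-encode the tags, then
// base64-encode the result so it contains no quotes or backticks.
func encodeTags(tags []string) (string, error) {
	raw, err := json.Marshal(MTags{Elements: tags})
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

// decodeTags reverses encodeTags.
func decodeTags(s string) ([]string, error) {
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return nil, err
	}
	var m MTags
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	return m.Elements, nil
}

func main() {
	enc, err := encodeTags([]string{"it's", "back`tick"})
	if err != nil {
		panic(err)
	}
	dec, err := decodeTags(enc)
	if err != nil {
		panic(err)
	}
	fmt.Println(enc) // only characters from the base64 alphabet
	fmt.Println(dec)
}
```

The encoded string uses only the base64 alphabet (letters, digits, +, / and =), so it can be embedded in a quoted SQL literal without escaping. Parameterized queries are another way to get the same safety.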

type Manifest

type Manifest struct {
	Files       []ManifestFile `json:"files"`
	Directories []ManifestDir  `json:"directories"`
}

func MakeManifestFromProjectIds

func MakeManifestFromProjectIds(
	ctx context.Context,
	dxEnv dxda.DXEnvironment,
	projectIds []string) (*Manifest, error)

func ReadManifest

func ReadManifest(fname string) (*Manifest, error)

Read the manifest from a file into an in-memory structure.
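
To show how a manifest maps onto these types, the sketch below decodes a small JSON document using local copies of the Manifest, ManifestFile, and ManifestDir structs from this page. The helper parseManifest and all IDs and paths in the sample are placeholders invented for the example. The real ReadManifest may do more than this, such as validation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the types documented on this page.
type ManifestFile struct {
	ProjId        string `json:"proj_id"`
	FileId        string `json:"file_id"`
	Parent        string `json:"parent"`
	State         string `json:"state,omitempty"`
	ArchivalState string `json:"archivalState,omitempty"`
	Fname         string `json:"fname,omitempty"`
	Size          int64  `json:"size,omitempty"`
	CtimeSeconds  int64  `json:"ctime,omitempty"`
	MtimeSeconds  int64  `json:"mtime,omitempty"`
}

type ManifestDir struct {
	ProjId       string `json:"proj_id"`
	Folder       string `json:"folder"`
	Dirname      string `json:"dirname"`
	CtimeSeconds int64  `json:"ctime,omitempty"`
	MtimeSeconds int64  `json:"mtime,omitempty"`
}

type Manifest struct {
	Files       []ManifestFile `json:"files"`
	Directories []ManifestDir  `json:"directories"`
}

// parseManifest is a hypothetical helper that only decodes the JSON.
func parseManifest(data []byte) (*Manifest, error) {
	var m Manifest
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return &m, nil
}

// sample uses placeholder IDs and paths.
const sample = `{
  "files": [
    {"proj_id": "project-xxxx", "file_id": "file-xxxx", "parent": "/A/B"}
  ],
  "directories": [
    {"proj_id": "project-xxxx", "folder": "/", "dirname": "/D"}
  ]
}`

func main() {
	m, err := parseManifest([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(m.Files), len(m.Directories))
	// Optional fields that were omitted stay at their zero values.
	fmt.Println(m.Files[0].Fname == "", m.Files[0].Size)
}
```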

func (*Manifest) Clean

func (m *Manifest) Clean()

func (*Manifest) DirSkeleton

func (m *Manifest) DirSkeleton() ([]string, error)

Figure out the directory structure needed to support the leaf nodes. For example, if we need to create:

["/A/B/C", "/D", "/D/E"]

then the skeleton is:

["/A", "/A/B", "/D"]

The root directory is not reported in the skeleton.
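
The skeleton computation can be sketched as follows. This is a minimal stand-alone version that assumes absolute, slash-separated paths. The function dirSkeleton is a hypothetical helper and is not claimed to match the method's actual implementation.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// dirSkeleton collects every proper ancestor of the given paths,
// excluding the root, and returns them sorted.
func dirSkeleton(leaves []string) []string {
	seen := make(map[string]bool)
	for _, leaf := range leaves {
		// Walk upward until the root ("/") or, for a relative path, ".".
		for d := path.Dir(leaf); d != "/" && d != "."; d = path.Dir(d) {
			seen[d] = true
		}
	}
	out := make([]string, 0, len(seen))
	for d := range seen {
		out = append(out, d)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(dirSkeleton([]string{"/A/B/C", "/D", "/D/E"})) // prints [/A /A/B /D]
}
```

Sorting places each parent before its children, because a parent path is always a prefix of the child path.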

func (*Manifest) FillInMissingFields

func (m *Manifest) FillInMissingFields(ctx context.Context, dxEnv dxda.DXEnvironment) error

func (*Manifest) Validate

func (m *Manifest) Validate() error

type ManifestDir

type ManifestDir struct {
	ProjId  string `json:"proj_id"`
	Folder  string `json:"folder"`
	Dirname string `json:"dirname"`

	// These may be missing.
	CtimeSeconds int64 `json:"ctime,omitempty"`
	MtimeSeconds int64 `json:"mtime,omitempty"`
}

type ManifestFile

type ManifestFile struct {
	ProjId string `json:"proj_id"`
	FileId string `json:"file_id"`
	Parent string `json:"parent"`

	// These may not be provided by the user. Then, we
	// need to query DNAx for the information.
	State         string `json:"state,omitempty"`
	ArchivalState string `json:"archivalState,omitempty"`
	Fname         string `json:"fname,omitempty"`
	Size          int64  `json:"size,omitempty"`
	CtimeSeconds  int64  `json:"ctime,omitempty"`
	MtimeSeconds  int64  `json:"mtime,omitempty"`
}

type MeasureWindow

type MeasureWindow struct {
	// contains filtered or unexported fields
}

type MetadataDb

type MetadataDb struct {
	// contains filtered or unexported fields
}

func NewMetadataDb

func NewMetadataDb(
	dbFullPath string,
	dxEnv dxda.DXEnvironment,
	options Options) (*MetadataDb, error)

func (*MetadataDb) BeginTxn

func (mdb *MetadataDb) BeginTxn() (*sql.Tx, error)

func (*MetadataDb) CreateDir

func (mdb *MetadataDb) CreateDir(
	oph *OpHandle,
	projId string,
	projFolder string,
	ctime int64,
	mtime int64,
	mode os.FileMode,
	dirPath string) (int64, error)

Assumption: the directory does not already exist in the database.

func (*MetadataDb) CreateFile

func (mdb *MetadataDb) CreateFile(
	ctx context.Context,
	oph *OpHandle,
	dir *Dir,
	fname string,
	mode os.FileMode) (File, error)

We know that the parent directory exists and is populated, and that the file does not exist.

func (*MetadataDb) DirtyFilesGetAndReset

func (mdb *MetadataDb) DirtyFilesGetAndReset(flag int) ([]DirtyFileInfo, error)

Get a list of all the dirty files, and reset the table. The files can be modified again, which will set the flag to true.

func (*MetadataDb) Init

func (mdb *MetadataDb) Init() error

Construct an initial empty database, representing an entire project.

func (*MetadataDb) LookupByInode

func (mdb *MetadataDb) LookupByInode(ctx context.Context, oph *OpHandle, inode int64) (Node, bool, error)

Search for a file with a particular inode.

func (*MetadataDb) LookupDirByInode

func (mdb *MetadataDb) LookupDirByInode(ctx context.Context, oph *OpHandle, inode int64) (Dir, bool, error)

func (*MetadataDb) LookupInDir

func (mdb *MetadataDb) LookupInDir(ctx context.Context, oph *OpHandle, dir *Dir, dirOrFileName string) (Node, bool, error)

Search for a file/subdir in a directory. Look for file [filename] in directory [parent]/[dname].

  1. Look if the directory has already been downloaded and placed in the DB
  2. If not, populate it
  3. Do a lookup in the directory.

Note: the file might not exist.
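
The populate-on-demand pattern in the steps above can be sketched with an in-memory map. This is only an illustration under simplifying assumptions. The type dirCache, its fetch callback, and the string IDs are hypothetical, whereas MetadataDb itself works against a database and returns Node values.

```go
package main

import "fmt"

// dirCache is a hypothetical stand-in for the metadata database.
type dirCache struct {
	populated map[string]map[string]string // dir path -> entry name -> object id
	fetch     func(dir string) map[string]string
	fetches   int // number of remote listings performed
}

// lookupInDir populates the directory on first use, then looks up name.
func (c *dirCache) lookupInDir(dir, name string) (string, bool) {
	entries, ok := c.populated[dir]
	if !ok {
		// Step 2: the directory is not in the cache yet, so populate it.
		entries = c.fetch(dir)
		c.populated[dir] = entries
		c.fetches++
	}
	// Step 3: plain lookup; the entry might not exist.
	id, found := entries[name]
	return id, found
}

func main() {
	c := &dirCache{
		populated: make(map[string]map[string]string),
		fetch: func(dir string) map[string]string {
			return map[string]string{"X.txt": "file-0001"} // placeholder listing
		},
	}
	fmt.Println(c.lookupInDir("/A", "X.txt"))
	fmt.Println(c.lookupInDir("/A", "missing.txt"))
	fmt.Println(c.fetches)
}
```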

func (*MetadataDb) MoveDir

func (mdb *MetadataDb) MoveDir(
	ctx context.Context,
	oph *OpHandle,
	oldParentDir Dir,
	newParentDir Dir,
	oldDir Dir,
	newName string) error

As a running example, say we have a directory structure:

    A
    ├── fruit
    │   ├── grapes.txt
    │   └── melon.txt
    ├── X.txt
    └── Y.txt

We also have:

    D
    └── K

From the shell we issue the command:

    $ mv A D/K/
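
The effect of this move on the paths of A and its descendants can be sketched as a prefix rewrite. The helper rewritePath is hypothetical and assumes absolute paths. It shows the intended result only, not how MoveDir updates the database.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// rewritePath returns the new location of p after oldDir is moved
// under newParent with the name newName. Paths outside oldDir are
// returned unchanged.
func rewritePath(p, oldDir, newParent, newName string) string {
	newDir := path.Join(newParent, newName)
	if p == oldDir {
		return newDir
	}
	// Match on "oldDir/" so that "/Apple" is not treated as inside "/A".
	if strings.HasPrefix(p, oldDir+"/") {
		return newDir + strings.TrimPrefix(p, oldDir)
	}
	return p
}

func main() {
	for _, p := range []string{"/A", "/A/fruit/grapes.txt", "/A/X.txt", "/D/K"} {
		fmt.Println(p, "->", rewritePath(p, "/A", "/D/K", "A"))
	}
}
```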

func (*MetadataDb) MoveFile

func (mdb *MetadataDb) MoveFile(
	ctx context.Context,
	oph *OpHandle,
	inode int64,
	newParentDir Dir,
	newName string) error

Move a file

  1. Can move a file from one directory to another, or leave it in the same directory
  2. Can change the filename.

func (*MetadataDb) PopulateRoot

func (mdb *MetadataDb) PopulateRoot(ctx context.Context, oph *OpHandle, manifest Manifest) error

Build a toplevel directory for each project.

func (*MetadataDb) ReadDirAll

func (mdb *MetadataDb) ReadDirAll(ctx context.Context, oph *OpHandle, dir *Dir) (map[string]File, map[string]Dir, error)

Add a directory with its contents to an existing database.

func (*MetadataDb) RemoveEmptyDir

func (mdb *MetadataDb) RemoveEmptyDir(oph *OpHandle, inode int64) error

Remove a directory from the database

func (*MetadataDb) Shutdown

func (mdb *MetadataDb) Shutdown()

func (*MetadataDb) Unlink

func (mdb *MetadataDb) Unlink(ctx context.Context, oph *OpHandle, file File) error

TODO: take into account the case of ForgetInode, and files that are open, but unlinked.

On this file system, since we don't keep track of the link count, this amounts to removing the file.

func (*MetadataDb) UpdateClosedFileMetadata added in v1.0.0

func (mdb *MetadataDb) UpdateClosedFileMetadata(
	ctx context.Context,
	oph *OpHandle,
	inode int64) error

func (*MetadataDb) UpdateFileAttrs

func (mdb *MetadataDb) UpdateFileAttrs(
	ctx context.Context,
	oph *OpHandle,
	inode int64,
	fileSize int64,
	modTime time.Time,
	mode *os.FileMode) error

func (*MetadataDb) UpdateFileLocalPath

func (mdb *MetadataDb) UpdateFileLocalPath(
	ctx context.Context,
	oph *OpHandle,
	inode int64,
	localPath string) error

func (*MetadataDb) UpdateFileTagsAndProperties

func (mdb *MetadataDb) UpdateFileTagsAndProperties(
	ctx context.Context,
	oph *OpHandle,
	file File) error

func (*MetadataDb) UpdateInodeFileId

func (mdb *MetadataDb) UpdateInodeFileId(inode int64, fileId string) error

We wrote a new version of this file, creating a new file-id.

type MoveRecord

type MoveRecord struct {
	// contains filtered or unexported fields
}

type Node

type Node interface {
	GetInode() fuseops.InodeID
	GetAttrs() fuseops.InodeAttributes
}

A node is a generalization over files and directories

type Nonce

type Nonce struct {
	// contains filtered or unexported fields
}

func NewNonce

func NewNonce() *Nonce

func (*Nonce) String

func (n *Nonce) String() string

Create a random nonce, no longer than 128 bytes
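
One common way to produce such a nonce is to hex-encode random bytes. The sketch below assumes crypto/rand as the random source and a 32-byte buffer. Neither detail is stated on this page, and newNonceString is a hypothetical helper.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// maxNonceLen is the upper bound mentioned in the documentation above.
const maxNonceLen = 128

// newNonceString returns 32 random bytes as 64 hex characters,
// which stays well under maxNonceLen.
func newNonceString() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	s, err := newNonceString()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(s) <= maxNonceLen, s)
}
```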

type ObjInfo

type ObjInfo struct {
	Id string `json:"id"`
}

type OpHandle

type OpHandle struct {
	// contains filtered or unexported fields
}

A handle used while performing a filesystem operation. We normally need a transaction and an HTTP client.

func (*OpHandle) RecordError

func (oph *OpHandle) RecordError(err error) error

type Options

type Options struct {
	ReadOnly     bool
	Verbose      bool
	VerboseLevel int
	Uid          uint32
	Gid          uint32
}

type Posix

type Posix struct {
	// contains filtered or unexported fields
}

func NewPosix

func NewPosix(options Options) *Posix

func (*Posix) FixDir

func (px *Posix) FixDir(dxFolder *DxFolder) (*PosixDir, error)

Main entry point.

  1. Keep directory names fixed
  2. Change file names to not collide with directories, or with each other.

func (*Posix) SortObjectsByCtime added in v0.24.0

func (px *Posix) SortObjectsByCtime(dxObjs []DxDescribeDataObject) []DxDescribeDataObject

Pick all the objects with "name" from the list. Return an empty array if none exist. Sort them from newest to oldest.

type PosixDir

type PosixDir struct {
	// contains filtered or unexported fields
}

Try to fix a DNAx directory, so it will adhere to POSIX.

  1. If several files share the same name, make them unique by moving them into an extra subdirectory. For example:

    src name   file-id     new name
    X.txt      file-0001   X.txt
    X.txt      file-0005   1/X.txt
    X.txt      file-0012   2/X.txt

  2. DNAx files can include slashes. Drop these files, and put a note in the log.

  3. A directory and a file can have the same name. For example:

    ROOT/
        zoo/        sub-directory
        zoo         regular file

    Is converted into:

    ROOT/
        zoo/        sub-directory
        1/          faux sub-directory
            zoo     regular file
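
The renaming rules for colliding names can be sketched as follows. This is a simplified illustration, not the FixDir implementation. The object type and placeFiles helper are hypothetical, files are assumed to arrive already in their desired order, and the sketch ignores names containing slashes as well as real subdirectories that happen to be named "1", "2", and so on.

```go
package main

import (
	"fmt"
	"strconv"
)

// object is a hypothetical (name, id) pair for a DNAx file.
type object struct {
	Name string
	ID   string
}

// placeFiles maps each file ID to a collision-free relative path.
// Level 0 is the directory itself; level k > 0 is the faux
// subdirectory named k.
func placeFiles(subdirs []string, files []object) map[string]string {
	used := []map[string]bool{{}} // used[k] holds names taken at level k
	for _, d := range subdirs {
		used[0][d] = true // directory names stay fixed
	}
	out := make(map[string]string)
	for _, f := range files {
		level := 0
		for {
			if level == len(used) {
				used = append(used, map[string]bool{})
			}
			if !used[level][f.Name] {
				break
			}
			level++
		}
		used[level][f.Name] = true
		if level == 0 {
			out[f.ID] = f.Name
		} else {
			out[f.ID] = strconv.Itoa(level) + "/" + f.Name
		}
	}
	return out
}

func main() {
	files := []object{
		{"X.txt", "file-0001"},
		{"X.txt", "file-0005"},
		{"X.txt", "file-0012"},
		{"zoo", "file-0020"},
	}
	paths := placeFiles([]string{"zoo"}, files)
	for _, f := range files {
		fmt.Println(f.ID, "->", paths[f.ID])
	}
}
```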

type PrefetchFileMetadata

type PrefetchFileMetadata struct {
	// contains filtered or unexported fields
}

type PrefetchGlobalState

type PrefetchGlobalState struct {
	// contains filtered or unexported fields
}

Global limits.

func NewPrefetchGlobalState

func NewPrefetchGlobalState(verboseLevel int, dxEnv dxda.DXEnvironment) *PrefetchGlobalState

func (*PrefetchGlobalState) CacheLookup

func (pgs *PrefetchGlobalState) CacheLookup(hid fuseops.HandleID, startOfs int64, endOfs int64, data []byte) int

This is done on behalf of a user read request. If this range has been prefetched, copy the data. Return how much data was copied. Return zero length if the data isn't in cache.
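
The lookup contract described above (copy on a hit, return zero on a miss) can be sketched for a single prefetched chunk. The chunk type and lookup method are hypothetical, and the sketch assumes that endOfs is inclusive and that a request is served only when it lies entirely inside the chunk. The real prefetch state tracks multiple streams and is more involved.

```go
package main

import "fmt"

// chunk is a hypothetical prefetched range: data holds the bytes at
// file offsets [startOfs, startOfs+len(data)).
type chunk struct {
	startOfs int64
	data     []byte
}

// lookup copies the bytes at offsets [startOfs, endOfs] (inclusive, by
// assumption) into dst and returns the number of bytes copied. It
// returns 0 if the range is not fully contained in the chunk.
func (c *chunk) lookup(startOfs, endOfs int64, dst []byte) int {
	if startOfs > endOfs {
		return 0
	}
	chunkEnd := c.startOfs + int64(len(c.data)) - 1
	if startOfs < c.startOfs || endOfs > chunkEnd {
		return 0 // miss: at least part of the range was not prefetched
	}
	n := endOfs - startOfs + 1
	if int64(len(dst)) < n {
		return 0 // destination buffer too small
	}
	from := startOfs - c.startOfs
	return copy(dst[:n], c.data[from:from+n])
}

func main() {
	c := &chunk{startOfs: 100, data: []byte("abcdefghij")}
	buf := make([]byte, 4)
	n := c.lookup(102, 105, buf)
	fmt.Println(n, string(buf[:n]))
	fmt.Println(c.lookup(95, 101, buf))
}
```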

func (*PrefetchGlobalState) CreateStreamEntry

func (pgs *PrefetchGlobalState) CreateStreamEntry(hid fuseops.HandleID, f File, url DxDownloadURL)

func (*PrefetchGlobalState) DownloadEntireFile

func (pgs *PrefetchGlobalState) DownloadEntireFile(
	client *http.Client,
	inode int64,
	size int64,
	url DxDownloadURL,
	fd *os.File,
	localPath string) error

Download an entire file, and write it to disk.

func (*PrefetchGlobalState) RemoveStreamEntry

func (pgs *PrefetchGlobalState) RemoveStreamEntry(hid fuseops.HandleID)

func (*PrefetchGlobalState) Shutdown

func (pgs *PrefetchGlobalState) Shutdown()

type Reply

type Reply struct {
	Results []DxDescribeRawTop `json:"results"`
}

type ReplyAddTags

type ReplyAddTags struct {
	Id string `json:"id"`
}

type ReplyClone

type ReplyClone struct {
	Id      string   `json:"id"`
	Project string   `json:"project"`
	Exists  []string `json:"exists"`
}

type ReplyDescribeProject

type ReplyDescribeProject struct {
	Id               string               `json:"id"`
	Name             string               `json:"name"`
	Region           string               `json:"region"`
	Version          int                  `json:"version"`
	DataUsage        float64              `json:"dataUsage"`
	CreatedMillisec  int64                `json:"created"`
	ModifiedMillisec int64                `json:"modified"`
	UploadParams     FileUploadParameters `json:"fileUploadParameters"`
	Level            string               `json:"level"`
}

type ReplyFolderNew

type ReplyFolderNew struct {
	Id string `json:"id"`
}

type ReplyFolderRemove

type ReplyFolderRemove struct {
	Id string `json:"id"`
}

type ReplyMove

type ReplyMove struct {
	Id string `json:"id"`
}

type ReplyNewFile

type ReplyNewFile struct {
	Id string `json:"id"`
}

type ReplyRemoveObjects

type ReplyRemoveObjects struct {
	Id string `json:"id"`
}

type ReplyRemoveTags

type ReplyRemoveTags struct {
	Id string `json:"id"`
}

type ReplyRename

type ReplyRename struct {
	Id string `json:"id"`
}

type ReplyRenameFolder

type ReplyRenameFolder struct {
	Id string `json:"id"`
}

type ReplySetProperties

type ReplySetProperties struct {
	Id string `json:"id"`
}

type ReplyUploadChunk

type ReplyUploadChunk struct {
	Url     string            `json:"url"`
	Expires int64             `json:"expires"`
	Headers map[string]string `json:"headers"`
}

type Request

type Request struct {
	Objects         []string                   `json:"id"`
	DescribeOptions map[string]map[string]bool `json:"describe"`
}

type RequestAddTags

type RequestAddTags struct {
	ProjId string   `json:"project"`
	Tags   []string `json:"tags"`
}

type RequestClone

type RequestClone struct {
	Objects     []string `json:"objects"`
	Folders     []string `json:"folders"`
	Project     string   `json:"project"`
	Destination string   `json:"destination"`
	Parents     bool     `json:"parents"`
}

type RequestDescribeProject

type RequestDescribeProject struct {
	Fields map[string]bool `json:"fields"`
}

type RequestFolderNew

type RequestFolderNew struct {
	ProjId  string `json:"project"`
	Folder  string `json:"folder"`
	Parents bool   `json:"parents"`
}

type RequestFolderRemove

type RequestFolderRemove struct {
	ProjId string `json:"project"`
	Folder string `json:"folder"`
}

type RequestMove

type RequestMove struct {
	Objects     []string `json:"objects"`
	Folders     []string `json:"folders"`
	Destination string   `json:"destination"`
}

type RequestNewFile

type RequestNewFile struct {
	ProjId  string `json:"project"`
	Name    string `json:"name"`
	Folder  string `json:"folder"`
	Parents bool   `json:"parents"`
	Nonce   string `json:"nonce"`
}

type RequestRemoveObjects

type RequestRemoveObjects struct {
	Objects []string `json:"objects"`
	Force   bool     `json:"force"`
}

type RequestRemoveTags

type RequestRemoveTags struct {
	ProjId string   `json:"project"`
	Tags   []string `json:"tags"`
}

type RequestRename

type RequestRename struct {
	ProjId string `json:"project"`
	Name   string `json:"name"`
}

type RequestRenameFolder

type RequestRenameFolder struct {
	Folder string `json:"folder"`
	Name   string `json:"name"`
}

type RequestSetProperties

type RequestSetProperties struct {
	ProjId     string               `json:"project"`
	Properties map[string](*string) `json:"properties"`
}

type RequestUploadChunk

type RequestUploadChunk struct {
	Size  int    `json:"size"`
	Index int    `json:"index"`
	Md5   string `json:"md5"`
}

type RequestWithScope added in v0.22.2

type RequestWithScope struct {
	Objects         []string                   `json:"id"`
	Scope           map[string]string          `json:"scope"`
	DescribeOptions map[string]map[string]bool `json:"describe"`
}

type SyncDbDx

type SyncDbDx struct {
	// contains filtered or unexported fields
}

func NewSyncDbDx

func NewSyncDbDx(
	options Options,
	dxEnv dxda.DXEnvironment,
	projId2Desc map[string]DxDescribePrj,
	mdb *MetadataDb,
	mutex *sync.Mutex) *SyncDbDx

func (*SyncDbDx) CmdSync

func (sybx *SyncDbDx) CmdSync() error

func (*SyncDbDx) Shutdown

func (sybx *SyncDbDx) Shutdown()

type UploadRequest added in v1.0.0

type UploadRequest struct {
	// contains filtered or unexported fields
}

Directories

Path Synopsis
test
