gowfs

package module
v0.3.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 10, 2020 License: Apache-2.0 Imports: 17 Imported by: 0

README

Build Status

gowfs

gowfs is a Go bindings for Hadoop HDFS via its WebHDFS interface. It provides typed access to remote HDFS resources via Go's JSON marshaling system. gowfs follows the WebHDFS JSON protocol outline in http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html. It has been tested with Apache Hadoop 2.x.x - series.

GoDoc Package Documentation

GoDoc documentation - https://godoc.org/github.com/vladimirvivien/gowfs

Usage

go get github.com/vladimirvivien/gowfs
import github.com/vladimirvivien/gowfs
...
fs, err := gowfs.NewFileSystem(gowfs.Configuration{Addr: "localhost:50070", User: "hdfs"})
if err != nil{
	log.Fatal(err)
}
checksum, err := fs.GetFileChecksum(gowfs.Path{Name: "location/to/file"})
if err != nil {
	log.Fatal(err)
}
fmt.Println (checksum)

Run HDFS Test

To see the API used, see directory test-hdfs. Compile and use that code to test against a running HDFS deployment. See https://github.com/vladimirvivien/gowfs/tree/master/test-hdfs.

HDFS Setup
  • Enable dfs.webhdfs.enabled property in your hsdfs-site.xml
  • Ensure hadoop.http.staticuser.user property is set in your core-site.xml.

API Overview

gowfs lets you access HDFS resources via two structs FileSystem and FsShell. Use FileSystem to get access to low level callse. FsShell is designed to provide a higer level of abstraction and integration with the local file system.

FileSystem API

Configuration{} Struct

Use the Configuration{} struct to specify paramters for the file system. You can create configuration either using a Configuration{} literal or using NewConfiguration() for defaults.

conf := *gowfs.NewConfiguration()
conf.Addr = "localhost:50070"
conf.User = "hdfs"
conf.ConnectionTime = time.Second * 15
conf.DisableKeepAlives = false 
FileSystem{} Struct

Create a new FileSystem{} struct before you can make call to any functions. You create the FileSystem by passing in a Configuration pointer as shown below.

fs, err := gowfs.NewFileSystem(conf)

Now you are ready to communicate with HDFS.

Create File

FileSystem.Create() creates and store a remote file on the HDFS server. See https://godoc.org/github.com/vladimirvivien/gowfs#FileSystem.Create

ok, err := fs.Create(
    bytes.NewBufferString("Hello webhdfs users!"),
	gowfs.Path{Name:"/remote/file"},
	false,
	0,
	0,
	0700,
	0,
)
Open HDFS File

Use the FileSystem.Open() to open and read a remote file from HDFS. See https://godoc.org/github.com/vladimirvivien/gowfs#FileSystem.Open

data, err := fs.Open(gowfs.Path{Name:"/remote/file"}, 0, 512, 2048)
...
rcvdData, _ := ioutil.ReadAll(data)
fmt.Println(string(rcvdData))

Append to File

To append to an existing HDFS file, use FileSystem.Append(). See https://godoc.org/github.com/vladimirvivien/gowfs#FileSystem.Append

ok, err := fs.Append(
    bytes.NewBufferString("Hello webhdfs users!"),
    gowfs.Path{Name:"/remote/file"}, 4096)
Rename File

Use FileSystem.Rename() to rename HDFS resources. See https://godoc.org/github.com/vladimirvivien/gowfs#FileSystem.Rename

ok, err := fs.Rename(gowfs.Path{Name:"/old/name"}, Path{Name:"/new/name"})
Delete HDFS Resources

To delete an HDFS resource (file/directory), use FileSystem.Delete(). See https://godoc.org/github.com/vladimirvivien/gowfs#FileSystem.Delete

ok, err := fs.Delete(gowfs.Path{Name:"/remote/file/todelete"}, false)
File Status

You can get status about an existing HDFS resource using FileSystem.GetFileStatus(). See https://godoc.org/github.com/vladimirvivien/gowfs#FileSystem.GetFileStatus

fileStatus, err := fs.GetFileStatus(gowfs.Path{Name:"/remote/file"})

gowfs returns a value of type FileStatus which is a struct with info about remote file.

type FileStatus struct {
	AccesTime int64
    BlockSize int64
    Group string
    Length int64
    ModificationTime int64
    Owner string
    PathSuffix string
    Permission string
    Replication int64
    Type string
}

You can get a list of file stats using FileSystem.ListStatus().

stats, err := fs.ListStatus(gowfs.Path{Name:"/remote/directory"})
for _, stat := range stats {
    fmt.Println(stat.PathSuffix, stat.Length)
}

FsShell Examples

Create the FsShell

To create an FsShell, you need to have an existing instance of FileSystem.

shell := gowfs.FsShell{FileSystem:fs}
FsShell.Put()

Use the put to upload a local file to an HDFS file system. See https://godoc.org/github.com/vladimirvivien/gowfs#FsShell.PutOne

ok, err := shell.Put("local/file/name", "hdfs/file/path", true)
FsShell.Get()

Use the Get to retrieve remote HDFS file to local file system. See https://godoc.org/github.com/vladimirvivien/gowfs#FsShell.Get

ok, err := shell.Get("hdfs/file/path", "local/file/name")
FsShell.AppendToFile()

Append local files to remote HDFS file or directory. See https://godoc.org/github.com/vladimirvivien/gowfs#FsShell.AppendToFile

ok, err := shell.AppendToFile([]string{"local/file/1", "local/file/2"}, "remote/hdfs/path")
FsShell.Chown()

Change owner for remote file. See https://godoc.org/github.com/vladimirvivien/gowfs#FsShell.Chown.

ok, err := shell.Chown([]string{"/remote/hdfs/file"}, "owner2")
FsShell.Chgrp()

Change group of remote HDFS files. See https://godoc.org/github.com/vladimirvivien/gowfs#FsShell.Chgrp

ok, err := shell.Chgrp([]string{"/remote/hdfs/file"}, "superduper")
FsShell.Chmod()

Change file mod of remote HDFS files. See https://godoc.org/github.com/vladimirvivien/gowfs#FsShell.Chmod

ok, err := shell.Chmod([]string{"/remote/hdfs/file/"}, 0744)

Limitations

  1. Only "SIMPLE" security mode supported.
  2. No support for kerberos (none plan right now)
  3. No SSL support yet.

References

  1. WebHDFS API - http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
  2. FileSystemShell - http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#getmerge

Documentation

Overview

gowfs is Go bindings for the Hadoop HDFS over its WebHDFS interface. gowfs uses JSON marshalling to expose typed values from HDFS. See https://github.com/vladimirvivien/gowfs.

Index

Constants

View Source
const (
	OP_OPEN                  = "OPEN"
	OP_CREATE                = "CREATE"
	OP_APPEND                = "APPEND"
	OP_CONCAT                = "CONCAT"
	OP_RENAME                = "RENAME"
	OP_DELETE                = "DELETE"
	OP_SETPERMISSION         = "SETPERMISSION"
	OP_SETOWNER              = "SETOWNER"
	OP_SETREPLICATION        = "SETREPLICATION"
	OP_SETTIMES              = "SETTIMES"
	OP_MKDIRS                = "MKDIRS"
	OP_CREATESYMLINK         = "CREATESYMLINK"
	OP_LISTSTATUS            = "LISTSTATUS"
	OP_GETFILESTATUS         = "GETFILESTATUS"
	OP_GETCONTENTSUMMARY     = "GETCONTENTSUMMARY"
	OP_GETFILECHECKSUM       = "GETFILECHECKSUM"
	OP_GETDELEGATIONTOKEN    = "GETDELEGATIONTOKEN"
	OP_GETDELEGATIONTOKENS   = "GETDELEGATIONTOKENS"
	OP_RENEWDELEGATIONTOKEN  = "RENEWDELEGATIONTOKEN"
	OP_CANCELDELEGATIONTOKEN = "CANCELDELEGATIONTOKEN"
)
View Source
const MAX_DOWN_CHUNK int64 = 500 * (1024 * 1024) // 500 MB
View Source
const MAX_UP_CHUNK int64 = 1 * (1024 * 1024) * 1024 // 1 GB.
View Source
const WebHdfsVer string = "/webhdfs/v1"

Variables

This section is empty.

Functions

func Backoff added in v0.3.0

func Backoff(coef float64) func(time.Duration) time.Duration

func Retries added in v0.3.0

func Retries(tries int, delay time.Duration, backoff func(time.Duration) time.Duration) func() []time.Duration

Types

type Configuration

type Configuration struct {
	Addr                  string // host:port
	BasePath              string // initial base path to be appended
	User                  string // user.name to use to connect
	Password              string
	ConnectionTimeout     time.Duration
	DisableKeepAlives     bool
	DisableCompression    bool
	ResponseHeaderTimeout time.Duration
	MaxIdleConnsPerHost   int
	UseBaseAuth           bool
	UseHTTPS              bool
	TLSClientSkipSecurity bool
	Retries               func() []time.Duration
}

func NewConfiguration

func NewConfiguration() *Configuration

func (*Configuration) GetNameNodeUrl

func (conf *Configuration) GetNameNodeUrl() (*url.URL, error)

type ContentSummary

type ContentSummary struct {
	DirectoryCount int64
	FileCount      int64
	Length         int64
	Quota          int64
	SpaceConsumed  int64
	SpaceQuota     int64
}
Type for HDFS FileSystem content summary (FileSystem.getContentSummary())
See http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#ContentSummary_JSON_Schema

Example:

{
  "ContentSummary":
  {
    "directoryCount": 2,
    "fileCount"     : 1,
    "length"        : 24930,
    "quota"         : -1,
    "spaceConsumed" : 24930,
    "spaceQuota"    : -1
  }
}

type FileChecksum

type FileChecksum struct {
	Algorithm string
	Bytes     string
	Length    int64
}
Type for HDFS FileSystem.getFileChecksum()
See http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#FileChecksum_JSON_Schema

Example:

{
  "FileChecksum":
  {
    "algorithm": "MD5-of-1MD5-of-512CRC32",
    "bytes"    : "eadb10de24aa315748930df6e185c0d ...",
    "length"   : 28
  }
}

type FileStatus

type FileStatus struct {
	AccesTime        int64
	BlockSize        int64
	Group            string
	Length           int64
	ModificationTime int64
	Owner            string
	PathSuffix       string
	Permission       string
	Replication      int64
	Type             string
}

Represents HDFS FileStatus (FileSystem.getStatus()) See http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#FileStatus_JSON_Schema

Example:

{
  "FileStatus":
  {
    "accessTime"      : 0, 				// integer
    "blockSize"       : 0, 				// integer
    "group"           : "grp",			// string
    "length"          : 0,             	// integer - zero for directories
    "modificationTime": 1320173277227,	// integer
    "owner"           : "webuser",		// string
    "pathSuffix"      : "",				// string
    "permission"      : "777",			// string
    "replication"     : 0,				// integer
    "type"            : "DIRECTORY"    	// string - enum {FILE, DIRECTORY, SYMLINK}
  }
}

type FileStatuses

type FileStatuses struct {
	FileStatus []FileStatus
}

Container type for multiple FileStatus for directory, etc (see HDFS FileSystem.listStatus()) NOTE: the confusing naming and Plurality is to match WebHDFS schema.

type FileSystem

type FileSystem struct {
	Config Configuration
	// contains filtered or unexported fields
}

This type maps fields and functions to HDFS's FileSystem class.

func NewFileSystem

func NewFileSystem(conf Configuration) (*FileSystem, error)

func (*FileSystem) Append

func (fs *FileSystem) Append(data io.Reader, p Path, buffersize int, contenttype string) (bool, error)

Appends specified data to an existing file. See HDFS FileSystem.append() See http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Append_to_a_File NOTE: Append() is known to have issues - see https://issues.apache.org/jira/browse/HDFS-4600

func (*FileSystem) CancelDelegationToken

func (fs *FileSystem) CancelDelegationToken(token string) (bool, error)

func (*FileSystem) Concat

func (fs *FileSystem) Concat(target Path, sources []string) (bool, error)

Concatenate (on the server) a list of given files paths to a new file. See HDFS FileSystem.concat()

func (*FileSystem) Create

func (fs *FileSystem) Create(
	data io.Reader,
	p Path,
	overwrite bool,
	blocksize uint64,
	replication uint16,
	permission os.FileMode,
	buffersize uint,
	contenttype string) (bool, error)

Creates a new file and stores its content in HDFS. See HDFS FileSystem.create() For detail, http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Create_and_Write_to_a_File See NOTE section on that page for impl detail.

func (fs *FileSystem) CreateSymlink(dest Path, link Path, createParent bool) (bool, error)

Creates a symlink where link -> destination See HDFS FileSystem.createSymlink() dest - the full path of the original resource link - the symlink path to create createParent - when true, parent dirs are created if they don't exist See http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#HTTP_Query_Parameter_Dictionary

func (*FileSystem) Delete

func (fs *FileSystem) Delete(path Path, recursive bool) (bool, error)

Deletes the specified path. See HDFS FileSystem.delete()

func (*FileSystem) GetContentSummary

func (fs *FileSystem) GetContentSummary(p Path) (ContentSummary, error)

Returns ContentSummary for the given path. For detail, see HDFS FileSystem.getContentSummary()

func (*FileSystem) GetDelegationToken

func (fs *FileSystem) GetDelegationToken(renewer string) (Token, error)

func (*FileSystem) GetDelegationTokens

func (fs *FileSystem) GetDelegationTokens(renewer string) ([]Token, error)

func (*FileSystem) GetFileChecksum

func (fs *FileSystem) GetFileChecksum(p Path) (FileChecksum, error)

Returns HDFS file checksum. For detail, see HDFS FileSystem.getFileChecksum()

func (*FileSystem) GetFileStatus

func (fs *FileSystem) GetFileStatus(p Path) (FileStatus, error)

Returns status for a given file. The Path must represent a FILE on the remote system. (see HDFS FileSystem.getFileStatus())

func (*FileSystem) GetHomeDirectory

func (fs *FileSystem) GetHomeDirectory() (Path, error)

func (*FileSystem) ListStatus

func (fs *FileSystem) ListStatus(p Path) ([]FileStatus, error)

Returns an array of FileStatus for a given file directory. For details, see HDFS FileSystem.listStatus()

func (*FileSystem) MkDirs

func (fs *FileSystem) MkDirs(p Path, fm os.FileMode) (bool, error)

Creates the specified directory(ies). See HDFS FileSystem.mkdirs()

func (*FileSystem) Open

func (fs *FileSystem) Open(p Path, offset, length int64, buffSize int) (io.ReadCloser, error)

Opens the specificed Path and returns its content to be accessed locally. See HDFS WebHdfsFileSystem.open() See http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#HTTP_Query_Parameter_Dictionary

func (*FileSystem) Rename

func (fs *FileSystem) Rename(source Path, destination Path) (bool, error)

Renames the specified path resource to a new name. See HDFS FileSystem.rename()

func (*FileSystem) RenewDelegationToken

func (fs *FileSystem) RenewDelegationToken(token string) (int64, error)

func (*FileSystem) SetOwner

func (fs *FileSystem) SetOwner(path Path, owner string, group string) (bool, error)

Sets owner for the specified path. See HDFS FileSystem.setOwner()

func (*FileSystem) SetPermission

func (fs *FileSystem) SetPermission(path Path, permission os.FileMode) (bool, error)

Sets the permission for the specified path. See FileSystem.setPermission()

func (*FileSystem) SetReplication

func (fs *FileSystem) SetReplication(path Path, replication uint16) (bool, error)

Sets replication factor for given path. See HDFS FileSystem.setReplication()

func (*FileSystem) SetTimes

func (fs *FileSystem) SetTimes(path Path, accesstime int64, modificationtime int64) (bool, error)

Sets access or modification time for specified resource See HDFS FileSystem.setTimes

type FsShell

type FsShell struct {
	FileSystem  *FileSystem
	WorkingPath string
}

func (FsShell) AppendToFile

func (shell FsShell) AppendToFile(filePaths []string, hdfsPath string, contenttype string) (bool, error)

Appends the specified list of local files to the HDFS path.

func (FsShell) Cat

func (shell FsShell) Cat(hdfsPaths []string, writr io.Writer) error

Returns a writer with the content of the specified files.

func (FsShell) Chgrp

func (shell FsShell) Chgrp(hdfsPaths []string, grpName string) (bool, error)

Changes the group association of the given hdfs paths.

func (FsShell) Chmod

func (shell FsShell) Chmod(hdfsPaths []string, perm os.FileMode) (bool, error)

Changes the filemode of the provided hdfs paths.

func (FsShell) Chown

func (shell FsShell) Chown(hdfsPaths []string, owner string) (bool, error)

Changes the owner of the specified hdfs paths.

func (FsShell) Exists

func (shell FsShell) Exists(hdfsPath string) (bool, error)

Tests the existence of a remote HDFS file/directory.

func (FsShell) Get

func (shell FsShell) Get(hdfsPath, localFile string) (bool, error)

Retrieves a remote HDFS file and saves as the specified local file.

func (FsShell) MoveFromLocal

func (shell FsShell) MoveFromLocal(localFile, hdfsPath string, overwrite bool) (bool, error)

Copies local file to remote destination, then local file is removed.

func (FsShell) MoveToLocal

func (shell FsShell) MoveToLocal(hdfsPath, localFile string) (bool, error)

Copies remote HDFS file locally. The remote file is then removed.

func (FsShell) Put

func (shell FsShell) Put(localFile string, hdfsPath string, overwrite bool) (bool, error)

Copies one specified local file to the remote HDFS server. Uses default permission, blocksize, and replication.

func (FsShell) PutMany

func (shell FsShell) PutMany(files []string, hdfsPath string, overwrite bool) (bool, error)

Copies sepcified local files to remote HDFS server. The hdfsPath must be a directory (created if it does not exist). Uses default permission, blocksize, and replication.

func (FsShell) Rm

func (shell FsShell) Rm(hdfsPath string) (bool, error)

Removes the specified HDFS source.

type HdfsJsonData

type HdfsJsonData struct {
	Boolean         bool
	FileStatus      FileStatus
	FileStatuses    FileStatuses
	FileChecksum    FileChecksum
	ContentSummary  ContentSummary
	Token           Token
	Tokens          Tokens
	Long            int64
	RemoteException RemoteException
}

Root level struct for data JSON data from WebHDFS.

type Path

type Path struct {
	Name       string  // Relative path representation (/root/leaf)
	RefererUrl url.URL // URL related to path (http://server:port/root/leaf)
}

Represents a remote webHDFS path

type RemoteException

type RemoteException struct {
	Exception     string
	JavaClassName string
	Message       string
}

Example:

{
  "RemoteException":
  {
    "exception"    : "FileNotFoundException",
    "javaClassName": "java.io.FileNotFoundException",
    "message"      : "File does not exist: /foo/a.patch"
  }
}

func (RemoteException) Error

func (re RemoteException) Error() string

Implementation of error type. Returns string representation of RemoteException.

type Token

type Token struct {
	UrlString string
}

Example:

{
  "Token":
  {
    "urlString": "JQAIaG9y..."
  }
}

type Tokens

type Tokens struct {
	Token []Token
}

Container type for Token

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL