Documentation ¶
Overview ¶
Package hdfs provides a native, idiomatic interface to HDFS. Where possible, it mimics the functionality and signatures of the standard `os` package.
Example:

    client, _ := hdfs.New("namenode:8020")

    file, _ := client.Open("/mobydick.txt")

    buf := make([]byte, 59)
    file.ReadAt(buf, 48847)

    fmt.Println(string(buf))
    // => Abominable are the tumblers into which he pours his poison.
Index ¶
- Variables
- func Username() (string, error)
- type Client
- func (c *Client) Append(name string) (*FileWriter, error)
- func (c *Client) Chmod(name string, perm os.FileMode) error
- func (c *Client) Chown(name string, user, group string) error
- func (c *Client) Chtimes(name string, atime time.Time, mtime time.Time) error
- func (c *Client) Close() error
- func (c *Client) CopyToLocal(src string, dst string) error
- func (c *Client) CopyToRemote(src string, dst string) error
- func (c *Client) Create(name string) (*FileWriter, error)
- func (c *Client) CreateEmptyFile(name string) error
- func (c *Client) CreateFile(name string, replication int, blockSize int64, perm os.FileMode) (*FileWriter, error)
- func (c *Client) GetContentSummary(name string) (*ContentSummary, error)
- func (c *Client) Mkdir(dirname string, perm os.FileMode) error
- func (c *Client) MkdirAll(dirname string, perm os.FileMode) error
- func (c *Client) Open(name string) (*FileReader, error)
- func (c *Client) ReadDir(dirname string) ([]os.FileInfo, error)
- func (c *Client) ReadFile(filename string) ([]byte, error)
- func (c *Client) Remove(name string) error
- func (c *Client) Rename(oldpath, newpath string) error
- func (c *Client) Stat(name string) (os.FileInfo, error)
- func (c *Client) StatFs() (FsInfo, error)
- type ClientOptions
- type ContentSummary
- type FileInfo
- func (fi *FileInfo) AccessTime() time.Time
- func (fi *FileInfo) IsDir() bool
- func (fi *FileInfo) ModTime() time.Time
- func (fi *FileInfo) Mode() os.FileMode
- func (fi *FileInfo) Name() string
- func (fi *FileInfo) Owner() string
- func (fi *FileInfo) OwnerGroup() string
- func (fi *FileInfo) Size() int64
- func (fi *FileInfo) Sys() interface{}
- type FileReader
- func (f *FileReader) Checksum() ([]byte, error)
- func (f *FileReader) Close() error
- func (f *FileReader) Name() string
- func (f *FileReader) Read(b []byte) (int, error)
- func (f *FileReader) ReadAt(b []byte, off int64) (int, error)
- func (f *FileReader) Readdir(n int) ([]os.FileInfo, error)
- func (f *FileReader) Readdirnames(n int) ([]string, error)
- func (f *FileReader) Seek(offset int64, whence int) (int64, error)
- func (f *FileReader) Stat() os.FileInfo
- type FileWriter
- type FsInfo
- type HadoopConf
- type Property
Constants ¶
This section is empty.
Variables ¶
var StatFsError = errors.New("Failed to get HDFS usage")
Functions ¶
Types ¶
type Client ¶
type Client struct {
// contains filtered or unexported fields
}
A Client represents a connection to an HDFS cluster.
func New ¶
New returns a connected Client, or an error if it can't connect. The user will be the user the code is running under. If address is an empty string, it will try to get the namenode address from the hadoop configuration files.
func NewClient ¶ added in v1.1.0
func NewClient(options ClientOptions) (*Client, error)
NewClient returns a connected Client for the given options, or an error if the client could not be created.
func NewForConnection ¶ added in v1.0.2
func NewForConnection(namenode *rpc.NamenodeConnection) *Client
NewForConnection returns a Client with the specified underlying rpc.NamenodeConnection. You can use rpc.WrapNamenodeConnection to wrap your own net.Conn.
func NewForUser ¶
NewForUser returns a connected Client with the user specified, or an error if it can't connect.
func (*Client) Append ¶ added in v1.0.0
func (c *Client) Append(name string) (*FileWriter, error)
Append opens an existing file in HDFS and returns an io.WriteCloser for writing to it. Because of the way that HDFS writes are buffered and acknowledged asynchronously, it is very important that Close is called after all data has been written.
func (*Client) Chown ¶
Chown changes the user and group of the file. Unlike os.Chown, this takes a string username and group (since that's what HDFS uses).
If an empty string is passed for user or group, that field will not be changed remotely.
func (*Client) Close ¶ added in v1.0.0
Close terminates all underlying socket connections to the remote server.
func (*Client) CopyToLocal ¶
CopyToLocal copies the HDFS file specified by src to the local file at dst. If dst already exists, it will be overwritten.
func (*Client) CopyToRemote ¶ added in v1.0.0
CopyToRemote copies the local file specified by src to the HDFS file at dst. If dst already exists, it will be overwritten.
func (*Client) Create ¶ added in v1.0.0
func (c *Client) Create(name string) (*FileWriter, error)
Create opens a new file in HDFS with the default replication, block size, and permissions (0644), and returns an io.WriteCloser for writing to it. Because of the way that HDFS writes are buffered and acknowledged asynchronously, it is very important that Close is called after all data has been written.
func (*Client) CreateEmptyFile ¶
CreateEmptyFile creates an empty file at the given path, with the permissions 0644.
func (*Client) CreateFile ¶ added in v1.0.0
func (c *Client) CreateFile(name string, replication int, blockSize int64, perm os.FileMode) (*FileWriter, error)
CreateFile opens a new file in HDFS with the given replication, block size, and permissions, and returns an io.WriteCloser for writing to it. Because of the way that HDFS writes are buffered and acknowledged asynchronously, it is very important that Close is called after all data has been written.
func (*Client) GetContentSummary ¶ added in v0.1.4
func (c *Client) GetContentSummary(name string) (*ContentSummary, error)
GetContentSummary returns a ContentSummary representing the named file or directory. The summary contains information about the entire tree rooted in the named file; for instance, it can return the total size of all the files under the named path.
func (*Client) MkdirAll ¶
MkdirAll creates a directory for dirname, along with any necessary parents, and returns nil, or else returns an error. The permission bits perm are used for all directories that MkdirAll creates. If dirname is already a directory, MkdirAll does nothing and returns nil.
func (*Client) Open ¶
func (c *Client) Open(name string) (*FileReader, error)
Open returns a FileReader which can be used for reading.
func (*Client) ReadDir ¶
ReadDir reads the directory named by dirname and returns a list of sorted directory entries.
type ClientOptions ¶ added in v1.1.0
type ClientOptions struct {
    Addresses []string
    Namenode  *rpc.NamenodeConnection
    User      string
}
ClientOptions represents the configurable options for a client.
type ContentSummary ¶ added in v0.1.4
type ContentSummary struct {
// contains filtered or unexported fields
}
ContentSummary represents a set of information about a file or directory in HDFS. It's provided directly by the namenode, and has no unix filesystem analogue.
func (*ContentSummary) DirectoryCount ¶ added in v0.1.4
func (cs *ContentSummary) DirectoryCount() int
DirectoryCount returns the number of directories under the named one, including any subdirectories, and including the root directory itself. If the named path is a file, this returns 0.
func (*ContentSummary) FileCount ¶ added in v0.1.4
func (cs *ContentSummary) FileCount() int
FileCount returns the number of files under the named path, including any subdirectories. If the named path is a file, FileCount returns 1.
func (*ContentSummary) NameQuota ¶ added in v0.1.4
func (cs *ContentSummary) NameQuota() int
NameQuota returns the HDFS configured "name quota" for the named path. The name quota is a hard limit on the number of directories and files inside a directory; see http://goo.gl/sOSJmJ for more information.
func (*ContentSummary) Size ¶ added in v0.1.4
func (cs *ContentSummary) Size() int64
Size returns the total size of the named path, including any subdirectories.
func (*ContentSummary) SizeAfterReplication ¶ added in v0.1.4
func (cs *ContentSummary) SizeAfterReplication() int64
SizeAfterReplication returns the total size of the named path, including any subdirectories. Unlike Size, it counts the total replicated size of each file, and represents the total on-disk footprint for a tree in HDFS.
func (*ContentSummary) SpaceQuota ¶ added in v0.1.4
func (cs *ContentSummary) SpaceQuota() int64
SpaceQuota returns the HDFS configured "space quota" for the named path. The space quota is a hard limit on the number of bytes used by the tree rooted at that directory; see http://goo.gl/sOSJmJ for more information.
type FileInfo ¶
type FileInfo struct {
// contains filtered or unexported fields
}
FileInfo implements os.FileInfo, and provides information about a file or directory in HDFS.
func (*FileInfo) AccessTime ¶
AccessTime returns the last time the file was accessed. It's not part of the os.FileInfo interface.
func (*FileInfo) Owner ¶
Owner returns the name of the user that owns the file or directory. It's not part of the os.FileInfo interface.
func (*FileInfo) OwnerGroup ¶
OwnerGroup returns the name of the group that owns the file or directory. It's not part of the os.FileInfo interface.
type FileReader ¶
type FileReader struct {
// contains filtered or unexported fields
}
A FileReader represents an existing file or directory in HDFS. It implements io.Reader, io.ReaderAt, io.Seeker, and io.Closer, and can only be used for reads. For writes, see FileWriter and Client.Create.
func (*FileReader) Checksum ¶
func (f *FileReader) Checksum() ([]byte, error)
Checksum returns HDFS's internal "MD5MD5CRC32C" checksum for a given file.
Internally to HDFS, it works by calculating the MD5 of all the CRCs (which are stored alongside the data) for each block, and then calculating the MD5 of all of those.
func (*FileReader) Read ¶
func (f *FileReader) Read(b []byte) (int, error)
Read implements io.Reader.
func (*FileReader) ReadAt ¶
func (f *FileReader) ReadAt(b []byte, off int64) (int, error)
ReadAt implements io.ReaderAt.
func (*FileReader) Readdir ¶
func (f *FileReader) Readdir(n int) ([]os.FileInfo, error)
Readdir reads the contents of the directory associated with file and returns a slice of up to n os.FileInfo values, as would be returned by Stat, in directory order. Subsequent calls on the same file will yield further os.FileInfos.
If n > 0, Readdir returns at most n os.FileInfo values. In this case, if Readdir returns an empty slice, it will return a non-nil error explaining why. At the end of a directory, the error is io.EOF.
If n <= 0, Readdir returns all the os.FileInfo from the directory in a single slice. In this case, if Readdir succeeds (reads all the way to the end of the directory), it returns the slice and a nil error. If it encounters an error before the end of the directory, Readdir returns the os.FileInfo read until that point and a non-nil error.
func (*FileReader) Readdirnames ¶
func (f *FileReader) Readdirnames(n int) ([]string, error)
Readdirnames reads and returns a slice of names from the directory f.
If n > 0, Readdirnames returns at most n names. In this case, if Readdirnames returns an empty slice, it will return a non-nil error explaining why. At the end of a directory, the error is io.EOF.
If n <= 0, Readdirnames returns all the names from the directory in a single slice. In this case, if Readdirnames succeeds (reads all the way to the end of the directory), it returns the slice and a nil error. If it encounters an error before the end of the directory, Readdirnames returns the names read until that point and a non-nil error.
func (*FileReader) Seek ¶
func (f *FileReader) Seek(offset int64, whence int) (int64, error)
Seek implements io.Seeker.
The seek is virtual - it starts a new block read at the new position.
func (*FileReader) Stat ¶
func (f *FileReader) Stat() os.FileInfo
Stat returns the FileInfo structure describing file.
type FileWriter ¶ added in v1.0.0
type FileWriter struct {
// contains filtered or unexported fields
}
A FileWriter represents a writer for an open file in HDFS. It implements Writer and Closer, and can only be used for writes. For reads, see FileReader and Client.Open.
func (*FileWriter) Close ¶ added in v1.0.0
func (f *FileWriter) Close() error
Close closes the file, writing any remaining data out to disk and waiting for acknowledgements from the datanodes. It is important that Close is called after all data has been written.
func (*FileWriter) Write ¶ added in v1.0.0
func (f *FileWriter) Write(b []byte) (int, error)
Write implements io.Writer for writing to a file in HDFS. Internally, it writes data to an internal buffer first, and then later out to HDFS. Because of this, it is important that Close is called after all data has been written.
type FsInfo ¶ added in v1.0.3
type FsInfo struct {
    Capacity              uint64
    Used                  uint64
    Remaining             uint64
    UnderReplicated       uint64
    CorruptBlocks         uint64
    MissingBlocks         uint64
    MissingReplOneBlocks  uint64
    BlocksInFuture        uint64
    PendingDeletionBlocks uint64
}
FsInfo provides information about HDFS.
type HadoopConf ¶ added in v1.0.0
HadoopConf represents a map of all the key/value configuration pairs found in a user's hadoop configuration files.
func LoadHadoopConf ¶ added in v1.0.0
func LoadHadoopConf(inputPath string) HadoopConf
LoadHadoopConf returns a HadoopConf object that is a key/value map of all the hadoop configuration properties found at inputPath. It swallows errors from reading the XML or from reading a non-existent file.
func (HadoopConf) Namenodes ¶ added in v1.0.0
func (conf HadoopConf) Namenodes() ([]string, error)
Namenodes returns a slice of deduplicated namenodes named in a user's hadoop configuration files, or an error if there are no namenodes.
Source Files ¶
Directories ¶
Path | Synopsis
---|---
cmd |
protocol |
protocol/hadoop_common | Package hadoop_common is a generated protocol buffer package.
protocol/hadoop_hdfs | Package hadoop_hdfs is a generated protocol buffer package.
rpc | Package rpc implements some of the lower-level functionality required to communicate with the namenode and datanodes.