Documentation ¶
Overview ¶
Package hdfs provides a native, idiomatic interface to HDFS. Where possible, it mimics the functionality and signatures of the standard `os` package.
Example:
    client, _ := hdfs.New("namenode:8020")

    file, _ := client.Open("/mobydick.txt")

    buf := make([]byte, 59)
    file.ReadAt(buf, 48847)

    fmt.Println(string(buf))
    // => Abominable are the tumblers into which he pours his poison.
Index ¶
- Constants
- type Client
- func (c *Client) AllowSnapshots(dir string) error
- func (c *Client) Append(name string) (*FileWriter, error)
- func (c *Client) Chmod(name string, perm os.FileMode) error
- func (c *Client) Chown(name string, user, group string) error
- func (c *Client) Chtimes(name string, atime time.Time, mtime time.Time) error
- func (c *Client) Close() error
- func (c *Client) CopyToLocal(src string, dst string) error
- func (c *Client) CopyToRemote(src string, dst string) error
- func (c *Client) Create(name string) (*FileWriter, error)
- func (c *Client) CreateEmptyFile(name string) error
- func (c *Client) CreateFile(name string, replication int, blockSize int64, perm os.FileMode) (*FileWriter, error)
- func (c *Client) CreateSnapshot(dir, name string) (string, error)
- func (c *Client) DeleteSnapshot(dir, name string) error
- func (c *Client) DisallowSnapshots(dir string) error
- func (c *Client) GetContentSummary(name string) (*ContentSummary, error)
- func (c *Client) GetXAttrs(name string, keys ...string) (map[string]string, error)
- func (c *Client) ListXAttrs(name string) (map[string]string, error)
- func (c *Client) Mkdir(dirname string, perm os.FileMode) error
- func (c *Client) MkdirAll(dirname string, perm os.FileMode) error
- func (c *Client) Name() string
- func (c *Client) Open(name string) (*FileReader, error)
- func (c *Client) ReadDir(dirname string) ([]os.FileInfo, error)
- func (c *Client) ReadFile(filename string) ([]byte, error)
- func (c *Client) Remove(name string) error
- func (c *Client) RemoveAll(name string) error
- func (c *Client) RemoveXAttr(name, key string) error
- func (c *Client) Rename(oldpath, newpath string) error
- func (c *Client) SetXAttr(name, key, value string) error
- func (c *Client) Stat(name string) (os.FileInfo, error)
- func (c *Client) StatFs() (FsInfo, error)
- func (c *Client) User() string
- func (c *Client) Walk(root string, walkFn filepath.WalkFunc) error
- type ClientOptions
- type ContentSummary
- type Error
- type FileInfo
- func (fi *FileInfo) AccessTime() time.Time
- func (fi *FileInfo) IsDir() bool
- func (fi *FileInfo) ModTime() time.Time
- func (fi *FileInfo) Mode() os.FileMode
- func (fi *FileInfo) Name() string
- func (fi *FileInfo) Owner() string
- func (fi *FileInfo) OwnerGroup() string
- func (fi *FileInfo) Size() int64
- func (fi *FileInfo) Sys() interface{}
- type FileReader
- func (f *FileReader) Checksum() ([]byte, error)
- func (f *FileReader) Close() error
- func (f *FileReader) Name() string
- func (f *FileReader) Read(b []byte) (int, error)
- func (f *FileReader) ReadAt(b []byte, off int64) (int, error)
- func (f *FileReader) Readdir(n int) ([]os.FileInfo, error)
- func (f *FileReader) Readdirnames(n int) ([]string, error)
- func (f *FileReader) Seek(offset int64, whence int) (int64, error)
- func (f *FileReader) SetDeadline(t time.Time) error
- func (f *FileReader) Stat() (os.FileInfo, error)
- type FileStatus
- type FileWriter
- type FsInfo
Constants ¶
    const (
        DataTransferProtectionAuthentication = "authentication"
        DataTransferProtectionIntegrity      = "integrity"
        DataTransferProtectionPrivacy        = "privacy"
    )
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Client ¶
type Client struct {
// contains filtered or unexported fields
}
Client represents a connection to an HDFS cluster. A Client will automatically maintain leases for any open files, preventing other clients from modifying them, until Close is called.
func New ¶
New returns a Client connected to the namenode(s) specified by address, or an error if it can't connect. Multiple namenodes can be specified by separating them with commas, for example "nn1:9000,nn2:9000".
The user will be the current system user. Any other relevant options (including the address(es) of the namenode(s), if an empty string is passed) will be loaded from the Hadoop configuration present at HADOOP_CONF_DIR or HADOOP_HOME, as specified by hadoopconf.LoadFromEnvironment and ClientOptionsFromConf.
Note, however, that New will not attempt any Kerberos authentication; use NewClient if you need that.
func NewClient ¶
func NewClient(options ClientOptions) (*Client, error)
NewClient returns a connected Client for the given options, or an error if the client could not be created.
func (*Client) AllowSnapshots ¶
AllowSnapshots marks a directory as available for snapshots. This is required before a snapshot of the directory can be taken, since snapshottable directories work as a whitelist.
This requires superuser privileges.
func (*Client) Append ¶
func (c *Client) Append(name string) (*FileWriter, error)
Append opens an existing file in HDFS and returns an io.WriteCloser for writing to it. Because of the way that HDFS writes are buffered and acknowledged asynchronously, it is very important that Close is called after all data has been written.
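As a minimal sketch of the append-then-close pattern described above (the namenode address, the file path, and the github.com/colinmarc/hdfs import path are all assumptions here, not taken from this page):

```go
package main

import (
	"log"

	"github.com/colinmarc/hdfs"
)

func main() {
	// Connect to the namenode; the address is illustrative.
	client, err := hdfs.New("namenode:8020")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Open an existing file for appending.
	file, err := client.Append("/tmp/app.log")
	if err != nil {
		log.Fatal(err)
	}

	if _, err := file.Write([]byte("another line\n")); err != nil {
		log.Fatal(err)
	}

	// Close must be called: writes are buffered and acknowledged
	// asynchronously, and only Close guarantees they were accepted.
	if err := file.Close(); err != nil {
		log.Fatal(err)
	}
}
```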
func (*Client) Chown ¶
Chown changes the user and group of the file. Unlike os.Chown, this takes a string username and group (since that's what HDFS uses).
If an empty string is passed for user or group, that field will not be changed remotely.
func (*Client) CopyToLocal ¶
CopyToLocal copies the HDFS file specified by src to the local file at dst. If dst already exists, it will be overwritten.
func (*Client) CopyToRemote ¶
CopyToRemote copies the local file specified by src to the HDFS file at dst.
func (*Client) Create ¶
func (c *Client) Create(name string) (*FileWriter, error)
Create opens a new file in HDFS with the default replication, block size, and permissions (0644), and returns an io.WriteCloser for writing to it. Because of the way that HDFS writes are buffered and acknowledged asynchronously, it is very important that Close is called after all data has been written.
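A short sketch of creating and writing a file, again emphasizing the Close call (the addresses and paths are hypothetical, and the import path assumes the github.com/colinmarc/hdfs package):

```go
package main

import (
	"log"

	"github.com/colinmarc/hdfs"
)

func main() {
	client, err := hdfs.New("namenode:8020")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Create uses the default replication, block size, and 0644 permissions.
	w, err := client.Create("/tmp/hello.txt")
	if err != nil {
		log.Fatal(err)
	}

	if _, err := w.Write([]byte("hello from hdfs\n")); err != nil {
		log.Fatal(err)
	}

	// Data may still be sitting in the client-side buffer until Close
	// flushes it and waits for datanode acknowledgements.
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}
}
```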
func (*Client) CreateEmptyFile ¶
CreateEmptyFile creates an empty file at the given name, with permissions 0644.
func (*Client) CreateFile ¶
func (c *Client) CreateFile(name string, replication int, blockSize int64, perm os.FileMode) (*FileWriter, error)
CreateFile opens a new file in HDFS with the given replication, block size, and permissions, and returns an io.WriteCloser for writing to it. Because of the way that HDFS writes are buffered and acknowledged asynchronously, it is very important that Close is called after all data has been written.
func (*Client) CreateSnapshot ¶
CreateSnapshot creates a snapshot of a given directory and name, and returns the path containing the snapshot. Snapshot names must be unique.
This requires superuser privileges.
func (*Client) DeleteSnapshot ¶
DeleteSnapshot deletes a snapshot with a given directory and name.
This requires superuser privileges.
func (*Client) DisallowSnapshots ¶
DisallowSnapshots marks a directory as unavailable for snapshots.
This requires superuser privileges.
func (*Client) GetContentSummary ¶
func (c *Client) GetContentSummary(name string) (*ContentSummary, error)
GetContentSummary returns a ContentSummary representing the named file or directory. The summary contains information about the entire tree rooted in the named file; for instance, it can return the total size of all the files contained within.
func (*Client) GetXAttrs ¶
GetXAttrs returns the extended attributes for the given path and list of keys. The keys should be prefixed by namespace, e.g. user.foo or trusted.bar.
func (*Client) ListXAttrs ¶
ListXAttrs returns a list of all extended attributes for the given path. The returned keys will be in the form namespace.key (for example, user.foo).
func (*Client) MkdirAll ¶
MkdirAll creates a directory for dirname, along with any necessary parents, and returns nil, or else returns an error. The permission bits perm are used for all directories that MkdirAll creates. If dirname is already a directory, MkdirAll does nothing and returns nil.
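Since MkdirAll mirrors os.MkdirAll, its use is correspondingly simple; a sketch (path and address are illustrative, and the import path assumes github.com/colinmarc/hdfs):

```go
package main

import (
	"log"

	"github.com/colinmarc/hdfs"
)

func main() {
	client, err := hdfs.New("namenode:8020")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Creates /data, /data/2024, and /data/2024/01 as needed; a no-op
	// (returning nil) if the full path already exists as a directory.
	if err := client.MkdirAll("/data/2024/01", 0755); err != nil {
		log.Fatal(err)
	}
}
```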
func (*Client) Name ¶
Name returns the unique name that the Client uses in communication with namenodes and datanodes.
func (*Client) Open ¶
func (c *Client) Open(name string) (*FileReader, error)
Open returns a FileReader which can be used for reading.
func (*Client) ReadDir ¶
ReadDir reads the directory named by dirname and returns a list of sorted directory entries.
The os.FileInfo values returned will not have block locations attached to the struct returned by Sys().
func (*Client) RemoveAll ¶
RemoveAll removes path and any children it contains. It removes everything it can but returns the first error it encounters. If the path does not exist, RemoveAll returns nil (no error).
func (*Client) RemoveXAttr ¶
RemoveXAttr unsets an extended attribute for the given path and key. It returns an error if the attribute doesn't already exist.
func (*Client) SetXAttr ¶
SetXAttr sets an extended attribute for the given path and key. If the attribute doesn't exist, it will be created.
func (*Client) User ¶
User returns the user that the Client is acting under. This is either the current system user or the kerberos principal.
func (*Client) Walk ¶
Walk walks the file tree rooted at root, calling walkFn for each file or directory in the tree, including root. All errors that arise visiting files and directories are filtered by walkFn. The files are walked in lexical order, which makes the output deterministic but means that for very large directories Walk can be inefficient. Walk does not follow symbolic links.
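Because Walk takes a standard filepath.WalkFunc, the usual stdlib patterns carry over directly. A sketch that prints the size of every regular file under a (hypothetical) root, assuming the github.com/colinmarc/hdfs import path:

```go
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/colinmarc/hdfs"
)

func main() {
	client, err := hdfs.New("namenode:8020")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	err = client.Walk("/data", func(path string, info os.FileInfo, err error) error {
		if err != nil {
			// Errors visiting a path are passed to the walkFn, which
			// decides whether to abort the walk.
			return err
		}
		if info.IsDir() {
			return nil
		}
		fmt.Println(path, info.Size())
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```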
type ClientOptions ¶
    type ClientOptions struct {
        // Addresses specifies the namenode(s) to connect to.
        Addresses []string
        // User specifies which HDFS user the client will act as. It is required
        // unless kerberos authentication is enabled, in which case it will be
        // determined from the provided credentials if empty.
        User string
        // UseDatanodeHostname specifies whether the client should connect to the
        // datanodes via hostname (which is useful in multi-homed setups) or IP
        // address, which may be required if DNS isn't available.
        UseDatanodeHostname bool
        // NamenodeDialFunc is used to connect to the namenodes. If nil, then
        // (&net.Dialer{}).DialContext is used.
        NamenodeDialFunc func(ctx context.Context, network, addr string) (net.Conn, error)
        // DatanodeDialFunc is used to connect to the datanodes. If nil, then
        // (&net.Dialer{}).DialContext is used.
        DatanodeDialFunc func(ctx context.Context, network, addr string) (net.Conn, error)
        // KerberosClient is used to connect to kerberized HDFS clusters. If provided,
        // the client will always mutually authenticate when connecting to the
        // namenode(s).
        KerberosClient *krb.Client
        // KerberosServicePrincipleName specifies the Service Principle Name
        // (<SERVICE>/<FQDN>) for the namenode(s). Like in the
        // dfs.namenode.kerberos.principal property of core-site.xml, the special
        // string '_HOST' can be substituted for the address of the namenode in a
        // multi-namenode setup (for example: 'nn/_HOST'). It is required if
        // KerberosClient is provided.
        KerberosServicePrincipleName string
        // DataTransferProtection specifies whether or not authentication, data
        // signature integrity checks, and wire encryption are required when
        // communicating with the datanodes. A value of "authentication" implies
        // just authentication, a value of "integrity" implies both authentication
        // and integrity checks, and a value of "privacy" implies all three. The
        // Client may negotiate a higher level of protection if it is requested
        // by the datanode; for example, if the datanode and namenode hdfs-site.xml
        // has dfs.encrypt.data.transfer enabled, this setting is ignored and
        // a level of "privacy" is used.
        DataTransferProtection string
    }
ClientOptions represents the configurable options for a client. The NamenodeDialFunc and DatanodeDialFunc options can be used to set connection timeouts:
    dialFunc := (&net.Dialer{
        Timeout:   30 * time.Second,
        KeepAlive: 30 * time.Second,
        DualStack: true,
    }).DialContext

    options := ClientOptions{
        Addresses:        []string{"nn1:9000"},
        NamenodeDialFunc: dialFunc,
        DatanodeDialFunc: dialFunc,
    }
func ClientOptionsFromConf ¶
func ClientOptionsFromConf(conf hadoopconf.HadoopConf) ClientOptions
ClientOptionsFromConf attempts to load any relevant configuration options from the given Hadoop configuration and create a ClientOptions struct suitable for creating a Client. Currently this sets the following fields on the resulting ClientOptions:
    // Determined by fs.defaultFS (or the deprecated fs.default.name), or
    // fields beginning with dfs.namenode.rpc-address.
    Addresses []string

    // Determined by dfs.client.use.datanode.hostname.
    UseDatanodeHostname bool

    // Set to a non-nil but empty client (without credentials) if the value of
    // hadoop.security.authentication is 'kerberos'. It must then be replaced
    // with a credentialed Kerberos client.
    KerberosClient *krb.Client

    // Determined by dfs.namenode.kerberos.principal, with the realm
    // (everything after the first '@') chopped off.
    KerberosServicePrincipleName string

    // Determined by dfs.data.transfer.protection or dfs.encrypt.data.transfer
    // (in the latter case, it is set to 'privacy').
    DataTransferProtection string
Because of the way Kerberos can be forced by the Hadoop configuration but not actually configured, you should check for whether KerberosClient is set in the resulting ClientOptions before proceeding:
    options := ClientOptionsFromConf(conf)
    if options.KerberosClient != nil {
        // Replace with a valid credentialed client.
        options.KerberosClient = getKerberosClient()
    }
type ContentSummary ¶
type ContentSummary struct {
// contains filtered or unexported fields
}
ContentSummary represents a set of information about a file or directory in HDFS. It's provided directly by the namenode, and has no unix filesystem analogue.
func (*ContentSummary) DirectoryCount ¶
func (cs *ContentSummary) DirectoryCount() int
DirectoryCount returns the number of directories under the named one, including any subdirectories, and including the root directory itself. If the named path is a file, this returns 0.
func (*ContentSummary) FileCount ¶
func (cs *ContentSummary) FileCount() int
FileCount returns the number of files under the named path, including any subdirectories. If the named path is a file, FileCount returns 1.
func (*ContentSummary) NameQuota ¶
func (cs *ContentSummary) NameQuota() int
NameQuota returns the HDFS configured "name quota" for the named path. The name quota is a hard limit on the number of directories and files inside a directory; see http://goo.gl/sOSJmJ for more information.
func (*ContentSummary) Size ¶
func (cs *ContentSummary) Size() int64
Size returns the total size of the named path, including any subdirectories.
func (*ContentSummary) SizeAfterReplication ¶
func (cs *ContentSummary) SizeAfterReplication() int64
SizeAfterReplication returns the total size of the named path, including any subdirectories. Unlike Size, it counts the total replicated size of each file, and represents the total on-disk footprint for a tree in HDFS.
func (*ContentSummary) SpaceQuota ¶
func (cs *ContentSummary) SpaceQuota() int64
SpaceQuota returns the HDFS configured "space quota" for the named path. The space quota is a hard limit on the number of bytes used by all files inside a directory; see http://goo.gl/sOSJmJ for more information.
type Error ¶
    type Error interface {
        // Method returns the RPC method that encountered an error.
        Method() string
        // Desc returns the long form of the error code (for example ERROR_CHECKSUM).
        Desc() string
        // Exception returns the java exception class name (for example
        // java.io.FileNotFoundException).
        Exception() string
        // Message returns the full error message, complete with java exception
        // traceback.
        Message() string
    }
Error represents a remote java exception from an HDFS namenode or datanode.
type FileInfo ¶
type FileInfo struct {
// contains filtered or unexported fields
}
FileInfo implements os.FileInfo, and provides information about a file or directory in HDFS.
func (*FileInfo) AccessTime ¶
AccessTime returns the last time the file was accessed. It's not part of the os.FileInfo interface.
func (*FileInfo) Owner ¶
Owner returns the name of the user that owns the file or directory. It's not part of the os.FileInfo interface.
func (*FileInfo) OwnerGroup ¶
OwnerGroup returns the name of the group that owns the file or directory. It's not part of the os.FileInfo interface.
type FileReader ¶
type FileReader struct {
// contains filtered or unexported fields
}
A FileReader represents an existing file or directory in HDFS. It implements io.Reader, io.ReaderAt, io.Seeker, and io.Closer, and can only be used for reads. For writes, see FileWriter and Client.Create.
func (*FileReader) Checksum ¶
func (f *FileReader) Checksum() ([]byte, error)
Checksum returns HDFS's internal "MD5MD5CRC32C" checksum for a given file.
Internally to HDFS, it works by calculating the MD5 of all the CRCs (which are stored alongside the data) for each block, and then calculating the MD5 of all of those.
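The composite checksum is opaque, but it can be compared across files or printed for auditing. A sketch (paths and addresses are illustrative; the import path assumes github.com/colinmarc/hdfs):

```go
package main

import (
	"fmt"
	"log"

	"github.com/colinmarc/hdfs"
)

func main() {
	client, err := hdfs.New("namenode:8020")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	file, err := client.Open("/mobydick.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	// The returned bytes are the MD5-of-MD5s-of-block-CRCs digest;
	// hex-encode it for display or comparison.
	sum, err := file.Checksum()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%x\n", sum)
}
```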
func (*FileReader) Read ¶
func (f *FileReader) Read(b []byte) (int, error)
Read implements io.Reader.
func (*FileReader) ReadAt ¶
func (f *FileReader) ReadAt(b []byte, off int64) (int, error)
ReadAt implements io.ReaderAt.
func (*FileReader) Readdir ¶
func (f *FileReader) Readdir(n int) ([]os.FileInfo, error)
Readdir reads the contents of the directory associated with file and returns a slice of up to n os.FileInfo values, as would be returned by Stat, in directory order. Subsequent calls on the same file will yield further os.FileInfos.
If n > 0, Readdir returns at most n os.FileInfo values. In this case, if Readdir returns an empty slice, it will return a non-nil error explaining why. At the end of a directory, the error is io.EOF.
If n <= 0, Readdir returns all the os.FileInfo from the directory in a single slice. In this case, if Readdir succeeds (reads all the way to the end of the directory), it returns the slice and a nil error. If it encounters an error before the end of the directory, Readdir returns the os.FileInfo read until that point and a non-nil error.
The os.FileInfo values returned will not have block locations attached to the struct returned by Sys(). To fetch that information, make a separate call to Stat.
Note that making multiple calls to Readdir with a smallish n (as you might do with the os version) is slower than just requesting everything at once. That's because HDFS has no mechanism for limiting the number of entries returned; whatever extra entries it returns are simply thrown away.
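Given the note above, the idiomatic pattern is a single Readdir(0) call rather than a paging loop. A sketch (directory name and address are hypothetical; import path assumes github.com/colinmarc/hdfs):

```go
package main

import (
	"fmt"
	"log"

	"github.com/colinmarc/hdfs"
)

func main() {
	client, err := hdfs.New("namenode:8020")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Open works on directories as well as files.
	dir, err := client.Open("/data")
	if err != nil {
		log.Fatal(err)
	}
	defer dir.Close()

	// n <= 0 requests the entire directory in one call, which is
	// cheaper against HDFS than repeated small reads.
	infos, err := dir.Readdir(0)
	if err != nil {
		log.Fatal(err)
	}
	for _, fi := range infos {
		fmt.Println(fi.Name())
	}
}
```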
func (*FileReader) Readdirnames ¶
func (f *FileReader) Readdirnames(n int) ([]string, error)
Readdirnames reads and returns a slice of names from the directory f.
If n > 0, Readdirnames returns at most n names. In this case, if Readdirnames returns an empty slice, it will return a non-nil error explaining why. At the end of a directory, the error is io.EOF.
If n <= 0, Readdirnames returns all the names from the directory in a single slice. In this case, if Readdirnames succeeds (reads all the way to the end of the directory), it returns the slice and a nil error. If it encounters an error before the end of the directory, Readdirnames returns the names read until that point and a non-nil error.
func (*FileReader) Seek ¶
func (f *FileReader) Seek(offset int64, whence int) (int64, error)
Seek implements io.Seeker.
The seek is virtual - it starts a new block read at the new position.
func (*FileReader) SetDeadline ¶
func (f *FileReader) SetDeadline(t time.Time) error
SetDeadline sets the deadline for future Read, ReadAt, and Checksum calls. A zero value for t means those calls will not time out.
type FileStatus ¶
type FileStatus = hdfs.HdfsFileStatusProto
type FileWriter ¶
type FileWriter struct {
// contains filtered or unexported fields
}
A FileWriter represents a writer for an open file in HDFS. It implements Writer and Closer, and can only be used for writes. For reads, see FileReader and Client.Open.
func (*FileWriter) Close ¶
func (f *FileWriter) Close() error
Close closes the file, writing any remaining data out to disk and waiting for acknowledgements from the datanodes. It is important that Close is called after all data has been written.
func (*FileWriter) Flush ¶
func (f *FileWriter) Flush() error
Flush flushes any buffered data out to the datanodes. Even immediately after a call to Flush, it is still necessary to call Close once all data has been written.
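A sketch of the Flush/Close relationship, e.g. for a writer that wants records durable between writes (all names here are illustrative, and the import path assumes github.com/colinmarc/hdfs):

```go
package main

import (
	"log"

	"github.com/colinmarc/hdfs"
)

func main() {
	client, err := hdfs.New("namenode:8020")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	w, err := client.Create("/tmp/stream.log")
	if err != nil {
		log.Fatal(err)
	}

	records := [][]byte{[]byte("first\n"), []byte("second\n")}
	for _, rec := range records {
		if _, err := w.Write(rec); err != nil {
			log.Fatal(err)
		}
		// Push buffered data out to the datanodes after each record.
		if err := w.Flush(); err != nil {
			log.Fatal(err)
		}
	}

	// Flush is not a substitute for Close: Close is still required
	// once all data has been written.
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}
}
```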
func (*FileWriter) SetDeadline ¶
func (f *FileWriter) SetDeadline(t time.Time) error
SetDeadline sets the deadline for future Write, Flush, and Close calls. A zero value for t means those calls will not time out.
Note that because of buffering, Write calls that do not result in a blocking network call may still succeed after the deadline.
func (*FileWriter) Write ¶
func (f *FileWriter) Write(b []byte) (int, error)
Write implements io.Writer for writing to a file in HDFS. Internally, it writes data to an internal buffer first, and then later out to HDFS. Because of this, it is important that Close is called after all data has been written.
Source Files ¶
Directories ¶
Path | Synopsis
---|---
cmd |
hadoopconf | Package hadoopconf provides utilities for reading and parsing Hadoop's xml configuration files.
internal |
protocol/hadoop_common | Package hadoop_common is a generated protocol buffer package.
protocol/hadoop_hdfs | Package hadoop_hdfs is a generated protocol buffer package.
rpc | Package rpc implements some of the lower-level functionality required to communicate with the namenode and datanodes.
transfer | Package transfer implements wire transfer with the datanodes.