iterator

package
v0.0.8
Warning

This package is not in the latest version of its module.

Published: Aug 12, 2022 License: Apache-2.0 Imports: 31 Imported by: 0

Documentation

Constants

const (
	// GitScheme is the standard prefix used for git repo UID's.
	GitScheme = "git://"

	// GitSchemeRaw is the standard prefix used for git repo UID's but
	// without the scheme protocol separator which is <colon-slash-slash>.
	GitSchemeRaw = "git"

	// GitProgram is the name of the git executable. It is needed until we
	// figure out how to make this pure golang.
	GitProgram = "git"
)
const (
	// HttpScheme is the standard prefix used for http URL's.
	HttpScheme = "http://"

	// HttpsScheme is the standard prefix used for https URL's.
	HttpsScheme = "https://"

	// HttpSchemeRaw is the standard prefix used for http URL's but without
	// the scheme protocol separator which is <colon-slash-slash>.
	HttpSchemeRaw = "http"

	// HttpsSchemeRaw is the standard prefix used for https URL's but
	// without the scheme protocol separator which is <colon-slash-slash>.
	HttpsSchemeRaw = "https"

	// UnknownFileName is the filename used when the URL doesn't have an
	// obvious filename at the end that we can use.
	// TODO: is there a better name we can use? This is mostly arbitrary.
	UnknownFileName = ".unknown"
)
const (
	// ZipExtension is the standard extension used for zip URI's.
	ZipExtension = ".zip"

	// JarExtension is used for java .jar files. This is included here since
	// they are just zip files that are named differently.
	JarExtension = ".jar"

	// WhlExtension is used for python .whl files. This is included here since
	// they are just zip files that are named differently.
	WhlExtension = ".whl"
)
const (
	// FileScheme is the standard prefix used for file path UID's.
	FileScheme = "file://"
)
const (
	// TarExtension is the standard extension used for tar URI's.
	TarExtension = ".tar"
)

Variables

var (
	// SkipPathExtensions is a list of file extensions to not scan. This
	// list is alphabetical and has a comment for each element.
	SkipPathExtensions = []string{
		".bmp",
		".cvsignore",
		".doc",
		".eps",
		".gif",
		".gitignore",
		".jpeg",
		".jpg",
		".ico",
		".pdf",
		".png",
		".ppt",
		".svg",
		".odp",
		".ods",
		".odt",
		".xls",
	}

	// SkipDirPaths is a list of relative dir paths to not scan. This list
	// is alphabetical and has a comment for each element.
	SkipDirPaths = []string{
		".git/",
		".github/",
		".svn/",
	}
)
var (
	// Bzip2Extensions is a list of valid extensions.
	Bzip2Extensions = []string{
		".bz",
		".bz2",

		".bzip2",
		".tbz",
		".tbz2",
	}
)
var (
	// GzipExtensions is a list of valid extensions.
	GzipExtensions = []string{
		".gz",
		".gzip",
		".tgz",
	}
)

Functions

func GitSubmoduleParentURL added in v0.0.5

func GitSubmoduleParentURL(iterator interfaces.Iterator) (string, error)

GitSubmoduleParentURL returns the URL of the parent git iterator. It only traverses through fs iterators. It stops at the first git iterator. Anything else and it's an error.
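
The traversal described above can be sketched with stand-in types. This is a minimal illustration, not the package's implementation: `node`, `fsNode`, `gitNode`, `otherNode` and `parentGitURL` are hypothetical names, and the sketch assumes the walk starts at the iterator that is passed in.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical stand-in for interfaces.Iterator. Only the parent
// lookup matters for this sketch.
type node interface {
	GetIterator() node
}

// fsNode, gitNode and otherNode are hypothetical stand-ins for an Fs
// iterator, a Git iterator, and any other kind of iterator.
type fsNode struct{ parent node }

type gitNode struct {
	parent node
	url    string
}

type otherNode struct{ parent node }

func (n *fsNode) GetIterator() node    { return n.parent }
func (n *gitNode) GetIterator() node   { return n.parent }
func (n *otherNode) GetIterator() node { return n.parent }

// parentGitURL walks up the parent chain. It passes through fs nodes, stops
// at the first git node, and treats anything else as an error.
func parentGitURL(n node) (string, error) {
	for n != nil {
		switch v := n.(type) {
		case *gitNode:
			return v.url, nil
		case *fsNode:
			n = v.parent
		default:
			return "", fmt.Errorf("unexpected iterator type %T", n)
		}
	}
	return "", errors.New("no parent git iterator found")
}

func main() {
	repo := &gitNode{url: "https://example.com/repo"}
	tree := &fsNode{parent: &fsNode{parent: repo}}
	fmt.Println(parentGitURL(tree))
}
```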

func SkipPath

func SkipPath(path safepath.Path, info fs.FileInfo) (bool, error)

SkipPath takes an input path and a file info struct, and returns whether we should skip over it or not. To skip a file, return true and no error. To skip a directory, return interfaces.SkipDir as the error. Lastly, if anything goes wrong, you can return your own error, but minimizing this chance is ideal.

The stuff that gets skipped in here *must* be common for all iterators, as this function is shared by all of them. Individual backends can have their own file skip detection as well. For example, one particular backend might not know how to scan *.go files, whereas a different one might specialize in this.

Lastly, a design decision was made to make this a "pure, stateless" function. In other words, the decision to skip a file or not is based entirely on the input arguments; more complicated skip logic, such as logic that depends on the existence of other file paths, is not possible. For example, if someone were to invent a file called `.legalignore` that worked like `.gitignore` but told software which files copyright checks shouldn't apply to, we'd be unable to detect it and skip over those files with this function, since it only has a view into individual files and doesn't get a stateful, full directory tree view.
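
A rough sketch of what a pure, stateless skip decision looks like, using only the standard library. The names `skipPath`, `skipExtensions` and `skipDirs` are hypothetical, the lists are shortened, `fs.SkipDir` stands in for interfaces.SkipDir, and matching on the base name is a simplification of whatever the real function does.

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"strings"
)

// skipExtensions and skipDirs are shortened, illustrative lists in the
// spirit of SkipPathExtensions and SkipDirPaths.
var skipExtensions = []string{".gif", ".jpg", ".pdf", ".png"}
var skipDirs = []string{".git", ".github", ".svn"}

// skipPath decides using only its arguments, so the same input always gives
// the same answer. It returns fs.SkipDir for directories that should not be
// descended into, and true for files that should not be scanned.
func skipPath(path string, isDir bool) (bool, error) {
	base := filepath.Base(path)
	if isDir {
		for _, d := range skipDirs {
			if base == d {
				return false, fs.SkipDir
			}
		}
		return false, nil
	}
	ext := strings.ToLower(filepath.Ext(base))
	for _, e := range skipExtensions {
		if ext == e {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	for _, p := range []string{"docs/logo.PNG", "main.go"} {
		skip, err := skipPath(p, false)
		fmt.Println(p, skip, err)
	}
	_, err := skipPath("repo/.git", true)
	fmt.Println(err == fs.SkipDir)
}
```

Because the function sees one path at a time, it cannot honour a rule that lives in some other file of the tree, which is the limitation described above.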

func WhichSuffixInsensitive added in v0.0.4

func WhichSuffixInsensitive(s string, suffixList []string) string

WhichSuffixInsensitive returns the first suffix with the longest match that is found in the input string from the list provided. If none are found, then the empty string is returned. The comparisons are done in lower case, but the returned suffix is in the original case from the input list.
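
The described behaviour can be sketched as follows. This is an illustration written from the description above, not the package's source; `whichSuffixInsensitive` is a local, hypothetical re-implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// whichSuffixInsensitive returns the longest entry of suffixList that is a
// case-insensitive suffix of s, in the case used by the list. Ties go to the
// earliest entry. It returns "" when nothing matches.
func whichSuffixInsensitive(s string, suffixList []string) string {
	lower := strings.ToLower(s)
	best := ""
	for _, suffix := range suffixList {
		if !strings.HasSuffix(lower, strings.ToLower(suffix)) {
			continue
		}
		// A strict comparison keeps the earliest entry when lengths tie.
		if len(suffix) > len(best) {
			best = suffix
		}
	}
	return best
}

func main() {
	list := []string{".gz", ".tar.gz", ".TGZ"}
	fmt.Println(whichSuffixInsensitive("archive.TAR.GZ", list))     // .tar.gz
	fmt.Println(whichSuffixInsensitive("archive.tgz", list))        // .TGZ
	fmt.Printf("%q\n", whichSuffixInsensitive("archive.zip", list)) // ""
}
```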

Types

type Bzip2 added in v0.0.4

type Bzip2 struct {
	Debug  bool
	Logf   func(format string, v ...interface{})
	Prefix safepath.AbsDir

	// Parser is a pointer to the parser that returned this. If it wasn't
	// returned by a parser, leave this nil. If this iterator came from an
	// iterator, then the Iterator handle should be filled instead.
	Parser interfaces.Parser

	// Iterator is a pointer to the iterator that returned this. If it
	// wasn't returned by an iterator, leave this nil. If this iterator came
	// from a parser, then the Parser handle should be filled instead.
	Iterator interfaces.Iterator

	// Path is the location of the file to bunzip2.
	Path safepath.AbsFile

	// AllowAnyExtension specifies whether we will attempt to run if the
	// Path does not end with the correct bzip2 extension.
	AllowAnyExtension bool

	// AllowedExtensions specifies a list of extensions that we are allowed
	// to try to decode from. If this is empty, then we allow only the
	// defaults above because allowing no extensions at all would make no
	// sense. If AllowAnyExtension is set, then this has no effect. All the
	// matches are case insensitive.
	AllowedExtensions []string
	// contains filtered or unexported fields
}

Bzip2 is an iterator that takes a .bz or similar URI to open and performs the decompress operation. It will eventually return an Fs iterator since there's no need for it to know how to walk through a filesystem tree itself and it's going to return a single file here. It can use a local cache so that future calls to the same URI won't have to waste cycles, but only in cases when we can determine it will be the same file.
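
The extension rules documented on the AllowAnyExtension and AllowedExtensions fields can be sketched like this. It is a minimal illustration under the assumption that the check is a case-insensitive suffix match; `extensionAllowed` and `defaultExtensions` are hypothetical names, with the defaults copied from Bzip2Extensions.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultExtensions mirrors the Bzip2Extensions list documented above.
var defaultExtensions = []string{".bz", ".bz2", ".bzip2", ".tbz", ".tbz2"}

// extensionAllowed reports whether path may be decoded. allowAny overrides
// everything, and an empty allowed list falls back to the defaults.
func extensionAllowed(path string, allowAny bool, allowed []string) bool {
	if allowAny {
		return true
	}
	if len(allowed) == 0 {
		allowed = defaultExtensions
	}
	lower := strings.ToLower(path)
	for _, ext := range allowed {
		if strings.HasSuffix(lower, strings.ToLower(ext)) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(extensionAllowed("/tmp/data.BZ2", false, nil))
	fmt.Println(extensionAllowed("/tmp/data.txt", false, nil))
	fmt.Println(extensionAllowed("/tmp/data.txt", true, nil))
}
```

The same shape of check would apply to the Gzip, Tar and Zip iterators, each with its own default list.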

func (*Bzip2) Close added in v0.0.4

func (obj *Bzip2) Close() error

Close shuts down the iterator and/or performs clean up after the Recurse method has run. This must be called if you run Recurse.

func (*Bzip2) GetIterator added in v0.0.4

func (obj *Bzip2) GetIterator() interfaces.Iterator

GetIterator returns a handle to the parent iterator that built this iterator if there is one.

func (*Bzip2) GetParser added in v0.0.4

func (obj *Bzip2) GetParser() interfaces.Parser

GetParser returns a handle to the parent parser that built this iterator if there is one.

func (*Bzip2) Recurse added in v0.0.4

func (obj *Bzip2) Recurse(ctx context.Context, scan interfaces.ScanFunc) ([]interfaces.Iterator, error)

Recurse runs a simple iterator that is responsible for uncompressing a bzip2 URI into a local filesystem path. If this happens successfully, it will return a new FsIterator that is initialized to this root path.

func (*Bzip2) String added in v0.0.4

func (obj *Bzip2) String() string

String returns a human-readable representation of the bzip2 path we're looking at. The output of this format is not guaranteed to be constant, so don't try to parse it.

func (*Bzip2) Validate added in v0.0.4

func (obj *Bzip2) Validate() error

Validate runs some checks to ensure this iterator was built correctly.

type Fs

type Fs struct {
	Debug  bool
	Logf   func(format string, v ...interface{})
	Prefix safepath.AbsDir

	// Parser is a pointer to the parser that returned this. If it wasn't
	// returned by a parser, leave this nil. If this iterator came from an
	// iterator, then the Iterator handle should be filled instead.
	Parser interfaces.Parser

	// Iterator is a pointer to the iterator that returned this. If it
	// wasn't returned by an iterator, leave this nil. If this iterator came
	// from a parser, then the Parser handle should be filled instead.
	Iterator interfaces.Iterator

	// Path is the location of the fs to walk.
	Path safepath.Path

	// GenUID takes the safe path that would have been used to build the UID
	// and returns an improved UID that is more pleasantly human readable.
	// Specifying this function is optional, but if it is used, it's not
	// recommended to error unless there's a programming mistake, and you
	// must be confident that your results will be properly unique.
	GenUID func(safepath.Path) (string, error)

	// Unlock is a function that should be called as part of the Close
	// method once this resource is finished. It can be defined when
	// building this iterator in case we want a mechanism for the caller of
	// this iterator to tell the child when to unlock any in-use resources.
	// It must be safe to call this function more than once if necessary.
	// This is currently unused.
	Unlock func()
}

Fs is an iterator that scans your local filesystem at the specified path. Recursive scanners, while running a scan function, can also return more iterators. In this pattern, this iterator may be used to run on a cloned git directory after the git iterator pulled the files down onto the filesystem. If we encounter a git submodule (by finding a .gitmodules file) we will parse it and return a git iterator for each of the contained repositories.

TODO: This iterator could learn how to identify go.mod files, python, java, etc, and learn how to iterate into those projects by returning new iterators.

TODO: This iterator could grow a Copy option to copy the files into a new directory before iterating over them.

func (*Fs) Close

func (obj *Fs) Close() error

Close shuts down the iterator and/or performs clean up after the Recurse method has run. This must be called if you run Recurse.
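
One way a caller might honour the "Recurse may return more iterators" and "Close must be called" contracts is a simple work queue. Everything in this sketch is hypothetical: the `iterator` interface and `scanFunc` type are simplified local stand-ins for the package's interfaces, and closing in reverse order at the end is an assumption made so that a parent's resources outlive its children.

```go
package main

import (
	"context"
	"fmt"
)

// scanFunc and iterator are simplified, hypothetical stand-ins for
// interfaces.ScanFunc and interfaces.Iterator.
type scanFunc func(ctx context.Context, path string) error

type iterator interface {
	Recurse(ctx context.Context, scan scanFunc) ([]iterator, error)
	Close() error
}

// fakeIterator "scans" one path and then hands back its children.
type fakeIterator struct {
	path     string
	children []iterator
	closed   *int
}

func (f *fakeIterator) Recurse(ctx context.Context, scan scanFunc) ([]iterator, error) {
	if err := scan(ctx, f.path); err != nil {
		return nil, err
	}
	return f.children, nil
}

func (f *fakeIterator) Close() error {
	*f.closed++
	return nil
}

// drain recurses every iterator reachable from root, queueing whatever each
// Recurse call returns. Every iterator that was recursed is closed on exit,
// newest first.
func drain(ctx context.Context, root iterator, scan scanFunc) (err error) {
	var recursed []iterator
	defer func() {
		for i := len(recursed) - 1; i >= 0; i-- {
			if cerr := recursed[i].Close(); cerr != nil && err == nil {
				err = cerr
			}
		}
	}()
	queue := []iterator{root}
	for len(queue) > 0 {
		it := queue[0]
		queue = queue[1:]
		recursed = append(recursed, it)
		next, rerr := it.Recurse(ctx, scan)
		if rerr != nil {
			return rerr
		}
		queue = append(queue, next...)
	}
	return nil
}

// demo wires a fake "git" iterator that returns a fake "fs" iterator.
func demo() (scanned []string, closed int, err error) {
	leaf := &fakeIterator{path: "fs: /cache/repo", closed: &closed}
	root := &fakeIterator{path: "git: example.com/repo", children: []iterator{leaf}, closed: &closed}
	err = drain(context.Background(), root, func(ctx context.Context, path string) error {
		scanned = append(scanned, path)
		return nil
	})
	return scanned, closed, err
}

func main() {
	fmt.Println(demo())
}
```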

func (*Fs) GetIterator

func (obj *Fs) GetIterator() interfaces.Iterator

GetIterator returns a handle to the parent iterator that built this iterator if there is one.

func (*Fs) GetParser

func (obj *Fs) GetParser() interfaces.Parser

GetParser returns a handle to the parent parser that built this iterator if there is one.

func (*Fs) GitSubmodulesHelper

func (obj *Fs) GitSubmodulesHelper(ctx context.Context, p safepath.Path) ([]interfaces.Iterator, error)

GitSubmodulesHelper is a helper that checks for a .gitmodules file and produces the iterators that come from it.
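
To make the idea concrete, here is a deliberately small sketch that pulls the `url` values out of the text of a .gitmodules file. `submoduleURLs` is a hypothetical helper, not part of this package, and it is not a full git-config parser; how the real helper parses the file is not stated here.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// submoduleURLs returns the value of every "url = ..." line found in the
// contents of a .gitmodules file, in file order.
func submoduleURLs(contents string) ([]string, error) {
	var urls []string
	scanner := bufio.NewScanner(strings.NewReader(contents))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		key, value, found := strings.Cut(line, "=")
		if !found || strings.TrimSpace(key) != "url" {
			continue
		}
		urls = append(urls, strings.TrimSpace(value))
	}
	return urls, scanner.Err()
}

func main() {
	contents := "[submodule \"lib\"]\n\tpath = lib\n\turl = https://example.com/lib.git\n"
	fmt.Println(submoduleURLs(contents))
}
```

Each URL found this way would become the input for one new git iterator.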

func (*Fs) Recurse

func (obj *Fs) Recurse(ctx context.Context, scan interfaces.ScanFunc) ([]interfaces.Iterator, error)

Recurse runs a simple recursive iterator that walks through a local filesystem path. It applies a scan function to everything that it encounters. While iterating, it may also discover certain files that it can use to produce new iterators.
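
The walk-and-scan pattern can be sketched with the standard library alone. This is an assumption-laden illustration: `walkAndScan`, `scannedPaths`, `demoFS` and `skipDirs` are hypothetical names, the scan callback takes a plain string rather than the package's own path and ScanFunc types, and an in-memory filesystem replaces the local disk so the example is self-contained.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// skipDirs is an illustrative set of directory names that are never entered.
var skipDirs = map[string]bool{".git": true, ".github": true, ".svn": true}

// walkAndScan walks fsys from its root and calls scan for every file that is
// not inside a skipped directory.
func walkAndScan(fsys fs.FS, scan func(path string) error) error {
	return fs.WalkDir(fsys, ".", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if skipDirs[d.Name()] {
				return fs.SkipDir
			}
			return nil
		}
		return scan(path)
	})
}

// demoFS builds a small in-memory tree for the example.
func demoFS() fs.FS {
	return fstest.MapFS{
		"main.go":     {Data: []byte("package main\n")},
		".git/config": {Data: []byte("[core]\n")},
		"pkg/util.go": {Data: []byte("package pkg\n")},
	}
}

// scannedPaths collects the paths that the scan function was called with.
func scannedPaths(fsys fs.FS) ([]string, error) {
	var paths []string
	err := walkAndScan(fsys, func(path string) error {
		paths = append(paths, path)
		return nil
	})
	return paths, err
}

func main() {
	fmt.Println(scannedPaths(demoFS())) // [main.go pkg/util.go] <nil>
}
```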

func (*Fs) String

func (obj *Fs) String() string

String returns a human-readable representation of the fs path we're looking at. The output of this format is not guaranteed to be constant, so don't try to parse it.

func (*Fs) Validate

func (obj *Fs) Validate() error

Validate runs some checks to ensure this iterator was built correctly.

type Git

type Git struct {
	Debug  bool
	Logf   func(format string, v ...interface{})
	Prefix safepath.AbsDir

	// Parser is a pointer to the parser that returned this. If it wasn't
	// returned by a parser, leave this nil. If this iterator came from an
	// iterator, then the Iterator handle should be filled instead.
	Parser interfaces.Parser

	// Iterator is a pointer to the iterator that returned this. If it
	// wasn't returned by an iterator, leave this nil. If this iterator came
	// from a parser, then the Parser handle should be filled instead.
	Iterator interfaces.Iterator

	// URL is the git URL of the repository that we want to clone from.
	// TODO: consider doing some clever parsing of well-known paths like
	// github-style URL's or internal company code repository URL's.
	// TODO: this could be implemented with a layered iterator that's github
	// specific, and after parsing, it returns this raw git iterator.
	URL string

	// TrimGitSuffix specifies whether we should try to trim a .git suffix
	// from any URL that we get. Usually they can be cloned both ways, but
	// modern repositories omit the need for this.
	TrimGitSuffix bool

	// Hash is the specific commit hash to use to specify what to scan. You
	// can identify things this way, with Ref, or with Rev, but only one of
	// the three may be set.
	Hash string // len 40 chars

	// Ref is a specific revision to use to specify what you want to scan.
	// This can be a branch, note, remote, or tag ref. These take the form
	// of a name with one of the prefixes refs/heads/, refs/notes/,
	// refs/remotes/ or refs/tags/. If you specify this, you must not
	// specify Hash or Rev. If you want the possibly ambiguous, but "easy"
	// way of specifying something, then use Rev.
	Ref string

	// Rev is the method most CLI tools use to identify a specific hash. You
	// pass in a sensible string of your choice, and git will attempt to
	// find what you mean. This isn't recommended if you want to be precise
	// because you can have weirdly named branches that can trick you.
	Rev string
	// contains filtered or unexported fields
}

Git is an iterator that takes a git URL to clone and performs this download operation. It will eventually return an Fs iterator since there's no need for it to know how to walk through a filesystem tree itself. It can use a local cache so that future calls to the same repository won't have to waste bandwidth or cycles again. We don't recurse into git submodules, but rather the Fs iterators know how to find them and generate git iterators for them. This keeps things flatter and allows us to work more quickly in parallel.

NOTE: I wanted to name this "giterator", but that wouldn't be consistent.

TODO: If someone wanted to scan *every* commit, or a range of commits, rather than the latest HEAD, then we could have options to return multiple iterators to support that.

TODO: concurrent use of this lib: https://github.com/go-git/go-git/issues/285

NOTE: We currently keep all the repo data in one place, and have locking for reads and checkouts on the unique ID of the git directory.

func (*Git) Close

func (obj *Git) Close() error

Close shuts down the iterator and/or performs clean up after the Recurse method has run. This must be called if you run Recurse.

func (*Git) GetIterator

func (obj *Git) GetIterator() interfaces.Iterator

GetIterator returns a handle to the parent iterator that built this iterator if there is one.

func (*Git) GetParser

func (obj *Git) GetParser() interfaces.Parser

GetParser returns a handle to the parent parser that built this iterator if there is one.

func (*Git) Recurse

func (obj *Git) Recurse(ctx context.Context, scan interfaces.ScanFunc) ([]interfaces.Iterator, error)

Recurse runs a simple iterator that is responsible for cloning a git repository into a local filesystem path. If this happens successfully, it will return a new FsIterator that is initialized to this root path.

func (*Git) String

func (obj *Git) String() string

String returns a human-readable representation of the git repo we're looking at. The output of this format is not guaranteed to be constant, so don't try to parse it.

func (*Git) Validate

func (obj *Git) Validate() error

Validate runs some checks to ensure this iterator was built correctly.
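
The field comments on Hash, Ref and Rev suggest the kind of checks a validation step could make. The sketch below is a guess at those rules, not the package's Validate method: `gitTarget` and its `validate` method are hypothetical, and the 40-character and `refs/` prefix checks are assumptions taken from the field comments.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// gitTarget is a hypothetical struct holding only the three selector fields.
type gitTarget struct {
	Hash string
	Ref  string
	Rev  string
}

// validate enforces that at most one selector is set, that a hash is 40
// characters long, and that a ref is fully qualified.
func (t gitTarget) validate() error {
	set := 0
	for _, v := range []string{t.Hash, t.Ref, t.Rev} {
		if v != "" {
			set++
		}
	}
	if set > 1 {
		return errors.New("specify at most one of Hash, Ref or Rev")
	}
	if t.Hash != "" && len(t.Hash) != 40 {
		return fmt.Errorf("hash must be 40 characters, got %d", len(t.Hash))
	}
	if t.Ref != "" && !strings.HasPrefix(t.Ref, "refs/") {
		return errors.New("ref must start with refs/")
	}
	return nil
}

func main() {
	fmt.Println(gitTarget{Ref: "refs/tags/v0.0.8"}.validate())
	fmt.Println(gitTarget{Ref: "refs/heads/main", Rev: "main"}.validate())
}
```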

type Gzip added in v0.0.4

type Gzip struct {
	Debug  bool
	Logf   func(format string, v ...interface{})
	Prefix safepath.AbsDir

	// Parser is a pointer to the parser that returned this. If it wasn't
	// returned by a parser, leave this nil. If this iterator came from an
	// iterator, then the Iterator handle should be filled instead.
	Parser interfaces.Parser

	// Iterator is a pointer to the iterator that returned this. If it
	// wasn't returned by an iterator, leave this nil. If this iterator came
	// from a parser, then the Parser handle should be filled instead.
	Iterator interfaces.Iterator

	// Path is the location of the file to gunzip.
	Path safepath.AbsFile

	// AllowAnyExtension specifies whether we will attempt to run if the
	// Path does not end with the correct gzip extension.
	AllowAnyExtension bool

	// AllowedExtensions specifies a list of extensions that we are allowed
	// to try to decode from. If this is empty, then we allow only the
	// defaults above because allowing no extensions at all would make no
	// sense. If AllowAnyExtension is set, then this has no effect. All the
	// matches are case insensitive.
	AllowedExtensions []string
	// contains filtered or unexported fields
}

Gzip is an iterator that takes a .gz or similar URI to open and performs the decompress operation. It will eventually return an Fs iterator since there's no need for it to know how to walk through a filesystem tree itself and it's going to return a single file here. It can use a local cache so that future calls to the same URI won't have to waste cycles, but only in cases when we can determine it will be the same file. This does _not_ support gzip multistream, but it could be added if we find a use-case for it.
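
For readers unfamiliar with gzip multistream, the sketch below shows single-member decoding with the standard library's compress/gzip. It is an illustration only: `gunzipFirstMember` and `gzipBytes` are hypothetical helpers, and using `Multistream(false)` is one way to read just the first member, not necessarily what this iterator does.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipBytes compresses data into a single gzip member. It only exists to
// build input for the example.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzipFirstMember decompresses only the first gzip member of the input and
// ignores any members concatenated after it.
func gunzipFirstMember(compressed []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	zr.Multistream(false) // stop at the end of the first member
	return io.ReadAll(zr)
}

func main() {
	first, _ := gzipBytes([]byte("hello "))
	second, _ := gzipBytes([]byte("world"))
	out, err := gunzipFirstMember(append(first, second...))
	fmt.Printf("%q %v\n", out, err) // "hello " <nil>
}
```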

func (*Gzip) Close added in v0.0.4

func (obj *Gzip) Close() error

Close shuts down the iterator and/or performs clean up after the Recurse method has run. This must be called if you run Recurse.

func (*Gzip) GetIterator added in v0.0.4

func (obj *Gzip) GetIterator() interfaces.Iterator

GetIterator returns a handle to the parent iterator that built this iterator if there is one.

func (*Gzip) GetParser added in v0.0.4

func (obj *Gzip) GetParser() interfaces.Parser

GetParser returns a handle to the parent parser that built this iterator if there is one.

func (*Gzip) Recurse added in v0.0.4

func (obj *Gzip) Recurse(ctx context.Context, scan interfaces.ScanFunc) ([]interfaces.Iterator, error)

Recurse runs a simple iterator that is responsible for uncompressing a gzip URI into a local filesystem path. If this happens successfully, it will return a new FsIterator that is initialized to this root path.

func (*Gzip) String added in v0.0.4

func (obj *Gzip) String() string

String returns a human-readable representation of the gzip path we're looking at. The output of this format is not guaranteed to be constant, so don't try to parse it.

func (*Gzip) Validate added in v0.0.4

func (obj *Gzip) Validate() error

Validate runs some checks to ensure this iterator was built correctly.

type Http

type Http struct {
	Debug  bool
	Logf   func(format string, v ...interface{})
	Prefix safepath.AbsDir

	// Parser is a pointer to the parser that returned this. If it wasn't
	// returned by a parser, leave this nil. If this iterator came from an
	// iterator, then the Iterator handle should be filled instead.
	Parser interfaces.Parser

	// Iterator is a pointer to the iterator that returned this. If it
	// wasn't returned by an iterator, leave this nil. If this iterator came
	// from a parser, then the Parser handle should be filled instead.
	Iterator interfaces.Iterator

	// URL is the http URL of the file that we want to download.
	// TODO: consider doing some clever parsing of well-known paths like
	// github-style URL's or internal company code repository URL's.
	URL string

	// AllowHttp specifies whether we're allowed to download http
	// (unencrypted) URLs.
	AllowHttp bool
	// contains filtered or unexported fields
}

Http is an iterator that takes an http URL to download and performs the download operation. It will eventually return an Fs iterator since there's no need for it to know how to walk through a filesystem tree itself. It can use a local cache so that future calls to the same URL won't have to waste bandwidth or cycles again, but only in cases when we can determine it will be the same file. Please note that although this is named http, https is supported as well and is the most common form of this.

func (*Http) Close

func (obj *Http) Close() error

Close shuts down the iterator and/or performs clean up after the Recurse method has run. This must be called if you run Recurse.

func (*Http) GetIterator

func (obj *Http) GetIterator() interfaces.Iterator

GetIterator returns a handle to the parent iterator that built this iterator if there is one.

func (*Http) GetParser

func (obj *Http) GetParser() interfaces.Parser

GetParser returns a handle to the parent parser that built this iterator if there is one.

func (*Http) Recurse

func (obj *Http) Recurse(ctx context.Context, scan interfaces.ScanFunc) ([]interfaces.Iterator, error)

Recurse runs a simple iterator that is responsible for downloading an HTTP URL into a local filesystem path. If this happens successfully, it will return a new FsIterator that is initialized to this root path.
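
Two documented details, the AllowHttp switch and the UnknownFileName fallback, can be sketched together. This is a hypothetical helper written from those descriptions: `downloadName` and `unknownFileName` are illustrative names, and how the real iterator derives a local file name is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"path"
	"strings"
)

// unknownFileName mirrors the documented UnknownFileName constant.
const unknownFileName = ".unknown"

// downloadName checks the URL scheme and picks a local file name from the
// last path element, falling back to unknownFileName when there is none.
func downloadName(rawURL string, allowHTTP bool) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	switch strings.ToLower(u.Scheme) {
	case "https":
		// always allowed
	case "http":
		if !allowHTTP {
			return "", errors.New("unencrypted http URLs are not allowed")
		}
	default:
		return "", fmt.Errorf("unsupported scheme: %q", u.Scheme)
	}
	name := path.Base(u.Path)
	if name == "." || name == "/" {
		return unknownFileName, nil
	}
	return name, nil
}

func main() {
	fmt.Println(downloadName("https://example.com/files/archive.tar.gz", false))
	fmt.Println(downloadName("https://example.com", false))
	fmt.Println(downloadName("http://example.com/a.zip", false))
}
```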

func (*Http) String

func (obj *Http) String() string

String returns a human-readable representation of the http URL we're looking at. The output of this format is not guaranteed to be constant, so don't try to parse it.

func (*Http) Validate

func (obj *Http) Validate() error

Validate runs some checks to ensure this iterator was built correctly.

type Tar added in v0.0.4

type Tar struct {
	Debug  bool
	Logf   func(format string, v ...interface{})
	Prefix safepath.AbsDir

	// Parser is a pointer to the parser that returned this. If it wasn't
	// returned by a parser, leave this nil. If this iterator came from an
	// iterator, then the Iterator handle should be filled instead.
	Parser interfaces.Parser

	// Iterator is a pointer to the iterator that returned this. If it
	// wasn't returned by an iterator, leave this nil. If this iterator came
	// from a parser, then the Parser handle should be filled instead.
	Iterator interfaces.Iterator

	// Path is the location of the file to untar.
	Path safepath.AbsFile

	// AllowAnyExtension specifies whether we will attempt to run if the
	// Path does not end with the correct tar extension.
	AllowAnyExtension bool

	// AllowedExtensions specifies a list of extensions that we are allowed
	// to try to decode from. If this is empty, then we allow only the
	// default of tar because allowing no extensions at all would make no
	// sense. If AllowAnyExtension is set, then this has no effect. All the
	// matches are case insensitive.
	AllowedExtensions []string
	// contains filtered or unexported fields
}

Tar is an iterator that takes a .tar URI to open and performs the un-tar operation. It will eventually return an Fs iterator since there's no need for it to know how to walk through a filesystem tree itself. It can use a local cache so that future calls to the same URI won't have to waste cycles, but only in cases when we can determine it will be the same file. This currently only unpacks files and directories. Any other file type (like symlinks) will be ignored.

func (*Tar) Close added in v0.0.4

func (obj *Tar) Close() error

Close shuts down the iterator and/or performs clean up after the Recurse method has run. This must be called if you run Recurse.

func (*Tar) GetIterator added in v0.0.4

func (obj *Tar) GetIterator() interfaces.Iterator

GetIterator returns a handle to the parent iterator that built this iterator if there is one.

func (*Tar) GetParser added in v0.0.4

func (obj *Tar) GetParser() interfaces.Parser

GetParser returns a handle to the parent parser that built this iterator if there is one.

func (*Tar) Recurse added in v0.0.4

func (obj *Tar) Recurse(ctx context.Context, scan interfaces.ScanFunc) ([]interfaces.Iterator, error)

Recurse runs a simple iterator that is responsible for untar-ing a tar URI into a local filesystem path. If this happens successfully, it will return a new FsIterator that is initialized to this root path.
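
The Tar type description notes that only files and directories are unpacked. A minimal sketch of that filtering with the standard library's archive/tar follows; `unpackable` and `demoArchive` are hypothetical helpers, and the sketch only lists entry names rather than writing anything to disk.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// unpackable reads a tar stream and returns the names of the entries that
// would be unpacked: regular files and directories only.
func unpackable(r io.Reader) ([]string, error) {
	var names []string
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		switch hdr.Typeflag {
		case tar.TypeReg, tar.TypeDir:
			names = append(names, hdr.Name)
		default:
			// Symlinks, devices and other entry types are ignored.
		}
	}
}

// demoArchive builds a small in-memory archive with a directory, a regular
// file and a symlink.
func demoArchive() (*bytes.Buffer, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	body := []byte("package main\n")
	entries := []*tar.Header{
		{Name: "src/", Typeflag: tar.TypeDir, Mode: 0o755},
		{Name: "src/main.go", Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(body))},
		{Name: "src/link", Typeflag: tar.TypeSymlink, Linkname: "main.go", Mode: 0o777},
	}
	for _, hdr := range entries {
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if hdr.Typeflag == tar.TypeReg {
			if _, err := tw.Write(body); err != nil {
				return nil, err
			}
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	buf, err := demoArchive()
	if err != nil {
		panic(err)
	}
	fmt.Println(unpackable(buf)) // [src/ src/main.go] <nil>
}
```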

func (*Tar) String added in v0.0.4

func (obj *Tar) String() string

String returns a human-readable representation of the tar path we're looking at. The output of this format is not guaranteed to be constant, so don't try to parse it.

func (*Tar) Validate added in v0.0.4

func (obj *Tar) Validate() error

Validate runs some checks to ensure this iterator was built correctly.

type Zip

type Zip struct {
	Debug  bool
	Logf   func(format string, v ...interface{})
	Prefix safepath.AbsDir

	// Parser is a pointer to the parser that returned this. If it wasn't
	// returned by a parser, leave this nil. If this iterator came from an
	// iterator, then the Iterator handle should be filled instead.
	Parser interfaces.Parser

	// Iterator is a pointer to the iterator that returned this. If it
	// wasn't returned by an iterator, leave this nil. If this iterator came
	// from a parser, then the Parser handle should be filled instead.
	Iterator interfaces.Iterator

	// Path is the location of the file to unzip.
	Path safepath.AbsFile

	// AllowAnyExtension specifies whether we will attempt to run if the
	// Path does not end with the correct zip extension.
	AllowAnyExtension bool

	// AllowedExtensions specifies a list of extensions that we are allowed
	// to try to decode from. If this is empty, then we allow only the
	// default of zip because allowing no extensions at all would make no
	// sense. If AllowAnyExtension is set, then this has no effect. All the
	// matches are case insensitive.
	AllowedExtensions []string
	// contains filtered or unexported fields
}

Zip is an iterator that takes a .zip URI to open and performs the unzip operation. It will eventually return an Fs iterator since there's no need for it to know how to walk through a filesystem tree itself. It can use a local cache so that future calls to the same URI won't have to waste cycles, but only in cases when we can determine it will be the same file.

func (*Zip) Close

func (obj *Zip) Close() error

Close shuts down the iterator and/or performs clean up after the Recurse method has run. This must be called if you run Recurse.

func (*Zip) GetIterator

func (obj *Zip) GetIterator() interfaces.Iterator

GetIterator returns a handle to the parent iterator that built this iterator if there is one.

func (*Zip) GetParser

func (obj *Zip) GetParser() interfaces.Parser

GetParser returns a handle to the parent parser that built this iterator if there is one.

func (*Zip) Recurse

func (obj *Zip) Recurse(ctx context.Context, scan interfaces.ScanFunc) ([]interfaces.Iterator, error)

Recurse runs a simple iterator that is responsible for unzipping a zip URI into a local filesystem path. If this happens successfully, it will return a new FsIterator that is initialized to this root path.
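
As a rough illustration of the unzip step, the sketch below reads every file from an in-memory zip archive with the standard library's archive/zip. `readZip` and `demoZip` are hypothetical helpers; the real iterator writes to a filesystem path, which this example leaves out to stay self-contained.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// readZip returns the contents of every non-directory entry in a zip
// archive, keyed by entry name.
func readZip(data []byte) (map[string]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	files := make(map[string]string)
	for _, f := range zr.File {
		if f.FileInfo().IsDir() {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		body, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		files[f.Name] = string(body)
	}
	return files, nil
}

// demoZip builds a one-file archive, shaped like a tiny python wheel.
func demoZip() ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create("pkg/__init__.py")
	if err != nil {
		return nil, err
	}
	if _, err := w.Write([]byte("# demo\n")); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := demoZip()
	if err != nil {
		panic(err)
	}
	fmt.Println(readZip(data))
}
```

Since .jar and .whl files are zip archives under a different name, the same reader handles them unchanged.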

func (*Zip) String

func (obj *Zip) String() string

String returns a human-readable representation of the zip path we're looking at. The output of this format is not guaranteed to be constant, so don't try to parse it.

func (*Zip) Validate

func (obj *Zip) Validate() error

Validate runs some checks to ensure this iterator was built correctly.
