parallel

package
v10.27.1 Latest
Warning

This package is not in the latest version of its module.

Published: Nov 13, 2024 License: MIT Imports: 8 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var ReaddirTimeoutError = errors.New("readdir timed out getting file properties")

Functions

func Crawl

func Crawl(ctx context.Context, root Directory, worker EnumerateOneDirFunc, parallelism int) <-chan CrawlResult

Crawl crawls an abstract directory tree, using the supplied enumeration function. It may be used for whatever that function can enumerate (i.e. not necessarily a local file system, just anything tree-structured).
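The callback contract can be illustrated with an in-memory tree. This is a minimal sketch, not the package's implementation: the three type declarations are local copies of the documented ones so the example compiles without the module, and `node`, `enumerateNode`, `sampleTree` and `crawlSequential` are hypothetical names. `crawlSequential` is a single-goroutine stand-in for a crawl driver; it only shows how `enqueueDir` and `enqueueOutput` are meant to be called.

```go
package main

import (
	"fmt"
	"sort"
)

// Local stand-ins mirroring the documented declarations, so that the
// sketch compiles without importing the module.
type Directory interface{}
type DirectoryEntry interface{}
type EnumerateOneDirFunc func(dir Directory, enqueueDir func(Directory), enqueueOutput func(DirectoryEntry, error)) error

// node is a hypothetical in-memory tree; it is not part of the package.
type node struct {
	name     string
	children []*node
}

// enumerateNode lists the children of one node. Every child is emitted as
// an output entry; children that have children of their own are also
// queued for further enumeration.
func enumerateNode(dir Directory, enqueueDir func(Directory), enqueueOutput func(DirectoryEntry, error)) error {
	n, ok := dir.(*node)
	if !ok {
		return fmt.Errorf("unexpected directory type %T", dir)
	}
	for _, c := range n.children {
		enqueueOutput(c.name, nil)
		if len(c.children) > 0 {
			enqueueDir(c)
		}
	}
	return nil
}

// crawlSequential is a hypothetical, single-goroutine driver. It drains a
// queue of directories, calling worker once per directory, and returns the
// emitted entries sorted by name.
func crawlSequential(root Directory, worker EnumerateOneDirFunc) ([]string, error) {
	var names []string
	queue := []Directory{root}
	for len(queue) > 0 {
		dir := queue[0]
		queue = queue[1:]
		err := worker(dir,
			func(d Directory) { queue = append(queue, d) },
			func(e DirectoryEntry, err error) {
				if err == nil {
					names = append(names, fmt.Sprint(e))
				}
			})
		if err != nil {
			return names, err
		}
	}
	sort.Strings(names)
	return names, nil
}

// sampleTree builds root -> {a -> {c}, b}.
func sampleTree() *node {
	return &node{name: "root", children: []*node{
		{name: "a", children: []*node{{name: "c"}}},
		{name: "b"},
	}}
}

func main() {
	names, err := crawlSequential(sampleTree(), enumerateNode)
	fmt.Println(names, err)
}
```

With the real package, a function shaped like `enumerateNode` would be passed as the `worker` argument, and it would then have to be safe for concurrent calls on different directories.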

func CrawlLocalDirectory

func CrawlLocalDirectory(ctx context.Context, root string, parallelism int, reader DirReader) <-chan CrawlResult

CrawlLocalDirectory specializes parallel.Crawl to work specifically on a local directory. It does not follow symlinks. The items in the CrawlResult output channel are FileSystemEntry values. For a wrapper that makes this look more like filepath.Walk, see parallel.Walk.

func Transform

func Transform(ctx context.Context, input <-chan CrawlResult, worker TransformFunc, parallelism int) <-chan TransformResult

The transformation stops when the input channel is closed.
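The "stops when input is closed" behavior is the usual Go fan-out pattern, sketched below under stated assumptions. The type declarations are local copies of the documented ones; `result`, `upper`, `transformAll` and `runDemo` are hypothetical names, and the driver takes a plain channel of inputs rather than CrawlResult values (whose fields are unexported). Context cancellation is left out for brevity.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Local stand-ins mirroring the documented declarations.
type InputObject interface{}
type OutputObject interface{}
type TransformFunc func(input InputObject) (OutputObject, error)

// result is a hypothetical pairing of one output with its error.
type result struct {
	item OutputObject
	err  error
}

// upper is a TransformFunc that touches no shared state, so it is safe to
// call from several goroutines at once.
func upper(input InputObject) (OutputObject, error) {
	s, ok := input.(string)
	if !ok {
		return nil, fmt.Errorf("expected string, got %T", input)
	}
	return strings.ToUpper(s), nil
}

// transformAll is a hypothetical driver: parallelism goroutines read from
// input until it is closed, and the output channel is closed once all of
// them have finished.
func transformAll(input <-chan InputObject, worker TransformFunc, parallelism int) <-chan result {
	out := make(chan result)
	var wg sync.WaitGroup
	for i := 0; i < parallelism; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for in := range input { // ends when input is closed
				item, err := worker(in)
				out <- result{item, err}
			}
		}()
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// runDemo feeds two strings and one non-string, then counts successes
// and failures.
func runDemo() (succeeded, failed int) {
	in := make(chan InputObject)
	go func() {
		defer close(in)
		in <- "a"
		in <- 42
		in <- "b"
	}()
	for r := range transformAll(in, upper, 2) {
		if r.err != nil {
			failed++
		} else {
			succeeded++
		}
	}
	return succeeded, failed
}

func main() {
	succeeded, failed := runDemo()
	fmt.Println(succeeded, failed)
}
```

Output order is not deterministic with more than one worker, which is why the demo only counts results.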

func Walk

func Walk(appCtx context.Context, root string, parallelism int, parallelStat bool, walkFn filepath.WalkFunc)

Walk is similar to filepath.Walk, but note the following differences in how the WalkFunc is used:

  1. If the fileError passed to walkFunc is not nil, the filePath passed to that function will usually be "" (whereas with filepath.Walk it will usually, perhaps always, have a value).
  2. If walkFunc returns a non-nil value, enumeration always stops, no matter what the type of the error. (Unlike filepath.Walk, where returning filepath.SkipDir is handled as a special case.)
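A walk function written for these two differences might look like the sketch below. It is an illustration only: `collector`, `visit`, `hasComponent` and `demo` are hypothetical names, and the standard library's filepath.Walk is used as the driver so that the example runs without the module. The function never relies on the path when an error is reported, and it filters unwanted subtrees by path instead of returning filepath.SkipDir. The source does not say which goroutine invokes the callback, so the mutex is a precaution rather than a documented requirement.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"sync"
)

// collector gathers file paths and errors seen during a walk.
type collector struct {
	mu       sync.Mutex
	skipName string // directory name whose subtree should be ignored
	paths    []string
	errs     []error
}

// hasComponent reports whether any element of path equals name.
func hasComponent(path, name string) bool {
	for _, part := range strings.Split(filepath.ToSlash(path), "/") {
		if part == name {
			return true
		}
	}
	return false
}

// visit has the filepath.WalkFunc signature.
func (c *collector) visit(path string, info os.FileInfo, err error) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if err != nil {
		// path may be "" here, so record the error without using it.
		c.errs = append(c.errs, err)
		return nil // any non-nil return would stop the whole enumeration
	}
	if hasComponent(path, c.skipName) {
		// Filter by path instead of returning filepath.SkipDir.
		return nil
	}
	if !info.IsDir() {
		c.paths = append(c.paths, path)
	}
	return nil
}

// demo builds a small temporary tree, walks it, then simulates an error
// callback with an empty path. It returns the number of recorded files
// and recorded errors.
func demo() (int, int, error) {
	root, err := os.MkdirTemp("", "walkdemo")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(root)
	if err := os.MkdirAll(filepath.Join(root, "skipme"), 0o755); err != nil {
		return 0, 0, err
	}
	if err := os.WriteFile(filepath.Join(root, "skipme", "x.txt"), nil, 0o644); err != nil {
		return 0, 0, err
	}
	if err := os.WriteFile(filepath.Join(root, "a.txt"), nil, 0o644); err != nil {
		return 0, 0, err
	}
	c := &collector{skipName: "skipme"}
	if err := filepath.Walk(root, c.visit); err != nil {
		return 0, 0, err
	}
	if err := c.visit("", nil, errors.New("simulated readdir failure")); err != nil {
		return 0, 0, err
	}
	return len(c.paths), len(c.errs), nil
}

func main() {
	files, errs, err := demo()
	fmt.Println(files, errs, err)
}
```

Filtering by path still pays the cost of enumerating the skipped subtree; it only keeps those entries out of the results.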

Types

type CrawlResult

type CrawlResult struct {
	// contains filtered or unexported fields
}

func (CrawlResult) Item

func (r CrawlResult) Item() (interface{}, error)

type DirReader

type DirReader interface {
	Readdir(dir *os.File, n int) ([]os.FileInfo, error)
	Close()
}

func NewDirReader

func NewDirReader(totalAvailableParallelisim int, parallelStat bool) (DirReader, int)

NewDirReader makes a directory reader. If parallelStat is true, the reader uses a pool of goroutines to look up the full os.FileInfo for each directory entry name. This is useful on Linux, but not on Windows. It is needed because on Linux os.File.Readdir does the same lookups, but does them sequentially, which hurts performance. Alternatives like https://github.com/karrick/godirwalk avoid the lookup altogether, but only if you don't need any information about each entry other than whether it's a file or a directory. We definitely also need to know whether it's a symlink. And, in our current architecture, we also need the size and last-modified time (LMT) of the file.
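The idea of parallel lookups can be sketched with the standard library alone. This is not the package's implementation; `statParallel` and `demo` are hypothetical names. The sketch reads only the entry names with Readdirnames, then resolves each name to an os.FileInfo with os.Lstat from a small pool of goroutines. Lstat does not follow symlinks, so the returned mode still shows whether an entry is a symlink.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// statParallel lists dir and stats its entries using up to workers
// goroutines. It returns the first lookup error it finds, if any.
func statParallel(dir string, workers int) ([]os.FileInfo, error) {
	if workers < 1 {
		workers = 1
	}
	f, err := os.Open(dir)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	// Names only: no per-entry lookup happens at this point.
	names, err := f.Readdirnames(-1)
	if err != nil {
		return nil, err
	}

	infos := make([]os.FileInfo, len(names))
	errs := make([]error, len(names))
	idx := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range idx {
				// Each index is handled by exactly one goroutine, so the
				// slice writes do not race.
				infos[i], errs[i] = os.Lstat(filepath.Join(dir, names[i]))
			}
		}()
	}
	for i := range names {
		idx <- i
	}
	close(idx)
	wg.Wait()

	for _, e := range errs {
		if e != nil {
			return nil, e
		}
	}
	return infos, nil
}

// demo creates three files of 1, 2 and 3 bytes and returns the entry
// count and the total size reported by statParallel.
func demo() (int, int64, error) {
	root, err := os.MkdirTemp("", "statdemo")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(root)
	for i, name := range []string{"a", "b", "c"} {
		if err := os.WriteFile(filepath.Join(root, name), make([]byte, i+1), 0o644); err != nil {
			return 0, 0, err
		}
	}
	infos, err := statParallel(root, 2)
	if err != nil {
		return 0, 0, err
	}
	var total int64
	for _, fi := range infos {
		total += fi.Size()
	}
	return len(infos), total, nil
}

func main() {
	n, total, err := demo()
	fmt.Println(n, total, err)
}
```

A real reader would also need to decide how to treat entries that disappear between the listing and the lookup; this sketch simply reports the first such error.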

type Directory

type Directory interface{}

type DirectoryEntry

type DirectoryEntry interface{}

type EnumerateOneDirFunc

type EnumerateOneDirFunc func(dir Directory, enqueueDir func(Directory), enqueueOutput func(DirectoryEntry, error)) error

EnumerateOneDirFunc must be safe to be called simultaneously by multiple goroutines, each with a different dir.

type FileSystemEntry

type FileSystemEntry struct {
	// contains filtered or unexported fields
}

type InputObject

type InputObject interface{}

type OutputObject

type OutputObject interface{}

type TransformFunc

type TransformFunc func(input InputObject) (OutputObject, error)

TransformFunc must be safe to be called simultaneously by multiple goroutines.

type TransformResult

type TransformResult struct {
	// contains filtered or unexported fields
}

func (TransformResult) Item

func (r TransformResult) Item() (interface{}, error)
