walker

package
v1.22.0
Published: Sep 21, 2024 License: MIT Imports: 11 Imported by: 0

Documentation

Overview

Package walker provides Walker, Scan and Sweep.

Constants

This section is empty.

Variables

var ErrUnknownAction = errors.New("Process.BeforeChild: Unknown action")

Functions

func Scan

func Scan(Visit ScanProcess, Params Params) ([]string, error)

Scan recursively scans a directory tree, and returns all nodes matching the Visit function. Nodes returned are first sorted descending by score, then by lexicographical order. When an error occurs, Scan may continue scanning until all units have exited, and then returns nil, err.

This function is a convenience alternative to:

	scanner := Walker[struct{}]{Process: Visit, Params: Params}
	err := scanner.Walk()
	results := scanner.Results()
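The ordering described for Scan (descending by score, then lexicographical) can be sketched with the standard library alone. This is only an illustration of the ordering rule, not the package's implementation; the result type and the sortResults function are hypothetical names introduced here.

```go
package main

import (
	"fmt"
	"sort"
)

// result is a hypothetical stand-in for a matched node and its score.
type result struct {
	Path  string
	Score float64
}

// sortResults orders results by descending score and breaks ties
// by lexicographical path order.
func sortResults(results []result) {
	sort.Slice(results, func(i, j int) bool {
		if results[i].Score != results[j].Score {
			return results[i].Score > results[j].Score
		}
		return results[i].Path < results[j].Path
	})
}

func main() {
	results := []result{
		{Path: "b", Score: 1},
		{Path: "a", Score: 1},
		{Path: "c", Score: 2},
	}
	sortResults(results)
	for _, r := range results {
		fmt.Println(r.Path, r.Score)
	}
}
```

In this sketch the highest-scoring path "c" comes first, followed by "a" and "b", which share a score and are ordered by name.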

func ScanMatch

func ScanMatch(value bool) float64

ScanMatch can be used to implement a boolean scan process. When value is true, it returns 1; when value is false, it returns -1.
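A minimal sketch of how a boolean test might be turned into a score. Because the import path of this package is not shown here, scanMatch is a local stand-in that mirrors the documented behavior of ScanMatch, and scoreGitDir is a hypothetical scoring function that omits the root FS argument of a real ScanProcess.

```go
package main

import (
	"fmt"
	"strings"
)

// scanMatch is a local stand-in mirroring the documented behavior of
// ScanMatch: 1 for true, -1 for false.
func scanMatch(value bool) float64 {
	if value {
		return 1
	}
	return -1
}

// scoreGitDir is a hypothetical scoring function shaped like a ScanProcess
// (without the root FS argument). It matches paths ending in ".git" and
// always asks for the scan to continue.
func scoreGitDir(path string, depth int) (score float64, cont bool, err error) {
	return scanMatch(strings.HasSuffix(path, ".git")), true, nil
}

func main() {
	for _, path := range []string{"/src/project/.git", "/src/project/docs"} {
		score, cont, err := scoreGitDir(path, 2)
		fmt.Println(path, score, cont, err)
	}
}
```

Since a non-negative score indicates a match, the first path would count as a match and the second would not.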

func Sweep

func Sweep(Visit SweepProcess, Params Params) ([]string, error)

Sweep recursively sweeps a directory tree, and returns all nodes that are empty or contain only empty directories. When an error occurs, Sweep may continue sweeping until all units have exited, and then returns nil, err.

This function is a convenience alternative to:

	sweeper := Walker[bool]{Process: Visit, Params: Params}
	err := sweeper.Walk()
	results := sweeper.Results()
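The notion of "empty or containing only empty directories" can be illustrated with a small sequential sketch built on os.ReadDir. It is not how Sweep is implemented (Sweep works on FS values and may run concurrently); sweepEmpty and demoTree are hypothetical helpers, and anything that is not a directory, including a symbolic link, is treated as making its parent non-empty.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// sweepEmpty returns, in sorted order, every directory at or below root that
// is empty or contains only (recursively) empty directories.
func sweepEmpty(root string) ([]string, error) {
	var empty []string
	var visit func(dir string) (bool, error)
	visit = func(dir string) (bool, error) {
		entries, err := os.ReadDir(dir)
		if err != nil {
			return false, err
		}
		isEmpty := true
		for _, entry := range entries {
			if !entry.IsDir() {
				isEmpty = false // a file makes this directory non-empty
				continue
			}
			childEmpty, err := visit(filepath.Join(dir, entry.Name()))
			if err != nil {
				return false, err
			}
			if !childEmpty {
				isEmpty = false
			}
		}
		if isEmpty {
			empty = append(empty, dir)
		}
		return isEmpty, nil
	}
	if _, err := visit(root); err != nil {
		return nil, err
	}
	sort.Strings(empty)
	return empty, nil
}

// demoTree builds a temporary tree: a/b is an empty nested directory and
// c holds one file. The returned cleanup function removes the tree.
func demoTree() (string, func(), error) {
	dir, err := os.MkdirTemp("", "sweep-demo")
	if err != nil {
		return "", nil, err
	}
	cleanup := func() { os.RemoveAll(dir) }
	if err := os.MkdirAll(filepath.Join(dir, "a", "b"), 0o755); err != nil {
		cleanup()
		return "", nil, err
	}
	if err := os.MkdirAll(filepath.Join(dir, "c"), 0o755); err != nil {
		cleanup()
		return "", nil, err
	}
	if err := os.WriteFile(filepath.Join(dir, "c", "keep.txt"), []byte("x"), 0o644); err != nil {
		cleanup()
		return "", nil, err
	}
	return dir, cleanup, nil
}

func main() {
	root, cleanup, err := demoTree()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer cleanup()
	empty, err := sweepEmpty(root)
	if err != nil {
		fmt.Println("sweep failed:", err)
		return
	}
	for _, dir := range empty {
		rel, _ := filepath.Rel(root, dir)
		fmt.Println(rel)
	}
}
```

For the demo tree, the sketch reports the directories a and a/b; the root and c are not reported because c contains a file.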

Types

type FS

type FS interface {
	// Path returns the path of this FS.
	// The path should not be normalized.
	Path() string

	// ResolvedPath returns the current path of this FS.
	// This function will be called only once, and may perform (potentially slow) normalization.
	//
	// The return value is used for cycle detection, and also passed to all other functions in this interface.
	ResolvedPath() (string, error)

	// Read reads the root directory of this filesystem
	// and returns a list of directory entries sorted by filename.
	//
	// It is roughly equivalent to the ReadDir method of fs.ReadDirFS.
	// Assuming fsys is an internal fs.FS the method might be implemented as:
	//
	//  fs.ReadDir(fs.FS(fsys), ".")
	Read(path string) ([]fs.DirEntry, error)

	// CanSub indicates if the given directory entry can be used as a valid FS.
	CanSub(path string, entry fs.DirEntry) (bool, error)

	// Sub creates a new FS for the provided entry.
	// path and rpath are the Path() and ResolvedPath() values.
	// Sub is only called when CanSub returns true and a nil error.
	Sub(path, rpath string, entry fs.DirEntry) FS
}

FS represents a file system for use by walker.

See NewRealFS for instantiating a sample implementation.
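The Read method is documented as roughly equivalent to fs.ReadDir(fsys, "."). The following sketch shows what that standard-library call returns for an in-memory file system; readRoot and demoFS are hypothetical helpers, and fstest.MapFS is used only to keep the example self-contained.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// readRoot lists the root directory of fsys. fs.ReadDir returns the
// entries sorted by filename.
func readRoot(fsys fs.FS) ([]fs.DirEntry, error) {
	return fs.ReadDir(fsys, ".")
}

// demoFS returns a small in-memory file system with two directories
// and one file at the root.
func demoFS() fs.FS {
	return fstest.MapFS{
		"zeta/file.txt":  &fstest.MapFile{Data: []byte("z")},
		"alpha/file.txt": &fstest.MapFile{Data: []byte("a")},
		"notes.txt":      &fstest.MapFile{Data: []byte("n")},
	}
}

func main() {
	entries, err := readRoot(demoFS())
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	for _, entry := range entries {
		// Directories are the entries a walker could recurse into.
		fmt.Println(entry.Name(), entry.IsDir())
	}
}
```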

func NewRealFS

func NewRealFS(path string, followLinks bool) FS

NewRealFS returns a new filesystem rooted at path. followLinks indicates if the filesystem should follow and resolve links.
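When links are followed, the same directory can be reached more than once, which is why the FS documentation mentions that the resolved path is used for cycle detection. A minimal sketch of that idea, assuming a simple set keyed by resolved path; seenSet and firstVisit are hypothetical names, and filepath.Clean is only a lexical stand-in for real link resolution (for example with filepath.EvalSymlinks).

```go
package main

import (
	"fmt"
	"path/filepath"
)

// seenSet records the resolved paths that have already been visited.
// It is not safe for concurrent use without additional locking.
type seenSet map[string]struct{}

// firstVisit reports whether resolved has not been seen before,
// and records it as seen.
func (s seenSet) firstVisit(resolved string) bool {
	key := filepath.Clean(resolved) // lexical normalization only
	if _, ok := s[key]; ok {
		return false
	}
	s[key] = struct{}{}
	return true
}

func main() {
	seen := seenSet{}
	for _, path := range []string{"/data/repo", "/data/repo/", "/data/other"} {
		fmt.Println(path, seen.firstVisit(path))
	}
}
```

A walker built along these lines would skip any node for which firstVisit returns false.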

type Params

type Params struct {
	// Root is the root filesystem to begin the walk at
	Root FS

	// ExtraRoots are extra root folders to walk on top of root.
	// These may be nil.
	ExtraRoots []FS

	// MaxParallel is the maximum number of nodes that will be scanned in parallel.
	// Zero or negative values are treated as no limit.
	MaxParallel int

	// BufferSize is an integer that can be used to optimize internal behavior.
	// It should be larger than the average number of expected results.
	// Set to 0 to disable.
	BufferSize int
}

Params are parameters for a walk across a filesystem.
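The MaxParallel semantics (a positive value bounds concurrency, zero or a negative value means no limit) can be sketched with a buffered channel used as a semaphore. This is one common way to bound parallelism in Go and is an assumption for illustration, not a description of the walker's internals; runLimited is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited runs every task in its own goroutine, with at most maxParallel
// tasks running at once. Zero or negative values mean no limit.
// It returns the highest number of tasks observed running simultaneously.
func runLimited(tasks []func(), maxParallel int) int32 {
	var sem chan struct{}
	if maxParallel > 0 {
		sem = make(chan struct{}, maxParallel)
	}
	var (
		wg      sync.WaitGroup
		running int32
		peak    int32
	)
	for _, task := range tasks {
		if sem != nil {
			sem <- struct{}{} // blocks while maxParallel tasks are running
		}
		wg.Add(1)
		go func(task func()) {
			defer wg.Done()
			if sem != nil {
				defer func() { <-sem }() // free the slot when done
			}
			now := atomic.AddInt32(&running, 1)
			defer atomic.AddInt32(&running, -1)
			for { // record a new peak if this is the busiest moment so far
				old := atomic.LoadInt32(&peak)
				if now <= old || atomic.CompareAndSwapInt32(&peak, old, now) {
					break
				}
			}
			task()
		}(task)
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	tasks := make([]func(), 10)
	for i := range tasks {
		tasks[i] = func() { time.Sleep(5 * time.Millisecond) }
	}
	fmt.Println("peak with limit 3:", runLimited(tasks, 3))
	fmt.Println("peak without limit:", runLimited(tasks, 0))
}
```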

type Process

type Process[S any] interface {
	// Visit is called for every node that is being visited.
	// It is the first function called for each node.
	//
	// It receives a context, representing the node being visited.
	//
	// Visit returns two values, and may update the snapshot through the context.
	//
	// The snapshot is an arbitrary object that captures the current state of the process.
	// It is maintained throughout the processing of one node, and returned to the parent node (when being processed concurrently).
	//
	// shouldVisitChildren determines if any children of this node should be visited or if the process should stop.
	// When shouldVisitChildren is false, no other functions are called for this node, and the snapshot is returned to the parent (if any) immediately.
	//
	// err is any error that may occur, and should typically be nil.
	// An error immediately causes iteration on this node to be aborted, and the first error of any node will be returned to the caller of Walk.
	Visit(context WalkContext[S]) (shouldVisitChildren bool, err error)

	// VisitChild is called to determine if and how a child node should be processed.
	//
	// A child entry is valid if it can be recursively processed (i.e. is a directory).
	//
	// When child is valid, it determines how the child should be processed; otherwise action is ignored.
	VisitChild(child fs.DirEntry, valid bool, context WalkContext[S]) (action Step, err error)

	// AfterVisitChild is called after a child has been visited synchronously.
	//
	// It is passed two special values: the returned snapshot (as returned from AfterVisit / Visit) and whether the child was processed properly.
	// The child was processed improperly when any of the Process functions on it returned an error, listing a directory failed, or it was already processed before (loop detection). In these cases resultValue is nil.
	AfterVisitChild(child fs.DirEntry, resultValue any, resultOK bool, context WalkContext[S]) (err error)

	// AfterVisit is called after all children have been visited (or scheduled to be visited).
	// It is not called for the case where Visit returns shouldVisitChildren = false.
	//
	// result can be used to mark the current node, see also Visit.
	//
	// The returnValue returned from AfterVisit is passed to parent(s) if any.
	AfterVisit(context WalkContext[S]) (err error)
}

Process determines the behavior of a Walker.

Each process may hold intermediate state of type S. Processes should not retain references to WalkContexts (or state) beyond the invocation of each method.
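The order in which the Process methods are invoked for synchronously processed children can be sketched with a simplified walker over an in-memory tree. Everything here is a hypothetical stand-in: hooks is a non-generic reduction of Process without contexts or snapshots, step mirrors only part of Step, and how errors propagate in this sketch is an assumption.

```go
package main

import "fmt"

// step is a reduced stand-in for Step; concurrent processing is left out.
type step int

const (
	doNothing step = iota
	doSync
)

// node is a hypothetical in-memory directory.
type node struct {
	Name     string
	Children []*node
}

// hooks is a simplified stand-in for Process.
type hooks interface {
	Visit(n *node) (shouldVisitChildren bool, err error)
	VisitChild(child *node) (action step, err error)
	AfterVisitChild(child *node, ok bool) error
	AfterVisit(n *node) error
}

// walk calls the hooks in the documented order: Visit first, then
// VisitChild and (for synchronous children) AfterVisitChild per child,
// and finally AfterVisit.
func walk(n *node, h hooks) error {
	cont, err := h.Visit(n)
	if err != nil || !cont {
		return err // AfterVisit is skipped in both cases
	}
	for _, child := range n.Children {
		action, err := h.VisitChild(child)
		if err != nil {
			return err
		}
		if action != doSync {
			continue
		}
		childErr := walk(child, h)
		hookErr := h.AfterVisitChild(child, childErr == nil)
		if childErr == nil {
			childErr = hookErr
		}
		if childErr != nil {
			return childErr
		}
	}
	return h.AfterVisit(n)
}

// tracer records every hook invocation.
type tracer struct{ log []string }

func (t *tracer) Visit(n *node) (bool, error) {
	t.log = append(t.log, "Visit "+n.Name)
	return true, nil
}

func (t *tracer) VisitChild(c *node) (step, error) {
	t.log = append(t.log, "VisitChild "+c.Name)
	return doSync, nil
}

func (t *tracer) AfterVisitChild(c *node, ok bool) error {
	t.log = append(t.log, "AfterVisitChild "+c.Name)
	return nil
}

func (t *tracer) AfterVisit(n *node) error {
	t.log = append(t.log, "AfterVisit "+n.Name)
	return nil
}

func main() {
	tree := &node{Name: "root", Children: []*node{{Name: "a"}}}
	t := &tracer{}
	if err := walk(tree, t); err != nil {
		fmt.Println("walk failed:", err)
		return
	}
	for _, line := range t.log {
		fmt.Println(line)
	}
}
```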

type ScanProcess

type ScanProcess func(path string, root FS, depth int) (score float64, cont bool, err error)

ScanProcess is a function that is called once for each directory that is being walked. It returns a triple: a float64 score, a bool cont, and an error err.

score indicates the score the path received. A non-negative score indicates a match, and the path will be included in the slice returned from Scan(). cont indicates if Scan() should continue scanning recursively. err != nil indicates that an error has occurred, and the entire process should be aborted.

ScanProcess may be nil. In such a case, it is assumed to return (0, true, nil) for every invocation.

ScanProcess implements Process and can be used with Walk.

func (ScanProcess) AfterVisit

func (ScanProcess) AfterVisit(context WalkContext[struct{}]) (err error)

func (ScanProcess) AfterVisitChild

func (ScanProcess) AfterVisitChild(child fs.DirEntry, resultValue any, resultOK bool, context WalkContext[struct{}]) (err error)

func (ScanProcess) Visit

func (v ScanProcess) Visit(context WalkContext[struct{}]) (shouldVisitChildren bool, err error)

func (ScanProcess) VisitChild

func (ScanProcess) VisitChild(child fs.DirEntry, valid bool, context WalkContext[struct{}]) (action Step, err error)

type Step

type Step int

Step describes how a child node should be processed.

const (
	// DoNothing ignores the child node, and continues with the next node.
	DoNothing Step = iota
	// DoSync synchronously processes the child node.
	// Once processing the child node has finished, the AfterVisitChild() function will be called.
	DoSync
	// DoConcurrent queues the child node to be processed concurrently.
	// The current node will not wait for the child node to finish processing.
	DoConcurrent
)

type SweepProcess

type SweepProcess func(path string, root FS, depth int) (stop bool)

SweepProcess is a function that is called once for each directory that is being swept. It returns a boolean stop.

stop indicates whether the sweep should continue recursively (false), or stop and treat the current directory as non-empty (true).

SweepProcess may be nil. In such a case, it is assumed to return false for every invocation.

SweepProcess implements Process and can be used with Walk.

func (SweepProcess) AfterVisit

func (SweepProcess) AfterVisit(context WalkContext[bool]) (err error)

func (SweepProcess) AfterVisitChild

func (SweepProcess) AfterVisitChild(child fs.DirEntry, resultValue any, resultOK bool, context WalkContext[bool]) (err error)

func (SweepProcess) Visit

func (v SweepProcess) Visit(context WalkContext[bool]) (shouldRecurse bool, err error)

func (SweepProcess) VisitChild

func (SweepProcess) VisitChild(child fs.DirEntry, valid bool, context WalkContext[bool]) (action Step, err error)

type WalkContext

type WalkContext[S any] interface {
	// Root node this instance of the scan started from
	Root() FS

	// Current node being operated on
	Node() FS

	// Path to the current node
	NodePath() string

	// Path from the root node to this node
	Path() []string

	// Depth of this node, equivalent to len(Path())
	Depth() int

	// Update the snapshot corresponding to the current context
	Snapshot(update func(snapshot S) (value S))

	// Mark the current node as a result with the given priority.
	// May be called multiple times, in which case the node is marked as a result multiple times.
	Mark(priority float64)
}

WalkContext represents the current state of a Walker. It may additionally hold a snapshot of the state of type S.

Any instance of WalkContext should not be retained past any method it is passed to.
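The Snapshot method takes an update function rather than a plain value, so the state is always derived from its previous value. A minimal sketch of that pattern, assuming a mutex-guarded holder; snapshotHolder and its Value method are hypothetical and not part of this package.

```go
package main

import (
	"fmt"
	"sync"
)

// snapshotHolder is a hypothetical holder for per-node state of type S.
type snapshotHolder[S any] struct {
	mu    sync.Mutex
	value S
}

// Snapshot replaces the stored state with the result of applying update
// to the current state. The lock makes the read-modify-write atomic.
func (h *snapshotHolder[S]) Snapshot(update func(snapshot S) (value S)) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.value = update(h.value)
}

// Value returns the current state.
func (h *snapshotHolder[S]) Value() S {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.value
}

func main() {
	var files snapshotHolder[int]
	// Count three entries, one update at a time.
	for i := 0; i < 3; i++ {
		files.Snapshot(func(count int) int { return count + 1 })
	}
	fmt.Println(files.Value())
}
```

With an interface that only exposes Snapshot, the current state can still be read by passing an update function that copies its argument to a local variable and returns it unchanged.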

type Walker

type Walker[S any] struct {
	Params  Params
	Process Process[S]
	// contains filtered or unexported fields
}

Walker is an object that can recursively operate on all subdirectories of a directory and score those matching a specific criterion. The criterion is determined by the Process parameter.

Process also determines if the process can operate on multiple directories concurrently. Parameters determine the initial root directory (or directories) to start with, and what level of concurrency the walker may make use of.

Each Walker may be used only once. A typical use of a walker looks like:

	w := Walker[S]{/* ... */} // S is the state type of the Process
	if err := w.Walk(); err != nil {
		return err
	}
	results, scores := w.Results(), w.Scores()

func (*Walker[S]) Paths added in v1.19.0

func (w *Walker[S]) Paths(resolved bool) []string

Paths returns the path of all nodes which have been marked as a result.

When resolved is true, Paths returns the normalized (resolved) paths; otherwise the non-normalized versions are returned. Directories are returned in sorted order: first ascending by priority, then lexicographically by resolved node path. Each call to Paths returns a new copy of the results.

Paths expects Walk() to have returned, and will panic if this is not the case.

func (*Walker[S]) Results

func (w *Walker[S]) Results() []string

Results behaves like w.Paths(true).

DEPRECATED

func (*Walker[S]) Scores

func (w *Walker[S]) Scores() []float64

Scores returns the scores of all nodes which have been marked as a result. They are returned in the same order as Results().

Scores expects Walk() to have returned, and will panic if this is not the case.

func (*Walker[S]) Walk

func (w *Walker[S]) Walk() error

Walk begins recursively walking the directory tree starting at the roots defined in Params.

Walk must be called at most once for each Walker and will panic() if called multiple times.

This function is untested because the tests for Scan and Sweep suffice.
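The single-use contract of Walk (a second call panics) can be sketched with an atomic flag. This is an assumption about how such a guard might look, not the package's actual code; onceWalker and panics are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// onceWalker is a hypothetical type whose Walk method may run only once.
type onceWalker struct {
	started uint32
}

// Walk panics if it has been called before, even from another goroutine.
func (w *onceWalker) Walk() error {
	if !atomic.CompareAndSwapUint32(&w.started, 0, 1) {
		panic("Walk called more than once")
	}
	// The actual traversal would happen here.
	return nil
}

// panics reports whether calling fn results in a panic.
func panics(fn func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	fn()
	return false
}

func main() {
	var w onceWalker
	fmt.Println("first call error:", w.Walk())
	fmt.Println("second call panics:", panics(func() { w.Walk() }))
}
```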
