glob

Version: v0.0.0-...-5b9d6e7
Published: Mar 5, 2024 | License: Apache-2.0 | Imports: 5 | Imported by: 1

README

glob

Advanced filesystem glob for golang

glob provides an advanced file system glob language, a superset of the pattern language provided by the io/fs package of Go's standard library.

Installation

glob is provided as a Go module and requires Go >= 1.18.

go get github.com/halimath/glob@main

Usage

glob provides a type Pattern which can be created using the New function:

pat, err := glob.New("**/*_test.go")

A Pattern may then be used to search for matches in an fs.FS. If you want all matches, use the GlobFS method:

files, err := pat.GlobFS(fsys, "")

Pattern language

The pattern language used by glob works similarly to the pattern format of .gitignore. It extends the pattern format used by filepath.Glob and fs.Glob and is completely compatible with it.
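For comparison, here is what the standard library's fs.Glob does on its own. This is a minimal sketch using only the standard library; sampleFS and stdGlob are hypothetical helpers introduced for illustration. It shows that each `*` stays within one directory level, which is the gap a directory wildcard is meant to fill.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// sampleFS returns a small in-memory file system used for illustration.
func sampleFS() fs.FS {
	return fstest.MapFS{
		"main.go":          {},
		"main_test.go":     {},
		"pkg/a/a_test.go":  {},
		"pkg/util_test.go": {},
	}
}

// stdGlob runs the standard library's fs.Glob against sampleFS.
func stdGlob(pattern string) []string {
	matches, err := fs.Glob(sampleFS(), pattern)
	if err != nil {
		panic(err) // only returned for malformed patterns
	}
	return matches
}

func main() {
	fmt.Println(stdGlob("*_test.go"))   // prints [main_test.go]
	fmt.Println(stdGlob("*/*_test.go")) // prints [pkg/util_test.go]
}
```

Neither pattern reaches pkg/a/a_test.go, because fs.Glob has no way to span an arbitrary number of directories.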

The format is specified as the following EBNF:

pattern = term, { '/', term };

term        = '**' | name;
name        = { charSpecial | group | escapedChar | '*' | '?' };
charSpecial = (* any unicode rune except '/', '*', '?', '[' and '\' *);
char        = (* any unicode rune *);
escapedChar = '\\', char;
group       = '[', [ '^' ] { escapedChar | groupChar | range } ']';
groupChar   = (* any unicode rune except '-' and ']' *);
range       = ( groupChar | escapedChar ), '-', (groupChar | escapedChar);

The format operators have the following meaning:

  • any character (rune) matches exactly this rune - with the following exceptions
  • / works as a directory separator. It matches directory boundaries of the underlying system independently of the separator char used by the OS.
  • ? matches exactly one non-separator char
  • * matches any number of non-separator chars - including zero
  • \ escapes a character's special meaning, allowing * and ? to be used as regular characters.
  • ** matches any number of nested directories. If anything is matched, it always extends until a separator or the end of the name.
  • Groups can be defined using the [ and ] characters. Inside a group the special meaning of the characters mentioned before is disabled, but the following rules apply:
    • any character used as part of the group acts as a choice to pick from
    • if the group's first character is a ^ the whole group is negated
    • a range can be defined using - matching any rune between low and high inclusive
    • Multiple ranges can be given. Ranges can be combined with choices.
    • The meaning of - and ] can be escaped using \
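The rules above can be illustrated with a small segment-wise matcher. This is a sketch under stated assumptions, not glob's implementation: matchSegments and matchPath are hypothetical names, single-segment terms are delegated to the standard library's path.Match, and `**` is assumed to be allowed to match zero directories.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchSegments reports whether the path segments match the pattern terms.
// A "**" term consumes zero or more whole segments (an assumption of this
// sketch); any other term is delegated to path.Match, which handles '*',
// '?', groups and escapes within a single segment.
func matchSegments(terms, segs []string) bool {
	if len(terms) == 0 {
		return len(segs) == 0
	}
	if terms[0] == "**" {
		// Try every possible number of segments for the directory wildcard.
		for i := 0; i <= len(segs); i++ {
			if matchSegments(terms[1:], segs[i:]) {
				return true
			}
		}
		return false
	}
	if len(segs) == 0 {
		return false
	}
	ok, err := path.Match(terms[0], segs[0])
	if err != nil || !ok {
		return false
	}
	return matchSegments(terms[1:], segs[1:])
}

// matchPath is a hypothetical helper, not part of glob's API. It splits on
// '/' naively and therefore ignores escaped separators.
func matchPath(pattern, name string) bool {
	return matchSegments(strings.Split(pattern, "/"), strings.Split(name, "/"))
}

func main() {
	fmt.Println(matchPath("**/*_test.go", "pkg/a/a_test.go")) // true
	fmt.Println(matchPath("**/*_test.go", "main_test.go"))    // true in this sketch
	fmt.Println(matchPath("*/*_test.go", "pkg/a/a_test.go"))  // false: '*' stops at '/'
	fmt.Println(matchPath("src/[^a-c]?.go", "src/dx.go"))     // true
}
```

The recursion makes the cost of `**` visible: each directory wildcard may try every split point of the remaining path.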

Performance

glob separates pattern parsing from matching, which can improve performance when a pattern is applied repeatedly. When reusing a precompiled pattern to match filenames, glob outperforms filepath.Match for both simple and complex patterns. When the parsed pattern is not reused, filepath.Match is much faster (but lacks the additional features).

Test                                        Execution time [ns/op]   Memory usage [B/op]   Allocations per op
filepath simple pattern                                       15.5                     0                    0
glob simple pattern (reuse)                                    3.9                     0                    0
glob simple pattern (noreuse)                                495.0                  1112                    5
filepath complex pattern                                     226.2                     0                    0
glob complex pattern (reuse)                                 108.1                     0                    0
glob complex pattern (noreuse)                              1103.0                  2280                    8
glob directory wildcard pattern (reuse)                      111.7                     0                    0
glob directory wildcard pattern (noreuse)                   1229.0                  2280                    8
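The reuse rows correspond to constructing the pattern once, outside any loop, and calling only the match step per file name. A minimal sketch of that shape follows. The matcher interface, stdMatcher and filter are hypothetical names for illustration; stdMatcher is a path.Match-backed stand-in (which does re-parse on every call) so the example runs with the standard library alone.

```go
package main

import (
	"fmt"
	"path"
)

// matcher is a hypothetical interface for this sketch. Going by the
// signature documented for Pattern.Match, a *glob.Pattern would satisfy it.
type matcher interface {
	Match(f string) bool
}

// stdMatcher is a stand-in backed by path.Match so that the sketch runs
// without third-party imports. Unlike a precompiled pattern, it re-parses
// the pattern string on every call.
type stdMatcher string

func (m stdMatcher) Match(f string) bool {
	ok, err := path.Match(string(m), f)
	return err == nil && ok
}

// filter applies one already-constructed matcher to many names, so any
// construction cost is paid once, outside the loop.
func filter(m matcher, names []string) []string {
	var out []string
	for _, n := range names {
		if m.Match(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	m := stdMatcher("*_test.go") // with glob, this would be the result of glob.New
	fmt.Println(filter(m, []string{"a.go", "a_test.go", "b_test.go"})) // prints [a_test.go b_test.go]
}
```

Accepting an interface keeps the calling code independent of which matcher is used, which makes it easy to benchmark both variants against the same loop.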

License

Copyright 2022 Alexander Metzner.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

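Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.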

Documentation

Overview

Package glob implements a language for specifying glob patterns for path names starting at some root. The language does not follow the specs from filepath.Match but provides a superset which allows for directory wildcards.

Patterns consist of normal characters, non-separator wildcards '*' and '?', separators '/' and directory wildcards '**'.

A somewhat formal grammar can be given as:

pattern = term, { '/', term };
term        = '**' | name;
name        = { charSpecial | group | escapedChar | '*' | '?' };
charSpecial = (* any unicode rune except '/', '*', '?', '[' and '\' *);
char        = (* any unicode rune *);
escapedChar = '\\', char;
group       = '[', [ '^' ] { escapedChar | groupChar | range } ']';
groupChar   = (* any unicode rune except '-' and ']' *);
range       = ( groupChar | escapedChar ), '-', (groupChar | escapedChar);

The format operators have the following meaning:

  • any character (rune) matches exactly this rune - with the following exceptions
  • `/` works as a directory separator. It matches directory boundaries of the underlying system independently of the separator char used by the OS.
  • `?` matches exactly one non-separator char
  • `*` matches any number of non-separator chars - including zero
  • `\` escapes a character's special meaning, allowing `*` and `?` to be used as regular characters.
  • `**` matches any number of nested directories. If anything is matched, it always extends until a separator or the end of the name.
  • Groups can be defined using the `[` and `]` characters. Inside a group the special meaning of the characters mentioned before is disabled, but the following rules apply:
    • any character used as part of the group acts as a choice to pick from
    • if the group's first character is a `^` the whole group is negated
    • a range can be defined using `-` matching any rune between low and high inclusive
    • Multiple ranges can be given. Ranges can be combined with choices.
    • The meaning of `-` and `]` can be escaped using `\`

Constants

const (
	// Separator defines the path separator to use in patterns. This is always
	// a forward slash, independently of the underlying OS's separator.
	Separator = '/'
	// SingleWildcard defines the single non-separator character wildcard
	// operator.
	SingleWildcard = '?'
	// AnyWildcard defines the any number of non-separator characters
	// wildcard operator.
	AnyWildcard = '*'
	// Backslash escapes the next character's special meaning.
	Backslash = '\\'
	// GroupStart starts a group.
	GroupStart = '['
	// GroupEnd ends a group.
	GroupEnd = ']'
	// GroupNegate, when used as the first character of a group, negates the group.
	GroupNegate = '^'
	// Range defines the range operator.
	Range = '-'
)

Variables

var (
	// ErrBadPattern is returned when an invalid pattern is found. Make
	// sure you use errors.Is to compare errors to this sentinel value.
	ErrBadPattern = errors.New("bad pattern")
)
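The advice to use errors.Is matters whenever a sentinel is wrapped with additional detail. The sketch below uses a local stand-in, errBadPattern, and a hypothetical parse function; whether glob actually wraps ErrBadPattern is an assumption made here for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadPattern is a local stand-in for glob.ErrBadPattern.
var errBadPattern = errors.New("bad pattern")

// parse is a hypothetical parser that wraps the sentinel with detail.
// Wrapping is an assumption of this sketch, not documented glob behavior.
func parse(pat string) error {
	if pat == "[" {
		return fmt.Errorf("%w: unterminated group in %q", errBadPattern, pat)
	}
	return nil
}

// isBadPattern checks for the sentinel anywhere in the error chain.
func isBadPattern(err error) bool {
	return errors.Is(err, errBadPattern)
}

func main() {
	err := parse("[")
	fmt.Println(err == errBadPattern) // false: the wrapped error is a different value
	fmt.Println(isBadPattern(err))    // true: errors.Is unwraps the chain
}
```

A direct comparison with == would miss the wrapped error, while errors.Is works whether or not the error has been wrapped.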

Types

type Pattern

type Pattern struct {
	// contains filtered or unexported fields
}

Pattern defines a glob pattern prepared ahead of time which can be used to match filenames. Pattern is safe to use concurrently.

func New

func New(pat string) (*Pattern, error)

New creates a new pattern from pat and returns it. It returns an error indicating any invalid pattern.

func (*Pattern) GlobFS

func (pat *Pattern) GlobFS(fsys fs.FS, root string) ([]string, error)

GlobFS applies pat to all files found in fsys under root and returns the matching path names as a string slice. It uses fs.WalkDir internally and all constraints given for that function apply to GlobFS.
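To make the fs.WalkDir relationship concrete, here is a minimal sketch of a walk-and-filter glob built only on the standard library. walkGlob and demoFS are hypothetical helpers, and the filter matches base names with path.Match; this is not GlobFS's implementation. Among the fs.WalkDir constraints it inherits: paths are slash-separated, results arrive in lexical order, and "." names the root of the file system.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

// demoFS returns a small in-memory file system used for illustration.
func demoFS() fs.FS {
	return fstest.MapFS{
		"main.go":          {},
		"main_test.go":     {},
		"pkg/a/a_test.go":  {},
		"pkg/util_test.go": {},
	}
}

// walkGlob is a hypothetical helper, not glob's implementation: it walks
// fsys from root with fs.WalkDir and collects every file whose base name
// matches pattern according to path.Match.
func walkGlob(fsys fs.FS, root, pattern string) ([]string, error) {
	var matches []string
	err := fs.WalkDir(fsys, root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // propagate errors reported by the walk
		}
		if d.IsDir() {
			return nil // descend into directories, but do not report them
		}
		ok, err := path.Match(pattern, path.Base(p))
		if err != nil {
			return err // malformed pattern
		}
		if ok {
			matches = append(matches, p)
		}
		return nil
	})
	return matches, err
}

func main() {
	files, err := walkGlob(demoFS(), ".", "*_test.go")
	fmt.Println(files, err) // prints [main_test.go pkg/a/a_test.go pkg/util_test.go] <nil>
}
```

Because the whole tree below root is visited, the cost grows with the size of the tree rather than with the number of matches.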

func (*Pattern) Match

func (pat *Pattern) Match(f string) bool

Match matches a file's path name f to the compiled pattern and returns whether the path matches the pattern or not.

func (*Pattern) MatchPrefix

func (pat *Pattern) MatchPrefix(f string) bool
