fileutils

package module
v0.0.0-...-d6a3fe2
Warning

This package is not in the latest version of its module.

Published: Oct 29, 2024 License: MIT Imports: 22 Imported by: 22

README

Simple file utilities for golang, written for working with markup files and other file types (like images) typically associated with documentation.

Included are types for describing a file and configuring a set of associated output files:

  • InputFile describes a file: full path, size, MIME type, ?IsXML, MMCtype, and contents (up to 2 megabytes).

  • MMCtype is meant to function like a MIME type and has three fields. It can be set based on file name and contents, and later updated if the file is XML and has a DOCTYPE declaration. Refer to file mmctype.go.

  • OutputFiles makes it easier to create a group of like-named files for an InputFile, in the same directory or optionally in a like-named subdirectory.

Known issues

  • Tested only on macOS (i.e. it's sure to fail on Windows)

Example

$ cd /opt
$ ls example*
example.xml
import "github.com/fbaube/fileutils"

IF, _ := fileutils.NewInputFile("example.xml")

fmt.Fprintf(os.Stdout, "You opened: %s \n", IF)
// You opened: /opt/example.xml
println("i.e.", IF.DString())
// i.e. InputFile</opt/example.xml>sz<42>dir?<n>bin?<n>img?<n>mime<text/plain>

// Argument is not "": Creates a subdirectory for associated output files.
OF, _ := IF.NewOutputFiles("_myapp")

// Creates an associated file and returns the io.WriteCloser
w_diag, _ := OF.NewOutputExt("diag")
fmt.Fprintln(w_diag, "Lots of diagnostic info")
w_diag.Close()
$ ls example*
example.xml

example.xml_myapp:
example.diag

Dependencies

  • github.com/hosom/gomagic for MIME type analysis
  • github.com/pkg/errors for wrapping errors
  • github.com/fbaube/stringutils for various

Documentation

Index

Constants

const MAX_FILE_SIZE = 100_000_000

MAX_FILE_SIZE is set (arbitrarily) to 100 megabytes

const PathSep = string(os.PathSeparator)

A token nod to Windoze compatibility.

Variables

This section is empty.

Functions

func AbsWRT

func AbsWRT(problyRelFP string, wrtDir string) string

AbsWRT is like "filepath.Abs(..)": it can convert a possibly-relative filepath to an absolute filepath. The difference is that a relative filepath argument is not resolved w.r.t. the current working directory; it is instead resolved w.r.t. the supplied directory argument.
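The behavior described above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: `absWRT` is a hypothetical stand-in, and the example paths assume a Unix-style filesystem.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// absWRT is a hypothetical stand-in for the documented AbsWRT.
// An absolute path is returned cleaned; a relative path is joined
// onto wrtDir instead of onto the current working directory.
func absWRT(maybeRelFP, wrtDir string) string {
	if filepath.IsAbs(maybeRelFP) {
		return filepath.Clean(maybeRelFP)
	}
	// filepath.Join also cleans the result, so ".." segments collapse.
	return filepath.Join(wrtDir, maybeRelFP)
}

func main() {
	fmt.Println(absWRT("docs/a.xml", "/opt/proj"))
	fmt.Println(absWRT("/etc/hosts", "/opt/proj"))
}
```

If wrtDir is itself relative, the result of this sketch is relative too; a caller that needs a guaranteed absolute path would have to make wrtDir absolute first.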

func AppendToFileBaseName

func AppendToFileBaseName(name, toAppend string) string

func ClearAndCreateDirectory

func ClearAndCreateDirectory(path string) error

ClearAndCreateDirectory deletes the directory before re-creating it. The older version (named "ClearDirectory") tried to keep the directory as-is while emptying it.

func ClearDirectory

func ClearDirectory(path string) error

ClearDirectory tries to keep the directory as-is while emptying it.

func CopyDirRecursivelyFromTo

func CopyDirRecursivelyFromTo(src string, dst string) error

CopyDirRecursivelyFromTo copies a whole directory recursively. BOTH arguments should be directories !! Otherwise, hilarity ensues.

func CopyFileFromTo

func CopyFileFromTo(src, dst string) error

CopyFileFromTo copies a single file from src to dst.

func CopyFileGreedily

func CopyFileGreedily(src string, dst string) error

CopyFileGreedily reads the entire file into memory, and is therefore memory-constrained !

func CopyFromTo

func CopyFromTo(src, dst string) error

CopyFromTo copies the contents of src to dst atomically, using a temp file as intermediary.
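A common way to get this kind of atomic copy is to write to a temp file in the destination's directory and then rename it into place. The sketch below shows that pattern under stated assumptions: `writeAtomic`, `copyFromTo`, and `demo` are hypothetical helpers, not the package's code, and the rename is only atomic when the temp file and the destination live on the same filesystem (as on POSIX systems).

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// writeAtomic (hypothetical) writes via a temp file next to dest,
// then renames the temp file over dest.
func writeAtomic(dest string, write func(w io.Writer) error) (err error) {
	tmp, err := os.CreateTemp(filepath.Dir(dest), ".tmp-*")
	if err != nil {
		return err
	}
	defer func() {
		if err != nil {
			tmp.Close()           // harmless if already closed
			os.Remove(tmp.Name()) // do not leave a partial file behind
		}
	}()
	if err = write(tmp); err != nil {
		return err
	}
	if err = tmp.Sync(); err != nil {
		return err
	}
	if err = tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), dest)
}

// copyFromTo (hypothetical) streams src into dst through writeAtomic.
func copyFromTo(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	return writeAtomic(dst, func(w io.Writer) error {
		_, err := io.Copy(w, in)
		return err
	})
}

// demo copies a small file inside a fresh temp directory and
// returns the contents of the copy.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "atomic")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	src := filepath.Join(dir, "src.txt")
	dst := filepath.Join(dir, "dst.txt")
	if err := os.WriteFile(src, []byte("hello\n"), 0o644); err != nil {
		return "", err
	}
	if err := copyFromTo(src, dst); err != nil {
		return "", err
	}
	b, err := os.ReadFile(dst)
	return string(b), err
}

func main() {
	s, err := demo()
	fmt.Printf("%q %v\n", s, err)
}
```

Readers of dst see either the old contents or the complete new contents, never a half-written file, which is the point of using an intermediary.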

func CreateEmpty

func CreateEmpty(path AbsFilePath) (*os.File, error)

CreateEmpty opens the filepath as a writable empty file, truncating it if it exists and is non-empty.

func DirectoryContents

func DirectoryContents(f *os.File) ([]os.FileInfo, error)

DirectoryContents returns the results of "(*os.File)Readdir(..)". "File.Name()" might be a relative filepath but if it was opened okay then it at least functions as an absolute filepath. If the path is not a directory then it panics.

The call to "Readdir(..)" reads the contents of the directory associated with arg "File" and returns a slice of "FileInfo" values, as would be returned by "Lstat(..)", in directory order.

func DirectoryFiles

func DirectoryFiles(f *os.File) (int, []os.FileInfo, error)

DirectoryFiles is like "DirectoryContents(..)" except that results that are directories (not files) are nil'ed out. If there were entries but none were files, it returns "(0, nil, nil)".

func Enhomed

func Enhomed(s string) string

Enhomed shortens a filepath by substituting "~".

func EnsureTrailingPathSep

func EnsureTrailingPathSep(s string) string

func Exists

func Exists(path string) bool

Exists returns true *iff* the file exists and is in fact a file.

func GatherDirTreeList

func GatherDirTreeList(path string) (paths []string)

GatherDirTreeList walks a directory tree (fetched via os.DirFS) to gather a list of all item names (files, directories, symlinks, more).

NOTE that for the argument "inpath", it makes no difference whether:

  • inpath is relative or absolute
  • inpath ends with a trailing slash or not
  • inpath is a directory or a symlink to a directory

In the walk process, the item names (as returned by fs.WalkDir) are relative to `inpath` and do not include any information about `inpath` itself, i.e. the portions of the item paths "above" the directory node `inpath`, so the caller has to sort that out.

Regarding values of the argument `inpath`:

  • A valid directory returns at least one item: ".", representing `inpath` itself.
  • A valid file or non-existent path item returns only one item, the `inpath` itself.
  • A symlink to a directory is followed; behavior for a symlink to a file is not easily summarised.

The documentation for os.DirFS states: The result implements io/fs.StatFS, io/fs.ReadFileFS and io/fs.ReadDirFS.

Therefore API calls that work are:

  • Stat(name string) (fs.FileInfo, error): Stat errors should be of type *PathError.
  • ReadFile(name string) ([]byte, error): ReadFile on success returns a nil error, not io.EOF. The caller is permitted to modify the returned byte slice. This method should return a copy of the underlying data.
  • ReadDir(name string) ([]fs.DirEntry, error): ReadDir reads the named directory and returns a list of directory entries sorted by filename.
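The walk described above can be sketched with os.DirFS and fs.WalkDir. This is an illustration rather than the package's code: `gatherNames` and `demo` are hypothetical, and unlike the documented function this sketch simply returns the error for a non-existent path.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// gatherNames (hypothetical) walks os.DirFS(root) starting at "."
// and collects every name. Names are relative to root and always
// use forward slashes, as io/fs requires.
func gatherNames(root string) ([]string, error) {
	var names []string
	err := fs.WalkDir(os.DirFS(root), ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // stop at the first error
		}
		names = append(names, p)
		return nil
	})
	return names, err
}

// demo builds a tiny tree in a temp directory and returns the
// gathered names joined by commas.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "tree")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	if err := os.Mkdir(filepath.Join(dir, "sub"), 0o755); err != nil {
		return "", err
	}
	for _, name := range []string{"b.txt", filepath.Join("sub", "a.txt")} {
		if err := os.WriteFile(filepath.Join(dir, name), nil, 0o644); err != nil {
			return "", err
		}
	}
	names, err := gatherNames(dir)
	return strings.Join(names, ","), err
}

func main() {
	s, err := demo()
	fmt.Println(s, err)
}
```

Because the names carry no trace of the root, a caller that needs usable paths has to join each name back onto the root it passed in.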

func GetHomeDir

func GetHomeDir() string

GetHomeDir is a convenience function, and refers to the invoking user's home directory.

func GetStringFromStdin

func GetStringFromStdin() (string, error)

GetStringFromStdin reads "os.Stdin" completely (i.e. until "\n^D") and returns a string.

func IsDirAndExists

func IsDirAndExists(path string) bool

IsDirAndExists returns true *iff* the directory exists and is in fact a directory.

func IsFileAtPath

func IsFileAtPath(aPath string) (bool, os.FileInfo, error)

IsFileAtPath checks that the file exists AND that it is "regular" (not dir, symlink, pipe), and also returns size and permissions in *os.FileInfo.

Return values:

  • (true, *FileInfo, nil) if a regular file exists (but can be 0-len!)
  • (false, *FileInfo, nil) if something else exists (incl. dir)
  • (false, nil, nil) if nothing at all exists
  • (false, nil, anError) if some unusual error was returned (failing disk?)

Notes & caveats:

  • File emptiness (i.e. length 0) is not checked
  • "~" for user home dir is not expanded and will fail
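The four return cases listed above map naturally onto os.Lstat. A minimal sketch, assuming that is roughly how such a check works; `isFileAtPath` and `demo` here are hypothetical and are not the package's code.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// isFileAtPath (hypothetical) mirrors the documented return cases:
// regular file, something else, nothing at all, or an unusual error.
func isFileAtPath(p string) (bool, os.FileInfo, error) {
	fi, err := os.Lstat(p) // Lstat does not follow symlinks
	if errors.Is(err, fs.ErrNotExist) {
		return false, nil, nil // nothing at all exists
	}
	if err != nil {
		return false, nil, err // some unusual error
	}
	return fi.Mode().IsRegular(), fi, nil
}

// demo checks a regular file, a directory, and a missing path.
func demo() (fileOK, dirOK, missingOK bool, err error) {
	dir, err := os.MkdirTemp("", "isfile")
	if err != nil {
		return
	}
	defer os.RemoveAll(dir)
	f := filepath.Join(dir, "a.txt")
	if err = os.WriteFile(f, nil, 0o644); err != nil {
		return
	}
	isReg, fi, e1 := isFileAtPath(f)
	fileOK = isReg && fi != nil && e1 == nil // zero-length still counts
	isReg, fi, e2 := isFileAtPath(dir)
	dirOK = !isReg && fi != nil && e2 == nil
	isReg, fi, e3 := isFileAtPath(filepath.Join(dir, "nope"))
	missingOK = !isReg && fi == nil && e3 == nil
	return
}

func main() {
	fmt.Println(demo())
}
```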

func IsNonEmpty

func IsNonEmpty(path string) bool

IsNonEmpty returns true *iff* the file exists *and* contains at least one byte of data.

func IsXML

func IsXML(path string) bool

IsXML returns true *iff* the file exists *and* appears to be XML. The check is simple though.

func MTypeSub

func MTypeSub(mtype string, i int) string

func MakeDirectoryExist

func MakeDirectoryExist(path string) error

MakeDirectoryExist might not create it ?! (NOTE)

func Must

func Must(f *os.File, e error) *os.File

Must wraps this package's most common return values and panics if it gets an error.

func OpenRO

func OpenRO(path string) (f *os.File, e error)

OpenRO opens (and returns) the filepath as a readable file.

func OpenRW

func OpenRW(path string) (f *os.File, e error)

OpenRW opens (and returns) the filepath as a writable file. An existing file is not truncated, merely opened.

func ResolvePath

func ResolvePath(s string) string

ResolvePath is needed because functions in package path/filepath do not handle "~" (home directory) well. If an error occurs (for whatever reason), we punt: simply return the original input argument.
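For illustration, here is one way the "~" handling can be done, together with the inverse substitution that Enhomed describes. `expandHome` and `enhome` are hypothetical helpers, not the package's functions; they take the home directory as an argument so the logic is easy to test, and the example path assumes a Unix-style separator.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// expandHome (hypothetical) replaces a leading "~" with home.
// Forms such as "~otheruser/x" are returned unchanged.
func expandHome(p, home string) string {
	sep := string(filepath.Separator)
	switch {
	case p == "~":
		return home
	case strings.HasPrefix(p, "~"+sep):
		return filepath.Join(home, p[2:])
	default:
		return p
	}
}

// enhome (hypothetical) is the inverse: it shortens a path that
// lies under home by substituting "~".
func enhome(p, home string) string {
	sep := string(filepath.Separator)
	switch {
	case home == "":
		return p
	case p == home:
		return "~"
	case strings.HasPrefix(p, home+sep):
		return "~" + p[len(home):]
	default:
		return p
	}
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("no home directory:", err)
		return
	}
	full := expandHome("~/docs/a.md", home)
	fmt.Println(full, enhome(full, home))
}
```

Checking for the separator after the home prefix avoids shortening a sibling such as /home/user2 when home is /home/u.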

func SameContents

func SameContents(f1, f2 *os.File) bool

SameContents returns: Are the two files' contents identical ?
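One way to answer that question without loading both files into memory is to compare them chunk by chunk. The sketch below works on any pair of io.Reader values (an *os.File is one); `sameContents` and `sameStrings` are hypothetical helpers, and the package's own approach may differ.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// sameContents (hypothetical) compares two streams in fixed-size
// chunks and stops at the first difference.
func sameContents(a, b io.Reader) (bool, error) {
	bufA := make([]byte, 32*1024)
	bufB := make([]byte, 32*1024)
	for {
		nA, errA := io.ReadFull(a, bufA)
		nB, errB := io.ReadFull(b, bufB)
		// io.ReadFull signals end of input with io.EOF (nothing read)
		// or io.ErrUnexpectedEOF (a short final chunk).
		endA := errA == io.EOF || errA == io.ErrUnexpectedEOF
		endB := errB == io.EOF || errB == io.ErrUnexpectedEOF
		if errA != nil && !endA {
			return false, errA
		}
		if errB != nil && !endB {
			return false, errB
		}
		if !bytes.Equal(bufA[:nA], bufB[:nB]) {
			return false, nil
		}
		if endA || endB {
			return endA && endB, nil
		}
	}
}

// sameStrings is a small convenience wrapper for the demonstration.
func sameStrings(x, y string) (bool, error) {
	return sameContents(strings.NewReader(x), strings.NewReader(y))
}

func main() {
	fmt.Println(sameStrings("abc", "abc"))
	fmt.Println(sameStrings("abc", "abd"))
}
```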

func SessionSummary

func SessionSummary() string

SessionSummary can be called anytime.

func StripTrailingPathSep

func StripTrailingPathSep(s string) string

func TempDir

func TempDir(dest string) string

TempDir checks and returns the value of the envar `TMPDIR`.

func WriteAtomic

func WriteAtomic(dest string, write func(w io.Writer) error) (err error)

WriteAtomic is TBS.

func XmlAttrS

func XmlAttrS(a xml.Attr) string

func XmlNameS

func XmlNameS(n xml.Name) string

func XmlStartElmS

func XmlStartElmS(se xml.StartElement) string

Types

type AbsFilePath

type AbsFilePath string

AbsFilePath is a new type, based on `string`. It serves three purposes:

  • clarify and bring correctness to the processing of absolute path arguments
  • permit the use of a clearly named struct field
  • permit the definition of methods on the type

Note that when working with an `os.File`, `Name()` returns the name of the file as was passed to `Open(..)`, so it might be a relative filepath.

func AbsFP

func AbsFP(relFP string) AbsFilePath

AbsFP is like filepath.Abs(..) except using our own types.

func (AbsFilePath) Append

func (afp AbsFilePath) Append(rfp string) AbsFilePath

Append is a convenience function to keep code cleaner.

func (AbsFilePath) BaseName

func (afp AbsFilePath) BaseName() string

func (AbsFilePath) DirExists

func (afp AbsFilePath) DirExists() bool

DirExists returns true *iff* the directory exists and is in fact a directory.

func (AbsFilePath) DirPath

func (afp AbsFilePath) DirPath() AbsFilePath

func (AbsFilePath) Enhomed

func (afp AbsFilePath) Enhomed() string

func (AbsFilePath) Exists

func (afp AbsFilePath) Exists() bool

Exists returns true *iff* the file exists and is in fact a file.

func (AbsFilePath) FileExt

func (afp AbsFilePath) FileExt() string

func (AbsFilePath) FileSize

func (afp AbsFilePath) FileSize() int

FileSize returns the size *iff* the filepath exists and is in fact a file.

func (AbsFilePath) HasPrefix

func (afp AbsFilePath) HasPrefix(beg AbsFilePath) bool

HasPrefix is like strings.HasPrefix(..) but uses our types.

func (AbsFilePath) OpenExistingDir

func (afp AbsFilePath) OpenExistingDir() (f *os.File, e error)

OpenExistingDir returns the directory *iff* it exists and can be opened for reading. Note that the `os.File` can be nil without error. Thus we cannot (or: *do not*) distinguish between non-existence and an actual error. OTOH if it exists but is not a directory, return an error.

func (AbsFilePath) OpenOrCreateDir

func (afp AbsFilePath) OpenOrCreateDir() (f *os.File, e error)

OpenOrCreateDir succeeds if (a) the directory exists and can be opened, or (b) it does not exist but can be created anew.

func (AbsFilePath) S

func (afp AbsFilePath) S() string

S is a utility method to keep code cleaner.

func (AbsFilePath) Tildotted

func (afp AbsFilePath) Tildotted() string

type Errer

type Errer struct {
	Err error
}

Errer is a struct that can be used to embed an error in another struct, when we want to execute (pointer) methods on a struct in the style of a data pipeline, i.e. chainable, and executed left-to-right.

We make the error public so that it is easily set, and so that we can wrap errors easily using the "%w" printf format spec.

Methods are on *Errer, not Errer, so that modification is possible.
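The pipeline style mentioned above can be sketched as follows. The `Errer` type here is a local re-declaration for illustration only, and `doc`, its methods, and `errEmpty` are hypothetical; the idea is that each chained step does nothing once an error has been recorded.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Errer is a local copy, for illustration, of the documented struct.
type Errer struct {
	Err error
}

// HasError and Error tolerate a nil receiver.
func (p *Errer) HasError() bool { return p != nil && p.Err != nil }

func (p *Errer) Error() string {
	if p == nil || p.Err == nil {
		return ""
	}
	return p.Err.Error()
}

var errEmpty = errors.New("empty text") // hypothetical sentinel error

// doc is a hypothetical pipeline subject that embeds Errer.
type doc struct {
	text string
	Errer
}

// Each step returns early if an earlier step already failed.
func (d *doc) Trim() *doc {
	if d.HasError() {
		return d
	}
	d.text = strings.TrimSpace(d.text)
	return d
}

func (d *doc) RequireNonEmpty() *doc {
	if d.HasError() {
		return d
	}
	if d.text == "" {
		d.Err = fmt.Errorf("require non-empty: %w", errEmpty)
	}
	return d
}

func (d *doc) Upper() *doc {
	if d.HasError() {
		return d
	}
	d.text = strings.ToUpper(d.text)
	return d
}

func main() {
	ok := (&doc{text: "  hi "}).Trim().RequireNonEmpty().Upper()
	fmt.Printf("%q %v\n", ok.text, ok.HasError())
	bad := (&doc{text: "   "}).Trim().RequireNonEmpty().Upper()
	fmt.Printf("%q %v %s\n", bad.text, bad.HasError(), bad.Error())
}
```

The caller checks for an error once, at the end of the chain, and errors.Is still works on the wrapped error because of the "%w" verb.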

func (*Errer) ClearError

func (p *Errer) ClearError()

func (*Errer) Error

func (p *Errer) Error() string

Error is an NPE-proof improvement on the standard error.Error().

func (*Errer) GetError

func (p *Errer) GetError() error

GetError is a convenience func because getting Errer.Err is ugly.

func (*Errer) HasError

func (p *Errer) HasError() bool

HasError is a convenience function. Since Err is publicly visible, HasError is not really needed, but it seems appropriate given that we also have func Error().

func (*Errer) SetError

func (p *Errer) SetError(e error)

SetError is a convenience func because setting Errer.Err is ugly.

type FSItem

type FSItem struct {
	// fi should NOT be exported, because it is relied on heavily
	// and updated often & carefully.
	FI fs.FileInfo
	// FSItem_type is closely linked to FI and they
	// should always be updated in lockstep.
	FSItem_type
	// TypedRaw is a ptr, to allow for lazy loading.
	*CT.TypedRaw
	// FPs is a ptr, to allow for items that are not (yet) on disk
	// or are kept only in memory. Each path includes the [FP.Base].
	// Paths are used mainly for func [Refresh] and for reproducing
	// the tree structure of import batches; other uses TBD.
	//
	// Paths follow our rules:
	//  - a directory MUST end in a slash (or OS sep)
	//  - a symlink MUST NOT end in a slash (or OS sep)
	//
	// Note that an [fs.FileInfo] does not preserve or provide path
	// info, which is part of the motivation for this large struct.
	FPs *Filepaths
	// Exists is false when [os.Lstat] returns `(nil, nil)`.
	Exists bool
	// Dirty has semantics TBD.
	Dirty bool
	// Perms is UNIX-style "rwx" user/group/world
	Perms string
	// Inode and NLinks are for hard link detection.
	Inode, NLinks int // uint64
	// Hash is for content change detection
	Hash []byte
	// Errer provides an NPE-proof error field
	Errer
}

FSItem is an item identified by a filepath (plus its contents) that we have tried to or will try to read, write, or create. It might be a directory or symlink, either of which requires further processing elsewhere. In the most common usage, it is a file.

It might be just a path where nothing exists but we intend to do something. Its filepath(s) can be empty ("") if (for example) its content was created interactively or it so far lives only in memory.

NOTE that basically all fields are exported. This will change in the future when the handling of modifications is tightened up.

NOTE that the file name (aka [FP.Base], the part of the full path after the last directory separator) is not stored separately: it is stored in the AbsFP *and* the RelFP. Note also that this path & name information duplicates what is stored in an instance of orderednodes.Nord.

NOTE that it embeds an fs.FileInfo, implements interfaces [FSItemer], fs.FileInfo, and fs.DirEntry, and contains basic file system metadata PLUS the path to the item (which FileInfo does not contain) AND the item contents (but only after lazy loading). The `FileInfo` is the result of a call to os.Lstat/fs.Lstat (or perhaps alternatively the contents of a record in sqlar or zip), parsed.

FSItem is embedded in struct datarepo/rowmodels/ContentityRow.

This struct is rather large and all-encompassing, but this follows from certain design decisions and certain behavior in the standard library.

It might seem odd to include a [TypedRaw] rather than a plain [Raw]. But in general when we are working with serializing and deserializing content ASTs, it is important to know what we are working with, because sometimes we can, want to, or have to do things like include HTML in Markdown, or permit HTML tags in LwDITA.

It might also seem odd that MU_type_DIRLIKE is a "markup type", but this avoids many practical problems encountered in trying to process file system trees.

NOTE that RelFP and AbsFP must be exported to be persisted to the DB.

This struct might be somehow applicable to non-file FS nodes and also other hierarchical structures (like XML), but this is not explored yet.

func NewFSItem

func NewFSItem(fp string) (*FSItem, error)

NewFSItem takes a filepath (absolute or relative) and analyzes the object (assuming one exists) at the path. This func does not load and analyse the content.

Note that a relative path is appended to the CWD, which may not be the desired behavior; in such a case, use NewFSItemRelativeTo (below).

NOTE that if no item exists at fp, behavior might be flaky, but it returns `(nil, nil)`.

Note that an empty path is not OK; instead create a pathless FSItem from the content.

func NewFSItemFromContent

func NewFSItemFromContent(s string) (*FSItem, error)

func ReadDir

func ReadDir(inpath string) ([]FSItem, error)

ReadDir returns only errors from the initial step of opening the directory. An error returned on an individual directory item is attached to the item via interface Errer.

func (*FSItem) Debug

func (p *FSItem) Debug() string

Debug implements [Stringser].

func (*FSItem) DirEntryInfo

func (p *FSItem) DirEntryInfo() fs.FileInfo

DirEntryInfo implements fs.DirEntry by returning interface fs.FileInfo. This should be named Info but it collides with interface [Stringser].

func (*FSItem) Echo

func (p *FSItem) Echo() string

Echo implements [Stringser].

func (*FSItem) HasContents

func (p *FSItem) HasContents() bool

HasContents is the opposite of [IsEmpty].

func (*FSItem) HasMultiHardlinks

func (p *FSItem) HasMultiHardlinks() bool

func (*FSItem) Info

func (p *FSItem) Info() string

Info implements [Stringser].

func (*FSItem) IsDir

func (p *FSItem) IsDir() bool

func (*FSItem) IsDirlike

func (p *FSItem) IsDirlike() bool

IsDirlike is, well, documented elsewhere.

func (*FSItem) IsEmpty

func (p *FSItem) IsEmpty() bool

IsEmpty is a convenience function for files (and directories too?).

func (*FSItem) IsFile

func (p *FSItem) IsFile() bool

IsFile is a convenience function.

func (*FSItem) IsSymlink

func (p *FSItem) IsSymlink() bool

IsSymlink is a convenience function.

func (*FSItem) ListingString

func (p *FSItem) ListingString() string

ListingString prints: rwx,rwx,rwx [or not exist] ... Rawtype (file)Len Name Error? \n

func (*FSItem) LoadContents

func (p *FSItem) LoadContents() error

LoadContents reads the file (assuming it is a file) into the field [TypedRaw] and quickly checks for XML and HTML5 declarations.

Before proceeding it calls [Refresh], just in case.

It is tolerant about non-files and empty files, returning nil for error.

NOTE the call to os.Open defaults to R/W mode, although R/O might suffice.

func (*FSItem) NewLinesFile

func (pPI *FSItem) NewLinesFile() (*LinesFile, error)

NewLinesFile is pretty self-explanatory.

func (*FSItem) Refresh

func (p *FSItem) Refresh() error

Refresh updates the embedded fs.FileInfo and checks four things: existence, item type, file size, and modification time. Details:

  • A file coming into existence or a file being appended to might be common use cases.
  • In general, if any of the four things has changed, it writes a warning to stdout and in some cases returns an fs.PathError.
  • If [Dirty] is set, some warnings do not apply.
  • If there is already an error, this call is ignored.

func (*FSItem) ResolveSymlinks

func (p *FSItem) ResolveSymlinks() *FSItem

ResolveSymlinks will follow links until it finds something else. NOTE that this can be a SECURITY HOLE.

func (*FSItem) String

func (p *FSItem) String() (s string)

func (*FSItem) StringWithPermissions

func (p *FSItem) StringWithPermissions() (s string)

func (*FSItem) Type

func (p *FSItem) Type() fs.FileMode

Type implements fs.DirEntry by returning the fs.FileMode.

type FSItemInfo

type FSItemInfo interface {
	fs.FileInfo
	fs.DirEntry

	// IsExist is a convenience function, updated by [Refresh].
	IsExist() bool
	// CreationPath is the path (abs or rel) used to create it.
	// It is implemented by the embedded [Filepaths].
	CreationPath() string
	// IsDirty has semantics TBD.
	IsDirty() bool
	// Refresh updates the embedded [fs.FileInfo] and checks four
	// things: existence, item type, file size, modification time.
	Refresh() error
	// Permissions returns the standard Unix bits.
	Permissions() int

	// FICode4L returns one of "FILE", "DIRR", "SYML", "OTHR".
	FICode4L() string
	// IsFile says whether it is a regular file.
	IsFile() bool
	// IsDir says whether it is a directory. It is
	// pass-thru from the embedded [fs.FileInfo].
	IsDir() bool
	// IsDirlike means (a) it can NOT contain own content and
	// (b) it is/has link(s) to other items that can be further
	// examined (meaning: it is a directory or a symlink).
	IsDirlike() bool
	// IsSymlink is a convenience function.
	IsSymlink() bool
	// HasMultiHardlinks might not be portable.
	HasMultiHardlinks() bool

	// IsEmpty means either (a) it cannot have content OR
	// (b) it can but the length of the content is zero.
	IsEmpty() bool
	// HasContents means both (a) it can have content
	// AND (b) the length of that content is non-zero.
	HasContents() bool

	// DirEntryInfo implements [fs.DirEntry] by returning interface
	// [fs.FileInfo]. This should be named Info as in [fs.DirEntry.Info]
	// but that would collide with interface [Stringser].
	DirEntryInfo() fs.FileInfo
}

FSItemInfo is an interface that extends two common interfaces, fs.FileInfo and fs.DirEntry, and can be implemented by using only information provided by those two interfaces.
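To make the claim concrete, the classification methods can be derived from an fs.FileMode alone, which both fs.FileInfo and fs.DirEntry expose. The sketch below assumes that reading of the interface comments; `code4L`, `isDirlike`, and `demo` are hypothetical names, not the package's code.

```go
package main

import (
	"fmt"
	"io/fs"
	"strings"
)

// code4L (hypothetical) returns one of the four documented codes.
func code4L(m fs.FileMode) string {
	switch {
	case m.IsRegular():
		return "FILE"
	case m.IsDir():
		return "DIRR"
	case m&fs.ModeSymlink != 0:
		return "SYML"
	default:
		return "OTHR"
	}
}

// isDirlike (hypothetical) is true for a directory or a symlink,
// i.e. an item that links to other items rather than holding content.
func isDirlike(m fs.FileMode) bool {
	return m.IsDir() || m&fs.ModeSymlink != 0
}

// demo classifies one mode of each kind.
func demo() string {
	modes := []fs.FileMode{
		0o644,                    // regular file
		fs.ModeDir | 0o755,       // directory
		fs.ModeSymlink | 0o777,   // symlink
		fs.ModeNamedPipe | 0o600, // anything else
	}
	codes := make([]string, 0, len(modes))
	for _, m := range modes {
		codes = append(codes, code4L(m))
	}
	return strings.Join(codes, " ")
}

func main() {
	fmt.Println(demo())
	fmt.Println(isDirlike(fs.ModeDir), isDirlike(0o644))
}
```

A symlink is only visible as such when the mode comes from an Lstat-style call; a Stat-style call reports the mode of the link target instead.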

type FSItem_type

type FSItem_type D.SemanticFieldType
const (
	FSItem_type_DIRR FSItem_type = FSItem_type(D.SFT_FSDIR)
	FSItem_type_FILE FSItem_type = FSItem_type(D.SFT_FSFIL)
	FSItem_type_SYML FSItem_type = FSItem_type(D.SFT_FSYML)
	FSItem_type_OTHR FSItem_type = FSItem_type(D.SFT_FSOTH)
)

type FileLine

type FileLine struct {
	CT.Raw        // string
	RawLineNr int // source file line number
	// contains filtered or unexported fields
}

FileLine is a record (i.e. a line) in a LinesFile.

type Filepaths

type Filepaths struct {
	// RelFP is typically the path given (e.g.) on the command line and is
	// useful for resolving relative paths in batches of content items.
	// The value might be valid only for the current CLI invocation or
	// user session, but it is persistable to preserve relationships
	// among files in import batches.
	RelFP string
	// AbsFP is the authoritative field when processing individual files.
	AbsFP string
	// GotAbs (from [path/filepath/IsAbs]) says that this struct was
	// created using an absolute FP, not a relative FP, and so the
	// field [RelFP] is calculated.
	GotAbs bool
	// Local (from [path/filepath/IsLocal]) is OK but not-Local might
	// be a security hole.
	Local bool
	// Valid (from [io/fs.ValidPath]) fails for absolute paths,
	// but can be set to `true` for them.
	Valid bool
	// ShortFP is the path shortened by using "." (CWD) or "~" (user's
	// home directory), so it might only be valid for the current CLI
	// invocation or user session and it is def not persistable.
	ShortFP string
}

Filepaths should always have all three path fields set, even if the third ([ShortFP]) is basically session-specific. Note that directories always have a slash (or OS sep) appended, and symlinks never should.

NOTE that the file name (aka [FP.Base], the part of the full path after the last directory separator) is not stored separately: it is stored in AbsFP *and* RelFP. Note also that all this path & name information duplicates what is stored in an instance of orderednodes.Nord.

func NewFilepaths

func NewFilepaths(anFP string) (*Filepaths, error)

NewFilepaths relies on the std lib, and accepts either an absolute or a relative filepath. It does not, however, accept an empty filepath.

It takes care to remove a trailing slash (or OS sep) before calling functions in path/filepath, so that symlinks are not unintentionally followed.

NOTE the stdlib functions called here (Valid, IsLocal) do not like absolute filepaths, so it might be better to call this with a relative filepath when possible.

Ref: type PathError struct { Op string; Path string; Err error }

func (*Filepaths) CreationPath

func (p *Filepaths) CreationPath() string

CreationPath is the path (abs or rel) used to create it. It can be "", if the item wasn't/isn't on disk.

func (*Filepaths) EnsurePathSepSuffixes

func (p *Filepaths) EnsurePathSepSuffixes()

func (*Filepaths) String

func (p *Filepaths) String() string

func (*Filepaths) TrimPathSepSuffixes

func (p *Filepaths) TrimPathSepSuffixes()

type LinesFile

type LinesFile struct {
	*FSItem
	Lines []*FileLine
}

LinesFile is for reading a file where each line is a record.
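A line-per-record reader along these lines can be sketched with bufio.Scanner. The names below (`fileLine`, `readLines`, `parseString`) are hypothetical, and skipping blank lines is an assumption made here to show why keeping the source line number on each record is useful.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// fileLine (hypothetical) pairs a record with its source line number.
type fileLine struct {
	Raw       string
	RawLineNr int
}

// readLines (hypothetical) turns each non-blank line into a record.
// Blank lines are skipped but still counted, so RawLineNr always
// refers to the position in the original input.
func readLines(r io.Reader) ([]*fileLine, error) {
	var out []*fileLine
	sc := bufio.NewScanner(r)
	for nr := 1; sc.Scan(); nr++ {
		if strings.TrimSpace(sc.Text()) == "" {
			continue
		}
		out = append(out, &fileLine{Raw: sc.Text(), RawLineNr: nr})
	}
	return out, sc.Err()
}

// parseString is a small convenience wrapper for the demonstration.
func parseString(s string) ([]*fileLine, error) {
	return readLines(strings.NewReader(s))
}

func main() {
	lines, err := parseString("alpha\n\nbeta\n")
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	for _, l := range lines {
		fmt.Printf("%d: %s\n", l.RawLineNr, l.Raw)
	}
}
```

bufio.Scanner rejects lines longer than its default token limit, so a file with very long records would need a larger buffer set via Scanner.Buffer.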

type ValidUTF8Reader

type ValidUTF8Reader struct {
	// contains filtered or unexported fields
}

ValidUTF8Reader implements a Reader which reads only bytes that constitute valid UTF-8.

func NewValidUTF8Reader

func NewValidUTF8Reader(rd io.Reader) ValidUTF8Reader

NewValidUTF8Reader constructs a new `ValidUTF8Reader` that wraps an existing `io.Reader`.

func (ValidUTF8Reader) Read

func (rd ValidUTF8Reader) Read(b []byte) (n int, err error)

Read reads bytes into the byte slice passed in. It returns `n`, the number of bytes read.
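One way to build a reader of this kind is to decode the input rune by rune and drop any byte that does not decode. The sketch below takes that approach and is not the package's implementation: `validUTF8Reader`, `newValidUTF8Reader`, and `cleanString` are hypothetical, and this version uses a pointer receiver because it keeps state between calls.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"unicode/utf8"
)

// validUTF8Reader (hypothetical) passes through only well-formed
// UTF-8 sequences from the wrapped reader.
type validUTF8Reader struct {
	br      *bufio.Reader
	pending []byte // encoded bytes of a rune not yet fully copied out
}

func newValidUTF8Reader(r io.Reader) *validUTF8Reader {
	return &validUTF8Reader{br: bufio.NewReader(r)}
}

// Read fills p with valid UTF-8 only. It keeps reading until p is
// full or the source ends, so it can block on interactive sources.
func (r *validUTF8Reader) Read(p []byte) (int, error) {
	n := 0
	for n < len(p) {
		if len(r.pending) > 0 {
			c := copy(p[n:], r.pending)
			r.pending = r.pending[c:]
			n += c
			continue
		}
		ru, size, err := r.br.ReadRune()
		if err != nil {
			if n > 0 {
				return n, nil // report the error on the next call
			}
			return 0, err
		}
		// An invalid byte comes back as RuneError with size 1; a real
		// U+FFFD in the input has size 3 and is kept.
		if ru == utf8.RuneError && size == 1 {
			continue
		}
		r.pending = utf8.AppendRune(r.pending[:0], ru)
	}
	return n, nil
}

// cleanString runs a string through the reader, for demonstration.
func cleanString(s string) (string, error) {
	b, err := io.ReadAll(newValidUTF8Reader(strings.NewReader(s)))
	return string(b), err
}

func main() {
	fmt.Println(cleanString("h\xffi \u00e9\xc3"))
}
```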

Directories

Path Synopsis
cmd